"AndroAnalyzer Dataloader" is a compilation of four publicly available datasets that we used in our paper, including Bataci's dataset, AndroVul dataset, Drebin-215 dataset, Malgenome-215 dataset, and a feature dataset collected by our proposed framework (which includes CSV files and sha256 hashes).
pip install git+https://github.com/NgocTruongNguyen/androanalyzer_dataloader.git
- Get the list of dataset names:
from androanalyzer import dataset
dataset.get_list_dataset()
# ['AndroVul', 'Bataci', 'Drebin-215', 'Malgenome-215', 'AndroAnalyzer']
- Load the "AndroVul" dataset as a dataframe:
dataset.load_data("AndroVul")
- Load the "AndroAnalyzer" dataset, split into train and test sets as described in the paper:
dataset.load_data("AndroAnalyzer", train_set=True)
dataset.load_data("AndroAnalyzer", train_set=False)
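To work with both splits together, the two calls above can be wrapped in a small helper. This is only a sketch: `load_splits` and `split_sizes` are hypothetical names that are not part of the package, the loader is passed in as an argument so the helper does not depend on the package being installed, and it assumes each split is returned as a dataframe or another object that supports `len()`.

```python
from typing import Any, Callable, Dict


def load_splits(load_data: Callable[..., Any], name: str = "AndroAnalyzer") -> Dict[str, Any]:
    """Load the train and test splits using the train_set flag shown above.

    load_data is expected to behave like dataset.load_data (an assumption).
    """
    return {
        "train": load_data(name, train_set=True),
        "test": load_data(name, train_set=False),
    }


def split_sizes(splits: Dict[str, Any]) -> Dict[str, int]:
    """Return the number of rows in each split (len() works on DataFrames and lists)."""
    return {split: len(data) for split, data in splits.items()}


# Intended use with the real package (requires it to be installed):
#   from androanalyzer import dataset
#   splits = load_splits(dataset.load_data)
#   print(split_sizes(splits))
```

Passing the loader in as a parameter keeps the helper easy to try out with a stand-in function before downloading any data.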
- Get the list of class names for the collection of APK files we gathered:
dataset.get_classname()
- Get the SHA-256 hashes of the files in the "Banking" class of the training set:
dataset.get_sha256("Banking", train_set=True)
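The last two calls can be combined to see how many files each class contributes to a split. The sketch below is illustrative only: `count_files_per_class` is a hypothetical helper that is not part of the package, and it assumes that `get_classname()` returns an iterable of class-name strings and that `get_sha256()` returns a sequence of hashes supporting `len()`.

```python
from typing import Callable, Dict, Iterable, Sequence


def count_files_per_class(
    get_classname: Callable[[], Iterable[str]],
    get_sha256: Callable[..., Sequence[str]],
    train_set: bool = True,
) -> Dict[str, int]:
    """Map each class name to the number of SHA-256 hashes in the chosen split.

    The two callables are expected to behave like dataset.get_classname and
    dataset.get_sha256 (an assumption about their return types).
    """
    return {
        class_name: len(get_sha256(class_name, train_set=train_set))
        for class_name in get_classname()
    }


# Intended use with the real package (requires it to be installed):
#   from androanalyzer import dataset
#   counts = count_files_per_class(dataset.get_classname, dataset.get_sha256)
#   print(counts)
```

Calling it once with `train_set=True` and once with `train_set=False` gives a quick per-class view of the train/test split.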