About the Kronodroid Dataset

An Android malware dataset designed to study concept drift and cross-device detection issues. It was created and is maintained by Dr. Alejandro Guerra Manzanares, who built it during his Ph.D. studies.

Features:

  • Labeled (i.e., benign/malware samples)
  • 289 dynamic features (i.e., system calls)
  • 200 static features (i.e., permissions, intent filters, metadata)
  • 4 distinct timestamps per data sample
  • Covers most of Android's history (2008-2020)
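Because every sample carries timestamps, concept drift is usually studied by training on older samples and testing on newer ones. The sketch below shows such a time-based split. It assumes rows loaded as dicts (for example with csv.DictReader) and a timestamp column holding an ISO-8601 date; the column name used here ("first_seen") is hypothetical, so check the CSV header for the real timestamp columns and their format.

```python
from datetime import date


def time_based_split(rows, timestamp_field, cutoff_year):
    """Split samples into a 'past' (training) set and a 'future' (test) set.

    Assumptions, not taken from the dataset documentation:
    - each row is a dict, e.g. as produced by csv.DictReader;
    - `timestamp_field` is a hypothetical column name whose values are
      ISO-8601 dates or datetimes such as "2015-03-21" or
      "2015-03-21 10:00:00".
    Rows with an empty timestamp are skipped.
    """
    train, test = [], []
    for row in rows:
        value = row[timestamp_field].strip()
        if not value:
            continue  # this sample has no value for the chosen timestamp
        year = date.fromisoformat(value[:10]).year
        # Samples up to and including cutoff_year train; later ones test.
        (train if year <= cutoff_year else test).append(row)
    return train, test


# Example with made-up rows: one sample before the cutoff, one after it.
example_rows = [{"first_seen": "2012-05-01"}, {"first_seen": "2016-01-01"}]
past, future = time_based_split(example_rows, "first_seen", 2014)
```

Splitting on the year keeps the sketch simple; a finer cutoff (month or exact date) works the same way by comparing full date objects instead.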

The emulator data set is available for download in CSV format (zip files under the emulator folder):

  • 28,745 malicious samples (209 malware families).
  • 35,256 benign samples.

The real device data set is available for download in CSV format (zip files under the real device folder):

  • 41,382 malware samples (240 malware families).
  • 36,755 benign samples.
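Both data sets ship as zipped CSV files. A minimal sketch for reading them with the Python standard library follows; it does not assume any archive or member names (every ".csv" member is read), and the label column name must be supplied by you because it is not stated here.

```python
import csv
import io
import zipfile
from collections import Counter


def load_zipped_csv(zip_source):
    """Read every CSV member of a zip archive into a list of row dicts.

    `zip_source` is a path or a binary file object for one of the zip files
    in the emulator or real device folder. Every member ending in ".csv"
    is read; other files in the archive are ignored.
    """
    rows = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            # ZipFile.open returns bytes; wrap it so csv receives text.
            # "utf-8-sig" also accepts files that start with a BOM.
            with archive.open(name) as raw, io.TextIOWrapper(
                raw, encoding="utf-8-sig", newline=""
            ) as text:
                rows.extend(csv.DictReader(text))
    return rows


def count_labels(rows, label_field):
    """Count samples per value of the (user-supplied) label column."""
    return Counter(row[label_field] for row in rows)
```

Usage might look like `rows = load_zipped_csv("emulator/some_file.zip")` followed by `count_labels(rows, "<label column>")`; both the path and the column name are placeholders to replace with the real ones.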

The raw data set (log files, extracted metadata, and APKs) is available upon request.

Log files and raw data cannot be downloaded directly yet. For now, I will share them with you if you request them: [email protected]

Some of the APK files cannot be shared due to restrictions from their original sources. However, the list of hashes of all the data set samples can be extracted from each data set (i.e., the "sha256" column in each CSV file).
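Extracting the hash list only requires reading the "sha256" column. The sketch below collects the hashes from one CSV file into a set and can write them out one per line; the helper names are illustrative, not part of the dataset's own scripts.

```python
import csv


def extract_sha256(lines, column="sha256"):
    """Return the set of sample hashes listed in one KronoDroid CSV file.

    `lines` can be an open text file or any iterable of CSV text lines.
    Hashes are stripped and lower-cased so that the same digest written
    in different case is counted once; empty cells are ignored.
    """
    return {
        row[column].strip().lower()
        for row in csv.DictReader(lines)
        if row[column].strip()
    }


def write_hash_list(hashes, path):
    """Write the hashes to a text file, one per line, in sorted order."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.writelines(h + "\n" for h in sorted(hashes))
```

For example, `with open("emulator.csv", newline="") as f: emulator_hashes = extract_sha256(f)` reads one file (the file name is a placeholder). Repeating this for a real device file and taking `emulator_hashes & real_device_hashes` gives the samples present in both data sets.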

If you have any issues or requests I can help with, email me.

Important Information

The data set is released publicly with only one condition: when using the dataset, please cite the KronoDroid paper: https://www.sciencedirect.com/science/article/pii/S0167404821002236

BibTeX citation for the data set paper:

@article{Kronodroid_Guerra,
title = {KronoDroid: Time-based Hybrid-featured Dataset for Effective Android Malware Detection and Characterization},
journal = {Computers & Security},
volume = {110},
pages = {102399},
year = {2021},
issn = {0167-4048},
doi = {10.1016/j.cose.2021.102399},
url = {https://www.sciencedirect.com/science/article/pii/S0167404821002236},
author = {Alejandro Guerra-Manzanares and Hayretdin Bahsi and Sven Nõmm}}

More information, a detailed explanation of the dataset, and the additional scripts will be posted soon.

Have you used the data set in your work?

If you have used KronoDroid in your work, send us the reference and we will include it in the PUBLICATIONS LIST.

Updates

I am working on improving the data set by adding more recent samples, more data, and API call information. If you are interested, contact me. I can share the original system call log files and other information extracted from the APKs. I can also share some of the APK files if you need them.

Do not hesitate to contact me if you need the raw files of the dataset (large in size): [email protected]
