hgrppph / malwaredetection_

This project forked from aceveryday/malwaredetection_

N-gram based malicious code detection tool

Python 97.48% Batchfile 2.52%

MalwareDetection_

A tool to disassemble malicious code (based on an n-gram model).

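Description: a Python tool that extracts n-gram features from malicious APKs.
(Before use, delete the DeleteWhenUse files in the apk_sample and smali folders. These empty txt files were uploaded only because GitHub does not support uploading empty folders.)
(Malicious and benign samples that have not yet been decompiled go in the apk_sample folder. Decompiled APKs are stored automatically in n-gramTool\smali\virus or n-gramTool\smali\kind, and the summaries of the smali files are stored automatically in n-gramTool\smali\summary\virus or n-gramTool\smali\summary\kind.)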

Command execution order
python to_standard.py
python decompiling.py
python make_csv.py
python smali2feature.py 2
python smali2feature.py 3
python smali2feature.py 4
python smali2feature.py 5
python logisticsRegression.py 2
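
The command order above can be wrapped in a small driver script. This is only a sketch: the script names and arguments are copied from the list, while `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical names introduced here, and the driver is assumed to run from the directory that contains the scripts.

```python
import subprocess
import sys

# Script order copied from the README; each entry is (script, extra arguments).
PIPELINE = [
    ("to_standard.py", []),
    ("decompiling.py", []),
    ("make_csv.py", []),
    ("smali2feature.py", ["2"]),
    ("smali2feature.py", ["3"]),
    ("smali2feature.py", ["4"]),
    ("smali2feature.py", ["5"]),
    ("logisticsRegression.py", ["2"]),
]


def build_commands(python=sys.executable):
    """Return the argument list for every step, in execution order."""
    return [[python, script, *args] for script, args in PIPELINE]


def run_pipeline(dry_run=True):
    """Print each command; run it too unless dry_run is True."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` only prints the commands; `run_pipeline(dry_run=False)` executes them one after another.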

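What each module and package does
1. decompiling.py: (1) renames the APKs; (2) invokes a cmd command that decompiles each APK with apktool.
2. make_csv.py: (1) calls the smali parsing module in the dependency package to summarize the smali files; (2) collects the smali bytecode of every APK into data.csv.
3. smali2feature.py: slides an n-gram window over the opcodes and writes the n-gram file that stores each APK's features. (The n of the n-gram is passed as the trailing command-line argument.)
4. logisticsRegression.py: trains a logistic regression model on the training set and reports the accuracy.
5. DependPackage package: contains the bytecode dictionary (7 instruction sets in total), the smali file parser (recursively walks the file tree and summarizes the OpCodes of the smali files), the feature dictionary, and other modules that the scripts above depend on.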
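
As an illustration of the recursive smali summary described in item 5, here is a minimal sketch. It does not reproduce the code in DependPackage: the seven single-letter categories and the prefix table `CATEGORY_BY_PREFIX` are assumptions made for this example, as are the function names, and the sketch assumes that one "module" corresponds to one smali method.

```python
import os

# Hypothetical mapping from opcode prefix to one of 7 category symbols.
# The real bytecode dictionary in DependPackage may be different.
CATEGORY_BY_PREFIX = {
    "move": "M", "return": "R", "goto": "G", "if": "I",
    "iget": "T", "sget": "T", "aget": "T",
    "iput": "P", "sput": "P", "aput": "P",
    "invoke": "V",
}


def opcode_category(line):
    """Map one smali source line to a category symbol, or None."""
    tokens = line.split()
    if not tokens:
        return None
    # "move-result-object" -> "move", "goto/16" -> "goto"
    prefix = tokens[0].split("-")[0].split("/")[0]
    return CATEGORY_BY_PREFIX.get(prefix)


def summarize_smali_tree(root):
    """Walk root recursively; return one category string per non-empty method."""
    methods = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if not name.endswith(".smali"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                current = None  # list of symbols while inside a method
                for raw in fh:
                    line = raw.strip()
                    if line.startswith(".method"):
                        current = []
                    elif line.startswith(".end method"):
                        if current:
                            methods.append("".join(current))
                        current = None
                    elif current is not None:
                        category = opcode_category(line)
                        if category:
                            current.append(category)
    return methods
```

Joining the returned list with `|` gives a string in the shape of the feature column of data.csv, under the assumptions stated above.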

Sample sources
1. Benign samples: downloaded from the Android market (安卓市场).
2. Malicious samples: see the 'malware~' project that I forked.

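About data.csv
Each row has the form: ApkName,feature (the OpCodes of this sample, with modules separated by |),isVirus (1 if malicious, 0 if not)
When the file is opened in Excel, the feature column (the name is misspelled in the csv and was never corrected) may overflow its cell, but it is still on the same row as the file name, so there is nothing to worry about.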
https://img-blog.csdnimg.cn/20190829130038441.
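
A row in this format can be read with the standard `csv` module. The sketch below reads columns by position, because the exact spelling of the header in the file is not given here. It assumes the file starts with a header row; `read_samples` is a hypothetical helper name.

```python
import csv
import io


def read_samples(fh, has_header=True):
    """Return (apk_name, modules, is_virus) tuples from a data.csv-style file."""
    # The feature column can be very long, so raise the per-field limit.
    csv.field_size_limit(2**31 - 1)
    reader = csv.reader(fh)
    if has_header:
        next(reader, None)  # skip the header row (assumed to exist)
    samples = []
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed rows
        name, feature, label = row
        samples.append((name, feature.split("|"), int(label)))
    return samples


# Usage with made-up rows; real rows come from data.csv.
demo = io.StringIO("ApkName,feature,isVirus\na.apk,MRG|VVI,1\nb.apk,TP,0\n")
rows = read_samples(demo)
```

For a real file, pass `open("data.csv", newline="", encoding="utf-8")` instead of the in-memory demo; the encoding is an assumption.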

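About n_gram.csv
In each row, the first 7^n columns hold the number of occurrences of each opcode n-gram (7 instruction sets, so 7^n possible n-grams), and the last column is the class label (0 or 1, indicating whether the sample is malicious).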
https://img-blog.csdnimg.cn/20190829130240659.
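
The 7^n count columns can be illustrated with a short sliding-window sketch. This is not the code in smali2feature.py: the alphabet `MRGITPV`, the column order (lexicographic over that alphabet), and the choice not to let a window cross a module boundary are all assumptions made for the example, and `ngram_vector` is a hypothetical name.

```python
from itertools import product

# Hypothetical symbols for the 7 instruction sets.
ALPHABET = "MRGITPV"


def ngram_vector(modules, n):
    """Count every n-gram over ALPHABET in the given module strings.

    Returns a list of 7**n counts, one per possible n-gram.
    """
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=n))}
    counts = [0] * len(index)
    for seq in modules:
        # Slide a window of width n over the module, one symbol at a time.
        for start in range(len(seq) - n + 1):
            gram = seq[start:start + n]
            if gram in index:
                counts[index[gram]] += 1
    return counts


# Usage: two made-up modules, 2-grams, label 1 appended as the last column.
vector = ngram_vector(["MRG", "MR"], 2)
row = vector + [1]
```

With n = 2 the vector has 49 entries, so a full row has 50 columns; for n = 5 the vector already has 16807 entries.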

Contributors: aceveryday