android-malware-analysis's Introduction

Android Malware detection through Network Traffic Analysis

Android is a Linux-based operating system designed primarily for touchscreen mobile devices such as smartphones and tablet computers. Mobile operating systems have developed considerably over the last 15 years, from black-and-white phones to today's smartphones, which are effectively miniature computers, and Android is now one of the most widely used among them. Android Inc. was founded in Palo Alto, California, in 2003. Android has been the best-selling OS worldwide on smartphones since 2011 and on tablets since 2013. As of May 2017, it had over 2 billion monthly active users, the largest installed base of any operating system, and as of December 2018, the Google Play store featured over 2.6 million apps.

The growing popularity of the Android mobile operating system has attracted not only users but also malware developers to the platform. A large number of malicious apps have been detected in recent years in the Google Play Store and in third-party app markets.

Getting Started

Many techniques have been proposed for Android malware detection:

- Based on Permissions  
- Based on System Calls  
- Signature Based.

We use a detection technique based on network traffic analysis. We compare the network traffic of malware with that of benign apps and find the features that distinguish the two types of traffic. Based on these features, we build a decision tree classifier to separate benign from malicious apps in the testing dataset. The main objectives of network traffic analysis are to:

-: Analyse the traffic flowing across the network.

-: Detect unwanted traffic generated by a particular application.

-: Identify behavioural patterns that reveal suspicious activity.

-: Minimise the damage caused by malicious applications (or apps).

There are several reasons why network traffic analysis is well suited to detecting Android malware:

🔸 - Network traffic analysis provides in-depth application monitoring and insight into bandwidth utilisation.

🔸 - Applied to NetFlow data, it can produce reports that show unwanted traffic.

🔸 - It provides a ready tool for a quick deep dive into the underlying causes of network slowdowns.

🔸 - Patterns that persist for a couple of hours can be noted, and the analysis shows how they affect the system.

The proposed model classifies a given APK as malware or benign based on a dynamic analysis of the network traffic it generates. The process has three phases:

Step 1 - Data Collection : In the first phase we gather benign and malware apps, install each one in an emulator, and launch it manually. We then use tcpdump to dump the raw traffic into a pcap file.

Step 2 - Feature Extraction and Labelling : In the second phase we open the pcap file in Wireshark, calculate the feature values used for our dataset, and create a CSV file containing those features along with a label indicating whether each app is malware.

Step 3 - Training and Testing : In the last phase we feed the CSV file into a machine learning model, where part of the dataset is used to train the model and the rest is used for testing.

WORKING

We start by capturing the network traffic of malicious as well as benign (normal) apps. We used the Android Studio Emulator (QEMU), which acts as a dummy Android phone on which both benign and malware applications can be installed. The emulated device is a 2.7" QVGA device running Android 4.1 (Jelly Bean).

From the dataset of Android malware samples, we take one malicious sample at a time and install it on the emulated phone with the adb command

( adb install Application-Name )

We then run the application on the Android emulator and capture its traffic with the tcpdump command

( tcpdump -w Application-Name )

All of these commands can be run from the command line (command prompt).
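The install-and-capture step could be scripted, for instance along the lines of the Python sketch below. The helper names, the example file names, the 60-second capture window, and the choice to run tcpdump directly from the host command line are all assumptions made for illustration; the project itself ran the commands by hand.

```python
import subprocess


def install_cmd(apk_path):
    """Build the argument list for `adb install <apk>`."""
    return ["adb", "install", apk_path]


def capture_cmd(pcap_path):
    """Build the argument list for `tcpdump -w <file>` (write raw packets to a file)."""
    return ["tcpdump", "-w", pcap_path]


def capture_app_traffic(apk_path, pcap_path, seconds=60):
    """Install the app, then record traffic for a fixed, assumed duration."""
    subprocess.run(install_cmd(apk_path), check=True)
    capture = subprocess.Popen(capture_cmd(pcap_path))
    try:
        # The app is launched manually in the emulator while the capture runs.
        capture.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        capture.terminate()  # stop the capture once the time window has passed
        capture.wait()
```

Calling `capture_app_traffic("Sample.apk", "Sample.pcap")` would need adb and tcpdump on the PATH and a running emulator; the two `*_cmd` helpers can be inspected without either.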

The captured data is then mined. Data mining is the process of extracting relevant information from a collection of data; the information relevant to a given concept is selected on the basis of the features of the data.

The next step is feature selection.

Feature selection is critical to building a good model for several reasons.

  • Feature selection implies some degree of cardinality reduction: it imposes a cutoff on the number of attributes that can be considered when building a model.

  • Data almost always contains more information than is needed to build the model, or the wrong kind of information.

  • Not only does feature selection improve the quality of the model, it also makes the process of modeling more efficient.

We selected the network traffic features below and compared them across both types of traffic. The features were selected by information gain: the higher an attribute's information gain, the more it tells us about whether an app is malicious, so the attributes with the highest gain were kept as features.

[Figure: selected network traffic features]
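As a rough illustration of ranking features by information gain, the sketch below computes the gain of splitting a numeric feature at a threshold. The feature values, the threshold, and the function names are hypothetical; the project's real feature list and values are not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the apps at `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


# Hypothetical values for four apps: one informative feature, one uninformative.
labels = ["malware", "malware", "benign", "benign"]
informative = information_gain([100, 120, 900, 950], labels, threshold=500)
uninformative = information_gain([1, 1, 2, 2][::1], ["malware", "benign", "malware", "benign"], threshold=1)
```

A feature whose split separates the two classes completely gets the maximum gain for a balanced two-class set (1 bit), while a feature whose split leaves both sides evenly mixed gets a gain of 0.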

After capturing traffic from both malicious and normal apps, we analyse it in Wireshark in terms of these network traffic features.

Next we create a .csv file (the dataset) that holds the value of every feature for each application, together with a tag (label) stating whether or not the application is malware.
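A minimal sketch of building such a dataset file with Python's csv module is shown here. The column names and the sample numbers are hypothetical stand-ins, since the project's actual features were computed in Wireshark and are not listed in the text.

```python
import csv
import io

# Hypothetical column names; the real feature set is not given in the text.
FIELDS = ["app", "avg_packet_size", "bytes_sent", "bytes_received", "label"]


def app_row(app, packet_sizes, bytes_sent, bytes_received, is_malware):
    """Turn one app's measurements into a labelled dataset row."""
    return {
        "app": app,
        "avg_packet_size": round(sum(packet_sizes) / len(packet_sizes), 2),
        "bytes_sent": bytes_sent,
        "bytes_received": bytes_received,
        "label": "malware" if is_malware else "benign",
    }


def write_dataset(rows, out):
    """Write a header line followed by one CSV line per app."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer stands in for the .csv file on disk.
buffer = io.StringIO()
write_dataset([app_row("sample_app", [60, 1500, 540], 2100, 900, True)], buffer)
```

Passing an open file instead of the `io.StringIO` buffer (opened with `newline=""`, as the csv module recommends) would write the same rows to disk.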

The next step is the machine learning algorithm. In our project we use the J48 decision tree. The general motive for using a decision tree is to create a model that can predict the class or value of a target variable by learning decision rules inferred from prior data (the training data). The training set contains 60% of the data.
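The text does not say how the 60% training portion was drawn. Purely as an illustration, a shuffled 60/40 split can be sketched in Python as follows; the function name, the shuffling, and the fixed seed are assumptions, not the project's procedure.

```python
import random


def split_dataset(rows, train_fraction=0.6, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder apps give six training rows and four test rows.
train_rows, test_rows = split_dataset(range(10))
```

Shuffling before cutting avoids a split in which, say, all malware samples sit at the end of the file and land only in the test part.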

After training, we pass our test set, the remaining 40% of the applications, to the J48 decision tree, an algorithm provided by the WEKA tool. The model predicts a label for each application in the test set, and we measure accuracy by comparing these predictions with the known labels of the test set.
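Accuracy in this setting is the fraction of test apps whose predicted label matches the known label. The sketch below computes it together with the four confusion-matrix counts; the function names and the sample labels are made up for illustration and are not output from the project.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the known labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def confusion_counts(predicted, actual, positive="malware"):
    """Count true/false positives and negatives, treating `positive` as the hit class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


# Made-up predictions for four test apps.
predicted = ["malware", "benign", "malware", "benign"]
actual = ["malware", "benign", "benign", "benign"]
score = accuracy(predicted, actual)
counts = confusion_counts(predicted, actual)
```

The confusion counts are worth reporting alongside accuracy because a benign app flagged as malware (false positive) and a missed malware app (false negative) have very different costs.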

The decision tree also provides a binary tree representation, giving a clearer, diagrammatic view of the classifications (predictions) it has made.

In the decision tree algorithm, the attribute (feature) with the highest information gain becomes the root. At each node, the algorithm evaluates the data to find the attribute on which to split.

On each iteration, the algorithm goes through every unused attribute of the training data set (T) and calculates the entropy (or information gain) of that attribute. It then selects the attribute with the smallest entropy (or largest information gain). The set (T) is then split, or partitioned, by the selected attribute to produce subsets of the data. For example, a node can be split into child nodes that each hold a subset of the apps, grouped so that the malware label in our dataset is separated as cleanly as possible. The algorithm then recurses on each subset.
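The recursion described here can be sketched in a few lines of Python. This is a simplified, ID3-style illustration under stated assumptions, not WEKA's J48 (which implements C4.5 and adds refinements such as gain ratio and pruning); the feature names and values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels, unused):
    """Return (weighted entropy, feature, threshold) with the smallest entropy, or None."""
    best = None
    for feature in unused:
        # Every distinct value except the largest leaves both sides non-empty.
        for threshold in sorted({row[feature] for row in rows})[:-1]:
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


def build_tree(rows, labels, unused):
    """Recursively split on the best unused feature; leaves hold the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels, unused) if len(set(labels)) > 1 else None
    if split is None:
        return majority
    _, feature, threshold = split
    remaining = [f for f in unused if f != feature]
    left = [(r, lab) for r, lab in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [lab for _, lab in left], remaining),
        "right": build_tree([r for r, _ in right], [lab for _, lab in right], remaining),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical training data: two features for four apps.
rows = [
    {"avg_packet_size": 100, "dns_queries": 3},
    {"avg_packet_size": 120, "dns_queries": 9},
    {"avg_packet_size": 900, "dns_queries": 4},
    {"avg_packet_size": 950, "dns_queries": 8},
]
labels = ["malware", "malware", "benign", "benign"]
tree = build_tree(rows, labels, ["avg_packet_size", "dns_queries"])
```

Minimising the weighted entropy of the child nodes is equivalent to maximising information gain, because the parent's entropy is the same for every candidate split at a given node.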

Contributors

devu-62442
