anshu0612 / api_malware_classification

Ensembled Seq2Seq models, built with Keras, that detect malware in a sequence of API calls. The solution achieved a top position on Kaggle.

CS5242 Final Project on Dynamic Malware Analysis

Kaggle Competition

The task of this project is to detect malware based on features extracted from API calls.

The solution achieved an AUC score of 99.18% on Kaggle's private leaderboard.
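For readers unfamiliar with the metric, the following is a minimal sketch of how an AUC score can be computed. It is not taken from this repository. The toy labels and scores are made up, and treating 1 as malware and 0 as benign is an assumption made only for illustration.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC: the fraction of (positive, negative) pairs in which
    the positive sample gets the higher score. Ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (assumption: 1 = malware, 0 = benign; scores are invented)
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of the 4 pairs are ranked correctly: 0.75
```

The pairwise loop is quadratic and only meant to make the definition visible. For real evaluation a library routine is the better choice.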

How to train the model and get the test predictions

  1. Download the dataset from Kaggle and keep the extracted data in the project root directory
  2. Install the Python dependencies with pip
  3. Run kfold_ensemble.py with the command python kfold_ensemble.py
  4. After training and prediction, the output is written to the file output.csv
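The script name suggests a k-fold ensemble. The general idea can be sketched as below; this is not the code in kfold_ensemble.py. The helper names (`kfold_indices`, `train_stub_model`, `kfold_ensemble_predict`) are hypothetical, and a trivial threshold rule stands in for the Keras models so that the example runs without any dependencies.

```python
import random


def kfold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_stub_model(xs, ys):
    """Hypothetical stand-in for a real model: learn the midpoint between the
    class means and score a sample 1.0 if it lies above that midpoint."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 if x > threshold else 0.0


def kfold_ensemble_predict(train_x, train_y, test_x, k=5):
    """Train one model per fold (leaving that fold out) and average the
    test predictions of all k models."""
    models = []
    for held_out in kfold_indices(len(train_x), k):
        held = set(held_out)
        xs = [x for i, x in enumerate(train_x) if i not in held]
        ys = [y for i, y in enumerate(train_y) if i not in held]
        models.append(train_stub_model(xs, ys))
    return [sum(m(x) for m in models) / len(models) for x in test_x]


# Invented one-dimensional toy data
train_x = [0.1, 0.2, 0.3, 0.4, 0.5, 1.5, 1.6, 1.7, 1.8, 1.9]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(kfold_ensemble_predict(train_x, train_y, [0.0, 2.0]))
```

Averaging the predictions of models trained on different folds usually reduces variance compared with a single model, which is the usual motivation for this kind of ensemble.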

Downloading dataset from Kaggle

The easiest way to interact with Kaggle's datasets is via the Kaggle command-line tool (CLI). Below are the steps to set up the Kaggle CLI and use it to download the dataset.

The Setup

  1. Install the Kaggle CLI. The Kaggle CLI requires Python. Open a terminal and run pip install kaggle
  2. Set up the API credentials. Once Kaggle is installed, type kaggle to check the installation. The output includes the path where the kaggle.json file must be placed.

To get the kaggle.json file, go to: https://www.kaggle.com//account

In the API section, click Create New API Token, then copy the downloaded kaggle.json file to the path mentioned in the terminal output.

Type kaggle once again to check that the credentials are found.

In some cases the credentials will not work even though the file is in the correct location, because the file has incorrect permissions. Type the exact command suggested in the terminal output and it will start working.
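The copy and permission steps can also be scripted. The following is a minimal sketch, not part of this repository: `install_kaggle_token` is a hypothetical helper, and `~/.kaggle` is assumed as the default location, so pass the path printed by the CLI if it differs.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(downloaded_token, config_dir=None):
    """Copy kaggle.json into the config directory and restrict it to the owner."""
    # Assumption: ~/.kaggle is the default; use the path the CLI printed otherwise.
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "kaggle.json"
    shutil.copyfile(downloaded_token, target)
    # Owner read/write only, the equivalent of chmod 600 on Unix-like systems
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target


# Example call (the download location is hypothetical):
# install_kaggle_token(Path.home() / "Downloads" / "kaggle.json")
```

On Windows, `os.chmod` only toggles the read-only flag, so the permission step mainly matters on Linux and macOS.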

Downloading Dataset via CLI

We can open the Kaggle help via kaggle -h. For help on downloading competition data, type kaggle competitions download -h. Whatever the Kaggle CLI command is, add -h to get help.

Download Entire Dataset

To download the dataset, go to the Data subtab on the competition page. In the API section we will find the exact command that we can copy to the terminal to download the entire dataset.

The syntax is kaggle competitions download <competition name>. Once the dataset is downloaded, extract it and use it.
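The download and extraction can be driven from Python as well. This is a minimal sketch under a few assumptions: the `kaggle` executable is on the PATH with working credentials, the download arrives as a zip archive, and the helper names `download_competition` and `extract_archive` are hypothetical.

```python
import subprocess
import zipfile
from pathlib import Path


def download_competition(competition):
    """Run the download command in the form described above."""
    # Needs the kaggle CLI on PATH and valid credentials; raises on failure.
    subprocess.run(["kaggle", "competitions", "download", competition], check=True)


def extract_archive(archive_path, dest_dir="."):
    """Extract a downloaded zip archive into dest_dir and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Example calls (the archive name is an assumption; check what was downloaded):
# download_competition("<competition name>")
# extract_archive("<competition name>.zip", ".")
```

Extracting into the current directory matches the instruction to keep the extracted data in the project root.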
