bigbrobro / threatreportextractor

This project forked from jackaduma/threatreportextractor

Extracting Attack Behavior from Threat Reports

License: GNU General Public License v3.0

JavaScript 14.08% Python 85.92%

ThreatReportExtractor


This repository implements the paper EXTRACTOR: Extracting Attack Behavior from Threat Reports, a notable work on threat report extraction in Cyber Threat Intelligence (CTI).

  • Environment
    • NLP submodules
    • NLP pretrained models
    • Dependent libraries
  • Usage
    • Example
  • Demo
  • Reference

EXTRACTOR: Extracting Attack Behavior from Threat Reports

The knowledge on attacks contained in Cyber Threat Intelligence (CTI) reports is very important to effectively identify and quickly respond to cyber threats. However, this knowledge is often embedded in large amounts of text, and therefore difficult to use effectively. To address this challenge, we propose a novel approach and tool called EXTRACTOR that allows precise automatic extraction of concise attack behaviors from CTI reports. EXTRACTOR makes no strong assumptions about the text and is capable of extracting attack behaviors as provenance graphs from unstructured text. We evaluate EXTRACTOR using real-world incident reports from various sources as well as reports of DARPA adversarial engagements that involve several attack campaigns on various OS platforms of Windows, Linux, and FreeBSD. Our evaluation results show that EXTRACTOR can extract concise provenance graphs from CTI reports and show that these graphs can successfully be used by cyber-analytics tools in threat-hunting.


Environment

This code supports Python 3 only; Python 2 is not supported.

spacy

Download the spaCy model:

python -m spacy download en_core_web_lg 

nltk

Download the NLTK tagger data when the --crf parameter is set to false:

import nltk
nltk.download('averaged_perceptron_tagger')

submodules

cd $PROJECT_HOME
git submodule init
git submodule update

allennlp

Download the pretrained model for AllenNLP:

wget -c -t 0 https://s3-us-west-2.amazonaws.com/allennlp/models/srl-model-2018.05.25.tar.gz
mv srl-model-2018.05.25.tar.gz srl-model.tar.gz  # in current dir

graphviz

installation

Linux:

Ubuntu: sudo apt install graphviz
Fedora: sudo yum install graphviz
Debian: sudo apt install graphviz
Red Hat/CentOS: sudo yum install graphviz  # stable and development RPMs for Red Hat Enterprise Linux and CentOS are available but out of date

Mac:

MacPorts: sudo port install graphviz
Homebrew: brew install graphviz

Generate an image file with Graphviz:

dot xxx.dot -T png -o xxx.png
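
When many graphs need rendering, the same dot call can be scripted from Python. The following is a minimal sketch, assuming the Graphviz dot executable is on the PATH. The helper names dot_command and render are hypothetical and are not part of this repository.

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(dot_file, fmt="png"):
    """Build the argument list equivalent to `dot xxx.dot -T png -o xxx.png`."""
    src = Path(dot_file)
    out = src.with_suffix("." + fmt)  # same name, new extension
    return ["dot", str(src), "-T", fmt, "-o", str(out)]


def render(dot_file, fmt="png"):
    """Run Graphviz on dot_file and return the path of the generated image."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    cmd = dot_command(dot_file, fmt)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

Calling render("mygraph.dot") would then write mygraph.png next to the input file, provided Graphviz is installed.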

Usage

Run EXTRACTOR with

python3 main.py [-h] [--asterisk ASTERISK] [--crf CRF] [--rmdup RMDUP] [--elip ELIP] [--gname GNAME] [--input_file INPUT_FILE]

Each argument yields a different representation of the attack behavior. [--asterisk true] enables abstraction: anything that is not recognized as an IOC or system entity is replaced with a wildcard. The resulting representation can be used to search audit logs.

[--crf true/false] enables or disables the co-reference resolution module.

[--rmdup true/false] enables removal of duplicate nodes and edges.
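
To illustrate what duplicate removal means for a graph, here is a minimal sketch. It assumes edges are represented as (subject, action, object) tuples, which is an assumption for illustration only: the representation used inside main.py is not described here, and the function name and sample data are hypothetical.

```python
def remove_duplicate_edges(edges):
    """Return edges with exact duplicates dropped, keeping first-seen order."""
    seen = set()
    unique = []
    for edge in edges:
        if edge not in seen:
            seen.add(edge)
            unique.append(edge)
    return unique


# Invented sample data, not taken from any real report.
edges = [
    ("malware.exe", "write", "payload.dll"),
    ("malware.exe", "connect", "10.0.0.5"),
    ("malware.exe", "write", "payload.dll"),  # exact duplicate of the first edge
]
print(remove_duplicate_edges(edges))
```

Keeping first-seen order preserves the sequence in which behaviors appear in the report, which a plain set would lose.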

[--elip true/false] chooses whether to replace elided (ellipsis) subjects with the surrounding subject.

[--input_file path/filename.txt] passes the input text file to the application.

[--gname graph_name] specifies the name of the output graph (two files will be created, e.g., graph.pdf and graph.dot).
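
The options described above could be declared with argparse roughly as follows. This is a sketch rather than the code in main.py: the str2bool helper, the default values, and the help strings are all assumptions made for illustration.

```python
import argparse


def str2bool(value):
    """Convert 'true'/'false' style strings to a bool."""
    v = value.strip().lower()
    if v in ("true", "t", "yes", "1"):
        return True
    if v in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError("expected true or false, got " + repr(value))


def build_parser():
    # All defaults below are assumed, not taken from the repository.
    parser = argparse.ArgumentParser(description="EXTRACTOR command-line sketch")
    parser.add_argument("--asterisk", type=str2bool, default=False,
                        help="replace non-IOC text with a wildcard")
    parser.add_argument("--crf", type=str2bool, default=False,
                        help="enable the co-reference module")
    parser.add_argument("--rmdup", type=str2bool, default=False,
                        help="remove duplicate nodes and edges")
    parser.add_argument("--elip", type=str2bool, default=False,
                        help="replace elided subjects")
    parser.add_argument("--gname", default="graph",
                        help="name of the output graph")
    parser.add_argument("--input_file", help="path to the input text file")
    return parser


args = build_parser().parse_args(
    ["--asterisk", "true", "--crf", "false", "--input_file", "input.txt"]
)
print(args.asterisk, args.crf, args.gname, args.input_file)
```

Parsing the flags through a converter avoids a common pitfall: with type=bool, argparse would treat the string "false" as true because any non-empty string is truthy.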

Example

python3 main.py --asterisk false --crf false --rmdup false --input_file input.txt
python3 main.py --asterisk false --crf true --rmdup false --input_file input.txt
python3 main.py --asterisk true --crf true --rmdup true --elip true --input_file input.txt --gname mygraph
python3 main.py --asterisk true --crf false --rmdup true --elip true --input_file input.txt --gname mygraph
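
To compare several flag combinations on the same report, the invocations above can be generated from Python. This is a minimal sketch using only the flags documented in this README. The helper names build_command and run_all are hypothetical, and main.py is assumed to be in the current directory.

```python
import subprocess
import sys


def build_command(input_file, gname=None, **flags):
    """Build an argument list for main.py from boolean keyword flags."""
    cmd = [sys.executable, "main.py"]
    for name, value in flags.items():
        cmd += ["--" + name, "true" if value else "false"]
    cmd += ["--input_file", input_file]
    if gname is not None:
        cmd += ["--gname", gname]
    return cmd


def run_all(input_file, configs):
    """Run main.py once per entry in configs (graph name -> flag dict)."""
    for gname, flags in configs.items():
        subprocess.run(build_command(input_file, gname, **flags), check=True)
```

For example, run_all("input.txt", {"with_crf": {"crf": True}, "without_crf": {"crf": False}}) would produce two separately named graphs for side-by-side inspection.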

Reference

  1. EXTRACTOR: Extracting Attack Behavior from Threat Reports. Paper
  2. EXTRACTOR. Code
  3. Passive/Active sentence Transformer. Code

Donation

If this project helps you reduce development time, you can buy me a cup of coffee :)

AliPay (支付宝)

WechatPay (微信)

PayPal


License

GPL-3.0 © Kun

Contributors

jackaduma
