GithubHelp home page GithubHelp logo

dreizehnutters / pcapae Goto Github PK

View Code? Open in Web Editor NEW
17.0 2.0 2.0 28.18 MB

convGRU based autoencoder for unsupervised & spatial-temporal anomaly detection in computer network (PCAP) traffic.

Home Page: https://arxiv.org/abs/2205.08953

License: MIT License

Python 13.82% Shell 1.71% Jupyter Notebook 76.84% TeX 7.62%
anomaly-detection autoencoder feature-learning intrusion-detection machine-learning pcap representation-learning network

pcapae's Introduction

  ______  ______  ______  ______     ______  ______ 
 /\  __ \/\  ___\/\  __ \/\  __ \   /\  __ \/\  ___\ 
 \ \  __/\ \ \___\ \  __ \ \  __/   \ \  __ \ \  _\_
  \ \_\   \ \_____\ \_\ \_\ \_\      \ \_\ \_\ \_____\ 
   \/_/    \/_____/\/_/\/_/\/_/       \/_/\/_/\/_____/

Representation Learning for Content-Sensitive Anomaly Detection in Industrial Networks

Using a convGRU-based autoencoder, this thesis proposes a framework to learn spatial-temporal aspects of raw network traffic in an unsupervised and protocol-agnostic manner. The learned representations are used to measure the effect on the results of a subsequent anomaly detection and are compared to the application without the extracted features. The evaluation showed, that the anomaly detection could not effectively be enhanced when applied on compressed traffic fragments for the context of network intrusion detection. Yet, the trained autoencoder successfully generates a compressed representation (code) of the network traffic, which hold spatial and temporal information. Based on the models residual loss, the autoencoder is also capable of detecting anomalies by itself. Lastly, an approach for a kind of model interpretability (LRP) was investigated in order to identify relevant areas within the raw input data, which is used to enrich alerts generated by an anomaly detection method.

Master thesis submitted on 13.11.2021

AE



Milestones

  • tex
    • finished paper
  • talk
    • initial presentation
    • TUC presentation
    • thesis defense
  • src
    • pcap -> dataset
    • dataloader
    • pytorch convGRU AE
    • anomaly detection
    • evaluation
    • visualization
    • demo notebooks
    • experiment helper script
    • LRP
  • experiments
    • baseline
    • SWAT
    • VOERDE

File Structure

thesis
	├── src
	│   ├── lib                          pcapAE Framework
	│   │   ├── CLI.py...................handle argument passing
	│   │   ├── H5Dataset.py.............data loading
	│   │   ├── ConvRNN.py...............convolutional recurrent cell implementation
	│   │   ├── decoder.py...............decoder logic
	│   │   ├── encoder.py...............encoder logic
	│   │   ├── model.py.................network orchestration
	│   │   ├── pcapAE.py................framework interface
	│   │   ├── earlystopping.py.........training heuristic
	│   │   └── utils.py.................helper functions
	│   └── requirements.txt
	│   │
	│   ├── ad                           scikit-learn AD Framework
	│   │	├── AD model blueprints
	│   │	├── AD.py.. .................AD wrappers
	│   │	└── utils.py.................helper functions
	│   │
	│   ├── main.py......................framework interaction
	│   ├── pcap2ds.py...................convert packet captures to datasets
	│   └── test_install.py..............rudimentary installation test
	│
	├── exp                              Experiments
	│   ├── data.ods.....................results
	│   ├── dim_redu.py..................experiment script
	│   └── exp_wrapper.sh...............execute experiments
	│
	├── test             	             Jupyter Notebooks
	│   ├── demo.ipynb...................Jupyter Notebook demo
	│   ├── vius_tool.ipynb..............PCAP analysis
	│   └── LRP.ipynb....................LRP Heatmap generation
	│
	├── tex                              Writing
	│   ├── slides
	│   │	├── init.pdf..................init presentation
	│   │	├── mid.pdf...................mid presentation
	│   │	└── end.pdf...................defense presentation
	│   ├── main.pdf......................thesis paper
	│   └── papers.bib....................sources
	│
	├── LICENSE
	└── README.md


Install

  • tshark>=2.6.20

apt-get install tshark capinfos
  • 3.9>python>=3.7.3

apt-get install python3 pip3
pip install -r requirements.txt
  • torch>=1.7.1

    GPU=>10.2

     pip install torch torchvision

    CPU ONLY

     pip install torch==1.7.1+cpu torchvision==0.8.2+cpu torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
  • Verify Installation

cd src/test/
./test_install.sh

Usage

./test_pcapAE.sh [--CUDA]

[+] generate h5 data set from a set of PCAPs

Arguments

short long default help
-h --help show this help message and exit
-p --pcap None path to pcap or list of pcap pathes to process
-o --out None path to output dir
-m --modus None gradient decent strategy
-g --ground None path to optional evaluation packet level ground truth .csv
-n --name None data set optional name
-t --threads 1 number of threads
--chunk 1024 square number fragment size
--oneD process fragemnts in one dimension
--force force to delete output dir path

Usage

python3 pcap2ds.py -p <some.pcap> -o <out_dir> --chunk 1024 --modus byte [-g <ground_truth.csv>]

[+] pcapAE API wrapper

train an autoencoder with a given h5 data set

Arguments

short long default help options
-h --help show this help message and exit
-t --train path to dataset to learn
-v --vali path to dataset to validate
-f --fit path to data set to fit AD
-p --predict path to data to make a predict on
-m --model path to model to retrain or evaluate
-b --batch_size 128 number of samples per pass [2,32,512,1024]
-lr --learn_rate 0.001 starting learning rate between [1,0)
-fi --finput 1 number input frames [1,3,5]
-o --optim adamW gradient decent strategy [adamW, adam, sgd]
-c --clipping 10.0 gradient clip value [0,10]
--fraction 1 fraction of data to process (0, 1]
-w --workers 0 number of data loader worker threads [0, 8]
--loss MSE loss criterion [MSE]
--scheduler cycle learn rate scheduler [step ; cycle ; plateau]
--cell GRU network cell type [GRU ; LSTM]
--epochs 144 number of epochs
--seed 1994 seed to fixing randomness
--noTensorboard do not start tensorboard
--cuda enable GPU support
--verbose verbose output
--cache cache dataset to GPU
--retrain retrain given model
--name experiment name prefix
--AD use AD framework
--grid_search use AD gridsearch

Usage

# pcapAE training
python3 main.py --train <TRAIN_SET_PATH> --vali <VALI_SET_PATH> [--cuda]

# pcapAE data compression (pcap -> _codes_)
python3 main.py --model <PCAPAE_MODEL> --fit <FIT_SET_PATH> --predict <PREDICT_SET_PATH> [--cuda]

# shallow ML anomaly detection training
python3 main.py --AD --model *.yaml --fit <REDU_FIT_SET_PATH> [--predict <REDU_PREDICT_SET_PATH>] [--grid_search]

# test training AD on new data
python3 main.py --model <AD_MODLE_PATH> --predict <REDU_SET_PATH>

# naive baseline
python3 main.py --baseline pcapAE --model <PCAPAE_MODEL> --predict <PREDICT_SET_PATH>

# raw baseline
python3 main.py --baseline noDL --AD --model ../test/blueprints/base_if.yaml --fit <FIT_SET_PATH> --vali <VALI_SET_PATH> --predict <PREDICT_SET_PATH> 

Hyperparameters

  • Data
    • dataset = [VOERDE, SWaT]
    • preprocessing = [byte, packet, flow]
    • fragment size = [16**2, 32**2]
    • sequence length = [1, 3 ,5]
  • Representation Learning

Tested Hardware

  • Debian 10 | Intel i5-6200U ~200 CUDA Cores
  • Ubuntu 18.04 | AMD EPYC 7552 ~1500 CUDA Cores
  • Cent OS 7.9 | GTX 1080 2560 CUDA Cores
  • Ubuntu 20.04 | RTX 3090 10496 CUDA Cores

pcapae's People

Contributors

dreizehnutters avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

pcapae's Issues

Usage

I wonder if it is possible to use in the industrial process monitoring, and do you have the pytorch version code

The function 'Redefine_nn' doesn't make the network follow LRP rule

Snipaste_2024-05-28_13-19-14
When I debug the program, I find that whether I comment out the below code or not
Snipaste_2024-05-28_13-27-40
I got the same relevance in the function 'relprop'
Snipaste_2024-05-28_13-30-53
But this result doesn't meet the expectation, which we will get different relevance after redefining the network in terms of LRP rule. So I think the function 'redefine' doesn't work following expection.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.