This project forked from stream-ad/midas

Anomaly Detection on Dynamic (time-evolving) Graphs in Real-time and Streaming manner. Detecting intrusions (DoS and DDoS attacks), frauds, fake rating anomalies.

License: Apache License 2.0

C++ 82.85% Python 13.34% CMake 3.82%

MIDAS

C++ implementation of

The old implementation lives in a separate branch, OldImplementation. Treat it as archived; it will rarely receive feature updates.

Features

  • Finds Anomalies in Dynamic/Time-Evolving Graph: (Intrusion Detection, Fake Ratings, Financial Fraud)
  • Detects Microcluster Anomalies (suddenly arriving groups of suspiciously similar edges, e.g. a DoS attack)
  • Theoretical Guarantees on False Positive Probability
  • Constant Memory (independent of graph size)
  • Constant Update Time (real-time anomaly detection to minimize harm)
  • Up to 55% more accurate and 929 times faster than state-of-the-art approaches
  • Experiments are performed using the following datasets:

Demo

If you use Windows:

  1. Open a Visual Studio developer command prompt, since the build uses its toolchain
  2. cd to the project root MIDAS/
  3. cmake -DCMAKE_BUILD_TYPE=Release -GNinja -S . -B build/release
  4. cmake --build build/release --target Demo
  5. cd to MIDAS/build/release/
  6. .\Demo.exe

If you use Linux/macOS:

  1. Open a terminal
  2. cd to the project root MIDAS/
  3. cmake -DCMAKE_BUILD_TYPE=Release -S . -B build/release
  4. cmake --build build/release --target Demo
  5. cd to MIDAS/build/release/
  6. ./Demo

The demo runs the filtering core (MIDAS-F) on MIDAS/data/DARPA/darpa_processed.csv, which has 4.5M records.

The scores are exported to MIDAS/temp/Score.txt; a higher score means a more anomalous record.
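
Since a higher score means a more anomalous record, a natural next step is to rank records by score. The following is a minimal sketch, not part of the MIDAS sources. It assumes the score file holds one whitespace-separated numeric score per record, in input order; the format of Score.txt is not specified here, so check it before relying on this. The names readScores and topAnomalies are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <numeric>
#include <vector>

// Reads whitespace-separated scores from a stream (assumed file format).
inline std::vector<double> readScores(std::istream& in) {
    std::vector<double> scores;
    for (double s; in >> s;) scores.push_back(s);
    return scores;
}

// Returns the 0-based record indices of the k highest scores,
// ordered from most to least anomalous.
inline std::vector<std::size_t> topAnomalies(const std::vector<double>& scores,
                                             std::size_t k) {
    std::vector<std::size_t> index(scores.size());
    std::iota(index.begin(), index.end(), std::size_t{0});
    k = std::min(k, index.size());
    // Only the first k positions need to be sorted.
    std::partial_sort(index.begin(),
                      index.begin() + static_cast<std::ptrdiff_t>(k),
                      index.end(),
                      [&](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    index.resize(k);
    assert(index.size() <= scores.size());
    return index;
}
```

To use it, open the score file with std::ifstream, pass the stream to readScores, then call topAnomalies with the number of records you want to inspect. The returned indices refer to rows of the data file.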

All file paths are absolute and "hardcoded" by CMake. Even so, running the executable by double-clicking it is NOT recommended.

Customization

Switch Cores

Cores are instantiated at MIDAS/example/Demo.cpp:67-69; uncomment the one you want to use.

Custom Dataset + Demo.cpp

You need to prepare three files:

  • Meta file
    • Contains only an integer N, the number of records in the dataset
    • Use its path for pathMeta
    • E.g. MIDAS/data/DARPA/darpa_shape.txt
  • Data file
    • A header-less CSV file of shape [N,3]
    • Columns are sources, destinations, timestamps
    • Use its path for pathData
    • E.g. MIDAS/data/DARPA/darpa_processed.csv
  • Label file
    • A header-less CSV file of shape [N,1]
    • The label for each data record, in the same order as the data file
      • 0 means normal record
      • 1 means anomalous record
    • Use its path for pathGroundTruth
    • E.g. MIDAS/data/DARPA/darpa_ground_truth.csv
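
Before pointing Demo.cpp at a new dataset, it can help to check that the three files agree with each other. The sketch below is one way to do that under the layout described above; it is not code from the MIDAS sources. The Dataset struct and the loadDataset function are hypothetical names, and the sketch assumes all three columns are integers, as in the processed DARPA file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for one dataset; not a type from the MIDAS sources.
struct Dataset {
    std::vector<int> source, destination, timestamp, label;
};

// Reads the meta, data and label inputs from already-opened streams.
// Throws std::runtime_error on a malformed row or when a row count differs from N.
inline Dataset loadDataset(std::istream& meta, std::istream& data, std::istream& labels) {
    std::size_t n = 0;
    if (!(meta >> n)) throw std::runtime_error("meta file must contain one integer N");

    Dataset d;
    std::string line;
    while (std::getline(data, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        std::istringstream row(line);
        int s = 0, t = 0, ts = 0;
        char c1 = 0, c2 = 0;
        if (!(row >> s >> c1 >> t >> c2 >> ts) || c1 != ',' || c2 != ',')
            throw std::runtime_error("data row is not 'source,destination,timestamp'");
        d.source.push_back(s);
        d.destination.push_back(t);
        d.timestamp.push_back(ts);
    }
    while (std::getline(labels, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        std::istringstream row(line);
        int y = 0;
        if (!(row >> y) || (y != 0 && y != 1))
            throw std::runtime_error("label must be 0 (normal) or 1 (anomalous)");
        d.label.push_back(y);
    }
    if (d.source.size() != n || d.label.size() != n)
        throw std::runtime_error("row count does not match N from the meta file");
    assert(d.source.size() == d.label.size());
    return d;
}
```

Taking streams rather than paths keeps the check easy to run on in-memory samples; with real files, pass three std::ifstream objects opened on the paths you use for pathMeta, pathData and pathGroundTruth.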

Custom Dataset + Custom Runner

  1. Include the header MIDAS/src/NormalCore.hpp, MIDAS/src/RelationalCore.hpp or MIDAS/src/FilteringCore.hpp
  2. Instantiate cores with required parameters
  3. Call operator() on each data record; it returns the anomaly score for that record
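
The shape of such a runner can be sketched as follows. To keep the example compilable on its own, it does not include the MIDAS headers. Instead it uses ToyCore, a hypothetical stand-in that is NOT the MIDAS algorithm and only mimics the call pattern. The argument order (source, destination, timestamp) is an assumption taken from the column order of the data file; check the chosen core's header for the actual constructor parameters and operator() signature.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a MIDAS core so the sketch builds by itself.
// It is NOT MIDAS: the "score" is how often the edge has appeared
// within the current timestamp.
struct ToyCore {
    int currentTimestamp = 0;
    std::map<std::pair<int, int>, int> count;

    float operator()(int source, int destination, int timestamp) {
        if (timestamp != currentTimestamp) {  // new tick: forget per-tick counts
            count.clear();
            currentTimestamp = timestamp;
        }
        return static_cast<float>(++count[{source, destination}]);
    }
};

// One record of the edge stream (hypothetical helper type).
struct Edge {
    int source, destination, timestamp;
};

// Feeds records to the core one at a time, in arrival order,
// and collects the score returned for each record.
template <class Core>
std::vector<float> scoreStream(Core& core, const std::vector<Edge>& edges) {
    std::vector<float> scores;
    scores.reserve(edges.size());
    for (const Edge& e : edges)
        scores.push_back(core(e.source, e.destination, e.timestamp));
    assert(scores.size() == edges.size());
    return scores;
}
```

Because scoreStream is a template, swapping ToyCore for a real core should only require including the corresponding header and constructing the core with its required parameters, provided its operator() accepts the same three arguments.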

Other Files

example/

Experiment.cpp

The code we used for experiments.
It will try to use Intel TBB or OpenMP for parallelization.
Comment out all but one runner function call in main(), as most results are exported to MIDAS/temp/Experiment.csv together with many intermediate files.

Reproducible.cpp

Similar to Demo.cpp, but with all random parameters hardcoded so that it always produces the same result.
It lets us and other developers test whether implementations in other languages produce acceptable results.

util/

DeleteTempFile.py, EvaluateScore.py and ReproduceROC.py print their usage and a short description when run without arguments.

PreprocessData.py

The code that converts a raw dataset into an easy-to-read format. Datasets are always assumed to be in a folder under MIDAS/data/.
It can process the following dataset(s):

  • DARPA/darpa_original.csv -> DARPA/darpa_processed.csv, DARPA/darpa_ground_truth.csv, DARPA/darpa_shape.txt

In Other Languages

  1. Python: Rui Liu's MIDAS.Python, Ritesh Kumar's pyMIDAS
  2. Golang: Steve Tan's midas
  3. Ruby: Andrew Kane's midas
  4. Rust: Scott Steele's midas_rs
  5. R: Tobias Heidler's MIDASwrappeR
  6. Java: Joshua Tokle's MIDAS-Java
  7. Julia: Ashrya Agrawal's MIDAS.jl

Online Coverage

  1. ACM TechNews
  2. AIhub
  3. Hacker News
  4. KDnuggets
  5. Microsoft
  6. Towards Data Science

Citation

If you use this code for your research, please consider citing our arXiv preprint

@misc{bhatia2020realtime,
    title={Real-Time Streaming Anomaly Detection in Dynamic Graphs},
    author={Siddharth Bhatia and Rui Liu and Bryan Hooi and Minji Yoon and Kijung Shin and Christos Faloutsos},
    year={2020},
    eprint={2009.08452},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

or our AAAI paper

@inproceedings{bhatia2020midas,
    title="MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams",
    author="Siddharth {Bhatia} and Bryan {Hooi} and Minji {Yoon} and Kijung {Shin} and Christos {Faloutsos}",
    booktitle="AAAI 2020 : The Thirty-Fourth AAAI Conference on Artificial Intelligence",
    year="2020"
}
