This project forked from waico/skab

SKAB - Skoltech Anomaly Benchmark. This repository contains data for evaluating Anomaly Detection algorithms.

License: GNU General Public License v3.0

skab

❗️❗️❗️The current version of SKAB (v0.9) contains 34 datasets with collective anomalies. The upcoming v1.0 update (expected by summer 2021) will add 300+ files with point and collective anomalies, making SKAB one of the largest changepoint-containing benchmarks, especially in the technical field.

About SKAB

We propose the Skoltech Anomaly Benchmark (SKAB), designed for evaluating anomaly detection algorithms. SKAB supports two main problems, each with its own anomaly markup:

  • Outlier detection (anomalies are treated and labeled as single-point anomalies)
  • Changepoint detection (anomalies are treated and labeled as collective anomalies)

SKAB consists of the following artifacts:

  • Datasets.
  • Leaderboard (scoreboard).
  • Python modules for evaluating algorithms.
  • Notebooks: Python notebooks with anomaly detection algorithms.

The IIoT testbed system is located at the Skolkovo Institute of Science and Technology (Skoltech). All details regarding the testbed and the experimental process are presented in the following artifacts:

Datasets

The SKAB v0.9 corpus contains 35 individual data files in .csv format. Each file represents a single experiment and contains a single anomaly. The dataset is a multivariate time series collected from the sensors installed on the testbed. The data folder contains the benchmark datasets, and its layout is described in the structure file. The columns in each data file are as follows:

  • datetime - Date and time at which the value was written to the database (YYYY-MM-DD hh:mm:ss)
  • Accelerometer1RMS - Vibration acceleration (g units)
  • Accelerometer2RMS - Vibration acceleration (g units)
  • Current - Amperage on the electric motor (ampere)
  • Pressure - Pressure in the loop after the water pump (bar)
  • Temperature - Temperature of the engine body (degrees Celsius)
  • Thermocouple - Temperature of the fluid in the circulation loop (degrees Celsius)
  • Voltage - Voltage on the electric motor (volt)
  • RateRMS - Circulation flow rate of the fluid inside the loop (liters per minute)
  • anomaly - Whether the point is anomalous (0 or 1)
  • changepoint - Whether the point is a changepoint for collective anomalies (0 or 1)
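The column layout above maps naturally onto a pandas DataFrame. The following is a minimal sketch, not the repository's own loader: the helper name `load_skab_file`, the semicolon separator, and the sample values are assumptions made for illustration. Adjust `sep` and the label names if the headers in your files differ.

```python
import io

import pandas as pd

# Label columns as listed above; every other column is treated as a sensor feature.
LABEL_COLUMNS = ["anomaly", "changepoint"]


def load_skab_file(source, sep=";"):
    """Read one experiment file and split it into sensor features and labels.

    `source` is a path or file-like object. The separator is an assumption.
    """
    df = pd.read_csv(source, sep=sep, index_col="datetime", parse_dates=True)
    features = df.drop(columns=LABEL_COLUMNS)
    labels = df[LABEL_COLUMNS].astype(int)
    return features, labels


# Tiny in-memory sample with made-up values, only to show the expected shape.
SAMPLE = """datetime;Accelerometer1RMS;Accelerometer2RMS;Current;Pressure;Temperature;Thermocouple;Voltage;RateRMS;anomaly;changepoint
2020-03-09 10:14:33;0.026;0.040;1.33;0.05;75.1;25.0;233.0;32.0;0.0;0.0
2020-03-09 10:14:34;0.026;0.039;1.27;0.38;75.2;25.1;234.0;32.0;1.0;1.0
2020-03-09 10:14:35;0.027;0.041;1.30;0.05;75.0;25.0;232.0;32.0;1.0;0.0
"""

features, labels = load_skab_file(io.StringIO(SAMPLE))

# Timestamps at which a collective anomaly starts (changepoint == 1).
changepoints = labels.index[labels["changepoint"] == 1]
```

Keeping the two label columns separate from the features makes it easy to evaluate the same model on either the outlier or the changepoint problem.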

Leaderboard (Scoreboard)

Here we present the leaderboard for SKAB v0.9 for both the outlier and changepoint detection problems. You can also present and evaluate your algorithm using SKAB on Kaggle. The results in the tables are calculated in the Python notebooks from the baselines folder.

Outlier detection problem

Sorted by FAR; for both FAR and MAR, lower is better.

| Algorithm | FAR, % | MAR, % | F1 |
| --- | --- | --- | --- |
| Perfect detector | 0 | 0 | 100 |
| Null detector | 0 | 100 | 0 |
| T-squared+Q (PCA) | 5.09 | 86.1 | n/a |
| Isolation forest | 6.86 | 72.09 | n/a |
| Autoencoder | 7.56 | 66.57 | n/a |
| T-squared | 12.14 | 52.56 | n/a |
| LSTM | 14.4 | 40.44 | n/a |
| MSCRED | 25.17 | 18.03 | n/a |
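The FAR and MAR columns can be reproduced from binary labels and predictions. The sketch below assumes the usual definitions, which the text above does not spell out: FAR (false alarm rate) is the share of normal points flagged as anomalous, and MAR (missing alarm rate) is the share of anomalous points that were missed. The function name `far_mar` and the toy labels are illustrative only.

```python
import numpy as np


def far_mar(y_true, y_pred):
    """Return (FAR, MAR) in percent for binary labels and predictions.

    Assumed definitions: FAR = FP / (FP + TN), MAR = FN / (FN + TP).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    fp = np.sum(~y_true & y_pred)   # normal points flagged as anomalous
    tn = np.sum(~y_true & ~y_pred)  # normal points left alone
    fn = np.sum(y_true & ~y_pred)   # anomalous points that were missed
    tp = np.sum(y_true & y_pred)    # anomalous points that were caught
    far = 100.0 * fp / (fp + tn) if (fp + tn) else 0.0
    mar = 100.0 * fn / (fn + tp) if (fn + tp) else 0.0
    return float(far), float(mar)


y_true = [0, 0, 0, 0, 1, 1, 1, 1]

null_result = far_mar(y_true, [0] * 8)      # never raises an alarm
perfect_result = far_mar(y_true, y_true)    # matches the labels exactly
noisy_result = far_mar(y_true, [0, 1, 0, 0, 1, 1, 0, 0])
```

Under these definitions a detector that never raises an alarm gets FAR 0 and MAR 100, which matches the Null detector row in the table.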

Changepoint detection problem

Sorted by NAB (standard); for all metrics, higher is better.

| Algorithm | NAB (standard) | NAB (lowFP) | NAB (lowFN) |
| --- | --- | --- | --- |
| Perfect detector | 100 | 100 | 100 |
| Isolation forest | 37.53 | 17.09 | 45.02 |
| LSTM | 25.82 | 9.06 | 31.83 |
| MSCRED | 18.67 | 14.92 | 20.14 |
| T-squared | 17.87 | 3.44 | 23.2 |
| ArimaFD | 16.06 | 14.03 | 17.12 |
| Autoencoder | 15.59 | 0.78 | 20.91 |
| T-squared+Q (PCA) | 5.83 | 4.8 | 6.1 |
| Null detector | 0 | 0 | 0 |
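The Perfect detector and Null detector rows anchor the scale at 100 and 0. The Numenta Anomaly Benchmark (NAB) gets this scale by normalizing a raw score as 100 * (raw - null) / (perfect - null). The sketch below assumes the same normalization applies to the numbers above. The function name and the raw score values are made up for illustration.

```python
def normalize_nab_score(raw, null, perfect):
    """Rescale a raw NAB-style score so the null detector maps to 0 and the perfect one to 100."""
    if perfect == null:
        raise ValueError("perfect and null scores must differ")
    return 100.0 * (raw - null) / (perfect - null)


# Hypothetical raw scores for one scoring profile.
null_raw, perfect_raw, detector_raw = -10.0, 20.0, 5.0

normalized = normalize_nab_score(detector_raw, null_raw, perfect_raw)
```

A detector that scores below the null detector ends up with a negative normalized value, so 0 is a reference point and not a lower bound.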

Notebooks

The Notebooks folder contains Python notebooks with the code for reproducing the leaderboard results.

We have calculated the results for five common anomaly detection algorithms:

  • Hotelling's T-squared statistics;
  • Hotelling's T-squared statistics + Q statistics based on PCA;
  • Isolation forest;
  • LSTM-based NN;
  • Feed-Forward Autoencoder.

Additionally, the results of the following algorithms were added to the repository:

  • ArimaFD;
  • MSCRED.
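As an illustration of the first algorithm in the list, Hotelling's T-squared statistic scores each observation by its squared Mahalanobis distance from the mean of anomaly-free training data. The following is a minimal sketch on synthetic data, not the code from the notebooks: the function names and the empirical 99% quantile threshold are assumptions, and the notebooks may choose the threshold differently.

```python
import numpy as np


def fit_t2(train):
    """Estimate the mean and inverse covariance from anomaly-free training data."""
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a near-singular covariance
    return mean, inv_cov


def t2_scores(x, mean, inv_cov):
    """Compute T^2 = (x - mean)^T S^-1 (x - mean) for every row of x."""
    centered = x - mean
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


rng = np.random.default_rng(0)
train = rng.normal(size=(500, 3))  # synthetic stand-in for normal operation

mean, inv_cov = fit_t2(train)

# Assumption: flag points above the empirical 99% quantile of the training scores.
threshold = np.quantile(t2_scores(train, mean, inv_cov), 0.99)

test_points = np.array([[0.0, 0.0, 0.0], [8.0, 8.0, 8.0]])
flags = t2_scores(test_points, mean, inv_cov) > threshold
```

Raising the quantile trades fewer false alarms for more missed anomalies, which is the same trade-off the FAR and MAR columns of the leaderboard capture.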

Citation

Please cite our project in your publications if it helps your research.

Iurii D. Katser and Vyacheslav O. Kozitsin, “Skoltech Anomaly Benchmark (SKAB).” Kaggle, 2020, doi: 10.34740/KAGGLE/DSV/1693952.

Notable mentions

SKAB is acknowledged by some ML resources.
