ANN-based anomaly detection for vehicle components using oscilloscope recordings.

License: MIT License

Oscillogram Classification

unstable License: MIT

Neural network based anomaly detection for vehicle components using oscilloscope recordings.

Example of the time series data to be considered (voltage over time - $z$-normalized):

The task comes down to binary univariate time series classification.

FCN Architecture

Note: See ResNet architecture in img/ResNet.png

Dependencies

  • for Python requirements, cf. requirements.txt
  • Apache Jena Fuseki: SPARQL server hosting / maintaining the knowledge graph

Installation

$ git clone https://github.com/tbohne/oscillogram_classification.git
$ cd oscillogram_classification/
$ pip install .

WandB Setup

$ touch config/api_key.py  # enter: wandb_api_key = "YOUR_KEY"

Config

Hyperparameter configuration in config/run_config.py, e.g.:

hyperparameter_config = {
    "batch_size": 4,
    "learning_rate": 0.001,
    "optimizer": "keras.optimizers.Adam",
    "epochs": 100,
    "model": "FCN",
    "loss_function": "binary_crossentropy",
    "accuracy_metric": "binary_accuracy",
    "trained_model_path": "best_model.h5",
    "save_best_only": True,
    "monitor": "val_loss",
    "ReduceLROnPlateau_factor": 0.5,
    "ReduceLROnPlateau_patience": 20,
    "ReduceLROnPlateau_min_lr": 0.0001,
    "EarlyStopping_patience": 50,
    "validation_split": 0.2
}
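To illustrate how the three ReduceLROnPlateau_* values above interact, here is a simplified, framework-free simulation. It is only a sketch: the function name simulate_reduce_lr is hypothetical, and it ignores details of the actual Keras callback such as cooldown and min_delta, as well as early stopping.

```python
def simulate_reduce_lr(val_losses, lr, factor, patience, min_lr):
    """Return the learning rate in effect after each epoch (simplified)."""
    best = float("inf")
    wait = 0  # epochs since the monitored value last improved
    history = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                # Shrink the learning rate, but never below the floor.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Assumed scenario: val_loss plateaus immediately and never improves.
history = simulate_reduce_lr(
    [1.0] * 100, lr=0.001, factor=0.5, patience=20, min_lr=0.0001
)
print(history[20], history[-1])
```

In this assumed plateau scenario the rate is halved every 20 epochs until it is clamped at the configured minimum.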

WandB sweep config in config/sweep_config.py, e.g.:

sweep_config = {
    "batch_size": {"values": [4, 16, 32]},
    "learning_rate": {"values": [0.01, 0.0001]},
    "optimizer": {"value": "keras.optimizers.Adam"},
    "epochs": {"values": [10, 30, 50, 100]},
    "model": {"values": ["FCN", "ResNet"]}
}
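As a rough sense of scale, the following sketch enumerates every combination described by the sweep config above. The source does not state which search strategy the sweep uses, so the exhaustive grid here is an assumption, and expand_grid is a hypothetical helper rather than part of the repository or of WandB.

```python
from itertools import product

sweep_config = {
    "batch_size": {"values": [4, 16, 32]},
    "learning_rate": {"values": [0.01, 0.0001]},
    "optimizer": {"value": "keras.optimizers.Adam"},
    "epochs": {"values": [10, 30, 50, 100]},
    "model": {"values": ["FCN", "ResNet"]},
}


def expand_grid(config):
    """List all hyperparameter combinations of a sweep-style config."""
    keys = list(config)
    # A fixed parameter ("value") is treated as a single-option list.
    options = [
        spec["values"] if "values" in spec else [spec["value"]]
        for spec in config.values()
    ]
    return [dict(zip(keys, combo)) for combo in product(*options)]


combos = expand_grid(sweep_config)
print(len(combos))  # 3 * 2 * 1 * 4 * 2 = 48
```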

Select the model based on the training data

Currently supported models: FCN, ResNet, RandomForest, MLP, DecisionTree

  • If training on feature vectors (non-Euclidean data), e.g., generated by tsfresh:
    • MLP, RandomForest
  • If training on (raw) time series (Euclidean data):
    • FCN, ResNet

Usage

Preprocessing

$ python oscillogram_classification/preprocess.py --norm {none | z_norm | min_max_norm | dec_norm | log_norm} [--feature_extraction] [--feature_list] --path /DATA --type {training | validation | test}

Note: With --feature_extraction, CSV files (e.g. training_complete_features.csv) are generated in addition to the actual records. Each CSV file lists the features considered for the corresponding set of feature vectors.
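For orientation, two of the --norm options can be sketched with their common textbook definitions. This is an assumption about what z_norm and min_max_norm compute, not a copy of the repository's implementation; dec_norm and log_norm are left out because the source does not define them.

```python
import statistics


def z_norm(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant signal has no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def min_max_norm(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


voltages = [10.0, 12.0, 14.0]  # made-up example values
print(z_norm(voltages))
print(min_max_norm(voltages))
```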

Manual Feature Selection

When training the model using feature vectors, it is critical that the test, validation, and finally the application data contain the same set of features as those used for training. This can be achieved by manual feature selection, which is shown in the following example:

The training datasets were created with the --feature_extraction option, resulting in the following files:

training_complete_feature_vectors.npz
training_filtered_feature_vectors.npz
training_complete_features.csv
training_filtered_features.csv

Now the model is to be trained using the filtered features. The validation dataset should correspond to this feature selection and thus be generated as follows:

$ python oscillogram_classification/preprocess.py --norm {none | z_norm | min_max_norm | dec_norm | log_norm} --path /VALIDATION_DATA --feature_extraction --feature_list data/training_filtered_features.csv --type validation

This in turn leads to a set of files corresponding to the different feature vectors. In the described scenario, the file to be used as validation data during training would be validation_manually_filtered_feature_vectors.npz. The generation of the test dataset works analogously.
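Conceptually, manual feature selection amounts to keeping only the columns named in the training feature list, in that list's order. The sketch below shows this on plain Python lists; select_features and the feature names are hypothetical and do not reflect the repository's code or the file formats it reads.

```python
def select_features(feature_names, rows, wanted):
    """Keep only the `wanted` columns of `rows`, in the order of `wanted`."""
    index = {name: i for i, name in enumerate(feature_names)}
    missing = [name for name in wanted if name not in index]
    if missing:
        raise KeyError(f"features missing from dataset: {missing}")
    cols = [index[name] for name in wanted]
    return [[row[c] for c in cols] for row in rows]


# Hypothetical validation data with all extracted features.
val_names = ["mean", "variance", "maximum", "minimum"]
val_rows = [[0.1, 1.0, 2.5, -2.0], [0.0, 0.9, 2.1, -1.8]]

# Hypothetical content of the filtered training feature list.
train_filtered = ["maximum", "mean"]

print(select_features(val_names, val_rows, train_filtered))
```

Raising on missing features, instead of silently skipping them, keeps a mismatch between training and validation data from going unnoticed.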

Training

$ python oscillogram_classification/train.py --train_path TRAIN_DATA.npz --val_path VAL_DATA.npz --test_path TEST_DATA.npz

Note: Before training, a consistency check verifies that all datasets (train, validation, test) contain exactly the same features in the same order. This is particularly relevant when training on feature vectors.
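A check of this kind could look roughly as follows. This is a minimal sketch under the assumption that each dataset exposes its feature names as a list; check_feature_consistency is a hypothetical name, not the function used in train.py.

```python
def check_feature_consistency(train_features, val_features, test_features):
    """Raise ValueError unless all three feature lists are identical."""
    reference = list(train_features)
    for name, features in (("validation", val_features), ("test", test_features)):
        # List equality compares both content and order.
        if list(features) != reference:
            raise ValueError(
                f"{name} features differ from training features: "
                f"{list(features)} != {reference}"
            )


# Passes silently: same features, same order (hypothetical names).
check_feature_consistency(["mean", "maximum"], ["mean", "maximum"], ["mean", "maximum"])
```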

Class Activation / Saliency Map Generation

$ python oscillogram_classification/cam.py [--znorm] [--overlay] --method {gradcam | hirescam | tf-keras-gradcam | tf-keras-gradcam++ | tf-keras-scorecam | tf-keras-layercam | tf-keras-smoothgrad | all} --sample_path SAMPLE.csv --model_path MODEL.h5

Note: Using all as method results in a side-by-side plot of all methods.

HiResCAM Example

All Heatmap Generation Methods Side-by-Side

WandB Sweeps (Hyperparameter Optimization)

"Hyperparameter sweeps provide an organized and efficient way to conduct a battle royale of models and pick the most accurate model. They enable this by automatically searching through combinations of hyperparameter values (e.g. learning rate, batch size, number of hidden layers, optimizer type) to find the most optimal values." - wandb.ai

$ python oscillogram_classification/run_sweep.py --train_path TRAIN_DATA.npz --val_path VAL_DATA.npz --test_path TEST_DATA.npz

Clustering and Sub-ROI Patch Classification

As an alternative to the above classification of entire ROIs (Regions of Interest), we implemented another approach based on the determination of sub-regions, i.e., patches that make up the ROIs. An ROI detection algorithm provides the input for the clustering of the cropped sub-ROIs. The ROIs are divided into the following five categories for the battery signals:

The five categories are practically motivated, based on semantically meaningful regions that an expert would look at when searching for anomalies. Afterwards, the patches are clustered and for each patch type, i.e., cluster, a model is trained that classifies samples of the corresponding patch type. The following example shows the result of such a clustering, where each cluster is annotated (red) with the represented patch type from the above battery signal:

In this example, DBA k-means was able to correctly cluster 29/30 patches. The one misclassified patch actually shares many characteristics with the cluster to which it was assigned.

Results of DBA k-means:

cluster distribution: [7, 6, 6, 6, 5]
ground truth per cluster: [[1, 3, 1, 1, 1, 1, 1], [4, 4, 4, 4, 4, 4], [2, 2, 2, 2, 2, 2], [5, 5, 5, 5, 5, 5], [3, 3, 3, 3, 3]]
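The 29/30 figure can be re-derived from the ground truth lists above by counting, per cluster, how many patches carry the cluster's most frequent label. The snippet is only an illustration of that arithmetic, not code from the repository.

```python
from collections import Counter

ground_truth_per_cluster = [
    [1, 3, 1, 1, 1, 1, 1],
    [4, 4, 4, 4, 4, 4],
    [2, 2, 2, 2, 2, 2],
    [5, 5, 5, 5, 5, 5],
    [3, 3, 3, 3, 3],
]

# Patches that agree with the majority label of their cluster.
correct = sum(
    Counter(cluster).most_common(1)[0][1] for cluster in ground_truth_per_cluster
)
total = sum(len(cluster) for cluster in ground_truth_per_cluster)

print(f"{correct}/{total}")  # 29/30
```

The single mismatch is the patch of type 3 that ended up in the cluster dominated by type 1.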

Clustering usage (with .csv patches):

$ python oscillogram_classification/cluster.py --norm {none | z_norm | min_max_norm | dec_norm | log_norm} --path PATH_TO_PATCHES [--clean_patches]

Using Predetermined Clusters for Comparison with Newly Recorded Samples

The idea is to compute the distance between the new time series sample and each of the predetermined cluster centroids. After computing the distances, the cluster with the smallest distance (configurable metric) is selected as the best match for the new sample.
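The matching step can be sketched as a nearest-centroid lookup. The example below assumes equal-length series and uses Euclidean distance purely for illustration; since the clusters come from DBA k-means, an elastic measure such as DTW may be the more natural choice in practice. The names euclidean and best_matching_cluster are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_cluster(sample, centroids, metric=euclidean):
    """Return (index, distance) of the centroid closest to `sample`."""
    distances = [metric(sample, centroid) for centroid in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]


# Made-up centroids and sample, not taken from real recordings.
centroids = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
sample = [9.0, 10.0, 11.0]
print(best_matching_cluster(sample, centroids))
```

Passing the metric as a parameter mirrors the "configurable metric" mentioned above: a different distance function can be swapped in without changing the selection logic.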

Classify single recording with ground truth label (type: patch0):

$ python oscillogram_classification/clustering_application.py --samples SAMPLE_patch0.csv

Classify set of recordings with ground truth labels (dir of patch0 type .csv files):

$ python oscillogram_classification/clustering_application.py --samples /patch0/

Sample output:

-------------------------------------------------------------------------
test sample excerpt: [10.129, 10.137, 10.137, 10.153, 10.161, 10.153, 10.153]
best matching cluster for new sample: 0 ( [0, 2, 0, 0, 0, 0, 0] )
ground truth: 0
SUCCESS: ground truth ( 0 ) matches most prominent entry in cluster ( 0 )
-------------------------------------------------------------------------

The options without ground truth labels work equivalently, just without the patch type in the file / dir name.

$k$-NN Classification

$ python oscillogram_classification/knn.py --train_path /TRAIN_DATA --test_path /TEST_DATA --norm {none | z_norm | min_max_norm | dec_norm | log_norm}

Positive (1) and Negative (0) Sample for each Component

Normalized Battery Voltage (Engine Starting Process)

Training and Validation Accuracy of Selected Models

TBD.

Related Publications

@inproceedings{10.1145/3587259.3627546,
    author = {Bohne, Tim and Windler, Anne-Kathrin Patricia and Atzmueller, Martin},
    title = {A Neuro-Symbolic Approach for Anomaly Detection and Complex Fault Diagnosis Exemplified in the Automotive Domain},
    year = {2023},
    isbn = {9798400701412},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3587259.3627546},
    doi = {10.1145/3587259.3627546},
    booktitle = {Proceedings of the 12th Knowledge Capture Conference 2023},
    pages = {35–43},
    numpages = {9},
    location = {Pensacola, FL, USA},
    series = {K-CAP '23}
}

oscillogram_classification's Issues

Implement downsampling option

  • naive heuristic: take every n-th value (more elaborate methods later)
  • --sampling_rate flag
  • possibly two versions:
    • 1.) bring all samples to same sampling rate
    • 2.) apply downsampling with same factor to each sample
  • assumes a format that contains time information
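The naive heuristic from the first bullet, applied with the same factor to each sample (version 2), could be as small as the following sketch. The function name downsample is hypothetical; bringing all samples to a common sampling rate (version 1) would additionally need the time information and is not shown.

```python
def downsample(values, n):
    """Keep every n-th value of a series, starting with the first."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return values[::n]


print(downsample(list(range(10)), 3))  # [0, 3, 6, 9]
```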

Investigation of uncertain predictions

  • in certain cases it could be interesting to plot several heatmaps for one sample, e.g. one per class
  • this would be particularly useful for very uncertain predictions
  • add a check for the certainty of a prediction -> if it falls below a lower bound, the different CAMs are shown in order to understand what the model is looking at when it assigns the sample to one class or the other
  • again, this assumes that there is an option to plot several heatmaps for one sample (see #1)
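One possible form of such a certainty check for a binary classifier is sketched below. It assumes the model outputs the probability of the positive class; the name is_uncertain and the default bound of 0.75 are made up for illustration.

```python
def is_uncertain(probability, lower_bound=0.75):
    """True if the confidence in the predicted class is below `lower_bound`."""
    # Confidence is the probability of whichever class is predicted.
    confidence = max(probability, 1.0 - probability)
    return confidence < lower_bound


for p in (0.55, 0.95, 0.10):
    print(p, is_uncertain(p))
```

Samples flagged this way would be the candidates for plotting one heatmap per class.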

Consider derivative of oscillograms

  • could be interesting to train / classify the derivatives of the oscilloscope recordings (changes in voltage over time)
  • comparable to what the e-field sensor will be recording
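A first approximation of the derivative is the forward difference between consecutive samples. The sketch assumes a constant sampling interval dt (a hypothetical parameter); the function name derivative is likewise only illustrative.

```python
def derivative(voltages, dt=1.0):
    """Forward differences: change in voltage per time step of length dt."""
    return [(b - a) / dt for a, b in zip(voltages, voltages[1:])]


print(derivative([0.0, 1.0, 4.0, 9.0]))  # [1.0, 3.0, 5.0]
```

The result has one value fewer than the input, which matters if a model expects a fixed input length.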

Optional validation dataset

  • the specified (required) validation dataset is currently not used in case of feature_extraction, as there is no way to manually pass such a set to the sklearn classifiers
  • the sklearn classifiers use a defined fraction of the training samples
  • as this is also possible with the keras models, the validation dataset should be optional
