
ecg_anomalydetection

Introduction to Autoencoders

An Autoencoder is a widely used neural-network architecture for tasks such as dimensionality reduction, data compression, and anomaly detection. It is trained unsupervised, learning effective representations of the input by discovering the patterns and important features in the data. In anomaly detection, an Autoencoder is trained to reconstruct normal data; anomalies are then flagged by the gap between an input and its reconstruction.

Components of an Autoencoder

The Autoencoder consists of three main components:

  1. Encoder: the part of the network that transforms the input into a lower-dimensional latent representation, compressing the data while capturing its important features.

  2. Latent space: where the lower-dimensional representation lives. It holds the essential information about the input and is the bottleneck of the Autoencoder.

  3. Decoder: reconstructs the data from the latent representation, aiming to produce an output as close as possible to the original input.
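
As a minimal sketch of this data flow (the layer sizes are illustrative, loosely mirroring the model built later in this README):

import tensorflow as tf
from tensorflow.keras.layers import Dense

# Illustrative sizes: a 140-sample heartbeat compressed to an 8-dimensional latent vector
encoder = tf.keras.Sequential([Dense(32, activation="relu"), Dense(8, activation="relu")])
decoder = tf.keras.Sequential([Dense(32, activation="relu"), Dense(140, activation="sigmoid")])

x = tf.random.uniform((1, 140))  # a fake batch with one sample
z = encoder(x)                   # latent representation, shape (1, 8)
x_hat = decoder(z)               # reconstruction, shape (1, 140)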

AutoEncoder Architecture

(figure: autoencoder architecture diagram)

Anomaly Detection with Autoencoder

To detect anomalies using an Autoencoder, the general procedure is as follows:

  1. Train the Autoencoder with normal data, i.e., data that does not contain anomalies.

  2. Use the trained Autoencoder to reconstruct the input data.

  3. Calculate the difference between the input data and the reconstructions.

  4. Set a threshold for this difference. Any input that has a difference above this threshold is considered an anomaly.
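
In code, steps 2-4 reduce to comparing each sample's mean absolute reconstruction error against a cutoff. A minimal sketch, where model and threshold are placeholders for the trained Autoencoder and the value chosen later:

import numpy as np

def is_anomaly(model, x, threshold):
    """Flag samples whose mean absolute reconstruction error exceeds the threshold."""
    x = np.asarray(x)                            # accept tensors or arrays
    x_hat = np.asarray(model(x))                 # reconstruct the input batch
    error = np.mean(np.abs(x - x_hat), axis=-1)  # per-sample MAE
    return error > threshold                     # True = anomaly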

Advantages of Autoencoders in Anomaly Detection

  • Autoencoders are capable of capturing complex and nonlinear patterns in data, making them effective for detecting anomalies.

  • The approach is unsupervised: it does not require anomaly labels in the training set, which is useful when anomalous data is rare or unlabeled.

  • The flexibility of Autoencoders allows adaptation to different types of data and anomaly detection tasks.

import warnings

import numpy as np
import pandas as pd
import tensorflow as tf
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix,
    ConfusionMatrixDisplay,
)

from tensorflow.keras.layers import (
    ReLU,
    Dense,
    Activation,
    BatchNormalization,
)

sns.set_style("darkgrid")

# Silence pandas FutureWarnings that seaborn can trigger
warnings.filterwarnings("ignore", "is_categorical_dtype")
warnings.filterwarnings("ignore", "use_inf_as_na")
# Download the dataset

df = pd.read_csv('http://storage.googleapis.com/download.tensorflow.org/data/ecg.csv', header=None)
raw_data = df.values

df.head()
0 1 2 3 4 5 6 7 8 9 ... 131 132 133 134 135 136 137 138 139 140
0 -0.112522 -2.827204 -3.773897 -4.349751 -4.376041 -3.474986 -2.181408 -1.818286 -1.250522 -0.477492 ... 0.792168 0.933541 0.796958 0.578621 0.257740 0.228077 0.123431 0.925286 0.193137 1.0
1 -1.100878 -3.996840 -4.285843 -4.506579 -4.022377 -3.234368 -1.566126 -0.992258 -0.754680 0.042321 ... 0.538356 0.656881 0.787490 0.724046 0.555784 0.476333 0.773820 1.119621 -1.436250 1.0
2 -0.567088 -2.593450 -3.874230 -4.584095 -4.187449 -3.151462 -1.742940 -1.490659 -1.183580 -0.394229 ... 0.886073 0.531452 0.311377 -0.021919 -0.713683 -0.532197 0.321097 0.904227 -0.421797 1.0
3 0.490473 -1.914407 -3.616364 -4.318823 -4.268016 -3.881110 -2.993280 -1.671131 -1.333884 -0.965629 ... 0.350816 0.499111 0.600345 0.842069 0.952074 0.990133 1.086798 1.403011 -0.383564 1.0
4 0.800232 -0.874252 -2.384761 -3.973292 -4.338224 -3.802422 -2.534510 -1.783423 -1.594450 -0.753199 ... 1.148884 0.958434 1.059025 1.371682 1.277392 0.960304 0.971020 1.614392 1.421456 1.0

5 rows × 141 columns (columns 0-139 are the ECG samples; column 140 is the label: 1 = normal, 0 = anomalous)

# The labels are in the last column
labels = raw_data[:, -1]

# The ECG data
data = raw_data[:, 0:-1]

# Split the data
train_data, test_data, train_labels, test_labels = train_test_split(
    data, labels, test_size=0.2, random_state=21
)
# Normalize values to [0, 1] using the training set's min and max (the same statistics are applied to the test set to avoid leakage)
min_val = tf.reduce_min(train_data)
max_val = tf.reduce_max(train_data)

train_data = (train_data - min_val) / (max_val - min_val)
test_data = (test_data - min_val) / (max_val - min_val)

train_data = tf.cast(train_data, tf.float32)
test_data = tf.cast(test_data, tf.float32)
# Separate the data into normal and anomalous samples (the model is trained only on normal data)
train_labels = train_labels.astype(bool)
test_labels = test_labels.astype(bool)

normal_train_data = train_data[train_labels]
normal_test_data = test_data[test_labels]

anomalous_train_data = train_data[~train_labels]
anomalous_test_data = test_data[~test_labels]
# Plot Normal ECG Data
sns.lineplot(normal_train_data[0])
plt.title("Normal ECG")
plt.show()

(figure: a normal ECG waveform)

# Plot Anomalous ECG Data
sns.lineplot(anomalous_train_data[0])
plt.title("Anomalous ECG")
plt.show()

(figure: an anomalous ECG waveform)

Building the model

class AnomalyDetector(tf.keras.Model):
    def __init__(self):
        super(AnomalyDetector, self).__init__()

        self.encoder = self.get_encoder()
        self.decoder = self.get_decoder()

    def get_encoder(self):
        """Return the encoder: compresses the input through 32 -> 16 -> 8 units."""
        encoder = tf.keras.models.Sequential([

            Dense(32),
            BatchNormalization(),
            ReLU(),

            Dense(16),
            BatchNormalization(),
            ReLU(),

            Dense(8),
            BatchNormalization(),
            ReLU(),
        ])
        return encoder
    
    def get_decoder(self):
        """Return the decoder: expands the latent vector through 16 -> 32 -> 64 back to
        the 140-sample output, with a sigmoid to match the [0, 1] input range."""
        decoder = tf.keras.models.Sequential([
            Dense(16),
            BatchNormalization(),
            ReLU(),

            Dense(32),
            BatchNormalization(),
            ReLU(),

            Dense(64),
            BatchNormalization(),
            ReLU(),

            Dense(140),
            BatchNormalization(),
            Activation("sigmoid")
        ])
        return decoder
    
    def call(self, ecg_input):
        encoded = self.encoder(ecg_input)
        decoded = self.decoder(encoded)
        return decoded

Training the Model

autoencoder = AnomalyDetector()
autoencoder.compile(optimizer='adam', loss='mae')
history = autoencoder.fit(normal_train_data, normal_train_data,
          epochs=40,
          batch_size=256,
          # note: the validation set contains both normal and anomalous beats
          validation_data=(test_data, test_data),
          shuffle=True)
Epoch 1/40
10/10 [==============================] - 3s 29ms/step - loss: 0.1450 - val_loss: 0.0540
Epoch 2/40
10/10 [==============================] - 0s 6ms/step - loss: 0.1027 - val_loss: 0.0522
Epoch 3/40
10/10 [==============================] - 0s 6ms/step - loss: 0.0775 - val_loss: 0.0504
...
Epoch 39/40
10/10 [==============================] - 0s 6ms/step - loss: 0.0279 - val_loss: 0.0342
Epoch 40/40
10/10 [==============================] - 0s 7ms/step - loss: 0.0262 - val_loss: 0.0341
plt.plot(history.history["loss"], label="Training Loss")
plt.plot(history.history["val_loss"], label="Validation Loss")
plt.legend()
plt.show()

(figure: training and validation loss curves)

Evaluating

encoded_data = autoencoder.encoder(normal_test_data).numpy()
decoded_data = autoencoder.decoder(encoded_data).numpy()

plt.plot(normal_test_data[0], 'b')
plt.plot(decoded_data[0], 'r')
plt.fill_between(np.arange(140), decoded_data[0], normal_test_data[0], color='lightcoral')
plt.legend(labels=["Input", "Reconstruction", "Error"])
plt.show()

(figure: normal ECG input vs. reconstruction, error region shaded)

encoded_data = autoencoder.encoder(anomalous_test_data).numpy()
decoded_data = autoencoder.decoder(encoded_data).numpy()

plt.plot(anomalous_test_data[1], 'b')
plt.plot(decoded_data[1], 'r')
plt.fill_between(np.arange(140), decoded_data[1], anomalous_test_data[1], color='lightcoral')
plt.legend(labels=["Input", "Reconstruction", "Error"])
plt.show()

(figure: anomalous ECG input vs. reconstruction, error region shaded)

Defining a Threshold

losses = []
labels = []

reconstructions = autoencoder.predict(normal_train_data)
normal_loss = tf.keras.losses.mae(reconstructions, normal_train_data)

losses.extend(normal_loss.numpy())
labels.extend([1] * len(normal_loss))

plt.hist(normal_loss[None,:], bins=50)
plt.xlabel("Normal ECG MAE error")
plt.ylabel("No of examples")
plt.show()
74/74 [==============================] - 0s 951us/step

(figure: histogram of reconstruction MAE on normal ECGs)

reconstructions = autoencoder.predict(anomalous_test_data)
anomalous_loss = tf.keras.losses.mae(reconstructions, anomalous_test_data)

losses.extend(anomalous_loss.numpy())
labels.extend([0] * len(anomalous_loss))

plt.hist(anomalous_loss[None, :], bins=50)
plt.xlabel("Anomalous ECG MAE error")
plt.ylabel("No of examples")
plt.show()
14/14 [==============================] - 0s 2ms/step

(figure: histogram of reconstruction MAE on anomalous ECGs)

hist_data = pd.DataFrame({"Loss":losses, "Label":labels})
sns.histplot(data=hist_data, x="Loss", hue="Label")
plt.xlim(0, 0.1)
plt.show()

(figure: overlaid loss histograms for normal and anomalous ECGs)

sns.histplot(data=hist_data, x="Loss", hue="Label")
plt.xlim(0, 0.1)
threshold = 0.04
plt.axvline(threshold)
plt.show()

(figure: loss histograms with the chosen threshold marked)
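
The 0.04 cutoff above is read off the histogram by eye. A common alternative, assumed here rather than taken from the original notebook, is to derive the threshold from the reconstruction-error statistics on the normal training data (alt_threshold is a hypothetical name):

# Assumed alternative: one standard deviation above the mean normal training error
alt_threshold = np.mean(normal_loss.numpy()) + np.std(normal_loss.numpy())
print("Statistical threshold:", alt_threshold)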

def predict(model, data, threshold):
    reconstructions = model(data)
    loss = tf.keras.losses.mae(reconstructions, data)
    return tf.math.less(loss, threshold)  # True = error below threshold = classified as normal

def print_stats(predictions, labels):
    print(f"Accuracy = {round(accuracy_score(labels, predictions), 3)}")
    print(f"Precision = {round(precision_score(labels, predictions), 3)}")
    print(f"Recall = {round(recall_score(labels, predictions), 3)}")
    print(f"F1-Score = {round(f1_score(labels, predictions), 3)}")

    cm = confusion_matrix(labels, predictions)
    disp = ConfusionMatrixDisplay(cm)
    disp.plot()
    plt.grid(False)

preds = predict(autoencoder, test_data, threshold)
print_stats(preds, test_labels)
Accuracy = 0.964
Precision = 0.976
Recall = 0.959
F1-Score = 0.968

(figure: confusion matrix)
