ruiking04 / coca

Deep Contrastive One-Class Time Series Anomaly Detection

Python 100.00%

anomaly-detection time-series pytorch deep-learning timeseries

coca's Introduction

Deep Contrastive One-Class Time Series Anomaly Detection

This repository provides the implementation of the Deep Contrastive One-Class Time Series Anomaly Detection method, referred to below as COCA.

The implementation uses the Merlion and the Tsaug libraries.

Abstract

The accumulation of time-series data and the absence of labels make time-series Anomaly Detection (AD) a self-supervised deep learning task. Single-normality-assumption-based methods, which reveal only one aspect of the whole normality, are ill-suited to tasks involving a large number of anomalies. Specifically, Contrastive Learning (CL) methods push apart negative pairs, many of which consist of two normal samples, thus reducing AD performance. Existing multi-normality-assumption-based methods are usually two-staged: they first pre-train through certain tasks whose target may differ from AD, which limits their performance. To overcome these shortcomings, a deep Contrastive One-Class Anomaly detection method of time series (COCA) is proposed, following the normality assumptions of CL and one-class classification. It treats the original and reconstructed representations as the positive pair of negative-samples-free CL, namely “sequence contrast”. Next, invariance terms and variance terms compose a contrastive one-class loss function, in which the invariance terms optimize the loss of both assumptions simultaneously and the variance terms prevent “hypersphere collapse”. In addition, extensive experiments on two real-world time-series datasets show that the proposed method achieves state-of-the-art performance.
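The abstract describes the loss only in words. The NumPy sketch below illustrates the general shape of such a loss under stated assumptions: the original and reconstructed representations `q` and `q_rec` are both pulled toward a one-class center `c` through cosine similarity (invariance terms), and a VICReg-style hinge on the per-dimension standard deviation (variance terms) penalizes all representations collapsing onto one point. The function names, the cosine form, the hinge target of 1 and the weight `lam` are assumptions for illustration; the repository's `trainer.py` may differ. The `eps=1e-4` mirrors the `0.0001` visible in the `trainer.py` lines quoted in the issues below.

```python
import numpy as np


def cosine_to_center(z, c):
    """Cosine similarity between each row of z (batch, dim) and a center c (dim,)."""
    z_unit = z / np.linalg.norm(z, axis=1, keepdims=True)
    c_unit = c / np.linalg.norm(c)
    return z_unit @ c_unit


def variance_term(z, eps=1e-4):
    """VICReg-style hinge (assumption): penalize dimensions whose batch std is below 1."""
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)  # unbiased, like torch.Tensor.var's default
    return float(np.mean(np.maximum(0.0, 1.0 - std)))


def contrastive_one_class_loss(q, q_rec, c, lam=1.0):
    """Invariance terms pull both views toward c; variance terms resist collapse."""
    invariance = np.mean((1.0 - cosine_to_center(q, c)) + (1.0 - cosine_to_center(q_rec, c)))
    variance = variance_term(q) + variance_term(q_rec)
    return float(invariance + lam * variance)
```

On a fully collapsed batch (every row equal to `c`) the invariance part is zero, but the variance part stays close to 2, so in this sketch the loss cannot be minimized simply by mapping everything onto the center.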

Citation

Link to our paper here.

If you use this code for your research, please cite our paper:

@inproceedings{wang2023deep,
  title={Deep Contrastive One-Class Time Series Anomaly Detection},
  author={Wang, Rui and Liu, Chongwei and Mou, Xudong and Gao, Kai and Guo, Xiaohui and Liu, Pin and Wo, Tianyu and Liu, Xudong},
  booktitle={Proceedings of the 2023 SIAM International Conference on Data Mining (SDM)},
  pages={694--702},
  year={2023},
  organization={SIAM}
}

Installation

This code is based on Python 3.8; all requirements are listed in requirements.txt. Additionally, install salesforce-merlion v1.1.1 and ts_datasets as Merlion suggests.

git clone https://github.com/salesforce/Merlion.git
cd Merlion
pip install salesforce-merlion==1.1.1
pip install IPython
pip install -r requirements.txt

The COCA repository already includes Merlion's data-loading package ts_datasets. Please unzip data/iops_competition/phase2.zip before running the program.
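If no `unzip` tool is available (for example on Windows), the archive can also be extracted with Python's standard `zipfile` module. This sketch assumes the contents should land next to the archive in `data/iops_competition/`; check the paths the ts_datasets loader expects before relying on it. `extract_archive` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_archive(archive, dest=None):
    """Extract a .zip archive (by default next to itself) and return the extracted paths."""
    archive = Path(archive)
    dest = archive.parent if dest is None else Path(dest)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return [dest / name for name in zf.namelist()]
```

Usage, from the repository root: `extract_archive("data/iops_competition/phase2.zip")`.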

Repository Structure

`conf`

This directory contains experiment parameters for all models on the IOpsCompetition and UCR datasets.

`models`

Source code of the COCA model and of the baseline models.

`results`

Directory where the experiment results and checkpoints are saved.

Usage

python coca.py --selected_dataset UCR --selected_model COCA --device cuda --seed 2
python coca.py --selected_dataset IOpsCompetition --selected_model COCA --device cuda --seed 2

# COCA Variants
python coca.py --selected_dataset IOpsCompetition --selected_model COCA_no_aug --device cuda --seed 2
python coca.py --selected_dataset IOpsCompetition --selected_model COCA_no_cl --device cuda --seed 2
python coca.py --selected_dataset IOpsCompetition --selected_model COCA_no_oc --device cuda --seed 2
python coca.py --selected_dataset IOpsCompetition --selected_model COCA_no_var --device cuda --seed 2
python coca.py --selected_dataset IOpsCompetition --selected_model COCA_view --device cuda --seed 2
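To collect several runs per variant, for example to compute a mean and standard deviation over seeds as asked about in the issues, the Usage commands can be generated in a loop. This is a hypothetical helper script, not part of the repository; the seed list in the usage line is illustrative only and is not the number of seeds used in the paper.

```python
import shlex
import subprocess
import sys

# Model names taken from the Usage commands above
VARIANTS = ["COCA", "COCA_no_aug", "COCA_no_cl", "COCA_no_oc", "COCA_no_var", "COCA_view"]


def build_commands(dataset, models, seeds, device="cuda"):
    """Return one coca.py argument list per (model, seed) pair."""
    return [
        [sys.executable, "coca.py",
         "--selected_dataset", dataset,
         "--selected_model", model,
         "--device", device,
         "--seed", str(seed)]
        for model in models
        for seed in seeds
    ]


def run_all(commands, dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Usage: `run_all(build_commands("IOpsCompetition", VARIANTS, seeds=[1, 2, 3]), dry_run=False)`. Keeping `dry_run=True` as the default lets you check the generated commands before launching long GPU jobs.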

Disclosure

This implementation is based on affiliation-metrics.

coca's People

Forkers

chandan-iiti jc0624 skydzl qin8948050 mufiye yurkar2333 luciferjason gopikrishnansrikumar eelfgnc

coca's Issues

CPC - FileNotFoundError: [Errno 2] No such file or directory: './results/cpc/cpc-model_best.pth'

Hello,

I run this code: python -u ./baseline.py --dataset UCR --model CPC --debug > stdout_base_CPC 2>stderr_base_CPC

And I get this error:

Traceback (most recent call last):
  File "./baseline.py", line 761, in <module>
    main()
  File "./baseline.py", line 701, in main
    train_model(
  File "./baseline.py", line 334, in train_model
    train_scores, test_scores = evaluator.get_predict(
  File ".../ENV_COCA-main-December182023/lib/python3.8/site-packages/merlion/evaluate/anomaly.py", line 433, in get_predict
    train_result, result = super().get_predict(
  File ".../ENV_COCA-main-December182023/lib/python3.8/site-packages/merlion/evaluate/base.py", line 173, in get_predict
    train_result = self._train_model(train_vals, **full_train_kwargs)
  File ".../ENV_COCA-main-December182023/lib/python3.8/site-packages/merlion/evaluate/base.py", line 129, in _train_model
    return self.model.train(train_vals, **train_kwargs)
  File ".../COCA-main-December182023/COCA-main/models/CPC/DetectorModel.py", line 129, in train
    self._train(train_df.values)
  File ".../COCA-main-December182023/COCA-main/models/CPC/DetectorModel.py", line 74, in _train
    snapshot(self.config.logging_dir, 'cpc', {
  File ".../COCA-main-December182023/COCA-main/models/CPC/network/training_v1.py", line 78, in snapshot
    torch.save(state, snapshot_file)
  File ".../ENV_COCA-main-December182023/lib/python3.8/site-packages/torch/serialization.py", line 376, in save
    with _open_file_like(f, 'wb') as opened_file:
  File ".../ENV_COCA-main-December182023/lib/python3.8/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File ".../ENV_COCA-main-December182023/lib/python3.8/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './results/cpc/cpc-model_best.pth'

Do I need to run another script before this? Please advise.
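The traceback ends in `torch.save` failing to open `./results/cpc/cpc-model_best.pth`; `torch.save` does not create missing directories, so this error is consistent with `./results/cpc/` not existing yet. A minimal workaround, assuming it is acceptable to create the directory on demand, is to make the parent directory before saving, for example inside `snapshot()` in `models/CPC/network/training_v1.py`. The helper name `ensure_parent_dir` is hypothetical.

```python
from pathlib import Path


def ensure_parent_dir(path):
    """Create the parent directory of `path` if it is missing and return `path` as a Path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Usage inside `snapshot()`: `torch.save(state, ensure_parent_dir(snapshot_file))`, since `torch.save` accepts path-like objects. Creating `results/cpc` by hand before running `baseline.py` should have the same effect.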

Marcia

Figure 3

Hello,

Please could you let me know where in the code you create "Figure 3 - AD results of COCA on AIOps and UCR".

Thanks

Running error

Hello, I followed the command provided in your README: python coca.py --selected_dataset UCR --device cuda --seed 2, but I got the following results. How can I solve this problem?

How to run several Datasets

Hello,

I am looking at Table 2 from your paper.

Please could you let me know how I would run the following baselines:

SR-CNN
CPC-AD

Thanks,
Marcia

Hyperparameters Search

Hello,

You suggested tuning window_size, time_step, batch_size, scale_ratio, and jitter_ratio. Should I try other variables too?

I would like to ask what the acceptable ranges are for each of these values.

I would also like to know how they are related to each other. For example, I believe that window_size must be proportional to batch_size.

Thanks,
Marcia

Results of baseline.py are all the same

Hello,

Please could you give me any insight as to why Accuracy, Precision, and Recall are all the same for the different models as per below (from log.txt). I don't know why this is happening. I did not change the code much.

['IsolationForest'] UCR Reasonable Metrics
['IsolationForest'] UCR {'accuracy': 0.9411764705882353}
['IsolationForest'] UCR Affiliation Metrics
['IsolationForest'] UCR {'precision': 0.00011550719361908644, 'recall': 0.007922012085960176}
['RandomCutForest'] UCR Reasonable Metrics
['RandomCutForest'] UCR {'accuracy': 0.9411764705882353}
['RandomCutForest'] UCR Affiliation Metrics
['RandomCutForest'] UCR {'precision': 0.00011550719361908644, 'recall': 0.007922012085960176}
['SpectralResidual'] UCR Reasonable Metrics
['SpectralResidual'] UCR {'accuracy': 0.9411764705882353}
['SpectralResidual'] UCR Affiliation Metrics
['SpectralResidual'] UCR {'precision': 0.00011550719361908644, 'recall': 0.007922012085960176}
['LSTMED'] UCR Reasonable Metrics
['LSTMED'] UCR {'accuracy': 0.9411764705882353}
['LSTMED'] UCR Affiliation Metrics
['LSTMED'] UCR {'precision': 0.00011550719361908644, 'recall': 0.007922012085960176}
['DAGMM'] UCR Reasonable Metrics
['DAGMM'] UCR {'accuracy': 0.9411764705882353}
['DAGMM'] UCR Affiliation Metrics
['DAGMM'] UCR {'precision': 0.00011550719361908644, 'recall': 0.007922012085960176}
['CPC'] UCR Reasonable Metrics
['CPC'] UCR {'accuracy': 0.9411764705882353}
['CPC'] UCR Affiliation Metrics
['CPC'] UCR {'precision': 0.00011550719361908644, 'recall': 0.007922012085960176}
['OCSVM'] UCR Reasonable Metrics
['OCSVM'] UCR {'accuracy': 0.9411764705882353}
['OCSVM'] UCR Affiliation Metrics
['OCSVM'] UCR {'precision': 0.00011550719361908644, 'recall': 0.007922012085960176}
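All seven models report an accuracy of 0.9411764705882353, which is exactly 16/17. One hedged reading, assuming the evaluated data has 17 series and the "Reasonable Metrics" accuracy is a per-series hit rate, is that every model was scored correct on the same 16 series. The sketch below shows that kind of UCR-style metric, where a series counts as detected if its single highest score falls near the labeled anomaly; `ucr_hit`, `ucr_accuracy` and the 100-point tolerance are assumptions, not the repository's `tsad_reasonable`.

```python
import numpy as np


def ucr_hit(scores, labels, tolerance=100):
    """True if argmax(scores) lies in the labeled anomaly region widened by `tolerance` points."""
    idx = np.flatnonzero(labels)
    start, end = idx.min() - tolerance, idx.max() + tolerance
    return bool(start <= int(np.argmax(scores)) <= end)


def ucr_accuracy(series):
    """series: list of (scores, labels) pairs, one per UCR file."""
    hits = [ucr_hit(scores, labels) for scores, labels in series]
    return sum(hits) / len(hits)
```

Under this kind of metric, identical accuracies across very different models would suggest checking whether their score arrays, or the labels they are compared against, are actually different.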

Results Files

What is the difference between log.txt and UCR_summary.csv?

Please could you explain the values in UCR_summary.csv?

Why are the Precision/Recall different from each other (in the log.txt and UCR_summary.csv files)?

Is the "F1" value the same as "Affiliation F1" (Table 2)?

In UCR_summary, what do the numbers next to the model mean? For example: DAGMM_0.7066189827640826

Thanks,
Marcia
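The affiliation metrics report a precision and a recall; an F1 score is then usually their harmonic mean. Whether the F1 column in UCR_summary.csv is computed exactly this way is an assumption; `affiliation_f1` is a hypothetical helper.

```python
def affiliation_f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Plugging in the IsolationForest affiliation precision and recall from the log.txt excerpt above gives an F1 of roughly 0.00023.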

TS_TCC Error

Hello,

When trying to run TS_TCC (Self Supervised), I get this error:

Traceback (most recent call last):
  File "./ts_tcc_main.py", line 197, in <module>
    train_dl, valid_dl, test_dl = data_generator1(time_series, time_series_label, configs, training_mode)
  File ".../COCA-main/models/TS_TCC/dataloader/dataset.py", line 107, in data_generator1
    train_dataset = Load_Dataset(train_dat_dict, configs, training_mode)
  File ".../COCA-main/models/TS_TCC/dataloader/dataset.py", line 37, in __init__
    self.aug1, self.aug2 = DataTransform(self.x_data, config)
  File ".../COCA-main/models/TS_TCC/dataloader/augmentations.py", line 8, in DataTransform
    strong_aug = jitter(permutation(sample, max_segments=config.augmentation.max_seg), config.augmentation.jitter_ratio)
  File ".../COCA-main-December182023/COCA-main/models/TS_TCC/dataloader/augmentations.py", line 42, in permutation
    warp = np.concatenate(np.random.permutation(splits)).ravel()
  File "mtrand.pyx", line 4703, in numpy.random.mtrand.RandomState.permutation
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (7,) + inhomogeneous part.

Please could you let me know how you solved this?

Thanks,
Marcia
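NumPy 1.24 and later refuse to build an array from a list of unequal-length arrays, and `np.random.permutation(splits)` first converts the list of segments into an array, which matches the ValueError above. A common workaround (an assumption here, not the maintainers' fix) is to permute the segment indices instead. The sketch below is a simplified segment-permutation augmentation with that change; `permute_segments` is hypothetical and uses roughly equal segments from `np.array_split` rather than the repository's exact splitting logic.

```python
import numpy as np


def permute_segments(x, max_segments=5, rng=None):
    """Split a 1-D series into contiguous segments and shuffle their order."""
    rng = np.random.default_rng() if rng is None else rng
    num_segs = rng.integers(1, max_segments + 1)  # upper bound is exclusive
    splits = np.array_split(np.arange(len(x)), num_segs)  # segments may differ in length
    # Permute segment *indices*, not the ragged list of segments itself
    order = rng.permutation(len(splits))
    warp = np.concatenate([splits[i] for i in order])
    return x[warp]
```

Alternatively, installing a NumPy version older than 1.24 should restore the old behavior, where building such an array only raised a deprecation warning.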

COCA Visualization Plots

The plots that I do by hand look like this:

The plot generated by COCA looks like this:

The plots are not the same; in fact, the COCA plot is much smaller than mine.

Please could you let me know what I should do to make the visualizations work properly?

Calculation of Standard Deviation

Hello,

Please could you let me know how many seeds you ran in order to determine your standard deviation?

Did you run each model 5 times? 10 times?

Thanks

Merlion folders

I downloaded the project from scratch and did all the processes detailed in GitHub.

However, I keep getting these messages that it does not recognize "merlion.". I changed everything to "Merlion.merlion." but this also gave me problems. An error message is as below:

Traceback (most recent call last):
  File "coca.py", line 7, in <module>
    from models.TS_TCC.TS_utils import logger
  File "...\COCA-main\models\__init__.py", line 1, in <module>
    from .OCSVM import OCSVMConf, OCSVM
  File "...\COCA-main\models\OCSVM\__init__.py", line 1, in <module>
    from .DetectorConfig import OCSVMConf
  File "...\COCA-main\models\OCSVM\DetectorConfig.py", line 1, in <module>
    from merlion.models.anomaly.base import DetectorConfig
ModuleNotFoundError: No module named 'merlion.models'

Please share with me your experience with this.

Thanks,
Marcia
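One possible cause (an assumption, not confirmed in the issue) is that a folder named Merlion, such as the clone from the Installation steps, sits on the import path and shadows the installed salesforce-merlion package; on a case-insensitive file system such as Windows, `import merlion` can then resolve to that folder, which likely has no `models` subpackage at its top level. The snippet below reports where Python would load a module from; `describe_module` is a hypothetical helper built on `importlib.util.find_spec`.

```python
import importlib.util


def describe_module(name):
    """Report where Python would load a top-level module from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not importable"
    # For a namespace package (a plain folder without __init__.py) origin is None
    locations = list(spec.submodule_search_locations or [])
    return f"{name}: origin={spec.origin}, search_locations={locations}"
```

Running `print(describe_module("merlion"))` from the COCA-main directory should point into site-packages; if it instead shows `origin=None` and a folder inside the project, that folder is being picked up instead of the installed package.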

COCA - Anomaly Detection

Hello,

Please could you let me know how the Anomaly threshold is calculated? In your paper, you mention it is predefined. But how is it predefined?

Thanks

Baseline Parameters

Hello,

You tweak variables for COCA and its variants, such as Time_Step, Batch_Size, Epoch_Number, etc.

Do you also tweak variables for Baselines such as OCSVM, DeepSVDD, etc?

demo.py

Could the author please upload demo.py?

Table 2 - Calculating the Standard Deviation

Hello,

In Table 2 of your paper, you report "Affiliation F1" and "Accuracy".

I would like to ask how you determine the standard deviation.

For example, COCA's Accuracy is 66.12 +- 2.62 and COCA's Affiliation F1 is 79.15 +- 1.27.

How do you calculate "2.62" and "1.27"?

Thanks,
Marcia
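The issues do not say how many seeds were used or whether the reported +- value is a sample or a population standard deviation. Assuming one Accuracy (or Affiliation F1) value per seed, a "mean +- std" entry could be produced as below; `mean_std` and `format_pm` are hypothetical, and the choice between `statistics.stdev` (sample, divides by n-1) and `statistics.pstdev` (population, divides by n) changes the result.

```python
import statistics


def mean_std(values, sample=True):
    """Mean and standard deviation of per-seed results."""
    mean = statistics.mean(values)
    std = statistics.stdev(values) if sample else statistics.pstdev(values)
    return mean, std


def format_pm(values, sample=True):
    """Format per-seed results as 'mean +- std' with two decimals."""
    m, s = mean_std(values, sample)
    return f"{m:.2f} +- {s:.2f}"
```

With only a handful of seeds the two conventions differ noticeably, so it is worth stating which one a table uses.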

COCA - NoCL

Please could you let me know what the formula for COCA-NoCL is? As I understand it, it is equation (3.3) without Lsim.

In your paper you state: "NoCL removes the contrastive learning of COCA to optimize the similarity of representations and one-class center." Could you please expand on this? I do not fully understand it.

data_generator4 - where is this?

Hello,

In both coca_no_var.py and coca_view.py, there is a reference to data_generator4 .

I searched in the folder "dataloader" and found data_generator1 and data_generator2.

Please could you let me know what I need to do to get data_generator4 to work?

Thanks,
Marcia

TS_TCC - Anomaly Detection

Hello,

I am working on ts_tcc_main.py file. I try to run the "Anomaly Detection" option with my dataset and I get the following error:

Traceback (most recent call last):
  File ".../COCA-main/./ts_tcc_main.py", line 120, in <module>
    chkpoint = torch.load(os.path.join(load_from, "ckp_last.pt"), map_location=device)
  File "../Jupy_ENV/lib/python3.9/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File ".../Jupy_ENV/lib/python3.9/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "../Jupy_ENV/lib/python3.9/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'experiments_logs/Exp2/run1/self_supervised_seed_5/UCR/_0/saved_models/ckp_last.pt'

I checked the ".../COCA-main/experiments_logs/Exp2/run1/self_supervised_seed_5/" folder and it does not have "UCR/_0/saved_models/ckp_last.pt".

Please let me know how I could correct this.

Thanks,
Marcia
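The path in the error suggests that no checkpoint was saved for that run (`self_supervised_seed_5/UCR/_0`). A hedged guess is that the self-supervised training mode has to be run first with the same experiment settings, so that `ckp_last.pt` exists before the anomaly detection mode loads it. To see which runs actually produced a checkpoint, a small directory scan such as the hypothetical `find_checkpoints` below can help.

```python
from pathlib import Path


def find_checkpoints(root="experiments_logs", name="ckp_last.pt"):
    """Return every checkpoint file named `name` under `root`, sorted."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(root.rglob(name))
```

Comparing the returned paths with the one in the error message shows whether the mismatch is in the seed, the dataset name, or the run index.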

The loss function in COCA/models/COCA/coca_trainer/trainer.py is not the same as in the paper.

See lines 184 and 185 of models/COCA/coca_trainer/trainer.py:
sigma_aug1 = torch.sqrt(feature1.var([0]) + 0.0001)
sigma_aug2 = torch.sqrt(distance_dec1.var([0]) + 0.0001)

According to the paper, it should be:
sigma_aug1 = torch.sqrt(distance1.var([0]) + 0.0001)
sigma_aug2 = torch.sqrt(distance_dec1.var([0]) + 0.0001)

Am I right?

Misspelled dependency in requirements.txt

saleforce-merlion should be corrected to salesforce-merlion

Number of GPUs

Hello,

Please could you let me know the maximum number of GPUs COCA can use?

Thanks

Interpretation

Hello,

I ran your models against my dataset. This dataset consists of 17 files with non-trivial anomalies (derived from the UCR anomalies).

I was expecting COCA to perform best overall. However, I found that the top 5 performers, in order, are: OCSVM, DAGMM, Isolation Forest, LSTM-ED, and COCA - No Aug.

I am thinking that for my dataset, COCA is not necessary.

Please let me know your thoughts.

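Question about label preprocessing

Hello, and thank you for providing the source code! I have a small question about the window anomaly labels: why are only the first configs.time_step time points of each window considered, rather than whether an anomaly exists anywhere in the whole window?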
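The question contrasts two ways of turning point labels into window labels. The sketch below assumes windows of length `window_size` whose start advances by `time_step`, which the question implies but the source does not state explicitly; `window_labels` and its `mode` argument are hypothetical.

```python
import numpy as np


def window_labels(labels, window_size, time_step, mode="first_steps"):
    """Derive one label per sliding window from point-wise 0/1 labels."""
    labels = np.asarray(labels)
    starts = range(0, len(labels) - window_size + 1, time_step)
    if mode == "first_steps":
        # Only the first time_step points of each window decide its label;
        # with stride time_step these chunks never overlap between windows
        return np.array([int(labels[s:s + time_step].any()) for s in starts])
    # Otherwise: a window is anomalous if any point in it is anomalous
    return np.array([int(labels[s:s + window_size].any()) for s in starts])
```

With one anomalous point, the first mode marks at most one window, while whole-window labeling marks every window that overlaps it. One plausible motivation for the first choice (not confirmed by the source) is that each anomalous point then affects only one window-level label.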

Result error

Hello, when I run the model with the default command from your README, the result is always 0. How should I solve this problem?

confusion of the threshold part

Hi, I have been reading your work recently; it is interesting and has inspired me a lot. Thanks very much!

And I have some questions, could you please help me to understand?

I focus only on the UCR dataset, because my situation is the same: there are no anomalies in the training dataset. I am also very new to this kind of dataset: you treat each element of a time-series sample as a single entity, whereas I usually treat a whole time-series sample as one entity. So I cannot understand some parts of your implementation.

About the threshold: each time-series sample has one score vector, and you choose the highest score as the threshold. However, there are many time-series samples, which yield a corresponding number of highest scores and therefore thresholds. In this case, how is the final threshold defined? Or how is the threshold obtained for the test dataset without knowing the test data?

My intuition is that it relates to these lines:
test_affiliation, test_score, predict = ad_predict(test_target, test_score_origin,
                                                   config.threshold_determine, config.detect_nu)
score_reasonable = tsad_reasonable(test_target, predict, config.time_step)
However, the input is test_score_origin, which is obtained from the test dataset. In evaluation mode you do not split the data into batches, which means the number of samples in the whole test dataset should be smaller than the batch size (512), but I still cannot understand how the threshold is determined for several samples in the test dataset.
Or, in UCR dataset, there is just one time-series?

This part is a little abstract for me; could you please provide some explanations?
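Per the snippet quoted in the question, the threshold is set inside `ad_predict` from `config.threshold_determine` and `config.detect_nu`; their exact semantics are not documented here. Purely as an illustration, the sketch shows two common strategies: a quantile threshold where `nu` is taken to be the expected fraction of anomalies, and a top-1 rule matching the "highest score" reading described in the question, applied to one series at a time. Whether COCA computes the threshold from training or test scores is exactly the open question, so both helpers are hypothetical.

```python
import numpy as np


def threshold_by_nu(scores, nu):
    """Flag roughly the top nu fraction of scores (quantile threshold)."""
    scores = np.asarray(scores, dtype=float)
    thr = np.quantile(scores, 1.0 - nu)
    return thr, (scores >= thr).astype(int)


def threshold_top1(scores):
    """Flag only the highest-scoring point(s); ties at the maximum are all flagged."""
    scores = np.asarray(scores, dtype=float)
    thr = scores.max()
    return thr, (scores >= thr).astype(int)
```

A top-1 rule only makes sense when each evaluated series is known to contain a single anomaly, as in the UCR setting; a quantile rule needs an estimate of the anomaly fraction instead.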

When you use MeanVarNormalize for standardization, you also include the test data. Doesn't this operation cause data leakage?
mvn.train(train_time_series_ts + test_time_series_ts)
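Fitting normalization statistics on train + test lets the test data influence the scaling. A leakage-free alternative, sketched here with plain NumPy rather than Merlion's MeanVarNormalize, is to compute the mean and standard deviation on the training split only and apply them unchanged to the test split. With Merlion the analogous change would presumably be calling `mvn.train(train_time_series_ts)` alone, though that is an untested assumption. `TrainOnlyStandardizer` is hypothetical.

```python
import numpy as np


class TrainOnlyStandardizer:
    """Mean/variance normalization whose statistics come from training data only."""

    def fit(self, train):
        train = np.asarray(train, dtype=float)
        self.mean_ = train.mean(axis=0)
        std = train.std(axis=0)
        # Guard against constant channels to avoid division by zero
        self.std_ = np.where(std == 0, 1.0, std)
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.mean_) / self.std_
```

Usage: `scaler = TrainOnlyStandardizer().fit(train_values)`, then `scaler.transform(train_values)` and `scaler.transform(test_values)`.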

merlion'

How to train or validate a model from scratch?

Sorry, I would like to ask how to start training a model. I can't get more information from the README.

Baseline.py - Getting Zeros

Hello,

My Baseline models are all getting results of zero.

Please could you let me know in which file the results are stored.

I'm thinking that I am looking at the wrong file since I almost did not change the code at all and I am using the original UCR dataset.

Thanks,
Marcia

baseline.py - append

Hello,

I am having troubles with baseline.py. I did the following replacements and the code runs:

        changed from: ts_df = train_scores.to_pd().append(test_scores.to_pd())
        changed to:     ts_df = pd.concat([train_scores.to_pd(), test_scores.to_pd()], ignore_index=True)

        changed from: df.append(ts_df, ignore_index=True)
        changed to:     df = pd.concat([df, ts_df], ignore_index=True)

I also did configuration involving Java and py4j. The code seems to run okay without errors.

However, regarding the results, when I run IsolationForest, all my results are zero. I am using 17 custom datasets derived from the UCR datasets. I don't know why the results are zero or whether this is an acceptable result.

Please let me know how I could correct this.

Thanks,
Marcia
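`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, which is why the original lines fail on recent pandas. Two details in the replacements above are worth checking. First, `train_scores.to_pd().append(test_scores.to_pd())` had no `ignore_index`, so it kept the timestamp index; adding `ignore_index=True` replaces it with 0..n-1, which may matter if later code relies on the index. Second, `df.append(...)` never modified `df` in place, so its result has to be assigned, as the second replacement does. The helper names below are hypothetical.

```python
import pandas as pd


def append_scores(train_df, test_df):
    """Replacement for train_df.append(test_df): keeps the original (timestamp) index."""
    return pd.concat([train_df, test_df])


def append_rows(df, new_rows):
    """Replacement for df = df.append(new_rows, ignore_index=True)."""
    return pd.concat([df, new_rows], ignore_index=True)
```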
