enyandai / ganf
Official implementation of "Graph-Augmented Normalizing Flows for Anomaly Detection of Multiple Time Series" (ICLR 2022)
There is a bug in the load_water(..) function:
root = 'data/SWaT_Dataset_Attack_v0.csv'
data = pd.read_csv(root)
data = data.rename(columns={"Normal/Attack":"label"})
data.label[data.label!="Normal"]=1
data.label[data.label=="Normal"]=0
ts_format = pd.to_datetime(data["Timestamp"], format="%d/%m/%Y %I:%M:%S %p")
ts_no_format = pd.to_datetime(data["Timestamp"])
In the above code block, the dataframes ts_format and ts_no_format should be identical. However, since ts_no_format is not given the format, it parses the string 2/1/2016 7:00:00 AM as Feb 1st 2016 instead of the true date, Jan 2nd 2016. The format specified in the format argument matches the format of the string timestamps; this can easily be verified by checking any string timestamp with a day value greater than 12.
I'm not sure how much this bug affects performance, but it would be nice if the authors could fix it.
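The misparse described above can be reproduced in isolation. A minimal sketch (the sample timestamp is taken from the report above; the behavior shown is standard pandas month-first defaulting):

```python
import pandas as pd

s = pd.Series(["2/1/2016 7:00:00 AM"])

# Without an explicit format, pandas assumes month-first,
# so this parses as February 1st.
no_format = pd.to_datetime(s)

# With the day-first format used by the SWaT timestamps,
# the same string parses as January 2nd.
with_format = pd.to_datetime(s, format="%d/%m/%Y %I:%M:%S %p")

print(no_format[0])    # 2016-02-01 07:00:00
print(with_format[0])  # 2016-01-02 07:00:00
```

Passing dayfirst=True to pd.to_datetime would also avoid the misparse, though an explicit format is stricter and faster.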
Thank you very much for sharing the code. In your paper, you use a CNF based on MAF to evaluate the conditional probability distribution of each time series variable. But MAF masks some dimensions of the hidden variables and then performs autoregression. How does MAF work for a one-dimensional univariate time series?
Line 29 in d207f7e
There is a problem when using pandas to read the CSV:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
How should the missing values be dealt with?
Hello,
Thanks for sharing the code! I found the python code for baselines in the models folder, but I am not sure how to use them. Could you please give some instructions about how to use them to reproduce the result of baselines when you have a moment?
Thank you in advance! My email is [email protected].
Thank you!
I'm following your great work. However, when I run the DeepSVDD provided in the code, I get 84% AUROC on SWaT, better than GANF. It seems that DeepSVDD overfits the dataset. How can I solve this? The settings are as follows. I hope you can provide the settings of DeepSVDD so that I can continue following your great work. Thank you very much.
epochs = 40
input_feature = 51
hidden_size = 64
If possible, could you send your training code for the baseline models to my email [email protected]?
Hello, thank you for making the code available; it's very nice work. When I run train_water.py, I am confused by line 66 of GANF.py. In the paper, the log p(x) in equation (10) is a sum, but in the code it becomes mean(). I tried changing mean() to sum() and found that the best roc_test is 0.7875, which matches the 79.6±0.9 in Table 1. But if I don't change it, the result is better: roc_test reaches 0.79 or 0.80. I can't understand why this change was introduced. Thank you!
I think log_prob = log_prob.sum(dim=1) would be more reasonable.
class GANF(nn.Module):
    def __init__(self, n_blocks, input_size, hidden_size, n_hidden, dropout=0.1, model="MAF", batch_norm=True):
        super(GANF, self).__init__()
        self.rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True, dropout=dropout)
        self.gcn = GNN(input_size=hidden_size, hidden_size=hidden_size)
        if model == "MAF":
            self.nf = MAF(n_blocks, input_size, hidden_size, n_hidden, cond_label_size=hidden_size, batch_norm=batch_norm, activation='tanh')
        else:
            self.nf = RealNVP(n_blocks, input_size, hidden_size, n_hidden, cond_label_size=hidden_size, batch_norm=batch_norm)

    def forward(self, x, A):
        return self.test(x, A).mean()

    def test(self, x, A):
        # x: N x K x L x D
        full_shape = x.shape
        # reshape: N*K, L, D
        x = x.reshape((x.shape[0]*x.shape[1], x.shape[2], x.shape[3]))
        h, _ = self.rnn(x)
        # reshape: N, K, L, H
        h = h.reshape((full_shape[0], full_shape[1], h.shape[1], h.shape[2]))
        h = self.gcn(h, A)
        # reshape: N*K*L, H
        h = h.reshape((-1, h.shape[3]))
        x = x.reshape((-1, full_shape[3]))
        log_prob = self.nf.log_prob(x, h).reshape([full_shape[0], -1])#*full_shape[1]*full_shape[2]
        log_prob = log_prob.mean(dim=1)
        return log_prob
Hello, I would like to ask whether the code also works on univariate time series. I would appreciate it if you could answer.
Hi, the dataset used to train GANF in train_water.py is SWaT_Dataset_Attack_v0.csv. When running train_water.py, SWaT_Dataset_Attack_v0.csv is split into train/val/test dataloaders. I can't understand why the model is trained on SWaT_Dataset_Attack_v0.csv. I think it would be more reasonable to train on SWaT_Dataset_Normal_v1.csv, which contains no attacks, and to test on SWaT_Dataset_Attack_v0.csv. This training scheme would make the attacked points more likely to fall in regions of low probability density. Thank you very much!
I am trying to run the baselines DeepSAD and DeepSVDD that you provided; however, I don't know what the parameters delta_t and sigma of the test function refer to, or what values to pass in. Could you clarify this? I hope you can reply soon!
Great work!
I was wondering: after the following line, the model is trained for 30 more epochs. Why is this extra training step introduced?
Line 188 in d207f7e
Thanks in advance!
Hi,
Great work from the authors and thank you for making the code available. I was wondering whether the training code for the baselines could also be shared? I am working on a variant problem for which I would like to try one of the baselines DeepSVDD or DeepSAD. Since the codebase provides a nice framework to build on, I would highly appreciate it if the baseline training code could also be made available.
My email id is [email protected] for further communication.
Thanks a lot.