iancovert / neural-gc
Granger causality discovery for neural networks.
License: MIT License
Hi, thanks for your awesome work! I have a question about the cMLP code: there are two models, cMLP and cMLPSparse. The former is used to estimate GC, and the latter provides the prediction. Why not use a single model for both? Is there any difference between the two models? I would appreciate it if you could answer my question! :)
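One common reason for this kind of split (a sketch of the general pattern, not a statement about this repo's internals): a penalized model shrinks even the truly causal weights toward zero while it selects inputs, so a second, unpenalized model refit on only the selected inputs gives less biased predictions. A minimal 1-D illustration, with all function names hypothetical:

```python
def soft_threshold(z, lam):
    # Proximal operator of lam * |w|.
    return max(z - lam, 0.0) if z >= 0 else min(z + lam, 0.0)

def fit_lasso_1d(x, y, lam):
    # Closed-form 1-D lasso for 0.5 * sum((y - w*x)^2) + lam * |w|.
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return soft_threshold(sxy / sxx, lam / sxx)

def fit_ols_1d(x, y):
    # Unpenalized refit on a selected input (no shrinkage bias).
    return sum(a * b for a, b in zip(x, y)) / sum(v * v for v in x)

# Stage 1: the penalized fit selects the input but shrinks its coefficient.
w_sel = fit_lasso_1d([1, 2, 3], [2, 4, 6], lam=0.7)   # 1.95, not 2.0
# Stage 2: refit without the penalty, using only the selected support.
w_refit = fit_ols_1d([1, 2, 3], [2, 4, 6]) if w_sel != 0 else 0.0  # 2.0
```

The penalized estimate (1.95) underestimates the true coefficient (2.0); the refit recovers it exactly on this noiseless toy data.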
I am trying to use Neural-GC on an 8-variable time series with 5840 time steps per variable. I have tried both cMLP and cLSTM on the dataset. Training runs fine, but the GC_est values are all zero. Can you help me with this? Below is the log at the start and end of training.
Start
----------Iter = 100----------
Loss = 0.713952
Variable usage = 100.00%
----------Iter = 200----------
Loss = 0.697342
Variable usage = 100.00%
----------Iter = 300----------
Loss = 0.684179
Variable usage = 100.00%
End
Loss = 0.025705
Variable usage = 100.00%
----------Iter = 49800----------
Loss = 0.025695
Variable usage = 100.00%
----------Iter = 49900----------
Loss = 0.025685
Variable usage = 100.00%
----------Iter = 50000----------
Loss = 0.025675
Variable usage = 100.00%
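A likely culprit when GC_est comes out all zeros despite a decreasing loss is that lam is too large relative to the data scale, so the proximal step zeroes out every input group; standardizing the series or lowering lam usually restores a nontrivial estimate. For intuition, here is a hypothetical sketch of how such an estimate is typically read off first-layer weight-group norms (names and shapes are assumptions, not this repo's API):

```python
import math

def gc_from_weights(W, threshold=0.0):
    # W[i][j] holds the first-layer weights of the network predicting
    # series i that multiply inputs from series j (flattened over lags
    # and hidden units). Series j is declared a Granger cause of
    # series i when the L2 norm of that weight group exceeds the
    # threshold.
    p = len(W)
    gc = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            norm = math.sqrt(sum(w * w for w in W[i][j]))
            gc[i][j] = 1 if norm > threshold else 0
    return gc

# If the penalty drove every group to exactly zero, GC_est is all zeros.
W = [[[0.0, 0.0], [0.0, 0.0]],
     [[0.3, 0.4], [0.0, 0.0]]]
print(gc_from_weights(W))  # [[0, 0], [1, 0]]
```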
Hi authors,
I ran the linear simulation data with p=30 using cMLP. GC_est contains some NaNs. Could you explain why, and how to fix it?
Hi,
It's a great implementation, but I noticed that the loss function in the code is MSE + ridge + nonsmooth, while in the paper the loss function seems to be MSE + nonsmooth.
What is the ridge term for? I saw the comment "Apply ridge penalty at linear layer" in clstm.py and "Apply ridge penalty at all subsequent layers" in cmlp.py, but I am still not sure why the ridge penalty is needed.
Thanks,
Yunjin
Can you please provide the code you used to plot Figure 5 in the paper?
Hey! Do you have any advice on finding the optimal regularization parameter for the cMLP? For my dataset, I had to increase the lambda and lambda_ridge parameters a lot to push the Granger causal coefficients near zero, which concerned me a bit. Should I evaluate the model on the lowest loss achievable during training, or set up a test set?
Hi,
What is a good way to optimize lambda when the true answer is unknown? I tried using the MSE of a validation set as a guide, but upon testing it leads to many false detections. Is there a better way to optimize lambda?
Thank you,
Suryadi
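One heuristic (not from this repo) that mitigates the under-penalization of a plain validation-MSE minimum: sweep lambda and take the largest value whose held-out error is within a tolerance of the best, in the spirit of the lasso "1-SE" rule. Prediction-optimal lambda tends to keep spurious inputs alive, so erring toward heavier penalization often reduces false detections. A toy 1-D sketch with all names hypothetical:

```python
def soft_threshold(z, lam):
    # Proximal operator of lam * |w|.
    return max(z - lam, 0.0) if z >= 0 else min(z + lam, 0.0)

def fit_lasso_1d(x, y, lam):
    # Closed-form 1-D lasso for 0.5 * sum((y - w*x)^2) + lam * |w|.
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return soft_threshold(sxy / sxx, lam / sxx)

def val_mse(w, x, y):
    return sum((b - w * a) ** 2 for a, b in zip(x, y)) / len(x)

def pick_lambda(lams, x_tr, y_tr, x_va, y_va, tol=0.05):
    # Largest lambda whose validation MSE is within `tol` of the best.
    scores = {lam: val_mse(fit_lasso_1d(x_tr, y_tr, lam), x_va, y_va)
              for lam in lams}
    best = min(scores.values())
    return max(lam for lam, s in scores.items() if s <= best + tol)

lam = pick_lambda([0.0, 0.5, 1.0], [1, 2, 3], [2, 4, 6], [1, 2], [2, 4])
print(lam)  # 1.0 — heavier than the MSE-optimal lam=0.0, by design
```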
Hi, I am using this for nonlinear GC on some brain time series, which look like this:
This is what training looks like with the params: cRNN, context=10, lam=10.0, lam_ridge=1e-2, lr=1e-3, max_iter=20
----------Iter = 50----------
Loss = 151.557373
Variable usage = 99.95%
----------Iter = 100----------
Loss = nan
Variable usage = 57.95%
----------Iter = 150----------
Loss = nan
Variable usage = 50.97%
----------Iter = 200----------
Loss = nan
Variable usage = 42.81%
----------Iter = 250----------
Loss = nan
Variable usage = 36.39%
----------Iter = 300----------
Loss = nan
Variable usage = 38.50%
Stopping early
The estimated GC is also all 1s.
Any intuition would be helpful!
Hi Ian,
Can I modify the RNN to take a 2D series instead of a 1D one; is that feasible?
Or is there some other solution you think is possible, like concatenating the values somehow?
My current solution is just to take the mean across the 2nd dimension so that the series becomes 1D.
Appreciate any help!
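Assuming the model expects each variable to be a scalar per time step, one alternative to averaging (which discards within-variable structure) is to flatten the extra dimension into additional input channels and treat each component as its own series. A hypothetical reshaping sketch:

```python
def flatten_channels(series):
    # series: nested lists of shape [T][p][d] (time, variable, dim).
    # Returns shape [T][p*d]: each of the d components becomes its own
    # scalar channel instead of being averaged away.
    return [[v for var in step for v in var] for step in series]

x = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]          # T=2, p=2, d=2
print(flatten_channels(x))      # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

GC is then estimated between components; if a variable-level graph is wanted, an edge between two variables can be declared whenever any of the corresponding d-by-d block of component edges is present.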
Dear developer,
Thanks for sharing this amazing repository for the Neural Granger Causality project. I am working with a dataset containing 126 replicates of small time-series datasets, one for each of 126 cities, each with around 30 features and 100+ time steps. I am unsure how to apply cMLP or cLSTM to the whole dataset. I am considering training 126 separate models, but that would be too time-consuming. I would appreciate any suggestion or demo for how to apply the model to a challenge similar to the DREAM datasets in the paper. Thank you very much!
Best,
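If the 126 cities can plausibly be assumed to share one underlying causal structure, a single model can be trained on the pooled windows from all replicates instead of 126 separate models. The sketch below (names assumed, not this repo's API) builds one training set of fixed-length contexts and next-step targets from all replicates:

```python
def pool_replicates(series_list, context):
    # series_list: one [T_r][p] array per city; all share p features.
    # Emits every overlapping window of length `context` together with
    # the next time step as the prediction target.
    windows, targets = [], []
    for series in series_list:
        for t in range(len(series) - context):
            windows.append(series[t:t + context])
            targets.append(series[t + context])
    return windows, targets

city_a = [[1], [2], [3], [4]]     # T=4, p=1
city_b = [[10], [20], [30]]       # T=3, p=1
X, y = pool_replicates([city_a, city_b], context=2)
# 2 windows from city_a + 1 from city_b = 3 training examples
```

The pooled (X, y) pairs can then be fed to a single model as one batch dimension; the price is the assumption that one GC graph governs all cities.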
I am wondering why you calculate the gradient only on the smooth loss in the train_model_ista function, even though we are taking a proximal gradient step. Shouldn't we call joint_loss.backward(), where joint_loss = smooth + nonsmooth? And, this might be a little trivial, but can you explain the reasoning for computing mean_loss by dividing the 'smooth + nonsmooth' term by the number of features?