
rajatsen91 / deepglo

166 stars · 47 forks · 5.32 MB

This repository contains code for the paper "Think Globally, Act Locally: A Deep Neural Network Approach to High-Dimensional Time Series Forecasting" (https://arxiv.org/abs/1905.03806), along with scripts to reproduce the results in the paper.

License: Other

Python 98.61% Shell 1.39%

deepglo's Issues

Initializing Factors.....

Hi, I have a question.
If I run `python3.5 run_scripts/run_pems.py --normalize True`, the output stops at:

(228, 12672)
Initializing Factors.....

This goes on for days. Why, and what should I do? Thanks.

how to properly preprocess the raw data?

Hey guys, really impressive work and thanks for sharing the code.

We're trying to use DeepGLO on datasets other than the four used in the paper, and we got stuck at the preprocessing stage. It would be great if you could share any specifications or scripts describing how to properly preprocess the raw data from the public datasets used in the paper.

It seems there are significant differences between the original data and the processed data (e.g. electricity.npy). For example, I downloaded the raw electricity data from https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014 and resampled and filled NAs as follows.

# resample 15-minute readings into hourly totals, then zero-fill gaps
df = raw_data.resample('1H', label='left', closed='right').sum()
df.fillna(0, inplace=True)

The last 10 data points of the first series, "MT_001", in the original dataset look like this:

2014-09-07 14:00:00    63.451777
2014-09-07 15:00:00    60.913706
2014-09-07 16:00:00    58.375635
2014-09-07 17:00:00    62.182741
2014-09-07 18:00:00    77.411168
2014-09-07 19:00:00    36.802030
2014-09-07 20:00:00    13.959391
2014-09-07 21:00:00    46.954315
2014-09-07 22:00:00    65.989848
2014-09-07 23:00:00    65.989848

On the other hand, the last 10 data points of the first series in electricity.npy look like the following. The values are clearly very different from the original time series:

array([3.8071, 3.8071, 5.0761, 6.3452, 6.3452, 7.6142, 7.6142, 7.6142,
       7.6142, 7.6142])

Maybe I've missed something here.
It would be really helpful if you could share how electricity.npy is produced from the raw data.

[question] please tell me how to generate electricity.npy

Thanks for releasing your source code.

I have a question about how the dataset was generated.
When we download the electricity dataset from the link in your paper,
there is a text file (LD2011_2014.txt) with readings every 15 minutes from 2011-01-01 to 2015-01-01.

So, I wonder how this .txt file was converted to the .npy file in your Google Drive.
Please tell me how to generate electricity.npy.

Best Regards.
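The exact pipeline behind electricity.npy is not documented in the repo, so the following is only a minimal sketch of one plausible conversion, assuming the UCI file's format (semicolon separators, comma decimal marks, timestamps in the first column). A tiny inline sample stands in for LD2011_2014.txt:

```python
import io
import numpy as np
import pandas as pd

# Hypothetical sketch -- not the authors' actual preprocessing.
# A small inline sample stands in for LD2011_2014.txt.
sample = (
    ';MT_001;MT_002\n'
    '2011-01-01 00:15:00;1,25;2,50\n'
    '2011-01-01 00:30:00;1,25;2,50\n'
    '2011-01-01 00:45:00;1,25;2,50\n'
    '2011-01-01 01:00:00;1,25;2,50\n'
)
df = pd.read_csv(io.StringIO(sample), sep=';', decimal=',',
                 index_col=0, parse_dates=True)
# aggregate the four 15-minute readings of each hour into one hourly value
hourly = df.resample('1h', label='left', closed='right').sum()
arr = hourly.to_numpy().T   # shape (n_series, n_timesteps), as in the .npy
np.save('electricity.npy', arr)
print(arr.shape)  # (2, 1)
```

Whether the published .npy used `sum()`, `mean()`, or some rescaling of the hourly values is exactly the open question in these issues; the sketch only shows the mechanics of going from the .txt to a series-by-time array.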

Queries on this Model

  1. I used the Python file below
    https://github.com/intel-analytics/analytics-zoo/blob/master/pyzoo/zoo/zouwu/examples/run_electricity.py
    to process traffic.npy, which was taken from this link:
    https://github.com/rajatsen91/deepglo/blob/master/datasets/download-data.sh
    But the .npy files don't have any headers and contain only array data. Also, from the Python file we could not tell which parameters it uses for training. Could you explain this in more detail, or point to any available documentation?

  2. When we executed training for the traffic model from https://github.com/rajatsen91/deepglo/tree/master/
    with `python run_scripts/run_traffic.py --normalize True`, it threw the error "RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx" because no NVIDIA graphics card is available on my system. Is there any other way to train this model?
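The prediction path in DeepGLO.py does take a `cpu` argument, but whether the training scripts can run without CUDA is unclear; they may need their `.cuda()` calls patched. A generic PyTorch device-fallback sketch (a toy model stands in for DeepGLO; this is an assumption, not a repo feature) looks like:

```python
import torch

# Generic PyTorch fallback sketch: pick the GPU when a CUDA driver is
# present, otherwise fall back to the CPU. Patching DeepGLO's .cuda()
# calls to use such a device is one possible workaround (an assumption).
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.nn.Linear(4, 1).to(device)   # toy model standing in for DeepGLO
x = torch.randn(8, 4, device=device)
y = model(x)
print(y.shape)  # torch.Size([8, 1])
```

Training on CPU will be much slower than on a GPU, but it avoids the "Found no NVIDIA driver" error entirely.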

which predictions are the final predictions?

In the paper, the authors state that the final global predictions can be made as F X(te), where X(te) is forecasted with the local model. Are these final global predictions also the DeepGLO predictions, i.e. the predictions of the proposed model?
The confusion is that the code reports both a global WAPE and a plain WAPE; which one is the DeepGLO WAPE?

[question] creating global covariates

Hi, thanks for sharing your code.

I have a question about creating global covariates.

In prediction, the global covariates are calculated as F * TX(X), as stated in the paper.

deepglo/DeepGLO/DeepGLO.py

Lines 619 to 630 in 54e0644

yc = self.predict_global(
    ind=ind,
    last_step=last_step,
    future=future,
    cpu=cpu,
    normalize=False,
    bsize=bsize,
)
if self.period is None:
    ycovs = np.zeros(shape=[yc.shape[0], 1, yc.shape[1]])
if self.forward_cov:
    ycovs[:, 0, 0:-1] = yc[:, 1::]

However, in training, the global covariates seem to be generated by TX from the input sequence directly, instead of from the factorized F and X (i.e. F*X). Is there a reason for this?

https://github.com/rajatsen91/deepglo/blob/54e0644d764f1ead65d4203b72c8634e2f6ea25e/DeepGLO/DeepGLO.py#L510-520

Best Regards.
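To make the distinction asked about here concrete, below is a small NumPy sketch (toy names and sizes, not DeepGLO's actual code): when the factorization Y ≈ F·X is only approximate, a covariate built from the raw input Y differs from one rebuilt from the factors F and X.

```python
import numpy as np

# Toy illustration (not DeepGLO code): with an approximate factorization
# Y ~ F @ X, a covariate taken from the raw input Y is not the same as
# one reconstructed from the factors F and X.
n, k, t = 5, 2, 8                            # series, rank, timesteps
rng = np.random.default_rng(0)
F = rng.normal(size=(n, k))
X = rng.normal(size=(k, t))
Y = F @ X + 0.1 * rng.normal(size=(n, t))    # input with factorization error

cov_from_input = Y          # what training appears to use
cov_from_factors = F @ X    # what prediction uses (F * TX(X) in the paper)
print(np.allclose(cov_from_input, cov_from_factors))  # False
```

The gap between the two is exactly the factorization residual, which is presumably why the question matters in practice.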

Piece of Code

Hey,
Could you tell me what this line does?
self.val_index = np.random.randint(0, n - self.vbsize - 5)
In each batch the indexing fails.
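One plausible explanation (an assumption; the names `n` and `vbsize` are taken from the quoted line): `np.random.randint(0, high)` raises `ValueError` when `high <= 0`, which happens whenever the series length is not comfortably larger than the validation batch size. A minimal sketch with an explicit guard:

```python
import numpy as np

# Sketch of why the quoted line can fail: np.random.randint(0, high)
# raises ValueError when high <= 0, i.e. whenever n <= vbsize + 5.
def pick_val_index(n, vbsize):
    high = n - vbsize - 5
    if high <= 0:
        raise ValueError(f'series too short: n={n}, vbsize={vbsize}')
    return np.random.randint(0, high)

print(pick_val_index(100, 20))  # a random start index in [0, 75)
```

If this is the cause, either shortening `vbsize` or using longer series should make the sampling succeed.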

Factor training losses not contracting

Hi, this is an impressive project, and thanks for sharing it with the community!

I've been trying to train the model on my test data, which has about 70 series, each with about 2300 timesteps.

However, in the final stage, the Recovery Loss in Rolling Validation gets bigger and bigger each round, and early-stops at 0.308, which leads to much worse wape and wape_global metrics than the baseline:
{'wape': 0.39331427, 'mape': 0.36864823, 'smape': 0.4937316, 'mae': 5.487852, 'rmse': 8.775283, 'nrmse': 0.47228432, 'wape_global': 0.582235, 'mape_global': 0.56812644, 'smape_global': 0.84549224, 'mae_global': 8.123833, 'rmse_global': 11.685119, 'nrmse_global': 0.47228432, 'baseline_wape': 0.11834013, 'baseline_mape': 0.11296055, 'baseline_smape': 0.11496856}

Could you provide some insight into how I can improve the training and get better results?

Thanks!!!

===========================================================

Last round of Recovery Loss stats:
GLO: rolling_validation(): Current window wape: 0.5014139
GLO: recover_future_X(): Recovery Loss(0/100000): 1.002367615699768
GLO: recover_future_X(): Recovery Loss(1000/100000): 0.628299355506897
GLO: recover_future_X(): Recovery Loss(2000/100000): 0.4282535910606384
GLO: recover_future_X(): Recovery Loss(3000/100000): 0.3461550176143646
GLO: recover_future_X(): Recovery Loss(4000/100000): 0.3201618790626526
GLO: recover_future_X(): Recovery Loss(5000/100000): 0.310817688703537
GLO: recover_future_X(): Recovery Loss(6000/100000): 0.3080223500728607
GLO: recover_future_X(): Recovery Loss(7000/100000): 0.3077664375305176
GLO: rolling_validation(): Current window wape: 0.45383096

In addition, Factorization Loss F, Factorization Loss X, and Validation Loss ended around (0.214, 0.205, 0.294) before early stopping, while the Temporal Loss hovered around 0.017.

Training of Xseq and Yseq progressed down to (training loss, validation loss) of (0.074, 0.052) and (0.054, 0.021), respectively.
