sjvasquez / web-traffic-forecasting

660.0 30.0 239.0 920 KB

Kaggle | Web Traffic Forecasting 📈

Python 100.00%

time-series forecasting convolutional-neural-networks tensorflow


web-traffic-forecasting's Issues

Decode Features

In the decode features, why are we passing the one hot encoded values of the categorical variables?

        self.decode_features = tf.concat([
            tf.one_hot(decode_idx, self.num_decode_steps),
            tf.tile(tf.reshape(self.log_x_encode_mean, (-1, 1, 1)), (1, self.num_decode_steps, 1)),
            tf.tile(tf.expand_dims(tf.one_hot(self.project, 9), 1), (1, self.num_decode_steps, 1)),
            tf.tile(tf.expand_dims(tf.one_hot(self.access, 3), 1), (1, self.num_decode_steps, 1)),
            tf.tile(tf.expand_dims(tf.one_hot(self.agent, 2), 1), (1, self.num_decode_steps, 1)),
        ], axis=2)
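As a rough illustration of what the concatenation above produces, the sketch below rebuilds one series' decode features in plain Python. The helper names `one_hot` and `decode_feature_row` are hypothetical, and it assumes `decode_idx` simply enumerates the steps 0 to `num_decode_steps - 1`, which the snippet does not show. It only demonstrates the layout: a per-step position indicator followed by series-level values that are repeated (tiled) at every step.

```python
def one_hot(index, depth):
    """Return a list of `depth` floats with a single 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def decode_feature_row(step, num_decode_steps, log_mean, project, access, agent):
    """Feature vector for one series at one decode step (hypothetical helper)."""
    return (
        one_hot(step, num_decode_steps)  # which step of the horizon this is
        + [log_mean]                     # series-level scalar, same at every step
        + one_hot(project, 9)            # categorical ids, same at every step
        + one_hot(access, 3)
        + one_hot(agent, 2)
    )


num_decode_steps = 64
rows = [
    decode_feature_row(t, num_decode_steps, 3.2, project=4, access=1, agent=0)
    for t in range(num_decode_steps)
]
feature_width = len(rows[0])
print(feature_width)  # 64 + 1 + 9 + 3 + 2 = 79
```

Only the first `num_decode_steps` entries change from row to row; the remaining 15 entries are identical across all steps of a given series.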

Two errors occurred while running cnn.py

For Anaconda Python 3.6:

1.
File "D:\Anaconda\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 32, in __init__
self._value = int(value)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Tensor'

2.
File "D:\Anaconda\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got 1.0 of type 'float' instead.

What is the hierarchy of the codes/files in this repo?

Hi,
Can anybody help me figure out the order in which to run the code in this repo? I cannot work out the hierarchy of the files, so I don't know how to run them step by step to reproduce the results.
Thanks

Data folder is empty

The data folder does not contain the train and test datasets or a processed folder, and the training data on Kaggle comes as train_1 and train_2. How can we use these?

sequence smape loss function

zero_loss = 2.0*tf.ones_like(smape)
nonzero_loss = smape
smape = tf.where(tf.logical_or(tf.equal(y, 0.0), tf.equal(y_hat, 0.0)), zero_loss, nonzero_loss)

The condition uses 'or'. What if y != 0.0 and y_hat == 0.0? The sequence sMAPE will still take the value of zero_loss.

It should be an 'and' condition.
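To make the cases concrete, here is a small plain-Python sketch that evaluates the per-element loss with both conditions. It assumes that `smape` is computed as 2·|y_hat − y| / (|y| + |y_hat|) before the masking step, which the snippet above does not show. The function names are hypothetical.

```python
import math


def smape_term(y, y_hat):
    """Unmasked per-element sMAPE under the assumed formula."""
    denom = abs(y) + abs(y_hat)
    if denom == 0.0:
        return float("nan")  # 0/0, as float division would give in a tensor op
    return 2.0 * abs(y_hat - y) / denom


def masked_smape(y, y_hat, use_and):
    """Apply the masking step with either the 'or' or the 'and' condition."""
    y_zero, y_hat_zero = (y == 0.0), (y_hat == 0.0)
    hit = (y_zero and y_hat_zero) if use_and else (y_zero or y_hat_zero)
    return 2.0 if hit else smape_term(y, y_hat)


cases = [(0.0, 0.0), (5.0, 0.0), (0.0, 3.0), (5.0, 3.0)]
for y, y_hat in cases:
    print(y, y_hat, smape_term(y, y_hat),
          masked_smape(y, y_hat, use_and=False),
          masked_smape(y, y_hat, use_and=True))
```

Under this assumed formula the unmasked term is already 2.0 whenever exactly one of the two values is zero, so the two conditions return the same numbers in all four cases. Only the 0/0 case is actually changed by the mask.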

Code not running -Tensorflow gather_nd bounds problem

I was trying to get this code running on my local system and am facing this error:

Traceback (most recent call last):
File "cnn.py", line 414, in <module>
nn.fit()
File "/Users/srikanthjammy/Documents/midterm/tf_base_model.py", line 142, in fit
feed_dict=val_feed_dict
File "/Users/srikanthjammy/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/Users/srikanthjammy/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/Users/srikanthjammy/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/Users/srikanthjammy/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: flat indices[15493, :] = [121, -1] does not index into param (shape: [128,486,32]).
[[Node: GatherNd_23 = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](add_24, stack_23)]]

Caused by op u'GatherNd_23', defined at:
File "cnn.py", line 412, in <module>
num_decode_steps=64,
File "cnn.py", line 121, in __init__
super(cnn, self).__init__(**kwargs)
File "/Users/srikanthjammy/Documents/midterm/tf_base_model.py", line 99, in __init__
self.graph = self.build_graph()
File "/Users/srikanthjammy/Documents/midterm/tf_base_model.py", line 344, in build_graph
self.loss = self.calculate_loss()
File "cnn.py", line 366, in calculate_loss
y_hat_decode = self.decode(y_hat_encode, conv_inputs, features=self.decode_features)
File "cnn.py", line 265, in decode
slices = tf.reshape(tf.gather_nd(conv_input, idx), (batch_size, dilation, shape(conv_input, 2)))
File "/Users/srikanthjammy/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1338, in gather_nd
name=name)
File "/Users/srikanthjammy/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)

From searching online, it looks like a TensorFlow bug: tensorflow/tensorflow#12608.
Did you face this issue?
I'm running on CPU, by the way.
The versions of numpy, pandas, scikit and tensorflow are the ones you mentioned.

line 342 in cnn.py

(next_finished, emit_output, state_queues) = loop_fn(time, initial_input, state_queues)
This line calls loop_fn with initial_input, so I think the input is never updated across loop iterations. Can you explain this?

cnn.py line 260: queue_begin_time = self.encode_len - dilation - 1

I think this line should be self.encode_len - dilation. For example, with the sequence [0,1,2,3,4,5,6,7,8,9] and dilation=4, the start index is 10-4=6, so
slices = tf.reshape(tf.gather_nd(conv_input, idx), (batch_size, dilation, shape(conv_input, 2)))

should cover [6,7,8,9] (the last dilation steps of the sequence). Otherwise you will lose the last day's value.
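The index arithmetic in this example can be checked with a few lines of plain Python. This is a sketch that assumes the gathered time indices are `queue_begin_time + range(dilation)`. The construction of `idx` is not quoted in this issue, and `queue_indices` is a hypothetical helper.

```python
def queue_indices(encode_len, dilation, offset):
    """Time indices gathered for one queue, assuming begin + range(dilation)."""
    begin = encode_len - dilation - offset
    return [begin + i for i in range(dilation)]


seq = list(range(10))

# Current code: encode_len - dilation - 1
with_minus_one = queue_indices(len(seq), dilation=4, offset=1)
# Proposed change: encode_len - dilation
without_minus_one = queue_indices(len(seq), dilation=4, offset=0)

print(with_minus_one)     # [5, 6, 7, 8]
print(without_minus_one)  # [6, 7, 8, 9]
```

Which window is intended depends on whether the last encoded step reaches the decoder by another route, for example as its initial input. That is not shown in this thread.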

How do separate parameters handle the accumulating noise?

WaveNet was trained using next step prediction, so errors can accumulate as the model generates long sequences in the absence of conditioning information. To remedy this, we trained the model to minimize the loss when unraveled for 64 steps. We adopt a sequence to sequence approach where the encoder and decoder do not share parameters. This allows the decoder to handle the accumulating noise when generating long sequences.

The passage above says that with separate parameters the accumulating noise is not a big issue. But the encoder still accumulates noise and passes it on to the decoder. I may be missing something in the overall picture. Can you please explain it in more detail?

padding seems wrong

In the function temporal_convolution_layer:
shift = (kernel_size // 2) + (int(dilation_rate - 1) // 2)

In Keras and some other implementations, the equation is:
shift = dilation_rate * (kernel_size - 1)

If the formula here is wrong, the model may be using future information.
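The two formulas can be compared numerically, and the causality property can be checked on a toy convolution. The sketch below is plain Python with hypothetical function names. `causal_conv_valid` implements only the textbook case in which the input is left-padded by `dilation_rate * (kernel_size - 1)` zeros and the kernel is then applied with no further padding. Whether the repo's formula leaks future values depends on the padding mode of the convolution call that follows the pad, which is not quoted in this issue.

```python
def shift_repo(kernel_size, dilation_rate):
    """Shift as written in temporal_convolution_layer (quoted above)."""
    return (kernel_size // 2) + ((dilation_rate - 1) // 2)


def shift_keras(kernel_size, dilation_rate):
    """Left padding used for causal convolution with no other padding."""
    return dilation_rate * (kernel_size - 1)


def causal_conv_valid(x, w, dilation_rate):
    """Left-pad with zeros, then apply a dilated kernel without extra padding."""
    k = len(w)
    pad = shift_keras(k, dilation_rate)
    xp = [0.0] * pad + list(x)
    return [
        sum(w[j] * xp[t + j * dilation_rate] for j in range(k))
        for t in range(len(x))
    ]


for d in (1, 2, 4, 8):
    print(d, shift_repo(2, d), shift_keras(2, d))

# Causality check: changing inputs from step 4 onward must not
# change any output before step 4.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_changed = x[:4] + [50.0, 60.0]
y = causal_conv_valid(x, [0.5, 2.0], dilation_rate=2)
y_changed = causal_conv_valid(x_changed, [0.5, 2.0], dilation_rate=2)
print(y[:4] == y_changed[:4])
```

For kernel_size=2 the two formulas agree at dilation 1 and differ by a factor of two at dilations 2, 4 and 8, so the padding mode of the actual convolution is the thing to check.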

Should shift be increased by 1?

    if causal:
        shift = int((convolution_width / 2) + (int(dilation_rate[0] - 1) / 2))
        pad = tf.zeros([tf.shape(inputs)[0], shift, inputs.shape.as_list()[2]])
        inputs = tf.concat([pad, inputs], axis=1)

Perhaps shift should be increased by 1.

Always uses initial_input for loop_fn

Hi,

Thanks so much for sharing your excellent work. I am confused by the decode part, though:

web-traffic-forecasting/cnn.py

Lines 342 to 349 in 6cb4a91

    def body(time, elements_finished, emit_ta, *state_queues):
        (next_finished, emit_output, state_queues) = loop_fn(time, initial_input, state_queues)

        emit = tf.where(elements_finished, tf.zeros_like(emit_output), emit_output)
        emit_ta = emit_ta.write(time, emit)

        elements_finished = tf.logical_or(elements_finished, next_finished)
        return [time + 1, elements_finished, emit_ta] + list(state_queues)

In line 343, loop_fn always receives initial_input as its current_input parameter.

I wonder why we don't pass the previous prediction to loop_fn instead. Something like:

    def body(time, elements_finished, emit_ta, *state_queues):
        # tf.cond expects callables for both branches
        current_input = tf.cond(tf.equal(time, 0),
                                lambda: initial_input,
                                lambda: emit_ta.read(time - 1))
        (next_finished, emit_output, state_queues) = loop_fn(time, current_input, state_queues)
        ...
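The difference between the two loop bodies can be seen in a toy version that replaces the network with a simple function. This is a plain-Python sketch. `toy_step`, `decode_constant` and `decode_feedback` are hypothetical stand-ins, and the toy ignores `state_queues`, which in the quoted code do change between iterations and therefore also influence each output.

```python
def toy_step(current_input):
    """Stand-in for one decoder step: maps the current input to a prediction."""
    return 0.5 * current_input + 1.0


def decode_constant(initial_input, num_steps):
    """Every step sees the same initial input, as in the quoted body()."""
    return [toy_step(initial_input) for _ in range(num_steps)]


def decode_feedback(initial_input, num_steps):
    """Every step sees the previous step's prediction, as proposed above."""
    outputs = []
    current = initial_input
    for _ in range(num_steps):
        current = toy_step(current)
        outputs.append(current)
    return outputs


print(decode_constant(0.0, 4))  # [1.0, 1.0, 1.0, 1.0]
print(decode_feedback(0.0, 4))  # [1.0, 1.5, 1.75, 1.875]
```

With a stateless step function the constant-input loop repeats a single value. In the real model any variation across steps would have to come from the state queues and the per-step features.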

During training, train loss and validation loss became nan, does this matter?

When I ran cnn.py, the train loss and validation loss became nan after step 50. Is this normal? I wonder why the losses stay at nan.

    def body(time, elements_finished, emit_ta, *state_queues):
        (next_finished, emit_output, state_queues) = loop_fn(time, initial_input, state_queues)

        emit = tf.where(elements_finished, tf.zeros_like(emit_output), emit_output)
        emit_ta = emit_ta.write(time, emit)

        elements_finished = tf.logical_or(elements_finished, next_finished)
        return [time + 1, elements_finished, emit_ta] + list(state_queues)
