
convlstm-for-caffe's People

Contributors

agethen


convlstm-for-caffe's Issues

The result "output" is wrong

Hi agethen,
I used your "encode-decode.prototxt" with max_iter: 20000. The training image size is [20,16,16,16,16], and the number of samples is 625. At iteration 20000, the cross_entropy_loss is 9806.
When I check the output, I find the output image is wrong: the input data range is [0,1], but the output data range is [-35.96, 1.91].
Shouldn't the output range also be [0,1]?
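
If the cross_entropy_loss here is Caffe's SigmoidCrossEntropyLoss layer, this behavior is expected: that layer consumes raw logits, so the blob fed into it is unbounded by design, and the sigmoid is only applied internally when computing the loss. At test time, the sigmoid has to be applied explicitly to map the output back into (0, 1). A minimal numpy sketch (variable names are illustrative, not from the repo):

import numpy as np

def sigmoid(x):
  # Logistic function: maps any real value into (0, 1).
  return 1.0 / (1.0 + np.exp(-x))

# raw: unbounded values read back from the net, e.g. the range reported above.
raw = np.array([-35.96, 0.0, 1.91])
print(sigmoid(raw))  # approx. [0.0, 0.5, 0.871] -- all inside (0, 1)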

How to use ConvLSTM after Conv layers?

Hi,

I wonder whether I can use the outputs of several Conv layers (e.g., T Conv layers) as the input to this ConvLSTM layer (here, T can be regarded as the number of time steps). In that case, do I need to prepare other inputs for the ConvLSTM layer? In your example there are lots of inputs, which seems very complex.

Any suggestions would be appreciated. Thanks!
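
For reference, build_hdf5.py in this repo produces data blobs of shape T x N x C x H x W plus T x N sequence markers, so per-frame conv features would need to be arranged the same way. A minimal numpy sketch of the shapes involved (all sizes here are made up):

import numpy as np

# Hypothetical per-frame conv features: T arrays, each N x C x H x W.
T, N, C, H, W = 5, 4, 64, 16, 16
conv_outputs = [np.random.randn(N, C, H, W).astype(np.float32) for _ in range(T)]

# Stack along a new leading time axis -> T x N x C x H x W.
lstm_input = np.stack(conv_outputs, axis=0)
print(lstm_input.shape)  # (5, 4, 64, 16, 16)

# The layer also needs T x N continuation markers ("cont"):
# 0 at the first step of each sequence, 1 afterwards.
cont = np.ones((T, N), dtype=np.float32)
cont[0, :] = 0.0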

How to generate "data/train.txt"

Hi agethen,
I have some difficulty generating the "data/train.txt" used in "encode-decode.prototxt". What does "data/train.txt" contain, and how do I generate it? I already have a lot of MNIST data in .gif format (64*64).
I think train.txt may contain train1.h5, train2.h5, train3.h5. Is that right?
Thank you so much!
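
For what it's worth, Caffe's HDF5Data layer reads its "source" file as a plain-text list of HDF5 file paths, one per line. So once the .h5 files exist, a sketch like the following would generate train.txt (the data/train/ directory is an assumption):

import glob

# Write one .h5 path per line, which is what the HDF5Data layer's
# "source" parameter expects.
with open("data/train.txt", "w") as f:
  for path in sorted(glob.glob("data/train/*.h5")):
    f.write(path + "\n")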

Incorrect gradient backpropagation?

A user reported that gradients to underlying layers appear to be all zero. Two possible causes:

  • Are the convolutional layers inside the ConvLSTM sharing their gradients correctly (i.e., is backpropagation enabled for them)?
  • Is there a general bug in recurrent_layer.cpp (which shares data and gradients with the outside network)?

[screenshot attached, 2016-10-26]
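
One way to check this from pycaffe is to run a single forward/backward pass and inspect the blob diffs below the ConvLSTM; if backpropagation works, they should not all be zero. A sketch (the prototxt path is a placeholder):

import numpy as np
import caffe

net = caffe.Net("train_val.prototxt", caffe.TRAIN)  # placeholder path
net.forward()
net.backward()

# Nonzero diffs below the ConvLSTM indicate the gradient is flowing.
for name, blob in net.blobs.items():
  print(name, "max |diff| =", np.abs(blob.diff).max())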

Gradient backpropagation in Encoder-Decoder model

Due to limitations in Jeff Donahue's recurrent_layer.cpp code, we cannot backpropagate a gradient through the blobs created by expose_hidden.

Possible solution:

  • Add h_T and c_T as actual output blobs of the ConvLSTM layer

Some questions about sequence markers

Thank you for sharing your work. I have some questions about the sequence markers.

According to build_hdf5.py, "sequence" should have the shape T*N, where T = 20 and N = 16. But the prototxt says "batch_size: 19 # T + (T-1)", so why is it not 20?

After the "slice_seq" layer, "seq_enc" has the shape 10*16, where the first row is 0 and the others are 1, and "seq_dec" is all 1, right? So what is the shape of "seq_dec"?

When testing, what should the sequence markers be?

Any suggestions would be appreciated. Thanks!
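
Based on the description above, the markers can be built directly. A numpy sketch, under the assumption that the 19 rows split into 10 encoder steps and 9 decoder steps (which would explain batch_size: 19 = T + (T-1) with T = 10):

import numpy as np

T_enc, T_dec, N = 10, 9, 16  # assumed split: 10 + 9 = batch_size 19

# Encoder markers: 0 at the first time step (start of sequence), 1 afterwards.
seq_enc = np.ones((T_enc, N), dtype=np.float32)
seq_enc[0, :] = 0.0

# Decoder markers: all 1, since the decoder continues the same sequences.
seq_dec = np.ones((T_dec, N), dtype=np.float32)

print(seq_enc.shape, seq_dec.shape)  # (10, 16) (9, 16)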

Error in blob.cpp (Syncedmem not initialized)

Hello,

I was able to successfully build the latest version of Caffe with the ConvLSTM layer: I copied the src and include files, modified the caffe.proto file, and made the required changes to the Makefile. I am using CUDA 8.0.

When I run my model (sort of an AlexNet version of the model described in this paper), the test phase runs okay, and so does the forward pass of the first training iteration. During the backward pass I get the following error:

I0915 18:07:56.650460 1749 solver.cpp:397] Test net output #0: accuracy = 0.124847
I0915 18:07:56.650481 1749 solver.cpp:397] Test net output #1: loss = 1.9918 (* 1 = 1.9918 loss)
I0915 18:08:19.280226 1749 solver.cpp:218] Iteration 0 (0 iter/s, 88.3729s/10 iters), loss = 0.555055
I0915 18:08:19.280261 1749 solver.cpp:237] Train net output #0: loss = 0 (* 1 = 0 loss)
I0915 18:08:19.280269 1749 sgd_solver.cpp:105] Iteration 0, lr = 0.0001
F0915 18:08:19.288806 1749 blob.cpp:195] Syncedmem not initialized.
*** Check failure stack trace: ***
@ 0x7fafacbbc5cd google::LogMessage::Fail()
@ 0x7fafacbbe433 google::LogMessage::SendToLog()
@ 0x7fafacbbc15b google::LogMessage::Flush()
@ 0x7fafacbbee1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fafad38212b caffe::Blob<>::Update()
@ 0x7fafad1e7d95 caffe::Net<>::Update()
@ 0x7fafad33feee caffe::SGDSolver<>::ApplyUpdate()
@ 0x7fafad395ed6 caffe::Solver<>::Step()
@ 0x7fafad39694a caffe::Solver<>::Solve()
@ 0x40abf9 train()
@ 0x40743e main
@ 0x7fafac1cc830 __libc_start_main
@ 0x407c99 _start
@ (nil) (unknown)

Any idea on what exactly is causing the problem?

Computer Crashes after 5 Hours of Training ConvLSTM Network

I am training a network with convolutional LSTM cells. After about 5 hours of training, my computer crashes. This has happened twice. The first time, the error said that GPU ID 0, the GPU I was training on, could not be found. The second time, the error was 'packet_write_wait: broken pipe'. Does this have to do with a bug in the convolutional LSTM source code?

How to take 2 timesteps as input

Hi agethen,
Thanks for your work! I am trying to re-implement Polygon-RNN, a net that takes the two preceding frames and the two preceding states as input. The RNN part looks like the figure below.

[figure: the RNN part of Polygon-RNN]

I have no idea how to write the prototxt for this. Could you please give some advice? Any suggestion would be appreciated. Thanks a lot!
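
Without knowing this repo's preferred prototxt pattern, one generic way to present two frames per time step is to concatenate consecutive frames along the channel axis before feeding the ConvLSTM. A numpy sketch of the shapes (all sizes made up):

import numpy as np

# T+1 frames, each N x C x H x W.
T, N, C, H, W = 8, 4, 3, 28, 28
frames = np.random.randn(T + 1, N, C, H, W).astype(np.float32)

# Pair frame t-1 with frame t along the channel axis -> T x N x 2C x H x W.
pairs = np.concatenate([frames[:-1], frames[1:]], axis=2)
print(pairs.shape)  # (8, 4, 6, 28, 28)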

Still having problems with preparing the dataset

I followed #4 to prepare the Moving MNIST dataset.
This is my code to generate the .h5 files for training.

import numpy
import h5py

# Load data
file = numpy.load( "moving-mnist-train.npz" )
# file = numpy.load( "moving-mnist-test.npz" )

# Select data field
data = file['input_raw_data']

# As you can see, the shape is (N*T) x C x H x W, so we need to change that
print( data.shape )

# Reshape to N x T x C x H x W (integer division keeps this Python-3 safe)
tmp = numpy.reshape( data, (data.shape[0] // 20, 20, 1, 64, 64) )

# Swap T and N, giving T x N x C x H x W
res = numpy.swapaxes( tmp, 0, 1 )

i = 0
for idx in range( 0, res.shape[1], 16 ):
  print( "File", i )
  # First 10 time steps of the current 16 samples (encoder input)
  datum_fir = res[0:10, idx:idx+16]

  # Open a file handle
  h5file = h5py.File( "train/file_" + str( i ).zfill(5) + ".h5", 'w' )
  # h5file = h5py.File( "test/file_" + str( i ).zfill(5) + ".h5", 'w' )

  # Create a dataset with name "input"
  input_data = h5file.create_dataset( "input", shape = datum_fir.shape, dtype = numpy.float32 )
  # Copy data
  input_data[:] = datum_fir

  # Last 10 time steps of the SAME 16 samples (decoder target).
  # Note: the original slice here was res[10:20, idx+16:idx+32], which
  # pairs the input with frames from different samples.
  datum_sec = res[10:20, idx:idx+16]
  # Create a dataset with name "match"
  match_data = h5file.create_dataset( "match", shape = datum_sec.shape, dtype = numpy.float32 )
  # Copy data
  match_data[:] = datum_sec

  i += 1
  # Close file
  h5file.close()

print( "Done!" )

Assuming T=20 (10 steps for encoding, 10 for decoding) and N=16 (16 independent samples), the shape of input and match should each be 10 x 16 x 1 x 64 x 64, right?

This is part of the error output:

I1213 00:08:17.193084 30426 net.cpp:217] h_t=1_encode1_unit_t=1_1_split needs backward computation.
I1213 00:08:17.193086 30426 net.cpp:217] c_t=1_encode1_unit_t=1_0_split needs backward computation.
I1213 00:08:17.193089 30426 net.cpp:217] encode1_unit_t=1 needs backward computation.
I1213 00:08:17.193091 30426 net.cpp:217] encode1_gate_input_1 needs backward computation.
I1213 00:08:17.193094 30426 net.cpp:217] encode1_concat_hadamard_t=1 needs backward computation.
I1213 00:08:17.193099 30426 net.cpp:219] encode1_hadamard_gat_t=1 does not need backward computation.
I1213 00:08:17.193104 30426 net.cpp:217] encode1_hadamard->output_t=0 needs backward computation.
I1213 00:08:17.193106 30426 net.cpp:217] encode1_hadamard->forget_t=0 needs backward computation.
I1213 00:08:17.193109 30426 net.cpp:217] encode1_hadamard->input_t=0 needs backward computation.
I1213 00:08:17.193111 30426 net.cpp:217] encode1_hidden->transform->0 needs backward computation.
I1213 00:08:17.193114 30426 net.cpp:217] encode1_h_conted_t=0 needs backward computation.
I1213 00:08:17.193116 30426 net.cpp:217] encode1_dummy_forward_h0 needs backward computation.
I1213 00:08:17.193120 30426 net.cpp:217] c_t=0_encode1_dummy_forward_c0_0_split needs backward computation.
I1213 00:08:17.193121 30426 net.cpp:217] encode1_dummy_forward_c0 needs backward computation.
I1213 00:08:17.193125 30426 net.cpp:219] cont_t=10_encode1_cont_slice_9_split does not need backward computation.
I1213 00:08:17.193127 30426 net.cpp:219] cont_t=9_encode1_cont_slice_8_split does not need backward computation.
I1213 00:08:17.193130 30426 net.cpp:219] cont_t=8_encode1_cont_slice_7_split does not need backward computation.
I1213 00:08:17.193132 30426 net.cpp:219] cont_t=7_encode1_cont_slice_6_split does not need backward computation.
I1213 00:08:17.193135 30426 net.cpp:219] cont_t=6_encode1_cont_slice_5_split does not need backward computation.
I1213 00:08:17.193137 30426 net.cpp:219] cont_t=5_encode1_cont_slice_4_split does not need backward computation.
I1213 00:08:17.193140 30426 net.cpp:219] cont_t=4_encode1_cont_slice_3_split does not need backward computation.
I1213 00:08:17.193143 30426 net.cpp:219] cont_t=3_encode1_cont_slice_2_split does not need backward computation.
I1213 00:08:17.193145 30426 net.cpp:219] cont_t=2_encode1_cont_slice_1_split does not need backward computation.
I1213 00:08:17.193148 30426 net.cpp:219] cont_t=1_encode1_cont_slice_0_split does not need backward computation.
I1213 00:08:17.193152 30426 net.cpp:219] encode1_cont_slice does not need backward computation.
I1213 00:08:17.193155 30426 net.cpp:217] encode1_W_xc_x_slice needs backward computation.
I1213 00:08:17.193157 30426 net.cpp:219] encode1_input->cell_hidden does not need backward computation.
I1213 00:08:17.193159 30426 net.cpp:217] encode1_x->transform needs backward computation.
I1213 00:08:17.193161 30426 net.cpp:219] encode1_ does not need backward computation.
I1213 00:08:17.193163 30426 net.cpp:261] This network produces output c_t=T_pseudoloss
I1213 00:08:17.193167 30426 net.cpp:261] This network produces output h_pseudoloss
I1213 00:08:17.193171 30426 net.cpp:261] This network produces output h_t=T_pseudoloss
I1213 00:08:17.214619 30426 net.cpp:274] Network initialization done.
I1213 00:08:17.214865 30426 recurrent_layer.cpp:150] Adding parameter 0: x_transform
I1213 00:08:17.214870 30426 recurrent_layer.cpp:150] Adding parameter 1: 0
I1213 00:08:17.214872 30426 recurrent_layer.cpp:150] Adding parameter 2: 0
I1213 00:08:17.214874 30426 recurrent_layer.cpp:150] Adding parameter 3: h->transform
I1213 00:08:17.214876 30426 recurrent_layer.cpp:150] Adding parameter 4: h->transform_bias
I1213 00:08:17.214879 30426 recurrent_layer.cpp:150] Adding parameter 5: hadamard.input
I1213 00:08:17.214879 30426 recurrent_layer.cpp:150] Adding parameter 6: hadamard.forget
I1213 00:08:17.214881 30426 recurrent_layer.cpp:150] Adding parameter 7: hadamard.output
I1213 00:08:17.214884 30426 recurrent_layer.cpp:150] Adding parameter 53: 0
I1213 00:08:17.214885 30426 recurrent_layer.cpp:150] Adding parameter 54: 0
F1213 00:08:17.326793 30426 recurrent_layer.cpp:216] Check failed: recur_input_blobs_[j]->shape() == bottom[i]->shape() bottom[2] shape must match hidden state input shape: 1 16 256 64 64 (16777216)
*** Check failure stack trace: ***
    @     0x7fc7555375cd  google::LogMessage::Fail()
    @     0x7fc755539433  google::LogMessage::SendToLog()
    @     0x7fc75553715b  google::LogMessage::Flush()
    @     0x7fc755539e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fc755ba71d0  caffe::RecurrentLayer<>::Reshape()
    @     0x7fc755c7665c  caffe::ConvLSTMLayer<>::Reshape()
    @     0x7fc755ced11e  caffe::Net<>::Init()
    @     0x7fc755ceec41  caffe::Net<>::Net()
    @     0x7fc755caed4a  caffe::Solver<>::InitTrainNet()
    @     0x7fc755caf2b7  caffe::Solver<>::Init()
    @     0x7fc755caf64a  caffe::Solver<>::Solver()
    @     0x7fc755ccbb29  caffe::Creator_RMSPropSolver<>()
    @           0x40ad89  train()
    @           0x407704  main
    @     0x7fc753feb830  __libc_start_main
    @           0x407eb9  _start
    @              (nil)  (unknown)
Aborted (core dumped)

The output result

Hi agethen,
I used your model to train on the Moving MNIST dataset. After training, I tested the model and plotted the 2D images from the blob "out_sigm", but the results are bad.
Should I add more ConvLSTM layers (in Shi Xingjian's paper, three ConvLSTM layers are used)? Or is there something I did wrong? Could you give me some advice? I would really appreciate it! Thank you very much!
[three result images attached]
