Comments (6)
- You can think about pixels in MNIST as probabilities, so cross-entropy loss measures the distance between the predicted and ground-truth probability distributions. You could try using MSE for MNIST; not sure how well it would work :)
- You are right. This paper http://arxiv.org/abs/1506.03099 addresses what you are saying.
- 10 million steps is an arbitrary number. I don't remember the exact number of steps we used, but it converged quickly.
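To illustrate the pixels-as-probabilities view from the first comment, here is a minimal numpy sketch comparing binary cross-entropy and MSE on a toy frame. The `bce`/`mse` helpers and the toy arrays are illustrative, not part of the repo:

```python
import numpy as np

def bce(target, pred, eps=1e-7):
    """Per-pixel binary cross-entropy: treats each pixel as a Bernoulli probability."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def mse(target, pred):
    """Mean squared error over the same pixels."""
    return np.mean((target - pred) ** 2)

# Toy "frame": MNIST-like pixel intensities in [0, 1].
target = np.array([0.0, 1.0, 1.0, 0.0])
good   = np.array([0.1, 0.9, 0.8, 0.2])  # close to the target
bad    = np.array([0.5, 0.5, 0.5, 0.5])  # maximally uncertain

# Both losses rank the confident prediction better than the uncertain one.
assert bce(target, good) < bce(target, bad)
assert mse(target, good) < mse(target, bad)
```

Both losses agree on the ranking here; the difference is in the gradients — cross-entropy penalizes confident wrong pixels much more sharply, which is why it pairs naturally with sigmoid outputs on binary MNIST frames.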
from unsupervised-videos.
Regarding 2:
Right about what? That you train directly on previously predicted frames rather than ground-truth frames?
Because looking at your code, you are using ground-truth frames during training. See lines 79-89 in lstm_combo.py:
# Fprop through future predictor.
for t in xrange(self.future_seq_length_):
  this_init_state = init_state if t == 0 else []
  if self.is_conditional_fut_ and t > 0:
    if train:
      t2 = self.enc_seq_length_ + t - 1
      input_frame = self.v_.col_slice(t2 * self.num_dims_, (t2+1) * self.num_dims_)
    else:
      # Instead of conditioning on true frame, condition on the generated frame at the test time
      t2 = t - 1
      input_frame = self.v_fut_.col_slice(t2 * self.num_dims_, (t2+1) * self.num_dims_)
      if self.binary_data_:
        input_frame.apply_sigmoid()
      elif self.relu_data_:
        input_frame.lower_bound(0)
  else:
    input_frame = None
  self.lstm_stack_fut_.Fprop(input_frame=input_frame, init_state=this_init_state,
                             output_frame=self.v_fut_.col_slice(t * self.num_dims_, (t+1) * self.num_dims_),
                             copy_init_state=self.future_copy_init_state_)
In the paper, you are writing on page 6:
Next, we change the future predictor by making it conditional. We can see that this model makes sharper predictions.
But there is no hint whether it conditions on ground-truth frames or on previously predicted frames.
EDIT
I implemented a network similar to yours in TensorFlow to predict future frames (without the reconstruction branch, and consequently no combo model; additionally, I'm using LSTMConv2D cells without peephole connections and squared error as the loss function). I'm getting roughly the same results: when I condition on the ground-truth frame during training, the model seems to learn no motion at all. But it works quite well when I condition on the previously predicted frame during training.
Check out these two videos:
videos.tar.gz
My personal guess is that when we train on ground-truth frames, the network only ever sees sharp edges, because all images in MovingMNIST have high contrast and sharp edges. When we validate/test this model, the first predicted image looks very good and is only a little blurry. But from there on, the future predictor receives blurry input images it has never seen before, so it cannot predict these frames correctly.
In contrast, when we feed the model its own previously predicted frames during training as well, it also learns how to handle and predict from blurry input images.
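The compounding-error effect described here can be sketched in a few lines. `predict_sequence`, `step`, and `frames` are hypothetical stand-ins for a one-step predictor, not the repo's API; the "model" just blurs its input slightly to mimic a slightly blurry network output:

```python
import numpy as np

def predict_sequence(step, frames, horizon, teacher_force):
    """Roll a one-step predictor step(prev_frame) -> next_frame over `horizon` steps.

    teacher_force=True  : condition on the ground-truth previous frame
                          (as in the train branch of the excerpt above).
    teacher_force=False : condition on the model's own previous prediction
                          (closed loop, as at test time).
    """
    preds = []
    prev = frames[0]
    for t in range(horizon):
        pred = step(prev)
        preds.append(pred)
        prev = frames[t + 1] if teacher_force else pred
    return preds

# Toy predictor that blurs (attenuates) its input a little each step.
blur = lambda x: 0.9 * x
frames = [np.ones(4) for _ in range(6)]  # sharp ground-truth frames

open_loop   = predict_sequence(blur, frames, 5, teacher_force=True)
closed_loop = predict_sequence(blur, frames, 5, teacher_force=False)
# Under teacher forcing every input is sharp, so the error never compounds;
# in the closed loop the blur accumulates step by step (0.9, 0.81, 0.729, ...).
```

Under teacher forcing the last prediction is still 0.9 of the sharp input, while the closed-loop rollout has decayed to 0.9^5 — which is exactly the mismatch between training inputs and test-time inputs described above.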
What do you think about that?
But there is no hint if it conditions on ground truth frames, or previously predicted frames.
As far as I remember, we conditioned on ground-truth frames. Yes, the difference between the distributions of ground-truth and predicted frames is causing this issue. I also suggest conditioning on ground truth at the beginning of training and then slowly switching to previously predicted frames, as in http://arxiv.org/abs/1506.03099
Btw, how far into the future are you predicting? It looks like more than 10 frames.
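As a rough sketch of that schedule, assuming the inverse-sigmoid decay from the scheduled-sampling paper (the function names and the constant `k` are illustrative choices, not from either codebase):

```python
import math
import random

def scheduled_sampling_prob(step, k=1000.0):
    """Probability of feeding the ground-truth frame at a given training step.

    Inverse-sigmoid decay as in Bengio et al. (arXiv:1506.03099);
    k controls how quickly training shifts from ground truth to model predictions.
    """
    return k / (k + math.exp(step / k))

def pick_input(step, true_frame, pred_frame):
    """Coin-flip between the ground-truth frame and the model's own previous prediction."""
    if random.random() < scheduled_sampling_prob(step):
        return true_frame
    return pred_frame

# Early in training we almost always condition on ground truth...
assert scheduled_sampling_prob(0) > 0.99
# ...and late in training almost always on the model's own predictions.
assert scheduled_sampling_prob(20000) < 0.01
```

The idea is that the model gets stable, sharp targets while it is still weak, and is gradually exposed to its own (blurrier) outputs so the train and test input distributions match by the end.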
Btw, how far into the future are you predicting? It looks like more than 10 frames.
During training, I predicted 10 frames using single-layer LSTMConv2D cells. I trained for 50k iterations, and the batch size was, I guess, 24 on each of the 4 Titan X GPUs, so an effective batch size of 96.
After the model had more or less converged, I created this video on the test set and extended the future predictor to 50 frames, just to see how it behaves beyond the range it was trained on.
I think you are doing the same on your (old) website, predicting very far into the future (100 frames?):
http://www.cs.toronto.edu/~nitish/unsupervised_video/ (The gif in the top)
I'm just trying to get roughly the same results as yours, but I cannot reproduce them with my own model in TensorFlow, nor with your code.
I'll try another run of your code, and in case the validation loss does not converge again, I'll post a screenshot right here...
Last but not least: Thank you for your time! :)
Best regards from Munich
As promised, here is the screenshot:
The screenshot was taken after 40k iterations. I used the 1-layer combo model. All parameters are unchanged from the repository.
As you can see, the validation loss is 2600+. I know 40k iterations might not be enough training, but the last time I ran the code for about 650k iterations, the loss was about 2595. It seems to get stuck there somehow.
Edit:
Another one after 114.5k iterations:
Yes, the difference between the distributions of ground-truth and predicted frames is causing this issue. I also suggest conditioning on ground truth at the beginning of training and then slowly switching to previously predicted frames, as in http://arxiv.org/abs/1506.03099
Thank you so much for suggesting this paper. I just read it, and this is exactly what I was looking for; it is highly valuable for my thesis! :)