tengdahan / dpc
Video Representation Learning by Dense Predictive Coding. Tengda Han, Weidi Xie, Andrew Zisserman.
License: MIT License
Hi there,
It would be nice to fix the link (http://www.robots.ox.ac.uk/~htd/dpc/ucf101-rgb-128_resnet18_dpc.pth.tar) to the pretrained model on UCF101.
Thanks in advance and keep up the good work!
Hi, I wonder if you could provide the model checkpoints for pre-training on UCF-101, along with the pre-training hyperparameters/instructions for it.
Thank you so much!
Hi, Tengda.
I'm trying to reproduce your promising results on the small dataset UCF101. Could you provide the hyper-parameter settings for Table 1 in your paper, such as input size and training epochs? Many thanks.
As far as I know, the state of the art on UCF101 is more than 98%, for example I3D; even two-stream methods got more than 88%. But DPC reports about 65% in the paper (even though it is fine-tuned with supervised learning like those methods). What am I missing in the paper?
While loading the pretrained network for fine-tuning, it cannot find the running mean and variance for the batch normalization layers. Why are these not saved in the exported model? Isn't track_running_stats set to True in resnet_2d3d?
Weights not loaded into new model:
module.backbone.bn1.running_mean
module.backbone.bn1.running_var
module.backbone.bn1.num_batches_tracked
module.backbone.layer1.0.bn1.running_mean
module.backbone.layer1.0.bn1.running_var
module.backbone.layer1.0.bn1.num_batches_tracked
module.backbone.layer1.0.bn2.running_mean
module.backbone.layer1.0.bn2.running_var
module.backbone.layer1.0.bn2.num_batches_tracked
module.backbone.layer1.1.bn1.running_mean
module.backbone.layer1.1.bn1.running_var
module.backbone.layer1.1.bn1.num_batches_tracked
module.backbone.layer1.1.bn2.running_mean
module.backbone.layer1.1.bn2.running_var
module.backbone.layer1.1.bn2.num_batches_tracked
module.backbone.layer2.0.bn1.running_mean
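For reference, a minimal check (a sketch, not the repo's exact layer) of what track_running_stats=False does to the saved state, which would explain the missing keys above:

```python
import torch.nn as nn

# A BatchNorm layer constructed with track_running_stats=False registers no
# running_mean / running_var / num_batches_tracked buffers, so they never
# appear in its state_dict and cannot be loaded later.
bn = nn.BatchNorm3d(64, track_running_stats=False)
keys = list(bn.state_dict().keys())
print(keys)  # ['weight', 'bias']
```

Loading such a checkpoint into a model built with track_running_stats=True then reports exactly the missing-buffer warnings listed above.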
In dpc/model_3d.py:

```python
feature_inf = feature_inf_all[:, N-self.pred_step::, :].contiguous()
```

N is supposed to be the number of sequences; don't we aim to predict the last samples of each sequence?
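For what it's worth, a minimal sketch of what that slice does, assuming N is the number of temporal blocks per sample (sizes below are made up):

```python
import torch

B, N, pred_step = 2, 8, 3               # illustrative: 2 samples, 8 blocks each
feature_inf_all = torch.randn(B, N, 4)  # dummy per-block features, dim 4

# Keep only the last pred_step blocks of every sample: these are the
# ground-truth targets that the predicted features are compared against.
feature_inf = feature_inf_all[:, N - pred_step:, :].contiguous()
print(feature_inf.shape)  # torch.Size([2, 3, 4])
```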
Hi, I'm a little confused regarding the notation in your paper and your code and hope you could provide some clarification. In your paper, the aggregation function has two outputs, one is c_t, and the other one is the hidden state. In this part of your code (in DPC/model_3d.py):
```python
pred = []
for i in range(self.pred_step):
    # sequentially pred future
    p_tmp = self.network_pred(hidden)
    pred.append(p_tmp)
    _, hidden = self.agg(self.relu(p_tmp).unsqueeze(1), hidden.unsqueeze(0))
    hidden = hidden[:, -1, :]
pred = torch.stack(pred, 1)  # B, pred_step, xxx
del hidden
```
where the variable hidden is fed into the function phi(.). Then hidden and phi(hidden) are fed into the aggregation function. Does this mean c_t is just the hidden state? What might be the 'context representation' mentioned in the paper in that case?
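One observation that may help, as a sketch: assuming the aggregator behaves like a standard single-layer GRU (nn.GRU here stands in for the repo's ConvGRU), the output at the last time step and the final hidden state are the same tensor, which would make c_t and the hidden state coincide:

```python
import torch
import torch.nn as nn

# For a single-layer GRU, the last-step output equals the final hidden state,
# so reading c_t from either gives the same context representation.
gru = nn.GRU(input_size=8, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 8)          # (batch, time, feature), dummy input
out, h_n = gru(x)                 # out: (2, 5, 8), h_n: (1, 2, 8)
print(torch.allclose(out[:, -1, :], h_n[-1]))  # True
```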
Hi Tengda, Thank you, for sharing the code and giving detailed supplementary material for this amazing work!
I am also working on a similar self-supervised task. I see my self-supervised loss converging, but it severely hurts supervised classification performance in the earlier epochs (~20 epochs), and it still performs 5-6% worse than the supervised baseline at the same epoch. Does this mean my self-supervised task is not good, or should I wait for it to converge?
Let me know about your experience with UCF101 classification: how do training loss and validation accuracy differ from the baseline in the initial epochs? Also, please tell me what performance improvement you got after removing shortcuts in your self-supervised task. Is shortcut removal mandatory, or just a cherry on the cake?
Hope to hear from you soon.
-Ishan
Hi @TengdaHan,
Thanks for great work,
I haven't found the NCE loss in your source code. Could you please show me where NCE loss is implemented? :)
Hello. I don't understand what the letters SL stand for in the above. Is it stride length? And do B, N, and C stand for batch, sequence number, and channel?
I've been running the codebase with one of the latest torch versions (1.8.1), with only 3 small compatibility problems coming up in my usage.

In dpc/main.py, lines 215 and 267, target_flattened raises 2 errors:
- It is not moved to cuda when created, which leads to an error when criterion() is called.
- It is a boolean tensor, and argmax() now raises an error with boolean tensors; casting to int before calling argmax() fixes it.

In utils/utils.py, line 53, the .view(-1) in correct_k = correct[:k].view(-1).float().sum(0) raises a runtime error (message: view size is not compatible with input tensor's size and stride...). Use .reshape(-1) or make a call to .contiguous() before using .view(-1).

The bugs are quite minor with easy solutions that shouldn't cause a problem for previous torch versions. I am happy to make a pull request if helpful!
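For context, a minimal reproduction of the utils/utils.py issue (shapes are made up): .view(-1) fails on non-contiguous tensors, while .reshape(-1) copies when needed:

```python
import torch

# A transposed slice is non-contiguous, like the topk output in utils.py.
correct = torch.ones(5, 3).t()[:2]   # shape (2, 5), non-contiguous

try:
    correct.view(-1)                 # fails on non-contiguous memory
except RuntimeError as e:
    print('view failed:', 'not compatible' in str(e))

# .reshape(-1) silently copies when a view is impossible, so it always works.
correct_k = correct.reshape(-1).float().sum(0)
print(correct_k.item())  # 10.0
```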
Thanks! :)
Hi, I wonder which model (and setting) can we use to reproduce the result for HMDB51/UCF101? I used model 3D-ResNet18-Kinetics400-128x128 but only got 1% top-1 accuracy on HMDB51. Thanks for your help.
Hello. I read in your paper that you did an ablation study where you removed the sequential prediction and replaced it with parallel prediction where you predicted all three time steps with a different fully connected layer. I understand parallel prediction but I don't understand what you really mean by sequential prediction.
By sequential prediction do you mean that at every time step you use the same fully connected layer to do the prediction or that you use one fully connected layer at the final time step only?
I've tried to decompress the pretrained weights files using a few different methods - built-in macOS decompressor in Finder, terminal based tar utility, as well as the Unarchiver app on macOS. I've also tried using the CLI tar utility on Google Colab where I keep running into a similar error:
macOS tar CLI: tar: Error opening archive: Unrecognized archive format
On Google Colab CLI: tar: This does not look like a tar archive
I'm trying to use the UCF101 dataset and weights. Are these weights compressed using a special method? I'd appreciate any and all help with this.
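One likely explanation (an assumption, but consistent with the PyTorch ImageNet example this naming convention comes from): the .pth.tar files are ordinary torch checkpoints, not tar archives, so archive tools will reject them. A sketch with a dummy checkpoint (the file name and keys are made up):

```python
import os
import tempfile
import torch

# '.pth.tar' is just an extension convention; the file is written by
# torch.save(), so only torch.load() can read it back.
path = os.path.join(tempfile.mkdtemp(), 'demo.pth.tar')
torch.save({'epoch': 100, 'state_dict': {}}, path)   # dummy checkpoint
ckpt = torch.load(path, map_location='cpu')
print(ckpt['epoch'])  # 100
```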
Hi,
I noticed in the implementation, if you specify batch_size as 256 and use 4 GPUs for training in parallel, effectively, the loss is the sum of 4 minibatches of size 64 (also mentioned in the paper). Could you please confirm this is correct?
If so, is there a specific reason to do this? Intuitively, we could re-collect the predicted features from all 256 samples, obtaining an N x N similarity matrix as opposed to N x (N/4), right (where N = B * pred_step * spatial_size**2)?
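A sketch of the per-GPU contrast described above, with made-up shard sizes (B, pred_step, and the spatial size below are illustrative, not the paper's values):

```python
import torch

# With DataParallel, each GPU scores its own predictions only against its own
# ground-truth features, so a global batch split over 4 GPUs yields four small
# score matrices rather than one full N x N matrix.
B, pred_step, hw, D = 8, 3, 4, 16    # per-GPU shard, hypothetical numbers
n = B * pred_step * hw               # rows contrasted on one GPU
pred = torch.randn(n, D)             # predicted features on this GPU
gt = torch.randn(n, D)               # ground-truth features on this GPU
score = pred @ gt.t()                # (n, n) similarities, negatives included
print(score.shape)  # torch.Size([96, 96])
```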
Thanks!
Hello. I wanted to know whether you ran any experiments with just one image at every time step (so, seq_len is now 1 and not 5) instead of more than one. Actually, in my application (predicting the trajectory of cars in terms of bounding box coordinates based on a number of input frames), the consecutive frames are not so different and so I wanted to know whether extracting only spatial features at every time step (instead of also taking into account the temporal ones) will make some drastic difference in terms of using your network?
I understand that you used your network for action classification but I think the part where you train a predictor to give input to the decoder is very useful. Please let me know what you think about such a change in your network.
Why did you set track_running_stats to False for the BN layers in the backbone?
I saw in your code that you shuffle your images around (main.py). Don't you want to preserve the sequence?
Hi,
Thank you for your code sharing and issuing solving work, it really helps me a lot!
Moreover, I want to ask: is it necessary to split the data into train/val sets? In other words, is it enough to use the training loss or accuracy to pick the best trained model? Many unsupervised algorithms, e.g. SimCLR, do not use a validation set to pick the best pretrained model. In the MoCo implementation, Kaiming even only uses the last-epoch result, as explained in here and here.
Thank you!
How much time did it take to train the initial self-supervised model on UCF101 and then fine-tune again on UCF101 at 124x124 resolution?
I am planning to train it, but GPU availability for the required time is a constraint.
Hi,
I don't understand where the negative samples are determined.
Can someone explain this to me?
Thanks
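For anyone else wondering: in InfoNCE-style losses the negatives are usually implicit in the score matrix between predicted and ground-truth features; the diagonal holds the positives and every other entry in a row acts as a negative. A minimal sketch (sizes are illustrative, not DPC's exact code):

```python
import torch
import torch.nn.functional as F

N, D = 16, 32
pred = F.normalize(torch.randn(N, D), dim=1)  # predicted features
gt = F.normalize(torch.randn(N, D), dim=1)    # ground-truth features

score = pred @ gt.t()        # (N, N): row i scores pred i against every gt
target = torch.arange(N)     # positive for row i is gt i (the diagonal)
loss = F.cross_entropy(score, target)  # softmax over 1 positive, N-1 negatives
```

No explicit negative sampling is needed; the other entries of each row of the score matrix serve as negatives automatically.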
Hi there,
Thanks for sharing the code. When training DPC, I found that the data-loading process takes a very long time, especially for large datasets. I'm wondering if you have any suggestions on how to optimize the dataloader in DPC?
Thanks!
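Not specific to DPC, but the standard DataLoader knobs usually help; a sketch with a dummy tensor dataset (the dataset, sizes, and worker count below are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for a video dataset: 64 fake clips of shape (3, 8, 32, 32).
dataset = TensorDataset(torch.randn(64, 3, 8, 32, 32))

loader = DataLoader(dataset,
                    batch_size=16,
                    num_workers=2,     # decode/augment in parallel processes
                    pin_memory=True)   # faster async host-to-GPU copies

batch, = next(iter(loader))
print(batch.shape)  # torch.Size([16, 3, 8, 32, 32])
```

For real video data, most of the win usually comes from raising num_workers until the GPUs stop starving, plus storing pre-extracted frames rather than decoding videos on the fly.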
Hi, I noticed that you don't actually perform grayscale conversion, but rather a channel split, in augmentation.py.
Do you know how the performance compares between these two kinds of augmentation? Thanks!
Hello. I am using your network for self-supervised representation learning on the KITTI dataset. Even after 22 epochs, the loss and top-1 accuracy barely change. What possible reasons could explain this?