
tengdahan / dpc


Video Representation Learning by Dense Predictive Coding. Tengda Han, Weidi Xie, Andrew Zisserman.

License: MIT License

Python 100.00%

dpc's People

Contributors

colleenjg, tengdahan


dpc's Issues

Pre-training on UCF101

Hi, I wonder if you could provide the model checkpoints for pre-training on UCF-101, along with the pre-training hyperparameters/instructions.

Thank you so much!

Experimental details of Table 1?

Hi, Tengda.
I'm trying to reproduce your promising results on the small dataset UCF101. Could you provide the hyper-parameter settings for Table 1 in your paper, such as input size, training epochs, etc.? Thanks very much.

Pretrained Networks missing some layer values

While loading the pretrained networks for fine-tuning, the running mean and variance for the batch normalization layers cannot be found. Why are these not saved in the exported model? track_running_stats is True in resnet_2d3d, isn't it?

Weights not loaded into new model:
module.backbone.bn1.running_mean
module.backbone.bn1.running_var
module.backbone.bn1.num_batches_tracked
module.backbone.layer1.0.bn1.running_mean
module.backbone.layer1.0.bn1.running_var
module.backbone.layer1.0.bn1.num_batches_tracked
module.backbone.layer1.0.bn2.running_mean
module.backbone.layer1.0.bn2.running_var
module.backbone.layer1.0.bn2.num_batches_tracked
module.backbone.layer1.1.bn1.running_mean
module.backbone.layer1.1.bn1.running_var
module.backbone.layer1.1.bn1.num_batches_tracked
module.backbone.layer1.1.bn2.running_mean
module.backbone.layer1.1.bn2.running_var
module.backbone.layer1.1.bn2.num_batches_tracked
module.backbone.layer2.0.bn1.running_mean
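
For anyone hitting the same thing, here is a minimal sketch (not the repo's own loading code; the function name and checkpoint layout are my assumptions) that reports which model keys a checkpoint is missing and still loads the overlapping weights:

import torch

def load_nonstrict(model, ckpt_path):
    """Load a checkpoint non-strictly and report the model keys it
    does not provide (e.g. BatchNorm running_mean / running_var)."""
    checkpoint = torch.load(ckpt_path, map_location='cpu')
    # assumption: weights live under 'state_dict', else at top level
    state_dict = checkpoint.get('state_dict', checkpoint)
    missing = set(model.state_dict().keys()) - set(state_dict.keys())
    for key in sorted(missing):
        print('not in checkpoint:', key)
    # strict=False loads whatever overlaps; any missing BN buffers keep
    # their default initialization (running_mean=0, running_var=1).
    model.load_state_dict(state_dict, strict=False)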

Context and hidden state

Hi, I'm a little confused by the notation in your paper versus your code and hope you can provide some clarification. In the paper, the aggregation function has two outputs: one is c_t, and the other is the hidden state. In this part of your code (in DPC/model_3d.py):

pred = []
for i in range(self.pred_step):
    # sequentially predict the future
    p_tmp = self.network_pred(hidden)
    pred.append(p_tmp)
    _, hidden = self.agg(self.relu(p_tmp).unsqueeze(1), hidden.unsqueeze(0))
    hidden = hidden[:, -1, :]
pred = torch.stack(pred, 1)  # B, pred_step, xxx
del hidden

where the variable hidden is fed into the function phi(·), and then hidden and phi(hidden) are fed back into the aggregation function. Does this mean c_t is just the hidden state? If so, what is the 'context representation' mentioned in the paper?
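
One way to see the relationship (a minimal sketch, using a single-layer nn.GRU as a stand-in for DPC's ConvGRU aggregator, with toy dimensions of my choosing): for a one-layer GRU, the output at the last time step equals the final hidden state, so c_t and the returned hidden state coincide in that case.

import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 8)                 # (batch, time, feature)
output, h_n = gru(x)                     # output: (B, T, H); h_n: (layers, B, H)
print(torch.allclose(output[:, -1, :], h_n[-1]))  # True for a 1-layer GRU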

Sanity check: Classification after Self-Supervised training

Hi Tengda, thank you for sharing the code and the detailed supplementary material for this amazing work!

I am also working on a similar self-supervised task. My self-supervised loss is converging; however, it severely hurts supervised classification performance in the early epochs (~20 epochs) and still performs 5-6% worse than the supervised baseline at the same epoch. Does this mean my self-supervised task is not good, or should I wait for it to converge?

Could you share your experience with UCF101 classification: how do the training loss and validation accuracy differ from the baseline in the initial epochs? Also, what performance improvement did you get from removing shortcuts in your self-supervised task? Is shortcut removal mandatory, or just a cherry on top?

Hope to hear from you soon.

-Ishan

Allow compatibility with latest torch (torch 1.8.1)

I've been running the codebase with one of the latest torch versions (1.8.1), and only 3 small compatibility problems came up in my usage.

dpc/main.py, lines 215 and 267:

  • target_flattened raises 2 errors:
    • It is not automatically placed on cuda when created, which leads to an error when criterion() is called.
      => Solution: explicitly place on cuda before the call to criterion()
    • It is a boolean tensor, and argmax() now raises an error with boolean tensors.
      => Solution: convert to int before calling argmax().

utils/utils.py, line 53:

  • .view(-1) in correct_k = correct[:k].view(-1).float().sum(0) raises a runtime error (message: view size is not compatible with input tensor's size and stride...)
    => Solution: either use .reshape(-1) or make a call to .contiguous() before using .view(-1).
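
Taken together, the three fixes look roughly like this (a hedged sketch on toy tensors; the real code in dpc/main.py and utils/utils.py differs in detail):

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
criterion = torch.nn.CrossEntropyLoss()

score = torch.randn(4, 4, device=device)      # stand-in similarity scores
mask = torch.eye(4, dtype=torch.bool)         # stand-in boolean target

# Fix 1: place the target on the same device as the scores.
# Fix 2: cast away from bool before argmax(), which rejects boolean
# tensors (long also matches what CrossEntropyLoss expects).
target = mask.to(device).long().argmax(dim=1)
loss = criterion(score, target)

# Fix 3: reshape() works where view() fails on non-contiguous tensors.
correct = torch.randint(0, 2, (5, 4)).t()     # .t() makes it non-contiguous
correct_k = correct[:1].reshape(-1).float().sum(0)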

The bugs are quite minor, with easy solutions that shouldn't cause problems for previous torch versions. I'm happy to make a pull request if helpful!

Thanks! :)

What do you mean by sequential prediction in your paper?

Hello. I read in your paper that you ran an ablation in which you removed the sequential prediction and replaced it with parallel prediction, predicting all three time steps with different fully connected layers. I understand parallel prediction, but I don't understand what you really mean by sequential prediction.

By sequential prediction, do you mean that at every time step you use the same fully connected layer to make the prediction, or that you use one fully connected layer at the final time step only?
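
My reading of the paper (a hedged sketch; the real model uses a ConvGRU and a two-layer predictor, and these toy sizes are assumptions) is that sequential prediction reuses one shared predictor and feeds each prediction back through the aggregator, while parallel prediction uses a separate layer per future step:

import torch
import torch.nn as nn

D, steps, B = 128, 3, 4                   # toy sizes, purely illustrative
phi = nn.Linear(D, D)                     # one shared predictor
agg = nn.GRUCell(D, D)                    # stand-in for the aggregator
heads = nn.ModuleList([nn.Linear(D, D) for _ in range(steps)])

c = torch.randn(B, D)                     # context at time t

# Sequential: the same phi at every step; each prediction is fed back
# through the aggregator to update the context for the next step.
h, seq_preds = c, []
for _ in range(steps):
    p = phi(h)
    seq_preds.append(p)
    h = agg(torch.relu(p), h)

# Parallel: a different layer per future step, all from the same context.
par_preds = [head(c) for head in heads]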

Pretrained weight files cannot be decompressed

I've tried to decompress the pretrained weight files using a few different methods: the built-in macOS decompressor in Finder, the terminal-based tar utility, and the Unarchiver app on macOS. I've also tried the CLI tar utility on Google Colab, where I keep running into a similar error:

macOS tar CLI: tar: Error opening archive: Unrecognized archive format
On Google Colab CLI: tar: This does not look like a tar archive

I'm trying to use the UCF101 dataset and weights. Are these weights compressed with a special method? I'd appreciate any and all help with this.
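
A quick check that may help (an assumption on my part: many PyTorch checkpoints named *.pth.tar are plain torch.save() outputs rather than tar archives, despite the extension, and load directly; the filename below is hypothetical):

import tarfile
import torch

path = 'ucf101_weights.pth.tar'           # hypothetical filename
if tarfile.is_tarfile(path):
    with tarfile.open(path) as tf:
        tf.extractall('.')
else:
    # Not a tar archive: no decompression needed, just load it.
    checkpoint = torch.load(path, map_location='cpu')
    print(list(checkpoint.keys()))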

Batch size per GPU

Hi,

I noticed in the implementation that if you specify batch_size as 256 and use 4 GPUs for training in parallel, the loss is effectively the sum over 4 minibatches of size 64 (as also mentioned in the paper). Could you please confirm this is correct?

If so, is there a specific reason to do this? Intuitively, we could re-collect the predicted features from all 256 samples, obtaining an N x N similarity matrix as opposed to N x (N/4), right (where N = B * pred_step * spatial_size**2)?
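
To illustrate the question (a toy sketch that collapses pred_step and the spatial dims into nothing; not the repo's code):

import torch

B, D = 64, 128                            # per-GPU shard of a 256 batch
pred = torch.randn(B, D)                  # predicted features on one GPU
gt = torch.randn(B, D)                    # ground-truth features, same GPU

# With nn.DataParallel the score is computed inside forward(), so each
# replica only compares its own shard: a (64, 64) matrix per GPU
# rather than one global (256, 256) matrix.
score = pred @ gt.t()
print(score.shape)                        # torch.Size([64, 64])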

Thanks!

Using only one image at every time step instead of more than one?

Hello. I wanted to know whether you ran any experiments with just one image at every time step (seq_len = 1 instead of 5). In my application (predicting the trajectory of cars as bounding-box coordinates from a number of input frames), consecutive frames are not very different, so I wanted to know whether extracting only spatial features at every time step (instead of also taking the temporal ones into account) would make a drastic difference when using your network.

I understand that you used your network for action classification, but I think the part where you train a predictor to feed the decoder is very useful. Please let me know what you think about such a change to your network.

The necessity of a train/validation split

Hi,

Thank you for sharing your code and for your work resolving issues; it really helps me a lot!

Moreover, I want to ask whether it is necessary to split the data into train/val sets. In other words, is it enough to use the training loss or accuracy to decide on the best trained model? Many unsupervised algorithms, e.g. SimCLR, do not use a validation set to choose the best pretrained model, and in the MoCo implementation Kaiming even uses only the last-epoch result, as explained here and here.

Thank you!

Training Time

How much time did it take to train the initial self-supervised model on UCF101 and then fine-tune it on UCF101 at 124x124 resolution?
I am planning to train it, but GPU availability for the required time is a constraint.

negative examples

Hi,
I don't understand where the negative samples are determined.
Could someone explain this to me?
Thanks
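
For what it's worth, my understanding in a simplified sketch (the real code builds the score over batch, prediction-step, and spatial positions, and masks spatial/temporal cases separately; the sizes below are illustrative):

import torch
import torch.nn.functional as F

N, D = 12, 128                 # N stands in for B * pred_step * spatial**2
pred = torch.randn(N, D)       # predicted future features
gt = torch.randn(N, D)         # ground-truth future features

# Every predicted feature is scored against every ground-truth feature;
# the matching position (the diagonal) is the positive, and all other
# entries of the same row act as negatives.
score = pred @ gt.t()          # (N, N) similarity matrix
target = torch.arange(N)       # positives on the diagonal
loss = F.cross_entropy(score, target)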

Code is very slow on Google Colab

Hi. Your code runs very slowly on Google Colab, but on my personal laptop with a GTX 1050 Ti it runs quite fast. Below is a screenshot of the progress bar on Google Colab:

[screenshot: training progress bar on Google Colab]

Optimizing dataloading

Hi there,

Thanks for sharing the code. When training DPC, I found that data loading takes a very long time, especially for large datasets. I'm wondering whether you have any suggestions for optimizing the dataloader in DPC?

Thanks!
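
Not the author, but the usual knobs look like this (a hedged sketch with a dummy dataset standing in for the real video dataset; the values are illustrative, not the repo's defaults):

import torch
from torch.utils.data import DataLoader, Dataset

class DummyClips(Dataset):
    """Stand-in for the real video dataset."""
    def __len__(self):
        return 1024
    def __getitem__(self, i):
        return torch.randn(3, 5, 128, 128)   # toy clip tensor

loader = DataLoader(
    DummyClips(),
    batch_size=64,
    shuffle=True,
    num_workers=8,              # decode videos in parallel worker processes
    pin_memory=True,            # faster host-to-GPU copies
    persistent_workers=True,    # keep workers alive across epochs
)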

grayscale performance

Hi, I noticed that you don't actually perform grayscale conversion but rather a channel split in augmentation.py.

Do you know how the performance compares between these two augmentation methods? Thanks!
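
For reference, here is my understanding of the two augmentations (a sketch; the helper names are mine, and the repo's augmentation.py may differ in detail):

import torch

def channel_split(img):
    """Pick one RGB channel at random and replicate it 3x."""
    c = torch.randint(0, 3, (1,)).item()
    return img[c:c + 1].repeat(3, 1, 1)       # img: (3, H, W)

def grayscale(img):
    """Standard luminance grayscale, replicated back to 3 channels."""
    w = torch.tensor([0.299, 0.587, 0.114]).view(3, 1, 1)
    return (img * w).sum(0, keepdim=True).repeat(3, 1, 1)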

Possible reasons for loss function not going down

Hello. I am using your network for self-supervised representation learning on the KITTI dataset. Even after 22 epochs, the loss and top-1 accuracy barely change. What could explain this?
