
tengdahan / dpc


Video Representation Learning by Dense Predictive Coding. Tengda Han, Weidi Xie, Andrew Zisserman.

License: MIT License

Python 100.00%

dpc's People

Contributors

colleenjg, tengdahan


dpc's Issues

Pre-training on UCF101

Hi, I wonder if you could provide the model checkpoints for pre-training on UCF-101, along with the pre-training hyperparameters/instructions.

Thank you so much!

Experimental details of Table 1?

Hi, Tengda.
I'm trying to reproduce your promising results on the small dataset UCF101. Could you provide the hyper-parameter settings for Table 1 in your paper, such as input size, training epochs, etc.? Thanks very much.

Pretrained Networks missing some layer values

While loading the pretrained networks for fine-tuning, the running mean and variance for the batch normalization layers cannot be found. Why are these not saved in the exported model? track_running_stats is True in resnet_2d3d, isn't it?

Weights not loaded into new model:
module.backbone.bn1.running_mean
module.backbone.bn1.running_var
module.backbone.bn1.num_batches_tracked
module.backbone.layer1.0.bn1.running_mean
module.backbone.layer1.0.bn1.running_var
module.backbone.layer1.0.bn1.num_batches_tracked
module.backbone.layer1.0.bn2.running_mean
module.backbone.layer1.0.bn2.running_var
module.backbone.layer1.0.bn2.num_batches_tracked
module.backbone.layer1.1.bn1.running_mean
module.backbone.layer1.1.bn1.running_var
module.backbone.layer1.1.bn1.num_batches_tracked
module.backbone.layer1.1.bn2.running_mean
module.backbone.layer1.1.bn2.running_var
module.backbone.layer1.1.bn2.num_batches_tracked
module.backbone.layer2.0.bn1.running_mean
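
For anyone hitting the same thing, here is a minimal sketch (not the repo's own loading code; the function name and checkpoint layout are my assumptions) that reports which model keys a checkpoint is missing and still loads the overlapping weights:

import torch

def load_nonstrict(model, ckpt_path):
    """Load a checkpoint non-strictly and report the model keys it
    does not provide (e.g. BatchNorm running_mean / running_var)."""
    checkpoint = torch.load(ckpt_path, map_location='cpu')
    # assumption: weights live under 'state_dict', else at top level
    state_dict = checkpoint.get('state_dict', checkpoint)
    missing = set(model.state_dict().keys()) - set(state_dict.keys())
    for key in sorted(missing):
        print('not in checkpoint:', key)
    # strict=False loads whatever overlaps; any missing BN buffers keep
    # their default initialization (running_mean=0, running_var=1).
    model.load_state_dict(state_dict, strict=False)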

Context and hidden state

Hi, I'm a little confused by the notation in your paper versus your code and hope you can provide some clarification. In the paper, the aggregation function has two outputs: one is c_t, and the other is the hidden state. In this part of your code (in DPC/model_3d.py):

pred = []
for i in range(self.pred_step):
    # sequentially predict the future
    p_tmp = self.network_pred(hidden)
    pred.append(p_tmp)
    _, hidden = self.agg(self.relu(p_tmp).unsqueeze(1), hidden.unsqueeze(0))
    hidden = hidden[:, -1, :]
pred = torch.stack(pred, 1)  # B, pred_step, xxx
del hidden

where the variable hidden is fed into the function phi(·), and then hidden and phi(hidden) are fed back into the aggregation function. Does this mean c_t is just the hidden state? If so, what is the 'context representation' mentioned in the paper?
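
One way to see the relationship (a minimal sketch, using a single-layer nn.GRU as a stand-in for DPC's ConvGRU aggregator, with toy dimensions of my choosing): for a one-layer GRU, the output at the last time step equals the final hidden state, so c_t and the returned hidden state coincide in that case.

import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 8)                 # (batch, time, feature)
output, h_n = gru(x)                     # output: (B, T, H); h_n: (layers, B, H)
print(torch.allclose(output[:, -1, :], h_n[-1]))  # True for a 1-layer GRU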

Sanity check: Classification after Self-Supervised training

Hi Tengda, thank you for sharing the code and the detailed supplementary material for this amazing work!

I am also working on a similar self-supervised task. My self-supervised loss is converging; however, it severely hurts supervised classification performance in the early epochs (~20 epochs) and still performs 5-6% worse than the supervised baseline at the same epoch. Does this mean my self-supervised task is not good, or should I wait for it to converge?

Could you share your experience with UCF101 classification: how do the training loss and validation accuracy differ from the baseline in the initial epochs? Also, what performance improvement did you get from removing shortcuts in your self-supervised task? Is shortcut removal mandatory, or just a cherry on top?

Hope to hear from you soon.

-Ishan

Allow compatibility with latest torch (torch 1.8.1)

I've been running the codebase with one of the latest torch versions (1.8.1), and only 3 small compatibility problems came up in my usage.

dpc/main.py, lines 215 and 267:

  • target_flattened raises 2 errors:
    • It is not automatically placed on cuda when created, which leads to an error when criterion() is called.
      => Solution: explicitly place on cuda before the call to criterion()
    • It is a boolean tensor, and argmax() now raises an error with boolean tensors.
      => Solution: convert to int before calling argmax().

utils/utils.py, line 53:

  • .view(-1) in correct_k = correct[:k].view(-1).float().sum(0) raises a runtime error (message: view size is not compatible with input tensor's size and stride...)
    => Solution: either use .reshape(-1) or make a call to .contiguous() before using .view(-1).
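
Taken together, the three fixes look roughly like this (a hedged sketch on toy tensors; the real code in dpc/main.py and utils/utils.py differs in detail):

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
criterion = torch.nn.CrossEntropyLoss()

score = torch.randn(4, 4, device=device)      # stand-in similarity scores
mask = torch.eye(4, dtype=torch.bool)         # stand-in boolean target

# Fix 1: place the target on the same device as the scores.
# Fix 2: cast away from bool before argmax(), which rejects boolean
# tensors (long also matches what CrossEntropyLoss expects).
target = mask.to(device).long().argmax(dim=1)
loss = criterion(score, target)

# Fix 3: reshape() works where view() fails on non-contiguous tensors.
correct = torch.randint(0, 2, (5, 4)).t()     # .t() makes it non-contiguous
correct_k = correct[:1].reshape(-1).float().sum(0)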

The bugs are quite minor, with easy solutions that shouldn't cause problems for previous torch versions. I'm happy to make a pull request if helpful!

Thanks! :)

What do you mean by sequential prediction in your paper?

Hello. I read in your paper that you ran an ablation in which you removed the sequential prediction and replaced it with parallel prediction, predicting all three time steps with different fully connected layers. I understand parallel prediction, but I don't understand what you really mean by sequential prediction.

By sequential prediction, do you mean that at every time step you use the same fully connected layer to make the prediction, or that you use one fully connected layer at the final time step only?
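
My reading of the paper (a hedged sketch; the real model uses a ConvGRU and a two-layer predictor, and these toy sizes are assumptions) is that sequential prediction reuses one shared predictor and feeds each prediction back through the aggregator, while parallel prediction uses a separate layer per future step:

import torch
import torch.nn as nn

D, steps, B = 128, 3, 4                   # toy sizes, purely illustrative
phi = nn.Linear(D, D)                     # one shared predictor
agg = nn.GRUCell(D, D)                    # stand-in for the aggregator
heads = nn.ModuleList([nn.Linear(D, D) for _ in range(steps)])

c = torch.randn(B, D)                     # context at time t

# Sequential: the same phi at every step; each prediction is fed back
# through the aggregator to update the context for the next step.
h, seq_preds = c, []
for _ in range(steps):
    p = phi(h)
    seq_preds.append(p)
    h = agg(torch.relu(p), h)

# Parallel: a different layer per future step, all from the same context.
par_preds = [head(c) for head in heads]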

Pretrained weight files cannot be decompressed

I've tried to decompress the pretrained weight files using a few different methods: the built-in macOS decompressor in Finder, the terminal-based tar utility, and the Unarchiver app on macOS. I've also tried the CLI tar utility on Google Colab, where I keep running into a similar error:

macOS tar CLI: tar: Error opening archive: Unrecognized archive format
On Google Colab CLI: tar: This does not look like a tar archive

I'm trying to use the UCF101 dataset and weights. Are these weights compressed with a special method? I'd appreciate any and all help with this.
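
A quick check that may help (an assumption on my part: many PyTorch checkpoints named *.pth.tar are plain torch.save() outputs rather than tar archives, despite the extension, and load directly; the filename below is hypothetical):

import tarfile
import torch

path = 'ucf101_weights.pth.tar'           # hypothetical filename
if tarfile.is_tarfile(path):
    with tarfile.open(path) as tf:
        tf.extractall('.')
else:
    # Not a tar archive: no decompression needed, just load it.
    checkpoint = torch.load(path, map_location='cpu')
    print(list(checkpoint.keys()))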

Batch size per GPU

Hi,

I noticed in the implementation that if you specify batch_size as 256 and use 4 GPUs for training in parallel, the loss is effectively the sum over 4 minibatches of size 64 (as also mentioned in the paper). Could you please confirm this is correct?

If so, is there a specific reason to do this? Intuitively, we could re-collect the predicted features from all 256 samples, obtaining an N x N similarity matrix as opposed to N x (N/4), right (where N = B * pred_step * spatial_size**2)?
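
To illustrate the question (a toy sketch that collapses pred_step and the spatial dims into nothing; not the repo's code):

import torch

B, D = 64, 128                            # per-GPU shard of a 256 batch
pred = torch.randn(B, D)                  # predicted features on one GPU
gt = torch.randn(B, D)                    # ground-truth features, same GPU

# With nn.DataParallel the score is computed inside forward(), so each
# replica only compares its own shard: a (64, 64) matrix per GPU
# rather than one global (256, 256) matrix.
score = pred @ gt.t()
print(score.shape)                        # torch.Size([64, 64])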

Thanks!

Using only one image at every time step instead of more than one?

Hello. I wanted to know whether you ran any experiments with just one image at every time step (seq_len = 1 instead of 5). In my application (predicting the trajectory of cars as bounding-box coordinates from a number of input frames), consecutive frames are not very different, so I wanted to know whether extracting only spatial features at every time step (instead of also taking the temporal ones into account) would make a drastic difference when using your network.

I understand that you used your network for action classification, but I think the part where you train a predictor to feed the decoder is very useful. Please let me know what you think about such a change to your network.

The necessity of a train/validation split

Hi,

Thank you for sharing your code and for your work resolving issues; it really helps me a lot!

Moreover, I want to ask whether it is necessary to split the data into train/val sets. In other words, is it enough to use the training loss or accuracy to decide on the best trained model? Many unsupervised algorithms, e.g. SimCLR, do not use a validation set to choose the best pretrained model, and in the MoCo implementation Kaiming even uses only the last-epoch result, as explained here and here.

Thank you!

Training Time

How much time did it take to train the initial self-supervised model on UCF101 and then fine-tune it on UCF101 at 124x124 resolution?
I am planning to train it, but GPU availability for the required time is a constraint.

negative examples

Hi,
I don't understand where the negative samples are determined.
Could someone explain this to me?
Thanks
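
For what it's worth, my understanding in a simplified sketch (the real code builds the score over batch, prediction-step, and spatial positions, and masks spatial/temporal cases separately; the sizes below are illustrative):

import torch
import torch.nn.functional as F

N, D = 12, 128                 # N stands in for B * pred_step * spatial**2
pred = torch.randn(N, D)       # predicted future features
gt = torch.randn(N, D)         # ground-truth future features

# Every predicted feature is scored against every ground-truth feature;
# the matching position (the diagonal) is the positive, and all other
# entries of the same row act as negatives.
score = pred @ gt.t()          # (N, N) similarity matrix
target = torch.arange(N)       # positives on the diagonal
loss = F.cross_entropy(score, target)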

Code is very slow on Google Colab

Hi. Your code runs very slowly on Google Colab, but on my personal laptop with a GTX 1050 Ti it runs quite fast. Below is a screenshot of the progress bar on Google Colab:

[screenshot: training progress bar on Google Colab]

Optimizing dataloading

Hi there,

Thanks for sharing the code. When training DPC, I found that data loading takes a very long time, especially for large datasets. I'm wondering whether you have any suggestions for optimizing the dataloader in DPC?

Thanks!
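
Not the author, but the usual knobs look like this (a hedged sketch with a dummy dataset standing in for the real video dataset; the values are illustrative, not the repo's defaults):

import torch
from torch.utils.data import DataLoader, Dataset

class DummyClips(Dataset):
    """Stand-in for the real video dataset."""
    def __len__(self):
        return 1024
    def __getitem__(self, i):
        return torch.randn(3, 5, 128, 128)   # toy clip tensor

loader = DataLoader(
    DummyClips(),
    batch_size=64,
    shuffle=True,
    num_workers=8,              # decode videos in parallel worker processes
    pin_memory=True,            # faster host-to-GPU copies
    persistent_workers=True,    # keep workers alive across epochs
)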

grayscale performance

Hi, I noticed that you don't actually perform grayscale conversion but rather a channel split in augmentation.py.

Do you know how the performance compares between these two augmentation methods? Thanks!
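
For reference, here is my understanding of the two augmentations (a sketch; the helper names are mine, and the repo's augmentation.py may differ in detail):

import torch

def channel_split(img):
    """Pick one RGB channel at random and replicate it 3x."""
    c = torch.randint(0, 3, (1,)).item()
    return img[c:c + 1].repeat(3, 1, 1)       # img: (3, H, W)

def grayscale(img):
    """Standard luminance grayscale, replicated back to 3 channels."""
    w = torch.tensor([0.299, 0.587, 0.114]).view(3, 1, 1)
    return (img * w).sum(0, keepdim=True).repeat(3, 1, 1)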

Possible reasons for loss function not going down

Hello. I am using your network for self-supervised representation learning on the KITTI dataset. Even after 22 epochs, the loss and top-1 accuracy barely change. What could explain this?
