TimeCycle

Code for Learning Correspondence from the Cycle-consistency of Time (CVPR 2019, Oral). The code was developed with PyTorch 0.4 and Python 2, and it also runs smoothly with PyTorch 1.0. This repo includes training code for learning semi-dense correspondence from unlabeled videos, and testing code for applying this correspondence to segmentation mask tracking in videos.

Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@inproceedings{CVPR2019_CycleTime,
    Author = {Xiaolong Wang and Allan Jabri and Alexei A. Efros},
    Title = {Learning Correspondence from the Cycle-Consistency of Time},
    Booktitle = {CVPR},
    Year = {2019},
}

Model and Result

Our trained model can be downloaded from here. The tracking performance on DAVIS-2017 for this model (without training on DAVIS-2017) is:

cropSize     J_mean   J_recall   J_decay   F_mean   F_recall   F_decay
320 x 320    0.419    0.409      0.272     0.394    0.336      0.328
400 x 400    0.430    0.437      0.296     0.426    0.413      0.356
480 x 480    0.464    0.500      0.332     0.500    0.480      0.379

Note that the results can easily be improved at test time by increasing the input image size "cropSize" in the script, as in the sketch below. The training and testing procedures for this model are described in the following sections.
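
For example, a minimal sketch of such a change, assuming test_davis.py exposes its settings through a params dictionary like the one used for the file list below (the key name mirrors the cropSize value the script prints and may differ from the actual code):

    # Hypothetical edit inside test_davis.py: evaluate at a larger resolution.
    # The key name is an assumption based on the settings printed by the script.
    params['cropSize'] = 480   # evaluate at 480 x 480 instead of the default 320 x 320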

Converting Our Model to a Standard PyTorch ResNet-50

Please see convert_model.ipynb for converting our model here to the standard PyTorch ResNet-50 format; a rough sketch of what the conversion does is shown below.
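
If you cannot run the notebook, the following is a hedged sketch of the conversion it performs, reconstructed from the notebook excerpt quoted in the issues further down; the checkpoint layout (the 'state_dict' key, the 'module.encoderVideo.' prefix) is an assumption and should be checked against the released file:

    # Hedged sketch: copy the inflated ResNet-50 weights from the TimeCycle checkpoint
    # into a standard torchvision ResNet-50 state dict. Not the notebook verbatim.
    import torch
    import torchvision

    net = torchvision.models.resnet50()
    net_state = net.state_dict()

    checkpoint = torch.load('checkpoint_14.pth.tar', map_location='cpu')
    model_state = checkpoint['state_dict']          # assumed key; inspect the checkpoint

    for k in model_state:
        if not k.startswith('module.encoderVideo.'):
            continue                                # skip parts that are not the backbone
        kk = k.replace('module.encoderVideo.', '')
        if kk not in net_state:
            continue
        tmp = model_state[k]
        # Inflated convs are 5-D (out, in, time, h, w); drop the time axis.
        if net_state[kk].dim() == 4 and tmp.dim() == 5:
            tmp = tmp.squeeze(2)
        if net_state[kk].shape == tmp.shape:
            net_state[kk].copy_(tmp)

    torch.save(net_state, 'timecycle_resnet50.pth')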

Dataset Preparation

Please read DATASET.md for downloading and preparing the VLOG dataset for training and DAVIS dataset for testing.

Training

Set the input file list in train_video_cycle_simple.py (in the repository root) to:

    params['filelist'] = 'YOUR_DATASET_FOLDER/vlog_frames_12fps.txt'

Then run the following command:

    python train_video_cycle_simple.py --checkpoint pytorch_checkpoints/release_model_simple

Testing

Set the input file list in test_davis.py (in the repository root) to:

    params['filelist'] = 'YOUR_DATASET_FOLDER/davis/DAVIS/vallist.txt'

Set the dataset path YOUR_DATASET_FOLDER in run_test.sh, then run the testing and evaluation code together:

    sh run_test.sh

Acknowledgements

weakalign by Ignacio Rocco, Relja Arandjelović and Josef Sivic.

inflated_convnets_pytorch by Yana Hasson.

pytorch-classification by Wei Yang.

Issues

missing davis/DAVIS/vallist.txt

I am trying to test my trained network, but the script can't seem to find vallist.txt at the location it expects it to be (DATASET_FOLDER/davis/DAVIS/vallist.txt).

I downloaded Test-Dev 2017 and Test-Challenge 2017 and couldn't find the file in either of them. Which of the two are we expected to use for testing (to reproduce the results in the paper)?

Any ideas? Thanks!

Missing file 0_9_mask.png

When I try to test the pre-trained model on the DAVIS dataset using sh run_test.sh, it cannot find the file 0_9_mask.png. This file seems to be included in neither the DAVIS dataset nor this repo.

Questions about training from scratch

Hi. I used the provided code to train TimeCycle on some other video datasets. Fine-tuning the network from the provided checkpoint_14.pth.tar works fine, but when I train the network from scratch, neither the inlier loss nor the theta loss decreases. Are there any tips for training TimeCycle from scratch?

Error while converting checkpoint_14.pth.tar model

The following error occurs while converting the released model (checkpoint_14.pth.tar) to ResNet-50.

KeyError                                  Traceback (most recent call last)
<ipython-input-3-6df704e20364> in <module>()
     15     kk = k.replace('module.encoderVideo.', '')
     16     tmp = model_state[k]
---> 17     if net_state[kk].shape != model_state[k].shape and net_state[kk].dim() == 4 and model_state[k].dim() == 5:
     18         tmp = model_state[k].squeeze(2)
     19     net_state[kk][:] = tmp[:]

KeyError: 'conv1.weight'

Do I need a specific version of PyTorch to convert the model?

How much GPU memory is needed?

Hi,
When I run test_davis.py it always reports CUDA out of memory. My GPU has 6 GB; is that not enough?
Even with batch_size=1 it still reports CUDA out of memory. Why?

Also, it would be great if you could add a demo that runs tracking on an arbitrary video. Thank you!

Reproducing DeepCluster DAVIS Evaluation Results

Hello,

Thank you for providing this repo! I have had some trouble reproducing the exact DeepCluster performance numbers on DAVIS-2017 from your paper. Could you confirm whether the network you used is the VGG16-PyTorch pretrained model from the DeepCluster repo (https://github.com/facebookresearch/deepcluster)?

In addition, during evaluation do you extract the feature map directly before maxpool-4 in the VGG16 model, or which feature map output do you use from the pretrained model?

Thanks!

transform_trans_out

I was looking at the transform_trans_out function in model_simple.py and noticed that the 2D rotation matrix is multiplied by 1/3. Could you please help me figure out the reason for that? Shouldn't the 2D matrix be correct as is?

trans_out1_0 = 1.0 / 3.0 * torch.cos(trans_out1_theta).unsqueeze(1)
trans_out1_1 = - 1.0 / 3.0 * torch.sin(trans_out1_theta).unsqueeze(1)
trans_out1_3 = 1.0 / 3.0 * torch.sin(trans_out1_theta).unsqueeze(1)
trans_out1_4 = 1.0 / 3.0 * torch.cos(trans_out1_theta).unsqueeze(1)
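
For context, a hedged illustration (not the repo's code, and not an answer from the authors) of the affine_grid convention such a theta feeds into: with theta = s * R(angle) and zero translation, F.affine_grid samples a window covering a fraction s of the source, so a 1/3 factor would both rotate and crop a patch one third the spatial size of the feature map.

    # Hedged, self-contained illustration of a scale-times-rotation theta; shapes are arbitrary.
    import math
    import torch
    import torch.nn.functional as F

    angle = 0.3                      # rotation in radians
    s = 1.0 / 3.0                    # the scale factor in question
    theta = torch.tensor([[ s * math.cos(angle), -s * math.sin(angle), 0.0],
                          [ s * math.sin(angle),  s * math.cos(angle), 0.0]]).unsqueeze(0)

    feat = torch.randn(1, 64, 30, 30)                         # dummy feature map
    grid = F.affine_grid(theta, torch.Size((1, 64, 10, 10)))  # 10 = 30 * 1/3
    patch = F.grid_sample(feat, grid)                         # rotated crop, 1/3 the size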

Out of memory

I tried to test on the DAVIS dataset with two 11 GB GPUs, but I got the following error. Please help me solve it, thanks.
batchSize: 1
temperature: 1.0
gridSize: 9
classNum: 49
videoLen: 8
cropSize: 320
cropSize2: 80
0,1,2,3
False
self.T: 0.04419417382415922
Total params: 26.01M
==> Resuming from checkpoint..

Evaluation only
gridx: 4 gridy: 4
total_frame_num: 77
(77, 320, 320, 3)
[array([0, 0, 0], dtype=uint8), array([ 0, 128, 0], dtype=uint8), array([128, 0, 0], dtype=uint8)]
[85088, 10181, 7129]
20.661283493041992 relabel 0.456728458404541 label
0
Traceback (most recent call last):
  File "test_davis.py", line 458, in <module>
    test_loss = test(val_loader, model, 1, use_cuda)
  File "test_davis.py", line 238, in test
    corrfeat2_now = model(imgs_tensor, target_tensor)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 119, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 130, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 35, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim) if inputs else []
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 28, in scatter
    return scatter_map(inputs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 13, in scatter_map
    return Scatter.apply(target_gpus, None, dim, obj)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/opt/conda/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error: out of memory (allocate at /opt/conda/conda-bld/pytorch_1532579805626/work/aten/src/THC/THCCachingAllocator.cpp:510)

How to reduce GPU memory requirements?

During the first epoch, I get the following out-of-memory error:

Traceback (most recent call last):                                                                                                                            
  File "train_video_cycle_simple.py", line 352, in <module>                                                                                                   
    main()                                                                                                                                                    
  File "train_video_cycle_simple.py", line 232, in main                                                                                                       
    train_loss, theta_loss, theta_skip_loss = train(train_loader, model, criterion, optimizer, epoch, use_cuda, args)                                         
  File "train_video_cycle_simple.py", line 290, in train                                                                                                      
    outputs = model(imgs, patch2, img, theta)                                                                                                                 
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward                                         
    return self.module(*inputs[0], **kwargs[0])                                                                                                               
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 203, in forward                                                                 
    r50_feat1, r50_feat1_pre, r50_feat1_norm = self.forward_base(videoclip1)                                                                                  
  File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 164, in forward_base                                                            
    x_pre = self.encoderVideo(x)                                                                                                                              
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Users\root\Projects\TimeCycle\models\videos\inflated_resnet.py", line 35, in forward                                                               
    x = self.layer1(x)                                                                                                                                        
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\container.py", line 92, in forward                                               
    input = module(input)                                                                                                                                     
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Users\root\Projects\TimeCycle\models\videos\inflated_resnet.py", line 95, in forward                                                               
    out = self.conv3(out)                                                                                                                                     
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\conv.py", line 476, in forward                                                   
    self.padding, self.dilation, self.groups)                                                                                                                 
RuntimeError: CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 8.00 GiB total capacity; 5.63 GiB already allocated; 362.97 MiB free; 41.09 MiB cached)
> c:\logiciels\anaconda3\envs\torch\lib\site-packages\torch\nn\modules\conv.py(476)forward()                                                                  
-> self.padding, self.dilation, self.groups)                                                                                                                  
(Pdb)                                                                                                                                                         

The settings used are the defaults:

4                               
batchSize: 36                   
temperature: 0.04419417382415922
gridSize: 9                     
classNum: 49                    
videoLen: 4                     
0,1,2,3                         
False                           
self.T: 0.04419417382415922     
    Total params: 26.01M        
weight_decay: 0.0               
beta1: 0.5                      

What do I need to change to reduce the GPU memory requirements?
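
A hedged sketch of the usual remedies, assuming train_video_cycle_simple.py exposes its settings through the same params dictionary used for the file list (the real option names may differ; check the params block and the argument parser at the top of the script):

    # Hypothetical edits in train_video_cycle_simple.py; the key name is an assumption
    # mirroring the printed settings (batchSize: 36).
    params['batchSize'] = 8    # fewer clips per iteration is the most direct memory saving
    # Reducing the spatial crop size or the clip length (if exposed as params) also
    # lowers memory use, at some cost in accuracy.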

AffineGridGenV3

What is the reason for AffineGridGenV3? It seems like the only change between it and V2 is that you moved the creation of the grids into the forward pass instead of the initialization and made them Tensors instead of FloatTensors. What was the reasoning behind these changes?

In test_davis.py line 459

Should "hid = ids / width_dim " be "hid = ids / /width_dim" ? Otherwise the hid is not int, then you can not use it as the index.

Error in class VlogSet

The default videoLen is set to 4 in the config, but some videos contain fewer than 4 frames.

In models/dataset/vlog_train.py, line 124 tries to read at least 4 frames for each video, so the dataloader crashes when a video has fewer than 4 frames.

The crash happens at line 132 while reading the image:

img = load_image(img_path)

Error:

File "/beegfs/ahj265/self_supervised_tracking/models/dataset/vlog_train.py", line 175, in __getitem__
    img = load_image(img_path)  # CxHxW
  File "/beegfs/ahj265/self_supervised_tracking/utils/imutils2.py", line 23, in load_image
    img = img.astype(np.float32)
AttributeError: 'NoneType' object has no attribute 'astype'

Is there a preprocessing step I am missing where you filter out such videos?
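
One workaround (an assumption on my part, not something the repo documents) is to filter short videos out of the training file list before starting, so the dataloader never asks for frames that do not exist. A minimal sketch, assuming each line of the list contains the frame folder and its frame count (adapt the parsing to the format described in DATASET.md):

    # Hypothetical preprocessing step, not part of the repo.
    videoLen, predDistance, frame_gap = 4, 4, 4      # illustrative values; match your config
    min_len = (videoLen + predDistance) * frame_gap  # same quantity as current_len in vlog_train.py

    with open('vlog_frames_12fps.txt') as f_in, open('vlog_frames_filtered.txt', 'w') as f_out:
        for line in f_in:
            parts = line.split()
            if len(parts) < 2:
                continue
            folder, fnum = parts[0], int(parts[-1])
            if fnum >= min_len:
                f_out.write(line)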

How to generate the 'davis/DAVIS/ImageSets/2017/val.txt' file?

I'm trying to test my trained network on DAVIS, but the test_davis.py script wants to read the file davis/DAVIS/ImageSets/2017/val.txt, which doesn't exist.

I downloaded the dataset manually from the website (https://davischallenge.org/davis2017/code.html) and got the Annotations, ImageSets and JPEGImages folders, but in the ImageSets/2017 folder the only file I have is test-dev.txt.

I can't find any info on how to generate that file. Can someone help me?
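
For reference, the DAVIS 2017 TrainVal (480p) package from the official site is the one that ships ImageSets/2017/val.txt; the Test-Dev and Test-Challenge packages do not contain it. If you need to recreate the file, it is simply one validation sequence name per line, as in the hedged sketch below (the names shown are placeholders, not the full official split):

    # Hedged sketch: write a DAVIS-style val.txt with one sequence name per line.
    val_sequences = ['bike-packing', 'blackswan', 'bmx-trees']  # placeholders; 30 sequences in total
    with open('davis/DAVIS/ImageSets/2017/val.txt', 'w') as f:
        for name in val_sequences:
            f.write(name + '\n')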

Weird loss progression

Since I am training the model on VLOG with a very small batch size, the training is going to take forever (8 days). Because I don't want to wait that long, I'll stop the training before 30 epochs. But the losses shown in the logs seem odd to me. Can someone provide the log of a complete training run so I can compare the losses and see whether my early results are normal? Thanks.

Learning Rate   Train Loss   Theta Loss   Theta Skip Loss
0.000200        -0.002401    0.366067     0.331109
0.000200        -0.002381    0.369635     0.328924
0.000200        -0.001740    0.402181     0.374113
0.000200        -0.001929    0.378956     0.342752

Out-of-bounds indexing while training, parameter explanations

Hi, I am in the process of training on a custom dataset. I have 12 videos, each with 250 JPEG images, and the .txt file list that specifies the paths to the dataset during training. I am running into the same issue reported in a closed issue:

File "models/dataset/vlog_train.py", line 175, in getitem
img = load_image(img_path) # CxHxW

The dataloader tries to load 000250.jpg, which is out of bounds (there are only 250 images, 0-indexed). I think this has something to do with the dataset's videoLen and frame_gap parameters. I see that fnums is the number of JPEG images in a given folder, so what are the videoLen and frame_gap parameters used for?

In models/dataset/vlog_train.py, there is also a line:

current_len = (self.videoLen + self.predDistance) * frame_gap

and later on a check that says:

if fnum >= current_len:
    diffnum = fnum - current_len
    startframe = random.randint(0, diffnum)
    future_idx = startframe + current_len - 1

What do videoLen, predDistance, and frame_gap represent in this context?
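
A hedged reading of these parameters, based only on the snippet above (not the repo's exact code): the sampler needs (videoLen + predDistance) * frame_gap frames in a video; it takes videoLen "past" frames spaced frame_gap apart from a random start, plus one "future" target frame at the end of that window.

    # Illustrative only; variable names follow the snippet above, values are arbitrary.
    import random

    videoLen, predDistance, frame_gap = 4, 4, 2
    fnum = 250                                   # frames in the folder, e.g. 000000.jpg .. 000249.jpg

    current_len = (videoLen + predDistance) * frame_gap
    assert fnum >= current_len, 'video too short for these settings'

    startframe = random.randint(0, fnum - current_len)
    past_idx = [startframe + i * frame_gap for i in range(videoLen)]   # frames the model matches from
    future_idx = startframe + current_len - 1                          # target frame to cycle back from
    # If the loader's filename convention (0-based vs 1-based numbering) disagrees with the
    # folder's, the last index can point at a file such as 000250.jpg that does not exist,
    # which would explain the error above; this is a guess, not a confirmed cause.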
