TimeCycle

Code for Learning Correspondence from the Cycle-consistency of Time (CVPR 2019, Oral). The code was developed with PyTorch 0.4 and Python 2, and it also runs smoothly with PyTorch 1.0. This repo includes training code for learning semi-dense correspondence from unlabeled videos, and testing code for applying this correspondence to segmentation mask tracking in videos.

Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@inproceedings{CVPR2019_CycleTime,
    Author = {Xiaolong Wang and Allan Jabri and Alexei A. Efros},
    Title = {Learning Correspondence from the Cycle-Consistency of Time},
    Booktitle = {CVPR},
    Year = {2019},
}

Model and Result

Our trained model can be downloaded from here. The tracking performance on DAVIS-2017 for this model (without training on DAVIS-2017) is:

cropSize     J_mean   J_recall   J_decay   F_mean   F_recall   F_decay
320 x 320    0.419    0.409      0.272     0.394    0.336      0.328
400 x 400    0.430    0.437      0.296     0.426    0.413      0.356
480 x 480    0.464    0.500      0.332     0.500    0.480      0.379

Note that the results can easily be improved at test time by increasing the input image size "cropSize" in the script, as in the sketch below. The training and testing procedures for this model are described in the following sections.
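
For example, a minimal sketch of such a change, assuming test_davis.py exposes its settings through a params dictionary like the one used for the file list below (the key name mirrors the cropSize value the script prints and may differ from the actual code):

    # Hypothetical edit inside test_davis.py: evaluate at a larger resolution.
    # The key name is an assumption based on the settings printed by the script.
    params['cropSize'] = 480   # evaluate at 480 x 480 instead of the default 320 x 320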

Converting Our Model to a Standard PyTorch ResNet-50

Please see convert_model.ipynb for converting our model here to the standard PyTorch ResNet-50 format; a rough sketch of what the conversion does is shown below.
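
If you cannot run the notebook, the following is a hedged sketch of the conversion it performs, reconstructed from the notebook excerpt quoted in the issues further down; the checkpoint layout (the 'state_dict' key, the 'module.encoderVideo.' prefix) is an assumption and should be checked against the released file:

    # Hedged sketch: copy the inflated ResNet-50 weights from the TimeCycle checkpoint
    # into a standard torchvision ResNet-50 state dict. Not the notebook verbatim.
    import torch
    import torchvision

    net = torchvision.models.resnet50()
    net_state = net.state_dict()

    checkpoint = torch.load('checkpoint_14.pth.tar', map_location='cpu')
    model_state = checkpoint['state_dict']          # assumed key; inspect the checkpoint

    for k in model_state:
        if not k.startswith('module.encoderVideo.'):
            continue                                # skip parts that are not the backbone
        kk = k.replace('module.encoderVideo.', '')
        if kk not in net_state:
            continue
        tmp = model_state[k]
        # Inflated convs are 5-D (out, in, time, h, w); drop the time axis.
        if net_state[kk].dim() == 4 and tmp.dim() == 5:
            tmp = tmp.squeeze(2)
        if net_state[kk].shape == tmp.shape:
            net_state[kk].copy_(tmp)

    torch.save(net_state, 'timecycle_resnet50.pth')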

Dataset Preparation

Please read DATASET.md for downloading and preparing the VLOG dataset for training and DAVIS dataset for testing.

Training

Set the input file list in train_video_cycle_simple.py (in the repository root) to:

    params['filelist'] = 'YOUR_DATASET_FOLDER/vlog_frames_12fps.txt'

Then run the following command:

    python train_video_cycle_simple.py --checkpoint pytorch_checkpoints/release_model_simple

Testing

Set the input file list in test_davis.py (in the repository root) to:

    params['filelist'] = 'YOUR_DATASET_FOLDER/davis/DAVIS/vallist.txt'

Set the dataset path YOUR_DATASET_FOLDER in run_test.sh, then run the testing and evaluation code together:

    sh run_test.sh

Acknowledgements

weakalign by Ignacio Rocco, Relja Arandjelović and Josef Sivic.

inflated_convnets_pytorch by Yana Hasson.

pytorch-classification by Wei Yang.

Issues

missing davis/DAVIS/vallist.txt

I am trying to test my trained network, but the script can't seem to find vallist.txt at the location it expects it to be (DATASET_FOLDER/davis/DAVIS/vallist.txt).

I downloaded Test-Dev 2017 and Test-Challenge 2017 and couldn't find the file in either of them. Which of the two are we expected to use for testing (to reproduce the results in the paper)?

Any ideas? Thanks!

Missing file 0_9_mask.png

When I try to test the pre-trained model on the DAVIS dataset using sh run_test.sh, it cannot find the file 0_9_mask.png. This file seems to be included in neither the DAVIS dataset nor this repo.

Questions about training from scratch

Hi. I used the provided code to train TimeCycle on some other video datasets. Fine-tuning the network from the provided checkpoint_14.pth.tar works fine, but when I train the network from scratch, neither the inlier loss nor the theta loss decreases. Are there any tips for training TimeCycle from scratch?

Error while converting checkpoint_14.pth.tar model

The following error occurs while converting the released model (checkpoint_14.pth.tar) to ResNet-50.

KeyError                                  Traceback (most recent call last)
<ipython-input-3-6df704e20364> in <module>()
     15     kk = k.replace('module.encoderVideo.', '')
     16     tmp = model_state[k]
---> 17     if net_state[kk].shape != model_state[k].shape and net_state[kk].dim() == 4 and model_state[k].dim() == 5:
     18         tmp = model_state[k].squeeze(2)
     19     net_state[kk][:] = tmp[:]

KeyError: 'conv1.weight'

Do I need a specific version of PyTorch to convert the model?

How much GPU memory is needed?

Hi,
When I run test_davis.py it always reports CUDA out of memory. My GPU has 6 GB; is that not enough?
Even with batch_size=1 it still reports CUDA out of memory. Why?

Also, it would be great if you could add a demo that runs tracking on an arbitrary video. Thank you!

Reproducing DeepCluster DAVIS Evaluation Results

Hello,

Thank you for providing this repo! I have had some trouble reproducing the exact DeepCluster performance numbers on DAVIS-2017 from your paper. Could you confirm whether the network you used is the VGG16-PyTorch pretrained model from the DeepCluster repo (https://github.com/facebookresearch/deepcluster)?

In addition, during evaluation do you extract the feature map directly before maxpool-4 in the VGG16 model, or which feature map output do you use from the pretrained model?

Thanks!

transform_trans_out

I was looking at the transform_trans_out function in model_simple.py and noticed that the 2D rotation matrix is multiplied by 1/3. Could you please help me figure out the reason for that? Shouldn't the 2D matrix be correct as is?

trans_out1_0 = 1.0 / 3.0 * torch.cos(trans_out1_theta).unsqueeze(1)
trans_out1_1 = - 1.0 / 3.0 * torch.sin(trans_out1_theta).unsqueeze(1)
trans_out1_3 = 1.0 / 3.0 * torch.sin(trans_out1_theta).unsqueeze(1)
trans_out1_4 = 1.0 / 3.0 * torch.cos(trans_out1_theta).unsqueeze(1)
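
For context, a hedged illustration (not the repo's code, and not an answer from the authors) of the affine_grid convention such a theta feeds into: with theta = s * R(angle) and zero translation, F.affine_grid samples a window covering a fraction s of the source, so a 1/3 factor would both rotate and crop a patch one third the spatial size of the feature map.

    # Hedged, self-contained illustration of a scale-times-rotation theta; shapes are arbitrary.
    import math
    import torch
    import torch.nn.functional as F

    angle = 0.3                      # rotation in radians
    s = 1.0 / 3.0                    # the scale factor in question
    theta = torch.tensor([[ s * math.cos(angle), -s * math.sin(angle), 0.0],
                          [ s * math.sin(angle),  s * math.cos(angle), 0.0]]).unsqueeze(0)

    feat = torch.randn(1, 64, 30, 30)                         # dummy feature map
    grid = F.affine_grid(theta, torch.Size((1, 64, 10, 10)))  # 10 = 30 * 1/3
    patch = F.grid_sample(feat, grid)                         # rotated crop, 1/3 the size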

Out of memory

I tried to test on the DAVIS dataset with two 11 GB GPUs, but I got the following error. Please help me solve it, thanks.
batchSize: 1
temperature: 1.0
gridSize: 9
classNum: 49
videoLen: 8
cropSize: 320
cropSize2: 80
0,1,2,3
False
self.T: 0.04419417382415922
Total params: 26.01M
==> Resuming from checkpoint..

Evaluation only
gridx: 4 gridy: 4
total_frame_num: 77
(77, 320, 320, 3)
[array([0, 0, 0], dtype=uint8), array([ 0, 128, 0], dtype=uint8), array([128, 0, 0], dtype=uint8)]
[85088, 10181, 7129]
20.661283493041992 relabel 0.456728458404541 label
0
Traceback (most recent call last):
  File "test_davis.py", line 458, in <module>
    test_loss = test(val_loader, model, 1, use_cuda)
  File "test_davis.py", line 238, in test
    corrfeat2_now = model(imgs_tensor, target_tensor)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 119, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 130, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 35, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim) if inputs else []
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 28, in scatter
    return scatter_map(inputs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 13, in scatter_map
    return Scatter.apply(target_gpus, None, dim, obj)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/opt/conda/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error: out of memory (allocate at /opt/conda/conda-bld/pytorch_1532579805626/work/aten/src/THC/THCCachingAllocator.cpp:510)

How to reduce GPU memory requirements?

During the first epoch, I get the following out-of-memory error:

Traceback (most recent call last):                                                                                                                            
  File "train_video_cycle_simple.py", line 352, in <module>                                                                                                   
    main()                                                                                                                                                    
  File "train_video_cycle_simple.py", line 232, in main                                                                                                       
    train_loss, theta_loss, theta_skip_loss = train(train_loader, model, criterion, optimizer, epoch, use_cuda, args)                                         
  File "train_video_cycle_simple.py", line 290, in train                                                                                                      
    outputs = model(imgs, patch2, img, theta)                                                                                                                 
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward                                         
    return self.module(*inputs[0], **kwargs[0])                                                                                                               
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 203, in forward                                                                 
    r50_feat1, r50_feat1_pre, r50_feat1_norm = self.forward_base(videoclip1)                                                                                  
  File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 164, in forward_base                                                            
    x_pre = self.encoderVideo(x)                                                                                                                              
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Users\root\Projects\TimeCycle\models\videos\inflated_resnet.py", line 35, in forward                                                               
    x = self.layer1(x)                                                                                                                                        
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\container.py", line 92, in forward                                               
    input = module(input)                                                                                                                                     
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Users\root\Projects\TimeCycle\models\videos\inflated_resnet.py", line 95, in forward                                                               
    out = self.conv3(out)                                                                                                                                     
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__                                                
    result = self.forward(*input, **kwargs)                                                                                                                   
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\conv.py", line 476, in forward                                                   
    self.padding, self.dilation, self.groups)                                                                                                                 
RuntimeError: CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 8.00 GiB total capacity; 5.63 GiB already allocated; 362.97 MiB free; 41.09 MiB cached)
> c:\logiciels\anaconda3\envs\torch\lib\site-packages\torch\nn\modules\conv.py(476)forward()                                                                  
-> self.padding, self.dilation, self.groups)                                                                                                                  
(Pdb)                                                                                                                                                         

The settings used are the defaults:

4                               
batchSize: 36                   
temperature: 0.04419417382415922
gridSize: 9                     
classNum: 49                    
videoLen: 4                     
0,1,2,3                         
False                           
self.T: 0.04419417382415922     
    Total params: 26.01M        
weight_decay: 0.0               
beta1: 0.5                      

What do I need to change to reduce the GPU memory requirements?
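
A hedged sketch of the usual remedies, assuming train_video_cycle_simple.py exposes its settings through the same params dictionary used for the file list (the real option names may differ; check the params block and the argument parser at the top of the script):

    # Hypothetical edits in train_video_cycle_simple.py; the key name is an assumption
    # mirroring the printed settings (batchSize: 36).
    params['batchSize'] = 8    # fewer clips per iteration is the most direct memory saving
    # Reducing the spatial crop size or the clip length (if exposed as params) also
    # lowers memory use, at some cost in accuracy.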

AffineGridGenV3

What is the reason for AffineGridGenV3? It seems like the only change between it and V2 is that you moved the creation of the grids into the forward pass instead of the initialization and made them Tensors instead of FloatTensors. What was the reasoning behind these changes?

In test_davis.py line 459

Should "hid = ids / width_dim " be "hid = ids / /width_dim" ? Otherwise the hid is not int, then you can not use it as the index.

Error in class VlogSet

The default videoLen is set to 4 in the config, but some videos contain fewer than 4 frames.

In models/dataset/vlog_train.py, line 124 tries to read at least 4 frames for each video, so the dataloader crashes when a video has fewer than 4 frames.

The crash happens at line 132 while reading the image:

img = load_image(img_path)

Error:

File "/beegfs/ahj265/self_supervised_tracking/models/dataset/vlog_train.py", line 175, in __getitem__
    img = load_image(img_path)  # CxHxW
  File "/beegfs/ahj265/self_supervised_tracking/utils/imutils2.py", line 23, in load_image
    img = img.astype(np.float32)
AttributeError: 'NoneType' object has no attribute 'astype'

Is there a preprocessing step I am missing where you filter out such videos?
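
One workaround (an assumption on my part, not something the repo documents) is to filter short videos out of the training file list before starting, so the dataloader never asks for frames that do not exist. A minimal sketch, assuming each line of the list contains the frame folder and its frame count (adapt the parsing to the format described in DATASET.md):

    # Hypothetical preprocessing step, not part of the repo.
    videoLen, predDistance, frame_gap = 4, 4, 4      # illustrative values; match your config
    min_len = (videoLen + predDistance) * frame_gap  # same quantity as current_len in vlog_train.py

    with open('vlog_frames_12fps.txt') as f_in, open('vlog_frames_filtered.txt', 'w') as f_out:
        for line in f_in:
            parts = line.split()
            if len(parts) < 2:
                continue
            folder, fnum = parts[0], int(parts[-1])
            if fnum >= min_len:
                f_out.write(line)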

How to generate the 'davis/DAVIS/ImageSets/2017/val.txt' file?

I'm trying to test my trained network on DAVIS, but the test_davis.py script wants to read the file davis/DAVIS/ImageSets/2017/val.txt, which doesn't exist.

I downloaded the dataset manually from the website (https://davischallenge.org/davis2017/code.html) and got the Annotations, ImageSets and JPEGImages folders, but in the ImageSets/2017 folder the only file I have is test-dev.txt.

I can't find any info on how to generate that file. Can someone help me?
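
For reference, the DAVIS 2017 TrainVal (480p) package from the official site is the one that ships ImageSets/2017/val.txt; the Test-Dev and Test-Challenge packages do not contain it. If you need to recreate the file, it is simply one validation sequence name per line, as in the hedged sketch below (the names shown are placeholders, not the full official split):

    # Hedged sketch: write a DAVIS-style val.txt with one sequence name per line.
    val_sequences = ['bike-packing', 'blackswan', 'bmx-trees']  # placeholders; 30 sequences in total
    with open('davis/DAVIS/ImageSets/2017/val.txt', 'w') as f:
        for name in val_sequences:
            f.write(name + '\n')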

Weird loss progression

Since I am training the model on VLOG with a very small batch size, the training is going to take forever (8 days). Because I don't want to wait that long, I'll stop the training before 30 epochs. But the losses shown in the logs seem odd to me. Can someone provide the log of a complete training run so I can compare the losses and see whether my early results are normal? Thanks.

Learning Rate   Train Loss   Theta Loss   Theta Skip Loss
0.000200        -0.002401    0.366067     0.331109
0.000200        -0.002381    0.369635     0.328924
0.000200        -0.001740    0.402181     0.374113
0.000200        -0.001929    0.378956     0.342752

Out-of-bounds indexing while training, parameter explanations

Hi, I am in the process of training on a custom dataset. I have 12 videos, each with 250 JPEG images, and the .txt file list that specifies the paths to the dataset during training. I am running into the same issue reported in a closed issue:

File "models/dataset/vlog_train.py", line 175, in getitem
img = load_image(img_path) # CxHxW

The dataloader tries to load 000250.jpg, which is out of bounds (there are only 250 images, 0-indexed). I think this has something to do with the dataset's videoLen and frame_gap parameters. I see that fnums is the number of JPEG images in a given folder, so what are the videoLen and frame_gap parameters used for?

In models/dataset/vlog_train.py, there is also a line:

current_len = (self.videoLen + self.predDistance) * frame_gap

and later on a check that says:

if fnum >= current_len:
    diffnum = fnum - current_len
    startframe = random.randint(0, diffnum)
    future_idx = startframe + current_len - 1

What do videoLen, predDistance, and frame_gap represent in this context?
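
A hedged reading of these parameters, based only on the snippet above (not the repo's exact code): the sampler needs (videoLen + predDistance) * frame_gap frames in a video; it takes videoLen "past" frames spaced frame_gap apart from a random start, plus one "future" target frame at the end of that window.

    # Illustrative only; variable names follow the snippet above, values are arbitrary.
    import random

    videoLen, predDistance, frame_gap = 4, 4, 2
    fnum = 250                                   # frames in the folder, e.g. 000000.jpg .. 000249.jpg

    current_len = (videoLen + predDistance) * frame_gap
    assert fnum >= current_len, 'video too short for these settings'

    startframe = random.randint(0, fnum - current_len)
    past_idx = [startframe + i * frame_gap for i in range(videoLen)]   # frames the model matches from
    future_idx = startframe + current_len - 1                          # target frame to cycle back from
    # If the loader's filename convention (0-based vs 1-based numbering) disagrees with the
    # folder's, the last index can point at a file such as 000250.jpg that does not exist,
    # which would explain the error above; this is a guess, not a confirmed cause.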
