xingyul / cpnet

Learning Video Representations from Correspondence Proposals (CVPR 2019 Oral)

License: Other

Shell 3.23% Python 86.78% Makefile 0.65% C++ 7.34% Cuda 1.99%
video-classification tensorflow point-cloud neural-network representation-learning action-recognition

cpnet's Introduction

Learning Video Representations from Correspondence Proposals

Created by Xingyu Liu, Joon-Young Lee and Hailin Jin from Stanford University and Adobe Research (paper link).

Citation

If you find our work useful in your research, please cite:

    @article{liu:2019:cpnet,
      title={Learning Video Representations from Correspondence Proposals},
      author={Xingyu Liu and Joon-Young Lee and Hailin Jin},
      journal={CVPR},
      year={2019}
    }

Abstract

Correspondences between frames encode rich information about dynamic content in videos. However, it is challenging to effectively capture and learn those due to their irregular structure and complex dynamics. In this paper, we propose a novel neural network that learns video representations by aggregating information from potential correspondences. This network, named CPNet, can learn evolving 2D fields with temporal consistency. In particular, it can effectively learn representations for videos by mixing appearance and long-range motion with an RGB-only input. We provide extensive ablation experiments to validate our model. CPNet shows stronger performance than existing methods on Kinetics and achieves the state-of-the-art performance on Something-Something and Jester. We provide analysis towards the behavior of our model and show its robustness to errors in proposals.

Installation

Install TensorFlow. The code was tested with TensorFlow 1.9.0 (GPU version), g++ 5.4.0, CUDA 9.0 and Python 3.5 on Ubuntu 16.04. A few Python libraries, such as cv2, are also needed for data processing and visualization. Access to GPUs is highly recommended.

Compile Customized TF Operators

The TF operators are included under tf_ops. Compile them first by running make in each ops subfolder (check the Makefile). If necessary, update arch in the Makefiles to match the CUDA Compute Capability of your GPU.
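The per-subfolder build can be scripted. The following is a minimal sketch, assuming that every operator lives in a direct subfolder of tf_ops with its own Makefile; the helper names find_op_dirs and build_ops are hypothetical and not part of the repository.

```python
import subprocess
from pathlib import Path


def find_op_dirs(tf_ops_root):
    """Return every direct subfolder of tf_ops_root that contains a Makefile."""
    root = Path(tf_ops_root)
    return sorted(
        d for d in root.iterdir()
        if d.is_dir() and (d / "Makefile").is_file()
    )


def build_ops(tf_ops_root="tf_ops"):
    """Run `make` in each operator subfolder, stopping at the first failure."""
    for op_dir in find_op_dirs(tf_ops_root):
        print("building", op_dir)
        # check=True raises CalledProcessError if make exits with an error.
        subprocess.run(["make"], cwd=str(op_dir), check=True)
```

Calling build_ops("tf_ops") from the repository root would then compile each operator in turn. The arch setting in the Makefiles still has to be edited by hand where needed.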

Data Preprocessing

The data preprocessing scripts are included in utils/data_preparation. Please follow the instructions in the README.md of each subdirectory.

Training and Evaluation

First download the ImageNet pretrained ResNet model from here and put it in pretrained_models/ImageNet-ResNet34.npz.

To train the model on the Jester dataset, rename command_train.sh.jester.experiment to command_train.sh and execute it. The batch size, learning rate and other settings can be adjusted in the script.

    sh command_train.sh

To evaluate the model, rename command_evaluate.sh.jester.experiment to command_evaluate.sh and execute it.

    sh command_evaluate.sh

To test the model, rename command_test.sh.jester.experiment to command_test.sh and execute it.

    sh command_test.sh

A pre-trained model with ResNet-34 as backbone on Jester dataset is provided here for download.

For the Something-Something dataset, the training, evaluation and test command files are command_train.sh.something.something.experiment, command_evaluate.sh.something.something.experiment and command_test.sh.something.something.experiment.
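Switching between the two datasets means swapping which experiment file is active. A minimal sketch of a helper for this is given below. It copies the file instead of renaming it, so the original experiment file is kept. The function name activate and the dataset keys "jester" and "something" are hypothetical; only the file names come from the text above.

```python
import shutil
from pathlib import Path

# Suffixes taken from the command file names listed above.
SUFFIXES = {
    "jester": "jester.experiment",
    "something": "something.something.experiment",
}


def activate(stage, dataset, repo_dir="."):
    """Copy command_<stage>.sh.<suffix> to command_<stage>.sh and return its path."""
    if stage not in ("train", "evaluate", "test"):
        raise ValueError("unknown stage: " + stage)
    target = Path(repo_dir) / "command_{}.sh".format(stage)
    source = Path(str(target) + "." + SUFFIXES[dataset])
    # copyfile overwrites an existing target and leaves the source untouched.
    shutil.copyfile(str(source), str(target))
    return target
```

For example, activate("train", "something") followed by sh command_train.sh would start Something-Something training from the repository root.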

A pre-trained model with ResNet-34 as backbone on Something-Something dataset is provided here for download.

License

Our code is released under the CC BY-NC-SA 4.0 license (see the LICENSE file for details).

cpnet's Issues

The issue of envs

Oh my god! Why do the envs have both TF and PyTorch? And can you tell us which version of PyTorch to use?

About training time?

Hello,
How long did training on sthv2 take?
Also, why convert the .webm videos to .gmeta and .gulp files rather than extracting images from the videos?
Thanks.

Do you plan to make your toy dataset public?

Thanks for your great work! I noticed that you created an interesting toy dataset in Section 4. As I'm not familiar with creating datasets, do you plan to make your toy dataset public? Or could you send it to my email? ([email protected])

Thanks again for your great work! Looking forward to your reply!

Loader errors

Hello,
I get some errors when I run train_something.sh. I followed the instructions in /utils/data_preparation/README and ended up with the layout you described:

    /datasets/something-something/gulp_240/
        train/
            Approaching something with your camera/
                xxxxx.gmeta
                xxxxx.gulp
                ...
            Attaching something to something/
                xxxxx.gmeta
                xxxxx.gulp
                ...
            ...
            label2idx.json
            gulp_log.csv
            opts.json
        val/
            Approaching something with your camera/
                xxxxx.gmeta
                xxxxx.gulp
                ...
            Attaching something to something/
                xxxxx.gmeta
                xxxxx.gulp
                ...
            ...
            label2idx.json
            gulp_log.csv
            opts.json
        test/
            0/
                xxxxx.gmeta
                xxxxx.gulp
                ...
            label2idx.json
            gulp_log.csv
            opts.json

However, when I run something.sh, it fails.

Part of something.sh:

    #!/bin/sh
    # Name of this script, without its directory
    command_file=$(basename "$0")
    script_file=train.py
    gpu=0,1,2,3
    # Root of the gulped dataset
    data=/home1/Dataset/20bn-something-something-v2/gulp_240
    model=c2d_resnet34_cp_224
    model_path=pretrained_models/ImageNet-ResNet34.npz
    batch_size=32
    learning_rate=0.01
    num_threads=6
    num_frames=12
    frame_step=2
    width=224
    height=224
    num_classes=174
    decay_step=20
    symmetric_flip_labels=86:87,93:94,166:167
    log_dir=log_something_something_${model}${num_frames}${frame_step}_train
    log_file=$log_dir.txt

I got the following errors:

    data loader
    data loading -- 0
    Traceback (most recent call last):
      File "train.py", line 117, in <module>
        batch_size=BATCH_SIZE, num_frames=NUM_FRAMES, step_size=FRAME_STEP, val_samples=1, n_threads=NUM_THREADS)
      File "/home1/user/cpnet/utils/dataloader.py", line 38, in get_loader
        drop_last=True)
      File "/home/user/miniconda3/envs/pytorch_1.0.1_36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 802, in __init__
        sampler = RandomSampler(dataset)
      File "/home/user/miniconda3/envs/pytorch_1.0.1_36/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 64, in __init__
        "value, but got num_samples={}".format(self.num_samples))
    ValueError: num_samples should be a positive integeral value, but got num_samples=0

I know the cause of the error: self.gd = gulpio.GulpDirectory(data_path) returns an empty result.
Can you help me?
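
A num_samples=0 error means the dataset object reported zero items. One thing worth ruling out is that the path set in data= does not contain the expected files: the script above points at /home1/Dataset/20bn-something-something-v2/gulp_240, while the listing shows /datasets/something-something/gulp_240/. The sketch below only counts .gmeta and .gulp files under each split. It makes no claim about how gulpio or the repository's loader searches the directory, and the function name count_gulp_files is hypothetical.

```python
import os
from collections import Counter


def count_gulp_files(data_root):
    """Count .gmeta and .gulp files under each top-level split of data_root."""
    report = {}
    for split in sorted(os.listdir(data_root)):
        split_dir = os.path.join(data_root, split)
        if not os.path.isdir(split_dir):
            continue
        counts = Counter()
        # Walk recursively, since the listing stores chunks in per-class folders.
        for _dirpath, _dirnames, filenames in os.walk(split_dir):
            for name in filenames:
                ext = os.path.splitext(name)[1]
                if ext in (".gmeta", ".gulp"):
                    counts[ext] += 1
        report[split] = (counts[".gmeta"], counts[".gulp"])
    return report
```

Running count_gulp_files on the exact path passed as data= shows, per split, how many (.gmeta, .gulp) files are present. A split reported as (0, 0) would point to a wrong path or an incomplete preprocessing run.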
