
zhangjiekui / dtpp


This project forked from zhujiagang/dtpp


Deep networks with Temporal Pyramid Pooling. The official implementation for "End-to-end Video-level Representation Learning for Action Recognition, ICPR 2018."

License: BSD 2-Clause "Simplified" License

Python 94.98% Shell 2.91% MATLAB 2.11%

dtpp's Introduction

DTPP

Updating

This repository holds the code and models for the paper

End-to-end Video-level Representation Learning for Action Recognition, Jiagang Zhu, Wei Zou, Zheng Zhu, ICPR 2018, Beijing, China.

[Arxiv Preprint]

We follow TSN's guidance for data preparation; please refer to the TSN repository for details. Here we only provide the additional training details of DTPP.
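For readers new to the idea, the temporal pyramid pooling named in the title can be sketched roughly as below. This is an illustrative pure-Python sketch, not the Caffe layer shipped in this repository: the function name `temporal_pyramid_pool`, the pyramid levels (1, 2, 4), the use of max pooling, and the bin boundaries are all assumptions made for the example.

```python
import math


def temporal_pyramid_pool(frames, levels=(1, 2, 4)):
    """Pool T frame-level feature vectors into one fixed-length vector.

    frames: list of T feature vectors, each a list of D numbers.
    levels: number of temporal bins per pyramid level (assumed values).
    Returns a flat list of length sum(levels) * D, regardless of T.
    """
    if not frames:
        raise ValueError("need at least one frame feature")
    t = len(frames)
    pooled = []
    for n_bins in levels:
        for i in range(n_bins):
            # Bin i covers [floor(i*T/n), ceil((i+1)*T/n)), so it is never empty.
            start = (i * t) // n_bins
            end = math.ceil((i + 1) * t / n_bins)
            segment = frames[start:end]
            # Element-wise max over the frames that fall into this bin.
            pooled.extend(max(column) for column in zip(*segment))
    return pooled


if __name__ == "__main__":
    features = [[1, 0], [2, 5], [3, 1], [0, 4]]  # T = 4 frames, D = 2
    print(temporal_pyramid_pool(features))
```

Because the output length depends only on the levels and the feature dimension, videos with different numbers of sampled frames map to vectors of the same size.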

Usage Guide

Code & Data Preparation

Get the code

Use git to clone this repository and its submodules

git clone --recursive https://github.com/zhujiagang/DTPP.git

Compile Caffe

cd lib/
cp -r caffe-tpp-net/ caffe-tpp-net-python/

Compile caffe-tpp-net/ with CMake and OpenMPI, following the TSN instructions, for training models. Compile caffe-tpp-net-python/ with the Python interface enabled, for evaluating models with the Python scripts.

Get initialization models

We have built the initialization model weights for both RGB and flow input. The flow initialization models implement the cross-modality training technique described in the paper. To download the model weights, run

bash get_init_models.sh
bash get_kinetics_pretraining_models.sh

Start training


Once everything is ready, we can start training DTPP. For example, the following command runs training on HMDB51 with RGB input, with weights initialized from ImageNet pretraining.

bash hmdb_scripts_split_1/train_rgb_tpp_delete_dropout_split_1.sh

The following command runs training on HMDB51 with RGB input, with weights initialized from Kinetics pretraining.

bash kinetics_hmdb_split_1/train_kinetics_rgb_tpp_p124_split_1.sh

The learned model weights will be saved in snapshot/.
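To pick up the most recent weights for evaluation, something like the following sketch may help. It assumes the snapshots follow Caffe's usual `<prefix>_iter_<N>.caffemodel` naming; the helper name `latest_snapshot` is hypothetical and is not part of this repository.

```python
import re
from pathlib import Path

# Caffe's solver normally writes snapshots as <prefix>_iter_<N>.caffemodel.
_ITER_RE = re.compile(r"_iter_(\d+)\.caffemodel$")


def latest_snapshot(snapshot_dir="snapshot"):
    """Return the .caffemodel path with the highest iteration number, or None."""
    best_path = None
    best_iter = -1
    for path in Path(snapshot_dir).glob("*.caffemodel"):
        match = _ITER_RE.search(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        iteration = int(match.group(1))
        if iteration > best_iter:
            best_iter = iteration
            best_path = path
    return best_path


if __name__ == "__main__":
    print(latest_snapshot("snapshot"))
```

Comparing the parsed iteration numbers rather than sorting file names avoids ordering `_iter_500` after `_iter_1500`.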

Start testing


To test the trained models, refer to

eval_tpp_net_ucf.py
eval_tpp_net_hmdb.py

and modify the paths in these files.

For fusing the two streams, and for fusion with MIFS and iDT, please refer to

eval_scores_rgb_flow.py
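As a rough illustration of two-stream late fusion in general, the sketch below combines per-class scores from the RGB and flow streams with a weighted sum. It does not reproduce eval_scores_rgb_flow.py: the function names, the softmax normalization, and the 1 : 1.5 weighting are assumptions chosen for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(rgb_scores, flow_scores, rgb_weight=1.0, flow_weight=1.5):
    """Weighted sum of per-class probabilities from the two streams.

    The default weights are illustrative assumptions, not values from this repository.
    """
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("both streams must score the same set of classes")
    rgb_probs = softmax(rgb_scores)
    flow_probs = softmax(flow_scores)
    return [rgb_weight * r + flow_weight * f
            for r, f in zip(rgb_probs, flow_probs)]


def predict(fused_scores):
    """Index of the class with the highest fused score."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)


if __name__ == "__main__":
    fused = fuse_scores([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
    print(predict(fused))
```

Normalizing each stream before summing keeps one stream from dominating only because its raw scores are on a larger scale.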

Our trained models will be released soon.

Citation

Please cite the following papers if you find this repository useful.

@inproceedings{DTPP2018ICPR,
  author    = {Jiagang Zhu and
               Wei Zou and
               Zheng Zhu},
  title     = {End-to-end Video-level Representation Learning for Action Recognition},
  booktitle   = {ICPR},
  year      = {2018},
}

@inproceedings{TSN2016ECCV,
  author    = {Limin Wang and
               Yuanjun Xiong and
               Zhe Wang and
               Yu Qiao and
               Dahua Lin and
               Xiaoou Tang and
               Luc {Van Gool}},
  title     = {Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
  booktitle   = {ECCV},
  year      = {2016},
}

Contact

For any question, please contact

Jiagang Zhu: [email protected]
