Video-Description-with-Spatial-Temporal-Attention

This package contains the accompanying code for the following paper:

Tu, Yunbin, et al. "Video Description with Spatial-Temporal Attention," which appeared as a full paper in the Proceedings of the ACM International Conference on Multimedia, 2017 (ACM MM'17).

The code is forked from yaoli/arctic-capgen-vid.

We describe the training procedure below.

Usage

Installation

First, clone our repository:

$ git clone https://github.com/tuyunbin/Video-Description-with-Spatial-Temporal-Attention.git

Here, the msvd_data folder contains the 7 pkl files needed to train and test the model.

Dependencies

Theano can be installed by following the instructions on its website; note that Theano has its own dependencies as well. Alternatively, Theano can be installed through Anaconda. If you install Theano the first way, you may hit the error "no module named pygpu". If so, install Theano with Anaconda instead; you do not need to change your Python environment, you only need to add this command when using Theano:

$ export PATH="/home/tuyunbin/anaconda2/bin:$PATH"

(change the path to your own Anaconda installation)

coco-caption. Install it by simply adding it to your $PYTHONPATH.

Jobman. After cloning it, add it to your $PYTHONPATH as well (see the example below).
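
For example, assuming both repositories were cloned into /home/tuyunbin (placeholder paths; replace them with wherever you cloned coco-caption and Jobman):

$ export PYTHONPATH="/home/tuyunbin/coco-caption:$PYTHONPATH"
$ export PYTHONPATH="/home/tuyunbin/jobman:$PYTHONPATH"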

Finally, you will also need to install h5py, since we will use hdf5 files to store the preprocessed features.
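
As a quick check that h5py works, the minimal sketch below writes and reads a small feature array. The dataset name and feature shape are illustrative placeholders and do not reflect the repository's actual file layout:

# Minimal h5py sanity check. 'vid0001' and the 26x1024 shape are
# made-up placeholders, not the repository's actual schema.
import h5py
import numpy as np

features = np.random.rand(26, 1024).astype('float32')

with h5py.File('features.h5', 'w') as f:
    f.create_dataset('vid0001', data=features)

with h5py.File('features.h5', 'r') as f:
    loaded = f['vid0001'][:]  # read back as a numpy array

assert loaded.shape == (26, 1024)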

Video Data and Pre-extracted Features on the MSVD Dataset

The pre-processed datasets used in our paper are available at this link.

The pre-processed global, motion and local features used in our paper can be downloaded at these links:

global and motion features

local features

In our paper, we used 8 local features extracted from the fc7 layer of the Faster R-CNN network. You can extract a different number of local features with Faster R-CNN.

Note: Since the MSR-VTT-10K dataset is too large, we do not provide the data we used for it. You can train your model on this dataset with the same code, but do not forget to shuffle train_id when training the model (see the sketch below).
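
For illustration, here is a minimal sketch of per-epoch shuffling. Treating train_id as a plain Python list of video IDs is our assumption here, not necessarily the repository's actual data structure:

import random

# Hypothetical illustration: reshuffle the training IDs at the start of
# each epoch so that minibatches differ across epochs.
def shuffled_epochs(train_ids, n_epochs, seed=1234):
    rng = random.Random(seed)  # fixed seed for reproducibility
    for _ in range(n_epochs):
        ids = list(train_ids)
        rng.shuffle(ids)
        yield ids

for epoch_ids in shuffled_epochs(['vid1', 'vid2', 'vid3'], n_epochs=2):
    print(epoch_ids)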

Test the model trained by us

First, you need to download the pre-trained model at this link and add it to your $PYTHONPATH.

Second, go to common.py and change the following two lines

RAB_DATASET_BASE_PATH = '/home/tuyunbin/Video-Description-with-Spatial-Temporal-Attention/msvd_data/' 
RAB_EXP_PATH = '/home/sdc/tuyunbin/msvd_result/Video-Description-with-Spatial-Temporal-Attention/exp/' 

according to your specific setup. The first path points to the msvd_data folder; the second path specifies where you would like to save all the experimental results. Before testing the model, we suggest running data_engine.py and checking that it completes without any error. It is also useful to verify that the coco-caption evaluation pipeline works properly by running metrics.py without any error:
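
$ python data_engine.py   # should complete without errors
$ python metrics.py       # checks the coco-caption evaluation pipeline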

Finally, you can use our trained model by setting this option to True in config.py:

'reload_': True,

Train your own model

Here, you need to set 'reload_' to False in config.py:
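
'reload_': False,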

Now you are ready to launch training:

$ THEANO_FLAGS=mode=FAST_RUN,device=cuda0,floatX=float32 python train_model.py

If you find this helps your research, please consider citing:

@inproceedings{tu2017video,
  title={Video Description with Spatial-Temporal Attention},
  author={Tu, Yunbin and Zhang, Xishan and Liu, Bingtao and Yan, Chenggang},
  booktitle={Proceedings of the 2017 ACM on Multimedia Conference},
  pages={1014--1022},
  year={2017},
  organization={ACM}
}

Notes

Running train_model.py for the first time takes much longer, since Theano needs to compile many things on the first run and caches them on disk for future runs. You will probably see some warning messages on stdout; it is safe to ignore all of them. Both model parameters and configurations are saved (the saving path is printed to stdout, so it is easy to find). The most important thing to monitor is train_valid_test.txt in the exp output folder: it is a big table recording all metrics at each validation.

Contact

My email is [email protected]

Any discussions and suggestions are welcome!
