mynlp / cst_captioning

PyTorch Implementation of Consensus-based Sequence Training for Video Captioning
cst_captioning's Introduction

Consensus-based Sequence Training for Video Captioning

Code for the video captioning methods from "Consensus-based Sequence Training for Video Captioning" (Phan, Henter, Miyao, Satoh. 2017).

Dependencies

Check out the coco-caption and cider projects into your working directory.

Data

Data can be downloaded here (643 MB). This folder contains:

  • input/msrvtt: annotated captions (note that val_videodatainfo.json is a symbolic link to train_videodatainfo.json)
  • output/feature: extracted features
  • output/model/cst_best: model file and generated captions on test videos of our best run (CIDEr 54.2)

Getting started

Extract video features

  • Extracted features of ResNet, C3D, MFCC and Category embeddings are shared in the above link

Generate metadata

make pre_process
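The pre-processing step builds the metadata (vocabulary, tokenized captions) that later stages consume. A minimal sketch of the vocabulary-building part, assuming a simple frequency threshold; the special tokens and `min_count` value are illustrative, not the repository's actual settings:

```python
from collections import Counter

def build_vocab(captions, min_count=2):
    """Count word frequencies and keep words above a threshold.
    Rare words are mapped to <UNK> at encoding time."""
    counts = Counter(w for cap in captions for w in cap.lower().split())
    vocab = ["<PAD>", "<BOS>", "<EOS>", "<UNK>"]
    vocab += sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(vocab)}

captions = ["a man is cooking", "a man is singing", "dogs are running"]
vocab = build_vocab(captions, min_count=2)
# "a", "is", "man" appear twice and survive; "cooking" etc. fall back to <UNK>
```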

Pre-compute document frequency for CIDEr computation

make compute_ciderdf
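CIDEr's IDF weighting needs, for each n-gram, the number of videos whose reference captions contain it. A rough sketch of that document-frequency computation, simplified to whitespace tokenization; the actual implementation lives in the cider project and differs in detail:

```python
from collections import defaultdict

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def compute_df(refs_per_video, max_n=4):
    """Document frequency: how many videos' reference sets
    contain each n-gram at least once."""
    df = defaultdict(int)
    for refs in refs_per_video:
        seen = set()
        for ref in refs:
            toks = ref.lower().split()
            for n in range(1, max_n + 1):
                seen.update(ngrams(toks, n))
        for g in seen:
            df[g] += 1
    return df

refs = [["a man is cooking", "a person cooks"],
        ["a man is singing"]]
df = compute_df(refs)
# ("a", "man", "is") occurs in both videos' references, so its df is 2
```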

Pre-compute evaluation scores (BLEU_4, CIDEr, METEOR, ROUGE_L) for each caption

make compute_evalscores

Train/Test

make train [options]
make test [options]

Please refer to the Makefile (and the opts.py file) for the set of available train/test options.

Examples

Train XE model

make train GID=0 EXP_NAME=xe FEATS="resnet c3d mfcc category" USE_RL=0 USE_CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG MAX_EPOCHS=50

Train CST_GT_None/WXE model

make train GID=0 EXP_NAME=WXE FEATS="resnet c3d mfcc category" USE_RL=1 USE_CST=1 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG MAX_EPOCHS=50

Train CST_MS_Greedy model (using greedy baseline)

make train GID=0 EXP_NAME=CST_MS_Greedy FEATS="resnet c3d mfcc category" USE_RL=1 USE_CST=0 SCB_CAPTIONS=0 USE_MIXER=1 MIXER_FROM=1 USE_EOS=1 LOGLEVEL=DEBUG MAX_EPOCHS=200 START_FROM=output/model/WXE

Train CST_MS_SCB model (using SCB baseline, where SCB is computed from GT captions)

make train GID=0 EXP_NAME=CST_MS_SCB FEATS="resnet c3d mfcc category" USE_RL=1 USE_CST=1 USE_MIXER=1 MIXER_FROM=1 SCB_BASELINE=1 SCB_CAPTIONS=20 USE_EOS=1 LOGLEVEL=DEBUG MAX_EPOCHS=200 START_FROM=output/model/WXE

Train CST_MS_SCB(*) model (using SCB baseline, where SCB is computed from model sampled captions)

make train GID=0 MODEL_TYPE=concat EXP_NAME=CST_MS_SCBSTAR FEATS="resnet c3d mfcc category" USE_RL=1 USE_CST=1 USE_MIXER=1 MIXER_FROM=1 SCB_BASELINE=2 SCB_CAPTIONS=20 USE_EOS=1 LOGLEVEL=DEBUG MAX_EPOCHS=200 START_FROM=output/model/WXE

If you want to change the input features, modify the FEATS variable in the above commands.

Reference

@article{cst_phan2017,
    author = {Sang Phan and Gustav Eje Henter and Yusuke Miyao and Shin'ichi Satoh},
    title = {Consensus-based Sequence Training for Video Captioning},
    journal = {ArXiv e-prints},
    archivePrefix = {arXiv},
    eprint = {1712.09532},
    year = {2017},
}

Todo

  • Test on Youtube2Text dataset (different number of captions per video)

Acknowledgements

  • Torch implementation of NeuralTalk2
  • PyTorch implementation of Self-critical Sequence Training for Image Captioning (SCST)
  • PyTorch Team

cst_captioning's People

Contributors: plsang

cst_captioning's Issues

Multi-GPU training support

Hi,

I am trying to use multiple GPUs on my workstation with your code, so I pass GID=0,1,2,3 in the command to start a training session. However, it seems that it is still using only one GPU.

Going through your code, I was unable to find DataParallel anywhere, so I am wondering whether the code supports multi-GPU training at all.

If not, I might be able to take a look at it.
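The observation in the issue is consistent with GID only selecting the visible device rather than parallelizing training. For reference, a minimal sketch of how multi-GPU data parallelism is typically added in PyTorch; the `nn.Linear` stand-in is illustrative, not the repository's captioning model, and the wrapper is a no-op on single-GPU or CPU machines:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # stand-in for the captioning model
if torch.cuda.device_count() > 1:
    # Splits each input batch across GPUs and gathers the outputs.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(8, 16, device=device)
out = model(x)  # shape: (8, 4), regardless of device count
```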

WXE gets the best result

Hello, the results confused me and I cannot figure them out. CST_MS_SCB and the other full RL training methods couldn't improve on the WXE result. Has anyone else run into the same problem?

feature fusion?

Hello, I found that each video only has a single feature vector per modality (C3D, ResNet), not features for all of the chosen frames. Could you tell me how you combined them?

options

We are not able to run the code; can you help us with it?

No such file or directory: 'data/output/metadata/msrvtt_train_ciderdf.pkl.p'

Hi,

I was just trying to run with your default setting:

Train CST_GT_None/WXE model

make train GID=0 EXP_NAME=WXE FEATS="resnet c3d mfcc category" USE_RL=1 USE_CST=1 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG MAX_EPOCHS=50

Here is the error I got:

Traceback (most recent call last):
  File "train.py", line 529, in <module>
    rl_criterion=rl_criterion)
  File "train.py", line 118, in train
    'CIDEr': CiderD(df=opt.train_cached_tokens),
  File "cider/pyciderevalcap/ciderD/ciderD.py", line 25, in __init__
    self.cider_scorer = CiderScorer(n=self._n, df_mode=self._df)
  File "cider/pyciderevalcap/ciderD/ciderD_scorer.py", line 69, in __init__
    pkl_file = pickle.load(open(os.path.join('data', df_mode + '.p'),'r'))
IOError: [Errno 2] No such file or directory: 'data/output/metadata/msrvtt_train_ciderdf.pkl.p'

There seems to be an issue with the way the pickle file is loaded within CIDEr. Making the following change solved the problem.

In "cider/pyciderevalcap/ciderD/ciderD_scorer.py"

# Line #69 is wrong:
# pkl_file = pickle.load(open(os.path.join('data', df_mode + '.p'),'r'))

# It should be changed to:
pkl_file = pickle.load(open(os.path.join(df_mode),'r'))
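One caveat with the suggested fix: it still opens the file in text mode ('r'), which works for Python 2 pickles but fails under Python 3. A more portable variant of the same loading logic, sketched with a throwaway file (the helper name `load_df` is illustrative):

```python
import os
import pickle
import tempfile

def load_df(df_path):
    """Load a pre-computed CIDEr document-frequency pickle.
    Binary mode works on both Python 2 and Python 3."""
    with open(df_path, "rb") as f:
        return pickle.load(f)

# Round-trip check with a temporary file standing in for
# output/metadata/msrvtt_train_ciderdf.pkl
path = os.path.join(tempfile.mkdtemp(), "df.pkl")
with open(path, "wb") as f:
    pickle.dump({("a",): 3}, f)
df = load_df(path)  # → {("a",): 3}
```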

Can't reproduce CIDEr 54.2

I trained a WXE model and got a CIDEr score of about 50, then trained CST_MS_Greedy with your options, but the CIDEr score doesn't improve with reinforcement learning. The model you provided can't be loaded for testing either. Can you give a hint about how to use your model, or how to reproduce the CIDEr score of 54.2?

Problem with multiprocessing

Thanks for your great work, first of all.
I ran your code and found that my GPU usage is always 0% while the scores of the val captions are being calculated. I tried to use multiprocessing, which would require a num_workers argument to torch.utils.data.DataLoader, but I see that you wrote the data-loading class entirely yourself.
So is there a way to use multiprocessing?
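One route the question hints at: wrap the hand-written loader in the standard `Dataset` protocol (`__len__`/`__getitem__`), which makes it compatible with `torch.utils.data.DataLoader` and its `num_workers` option. A sketch with illustrative field names and shapes, not the repository's actual data layout:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class CaptionDataset(Dataset):
    """Adapter exposing (feature, caption) pairs through the
    standard Dataset protocol so DataLoader can batch them."""
    def __init__(self, feats, caps):
        self.feats, self.caps = feats, caps

    def __len__(self):
        return len(self.feats)

    def __getitem__(self, i):
        return self.feats[i], self.caps[i]

ds = CaptionDataset(torch.randn(10, 2048), torch.randint(0, 100, (10, 20)))
# num_workers > 0 would load batches in background processes;
# 0 keeps loading in-process for this small demo.
loader = DataLoader(ds, batch_size=4, num_workers=0, shuffle=True)
feats, caps = next(iter(loader))  # feats: (4, 2048), caps: (4, 20)
```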

Mean pooling for ResNet features?

Hi, I was wondering whether you used mean pooling to blend the per-frame ResNet features into a single 2048-D vector representing the video chunk. If not, can you describe how you merged features across the frames of each clip?
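For reference, mean pooling frame-level features into one clip-level vector is a single reduction over the frame axis. A sketch with NumPy, assuming the standard 2048-D ResNet pool5 features (the frame count here is arbitrary; how the repository actually aggregates frames is exactly what the question asks):

```python
import numpy as np

# One 2048-D ResNet vector per sampled frame of a clip
frame_feats = np.random.rand(26, 2048)

# Average across the frame axis to get a single clip descriptor
clip_feat = frame_feats.mean(axis=0)  # shape: (2048,)
```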

symbolic link in val_videodatainfo.json

Hi

Thanks for sharing this amazing git repo for your paper.

Regarding the symbolic link in val_videodatainfo.json

It seems that there is an issue when viewing/downloading this particular file from your Google Drive.

This issue is preventing me from correctly generating metadata using the command below.

make pre_process

Could you kindly double-check the file? Thanks!
