
lixiangnlp / tf-mrnn


This project forked from mjhucla/tf-mrnn


Re-implementation of the m-RNN model using TensorFlow

Home Page: http://www.stat.ucla.edu/~junhua.mao/m-RNN.html


tf-mrnn's Introduction

TF-mRNN: a TensorFlow library for image captioning.

Created by Junhua Mao

Introduction

This package is a re-implementation of the m-RNN image captioning method using TensorFlow. Training speed is optimized by grouping the training sentences into buckets of different lengths. The package also supports beam search for decoding image features into sentences.
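The bucketing idea can be illustrated with a minimal, stdlib-only sketch. It assumes the sentences are already tokenized; the bucket boundaries and the `<pad>` token are hypothetical and are not taken from this package's configuration. Each sentence is padded only up to the smallest bucket that fits it, rather than to the longest sentence in the whole dataset, which is where the speed-up comes from.

```python
from collections import defaultdict

# Hypothetical bucket boundaries (maximum sentence lengths). The real bucket
# sizes used by tf-mrnn live in its model configuration and may differ.
BUCKETS = (10, 20, 30)


def assign_buckets(sentences, buckets=BUCKETS, pad="<pad>"):
    """Group tokenized sentences into the smallest bucket that fits them.

    Returns a dict mapping bucket length -> list of sentences padded to that
    length. Sentences longer than the largest bucket are dropped in this sketch.
    """
    grouped = defaultdict(list)
    for tokens in sentences:
        for limit in buckets:
            if len(tokens) <= limit:
                # Pad only up to this bucket's length, so every sentence in a
                # batch drawn from this bucket has the same length.
                grouped[limit].append(tokens + [pad] * (limit - len(tokens)))
                break
    return dict(grouped)


if __name__ == "__main__":
    sents = [["a", "dog"], ["a"] * 15, ["x"] * 25, ["y"] * 40]
    for limit, batch in sorted(assign_buckets(sents).items()):
        print(limit, len(batch))
```

Batches would then be drawn from one bucket at a time, so the recurrent network is unrolled only for that bucket's length.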

Citing m-RNN

If you find this package useful in your research, please consider citing:

@article{mao2014deep,
  title={Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)},
  author={Mao, Junhua and Xu, Wei and Yang, Yi and Wang, Jiang and Huang, Zhiheng and Yuille, Alan},
  journal={ICLR},
  year={2015}
}

Requirements

Basic installation (sufficient for the demo)

  1. Install the MS COCO caption toolkit.

  2. Suppose the toolkit is installed at $PATH_COCOCap and this package is installed at $PATH_mRNN_CR. Create a soft link to COCOCap as follows:

cd $PATH_mRNN_CR
ln -sf $PATH_COCOCap ./external/coco-caption
  3. Download the data needed to use a trained m-RNN model:
bash setup.sh

Demo

This demo shows how to use a trained model to generate descriptions for an image. Run demo.py or view demo.ipynb.
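Generating a description means decoding word by word, and the package supports beam search for this step. The sketch below shows generic beam search over a next-word distribution. The `toy_step` function is a hypothetical stand-in for the trained m-RNN conditioned on image features; it is not this package's API, and this sketch applies no length normalization.

```python
import math
from typing import Callable, Dict, List, Tuple

# A step function maps a partial sentence to next-word probabilities.
# In the real model this would come from the m-RNN given the image features.
StepFn = Callable[[List[str]], Dict[str, float]]


def beam_search(step: StepFn, beam_size: int = 3, max_len: int = 20,
                end_token: str = "<eos>") -> List[Tuple[List[str], float]]:
    """Return sentences sorted by total log-probability, best first."""
    beams = [([], 0.0)]  # (words so far, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for words, score in beams:
            for word, prob in step(words).items():
                if prob > 0:
                    candidates.append((words + [word], score + math.log(prob)))
        # Keep only the beam_size highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for words, score in candidates[:beam_size]:
            if words[-1] == end_token:
                finished.append((words[:-1], score))  # drop the end token
            else:
                beams.append((words, score))
        if not beams:
            break
    if not finished:  # max_len reached before any sentence ended
        finished = beams
    return sorted(finished, key=lambda c: c[1], reverse=True)


def toy_step(words: List[str]) -> Dict[str, float]:
    """Hypothetical language model used only for illustration."""
    table = {
        (): {"a": 0.6, "the": 0.4},
        ("a",): {"dog": 0.5, "cat": 0.5},
        ("the",): {"dog": 0.9, "cat": 0.1},
    }
    return table.get(tuple(words), {"<eos>": 1.0})


if __name__ == "__main__":
    print(beam_search(toy_step, beam_size=1)[0][0])  # greedy decoding
    print(beam_search(toy_step, beam_size=2)[0][0])
```

With `beam_size=1` (greedy decoding) the toy model returns "a dog" (probability 0.3), while a beam of 2 keeps "the" alive and finds "the dog" (probability 0.36).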

The configuration of the trained model is: ./model_conf/mrnn_GRU_conf.py.

The model achieves a CIDEr of 0.890 and a BLEU-4 of 0.282 on the 1000 validation images used in the m-RNN paper. It adopts a transposed weight sharing strategy that accelerates the training and regularizes the network.
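The general idea behind transposed weight sharing can be sketched as follows: instead of learning a separate vocabulary-sized output matrix, the output layer scores each word against the (transposed) word embedding matrix after a small learned projection. This is a minimal pure-Python illustration under that assumption only; the sizes, the `W_proj` projection, and the absence of any nonlinearity are hypothetical, and the exact parameterization in the m-RNN paper and in this package may differ.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical sizes chosen for illustration.
vocab_size, embed_dim, hidden_dim = 1000, 64, 128
rng = random.Random(0)

# Word embedding matrix E: one row of length embed_dim per word.
E = [[rng.gauss(0, 0.1) for _ in range(embed_dim)] for _ in range(vocab_size)]
# Hypothetical projection from the hidden/multimodal layer to embedding space.
W_proj = [[rng.gauss(0, 0.1) for _ in range(hidden_dim)] for _ in range(embed_dim)]


def embed(word_id):
    """Input side: look up a word's embedding (a row of E)."""
    return E[word_id]


def output_logits(hidden):
    """Output side: logits = E @ (W_proj @ hidden).

    Viewing the embedding layer as a (embed_dim x vocab_size) matrix applied
    to a one-hot vector, the output layer reuses its transpose.
    """
    return matvec(E, matvec(W_proj, hidden))


# Parameter counts with and without sharing the vocabulary-sized matrix.
shared_params = vocab_size * embed_dim + embed_dim * hidden_dim
separate_params = vocab_size * embed_dim + vocab_size * hidden_dim

if __name__ == "__main__":
    print(len(output_logits([0.1] * hidden_dim)), shared_params, separate_params)
```

Because the vocabulary-sized matrix is learned only once, the model has fewer parameters (72,192 versus 192,000 for these toy sizes), which is one plausible source of the faster training and the regularization effect mentioned above.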

Training your own models on MS COCO

Download or extract image features for images in MS COCO.

Use the following shell scripts to download pre-extracted image features (Inception-v3 or VGG) for MS COCO:

# If you want to use Inception-v3 image features, run:
bash ./download_coco_inception_features.sh
# If you want to use VGG image features, run:
bash ./download_coco_vgg_features.sh

Alternatively, you can extract the image features yourself. To do so, first download the images from the MS COCO dataset and make sure they are located in ./datasets/ms_coco/images/ (which should contain at least the train2014 and val2014 folders). Then run:
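Before running the extraction script, it can help to check that the expected folders are in place. This is a small, hypothetical helper (not part of the package) that only checks for the train2014 and val2014 folders named above; it does not validate the image files themselves.

```python
from pathlib import Path

# Folders the extraction step expects, per the instructions above.
REQUIRED_SPLITS = ("train2014", "val2014")


def missing_coco_splits(root="./datasets/ms_coco/images"):
    """Return the required split folders that are missing under root."""
    root = Path(root)
    return [split for split in REQUIRED_SPLITS if not (root / split).is_dir()]


if __name__ == "__main__":
    missing = missing_coco_splits()
    if missing:
        print("Missing image folders:", ", ".join(missing))
    else:
        print("Found train2014 and val2014.")
```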

python ./exp/ms_coco_caption/extract_image_features_all.py

Generate the dictionary.

python ./exp/ms_coco_caption/create_dictionary.py
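Conceptually, a captioning dictionary maps each sufficiently frequent word to an integer id and sends rare words to an unknown token. The sketch below shows that idea; the whitespace tokenization, the `min_count` threshold, and the special tokens are assumptions for illustration, and create_dictionary.py may use different choices and output format.

```python
from collections import Counter

# Hypothetical special tokens; the package's own choices may differ.
SPECIAL_TOKENS = ("<pad>", "<bos>", "<eos>", "<unk>")


def build_dictionary(captions, min_count=5, special_tokens=SPECIAL_TOKENS):
    """Map words appearing at least min_count times to integer ids."""
    counts = Counter(w for caption in captions for w in caption.lower().split())
    kept = [w for w, c in counts.items() if c >= min_count]
    # Most frequent words first; ties broken alphabetically.
    kept.sort(key=lambda w: (-counts[w], w))
    vocab = list(special_tokens) + kept
    return {word: idx for idx, word in enumerate(vocab)}


def encode(caption, dictionary):
    """Convert a caption to ids, mapping unknown words to <unk>."""
    unk = dictionary["<unk>"]
    return [dictionary.get(w, unk) for w in caption.lower().split()]


if __name__ == "__main__":
    d = build_dictionary(["A dog runs", "a dog sits", "a cat"], min_count=2)
    print(d)
    print(encode("A cat", d))
```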

Train and evaluate your model.

python ./exp/ms_coco_caption/mrnn_trainer_mscoco.py

During training, the loss of your model is displayed, but it is sometimes very helpful to also see the metrics (e.g. BLEU) of the sentences generated by each checkpoint of the model. To do so, simply open another terminal and run:

python ./exp/ms_coco_caption/mrnn_validator_mscoco.py
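The metrics reported for each checkpoint are presumably computed with the MS COCO caption toolkit installed above. To make a number such as BLEU-4 less opaque, here is a minimal, self-contained sketch of corpus-level BLEU-4: clipped n-gram precisions for n = 1 to 4, their geometric mean, and a brevity penalty. It is not the toolkit's implementation, which may differ in tokenization and minor numerical details, so its scores should not be expected to match exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """hypotheses: list of token lists; references: list of lists of token lists."""
    clipped = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Closest reference length (ties go to the shorter one).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise max over references
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its max reference count.
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


if __name__ == "__main__":
    ref = "a dog runs on the grass".split()
    print(corpus_bleu([ref], [[ref]]))
    print(corpus_bleu(["a dog runs on grass".split()], [[ref]]))
```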

The trained models and the evaluation results can all be found in ./cache/models/mscoco/.

Training your models on other datasets

You should arrange the annotations of other datasets in the same format as our MS COCO annotations. See ./datasets/ms_coco/mscoco_anno_files/README.md for details.

TODO

  1. Allow end-to-end finetuning of the vision network parameters.
