
lucasace / image_captioning


Image captioning using an encoder-decoder network, with pretrained models provided

License: MIT License

Python 100.00%
image-captioning tensorflow encoder-decoder-model checkpoints flickr8k


Image Captioning

Dataset Preparation

  • Clone this repository using
    git clone https://github.com/lucasace/Image_Captioning.git
  • Download the Flickr8k image and text datasets from here and here, respectively
  • Unzip both the image and text archives and place them inside the repository folder
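Flickr8k.token.txt stores one caption per line: the image file name with a "#<index>" suffix, a tab, then the caption text. A minimal sketch of grouping those captions by image is below; the function name parse_token_lines is hypothetical and not part of this repository.

```python
from collections import defaultdict


def parse_token_lines(lines):
    """Group Flickr8k-style caption lines by image file name.

    Each line is assumed to look like '<image>.jpg#<n>\t<caption>',
    where <n> is the caption index for that image.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_id, caption = line.split("\t", 1)
        image_name = image_id.split("#", 1)[0]  # drop the '#n' suffix
        captions[image_name].append(caption)
    return dict(captions)


# Usage (path assumed to be the unzipped text dataset):
# with open("Flickr8k.token.txt", encoding="utf-8") as f:
#     captions = parse_token_lines(f)
```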

I want to train the model

To train the model, run:

python3 main.py --type train --checkpoint_dir <checkpointdir> --cnnmodel <cnnmodel> --image_folder <imagefolder location> --caption_file <location to token.txt> --feature_extraction <True or False>
  • checkpoint_dir is where your model checkpoints will be saved.
  • cnnmodel is either inception or vgg16; default is inception.
  • image_folder is the location of the folder containing all the images.
  • caption_file is the location of 'Flickr8k.token.txt'.
  • feature_extraction: True or False; default is True.
    • True if you haven't extracted the image features yet
    • False if you have already extracted the image features. This saves time and memory when training again.
  • batch_size is the batch size for training and validation; default is 128.
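The repository's main.py is not reproduced here, so the following is only a hypothetical sketch of how the documented flags could be parsed. The --batch_size spelling is an assumption (the README lists batch_size but not its flag), and --to_caption is taken from the captioning command. It illustrates one pitfall worth checking: argparse with type=bool would treat the string "False" as True, since bool("False") is True, so a small converter is used for --feature_extraction.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False' strings explicitly; bool('False') would be True."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the flags described in this README."""
    parser = argparse.ArgumentParser(description="Image captioning (sketch)")
    parser.add_argument("--type", choices=["train", "test", "caption"], required=True)
    parser.add_argument("--checkpoint_dir", required=True)
    parser.add_argument("--cnnmodel", choices=["inception", "vgg16"], default="inception")
    parser.add_argument("--image_folder")
    parser.add_argument("--caption_file")
    parser.add_argument("--feature_extraction", type=str_to_bool, default=True)
    parser.add_argument("--batch_size", type=int, default=128)  # flag name assumed
    parser.add_argument("--to_caption")
    return parser
```

For example, parsing ["--type", "train", "--checkpoint_dir", "ckpt", "--feature_extraction", "False"] yields feature_extraction set to False while cnnmodel and batch_size fall back to inception and 128.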

Testing the model

python3 main.py --type test --checkpoint_dir <checkpointdir> --cnnmodel <cnnmodel> --image_folder <imagefolder location> --caption_file <location to token.txt> --feature_extraction <True or False>
  • Download the checkpoints from here if your cnnmodel is inception, or from here if it is vgg16; alternatively, use your own trained checkpoints.
  • All other arguments are the same as for training.

I just want to caption

python3 main.py --type caption --checkpoint_dir <checkpointdir> --cnnmodel <cnnmodel> --caption_file <location to token.txt> --to_caption <image file path to caption>
  • Download the checkpoints from here
    • Note: these are Inception checkpoints; for VGG16, download from here
  • caption_file is required to build the vocabulary
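The README does not describe how the vocabulary is built from the caption file. A common approach is sketched below under stated assumptions: lower-casing, whitespace tokenization, and the <pad>/<start>/<end>/<unk> special tokens are illustrative choices, not necessarily what this repository does.

```python
from collections import Counter

# Illustrative special tokens; the repository may use different ones.
SPECIALS = ["<pad>", "<start>", "<end>", "<unk>"]


def build_vocab(captions, min_count=1):
    """Map each word seen at least min_count times to an integer id."""
    counts = Counter(word for caption in captions for word in caption.lower().split())
    words = sorted(w for w, c in counts.items() if c >= min_count)
    # Special tokens get the lowest ids, followed by words in sorted order.
    return {token: idx for idx, token in enumerate(SPECIALS + words)}


def encode(caption, vocab):
    """Convert a caption into ids wrapped in <start> ... <end>; unknown words map to <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(word, unk) for word in caption.lower().split()]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]
```

This is why the caption file is needed even when only captioning: the word-to-id mapping used at training time has to be rebuilt so the decoder's output ids can be turned back into words.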

Custom dataset

If you want to train on a custom dataset, modify dataset.py to suit your dataset's format.
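Since --caption_file expects the Flickr8k token file, another option, assuming dataset.py relies only on that file's format and your images sit in the folder passed as --image_folder, is to convert your own captions into it. The CSV layout with image and caption columns and the function name below are hypothetical.

```python
import csv
import io


def csv_to_token_lines(csv_text):
    """Convert 'image,caption' CSV rows into Flickr8k-style token lines.

    Assumes a header row with 'image' and 'caption' columns (hypothetical layout).
    Captions for the same image are numbered #0, #1, ... in order of appearance.
    """
    counters = {}
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        image = row["image"]
        n = counters.get(image, 0)
        counters[image] = n + 1
        lines.append(f"{image}#{n}\t{row['caption'].strip()}")
    return lines
```

Writing the returned lines to a file, one per line, produces a file in the same shape as Flickr8k.token.txt.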

Results

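| Model Type      | CNN Model    | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR |
|-----------------|--------------|--------|--------|--------|--------|--------|
| Encoder-Decoder | Inception_V3 | 60.12  | 51.1   | 48.13  | 39.5   | 25.8   |
| Encoder-Decoder | VGG16        | 58.46  | 49.87  | 47.50  | 39.37  | 26.32  |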
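The table does not state exactly how the BLEU columns were computed. Assuming the usual cumulative BLEU-N with uniform weights (for BLEU-4 this matches NLTK's default weights of 0.25 each), the score for a candidate of total length $c$ against references of effective length $r$ is:

```latex
\mathrm{BLEU}\text{-}N = \mathrm{BP} \cdot \exp\!\left(\sum_{n=1}^{N} w_n \log p_n\right),
\qquad w_n = \frac{1}{N},
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```

Here $p_n$ is the clipped (modified) precision of the candidate's word n-grams against the references, so the scores depend directly on how captions are tokenized into words.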

Here are some of the results: [example images 1, 2 and 3]

Things to Do

  • Beam search
  • Image Captioning using Soft and Hard Attention
  • Image Captioning using Adversarial Training
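Beam search is listed as future work. Below is a minimal, framework-agnostic sketch of the idea, assuming a hypothetical step_log_probs(prefix) callback that returns next-token log-probabilities from the decoder (this is not the repository's API). At each step it keeps the beam_width highest-scoring partial captions rather than only the single best one; beam_width=1 reduces it to greedy decoding. The toy model is made-up data purely for illustration.

```python
import math


def beam_search(step_log_probs, start_id, end_id, beam_width=3, max_len=20):
    """Return the (token ids, total log-prob) of the best caption found.

    step_log_probs(prefix) is a hypothetical callback returning
    {token_id: log_probability} for the next token given the prefix.
    """
    beams = [([start_id], 0.0)]  # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, logp in step_log_probs(seq).items():
                candidates.append((seq + [token], score + logp))
        # Keep only the best `beam_width` extensions across all beams.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_id:
                finished.append((seq, score))  # caption is complete
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished beams if max_len is reached
    return max(finished, key=lambda c: c[1])


# Toy next-token distributions keyed by the last token
# (ids: 0 = <start>, 1 = <end>, 2 and 3 = words).
TOY_MODEL = {
    0: {2: 0.6, 3: 0.4},
    2: {1: 0.45, 3: 0.55},
    3: {1: 0.9, 2: 0.1},
}


def toy_step(prefix):
    return {tok: math.log(p) for tok, p in TOY_MODEL[prefix[-1]].items()}
```

On this toy model, beam_width=1 returns [0, 2, 3, 1] (probability 0.297), while beam_width=2 finds [0, 3, 1] (probability 0.36). The sketch compares raw log-probabilities, which favors shorter captions; length normalization is a common refinement not included here.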

Contributions

All contributions are welcome.

If there is any issue with the model or an error in the program, feel free to raise an issue or open a PR.

References

  • O. Vinyals, A. Toshev, S. Bengio and D. Erhan, "Show and tell: A neural image caption generator," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 3156-3164, doi: 10.1109/CVPR.2015.7298935.
  • TensorFlow documentation on image captioning
  • Machine Learning Mastery, for the dataset
  • NLTK documentation for the METEOR score
  • RNN lecture by Stanford University


Issues

Incorrect calculation of BLEU scores

Just wanted to let you know that you are calculating the BLEU scores incorrectly. You should split() the predicted captions as well as the reference captions. See this tutorial.

Your correctly calculated scores should be half of what you have now. Sorry to be the bearer of bad news.
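The issue's point can be illustrated with the clipped n-gram precision that BLEU is built on. The function below is a self-contained sketch, not NLTK's implementation. It shows that when raw strings are passed instead of split word lists, Python iterates over characters, so the n-grams are character n-grams and the overlap is inflated.

```python
from collections import Counter


def modified_precision(references, hypothesis, n):
    """Clipped n-gram precision, the building block of BLEU.

    references: list of token sequences; hypothesis: one token sequence.
    Passing plain strings makes every character a "token".
    """
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    hyp_counts = ngrams(hypothesis)
    # For each n-gram, the maximum count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return clipped / total if total else 0.0


ref = "a dog runs on the grass"
hyp = "a cat sits on the mat"
word_level = modified_precision([ref.split()], hyp.split(), 1)  # 3 of 6 words match
char_level = modified_precision([ref], hyp, 1)                  # 14 of 21 characters match
```

Here the word-level unigram precision is 0.5, while the unsplit, character-level version gives about 0.67 for the same pair of captions.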
