iamsulabh / show_tell

Implementation of 'Show and Tell: A Neural Image Caption Generator' paper by [Vinyals et al.]

License: MIT License

Python 100.00%

show_tell's Introduction

Image captioning and Analysis

Implementation of the paper 'Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge' by [Vinyals et al.](https://ieeexplore.ieee.org/abstract/document/7505636), published in IEEE TPAMI.

Analysis with different combinations of CNN+RNN, data augmentation, and pre-trained embeddings.

Note: This is a work in progress. I will write a detailed blog post on medium.com explaining all the code and detailing the steps.

This will not be an exact implementation of the paper. It differs in the following ways:

  1. The authors use an ensemble of models, but we use only one model. The authors found that ensembling gives them a boost of 2 points on the BLEU metrics. However, as we shall see, certain optimizations allow us to achieve better scores on certain metrics than those reported in the paper.
  2. The authors extract a single image feature vector from the penultimate layer of the CNN, but we extract a set of feature vectors from a lower layer. We have noticed that this allows us to achieve better performance.
  3. We use a pre-trained ResNet-152 convolutional neural network instead of the Inception network used in the paper, and it helps to improve results.
  4. Implementation details: We do not use batch normalization to process inputs as no noticeable improvements were observed.
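
The difference described in item 2 can be illustrated with a small, framework-free sketch. It is not the repository's code. It assumes the "single vector" corresponds to a globally average-pooled activation (typical for ResNet-style networks) and that the "set of feature vectors" is one vector per spatial location of a lower-layer feature map. The function names, the toy shapes, and the choice of layer are all hypothetical.

```python
from typing import List

# A feature map indexed as [channel][row][col]; a stand-in for a CNN activation.
FeatureMap = List[List[List[float]]]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Collapse a C x H x W feature map into one C-dimensional vector."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def spatial_feature_set(fmap: FeatureMap) -> List[List[float]]:
    """Turn a C x H x W feature map into H*W vectors, each of length C."""
    channels = len(fmap)
    height = len(fmap[0])
    width = len(fmap[0][0])
    return [
        [fmap[c][i][j] for c in range(channels)]
        for i in range(height)
        for j in range(width)
    ]


# Toy 2-channel, 2x2 feature map (assumed values, for illustration only).
toy_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 20.0], [30.0, 40.0]],
]

single_vector = global_average_pool(toy_map)  # one vector of length C
vector_set = spatial_feature_set(toy_map)     # H*W vectors of length C
```

With a set of location-wise vectors, the decoder can in principle draw on different image regions rather than a single global summary. How the repository actually consumes these vectors is not stated here.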

In addition to implementing the paper, we perform additional analysis with the following changes:

  1. We experiment with image data augmentation and observe that it helps boost performance, depending on which technique is used.
  2. We experiment with pre-trained word embeddings. Some pre-trained word embeddings help in increasing evaluation scores.
  3. We experiment with different CNNs to extract image feature vectors.
  4. We experiment with different versions of Recurrent Neural Networks: LSTM, GRU, and bidirectional RNN/LSTM/GRU.
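
As a rough illustration of item 2 (pre-trained word embeddings), the sketch below builds an embedding matrix for a caption vocabulary. It is a minimal example under stated assumptions, not the repository's implementation. The helper name `build_embedding_matrix`, the toy vocabulary, the toy vectors, and the uniform random initialization for out-of-vocabulary words are all hypothetical choices.

```python
import random
from typing import Dict, List


def build_embedding_matrix(
    vocab: List[str],
    pretrained: Dict[str, List[float]],
    dim: int,
    seed: int = 0,
) -> List[List[float]]:
    """Return one row per vocabulary word.

    Words found in `pretrained` reuse their vector; all other words get a
    small random vector (an assumed fallback, not taken from the README).
    """
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        vector = pretrained.get(word)
        if vector is None:
            vector = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        elif len(vector) != dim:
            raise ValueError(f"vector for {word!r} has length {len(vector)}, expected {dim}")
        matrix.append(list(vector))
    return matrix


# Toy data standing in for a real vocabulary and a real embedding file.
vocab = ["<pad>", "a", "dog", "runs"]
pretrained = {"a": [0.1, 0.2, 0.3], "dog": [0.4, 0.5, 0.6]}

embedding_matrix = build_embedding_matrix(vocab, pretrained, dim=3)
```

The resulting matrix would typically be used to initialize the decoder's embedding layer. Whether those weights are then frozen or fine-tuned is a separate design choice that the list above does not specify.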
