qianqq / image2text

This project forked from kleinyuan/image2text

A deep learning project to tell a story with an image or a video.

Makefile 0.52% Python 98.98% Shell 0.50%

image2text's Introduction

Intro

This repo implements a multimodal natural language model with TensorFlow.

Dependencies: python 2.7, tensorflow, lasagne, Theano
DataSets: IAPR TC-12

Project Overview

  1. First, a word2vec embedding is trained on the IAPR TC-12 dataset.

  2. Second, the word vectors of each image description are used as the target output of a CNN. The descriptions are filtered first: if a description is too long, only its first sentence is kept.
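
The filtering step in point 2 can be sketched as follows. This is a minimal illustration, not the repo's actual code: the function name `filter_description`, the `MAX_WORDS` threshold, and the choice of `;` and `.` as sentence separators are all assumptions (the README does not say what counts as "too long").

```python
import re

# Hypothetical threshold: the README does not define "too long".
MAX_WORDS = 20


def filter_description(description, max_words=MAX_WORDS):
    """Tokenize a description, keeping only its first sentence if it is too long."""
    text = description.strip()
    if len(text.split()) > max_words:
        # Assumption: sentences are separated by ';' or '.', as the
        # sample annotation in this README suggests.
        text = re.split(r"[;.]", text, maxsplit=1)[0]
    # Lowercase word tokens; hyphenated words such as "dark-skinned" stay whole.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


tokens = filter_description("a boy leaning on a wall; a street behind him", max_words=5)
print(tokens)
```

The resulting tokens would then be looked up in the trained word2vec embedding to build the target for the CNN; how the repo combines the per-word vectors is not stated here, so that step is left out.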

Setup

Depending on your system, first install the dependencies (tensorflow, lasagne, theano, nolearn, ...) with the appropriate tools.

Then run one of the commands below to download the datasets.

Run:

bash setup.sh

or:

make setup

Network Design

[Figures: Word2Vec network and StoryNet network diagrams]

Training

Run:

python train.py

or:

make train

Optimizer: MomentumOptimizer
Loss: MSE Loss
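
For readers unfamiliar with this combination, here is a small pure-Python sketch of a mean squared error loss minimized with a momentum update. It is not taken from train.py. The helper names, learning rate, and momentum value are illustrative assumptions, and the update rule (accumulate `momentum * velocity + gradient`, then step by `learning_rate * velocity`) follows the form documented for TensorFlow's MomentumOptimizer.

```python
def mse_loss(predictions, targets):
    """Mean of the squared differences between predictions and targets."""
    n = len(predictions)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n


def mse_gradient(predictions, targets):
    """Gradient of mse_loss with respect to each prediction."""
    n = len(predictions)
    return [2.0 * (p - t) / n for p, t in zip(predictions, targets)]


def momentum_step(params, grads, velocity, learning_rate=0.01, momentum=0.9):
    """One momentum update: accumulate the gradient, then move the parameters."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - learning_rate * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Toy problem (assumption): the "network output" is the parameter vector itself,
# and it is pulled toward a fixed target vector.
target = [1.0, -2.0]
params = [0.0, 0.0]
velocity = [0.0, 0.0]
initial_loss = mse_loss(params, target)
for _ in range(200):
    grads = mse_gradient(params, target)
    params, velocity = momentum_step(params, grads, velocity)
final_loss = mse_loss(params, target)
print(initial_loss, final_loss)
```

In the real model the target vector would come from the word2vec embedding of the image description and the gradient would flow back through the CNN.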

[Figure: learning curve]

Pre-trained Model

Download here

Testing and Results

Run:

make demo

Train on your own

  1. Run the setup bash script to download the datasets.

  2. Run train.py, either directly or through the makefile.

  3. Freeze the tensorflow model with the command provided in the makefile.

  4. Run app.py, either directly or through the makefile.

Data Sets

The image collection of the IAPR TC-12 Benchmark consists of 20,000 still natural images taken from locations around the world and comprising an assorted cross-section of still natural images. This includes pictures of different sports and actions, photographs of people, animals, cities, landscapes and many other aspects of contemporary life.

Each image is associated with a text caption in up to three different languages (English, German and Spanish). These annotations are stored in a database which is managed by a benchmark administration system that allows the specification of parameters according to which different subsets of the image collection can be generated.

The IAPR TC-12 Benchmark is now available free of charge and without copyright restrictions.

More details.

Sample annotations:


    <DOC>
    <DOCNO>annotations/01/1000.eng</DOCNO>
    <TITLE>Godchild Cristian Patricio Umaginga Tuaquiza</TITLE>
    <DESCRIPTION>a dark-skinned boy wearing a knitted woolly hat and a light and dark grey striped jumper with a grey zip, leaning on a grey wall;</DESCRIPTION>
    <NOTES></NOTES>
    <LOCATION>Quilotoa, Ecuador</LOCATION>
    <DATE>April 2002</DATE>
    <IMAGE>images/01/1000.jpg</IMAGE>
    <THUMBNAIL>thumbnails/01/1000.jpg</THUMBNAIL>
    </DOC>
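
An annotation in this layout can be read with a few lines of Python. The sketch below is an assumption about how one might parse it, not the loader used in this repo. It uses regular expressions rather than an XML parser because the files are SGML-like and may not always be well-formed XML. The function name `parse_annotation` and the `FIELDS` tuple are hypothetical.

```python
import re

# Tag names taken from the sample annotation above.
FIELDS = ("DOCNO", "TITLE", "DESCRIPTION", "NOTES",
          "LOCATION", "DATE", "IMAGE", "THUMBNAIL")


def parse_annotation(text):
    """Return a dict mapping lowercase field names to their text (None if absent)."""
    record = {}
    for field in FIELDS:
        match = re.search(r"<{0}>(.*?)</{0}>".format(field), text, flags=re.DOTALL)
        record[field.lower()] = match.group(1).strip() if match else None
    return record


sample = """
<DOC>
<DOCNO>annotations/01/1000.eng</DOCNO>
<TITLE>Godchild Cristian Patricio Umaginga Tuaquiza</TITLE>
<DESCRIPTION>a dark-skinned boy wearing a knitted woolly hat;</DESCRIPTION>
<NOTES></NOTES>
<LOCATION>Quilotoa, Ecuador</LOCATION>
<DATE>April 2002</DATE>
<IMAGE>images/01/1000.jpg</IMAGE>
<THUMBNAIL>thumbnails/01/1000.jpg</THUMBNAIL>
</DOC>
"""

record = parse_annotation(sample)
print(record["image"], record["description"])
```

The `image` and `description` fields are the pair a training script would need: the path of the input picture and the caption text from which the target word vectors are built.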

References:

  1. Dong, Jianfeng, Xirong Li, and Cees GM Snoek. "Word2VisualVec: Image and video to sentence matching by visual feature prediction." CoRR, abs/1604.06838 (2016).

  2. Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

  3. Kiros, Ryan, Ruslan Salakhutdinov, and Rich Zemel. "Multimodal neural language models." Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014.

  4. word2vec tutorial
