encoder-decoder-image-captioning's Introduction

Image Captioning With Encoder-Decoder Architecture

Project for the course Deep Learning 046211 (Technion) Winter 2022-2023.

Video:

YouTube - https://youtu.be/HsJHZepSWHU (in Hebrew).

Background

Image captioning is the task of generating short sentences that describe the content of an image. The goal of this project is to implement an encoder-decoder network for image captioning. The encoder is a pre-trained CNN, and for the decoder we used both LSTM and Transformer networks. The network is trained on the Flickr8k dataset.
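The decoding side of such a network can be pictured as a loop that emits one word at a time until an end marker appears. The sketch below is illustrative only and is not taken from the repository: `toy_decoder_step` is a hypothetical stand-in for a trained LSTM or Transformer decoder, the image features are made-up numbers standing in for the CNN encoder output, and greedy selection (always taking the highest-scoring word) is an assumed decoding strategy.

```python
from typing import Callable, Dict, List, Sequence

START, END = "<start>", "<end>"

# Hypothetical interface: (image features, tokens so far) -> score per candidate word.
StepFn = Callable[[Sequence[float], List[str]], Dict[str, float]]


def greedy_caption(features: Sequence[float], step: StepFn, max_len: int = 20) -> List[str]:
    """Generate a caption by repeatedly picking the highest-scoring next word."""
    tokens = [START]
    for _ in range(max_len):
        scores = step(features, tokens)
        next_token = max(scores, key=scores.get)
        if next_token == END:
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start marker


def toy_decoder_step(features: Sequence[float], tokens: List[str]) -> Dict[str, float]:
    """Stand-in for a real decoder: replays a fixed caption and ignores the features."""
    script = ["a", "dog", "runs", END]
    position = len(tokens) - 1  # number of words generated so far
    target = script[min(position, len(script) - 1)]
    return {word: (1.0 if word == target else 0.0) for word in script}
```

Calling `greedy_caption([0.1, 0.2], toy_decoder_step)` returns `['a', 'dog', 'runs']`. In a real model the step function would run the decoder on the encoder features and the tokens generated so far.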

Prerequisites

The full list of requirements is in the requirements.txt file. The required Python version is 3.10.9. To install the requirements, run:
pip install -r requirements.txt

Files in the repository

File name	Purpose
'data.py'	Data loader and additional scripts for the Flickr8k dataset.
'models.py'	All the models used in the project (Transformer, LSTM, ResNet-50).
'train.py'	Training script.
Example_Images	Folder with example images for the README.md file.
'LSTM_optuna.py'	Optuna hyperparameter tuning script for the LSTM model.
'Transformer_optuna.py'	Optuna hyperparameter tuning script for the Transformer model.
'Transformer_full.csv'	Results for the Transformer model during final training.
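As a rough illustration of the kind of preprocessing a caption data loader performs, the sketch below groups captions by image and builds a word-to-index vocabulary. It is not the code in 'data.py'. It assumes a CSV captions file with `image` and `caption` columns, and the function names, special tokens, and sample rows are all hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def load_captions(lines):
    """Group lower-cased captions by image file name (assumes 'image' and 'caption' columns)."""
    captions = defaultdict(list)
    for row in csv.DictReader(lines):
        captions[row["image"]].append(row["caption"].strip().lower())
    return dict(captions)


def build_vocab(captions, min_count=1):
    """Map special tokens and every sufficiently frequent word to an integer index."""
    counts = Counter(
        word for caps in captions.values() for cap in caps for word in cap.split()
    )
    specials = ["<pad>", "<start>", "<end>", "<unk>"]
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {token: index for index, token in enumerate(specials + words)}


def encode(caption, vocab):
    """Turn a caption into token ids, wrapped in start/end markers."""
    unk = vocab["<unk>"]
    ids = [vocab.get(word, unk) for word in caption.lower().split()]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]


# Made-up rows standing in for a real captions file.
sample = io.StringIO(
    "image,caption\n"
    "img1.jpg,A dog runs\n"
    "img1.jpg,A brown dog\n"
    "img2.jpg,Two kids play\n"
)
captions = load_captions(sample)
vocab = build_vocab(captions)
```

Words not seen when the vocabulary was built map to `<unk>`, so `encode("a dog sleeps", vocab)` still produces a valid id sequence.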

Results

The full results for the Transformer model training are in the 'Transformer_full.csv' file. To replicate the results, run train.py without changing the hyperparameters, seed, or model class.
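One way to inspect a per-epoch results file is to pick the row with the best value of a chosen metric. The sketch below is an assumption-laden illustration: the column layout of 'Transformer_full.csv' is not documented here, so the column names (`epoch`, `train_loss`, `val_loss`) and the numbers in the demo data are placeholders, not results from the project.

```python
import csv
import io


def best_epoch(lines, metric, higher_is_better=True):
    """Return the CSV row (as a dict of strings) with the best value of `metric`."""
    rows = list(csv.DictReader(lines))
    if not rows:
        raise ValueError("results file has no data rows")
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: float(row[metric]))


# Placeholder data with hypothetical column names, only to exercise the function.
demo = io.StringIO(
    "epoch,train_loss,val_loss\n"
    "1,3.2,3.5\n"
    "2,2.7,3.1\n"
    "3,2.4,3.3\n"
)
best = best_epoch(demo, "val_loss", higher_is_better=False)
```

For the real file, pass `open("Transformer_full.csv", newline="")` in place of `demo` and use a column name taken from its header row.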

Training

To train the model, clone the repository, select the model class (Transformer or LSTM), and set the required hyperparameters in the script (the optimal hyperparameters we used are already set there).

Examples

References:

https://www.kaggle.com/code/itaishufaro/flickr-30k-data-loader-preparation-pytorch/edit (the data loader is based on this script).

Repository: itaishufaro / encoder-decoder-image-captioning (GitHub)
