The task we focus on is text-to-image generation: given a text description, the model should generate a realistic image that semantically matches it. We built a mobile app as the frontend and a server based on the ControlGAN model as the backend.
Input: a sentence describing the desired generated image. Output: an image that semantically matches the given description.
- React-Native App
- Pretrained DAMSM and ControlGAN models
- One archive file containing source code and datasets for training and deploying the backend
- One Docker image ready to deploy
- One Dockerfile
Here we use ControlGAN as our backbone network to generate high-quality, controllable images from user input. The structure of ControlGAN is as follows.
- DAMSM for bird. Download and save it to `DAMSMencoders/`
- DAMSM for coco. Download and save it to `DAMSMencoders/`
- ControlGAN for bird. Download and save it to `models/`
- ControlGAN for coco. Download and save it to `models/`
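Once the archives are downloaded, a quick sanity check that they landed in the right folders can save a failed training run later. A minimal sketch (the helper name and the non-empty-folder heuristic are ours, not part of the repository; the exact checkpoint file names depend on which archives you downloaded, so only the directories named above are checked):

```python
# Sketch: verify that the pretrained checkpoints were saved where the
# training/eval configs expect them. Directory names come from the
# download instructions above; we only check that each exists and is
# non-empty, since checkpoint file names vary by archive.
import os

def check_pretrained_dirs(root="."):
    """Map each expected directory to whether it exists and holds files."""
    status = {}
    for d in ("DAMSMencoders", "models"):
        path = os.path.join(root, d)
        status[d] = os.path.isdir(path) and len(os.listdir(path)) > 0
    return status
```

Calling `check_pretrained_dirs()` from the repository root should report both folders as present before you start training or evaluation.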
To train a ControlGAN, before executing the commands in the instructions below, unzip the archive file and enter the folder `code`.
The DAMSM model consists of a text encoder and an image encoder.
- Pre-train DAMSM model for bird dataset:
python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0
- Pre-train DAMSM model for coco dataset:
python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 0
- Train ControlGAN model for bird dataset:
python main.py --cfg cfg/train_bird.yml --gpu 0
- Train ControlGAN model for coco dataset:
python main.py --cfg cfg/train_coco.yml --gpu 0
`*.yml` files contain the configuration for training and testing.
- Test ControlGAN model for bird dataset:
python main.py --cfg cfg/eval_bird.yml --gpu 0
- Test ControlGAN model for coco dataset:
python main.py --cfg cfg/eval_coco.yml --gpu 0
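The four train/test invocations above differ only in the config file, so a small wrapper can assemble the right command line. A sketch (`build_command` and the `CONFIGS` table are our own convenience helper, not part of the repository; the file names mirror the commands listed above):

```python
# Sketch: assemble the main.py command line for a given mode/dataset,
# so the --cfg/--gpu flags don't have to be retyped. The cfg paths are
# the ones listed in this README.
CONFIGS = {
    ("train", "bird"): "cfg/train_bird.yml",
    ("train", "coco"): "cfg/train_coco.yml",
    ("eval", "bird"): "cfg/eval_bird.yml",
    ("eval", "coco"): "cfg/eval_coco.yml",
}

def build_command(mode, dataset, gpu=0):
    """Return the argv list for main.py, e.g. to pass to subprocess.run."""
    cfg = CONFIGS[(mode, dataset)]
    return ["python", "main.py", "--cfg", cfg, "--gpu", str(gpu)]
```

For example, `build_command("eval", "coco")` reproduces the last command above; pass the list to `subprocess.run` to launch it.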
The backend server can be set up on any computer with at least one GPU. First, pull the Docker image:
docker pull mayukuner/text2img
and then run it with:
docker run --gpus all -p 5000:5000 mayukuner/text2img
Note that you can also build the Docker image yourself from the Dockerfile. Before doing so, you will have to download the zip file from the link provided in download_code_and_data.txt.
The server will then be up and running at
localhost:5000
To generate an image from a sentence, send a GET request with three parameters to the server:
localhost:5000/generate?dataset=<dataset>&sentence=<sentence>&highlight=<word>
parameter | definition | example
---|---|---
sentence | The sentence to generate the image from | a herd of cows that are grazing on the grass
dataset | The dataset the model was trained on | COCO
highlight | The highlighted word whose attention map will be masked onto the original image | herd
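Because the sentence contains spaces, it must be URL-encoded in the query string. A minimal Python sketch of building the request URL (the helper name is ours; the host, endpoint, and parameter names are taken from this README):

```python
# Sketch: build the /generate GET URL with proper percent-encoding.
# Parameter names (dataset, sentence, highlight) and the default host
# come from this README.
from urllib.parse import urlencode

def build_generate_url(sentence, dataset, highlight, host="localhost:5000"):
    """Return the full GET URL for the /generate endpoint."""
    query = urlencode({
        "dataset": dataset,
        "sentence": sentence,
        "highlight": highlight,
    })
    return f"http://{host}/generate?{query}"

# Example from the table above:
url = build_generate_url(
    "a herd of cows that are grazing on the grass", "COCO", "herd")
```

`urlencode` takes care of escaping spaces and any other special characters in the sentence.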
The server will respond with a JsonResponse in the form of:
{
  "image_url": <image_url>
}
where `<image_url>` is the URL of the generated image; requesting that URL returns the image file.
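The full client flow is therefore two requests: one to /generate for the JSON response, and one to the returned URL for the image itself. A sketch using only the standard library (the function names are ours; the network part assumes the server from this README is running on localhost:5000):

```python
# Sketch of the two-step client flow described above: parse the JSON
# response from /generate, then fetch the image at image_url.
import json
from urllib.request import urlopen

def extract_image_url(response_body):
    """Pull image_url out of the server's JSON response."""
    return json.loads(response_body)["image_url"]

def fetch_generated_image(api_url, out_path="generated.png"):
    """Request generation, then download the image the server points to."""
    with urlopen(api_url) as resp:        # step 1: hit /generate
        image_url = extract_image_url(resp.read())
    with urlopen(image_url) as img:       # step 2: download the image
        with open(out_path, "wb") as f:
            f.write(img.read())
    return image_url
```

With the server running, `fetch_generated_image("http://localhost:5000/generate?dataset=COCO&sentence=a+herd+of+cows+that+are+grazing+on+the+grass&highlight=herd")` saves the generated image locally.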
The corresponding output for the example input in the above table is:
Our React-Native app runs on Expo:
cd text2image_app
yarn install
expo start
Download an Expo Client and open it on your device. Scan the QR code printed by `expo start` with Expo Client (Android) or the Camera app (iOS). You may have to wait a minute while the project bundles and loads for the first time.
- Lee, Minhyeok, and Junhee Seok. "Controllable generative adversarial network." IEEE Access 7 (2019): 28158-28169.
- Xu, Tao, et al. "Attngan: Fine-grained text to image generation with attentional generative adversarial networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.