The task we focus on is text-to-image generation: given a text description, the model should generate a realistic image that semantically matches it. We built a mobile app as the frontend and a server based on the ControlGAN model as the backend.
Input: a sentence describing the desired generated image. Output: an image that semantically matches the given description.
- React-Native App
- Pretrained DAMSM and ControlGAN models
- One archive file containing source code and datasets for training and deploying the backend
- One Docker image ready to deploy
- One Dockerfile
Here we use ControlGAN as our backbone network to generate high-quality, controllable images from user input. The structure of ControlGAN is as follows.
- DAMSM for bird. Download and save it to `DAMSMencoders/`
- DAMSM for coco. Download and save it to `DAMSMencoders/`
- ControlGAN for bird. Download and save it to `models/`
- ControlGAN for coco. Download and save it to `models/`
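Once the archives are downloaded, a quick sanity check that they landed in the right folders can save a failed training run later. A minimal sketch (the helper name and the non-empty-folder heuristic are ours, not part of the repository; the exact checkpoint file names depend on which archives you downloaded, so only the directories named above are checked):

```python
# Sketch: verify that the pretrained checkpoints were saved where the
# training/eval configs expect them. Directory names come from the
# download instructions above; we only check that each exists and is
# non-empty, since checkpoint file names vary by archive.
import os

def check_pretrained_dirs(root="."):
    """Map each expected directory to whether it exists and holds files."""
    status = {}
    for d in ("DAMSMencoders", "models"):
        path = os.path.join(root, d)
        status[d] = os.path.isdir(path) and len(os.listdir(path)) > 0
    return status
```

Calling `check_pretrained_dirs()` from the repository root should report both folders as present before you start training or evaluation.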
To train a ControlGAN, before executing the commands in the instructions below, unzip the archive file and enter the folder `code`.
The DAMSM model consists of a text encoder and an image encoder.
- Pre-train DAMSM model for bird dataset:
python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0
- Pre-train DAMSM model for coco dataset:
python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 0
- Train ControlGAN model for bird dataset:
python main.py --cfg cfg/train_bird.yml --gpu 0
- Train ControlGAN model for coco dataset:
python main.py --cfg cfg/train_coco.yml --gpu 0
`*.yml` files contain the configuration for training and testing.
- Test ControlGAN model for bird dataset:
python main.py --cfg cfg/eval_bird.yml --gpu 0
- Test ControlGAN model for coco dataset:
python main.py --cfg cfg/eval_coco.yml --gpu 0
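The four train/test invocations above differ only in the config file, so a small wrapper can assemble the right command line. A sketch (`build_command` and the `CONFIGS` table are our own convenience helper, not part of the repository; the file names mirror the commands listed above):

```python
# Sketch: assemble the main.py command line for a given mode/dataset,
# so the --cfg/--gpu flags don't have to be retyped. The cfg paths are
# the ones listed in this README.
CONFIGS = {
    ("train", "bird"): "cfg/train_bird.yml",
    ("train", "coco"): "cfg/train_coco.yml",
    ("eval", "bird"): "cfg/eval_bird.yml",
    ("eval", "coco"): "cfg/eval_coco.yml",
}

def build_command(mode, dataset, gpu=0):
    """Return the argv list for main.py, e.g. to pass to subprocess.run."""
    cfg = CONFIGS[(mode, dataset)]
    return ["python", "main.py", "--cfg", cfg, "--gpu", str(gpu)]
```

For example, `build_command("eval", "coco")` reproduces the last command above; pass the list to `subprocess.run` to launch it.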
The backend server can be set up on any computer with at least one GPU. First, pull the Docker image:
docker pull mayukuner/text2img
and then run it with:
docker run --gpus all -p 5000:5000 mayukuner/text2img
Note that you can also build the Docker image yourself from the Dockerfile. Before doing so, you will have to download the zip file from the link provided in download_code_and_data.txt.
The server will then be up and running at
localhost:5000
To generate an image from a sentence, send a GET request with three parameters to the server:
localhost:5000/generate?dataset=<dataset>&sentence=<sentence>&highlight=<word>
parameter | definition | example
---|---|---
sentence | The sentence to generate the image from | a herd of cows that are grazing on the grass
dataset | The dataset the model was trained on | COCO
highlight | The highlighted word whose attention map will be masked onto the original image | herd
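Because the sentence contains spaces, it must be URL-encoded in the query string. A minimal Python sketch of building the request URL (the helper name is ours; the host, endpoint, and parameter names are taken from this README):

```python
# Sketch: build the /generate GET URL with proper percent-encoding.
# Parameter names (dataset, sentence, highlight) and the default host
# come from this README.
from urllib.parse import urlencode

def build_generate_url(sentence, dataset, highlight, host="localhost:5000"):
    """Return the full GET URL for the /generate endpoint."""
    query = urlencode({
        "dataset": dataset,
        "sentence": sentence,
        "highlight": highlight,
    })
    return f"http://{host}/generate?{query}"

# Example from the table above:
url = build_generate_url(
    "a herd of cows that are grazing on the grass", "COCO", "herd")
```

`urlencode` takes care of escaping spaces and any other special characters in the sentence.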
The server will respond with a JsonResponse in the form of:
{
  "image_url": <image_url>
}
where `<image_url>` is the URL of the generated image; requesting that URL returns the image file.
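The full client flow is therefore two requests: one to /generate for the JSON response, and one to the returned URL for the image itself. A sketch using only the standard library (the function names are ours; the network part assumes the server from this README is running on localhost:5000):

```python
# Sketch of the two-step client flow described above: parse the JSON
# response from /generate, then fetch the image at image_url.
import json
from urllib.request import urlopen

def extract_image_url(response_body):
    """Pull image_url out of the server's JSON response."""
    return json.loads(response_body)["image_url"]

def fetch_generated_image(api_url, out_path="generated.png"):
    """Request generation, then download the image the server points to."""
    with urlopen(api_url) as resp:        # step 1: hit /generate
        image_url = extract_image_url(resp.read())
    with urlopen(image_url) as img:       # step 2: download the image
        with open(out_path, "wb") as f:
            f.write(img.read())
    return image_url
```

With the server running, `fetch_generated_image("http://localhost:5000/generate?dataset=COCO&sentence=a+herd+of+cows+that+are+grazing+on+the+grass&highlight=herd")` saves the generated image locally.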
The corresponding output for the example input in the above table is:
Our React-Native app runs on Expo:
cd text2image_app
yarn install
expo start
Download an Expo Client and open it on your device. Scan the QR code printed by `expo start` with Expo Client (Android) or the Camera app (iOS). You may have to wait a minute while the project bundles and loads for the first time.
- Lee, Minhyeok, and Junhee Seok. "Controllable generative adversarial network." IEEE Access 7 (2019): 28158-28169.
- Xu, Tao, et al. "Attngan: Fine-grained text to image generation with attentional generative adversarial networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.