text2image's Introduction

Controllable Text-to-Image Generation

Text to Image -- The Task

The task we focus on is text-to-image generation: the model should generate realistic images that semantically match a given text description. We built a mobile app as the frontend and run a server based on the ControlGAN model as the backend.

Input: A sentence describing the desired image.

Output: An image that semantically matches the given text description.

Deliverables

Backbone Model

Here we use ControlGAN as our backbone network to generate high-quality, controllable images from user input. The structure of ControlGAN is as follows.

Pretrained models

Pretrained DAMSM Model

Pretrained ControlGAN Model

Training Phase

To train ControlGAN, first unzip the archive file and change into the code folder, then run the commands below.

The DAMSM model includes a text encoder and an image encoder.

  • Pre-train DAMSM model for bird dataset:
python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0
  • Pre-train DAMSM model for coco dataset:
python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 0

ControlGAN model

  • Train ControlGAN model for bird dataset:
python main.py --cfg cfg/train_bird.yml --gpu 0
  • Train ControlGAN model for coco dataset:
python main.py --cfg cfg/train_coco.yml --gpu 0

The *.yml files contain the configuration for training and testing.
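The training commands above follow a regular pattern, so they can be generated per dataset. The sketch below is only an illustration: `training_commands` and `run_training` are hypothetical helper names, and it assumes the script is run from inside the code folder, that `python` is on the PATH, and that DAMSM pre-training is run before ControlGAN training (the order in which the commands are listed). If you use the pretrained DAMSM model instead, the first command can be skipped.

```python
import subprocess

# Datasets for which the README lists configuration files.
DATASETS = ("bird", "coco")


def training_commands(dataset, gpu=0):
    """Return the command lines listed in the README for one dataset.

    The first entry pre-trains DAMSM, the second trains ControlGAN.
    Running them in this order is an assumption of this sketch.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    gpu_arg = str(gpu)
    return [
        ["python", "pretrain_DAMSM.py", "--cfg", f"cfg/DAMSM/{dataset}.yml", "--gpu", gpu_arg],
        ["python", "main.py", "--cfg", f"cfg/train_{dataset}.yml", "--gpu", gpu_arg],
    ]


def run_training(dataset, gpu=0):
    """Run both commands in sequence, stopping at the first failure."""
    for cmd in training_commands(dataset, gpu):
        subprocess.run(cmd, check=True)
```

Calling `run_training("bird")` would then launch both steps on GPU 0. The ControlGAN configuration may still need to point at the encoder produced by the first step; that detail lives in the *.yml files and is not covered here.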

Testing Phase

  • Test ControlGAN model for bird dataset:
python main.py --cfg cfg/eval_bird.yml --gpu 0
  • Test ControlGAN model for coco dataset:
python main.py --cfg cfg/eval_coco.yml --gpu 0

Text To Image -- The App

Server side

The backend server can be set up on any machine with at least one GPU. First, pull the Docker image:

docker pull mayukuner/text2img

Then start a container from the image:

docker run --gpus all -p 5000:5000 mayukuner/text2img

Alternatively, you can build the Docker image yourself from the Dockerfile. Before doing so, download the zip file from the link provided in download_code_and_data.txt.

Once the container is running, the server is available at:

localhost:5000
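The container can take a while to start accepting connections, so a client script may want to wait until the port is reachable. The following is a minimal sketch using only the standard library; `is_port_open` and `wait_for_server` are hypothetical helper names, and the retry count and delay are arbitrary assumptions. It only checks that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def is_port_open(host="localhost", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_server(host="localhost", port=5000, attempts=30, delay=1.0):
    """Poll the port until it accepts connections or attempts run out."""
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_for_server()` could be called right after `docker run` and before sending the first request.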

To generate an image from a sentence, send a GET request with three parameters to the server:

localhost:5000/generate?dataset=<dataset>&sentence=<sentence>&highlight=<word>

parameter | definition | example
sentence | The sentence to generate the image | a herd of cows that are grazing on the grass
dataset | The dataset that the model is trained on | COCO
highlight | The highlighted word whose attention map will be masked on the original image | herd
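Because the sentence contains spaces, the query string has to be URL-encoded before the request is sent. The sketch below builds the request URL with the standard library; `build_generate_url` is a hypothetical helper name, and the `http://` scheme in `BASE_URL` is an assumption, since the README gives only `localhost:5000`.

```python
from urllib.parse import urlencode

# Assumed base address; the README only states "localhost:5000".
BASE_URL = "http://localhost:5000"


def build_generate_url(dataset, sentence, highlight, base_url=BASE_URL):
    """Build the /generate URL with the three documented query parameters."""
    query = urlencode({
        "dataset": dataset,
        "sentence": sentence,
        "highlight": highlight,
    })
    return f"{base_url}/generate?{query}"


print(build_generate_url("COCO", "a herd of cows that are grazing on the grass", "herd"))
```

`urlencode` replaces each space with `+`, which is the standard encoding for query strings.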

The server will respond with a JsonResponse in the form of:

{
    "image_url": <image_url>
}

Here <image_url> is the URL of the generated image. Requesting that URL returns the image file.
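The two-step exchange (request the JSON, then request the image) can be sketched as follows. This is an illustration under assumptions, not the project's client code: `extract_image_url` and `download_image` are hypothetical names, the base address and the output file name `generated.png` are assumptions, and `urljoin` is used because the README does not say whether `image_url` is absolute or relative to the server.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def extract_image_url(response_body, base_url="http://localhost:5000"):
    """Read "image_url" from the JSON body and resolve it against the server.

    urljoin leaves an absolute URL unchanged and resolves a relative one,
    so both cases are handled.
    """
    payload = json.loads(response_body)
    return urljoin(base_url, payload["image_url"])


def download_image(generate_url, out_path="generated.png",
                   base_url="http://localhost:5000"):
    """Call the /generate endpoint, then fetch and save the generated image."""
    with urlopen(generate_url) as resp:
        body = resp.read().decode("utf-8")
    image_url = extract_image_url(body, base_url)
    with urlopen(image_url) as resp, open(out_path, "wb") as fh:
        fh.write(resp.read())
    return out_path
```

`download_image` needs a running server, while `extract_image_url` can be tried on a sample body such as `'{"image_url": "/static/out.png"}'`, where the path is made up for the example.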

The corresponding output for the example input in the above table is:

Client side

Our React Native app runs on Expo.

cd text2image_app
yarn install
expo start

Download the Expo Client app and open it on your device. Scan the QR code printed by expo start with Expo Client (Android) or the Camera app (iOS). The project may take a minute to bundle and load the first time.

Reference

  • Lee, Minhyeok, and Junhee Seok. "Controllable generative adversarial network." IEEE Access 7 (2019): 28158-28169.
  • Xu, Tao, et al. "Attngan: Fine-grained text to image generation with attentional generative adversarial networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
