jityan / ssbigan

The code for "Text-to-image synthesis with self-supervised bi-stage generative adversarial network"

License: MIT License

Text-to-image synthesis with self-supervised bi-stage generative adversarial network

This repository provides the PyTorch code for the paper "Text-to-image synthesis with self-supervised bi-stage generative adversarial network" by Yong Xuan Tan, Chin Poo Lee, Mai Neo, Kian Ming Lim, and Jit Yan Lim.

Environment

The code was tested on Windows 10 with Anaconda3 and the following packages:

  • python 3.7.13
  • pytorch 1.4.0
  • torchvision 0.5
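
To confirm that a local environment matches the tested versions, a small check script can help. The following is a minimal sketch and not part of the repository: the helper names `matches_tested` and `installed_versions` are hypothetical, and versions are compared by prefix only, so "0.5" accepts "0.5.0".

```python
import platform

# Versions listed above as tested. Compared by prefix, so "0.5" accepts "0.5.0".
TESTED = {"python": "3.7.13", "torch": "1.4.0", "torchvision": "0.5"}


def matches_tested(installed, tested):
    """Return True if `installed` equals `tested` or extends it.

    An extension is a further dotted component ("0.5" -> "0.5.0") or a
    local build suffix ("1.4.0" -> "1.4.0+cpu").
    """
    if installed is None:
        return False
    return (
        installed == tested
        or installed.startswith(tested + ".")
        or installed.startswith(tested + "+")
    )


def installed_versions():
    """Collect installed versions; a package that cannot be imported maps to None."""
    found = {"python": platform.python_version()}
    for name in ("torch", "torchvision"):
        try:
            module = __import__(name)
            found[name] = getattr(module, "__version__", None)
        except ImportError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        status = "ok" if matches_tested(version, TESTED[name]) else "differs from tested"
        print(f"{name}: {version} ({status})")
```

A mismatch does not necessarily mean the code will fail; it only means the combination has not been tested by the authors.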

Dataset

We follow the same procedure and structure as SSTIS.

Download the preprocessed char-CNN-RNN text embeddings and the images for both the flowers and birds datasets. Place them in the ./data/oxford and ./data/cub folders, respectively.
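
Before training, it can be useful to verify that both dataset folders exist and are not empty. The following is a minimal sketch, assuming the script is run from the repository root; the function name `missing_data_dirs` is hypothetical, and the check does not validate the contents of the folders, since their internal layout is not described here.

```python
from pathlib import Path

# Folder names taken from the instructions above.
DATA_DIRS = ("data/oxford", "data/cub")


def missing_data_dirs(root="."):
    """Return the expected dataset folders that are absent or empty under `root`."""
    missing = []
    for rel in DATA_DIRS:
        path = Path(root) / rel
        # A folder counts as missing if it does not exist or contains nothing.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing or empty dataset folders:", ", ".join(missing))
    else:
        print("Both dataset folders are present.")
```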

Experiments

To train on Oxford:

python main.py --dataset flowers --exp_num oxford_exp

To evaluate on Oxford:

python main.py --dataset flowers --exp_num oxford_exp --is_test true
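
The two commands above differ only in the `--is_test true` flag, so they can be driven from a small wrapper when scripting several runs. The following is a sketch under stated assumptions: `build_command` and `run` are hypothetical helpers, not part of the repository, and they use only the `--dataset`, `--exp_num`, and `--is_test` flags shown above. The `--dataset` value for CUB is not given here, so it is left to the caller.

```python
import subprocess
import sys


def build_command(dataset, exp_num, is_test=False):
    """Build the argument list for main.py using only the flags shown above."""
    cmd = [sys.executable, "main.py", "--dataset", dataset, "--exp_num", exp_num]
    if is_test:
        # Evaluation uses the same command with the test flag appended.
        cmd += ["--is_test", "true"]
    return cmd


def run(dataset, exp_num, is_test=False):
    """Run main.py from the repository root and raise if it exits with an error."""
    subprocess.run(build_command(dataset, exp_num, is_test), check=True)
```

Calling `run("flowers", "oxford_exp")` followed by `run("flowers", "oxford_exp", is_test=True)` from the repository root would train and then evaluate on Oxford, mirroring the two commands above.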

Pre-trained Models

Download the pretrained models and extract them to the saved_model folder.

[Figure: examples generated by SSBi-GAN]

Citation

If you find this repo useful for your research, please consider citing the paper:

@article{TAN202343,
  title = {Text-to-image synthesis with self-supervised bi-stage generative adversarial network},
  journal = {Pattern Recognition Letters},
  volume = {169},
  pages = {43-49},
  year = {2023},
  issn = {0167-8655},
  doi = {https://doi.org/10.1016/j.patrec.2023.03.023},
  url = {https://www.sciencedirect.com/science/article/pii/S0167865523000880},
  author = {Yong Xuan Tan and Chin Poo Lee and Mai Neo and Kian Ming Lim and Jit Yan Lim},
  keywords = {Text-to-image-synthesis, Generative adversarial network, Self-supervised learning, GAN},
  abstract = {Text-to-image synthesis is challenging as generating images that are visually realistic and semantically consistent with the given text description involves multi-modal learning with text and image. To address the challenges, this paper presents a text-to-image synthesis model that utilizes self-supervision and bi-stage image distribution architecture, referred to as the Self-Supervised Bi-Stage Generative Adversarial Network (SSBi-GAN). The self-supervision diversifies the learned representation thus improving the quality of the synthesized images. Besides that, the bi-stage architecture with Residual network enables the generation of larger images with finer visual contents. Not only that, some enhancements including L1 distance, one-sided smoothing and feature matching are incorporated to enhance the visual realism and semantic consistency of the images as well as the training stability of the model. The empirical results on Oxford-102 and CUB datasets corroborate the ability of the proposed SSBi-GAN in generating visually realistic and semantically consistent images.}
}

Contacts

For any questions, please contact:

Yong Xuan Tan ([email protected])
Jit Yan Lim ([email protected])

Acknowledgements

License

This code is released under the MIT License (refer to the LICENSE file for details).
