This project forked from gaparmar/img2img-turbo

One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more

License: MIT License

img2img-turbo

Paper | Sketch2Image Demo

Cat Sketching

Fish Sketching

We propose a general method for adapting a single-step diffusion model, such as SD-Turbo, to new tasks and domains through adversarial learning. This enables us to leverage the internal knowledge of pre-trained diffusion models while achieving efficient inference (e.g., for 512x512 images, 0.29 seconds on A6000 and 0.11 seconds on A100).

Our one-step conditional models CycleGAN-Turbo and pix2pix-turbo can perform various image-to-image translation tasks for both unpaired and paired settings. CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods, while pix2pix-turbo is on par with recent works such as ControlNet for Sketch2Photo and Edge2Image, but with one-step inference.

One-Step Image Translation with Text-to-Image Models
Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, Jun-Yan Zhu
CMU and Adobe, arXiv 2403.12036


Results

Paired Translation with pix2pix-turbo

Edge to Image

Generating Diverse Outputs

By varying the input noise map, our method can generate diverse outputs from the same input conditioning. The output style can be controlled by changing the text prompt.
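As a rough illustration of the idea, and not the repository's actual code, the sketch below draws a different seeded noise map for each sample and mixes it with a fixed conditioning signal. The helper names `make_noise_map` and `blend`, the linear mixing rule, and the `weight` value are all assumptions made for this example.

```python
import random


def make_noise_map(height, width, seed):
    """Draw a reproducible Gaussian noise map (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


def blend(conditioning, noise_map, weight):
    """Linearly mix conditioning and noise (an assumed rule, for illustration).

    weight=1.0 keeps only the conditioning; weight=0.0 keeps only the noise.
    """
    return [
        [weight * c + (1.0 - weight) * n for c, n in zip(c_row, n_row)]
        for c_row, n_row in zip(conditioning, noise_map)
    ]


# Toy 2x2 "conditioning" standing in for an encoded input image.
conditioning = [[1.0, 0.0], [0.0, 1.0]]

# Same conditioning, three different noise maps: three different generator inputs.
generator_inputs = [
    blend(conditioning, make_noise_map(2, 2, seed), weight=0.4)
    for seed in (0, 1, 2)
]
```

Fixing the seed makes a given sample reproducible, while changing it yields a new variation for the same input.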

Unpaired Translation with CycleGAN-Turbo

Day to Night

Night to Day

Clear to Rainy

Rainy to Clear


Method

Our Generator Architecture: We tightly integrate three separate modules of the original latent diffusion model into a single end-to-end network with a small number of trainable weights. This architecture allows us to translate the input image x to the output y while retaining the input scene structure. We use LoRA adapters in each module, introduce skip connections and Zero-Convs between input and output, and retrain the first layer of the U-Net. Blue boxes indicate trainable layers. Semi-transparent layers are frozen. The same generator can be used for various GAN objectives.
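The two building blocks named above can be sketched in a few lines of plain Python. This is a toy illustration of the general techniques, not the project's implementation: a LoRA adapter adds a low-rank update `B @ A` to a frozen weight, and a zero-initialised gate stands in for a Zero-Conv on a skip connection. The function names, the tiny shapes, and the scalar gate are assumptions for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w_frozen, lora_b, lora_a, scale=1.0):
    """Effective weight W + scale * (B @ A); only B and A would be trained."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w_frozen, delta)
    ]


def skip_with_zero_gate(decoder_feat, encoder_feat, gate):
    """Add encoder features to decoder features through a learnable gate.

    A 1x1 convolution on a single channel reduces to a scalar multiply, so a
    scalar gate initialised to 0.0 plays the role of a Zero-Conv in this toy.
    """
    return [d + gate * e for d, e in zip(decoder_feat, encoder_feat)]


w = [[1.0, 2.0], [3.0, 4.0]]   # frozen 2x2 weight
a = [[0.5, -0.5]]              # rank-1 factor A (1x2)
b_init = [[0.0], [0.0]]        # rank-1 factor B (2x1), zero at initialisation

# At initialisation the adapter and the gated skip leave the network unchanged.
w_at_init = lora_weight(w, b_init, a)
skip_at_init = skip_with_zero_gate([1.0, 2.0], [10.0, 20.0], gate=0.0)

# After some hypothetical training steps, B is non-zero and the weight shifts.
w_trained = lora_weight(w, [[1.0], [2.0]], a)
```

Starting both additions at zero means the adapted network initially behaves exactly like the pre-trained one, which is the usual motivation for these initialisations.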

Getting Started

Environment Setup

  • We provide a conda env file that contains all the required dependencies.
    conda env create -f environment.yaml
    
  • Following this, you can activate the conda environment with the command below.
    conda activate img2img-turbo
    

Paired Image Translation (pix2pix-turbo)

  • The following command takes an image file and a prompt as inputs, extracts the Canny edges, and saves the results in the specified directory.

    python src/inference_paired.py --model "edge_to_image" \
        --input_image "assets/bird.png" \
        --prompt "a blue bird" \
        --output_dir "outputs"
  • The following command takes a sketch and a prompt as inputs, and saves the results in the specified directory.

    python src/inference_paired.py --model "sketch_to_image_stochastic" \
        --input_image "assets/sketch.png" --gamma 0.4 \
        --prompt "ethereal fantasy concept art of an asteroid. magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy" \
        --output_dir "outputs"
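To run the paired script over several settings, the command line can be assembled programmatically. The sketch below only builds and prints the commands; it uses the flags shown above (`--model`, `--input_image`, `--prompt`, `--output_dir`, `--gamma`), while the helper `build_paired_command`, the prompt, the swept `--gamma` values, and the per-run output directories are assumptions made for illustration.

```python
import shlex


def build_paired_command(model, input_image, prompt, output_dir, gamma=None):
    """Return an argv list for src/inference_paired.py (hypothetical helper)."""
    argv = [
        "python", "src/inference_paired.py",
        "--model", model,
        "--input_image", input_image,
        "--prompt", prompt,
        "--output_dir", output_dir,
    ]
    # --gamma appears only in the sketch example, so it is optional here.
    if gamma is not None:
        argv += ["--gamma", str(gamma)]
    return argv


# One command per gamma value, each writing to its own (assumed) directory.
commands = [
    build_paired_command(
        "sketch_to_image_stochastic",
        "assets/sketch.png",
        "a blue bird",
        f"outputs/gamma_{gamma}",
        gamma=gamma,
    )
    for gamma in (0.2, 0.4, 0.6)
]

for argv in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` from the repository root to launch the run.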

Unpaired Image Translation (CycleGAN-Turbo)

  • The following command takes an image file as input and saves the results in the specified directory.
    python src/inference_unpaired.py --model "day_to_night" \
        --input_image "assets/day.png" --output_dir "outputs"

Gradio Demo

  • We provide a Gradio demo for the paired image translation tasks.
  • The following command launches the sketch-to-image demo locally using Gradio.
    gradio gradio_sketch2image.py
    

Acknowledgment

Our work uses Stable Diffusion-Turbo as the base model, which is distributed under its own LICENSE.

Contributors

gaparmar, junyanz
