
Improved StyleGAN Embedding: Where are the Good Latents?

Peihao Zhu, Rameen Abdal, Yipeng Qin, John Femiani, Peter Wonka

arXiv | BibTeX | Video

Abstract

StyleGAN is able to produce photorealistic images that are almost indistinguishable from real photos. The reverse problem of finding an embedding for a given image poses a challenge. Embeddings that reconstruct an image well are not always robust to editing operations. In this paper, we address the problem of finding an embedding that both reconstructs images and also supports image editing tasks. First, we introduce a new normalized space to analyze the diversity and the quality of the reconstructed latent codes. This space can help answer the question of where good latent codes are located in latent space. Second, we propose an improved embedding algorithm using a novel regularization method based on our analysis. Finally, we analyze the quality of different embedding algorithms. We compare our results with the current state-of-the-art methods and achieve a better trade-off between reconstruction quality and editing quality.

Description

Official Implementation of "Improved StyleGAN Embedding: Where are the Good Latents?".

Getting Started

Prerequisites

  • Linux or macOS
  • NVIDIA GPU + CUDA CuDNN
  • Python 3

Installation

  • Clone the repository:
git clone https://github.com/ZPdesu/II2S.git
cd II2S
  • Dependencies: We recommend running this repository using Anaconda. All dependencies for defining the environment are provided in environment/environment.yml.

Pretrained Models

If the automatic download doesn't work, please download the pre-trained models from the following links.

Path                  Description
FFHQ StyleGAN         StyleGAN model pretrained on FFHQ with 1024x1024 output resolution.
Metfaces StyleGAN     StyleGAN model pretrained on Metfaces with 1024x1024 output resolution.
AFHQ-Dog StyleGAN     StyleGAN model pretrained on AFHQ-Dog with 512x512 output resolution.
AFHQ-Cat StyleGAN     StyleGAN model pretrained on AFHQ-Cat with 512x512 output resolution.
AFHQ-Wild StyleGAN    StyleGAN model pretrained on AFHQ-Wild with 512x512 output resolution.
Face Landmark Model   Face landmark model used in dlib.

By default, we assume that all models are downloaded and saved to the directory pretrained_models.
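Before running, it can help to verify that the downloads actually landed in pretrained_models. A minimal sketch (the file names below are illustrative placeholders, not necessarily the repo's actual checkpoint names):

```python
from pathlib import Path

MODEL_DIR = Path("pretrained_models")
# Illustrative file names only; match these to the checkpoints you downloaded.
expected = [
    "ffhq.pt", "metfaces.pt", "afhqdog.pt", "afhqcat.pt", "afhqwild.pt",
    "shape_predictor_68_face_landmarks.dat",
]

def missing_models(model_dir=MODEL_DIR, names=expected):
    """Return the subset of expected model files that are not present on disk."""
    return [n for n in names if not (Path(model_dir) / n).is_file()]
```

Running `missing_models()` before embedding gives a quick, explicit error list instead of a failure deep inside model loading.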

Embedding

To embed images, first make sure the hyperparameters are configured in options/face_embed_options.py, then run:

python main.py --input_dir XXX --output_dir XXX
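The command above implies that main.py parses at least these two flags. A minimal sketch of such a parser (an assumption for illustration; the real script may define additional options alongside options/face_embed_options.py):

```python
import argparse

def build_parser():
    # Mirrors the command line shown above; see main.py for the actual flags.
    p = argparse.ArgumentParser(
        description="Embed images into StyleGAN latent space with II2S")
    p.add_argument("--input_dir", required=True,
                   help="folder of aligned 1024x1024 input images")
    p.add_argument("--output_dir", required=True,
                   help="where reconstructions and latents are written")
    return p
```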

Different input formats

main.py accepts several input formats. Adjust the invert_images call as follows:

  1. Input folder
ii2s.invert_images(image_path=args.input_dir, output_dir=args.output_dir)
  2. Single image path
ii2s.invert_images(image_path='input/28.jpg', output_dir=args.output_dir)
  3. List of image paths
ii2s.invert_images(image_path=['input/28.jpg', 'input/90.jpg'], output_dir=args.output_dir)
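Internally, invert_images must normalize these three input types to a single list of files. A hypothetical helper showing one way to do that (a sketch, not the repo's actual code):

```python
from pathlib import Path

def resolve_image_paths(image_path):
    """Normalize a directory, a single path, or a list of paths
    to a sorted list of image file paths."""
    exts = {".jpg", ".jpeg", ".png"}
    if isinstance(image_path, (list, tuple)):
        return list(image_path)
    p = Path(image_path)
    if p.is_dir():
        # Directory input: collect every image file inside it.
        return sorted(str(f) for f in p.iterdir() if f.suffix.lower() in exts)
    # Single file input.
    return [image_path]
```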

Save output and return latents

To save the output and return the latent codes, make the following adjustment in main.py.

final_latents = ii2s.invert_images(image_path=args.input_dir, output_dir=args.output_dir, return_latents=True, save_output=True)
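The on-disk format of the returned latents is not specified here; a common pattern is to stack per-image W+ codes into one NumPy array. The shapes and file name below are assumptions for illustration (18 x 512 per image for a 1024x1024 StyleGAN2):

```python
import numpy as np

# Stand-in for final_latents: two hypothetical W+ codes of shape (18, 512).
latents = [np.random.randn(18, 512).astype(np.float32) for _ in range(2)]

# Stack to (num_images, 18, 512) and save; file name is illustrative.
np.save("final_latents.npy", np.stack(latents))

loaded = np.load("final_latents.npy")
```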

Align input images

By default, the input should be aligned images at 1024x1024 resolution. To use unprocessed images, run align_face.py to align them and save them to another folder, then make the following modification in main.py.

final_latents = ii2s.invert_images(image_path=args.input_dir, output_dir=args.output_dir, return_latents=True, save_output=True, align_input=False)

Alternatively, the alignment step can be performed during the embedding process by setting align_input=True.

final_latents = ii2s.invert_images(image_path=args.input_dir, output_dir=args.output_dir, return_latents=True, save_output=True, align_input=True)
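The two calls above differ only in align_input. As a hypothetical convenience wrapper (not part of the repo), the flag can be driven by whether the inputs were already aligned with align_face.py:

```python
def embed(ii2s, image_path, output_dir, pre_aligned):
    """Hypothetical wrapper: skip II2S's alignment step when the inputs are
    already aligned 1024x1024 crops, otherwise align during embedding."""
    return ii2s.invert_images(
        image_path=image_path,
        output_dir=output_dir,
        return_latents=True,
        save_output=True,
        align_input=not pre_aligned,
    )
```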

BibTeX

@misc{zhu2020improved,
    title={Improved StyleGAN Embedding: Where are the Good Latents?},
    author={Peihao Zhu and Rameen Abdal and Yipeng Qin and John Femiani and Peter Wonka},
    year={2020},
    eprint={2012.09036},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}


ii2s's Issues

About the latent space

Great work! I'm wondering: is it reasonable to choose latent_p_norm as the initial latent for optimization?

A question about the paper

Hi, thank you for your wonderful work.
I have a question about Section 4.1 of your paper. You propose three versions of the embedding and mention that the supplementary material explains why you focus only on Equation 9. However, I haven't found the related explanation. Could you tell me the reason?

questions about pca

Great job! Does the *_PCA.npz file need to be regenerated for our own trained StyleGAN?

Some questions about II2S

First of all, thank you for presenting this marvelous work.

I have some questions to consult that,

  1. Does your inverted embedding belong to W+ space? In other words, can the saved latent code be directly used as the input of a StyleGAN2 model?
  2. How many iterations does it take to generate a roughly accurate reconstruction? 1300 iterations at 1024 resolution is sometimes too slow.
  3. Is it possible to embed a batch of images at the same time?

Thank you very much!

The code does not work

Hi, many thanks for releasing the code!
Unfortunately, it does not work for me. After some debugging, I found it gets stuck during PCA decomposition, and if I reduce the number of generated latents to a more reasonable number (1k-10k), I just get a segmentation fault.

I can see PCA files are available in the repo for the main checkpoints, but I need one for my custom checkpoint.
