inpaint-gan

This was an experiment to use Langevin sampling and a pre-trained GAN for image inpainting.

Disclaimer: this repo implements a random idea/experiment and doesn't represent polished work or something to use in production. The idea turned out not to work very well, anyway.

Currently this repo is based on the GANs from stylegan2-ada-pytorch.

How it works

Sohl-Dickstein et al. propose a simple way to condition a diffusion model. If you have a classifier p(y|x) and a diffusion model over p(x), then a small change in each sampling step can be used to sample from p(x)p(y|x), i.e. from p(x|y) up to normalization.
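
In the form this trick usually takes (a generic restatement in standard diffusion notation, not this repo's code), the mean of each reverse-process Gaussian is shifted by the step covariance times the gradient of the classifier's log-probability:

$$
\tilde{\mu}_t(x_t, y) = \mu_\theta(x_t, t) + \Sigma_\theta(x_t, t)\,\nabla_{x_t} \log p(y \mid x_t)
$$

Sampling with the shifted mean approximately draws from p(x)p(y|x) instead of p(x).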

To apply the above idea to GANs, we can write down a diffusion model over the GAN latent distribution p(z). If the latents are Gaussian, then the diffusion model can be written in closed form. The classifier p(y|z) can be implemented in image space by evaluating p(y|G(z)), where z is sampled one or more times from the diffusion process. The resulting algorithm looks like SGD over the latent vector z, but with noise injected into each step (as in Langevin sampling).
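
A minimal sketch of one such noisy latent update, assuming a PyTorch generator and an `energy_fn` that returns a scalar energy (the function and argument names here are illustrative, not this repo's API):

```python
import torch

def noisy_latent_step(z, energy_fn, step_size=1e-2, noise_scale=1.0, clip=1.0):
    """One SGD-with-noise (Langevin-style) update of the GAN latent z.

    energy_fn(z) should return a scalar such as the masked MSE between G(z)
    and the original image, optionally plus the Gaussian prior term ||z||^2 / 2.
    """
    z = z.detach().requires_grad_(True)
    energy = energy_fn(z)
    (grad,) = torch.autograd.grad(energy, z)
    # Clip the gradient so the update cannot explode (see the hyperparameters below).
    grad = grad.clamp(-clip, clip)
    # Descend the energy and inject Gaussian noise, as in Langevin dynamics.
    noise = torch.randn_like(z) * noise_scale
    return (z - step_size * grad + (2.0 * step_size) ** 0.5 * noise).detach()
```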

So this tells us how to condition a GAN on some classifier, but how does this relate to inpainting? Well, we can write down a special "classifier" that simply returns the MSE between the unmasked part of the generated image and the original image. This MSE can be seen as the negative log-likelihood of the unmasked region under a diagonal Gaussian distribution. While this is only a rough approximation of the true image distribution, it may work well enough in practice.
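
A sketch of that energy in PyTorch (names are illustrative; `mask` is 1 on known pixels and 0 on pixels to be inpainted, and `sigma` is an assumed noise scale rather than anything specified by this repo):

```python
import torch

def masked_mse_energy(generator, z, original, mask, sigma=0.1):
    """Energy (negative log-likelihood) of the unmasked region.

    Treats the known pixels as observations from a diagonal Gaussian
    centered at the generated pixels with standard deviation sigma.
    """
    generated = generator(z)                 # G(z), same shape as original
    diff = (generated - original) * mask     # ignore the region being inpainted
    return diff.pow(2).sum() / (2.0 * sigma ** 2)
```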

Results

Here are some completions from a CIFAR-10 model using 500 Langevin steps. This can be reproduced with the experiments.ipynb notebook:

Sample completions

The main issue with this method seems to be that it requires SGD to find points in latent space which reproduce the unmasked region of the image. This turns out to be non-trivial, so the inpainted samples never match up perfectly with the original image. We can see this by tracking the "energy function", i.e. the MSE between the generated and original image region.

The main hyperparameters to tune in the above method are (a sketch of how they interact follows the list):

  • Number of z samples (and G evals) per step. More means less noise in p(y|z), and possibly better exploration over z.
  • Number of Langevin steps. More means that the diffusion conditioning approximation is more accurate.
  • Temperature of z samples; 1.0 is unbiased, 0.0 is much lower variance and makes optimization easier. With temperature 0.0, more than one z sample per Langevin step is redundant.
  • Gradient clipping. Without clipping the gradients of the MSE w.r.t. z, sampling often explodes.
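
As a rough sketch of how these knobs fit together when estimating p(y|z) (all names and defaults here are illustrative, not taken from the notebook):

```python
import torch

def estimate_masked_mse(generator, z_mean, z_std, original, mask,
                        num_z_samples=4, temperature=1.0, sigma=0.1):
    """Monte Carlo estimate of the masked-MSE energy, i.e. -log p(y|z) up to a constant.

    Draws num_z_samples latents around z_mean, scaled by temperature, and
    averages the masked MSE of G(z) against the original image. With
    temperature 0.0 a single deterministic evaluation at z_mean suffices.
    """
    n = num_z_samples if temperature > 0.0 else 1
    total = z_mean.new_zeros(())
    for _ in range(n):
        z = z_mean + temperature * z_std * torch.randn_like(z_mean)
        diff = (generator(z) - original) * mask
        total = total + diff.pow(2).sum() / (2.0 * sigma ** 2)
    return total / n
```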
