
Comments (10)

lucidrains commented on May 3, 2024

@XavierXiao let's go with your guess 😄 will make it happen by week's end


XavierXiao commented on May 3, 2024

OK, thanks for the explanation; that is clear for your current implementation. I personally think the multi-scale design may be a bit different, though. The novelty of GigaGAN's multi-scale loss is that the discriminator outputs L(L+1)/2 predictions in total, but it looks like your current implementation has only L predictions (i.e., one prediction for each resolution of the pyramid).

How do you make L(L+1)/2 predictions? Well, I guess this is what the paper means by "makes independent predictions for each image scale".

So, I guess, for a collection of rgbs produced by the generator, you first send the highest-resolution image x_64 to the discriminator, which returns 5 predictions, one at each resolution. Then you INDEPENDENTLY send the second-highest-resolution image x_32 to the discriminator, which returns 4 predictions, and so on.
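A quick sanity check on the count, assuming a 5-level pyramid (64 down to 4): feeding each scale in independently yields 5 + 4 + 3 + 2 + 1 = 15 = L(L+1)/2 predictions.

```python
# Hypothetical count for an L-level pyramid, e.g. resolutions 64, 32, 16, 8, 4.
L = 5
preds_per_input = [L - i for i in range(L)]        # [5, 4, 3, 2, 1]
assert sum(preds_per_input) == L * (L + 1) // 2    # 15 predictions in total
```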

How do you send different-resolution images independently to the discriminator? According to what the paper says at the very top of the right column on page 8, it has a fromRGB layer at each resolution that processes the RGB image at that size and maps it to a higher number of channels. So a 64x64x3 rgb input first goes through the fromRGB layer at resolution 64, and the resulting tensor goes through the FIRST discriminator block and then proceeds through the later blocks. A 32x32x3 rgb input first goes through the fromRGB layer at resolution 32, and the resulting tensor is sent directly to the SECOND discriminator block and then proceeds through the later blocks. And so on.

This is the most reasonable guess I can come up with after reading the paper really carefully. Let me know what you think!
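A minimal PyTorch sketch of the routing described above, assuming a 5-level pyramid. The class name, resolutions, channel width, and block structure are all hypothetical illustrations, not the gigagan-pytorch API, and the blocks are stripped down to bare convolutions:

```python
import torch
from torch import nn

class MultiScaleDiscriminatorSketch(nn.Module):
    def __init__(self, resolutions = (64, 32, 16, 8, 4), dim = 64):
        super().__init__()
        num_blocks = len(resolutions)

        # one fromRGB per resolution: maps 3 channels to the feature width
        self.from_rgb = nn.ModuleList([nn.Conv2d(3, dim, 1) for _ in resolutions])

        # one downsampling block per resolution (the last block keeps its size)
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(dim, dim, 3, padding = 1),
                nn.LeakyReLU(0.2),
                nn.AvgPool2d(2) if i < num_blocks - 1 else nn.Identity(),
            )
            for i in range(num_blocks)
        ])

        # one prediction head per block output
        self.to_logits = nn.ModuleList([nn.Conv2d(dim, 1, 1) for _ in resolutions])

    def forward(self, rgb_pyramid):
        # rgb_pyramid: list of images, highest resolution first,
        # e.g. [(B, 3, 64, 64), (B, 3, 32, 32), ..., (B, 3, 4, 4)]
        all_logits = []

        for entry, rgb in enumerate(rgb_pyramid):
            # each scale enters INDEPENDENTLY at its own fromRGB + block
            x = self.from_rgb[entry](rgb)

            logits = []
            for block, head in zip(self.blocks[entry:], self.to_logits[entry:]):
                x = block(x)
                logits.append(head(x))

            all_logits.append(logits)

        # total number of prediction maps is L + (L - 1) + ... + 1 = L(L+1)/2
        return all_logits

disc = MultiScaleDiscriminatorSketch()
pyramid = [torch.randn(2, 3, r, r) for r in (64, 32, 16, 8, 4)]
out = disc(pyramid)
print(sum(len(l) for l in out))   # 15 = 5 * 6 / 2
```

Note that a lower-resolution input skips the earlier (largest) blocks entirely, since it enters the trunk partway through.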


XavierXiao commented on May 3, 2024

Sorry for the late reply due to the July 4th holiday. I took a careful look at the new discriminator implementation. A couple of questions:

  1. I tried to go over the computational graph in my mind, but I am still a bit confused about the input to the discriminator. Could you confirm that, assuming the highest image resolution is 64x64, if I want to train the discriminator (i.e., both real and fake images are sent to the discriminator), then images should be a 64x64x3 generated image, rgbs should be a collection of generated images at different sizes (excluding the highest size), and real_images should be a 64x64x3 real image? My understanding of the code is based on this input format, so correct me if I am wrong.
  2. The predictor network outputs HxWx1, but it seems like we need to obtain a real/fake prediction from each predictor. Although the paper does not say so explicitly, do you think we should let the predictor output a 1x1 score?
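One simple way to collapse an HxWx1 prediction map into a single real/fake score per sample, purely as an illustration of option 2 (not necessarily what the repo should do):

```python
import torch

patch_logits = torch.randn(2, 1, 8, 8)        # (B, 1, H, W) per-patch predictions
scores = patch_logits.mean(dim = (1, 2, 3))   # (B,) one score per sample
print(scores.shape)                           # torch.Size([2])
```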

Everything else looks good to me! BTW, it is really smart to implement the independent processing via batch-dimension concatenation!
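For reference, a minimal sketch of that batch-dimension trick (hypothetical tensors and a stand-in block, not the repo's actual call signature): two inputs that must be processed independently by the same block can be stacked along dim 0, run through the block once, and split apart afterwards.

```python
import torch
from torch import nn

block = nn.Conv2d(64, 64, 3, padding = 1)   # stand-in for a discriminator block

feat_a = torch.randn(2, 64, 32, 32)   # e.g. features from the 64x64 entry point
feat_b = torch.randn(2, 64, 32, 32)   # e.g. features from the 32x32 entry point

# one pass over the block for both inputs, concatenated along the batch dim
out = block(torch.cat((feat_a, feat_b), dim = 0))
out_a, out_b = out.split((feat_a.shape[0], feat_b.shape[0]), dim = 0)

# identical (up to float precision) to processing them separately
assert torch.allclose(out_a, block(feat_a), atol = 1e-6)
```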


lucidrains commented on May 3, 2024

oh yes, you caught another bug for 1., thank you for the code review!

so the aim of the Discriminator was to support both fake + real images being fed in, as well as only fake (for the generator training). only one logit is outputted per batch element, and that logit is high if fake and low if real (or vice versa, as long as you flip the loss when training the generator)

for 2., i thought the multi-scale was referring to the discriminator being fed the rgbs output by the generator at different stages. i could be wrong too
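As a sketch of the sign convention mentioned above, here is a standard hinge formulation (with high = real; that the repo uses exactly this form is an assumption):

```python
import torch
import torch.nn.functional as F

def discriminator_hinge_loss(real_logits, fake_logits):
    # push real logits above +1 and fake logits below -1
    return (F.relu(1. - real_logits) + F.relu(1. + fake_logits)).mean()

def generator_hinge_loss(fake_logits):
    # the generator "flips the loss": it wants the fake logits to be high
    return (-fake_logits).mean()
```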


lucidrains commented on May 3, 2024

i'll get around to auto-handling the hinge loss within the Discriminator instance tomorrow, as well as gradient penalties
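For reference, a generic R1-style gradient penalty on real images (a common choice in GAN training; that the repo uses exactly this form is an assumption):

```python
import torch

def r1_gradient_penalty(real_images, real_logits, weight = 10.):
    # real_images must have requires_grad_() set before the discriminator forward pass
    grads, = torch.autograd.grad(
        outputs = real_logits.sum(),
        inputs = real_images,
        create_graph = True,
    )
    # penalize the squared gradient norm of the logits w.r.t. the real inputs
    return weight * grads.flatten(1).pow(2).sum(dim = 1).mean()
```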


lucidrains commented on May 3, 2024

example of the logic i'll likely just copy-paste over: https://github.com/lucidrains/lightweight-gan/blob/main/lightweight_gan/lightweight_gan.py#L1226 , with modifications to support distributed training using huggingface accelerate


lucidrains commented on May 3, 2024

@XavierXiao want to see if the latest changes are more aligned with your expectations?


XavierXiao commented on May 3, 2024

Wow so fast! Will take a look tomorrow.


nbardy commented on May 3, 2024

Looks about right to me from looking at the code.

One part I'm unsure about is whether the images passed in to the discriminator should come from different steps of the generator's pyramid, or just be resized versions of the final output.

The former would mean skipping lots of big layers in the middle, which feels like it lines up with the paper's point about the design being efficient to scale up.
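For the second option (resized versions of the final output only), building the input pyramid is just repeated downsampling; a sketch, with the resolutions assumed rather than taken from the repo:

```python
import torch
import torch.nn.functional as F

def pyramid_from_final(image, resolutions = (64, 32, 16, 8, 4)):
    # resize the generator's final output to every discriminator input resolution
    return [
        F.interpolate(image, size = (r, r), mode = 'bilinear', align_corners = False)
        for r in resolutions
    ]

fake = torch.randn(2, 3, 64, 64)
print([t.shape[-1] for t in pyramid_from_final(fake)])   # [64, 32, 16, 8, 4]
```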


lucidrains commented on May 3, 2024

closing as addressed

