hsinger04 / vogue-reimplementation

A reimplementation of the VOGUE paper for the IANNwTF WS20/21 course at the University of Osnabrück

Jupyter Notebook 97.06% Python 2.72% Cuda 0.22% Batchfile 0.01% Shell 0.01%


vogue-reimplementation's Issues

Learn try-on

  1. Have a 2-layer MLP map from w to sigma. Only a single MLP is needed, since the w latent stays the same across all styles / resolutions.
  2. Have p as a trainable vector, with q = sigmoid(p) and Q = DiagonalMatrix(q) for each style.
  3. Calculate the losses and optimize (see the sketch below).
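A minimal sketch of these trainable pieces, assuming a 512-dimensional w latent and 18 style layers (both assumed, not taken from the repo); the loss terms themselves are omitted:

```python
import tensorflow as tf

W_DIM = 512      # dimensionality of the w latent (assumption)
N_STYLES = 18    # number of style / resolution layers (assumption)

# 1. A single 2-layer MLP mapping w to sigma, shared across all styles.
sigma_mlp = tf.keras.Sequential([
    tf.keras.layers.Dense(W_DIM, activation="relu", input_shape=(W_DIM,)),
    tf.keras.layers.Dense(W_DIM),
])

# 2. One trainable vector p per style; q = sigmoid(p), Q = diag(q).
p = tf.Variable(tf.zeros([N_STYLES, W_DIM]), trainable=True, name="p")

def interpolation_matrices(p):
    q = tf.sigmoid(p)          # shape [N_STYLES, W_DIM]
    return tf.linalg.diag(q)   # Q: shape [N_STYLES, W_DIM, W_DIM]

# 3. The losses would be computed on generator outputs and optimized
#    w.r.t. sigma_mlp.trainable_variables + [p].
```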

Reducing pixel size?

Either reduce the image and segmentation dimensions, or change StyleGAN to input / output 1024×1024 images. A resizing sketch follows below.
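A sketch of the first option, resizing images and segmentations to whatever resolution StyleGAN is configured for (TARGET_RES is an assumed parameter):

```python
import tensorflow as tf

TARGET_RES = 256  # assumed target resolution

def resize_pair(image, segmentation):
    # Bilinear for images; nearest-neighbour for label maps so that
    # class indices are not interpolated into non-existent classes.
    image = tf.image.resize(image, [TARGET_RES, TARGET_RES], method="bilinear")
    segmentation = tf.image.resize(segmentation, [TARGET_RES, TARGET_RES],
                                   method="nearest")
    return image, segmentation
```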

Project images to latent vector z

  1. Create a CNN that maps from an image to the latent vector z.
  2. Train it by minimizing the perceptual loss between the input image and the corresponding StyleGAN output (unclear: how exactly to define the perceptual loss). A training-loop sketch follows below.
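A sketch of the projection setup under these assumptions: `stylegan_generator` and `perceptual_loss` are assumed to exist elsewhere in the project, and the encoder architecture and shapes are placeholders:

```python
import tensorflow as tf

def build_encoder(latent_dim=512, img_res=256):
    # Small placeholder CNN mapping an image to a latent vector z.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu",
                               input_shape=(img_res, img_res, 3)),
        tf.keras.layers.Conv2D(128, 3, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(256, 3, strides=2, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(latent_dim),
    ])

def train_step(encoder, stylegan_generator, optimizer, images):
    with tf.GradientTape() as tape:
        z = encoder(images)
        reconstructions = stylegan_generator(z)          # assumed generator call
        loss = perceptual_loss(images, reconstructions)  # see the "Perceptual loss" issue
    grads = tape.gradient(loss, encoder.trainable_variables)
    optimizer.apply_gradients(zip(grads, encoder.trainable_variables))
    return loss
```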

Creating try_on dataset

  • Things to optimize – prefer a generator-based implementation: it is simple to write and memory-efficient, and try_on may not need long to train anyway (see the sketch after this list)
    • Speed
    • Memory
  • What the dataset should look like
    • Returns: latent_p, latent_g, seg_p, seg_g
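A sketch of a generator-backed tf.data pipeline yielding the four tensors listed above; the loading of `latent_pairs` / `seg_pairs` and all shapes are assumptions:

```python
import tensorflow as tf

LATENT_DIM = 512  # assumed
SEG_RES = 256     # assumed

def try_on_generator(latent_pairs, seg_pairs):
    # latent_pairs / seg_pairs: sequences of (person, garment) items,
    # consumed lazily to keep memory usage low.
    for (latent_p, latent_g), (seg_p, seg_g) in zip(latent_pairs, seg_pairs):
        yield latent_p, latent_g, seg_p, seg_g

def make_dataset(latent_pairs, seg_pairs, batch_size=4):
    output_signature = (
        tf.TensorSpec([LATENT_DIM], tf.float32),
        tf.TensorSpec([LATENT_DIM], tf.float32),
        tf.TensorSpec([SEG_RES, SEG_RES], tf.int32),
        tf.TensorSpec([SEG_RES, SEG_RES], tf.int32),
    )
    ds = tf.data.Dataset.from_generator(
        lambda: try_on_generator(latent_pairs, seg_pairs),
        output_signature=output_signature)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```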

Editing localization loss

  1. Calculate A (the style matrix; see Fig. 2 in Editing in Style)
  2. Get the segmentation (tf.image.resize)
  3. Normalize A
  4. (Downsample the segmentation)
  5. Follow Eq. 3 from VOGUE (just multiply A^2 elementwise with U and then use reduce_mean); a sketch follows below
  6. Unclear: right side of page 4
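A heavily hedged sketch of the steps above. The shapes and the exact definition of U are assumptions: A is taken to be a per-channel spatial contribution map of shape [H, W, C] (the style matrix of Editing in Style) and U a binary mask derived from the segmentation:

```python
import tensorflow as tf

def localization_loss(A, segmentation, feat_res):
    # 2./4. Resize the segmentation down to the feature-map resolution
    #       (nearest neighbour so the labels stay discrete).
    seg = tf.image.resize(segmentation[..., tf.newaxis],
                          [feat_res, feat_res], method="nearest")
    U = tf.cast(seg > 0, tf.float32)   # assumed mask term, shape [H, W, 1]

    # 3. Normalize A spatially (assumed normalization).
    A = A / (tf.reduce_sum(A, axis=[0, 1], keepdims=True) + 1e-8)

    # 5. Eq. 3: elementwise product of A^2 with U, then reduce_mean.
    return tf.reduce_mean(tf.square(A) * U)
```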

General TODOs

  • Refactor code
  • Fix the encoder (find out why it fails, e.g. by inspecting the loss via TensorBoard, and find an alternative architecture)
  • GitHub README and .yaml
  • Analysis of results

Project dataset

Ideally, we would use the original authors' dataset. However, I also found https://github.com/royorel/FFHQ-Aging-Dataset, which is good for the following two reasons:

1.) It builds on the FFHQ dataset, which StyleGAN was also trained on; this might allow us to be more flexible with regard to transfer learning
2.) It contains segmentation labels

Modulated_conv2d

Initializing self.p with random_normal or random_uniform leads to an error, even though I am 99% sure I am doing it correctly. A possible initialization pattern is sketched below.
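One way such a trainable vector is commonly created inside a Keras layer so that a random-normal initializer works; the class, shape, and forward pass here are assumptions about the surrounding code, not the repo's actual implementation:

```python
import tensorflow as tf

class ModulatedConv2D(tf.keras.layers.Layer):
    def build(self, input_shape):
        # Create p inside build() via add_weight so Keras tracks it.
        self.p = self.add_weight(
            name="p",
            shape=(input_shape[-1],),
            initializer=tf.keras.initializers.RandomNormal(stddev=0.02),
            trainable=True,
        )
        super().build(input_shape)

    def call(self, inputs):
        # Placeholder forward pass: scale channels by sigmoid(p).
        return inputs * tf.sigmoid(self.p)
```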

How to do segmentation?

Encoder model

  • Currently seems to map only to positive numbers.
  • Batch Normalization and tanh are missing (see the sketch after this list).
  • Make sure to save the current model files somewhere else before starting to retrain.
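A sketch of an output block with the missing pieces: BatchNormalization followed by a tanh-activated Dense layer, so the encoder can produce values in (-1, 1) rather than only positive numbers. Layer sizes and the function name are assumptions:

```python
import tensorflow as tf

def encoder_head(features, latent_dim=512):
    # Normalize the incoming features, project to the latent size,
    # and squash to (-1, 1) with tanh.
    x = tf.keras.layers.BatchNormalization()(features)
    x = tf.keras.layers.Dense(latent_dim)(x)
    return tf.keras.layers.Activation("tanh")(x)
```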

Perceptual loss

  1. As a first attempt, try the L2 norm between the unit-scaled activations of the last layer of the two networks, or something similar (sketch below).
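A sketch of this simple variant. Using VGG16 as the fixed feature network is an assumption (any pretrained network would do), and input preprocessing is omitted:

```python
import tensorflow as tf

# Fixed, pretrained feature extractor (assumed choice).
_feature_net = tf.keras.applications.VGG16(include_top=False, weights="imagenet")

def perceptual_loss(img_a, img_b):
    feat_a = _feature_net(img_a)
    feat_b = _feature_net(img_b)
    # Unit-scale the last-layer activations before comparing them.
    feat_a = tf.math.l2_normalize(feat_a, axis=-1)
    feat_b = tf.math.l2_normalize(feat_b, axis=-1)
    return tf.reduce_mean(tf.square(feat_a - feat_b))
```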
