
genforce / interfacegan


[CVPR 2020] Interpreting the Latent Space of GANs for Semantic Face Editing

Home Page: https://genforce.github.io/interfacegan/

License: MIT License

Python 100.00%

interfacegan's Introduction

InterFaceGAN - Interpreting the Latent Space of GANs for Semantic Face Editing

Python 3.7 · PyTorch 1.1.0 · TensorFlow 1.12.2 · sklearn 0.21.2

Figure: High-quality facial attribute editing results with InterFaceGAN.

In this repository, we propose an approach, termed InterFaceGAN, for semantic face editing. Specifically, InterFaceGAN is capable of turning an unconditionally trained face synthesis model into a controllable GAN by interpreting the very first latent space and identifying the hidden semantic subspaces.

[Paper (CVPR)] [Paper (TPAMI)] [Project Page] [Demo] [Colab]

How to Use

Pick up a model, pick up a boundary, pick up a latent code, and then EDIT!

# Before running the following code, please first download
# the pre-trained ProgressiveGAN model on CelebA-HQ dataset,
# and then place it under the folder "models/pretrain/".
LATENT_CODE_NUM=10
python edit.py \
    -m pggan_celebahq \
    -b boundaries/pggan_celebahq_smile_boundary.npy \
    -n "$LATENT_CODE_NUM" \
    -o results/pggan_celebahq_smile_editing

GAN Models Used (Prior Work)

Before going into details, we would like to first introduce the two state-of-the-art GAN models used in this work: ProgressiveGAN (Karras et al., ICLR 2018) and StyleGAN (Karras et al., CVPR 2019). Both achieve high-quality face synthesis by learning unconditional GANs. For more details about these two models, please refer to the original papers as well as the official implementations.

ProgressiveGAN: [Paper] [Code]

StyleGAN: [Paper] [Code]

Code Instruction

Generative Models

A GAN-based generative model basically maps latent codes (commonly sampled from a high-dimensional latent space, such as a standard normal distribution) to photo-realistic images. Accordingly, a base class for generators, called BaseGenerator, is defined in models/base_generator.py. Basically, it should contain the following member functions:

  • build(): Build a PyTorch module.
  • load(): Load pre-trained weights.
  • convert_tf_model() (Optional): Convert pre-trained weights from a TensorFlow model.
  • sample(): Randomly sample latent codes. This function should specify what kind of distribution the latent codes follow.
  • preprocess(): Function to preprocess the latent codes before feeding them into the generator.
  • synthesize(): Run the model to get synthesized results (or any other intermediate outputs).
  • postprocess(): Function to postprocess the outputs from the generator and convert them to images.

We have already provided the following models in this repository:

  • ProgressiveGAN:
    • A clone of the official TensorFlow implementation: models/pggan_tf_official/. This clone is only used for converting TensorFlow pre-trained weights to PyTorch ones. The conversion is done automatically the first time the model is used; after that, the TensorFlow version is no longer needed.
    • PyTorch implementation of the official model (inference only): models/pggan_generator_model.py.
    • Generator class derived from BaseGenerator: models/pggan_generator.py.
    • Please download the officially released model trained on the CelebA-HQ dataset and place it in the folder models/pretrain/.
  • StyleGAN:
    • A clone of the official TensorFlow implementation: models/stylegan_tf_official/. This clone is only used for converting TensorFlow pre-trained weights to PyTorch ones. The conversion is done automatically the first time the model is used; after that, the TensorFlow version is no longer needed.
    • PyTorch implementation of the official model (inference only): models/stylegan_generator_model.py.
    • Generator class derived from BaseGenerator: models/stylegan_generator.py.
    • Please download the officially released models trained on the CelebA-HQ and FF-HQ datasets and place them in the folder models/pretrain/.
    • Support synthesizing images from $\mathcal{Z}$ space, $\mathcal{W}$ space, and extended $\mathcal{W}$ space (18x512).
    • Set the truncation trick and the noise randomization trick in models/model_settings.py. Among them, STYLEGAN_RANDOMIZE_NOISE is highly recommended to be set to False. STYLEGAN_TRUNCATION_PSI = 0.7 and STYLEGAN_TRUNCATION_LAYERS = 8 are inherited from the official implementation. Users can customize them for their own models. NOTE: These three settings will NOT affect the pre-trained weights.
  • Customized model:
    • Users can experiment with their own models by deriving a new class from BaseGenerator (a rough sketch is given right after this list).
    • Before use, the new model should first be registered in MODEL_POOL in the file models/model_settings.py.
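
As a rough illustration of what such a derived class might look like, here is a hypothetical skeleton. The constructor behavior, the attributes it relies on (e.g. self.model, self.model_path), and the exact method signatures are assumptions about BaseGenerator rather than its actual API; consult models/base_generator.py and models/model_settings.py for the real requirements.

import numpy as np
import torch

from models.base_generator import BaseGenerator


class MyGenerator(BaseGenerator):
  """Hypothetical generator wrapper; method names follow the list above."""

  def build(self):
    # Build the PyTorch module and keep it in self.model (toy stand-in below).
    self.model = torch.nn.Sequential(torch.nn.Linear(512, 3 * 64 * 64), torch.nn.Tanh())

  def load(self):
    # Load pre-trained weights (self.model_path is assumed to be set by the base class).
    self.model.load_state_dict(torch.load(self.model_path, map_location='cpu'))

  def sample(self, num):
    # Latent codes are assumed to follow a standard normal distribution.
    return np.random.randn(num, 512).astype(np.float32)

  def preprocess(self, latent_codes):
    # Make sure codes are float32 with shape [num, 512].
    return latent_codes.reshape(-1, 512).astype(np.float32)

  def synthesize(self, latent_codes):
    with torch.no_grad():
      images = self.model(torch.from_numpy(latent_codes))
    return {'z': latent_codes, 'image': images.numpy()}

  def postprocess(self, outputs):
    # Map raw outputs from [-1, 1] to uint8 images in [0, 255].
    images = (outputs['image'] + 1.0) / 2.0 * 255.0
    outputs['image'] = np.clip(images, 0, 255).astype(np.uint8)
    return outputs

The new model would then be registered by adding an entry (e.g. 'my_model') to MODEL_POOL in models/model_settings.py.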

Utility Functions

We provide the following utility functions in utils/manipulator.py to make InterFaceGAN much easier to use.

  • train_boundary(): This function can be used for boundary searching. It takes pre-prepared latent codes and the corresponding attribute scores as inputs, and then outputs the normal direction of the separation boundary. Basically, this goal is achieved by training a linear SVM. The returned vector can be further used for semantic face editing.
  • project_boundary(): This function can be used for conditional manipulation. It takes a primal direction and other conditional directions as inputs, and then outputs a new normalized direction. Moving a latent code along this new direction will manipulate the primal attribute yet barely affect the conditioned attributes. NOTE: For now, at most two conditions are supported.
  • linear_interpolate(): This function can be used for semantic face editing. It takes a latent code and the normal direction of a particular semantic boundary as inputs, and then outputs a collection of manipulated latent codes obtained with linear interpolation. These interpolations can be used to see how the synthesis varies when moving the latent code along the given direction. A minimal usage sketch of these three functions follows this list.
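
The sketch below strings the three functions together, assuming latent codes and attribute scores prepared as in the "Usage" section; the argument lists shown are based on the descriptions above, so check utils/manipulator.py for the exact signatures and defaults.

import numpy as np

from utils.manipulator import train_boundary, project_boundary, linear_interpolate

# Latent codes and attribute scores prepared as described in the "Usage" section.
latent_codes = np.load('data/pggan_celebahq/z.npy')        # shape [N, 512]
scores = np.load('data/pggan_celebahq/smile_scores.npy')   # shape [N, 1]

# 1. Search the separation boundary (normal direction of the hyperplane).
smile_boundary = train_boundary(latent_codes, scores)

# 2. (Optional) condition the smile boundary on a previously found age boundary.
age_boundary = np.load('boundaries/pggan_celebahq_age_boundary.npy')
smile_c_age = project_boundary(smile_boundary, age_boundary)

# 3. Move a single latent code along the boundary to get a row of edited codes.
edited_codes = linear_interpolate(latent_codes[:1], smile_c_age)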

Tools

  • generate_data.py: This script can be used for data preparation. It will generate a collection of syntheses (images are saved for further attribute prediction) as well as save the input latent codes.

  • train_boundary.py: This script can be used for boundary searching.

  • edit.py: This script can be used for semantic face editing.

Usage

We take the ProgressiveGAN model trained on the CelebA-HQ dataset as an example.

Prepare data

NUM=10000
python generate_data.py -m pggan_celebahq -o data/pggan_celebahq -n "$NUM"

Predict Attribute Score

Get your own predictor for attribute $ATTRIBUTE_NAME, evaluate it on all generated images, and save the inference results as data/pggan_celebahq/"$ATTRIBUTE_NAME"_scores.npy. NOTE: The saved results should have shape ($NUM, 1).
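
For example, assuming a hypothetical predict_attribute() function that returns one scalar score per image (the predictor itself is not part of this repository) and that the generated images were saved as .jpg files, the scores could be collected and saved like this:

import os

import numpy as np
from PIL import Image

ATTRIBUTE_NAME = 'smile'           # placeholder attribute name
IMAGE_DIR = 'data/pggan_celebahq'  # output folder of generate_data.py


def predict_attribute(image):
  """Hypothetical attribute predictor; replace with your own model."""
  raise NotImplementedError


scores = []
for filename in sorted(os.listdir(IMAGE_DIR)):
  if not filename.endswith('.jpg'):
    continue
  image = np.asarray(Image.open(os.path.join(IMAGE_DIR, filename)))
  scores.append(predict_attribute(image))

scores = np.asarray(scores, dtype=np.float32).reshape(-1, 1)  # shape ($NUM, 1)
np.save(os.path.join(IMAGE_DIR, f'{ATTRIBUTE_NAME}_scores.npy'), scores)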

Search Semantic Boundary

python train_boundary.py \
    -o boundaries/pggan_celebahq_"$ATTRIBUTE_NAME" \
    -c data/pggan_celebahq/z.npy \
    -s data/pggan_celebahq/"$ATTRIBUTE_NAME"_scores.npy

Compute Conditional Boundary (Optional)

This step is optional and depends on whether conditional manipulation is needed. Users can use the function project_boundary() in the file utils/manipulator.py to compute the projected direction, as sketched below.
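
For instance, an age boundary conditioned on gender could be computed and saved roughly as follows; the file names follow the naming convention used in boundaries/, and the exact signature of project_boundary() should be checked in utils/manipulator.py.

import numpy as np

from utils.manipulator import project_boundary

age_boundary = np.load('boundaries/pggan_celebahq_age_boundary.npy')
gender_boundary = np.load('boundaries/pggan_celebahq_gender_boundary.npy')

# Project the primal (age) direction so that moving along it changes age
# while barely affecting gender.
age_c_gender = project_boundary(age_boundary, gender_boundary)
np.save('boundaries/pggan_celebahq_age_c_gender_boundary.npy', age_c_gender)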

Boundaries Description

We provide the following boundaries in the folder boundaries/. The boundaries can be more accurate if a stronger attribute predictor is used. A quick sanity-check snippet for loading a boundary file is given after the list below.

  • ProgressiveGAN model trained on CelebA-HQ dataset:

    • Single boundary:
      • pggan_celebahq_pose_boundary.npy: Pose.
      • pggan_celebahq_smile_boundary.npy: Smile (expression).
      • pggan_celebahq_age_boundary.npy: Age.
      • pggan_celebahq_gender_boundary.npy: Gender.
      • pggan_celebahq_eyeglasses_boundary.npy: Eyeglasses.
      • pggan_celebahq_quality_boundary.npy: Image quality.
    • Conditional boundary:
      • pggan_celebahq_age_c_gender_boundary.npy: Age (conditioned on gender).
      • pggan_celebahq_age_c_eyeglasses_boundary.npy: Age (conditioned on eyeglasses).
      • pggan_celebahq_age_c_gender_eyeglasses_boundary.npy: Age (conditioned on gender and eyeglasses).
      • pggan_celebahq_gender_c_age_boundary.npy: Gender (conditioned on age).
      • pggan_celebahq_gender_c_eyeglasses_boundary.npy: Gender (conditioned on eyeglasses).
      • pggan_celebahq_gender_c_age_eyeglasses_boundary.npy: Gender (conditioned on age and eyeglasses).
      • pggan_celebahq_eyeglasses_c_age_boundary.npy: Eyeglasses (conditioned on age).
      • pggan_celebahq_eyeglasses_c_gender_boundary.npy: Eyeglasses (conditioned on gender).
      • pggan_celebahq_eyeglasses_c_age_gender_boundary.npy: Eyeglasses (conditioned on age and gender).
  • StyleGAN model trained on CelebA-HQ dataset:

    • Single boundary in $\mathcal{Z}$ space:
      • stylegan_celebahq_pose_boundary.npy: Pose.
      • stylegan_celebahq_smile_boundary.npy: Smile (expression).
      • stylegan_celebahq_age_boundary.npy: Age.
      • stylegan_celebahq_gender_boundary.npy: Gender.
      • stylegan_celebahq_eyeglasses_boundary.npy: Eyeglasses.
    • Single boundary in $\mathcal{W}$ space:
      • stylegan_celebahq_pose_w_boundary.npy: Pose.
      • stylegan_celebahq_smile_w_boundary.npy: Smile (expression).
      • stylegan_celebahq_age_w_boundary.npy: Age.
      • stylegan_celebahq_gender_w_boundary.npy: Gender.
      • stylegan_celebahq_eyeglasses_w_boundary.npy: Eyeglasses.
  • StyleGAN model trained on FF-HQ dataset:

    • Single boundary in $\mathcal{Z}$ space:
      • stylegan_ffhq_pose_boundary.npy: Pose.
      • stylegan_ffhq_smile_boundary.npy: Smile (expression).
      • stylegan_ffhq_age_boundary.npy: Age.
      • stylegan_ffhq_gender_boundary.npy: Gender.
      • stylegan_ffhq_eyeglasses_boundary.npy: Eyeglasses.
    • Conditional boundary in $\mathcal{Z}$ space:
      • stylegan_ffhq_age_c_gender_boundary.npy: Age (conditioned on gender).
      • stylegan_ffhq_age_c_eyeglasses_boundary.npy: Age (conditioned on eyeglasses).
      • stylegan_ffhq_eyeglasses_c_age_boundary.npy: Eyeglasses (conditioned on age).
      • stylegan_ffhq_eyeglasses_c_gender_boundary.npy: Eyeglasses (conditioned on gender).
    • Single boundary in $\mathcal{W}$ space:
      • stylegan_ffhq_pose_w_boundary.npy: Pose.
      • stylegan_ffhq_smile_w_boundary.npy: Smile (expression).
      • stylegan_ffhq_age_w_boundary.npy: Age.
      • stylegan_ffhq_gender_w_boundary.npy: Gender.
      • stylegan_ffhq_eyeglasses_w_boundary.npy: Eyeglasses.
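
As mentioned above the list, each boundary file can be quickly sanity-checked with numpy. Each file is expected to store a single 512-dimensional normal direction, typically of unit length, though the exact shape and normalization should be verified against the files you download.

import numpy as np

boundary = np.load('boundaries/stylegan_ffhq_age_boundary.npy')
print(boundary.shape)            # expected to be (1, 512)
print(np.linalg.norm(boundary))  # directions are typically unit-normalized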

BibTeX

@inproceedings{shen2020interpreting,
  title     = {Interpreting the Latent Space of GANs for Semantic Face Editing},
  author    = {Shen, Yujun and Gu, Jinjin and Tang, Xiaoou and Zhou, Bolei},
  booktitle = {CVPR},
  year      = {2020}
}
@article{shen2020interfacegan,
  title   = {InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs},
  author  = {Shen, Yujun and Yang, Ceyuan and Tang, Xiaoou and Zhou, Bolei},
  journal = {TPAMI},
  year    = {2020}
}

interfacegan's People

Contributors

clementapa, limbo0000, parthpatel002, shenyujun, younesbelkada


interfacegan's Issues

Omitting intercept

Hey!
First of all great job on the paper and the code!

I was wondering what the motivation was behind assuming that all boundaries (hyperplanes) pass through the origin? I would imagine this assumption might be restrictive (especially for the general theory, since the mapping network can produce distributions that are not centered around the origin).

Also, related to this: it seems that when you find the hyperplane in train_boundary by fitting a linear model, you do not enforce the intercept to be 0.

The change of smile when increasing the distance along the "glasses" attribute

Hi, thanks for your great research.
I have a problem when I apply "pggan_celebahq_eyeglasses_c_age_gender_boundary.npy" to the pre-trained PGGAN model. With the conditional manipulation of the "glasses" attribute, the smile changes gradually. Here are the results:
[result images: 000_001, 000_004, 008_001, 008_005]
But these attributes seem to be uncorrelated in your paper.
Could you explain it for me?

Issues converting custom pggan models

It seems that for PGGAN, the state dictionaries for the model trained by the paper authors (file karras2018iclr-celebahq-1024x1024.pkl) and for custom models trained with their released code are different. You can check the differences here: https://www.diffchecker.com/0hFYlK82.

Unfortunately I was unable to tweak your code so that it would convert my trained model correctly.

Did you try converting a custom-trained model using the PGGAN code? If so, could you please provide a model you trained that worked for you, other than the one provided by the PGGAN authors?

Training with stylegan2 model

Hi. Your results are good. Currently I am working on pose transformation, and your repository has helped me a lot to move a step forward.
StyleGAN2 produces generated images after mixing that are more diverse and better than the previous version.

Can you describe the process of training a StyleGAN2 model to get boundaries?

I changed the following parameter values based on stylegan2-ffhq-config-f.pkl.

_RESOLUTIONS_TO_CHANNELS = {
8: [512, 512, 512],
16: [512, 512, 512, 512],
32: [512, 512, 512, 512, 512],
64: [512, 512, 512, 512, 512, 256],
128: [512, 512, 512, 512, 512, 256, 128],
256: [512, 512, 512, 512, 512, 256, 128, 64],
512: [512, 512, 512, 512, 512, 256, 128, 64, 32],
1024: [512, 512, 512, 512, 512, 256, 128, 64, 32, 16],
}

# pylint: disable=line-too-long

# Variable mapping from pytorch model to official tensorflow model.

_STYLEGAN_PTH_VARS_TO_TF_VARS = {
# Statistic information of disentangled latent feature, w.
'truncation.w_avg':'dlatent_avg', # [512]

# Noises.
'synthesis.layer0.epilogue.apply_noise.noise': 'noise0',    # [1, 1, 4, 4]
'synthesis.layer1.epilogue.apply_noise.noise': 'noise1',    # [1, 1, 8, 8]
'synthesis.layer2.epilogue.apply_noise.noise': 'noise2',    # [1, 1, 8,8]
'synthesis.layer3.epilogue.apply_noise.noise': 'noise3',    # [1, 1, 16, 16]
'synthesis.layer4.epilogue.apply_noise.noise': 'noise4',    # [1, 1, 16, 16]
'synthesis.layer5.epilogue.apply_noise.noise': 'noise5',    # [1, 1, 32, 32]
'synthesis.layer6.epilogue.apply_noise.noise': 'noise6',    # [1, 1, 32, 32]
'synthesis.layer7.epilogue.apply_noise.noise': 'noise7',    # [1, 1, 64, 64]
'synthesis.layer8.epilogue.apply_noise.noise': 'noise8',    # [1, 1, 64, 64]
'synthesis.layer9.epilogue.apply_noise.noise': 'noise9',    # [1, 1, 128, 128]
'synthesis.layer10.epilogue.apply_noise.noise': 'noise10',  # [1, 1, 128, 128]
'synthesis.layer11.epilogue.apply_noise.noise': 'noise11',  # [1, 1, 256, 256]
'synthesis.layer12.epilogue.apply_noise.noise': 'noise12',  # [1, 1, 256, 256]
'synthesis.layer13.epilogue.apply_noise.noise': 'noise13',  # [1, 1, 512, 512]
'synthesis.layer14.epilogue.apply_noise.noise': 'noise14',  # [1, 1, 512, 512]
'synthesis.layer15.epilogue.apply_noise.noise': 'noise15',  # [1, 1, 1024, 1024]
'synthesis.layer16.epilogue.apply_noise.noise': 'noise16',  # [1, 1, 1024, 1024]

# Mapping blocks.
'mapping.dense0.linear.weight': 'Dense0/weight',  # [512, 512]
'mapping.dense0.wscale.bias': 'Dense0/bias',  # [512]
'mapping.dense1.linear.weight': 'Dense1/weight',  # [512, 512]
'mapping.dense1.wscale.bias': 'Dense1/bias',  # [512]
'mapping.dense2.linear.weight': 'Dense2/weight',  # [512, 512]
'mapping.dense2.wscale.bias': 'Dense2/bias',  # [512]
'mapping.dense3.linear.weight': 'Dense3/weight',  # [512, 512]
'mapping.dense3.wscale.bias': 'Dense3/bias',  # [512]
'mapping.dense4.linear.weight': 'Dense4/weight',  # [512, 512]
'mapping.dense4.wscale.bias': 'Dense4/bias',  # [512]
'mapping.dense5.linear.weight': 'Dense5/weight',  # [512, 512]
'mapping.dense5.wscale.bias': 'Dense5/bias',  # [512]
'mapping.dense6.linear.weight': 'Dense6/weight',  # [512, 512]
'mapping.dense6.wscale.bias': 'Dense6/bias',  # [512]
'mapping.dense7.linear.weight': 'Dense7/weight',  # [512, 512]
'mapping.dense7.wscale.bias': 'Dense7/bias',  # [512]

# Synthesis blocks.

'synthesis.lod': 'lod' , #[]
'synthesis.add_constant': '4x4/Const/const',
'synthesis.layer0.conv.weight': '4x4/Conv/weight',
'synthesis.layer0.epilogue.mod_weight':'4x4/Conv/mod_weight',
'synthesis.layer0.epilogue.mod_bias':'4x4/Conv/mod_bias',
'synthesis.layer0.epilogue.apply_noise':'4x4/Conv/noise_strength',
'synthesis.layer0.epilogue.bias': '4x4/Conv/bias',
'synthesis.output0.conv.weight': '4x4/ToRGB/weight',   
'synthesis.output0.epilogue.mod_weight':'4x4/ToRGB/mod_weight',
'synthesis.output0.epilogue.mod_bias':'4x4/ToRGB/mod_bias',
'synthesis.output0.epilogue.bias':'4x4/ToRGB/bias',

'synthesis.layer1.conv.weight':'8x8/Conv0_up/weight',
'synthesis.layer1.epilogue.mod_weight':'8x8/Conv0_up/mod_weight',
'synthesis.layer1.epilogue.mod_bias':'8x8/Conv0_up/mod_bias',
'synthesis.layer1.epilogue.apply_noise':'8x8/Conv0_up/noise_strength',
'synthesis.layer1.epilogue.bias':'8x8/Conv0_up/bias',
'synthesis.layer2.conv.weight':'8x8/Conv1/weight',
'synthesis.layer2.epilogue.mod_weight':'8x8/Conv1/mod_weight',
'synthesis.layer2.epilogue.mod_bias':'8x8/Conv1/mod_bias',
'synthesis.layer2.epilogue.apply_noise':'8x8/Conv1/noise_strength',
'synthesis.layer2.epilogue.bias':'8x8/Conv1/bias',
'synthesis.output1.conv.weight': '8x8/ToRGB/weight',
'synthesis.output1.epilogue.mod_weight':'8x8/ToRGB/mod_weight',
'synthesis.output1.epilogue.mod_bias':'8x8/ToRGB/mod_bias',
'synthesis.output1.epilogue.bias':'8x8/ToRGB/bias',

'synthesis.layer3.conv.weight':'16x16/Conv0_up/weight',
'synthesis.layer3.epilogue.mod_weight':'16x16/Conv0_up/mod_weight',
'synthesis.layer3.epilogue.mod_bias':'16x16/Conv0_up/mod_bias',
'synthesis.layer3.epilogue.apply_noise':'16x16/Conv0_up/noise_strength',
'synthesis.layer3.epilogue.bias':'16x16/Conv0_up/bias',
'synthesis.layer4.conv.weight':'16x16/Conv1/weight',
'synthesis.layer4.epilogue.mod_weight':'16x16/Conv1/mod_weight',
'synthesis.layer4.epilogue.mod_bias':'16x16/Conv1/mod_bias',
'synthesis.layer4.epilogue.apply_noise':'16x16/Conv1/noise_strength',
'synthesis.layer4.epilogue.bias':'16x16/Conv1/bias',
'synthesis.output2.conv.weight': '16x16/ToRGB/weight',
'synthesis.output2.epilogue.mod_weight':'16x16/ToRGB/mod_weight',
'synthesis.output2.epilogue.mod_bias':'16x16/ToRGB/mod_bias',
'synthesis.output2.epilogue.bias':'16x16/ToRGB/bias',

'synthesis.layer5.conv.weight':'32x32/Conv0_up/weight',
'synthesis.layer5.epilogue.mod_weight':'32x32/Conv0_up/mod_weight',
'synthesis.layer5.epilogue.mod_bias':'32x32/Conv0_up/mod_bias',
'synthesis.layer5.epilogue.apply_noise':'32x32/Conv0_up/noise_strength',
'synthesis.layer5.epilogue.bias':'32x32/Conv0_up/bias',
'synthesis.layer6.conv.weight':'32x32/Conv1/weight',
'synthesis.layer6.epilogue.mod_weight':'32x32/Conv1/mod_weight',
'synthesis.layer6.epilogue.mod_bias':'32x32/Conv1/mod_bias',
'synthesis.layer6.epilogue.apply_noise':'32x32/Conv1/noise_strength',
'synthesis.layer6.epilogue.bias':'32x32/Conv1/bias',
'synthesis.output3.conv.weight': '32x32/ToRGB/weight',
'synthesis.output3.epilogue.mod_weight':'32x32/ToRGB/mod_weight',
'synthesis.output3.epilogue.mod_bias':'32x32/ToRGB/mod_bias',
'synthesis.output3.epilogue.bias':'32x32/ToRGB/bias',

'synthesis.layer7.conv.weight':'64x64/Conv0_up/weight',
'synthesis.layer7.epilogue.mod_weight':'64x64/Conv0_up/mod_weight',
'synthesis.layer7.epilogue.mod_bias':'64x64/Conv0_up/mod_bias',
'synthesis.layer7.epilogue.apply_noise':'64x64/Conv0_up/noise_strength',
'synthesis.layer7.epilogue.bias':'64x64/Conv0_up/bias',
'synthesis.layer8.conv.weight':'64x64/Conv1/weight',
'synthesis.layer8.epilogue.mod_weight':'64x64/Conv1/mod_weight',
'synthesis.layer8.epilogue.mod_bias':'64x64/Conv1/mod_bias',
'synthesis.layer8.epilogue.apply_noise':'64x64/Conv1/noise_strength',
'synthesis.layer8.epilogue.bias':'64x64/Conv1/bias',
'synthesis.output4.conv.weight': '64x64/ToRGB/weight',
'synthesis.output4.epilogue.mod_weight':'64x64/ToRGB/mod_weight',
'synthesis.output4.epilogue.mod_bias':'64x64/ToRGB/mod_bias',
'synthesis.output4.epilogue.bias':'64x64/ToRGB/bias',   

'synthesis.layer9.conv.weight':'128x128/Conv0_up/weight',
'synthesis.layer9.epilogue.mod_weight':'128x128/Conv0_up/mod_weight',
'synthesis.layer9.epilogue.mod_bias':'128x128/Conv0_up/mod_bias',
'synthesis.layer9.epilogue.apply_noise':'128x128/Conv0_up/noise_strength',
'synthesis.layer9.epilogue.bias':'128x128/Conv0_up/bias',
'synthesis.layer10.conv.weight':'128x128/Conv1/weight',
'synthesis.layer10.epilogue.mod_weight':'128x128/Conv1/mod_weight',
'synthesis.layer10.epilogue.mod_bias':'128x128/Conv1/mod_bias',
'synthesis.layer10.epilogue.apply_noise':'128x128/Conv1/noise_strength',
'synthesis.layer10.epilogue.bias':'128x128/Conv1/bias',
'synthesis.output5.conv.weight': '128x128/ToRGB/weight',
'synthesis.output5.epilogue.mod_weight':'128x128/ToRGB/mod_weight',
'synthesis.output5.epilogue.mod_bias':'128x128/ToRGB/mod_bias',
'synthesis.output5.epilogue.bias':'128x128/ToRGB/bias',

'synthesis.layer11.conv.weight':'256x256/Conv0_up/weight',
'synthesis.layer11.epilogue.mod_weight':'256x256/Conv0_up/mod_weight',
'synthesis.layer11.epilogue.mod_bias':'256x256/Conv0_up/mod_bias',
'synthesis.layer11.epilogue.apply_noise':'256x256/Conv0_up/noise_strength',
'synthesis.layer11.epilogue.bias':'256x256/Conv0_up/bias',
'synthesis.layer12.conv.weight':'256x256/Conv1/weight',
'synthesis.layer12.epilogue.mod_weight':'256x256/Conv1/mod_weight',
'synthesis.layer12.epilogue.mod_bias':'256x256/Conv1/mod_bias',
'synthesis.layer12.epilogue.apply_noise':'256x256/Conv1/noise_strength',
'synthesis.layer12.epilogue.bias':'256x256/Conv1/bias',
'synthesis.output6.conv.weight': '256x256/ToRGB/weight',
'synthesis.output6.epilogue.mod_weight':'256x256/ToRGB/mod_weight',
'synthesis.output6.epilogue.mod_bias':'256x256/ToRGB/mod_bias',
'synthesis.output6.epilogue.bias':'256x256/ToRGB/bias',

'synthesis.layer13.conv.weight':'512x512/Conv0_up/weight',
'synthesis.layer13.epilogue.mod_weight':'512x512/Conv0_up/mod_weight',
'synthesis.layer13.epilogue.mod_bias':'512x512/Conv0_up/mod_bias',
'synthesis.layer13.epilogue.apply_noise':'512x512/Conv0_up/noise_strength',
'synthesis.layer13.epilogue.bias':'512x512/Conv0_up/bias',
'synthesis.layer14.conv.weight':'512x512/Conv1/weight',
'synthesis.layer14.epilogue.mod_weight':'512x512/Conv1/mod_weight',
'synthesis.layer14.epilogue.mod_bias':'512x512/Conv1/mod_bias',
'synthesis.layer14.epilogue.apply_noise':'512x512/Conv1/noise_strength',
'synthesis.layer14.epilogue.bias':'512x512/Conv1/bias',
'synthesis.output7.conv.weight': '512x512/ToRGB/weight',
'synthesis.output7.epilogue.mod_weight':'512x512/ToRGB/mod_weight',
'synthesis.output7.epilogue.mod_bias':'512x512/ToRGB/mod_bias',
'synthesis.output7.epilogue.bias':'512x512/ToRGB/bias',

'synthesis.layer15.conv.weight':'1024x1024/Conv0_up/weight',
'synthesis.layer15.epilogue.mod_weight':'1024x1024/Conv0_up/mod_weight',
'synthesis.layer15.epilogue.mod_bias':'1024x1024/Conv0_up/mod_bias',
'synthesis.layer15.epilogue.apply_noise':'1024x1024/Conv0_up/noise_strength',
'synthesis.layer15.epilogue.bias':'1024x1024/Conv0_up/bias',
'synthesis.layer16.conv.weight':'1024x1024/Conv1/weight',
'synthesis.layer16.epilogue.mod_weight':'1024x1024/Conv1/mod_weight',
'synthesis.layer16.epilogue.mod_bias':'1024x1024/Conv1/mod_bias',
'synthesis.layer16.epilogue.apply_noise':'1024x1024/Conv1/noise_strength',
'synthesis.layer16.epilogue.bias':'1024x1024/Conv1/bias',
'synthesis.output8.conv.weight': '1024x1024/ToRGB/weight',
'synthesis.output8.epilogue.mod_weight':'1024x1024/ToRGB/mod_weight',
'synthesis.output8.epilogue.mod_bias':'1024x1024/ToRGB/mod_bias',
'synthesis.output8.epilogue.bias':'1024x1024/ToRGB/bias'

}
This is in stylegan2_generator_model.py (which is a copy of stylegan_generator_model.py). In stylegan_generator.py, the following is used:

if 'ToRGB_lod' in tf_var_name:
  lod = int(tf_var_name[len('ToRGB_lod')])
  lod_shift = 10 - int(np.log2(self.resolution))
  tf_var_name = tf_var_name.replace(f'{lod}', f'{lod - lod_shift}')
if tf_var_name not in tf_vars:
  self.logger.debug(f'Variable {tf_var_name} does not exist in '
                    f'tensorflow model.')

Here, in StyleGAN2, the TF variable names are like '512x512/ToRGB/weight', so I removed the steps above because the resolution is contained directly in the variable name.

If I execute the code with the above modifications, it reports that the model was saved successfully. But at loading time it gives this error:
size mismatch for synthesis.layer1.epilogue.apply_noise.noise: copying a param with shape torch.Size([1, 1, 8, 8]) from checkpoint, the shape in current model is torch.Size([1, 1, 4, 4]).
size mismatch for synthesis.layer3.epilogue.apply_noise.noise: copying a param with shape torch.Size([1, 1, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 1, 8, 8]).
size mismatch for synthesis.layer5.epilogue.apply_noise.noise: copying a param with shape torch.Size([1, 1, 32, 32]) from checkpoint, the shape in current model is torch.Size([1, 1, 16, 16]).
size mismatch for synthesis.layer7.epilogue.apply_noise.noise: copying a param with shape torch.Size([1, 1, 64, 64]) from checkpoint, the shape in current model is torch.Size([1, 1, 32, 32]).
size mismatch for synthesis.layer8.conv.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
size mismatch for synthesis.layer8.epilogue.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for synthesis.layer9.epilogue.apply_noise.noise: copying a param with shape torch.Size([1, 1, 128, 128]) from checkpoint, the shape in current model is torch.Size([1, 1, 64, 64]).
size mismatch for synthesis.output4.conv.weight: copying a param with shape torch.Size([3, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 256, 1, 1]).
size mismatch for synthesis.layer10.epilogue.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for synthesis.layer11.epilogue.apply_noise.noise: copying a param with shape torch.Size([1, 1, 256, 256]) from checkpoint, the shape in current model is torch.Size([1, 1, 128, 128]).
size mismatch for synthesis.output5.conv.weight: copying a param with shape torch.Size([3, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 128, 1, 1]).
size mismatch for synthesis.layer12.epilogue.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for synthesis.layer13.epilogue.apply_noise.noise: copying a param with shape torch.Size([1, 1, 512, 512]) from checkpoint, the shape in current model is torch.Size([1, 1, 256, 256]).
size mismatch for synthesis.output6.conv.weight: copying a param with shape torch.Size([3, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 64, 1, 1]).
size mismatch for synthesis.layer14.epilogue.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for synthesis.layer15.epilogue.apply_noise.noise: copying a param with shape torch.Size([1, 1, 1024, 1024]) from checkpoint, the shape in current model is torch.Size([1, 1, 512, 512]).
size mismatch for synthesis.output7.conv.weight: copying a param with shape torch.Size([3, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 32, 1, 1]).
size mismatch for synthesis.layer16.epilogue.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for synthesis.output8.conv.weight: copying a param with shape torch.Size([3, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 16, 1, 1]).

Can you help me change the functions in stylegan2_generator_model.py?

Thanks in advance.
Regards,
SandhyaLaxmi Kanna

SyntaxError: invalid syntax

E:\Users\Raytine\Anaconda3\python.exe F:/expression/InterFaceGAN-master/edit.py -m stylegan_celebahq -b boundaries/stylegan_celebahq_pose_boundary -n 10 -o results/stylegan_celebahq_smile_editing
[2019-08-12 13:50:32,896][INFO] Initializing generator.
[2019-08-12 13:50:34,279][INFO] Loading tensorflow model from {self.tf_model_path}.
Traceback (most recent call last):
File "F:/expression/InterFaceGAN-master/edit.py", line 114, in <module>
main()
File "F:/expression/InterFaceGAN-master/edit.py", line 68, in main
model = StyleGANGenerator(args.model_name, logger)
File "F:\expression\InterFaceGAN-master\models\stylegan_generator.py", line 42, in __init__
super().__init__(model_name, logger)
File "F:\expression\InterFaceGAN-master\models\base_generator.py", line 96, in __init__
self.convert_tf_model()
File "F:\expression\InterFaceGAN-master\models\stylegan_generator.py", line 73, in convert_tf_model
_, _, tf_model = pickle.load(f)
File "models/stylegan_tf_official\dnnlib\__init__.py", line 20
submit_config: SubmitConfig = None # Package level variable for SubmitConfig which is only valid when inside the run function.
^
SyntaxError: invalid syntax

How can I input an image to edit?

Hi,
First many thanks for sharing this great work.
I want to ask a question. The input of InterFaceGAN is a latent code. If I want to input a face image and edit the face, how should I proceed? Do I need to transform the image into a latent code, and how can I do that?

Thank you.

Does InterFaceGAN keep the face ID when editing faces?

Hi Yujun,
Thanks for the great work.
When I run edit.py, the output images do not keep the face ID. Is it possible to keep the face ID (just like FaceID-GAN / FaceFeat-GAN) if one latent code is passed to the generator?

Trying to use a pretrained stylegan with gray-scaled images

Hi,

I am trying to use generate_data.py with a StyleGAN trained on gray-scale images. I registered my model in model_settings.py and specified the number of channels as 1 there. However, it turns out that something in the original model is fixed to 3 channels; the error is:

RuntimeError: Error(s) in loading state_dict for StyleGANGeneratorModel:
size mismatch for synthesis.output6.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([3]).

Do you have any idea, how can i solve this.

Thanks in advance.

RuntimeError: Given groups=1, weight of size [16, 16, 3, 3], expected input[4, 512, 1, 1] to have 16 channels, but got 512 channels instead

E:\Users\Raytine\Anaconda3\python.exe F:/expression/InterFaceGAN-master/edit.py -m pggan_celebahq -b boundaries/pggan_celebahq_smile_boundary.npy -n 10 -o results/pggan_celebahq_smile_editing
[2019-08-12 09:17:26,315][INFO] Initializing generator.
[2019-08-12 09:17:26,440][WARNING] No pre-trained model will be loaded!
[2019-08-12 09:17:27,728][INFO] Preparing boundary.
[2019-08-12 09:17:27,731][INFO] Preparing latent codes.
[2019-08-12 09:17:27,731][INFO] Sample latent codes randomly.
[2019-08-12 09:17:27,732][INFO] Editing {total_num} samples.
Traceback (most recent call last):
File "F:/expression/InterFaceGAN-master/edit.py", line 112, in <module>
main()
File "F:/expression/InterFaceGAN-master/edit.py", line 98, in main
outputs = model.easy_synthesize(interpolations_batch)
File "F:\expression\InterFaceGAN-master\models\base_generator.py", line 230, in easy_synthesize
outputs = self.synthesize(latent_codes, **kwargs)
File "F:\expression\InterFaceGAN-master\models\pggan_generator.py", line 117, in synthesize
images = self.model(zs)
File "E:\Users\Raytine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "F:\expression\InterFaceGAN-master\models\pggan_generator_model.py", line 127, in forward
return super().forward(x)
File "E:\Users\Raytine\Anaconda3\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
input = module(input)
File "E:\Users\Raytine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "F:\expression\InterFaceGAN-master\models\pggan_generator_model.py", line 243, in forward
x = self.conv(x)
File "E:\Users\Raytine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "E:\Users\Raytine\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [16, 16, 3, 3], expected input[4, 512, 1, 1] to have 16 channels, but got 512 channels instead

Process finished with exit code 1

artifacts boundary

Hi there,

In the paper, the last paragraph of Section 3.3 mentions how to fix artifacts using a certain hyperplane. However, this hyperplane is missing from the boundaries folder of this repo. Are you going to release it soon?

Thanks!

How to get the pose boundary?

Hello! I am confused about how to get the pose boundary. The paper says that the auxiliary attribute prediction model predicts the 5-point facial landmarks, but the facial-landmark labels in the CelebA dataset are coordinates. The paper also says that turning right is the corresponding positive direction. How did you turn the coordinates into binary labels that represent the direction?

Issue with tensorflow version

When I try to install TensorFlow 1.12.2 (I believe TensorFlow 1.12.2 and CUDA 9 are compatible for running this code), I get a version mismatch error and some other version gets installed. When I try to print the TensorFlow version I get errors, and images are not generated with other versions of TensorFlow.

generate_data.py generates empty images

Hi guys,

Thank you for your great work. I have a question.

I found that generate_data.py works very well with my StyleGAN models, but when I trained a PGGAN on the same dataset of images and then ran generate_data.py to generate 10k images, all images look like this:

[example image: 000009]

I tried another checkpoint but it gave similar results:

[example image: 000015]

I guess it is caused by some misconfiguration between my PGGAN and the inference code of InterFaceGAN, but I haven't found it yet. Can you give me some advice, please?

Attributes predictor

Hi! Thanks for your great research!
Is there any chance you can provide the attribute predictor that you used for finding boundaries? I didn't really understand what scores it should give to the attributes in the pictures.
Also, for binary attributes like glasses/no glasses, is it possible to manually put labels of, say, 1 for glasses and 0 for no glasses and feed this data to train a new boundary?

Confusion about the definition of binary attribute

Hello! I am confused about the definition of a binary attribute. For example, if I want to change the hair color attribute, can the method only work for dark colors vs. light colors? Can I get a certain color direction, like a red-hair direction, by collecting red-hair and non-red-hair images to train a binary classifier? Or maybe we could get the red direction by collecting red-hair and yellow-hair images?

Questions about generating W-space data with stylegan_ffhq

Hi, I want to edit images in W space.

In #35, it is suggested to use generate_data.py to get w.npy first.
I used the following command to generate images. However, the images are strange and do not look like normal human faces:
python generate_data.py -m stylegan_ffhq -o data/stylegan-ffhq -n 3 -s W

[example images: 000002, 000000, 000001]

Was my code wrong?

Also, I want to ask: if I want to edit the images in W space, is the following command right?

python edit.py \
    -m stylegan_ffhq \
    -b boundaries/stylegan_ffhq_age_w_boundary.npy \
    -i ./data/stylegan-ffhq/w.npy \
    -o results/stylegan_celebahq_age_w_boundary \
    -s W

about style_mod

Why is the function x * (style[:, 0] + 1) + style[:, 1]? What does the "+1" in (style[:, 0] + 1) mean? The official implementation is the same as yours; I can't figure it out.

How to perform StyleGAN inversion?

Hi Yujun,

In the paper you state that a GAN inversion method must be used to map real images to latent codes, and that StyleGAN inversion methods are much better. Are there documents introducing how to do the inversion?
Any comments are appreciated! Best Regards.

RuntimeError: Given groups=1, weight of size [16, 16, 3, 3], expected input[4, 512, 1, 1] to have 16 channels, but got 512 channels instead

E:\Users\Raytine\Anaconda3\python.exe F:/expression/InterFaceGAN-master/edit.py -m pggan_celebahq -b boundaries/pggan_celebahq_smile_boundary.npy -n 10 -o results/pggan_celebahq_smile_editing
[2019-08-12 14:15:15,846][INFO] Initializing generator.
[2019-08-12 14:15:15,972][INFO] Loading pytorch model from {self.model_path}.
[2019-08-12 14:15:16,002][INFO] Successfully loaded!
[2019-08-12 14:15:17,357][INFO] Preparing boundary.
0%| | 0/10 [00:00<?, ?it/s][2019-08-12 14:15:17,394][INFO] Preparing latent codes.
[2019-08-12 14:15:17,394][INFO] Sample latent codes randomly.
[2019-08-12 14:15:17,395][INFO] Editing {total_num} samples.
Traceback (most recent call last):
File "F:/expression/InterFaceGAN-master/edit.py", line 114, in <module>
main()
File "F:/expression/InterFaceGAN-master/edit.py", line 100, in main
outputs = model.easy_synthesize(interpolations_batch)
File "F:\expression\InterFaceGAN-master\models\base_generator.py", line 230, in easy_synthesize
outputs = self.synthesize(latent_codes, **kwargs)
File "F:\expression\InterFaceGAN-master\models\pggan_generator.py", line 132, in synthesize
images = self.model(zs)
File "E:\Users\Raytine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "F:\expression\InterFaceGAN-master\models\pggan_generator_model.py", line 127, in forward
return super().forward(x)
File "E:\Users\Raytine\Anaconda3\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
input = module(input)
File "E:\Users\Raytine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "F:\expression\InterFaceGAN-master\models\pggan_generator_model.py", line 243, in forward
x = self.conv(x)
File "E:\Users\Raytine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "E:\Users\Raytine\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [16, 16, 3, 3], expected input[4, 512, 1, 1] to have 16 channels, but got 512 channels instead

Requirments.txt and pre-trained models

Great contribution and git my friends, thanks so much for posting this open to the community to use.

Would like to ask several questions:

  1. Is there any way you can provide a requirements.txt file so that those who don't have the libraries can easily use your repo?

  2. Can you link to the appropriate pre-trained models and where to download them? I have tried using the PGGAN model from its GitHub page, and yet edit.py isn't working.

Thanks again!

How to manipulate real faces?

Dear author, after checking this repository I found that the encoder-decoder model the paper tests in Figure 11 is not included. Will it be released in the near future?

RuntimeError: Error(s) in loading state_dict for PGGANGeneratorModel:

E:\Users\Raytine\Anaconda3\python.exe F:/expression/InterFaceGAN-master/edit.py -m pggan_celebahq -b boundaries/pggan_celebahq_smile_boundary.npy -n 10 -o results/pggan_celebahq_smile_editing
[2019-08-12 10:30:13,999][INFO] Initializing generator.
[2019-08-12 10:30:23,051][INFO] Loading tensorflow model from {self.tf_model_path}.
[2019-08-12 10:30:28,891][INFO] Successfully loaded!
[2019-08-12 10:30:28,892][INFO] Converting tensorflow model to pytorch version.
[2019-08-12 10:30:29,095][INFO] Successfully converted!
[2019-08-12 10:30:29,095][INFO] Saving pytorch model to {self.model_path}.
[2019-08-12 10:30:29,120][INFO] Successfully saved!
[2019-08-12 10:30:29,120][INFO] Loading pytorch model from {self.model_path}.
Traceback (most recent call last):
File "F:/expression/InterFaceGAN-master/edit.py", line 112, in <module>
main()
File "F:/expression/InterFaceGAN-master/edit.py", line 63, in main
model = PGGANGenerator(args.model_name, logger)
File "F:\expression\InterFaceGAN-master\models\pggan_generator.py", line 24, in __init__
super().__init__(model_name, logger)
File "F:\expression\InterFaceGAN-master\models\base_generator.py", line 96, in __init__
self.convert_tf_model()
File "F:\expression\InterFaceGAN-master\models\pggan_generator.py", line 70, in convert_tf_model
self.load()
File "F:\expression\InterFaceGAN-master\models\pggan_generator.py", line 34, in load
self.model.load_state_dict(torch.load(self.model_path))
File "E:\Users\Raytine\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 719, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PGGANGeneratorModel:
Unexpected key(s) in state_dict: "layer5.conv.weight", "layer18.conv.weight", "layer2.conv.weight", "layer15.wscale.bias", "layer18.wscale.bias", "layer4.wscale.bias", "layer6.wscale.bias", "layer1.conv.weight", "layer8.conv.weight", "layer14.wscale.bias", "layer10.wscale.bias", "layer8.wscale.bias", "layer15.conv.weight", "output_1024x1024.conv.weight", "output_1024x1024.wscale.bias", "layer7.conv.weight", "layer9.conv.weight", "layer9.wscale.bias", "layer17.conv.weight", "layer13.wscale.bias", "layer12.wscale.bias", "layer14.conv.weight", "layer16.wscale.bias", "layer11.wscale.bias", "layer16.conv.weight", "layer10.conv.weight", "layer6.conv.weight", "layer17.wscale.bias", "layer4.conv.weight", "layer13.conv.weight", "layer5.wscale.bias", "layer2.wscale.bias", "layer3.wscale.bias", "layer12.conv.weight", "layer1.wscale.bias", "layer11.conv.weight", "layer7.wscale.bias", "layer3.conv.weight".

how to prepare data for a custom attribute

Hi, I think the main idea is to prepare data with/without a certain attribute and train a binary classifier (namely a linear SVM) on the data.

But most attributes take continuous values, such as pose rotation.
The logic in your code is to use the average of the largest value and the smallest value as a threshold.

There are also some attributes that should be quantized into several finite states; for example, face shape can be one of [square, triangle, heart, round, oval].

Do you have more detailed suggestions on how to quantize those attributes?

Thanks!
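
For concreteness, here is a minimal sketch of the thresholding idea described above, using random placeholder data and scikit-learn's LinearSVC; it only illustrates the binarize-then-fit-a-linear-SVM idea and is not the repository's train_boundary() implementation.

import numpy as np
from sklearn.svm import LinearSVC

latent_codes = np.random.randn(10000, 512)  # stand-in for real latent codes
scores = np.random.randn(10000)             # stand-in for a continuous attribute (e.g. yaw)

# Binarize the continuous attribute around the midpoint of its range.
threshold = (scores.max() + scores.min()) / 2.0
labels = (scores > threshold).astype(np.int64)

# Fit a linear SVM; the normalized weight vector is the candidate boundary.
clf = LinearSVC(C=1.0, max_iter=10000)
clf.fit(latent_codes, labels)
boundary = clf.coef_ / np.linalg.norm(clf.coef_)  # shape (1, 512)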

Can it make an adult into a baby?

Hi, the age demo in the paper turns an adult into a child. Could you please tell me what happens if I set the age attribute to an extreme value?

Can it turn an adult into a one-year-old baby? Can it keep the generator outputting a normal human face?

Roll and Pitch for Pose data

Hi Shenyujun,
Thanks so much for the awesome work! I saw you have a pose direction in the repo, but that direction contains no roll or pitch rotation. Do you think it's possible to train with pitch, roll and yaw together? By the way, is there any attribute predictor or data you could share regarding the pose direction? That would be really helpful; I didn't find one in the original StyleGAN repo.
Thanks

Issue when using pretrained model with input size 512x512

Hi,

I am trying to run generate_data.py using my pretrained model which was trained on 512x512 images. It successfully converted the pkl model to pth, but then showed the error below.

Traceback (most recent call last):
File "/media/tai/6TB/Projects/InterFaceGAN/InterFaceGAN/generate_data.py", line 114, in <module>
main()
File "/media/tai/6TB/Projects/InterFaceGAN/InterFaceGAN/generate_data.py", line 65, in main
model = StyleGANGenerator(args.model_name, logger)
File "/media/tai/6TB/Projects/InterFaceGAN/InterFaceGAN/models/stylegan_generator.py", line 42, in __init__
super().__init__(model_name, logger)
File "/media/tai/6TB/Projects/InterFaceGAN/InterFaceGAN/models/base_generator.py", line 95, in __init__
self.load()
File "/media/tai/6TB/Projects/InterFaceGAN/InterFaceGAN/models/stylegan_generator.py", line 63, in load
self.model.load_state_dict(state_dict)
File "/media/tai/6TB/anaconda3/envs/InterfaceGAN/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for StyleGANGeneratorModel:
Unexpected key(s) in state_dict: "synthesis.output8.conv.weight", "synthesis.output8.bias".
size mismatch for synthesis.output4.conv.weight: copying a param with shape torch.Size([3, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 256, 1, 1]).
size mismatch for synthesis.output5.conv.weight: copying a param with shape torch.Size([3, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 128, 1, 1]).
size mismatch for synthesis.output6.conv.weight: copying a param with shape torch.Size([3, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 64, 1, 1]).
size mismatch for synthesis.output7.conv.weight: copying a param with shape torch.Size([3, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 32, 1, 1]).
I guess it is because InterFaceGAN is set up to work with models trained on 1024x1024 images by default.

What should I modify so that I can load my 512x512 model?

Thank you very much!

How to train w+ space boundary?

Using the stylegan-encoder project, I got the latent codes as an array of shape (n, 18, 512). However, the training code expects 1-D vector inputs; do I need to split the latent code into 1-D vectors?
Thanks a lot!

quality on stylegan_ffhq

Hi, thanks for the paper and the results are impressive!

I tested the code with the "stylegan_ffhq" model and "stylegan_ffhq_pose_boundary.npy" or "stylegan_ffhq_pose_w_boundary.npy", with the default settings, but the results are not very good.

The person's identity, age, and even gender change simultaneously with the pose.
With "stylegan_ffhq_pose_w_boundary.npy", the degree of pose change is more or less negligible.

python edit.py -m stylegan_ffhq -o results/stylegan_ffhq_pose_w_boundary -b ./boundaries/stylegan_ffhq_pose_w_boundary.npy -n 10

Is there anything that I have to adjust?

Attribute Scores

Hello, thanks for the great work.
I would like to know if it is possible to see the code for the attribute predictor you used, or if you could share its scores directly, so that we can find boundaries on different architectures but with the same dataset.
Thanks very much

W space or W+ space

Hello! The repository provides some boundaries for the W space of StyleGAN, but I found that the code contains two configurations, W space and W+ space. So I wonder: do all the boundaries labeled W in the paper and on GitHub correspond to the W space? Are there no W+ space boundaries provided and no W+ space results in the paper? Thanks a lot.

Questions about the truncation module.

I have a question about your implementation of the truncation module. Why are the first 9 channels of the W+ code the same? It looks like you separate the W+ code into just 2 blocks instead of 18 blocks. This is strange, because in the official code each channel (I mean the 18, not the 512) of the W+ code is different.

Issue learning latent encoding for new faces

I am trying to derive latent encodings for custom faces, as done in https://github.com/Puzer/stylegan-encoder.

Here are the details after porting the same to pytorch:

# Assumed imports and device setup for the snippets below.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from PIL import Image
from torchvision import models
from tqdm import tqdm

from models.stylegan_generator import StyleGANGenerator

device = torch.device('cuda')

#load the pre-trained synthesis network
m_synth = StyleGANGenerator("stylegan_ffhq").model.synthesis.cuda().eval()

#process the output of the synthesis module
class PostProcAfterSynth(nn.Module):
    def __init__(self):
        super(PostProcAfterSynth, self).__init__()
    def forward(self, gen_img):
        #remap to [0,1]
        return (gen_img+1)/2
    
post_proc_layer = PostProcAfterSynth()

#preprocess the generated image before feeding into perceptual model    
class PreProcBeforePerception(nn.Module):
    def __init__(self, img_size):
        super(PreProcBeforePerception, self).__init__()
        self.img_size = img_size
        self.mean = torch.tensor([0.485, 0.456, 0.406], device=device).view(-1, 1, 1)
        self.std = torch.tensor([0.229, 0.224, 0.225], device=device).view(-1, 1, 1)
    def forward(self, gen_img):
        #resize input image
        gen_img = F.adaptive_avg_pool2d(gen_img, self.img_size)
        #normalize
        gen_img = (gen_img - self.mean) / self.std
        return gen_img
    
pre_proc_layer = PreProcBeforePerception(img_size=256)

#use pre-trained vgg model for feature extraction
m_vgg = models.vgg16(pretrained=True).features[:16].to(device).eval()

#set up the model
model = nn.Sequential(m_synth)
model.add_module(str(1), post_proc_layer)
model.add_module(str(2), pre_proc_layer)
model.add_module(str(3), m_vgg)

for param in model.parameters():
    param.requires_grad_(False)

print(m_vgg)

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace)
)

As done by Puzer, I select the [conv->conv->pool->conv->conv->pool->conv->conv->conv] section of the vgg network for feature extraction.

Pre-computing the features for the reference image:

ref_img_path = "."
ref_img = np.array(Image.open(ref_img_path))
ref_img = ref_img.astype(np.float32)/255.
ref_img = np.array([np.transpose(ref_img, (2,0,1))])
ref_img = torch.tensor(ref_img, device=device)
ref_img = pre_proc_layer(ref_img)
ref_img_features = m_vgg(ref_img).detach()

Optimization:

trainable_latent = torch.randn((1,18,512), device=device).requires_grad_(True)
loss_func = torch.nn.MSELoss()

optimizer = optim.SGD([trainable_latent], lr=0.5)

losses = []
for i in tqdm(range(1000)):
    optimizer.zero_grad()
    gen_img_features = model(trainable_latent)
    loss = loss_func(gen_img_features, ref_img_features)
    loss_val = loss.data.cpu()
    losses.append(loss_val)
    loss.backward()
    optimizer.step()

The latent encoding and subsequent generated images are of a poor quality. The results are nowhere near as crisp as that by Puzer.

What I have tried:

  1. Learning Z space latent instead of WP+
  2. Variety of optimizers, learning rate, iterations combos

What could be wrong:

  1. There might be issues with my pipeline above (new to pytorch)
  2. There might be some difference in pre-trained vgg networks for pytorch and keras, that I might have failed to take into account.
  3. The perceptual model used is not complex enough. (but it does work for Puzer)

Any help with the above would be much appreciated.

Regarding yaw pose estimation using facial landmarks

Hi,

I wish to apply your technique to a StyleGAN model that I have trained on Celeba-HQ-128 images. Can you please release the code to estimate yaw pose using the five facial landmarks present in CelebA dataset (left eye centre, right eye centre, nose tip, left mouth corner and right mouth corner)?

Thanks.

AssertionError: Torch not compiled with CUDA enabled

(base) PS E:\darshan\pytorch_stylegan_encoder-master\InterFaceGAN> python generate_data.py -m stylegan_ffhq -o data/pggan_celebahq -n 10000
[2020-01-20 03:53:48,282][INFO] Initializing generator.
[2020-01-20 03:53:48,521][WARNING] No pre-trained model will be loaded!
Traceback (most recent call last):
File "generate_data.py", line 111, in <module>
main()
File "generate_data.py", line 64, in main
model = StyleGANGenerator(args.model_name, logger)
File "E:\darshan\pytorch_stylegan_encoder-master\InterFaceGAN\models\stylegan_generator.py", line 42, in __init__
super().__init__(model_name, logger)
File "E:\darshan\pytorch_stylegan_encoder-master\InterFaceGAN\models\base_generator.py", line 103, in __init__
self.model.eval().to(self.run_device)
File "C:\Users\HpZ8\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 426, in to
return self._apply(convert)
File "C:\Users\HpZ8\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
module._apply(fn)
File "C:\Users\HpZ8\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
module._apply(fn)
File "C:\Users\HpZ8\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
module._apply(fn)
File "C:\Users\HpZ8\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 224, in _apply
param_applied = fn(param)
File "C:\Users\HpZ8\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 424, in convert
return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
File "C:\Users\HpZ8\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 192, in _lazy_init
_check_driver()
File "C:\Users\HpZ8\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 95, in _check_driver
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

How to find more boundaries?

Congratulations for the great work!

Please correct me if I'm wrong. From what I understand, given a pretrained model and a boundary (in boundaries/), we can tune features of the generated images. This is amazing.

I wonder if I can explore other boundaries as well, say hair color or skin color? If it is possible, how could I do that?

Thank you very much!

Why the latent code from GAN inversion methods can be manipulated by the boundary

Hi, thanks for sharing this great work!

I'm trying to edit a new face. In #30, it is suggested to first use https://github.com/Puzer/stylegan-encoder to get the latent code of the new face in W+ space. However, the shape of the latent code is (18, 512), and the 18 layers have different values.

What confuses me is:

  1. The shape of "stylegan_ffhq_age_w_boundary.npy" is (1, 512), so if we use a (1, 512) boundary to edit an (18, 512) latent code, all layers will be edited by the same value (a small broadcasting sketch is included below). But the meaning of the different layers of the (18, 512) latent code is not the same, because the values of the 18 layers are different.

Why can we use a (1, 512) boundary to edit an (18, 512) latent code? Why does it still work?

  2. If the (18, 512) latent code has different values across its 18 layers, wouldn't training an (18, 512) boundary (which also has different values across its 18 layers) be more reasonable?

  3. In your paper, you also run the experiment on real images. Which latent space did you get from your StyleGAN encoder: Z, W, or W+?
    If the shape of your latent code is (18, 512), do the 18 layers have different values?
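
For reference, here is a small numpy sketch (with random placeholder data) of what happens mechanically when a (1, 512) direction is added to an (18, 512) code: broadcasting simply shifts every layer by the same vector.

import numpy as np

wp_code = np.random.randn(18, 512)    # stand-in for a W+ latent code
boundary = np.random.randn(1, 512)
boundary /= np.linalg.norm(boundary)  # unit-length direction

alpha = 3.0                           # editing strength
edited = wp_code + alpha * boundary   # broadcast: the same shift applied to all 18 layers
print(edited.shape)                   # (18, 512)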

Thank you!

Multi-GPU support

I have wrapped the model in models/base_generator. However, CUDA out of memory occurs when I run the synthesis script. Could you help me figure it out?
GPU: P40, 24 GB. Batch size: 32.

different interpolation logic

Hi,

First many thanks for sharing this great work.

Got a question regarding the linear interpolation logic. In the function linear_interpolate(), when len(latent_code.shape) == 2, the dot product of latent_code and boundary is subtracted from [start_distance, end_distance]. However, if len(latent_code.shape) == 3, the dot product is not considered at all. I am just wondering why these two cases are treated differently.

Thanks.

Ways of learning the attribute vector

Very impressive work! I'm wondering if you have compared the proposed way of learning the attribute vector (by classification) with the way in [1] (simply using the difference between the mean features).

[1] P. Upchurch, J. Gardner, G. Pleiss, R. Pless, N. Snavely, K. Bala, and K. Weinberger. Deep feature interpolation for image content changes. In CVPR, 2017
