
StyleGAN2 with adaptive discriminator augmentation (ADA) - Official TensorFlow implementation

Home Page: https://arxiv.org/abs/2006.06676

License: Other

Dockerfile 0.17% Python 92.59% Cuda 7.25%

stylegan2-ada's Introduction

StyleGAN2 with adaptive discriminator augmentation (ADA)
— Official TensorFlow implementation

Teaser image

Training Generative Adversarial Networks with Limited Data
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, Timo Aila
https://arxiv.org/abs/2006.06676

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing

Looking for the PyTorch version?

The Official PyTorch version is now available and supersedes the TensorFlow version. See the full list of versions here.

What's new

This repository supersedes the original StyleGAN2 with the following new features:

  • ADA: Significantly better results for datasets with less than ~30k training images. State-of-the-art results for CIFAR-10.
  • Mixed-precision support: ~1.6x faster training, ~1.3x faster inference, ~1.5x lower GPU memory consumption.
  • Better hyperparameter defaults: Reasonable out-of-the-box results for different dataset resolutions and GPU counts.
  • Clean codebase: Extensive refactoring and simplification. The code should be generally easier to work with.
  • Command line tools: Easily reproduce training runs from the paper, generate projection videos for arbitrary images, etc.
  • Network import: Full support for network pickles produced by StyleGAN and StyleGAN2. Faster loading times.
  • Augmentation pipeline: Self-contained, reusable GPU implementation of extensive high-quality image augmentations.
  • Bugfixes

External data repository

Path Description
stylegan2-ada Main directory hosted on Amazon S3
  ├  ada-paper.pdf Paper PDF
  ├  images Curated example images produced using the pre-trained models
  ├  videos Curated example interpolation videos
  └  pretrained Pre-trained models
    ├  metfaces.pkl MetFaces at 1024x1024, transfer learning from FFHQ using ADA
    ├  brecahad.pkl BreCaHAD at 512x512, trained from scratch using ADA
    ├  afhqcat.pkl AFHQ Cat at 512x512, trained from scratch using ADA
    ├  afhqdog.pkl AFHQ Dog at 512x512, trained from scratch using ADA
    ├  afhqwild.pkl AFHQ Wild at 512x512, trained from scratch using ADA
    ├  cifar10.pkl Class-conditional CIFAR-10 at 32x32
    ├  ffhq.pkl FFHQ at 1024x1024, trained using original StyleGAN2
    ├  paper-fig7c-training-set-sweeps All models used in Fig.7c (baseline, ADA, bCR)
    ├  paper-fig8a-comparison-methods All models used in Fig.8a (comparison methods)
    ├  paper-fig8b-discriminator-capacity All models used in Fig.8b (discriminator capacity)
    ├  paper-fig11a-small-datasets All models used in Fig.11a (small datasets, transfer learning)
    ├  paper-fig11b-cifar10 All models used in Fig.11b (CIFAR-10)
    ├  transfer-learning-source-nets Models used as starting point for transfer learning
    └  metrics Feature detectors used by the quality metrics

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 64-bit Python 3.6 or 3.7. We recommend Anaconda3 with numpy 1.14.3 or newer.
  • We recommend TensorFlow 1.14, which we used for all experiments in the paper, but TensorFlow 1.15 is also supported on Linux. TensorFlow 2.x is not supported.
  • On Windows you need to use TensorFlow 1.14, as the standard 1.15 installation does not include necessary C++ headers.
  • 1–8 high-end NVIDIA GPUs with at least 12 GB of GPU memory, NVIDIA drivers, CUDA 10.0 toolkit and cuDNN 7.5.
  • Docker users: use the provided Dockerfile to build an image with the required library dependencies.

The generator and discriminator networks rely heavily on custom TensorFlow ops that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio to be in PATH. We recommend installing Visual Studio Community Edition and adding it into PATH using "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat".

Getting started

Pre-trained networks are stored as *.pkl files that can be referenced using local filenames or URLs:

# Generate curated MetFaces images without truncation (Fig.10 left)
python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl

# Generate uncurated MetFaces images with truncation (Fig.12 upper left)
python generate.py --outdir=out --trunc=0.7 --seeds=600-605 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl

# Generate class conditional CIFAR-10 images (Fig.17 left, Car)
python generate.py --outdir=out --trunc=1 --seeds=0-35 --class=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/cifar10.pkl

Outputs from the above commands are placed under out/*.png. You can change the location with --outdir. Temporary cache files, such as CUDA build results and downloaded network pickles, will be saved under $HOME/.cache/dnnlib. This can be overridden using the DNNLIB_CACHE_DIR environment variable.
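
If you want to use a network pickle directly from Python instead of going through generate.py, the following is a minimal sketch. It assumes the (G, D, Gs) tuple layout used by the earlier StyleGAN/StyleGAN2 TensorFlow releases, that dnnlib.tflib.init_tf() and dnnlib.util.open_url() behave the same way here, and that you run it from the repository root with a working NVCC setup; treat it as a starting point rather than the official API:

import pickle
import numpy as np
import dnnlib
import dnnlib.tflib as tflib

tflib.init_tf()  # create the default TensorFlow session
url = 'https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl'
with dnnlib.util.open_url(url) as f:
    _G, _D, Gs = pickle.load(f)  # assumed (G, D, Gs) layout, as in earlier releases

# Sample latents and run the long-term average generator Gs.
z = np.random.randn(4, *Gs.input_shape[1:])  # [minibatch, latent_size]
images = Gs.run(z, None, truncation_psi=0.7,
                output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True))
print(images.shape)  # e.g. (4, 1024, 1024, 3) for a 1024x1024 model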

Docker: You can run the above curated image example using Docker as follows:

docker build --tag stylegan2ada:latest .
docker run --gpus all -it --rm -v `pwd`:/scratch --user $(id -u):$(id -g) stylegan2ada:latest bash -c \
    "(cd /scratch && DNNLIB_CACHE_DIR=/scratch/.cache python3 generate.py --trunc=1 --seeds=85,265,297,849 \
    --outdir=out --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl)"

Note: The above defaults to a container base image that requires NVIDIA driver release r455.23 or later. To build an image for older drivers and GPUs, run:

docker build --build-arg BASE_IMAGE=tensorflow/tensorflow:1.14.0-gpu-py3 --tag stylegan2ada:latest .

Projecting images to latent space

To find the matching latent vector for a given image file, run:

python projector.py --outdir=out --target=targetimg.png \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ffhq.pkl

For optimal results, the target image should be cropped and aligned similar to the original FFHQ dataset. The above command saves the projection target out/target.png, result out/proj.png, latent vector out/dlatents.npz, and progression video out/proj.mp4. You can render the resulting latent vector by specifying --dlatents for python generate.py:

python generate.py --outdir=out --dlatents=out/dlatents.npz \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ffhq.pkl
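
The saved out/dlatents.npz can also be inspected directly from Python before re-rendering; a small sketch, assuming the array is stored under the key 'dlatents' as written by projector.py:

import numpy as np

dlatents = np.load('out/dlatents.npz')['dlatents']  # key written by projector.py
print(dlatents.shape)  # e.g. (1, 18, 512) for a 1024x1024 FFHQ network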

Preparing datasets

Datasets are stored as multi-resolution TFRecords, i.e., the same format used by StyleGAN and StyleGAN2. Each dataset consists of multiple *.tfrecords files stored under a common directory, e.g., ~/datasets/ffhq/ffhq-r*.tfrecords

MetFaces: Download the MetFaces dataset and convert to TFRecords:

python dataset_tool.py create_from_images ~/datasets/metfaces ~/downloads/metfaces/images
python dataset_tool.py display ~/datasets/metfaces

BreCaHAD: Download the BreCaHAD dataset. Generate 512x512 resolution crops and convert to TFRecords:

python dataset_tool.py extract_brecahad_crops --cropsize=512 \
    --output_dir=/tmp/brecahad-crops --brecahad_dir=~/downloads/brecahad/images

python dataset_tool.py create_from_images ~/datasets/brecahad /tmp/brecahad-crops
python dataset_tool.py display ~/datasets/brecahad

AFHQ: Download the AFHQ dataset and convert to TFRecords:

python dataset_tool.py create_from_images ~/datasets/afhqcat ~/downloads/afhq/train/cat
python dataset_tool.py create_from_images ~/datasets/afhqdog ~/downloads/afhq/train/dog
python dataset_tool.py create_from_images ~/datasets/afhqwild ~/downloads/afhq/train/wild
python dataset_tool.py display ~/datasets/afhqcat

CIFAR-10: Download the CIFAR-10 python version. Convert to two separate TFRecords for unconditional and class-conditional training:

python dataset_tool.py create_cifar10 --ignore_labels=1 \
    ~/datasets/cifar10u ~/downloads/cifar-10-batches-py

python dataset_tool.py create_cifar10 --ignore_labels=0 \
    ~/datasets/cifar10c ~/downloads/cifar-10-batches-py

python dataset_tool.py display ~/datasets/cifar10c

FFHQ: Download the Flickr-Faces-HQ dataset as TFRecords:

pushd ~
git clone https://github.com/NVlabs/ffhq-dataset.git
cd ffhq-dataset
python download_ffhq.py --tfrecords
popd
python dataset_tool.py display ~/ffhq-dataset/tfrecords/ffhq

LSUN: Download the desired LSUN categories in LMDB format from the LSUN project page and convert to TFRecords:

python dataset_tool.py create_lsun --resolution=256 --max_images=200000 \
    ~/datasets/lsuncat200k ~/downloads/lsun/cat_lmdb

python dataset_tool.py display ~/datasets/lsuncat200k

Custom: Custom datasets can be created by placing all images under a single directory. The images must be square-shaped and they must all have the same power-of-two dimensions. To convert the images to multi-resolution TFRecords, run:

python dataset_tool.py create_from_images ~/datasets/custom ~/custom-images
python dataset_tool.py display ~/datasets/custom
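
Before running dataset_tool.py on a new image folder, it can help to verify the constraints above. Below is a minimal sketch using Pillow (not part of this repository; install it separately), with a hypothetical source directory:

import os
from PIL import Image  # pip install pillow

src_dir = os.path.expanduser('~/custom-images')  # hypothetical path, adjust as needed
sizes = set()
for name in sorted(os.listdir(src_dir)):
    try:
        with Image.open(os.path.join(src_dir, name)) as img:
            w, h = img.size
    except IOError:
        continue  # skip non-image files
    sizes.add((w, h))
    if w != h or (w & (w - 1)) != 0:
        print(f'{name}: {w}x{h} is not square with power-of-two dimensions')
if len(sizes) > 1:
    print('Images have inconsistent dimensions:', sorted(sizes))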

Training new networks

In its most basic form, training new networks boils down to:

python train.py --outdir=~/training-runs --gpus=1 --data=~/datasets/custom --dry-run
python train.py --outdir=~/training-runs --gpus=1 --data=~/datasets/custom

The first command is optional; it will validate the arguments, print out the resulting training configuration, and exit. The second command will kick off the actual training.

In this example, the results will be saved to a newly created directory ~/training-runs/<RUNNING_ID>-custom-auto1 (controlled by --outdir). The training will export network pickles (network-snapshot-<KIMG>.pkl) and example images (fakes<KIMG>.png) at regular intervals (controlled by --snap). For each pickle, it will also evaluate FID by default (controlled by --metrics) and log the resulting scores in metric-fid50k_full.txt.
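
As a small convenience, the snapshot naming makes it easy to locate the most recent pickle of a run programmatically; a sketch with a hypothetical run directory (the names sort correctly because the kimg counter is zero-padded):

import glob
import os

run_dir = os.path.expanduser('~/training-runs/00000-custom-auto1')  # hypothetical run directory
snapshots = sorted(glob.glob(os.path.join(run_dir, 'network-snapshot-*.pkl')))
if snapshots:
    print('Latest snapshot:', snapshots[-1])  # e.g. .../network-snapshot-000160.pkl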

The name of the output directory (e.g., 00000-custom-auto1) reflects the hyperparameter configuration that was used. In this case, custom indicates the training set (--data) and auto1 indicates the base configuration that was used to select the hyperparameters (--cfg):

Base config Description
auto (default) Automatically select reasonable defaults based on resolution and GPU count. Serves as a good starting point for new datasets, but does not necessarily lead to optimal results.
stylegan2 Reproduce results for StyleGAN2 config F at 1024x1024 using 1, 2, 4, or 8 GPUs.
paper256 Reproduce results for FFHQ and LSUN Cat at 256x256 using 1, 2, 4, or 8 GPUs.
paper512 Reproduce results for BreCaHAD and AFHQ at 512x512 using 1, 2, 4, or 8 GPUs.
paper1024 Reproduce results for MetFaces at 1024x1024 using 1, 2, 4, or 8 GPUs.
cifar Reproduce results for CIFAR-10 (tuned configuration) using 1 or 2 GPUs.
cifarbaseline Reproduce results for CIFAR-10 (baseline configuration) using 1 or 2 GPUs.

The training configuration can be further customized with additional arguments. Common examples:

  • --aug=noaug disables ADA (default: enabled).
  • --mirror=1 amplifies the dataset with x-flips. Often beneficial, even with ADA.
  • --resume=ffhq1024 --snap=10 performs transfer learning from FFHQ trained at 1024x1024.
  • --resume=~/training-runs/<RUN_NAME>/network-snapshot-<KIMG>.pkl resumes where a previous training run left off.
  • --gamma=10 overrides R1 gamma. We strongly recommend trying out at least a few different values for each new dataset.

Augmentation fine-tuning:

  • --aug=ada --target=0.7 adjusts ADA target value (default: 0.6).
  • --aug=adarv selects the alternative ADA heuristic (requires a separate validation set).
  • --augpipe=blit limits the augmentation pipeline to pixel blitting only.
  • --augpipe=bgcfnc enables all available augmentations (blit, geom, color, filter, noise, cutout).
  • --cmethod=bcr enables bCR with small integer translations.

Please refer to python train.py --help for the full list.

Expected training time

The total training time depends heavily on the resolution, number of GPUs, desired quality, dataset, and hyperparameters. In general, the training time can be expected to scale linearly with the resolution and inversely with the number of GPUs. Small datasets tend to reach their lowest achievable FID faster than larger ones, but the convergence is somewhat less predictable. Transfer learning tends to converge significantly faster than training from scratch.

To give a rough idea of typical training times, the following figure shows several examples of FID as a function of wallclock time. Each curve corresponds to training a given dataset from scratch using --cfg=auto with a given number of NVIDIA Tesla V100 GPUs:

Training curves

Please note that --cfg=auto only serves as a reasonable first guess for the hyperparameters — it does not necessarily lead to optimal results for a given dataset. For example, --cfg=stylegan2 yields considerably better FID for FFHQ-140k at 1024x1024 than illustrated above. We recommend trying out at least a few different values of --gamma for each new dataset.

Preparing training set sweeps

In the paper, we perform several experiments using artificially limited/amplified versions of the training data, such as ffhq30k, ffhq140k, and lsuncat30k. These are constructed by first unpacking the original dataset into a temporary directory with python dataset_tool.py unpack and then repackaging the appropriate versions into TFRecords with python dataset_tool.py pack. In the following examples, the temporary directories are created under /tmp and can be safely deleted afterwards.

# Unpack FFHQ images at 256x256 resolution.
python dataset_tool.py unpack --resolution=256 \
    --tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked

# Create subset with 30k images.
python dataset_tool.py pack --num_train=30000 --num_validation=10000 --seed=123 \
    --tfrecord_dir=~/datasets/ffhq30k --unpacked_dir=/tmp/ffhq-unpacked

# Create amplified version with 140k images.
python dataset_tool.py pack --num_train=70000 --num_validation=0 --mirror=1 --seed=123 \
    --tfrecord_dir=~/datasets/ffhq140k --unpacked_dir=/tmp/ffhq-unpacked

# Unpack LSUN Cat images at 256x256 resolution.
python dataset_tool.py unpack --resolution=256 \
    --tfrecord_dir=~/datasets/lsuncat200k --output_dir=/tmp/lsuncat200k-unpacked

# Create subset with 30k images.
python dataset_tool.py pack --num_train=30000 --num_validation=10000 --seed=123 \
    --tfrecord_dir=~/datasets/lsuncat30k --unpacked_dir=/tmp/lsuncat200k-unpacked

Please note that when training with artificially limited/amplified datasets, the quality metrics (e.g., fid50k_full) should still be evaluated against the corresponding original datasets. This can be done by specifying a separate metric dataset for train.py and calc_metrics.py using the --metricdata argument. For example:

python train.py [OTHER_OPTIONS] --data=~/datasets/ffhq30k --metricdata=~/ffhq-dataset/tfrecords/ffhq

Reproducing training runs from the paper

The pre-trained network pickles (stylegan2-ada/pretrained/paper-fig*) reflect the training configuration the same way as the output directory names, making it straightforward to reproduce a given training run from the paper. For example:

# 1. AFHQ Dog
# paper-fig11a-small-datasets/afhqdog-mirror-paper512-ada.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/datasets/afhqdog \
    --mirror=1 --cfg=paper512 --aug=ada

# 2. Class-conditional CIFAR-10
# pretrained/paper-fig11b-cifar10/cifar10c-cifar-ada-best-fid.pkl
python train.py --outdir=~/training-runs --gpus=2 --data=~/datasets/cifar10c \
    --cfg=cifar --aug=ada

# 3. MetFaces with transfer learning from FFHQ
# paper-fig11a-small-datasets/metfaces-mirror-paper1024-ada-resumeffhq1024.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/datasets/metfaces \
    --mirror=1 --cfg=paper1024 --aug=ada --resume=ffhq1024 --snap=10

# 4. 10k subset of FFHQ with ADA and bCR
# paper-fig7c-training-set-sweeps/ffhq10k-paper256-ada-bcr.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/datasets/ffhq10k \
    --cfg=paper256 --aug=ada --cmethod=bcr --metricdata=~/ffhq-dataset/tfrecords/ffhq

# 5. StyleGAN2 config F
# transfer-learning-source-nets/ffhq-res1024-mirror-stylegan2-noaug.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/ffhq-dataset/tfrecords/ffhq \
    --res=1024 --mirror=1 --cfg=stylegan2 --aug=noaug --metrics=fid50k

Notes:

  • You can use fewer GPUs than shown in the above examples. This will only increase the training time — it will not affect the quality of the results.
  • Example 3 specifies --snap=10 to export network pickles more frequently than usual. This is recommended, because transfer learning tends to yield very fast convergence.
  • Example 4 specifies --metricdata to evaluate quality metrics against the original FFHQ dataset, not the artificially limited 10k subset used for training.
  • Example 5 specifies --metrics=fid50k to evaluate FID the same way as in the StyleGAN2 paper (see below).

Quality metrics

By default, train.py will automatically compute FID for each network pickle. We strongly recommend inspecting metric-fid50k_full.txt at regular intervals to monitor the training progress. When desired, the automatic computation can be disabled with --metrics none to speed up the training.

Additional quality metrics can also be computed after the training:

# Previous training run: look up options automatically, save result to text file.
python calc_metrics.py --metrics=pr50k3_full \
    --network=~/training-runs/00000-ffhq10k-res64-auto1/network-snapshot-000000.pkl

# Pretrained network pickle: specify dataset explicitly, print result to stdout.
python calc_metrics.py --metrics=fid50k_full --metricdata=~/datasets/ffhq --mirror=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ffhq.pkl

The first example will automatically find training_options.json stored alongside the network pickle and perform the same operation as if --metrics pr50k3_full had been specified during training. The second example will download a pre-trained network pickle, in which case the values of --mirror and --metricdata have to be specified explicitly.
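
To see exactly which options a previous run used (the same training_options.json that calc_metrics.py consults), you can simply pretty-print the file; the run directory below is hypothetical and the exact set of keys depends on the run:

import json
import os

run_dir = os.path.expanduser('~/training-runs/00000-ffhq10k-res64-auto1')  # hypothetical run
with open(os.path.join(run_dir, 'training_options.json')) as f:
    options = json.load(f)
print(json.dumps(options, indent=2))  # dataset, augmentation, metric settings, etc.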

Note that many of the metrics have a significant one-off cost (up to an hour or more) when they are calculated for the first time using a given dataset. Also note that the evaluation is done using a different random seed each time, so the results will vary if the same metric is computed multiple times.
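
Because of this run-to-run variation, it can be useful to repeat a metric a few times and report the mean and standard deviation; a trivial sketch with hypothetical FID values collected from repeated calc_metrics.py runs:

import numpy as np

fids = [2.45, 2.41, 2.43]  # hypothetical values from three calc_metrics.py runs
print(f'FID: {np.mean(fids):.2f} +/- {np.std(fids):.2f}')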

We employ the following metrics in the ADA paper. The expected execution times correspond to using one Tesla V100 GPU at 1024x1024 and 256x256 resolution:

Metric 1024x1024 256x256 Description
fid50k_full 15 min 5 min Fréchet inception distance[1] against the full dataset.
kid50k_full 15 min 5 min Kernel inception distance[2] against the full dataset.
pr50k3_full 20 min 10 min Precision and recall[3] against the full dataset.
is50k 25 min 5 min Inception score[4] for CIFAR-10.
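
For reference, fid50k_full follows the standard FID formulation from [1]: with Inception feature statistics (mu_r, Sigma_r) for real images and (mu_g, Sigma_g) for generated images,

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)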

In addition, all metrics that were used in the StyleGAN and StyleGAN2 papers are also supported for backwards compatibility:

Legacy: StyleGAN2 1024x1024 Description
fid50k 15 min Fréchet inception distance against 50k real images.
kid50k 15 min Kernel inception distance against 50k real images.
pr50k3 20 min Precision and recall against 50k real images.
ppl2_wend 40 min Perceptual path length[5] in W at path endpoints against full image.
Legacy: StyleGAN 1024x1024 Description
ppl_zfull 40 min Perceptual path length in Z for full paths against cropped image.
ppl_wfull 40 min Perceptual path length in W for full paths against cropped image.
ppl_zend 40 min Perceptual path length in Z at path endpoints against cropped image.
ppl_wend 40 min Perceptual path length in W at path endpoints against cropped image.
ls 10 hrs Linear separability[5] with respect to CelebA attributes.

References:

  1. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, Heusel et al. 2017
  2. Demystifying MMD GANs, Bińkowski et al. 2018
  3. Improved Precision and Recall Metric for Assessing Generative Models, Kynkäänniemi et al. 2019
  4. Improved Techniques for Training GANs, Salimans et al. 2016
  5. A Style-Based Generator Architecture for Generative Adversarial Networks, Karras et al. 2018

License

Copyright © 2020, NVIDIA Corporation. All rights reserved.

This work is made available under the Nvidia Source Code License.

Citation

@inproceedings{Karras2020ada,
  title     = {Training Generative Adversarial Networks with Limited Data},
  author    = {Tero Karras and Miika Aittala and Janne Hellsten and Samuli Laine and Jaakko Lehtinen and Timo Aila},
  booktitle = {Proc. NeurIPS},
  year      = {2020}
}

Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

Acknowledgements

We thank David Luebke for helpful comments; Tero Kuosmanen and Sabu Nadarajan for their support with compute infrastructure; and Edgar Schönfeld for guidance on setting up unconditional BigGAN.

stylegan2-ada's People

Contributors

jannehellsten, nurpax, tkarras


stylegan2-ada's Issues

NVCC returned an error

Hi all, I am trying to generate results from a pretrained model with generate.py, but I get an error (see the attached screenshot; full logs are attached as well).

My setup:
Ubuntu 18.04
Cuda 10.1
Anaconda 3
Python 3.7
Can someone help me?

Is tf.nn.depthwise_conv2d_backprop_input equal to tf.nn.conv2d_transpose for implementing Upsample2D?

Hi,
I am trying to adapt the code for my project, which is based on PyTorch. I am having trouble porting Upsample2D with a specific kernel (e.g. https://github.com/NVlabs/stylegan2-ada/blob/main/training/augment.py#L425), since PyTorch does not have a corresponding API for tf.nn.depthwise_conv2d_backprop_input. I wonder if this operation is the same as tf.nn.conv2d_transpose, so that I can safely replace it with torch.nn.functional.conv_transpose2d. It would be very helpful if you could comment on it. Thanks a lot!

How to set the number of epochs?

I'm training with my own dataset but I don't know which setting controls the number of epochs. Can anybody tell me which code and which line define the epoch count?

RTX 3000 series broken compatibility

I tried to install the NVIDIA driver (455) myself on Ubuntu 18.04 with Python 3.7 and TensorFlow 1.14 (I also tried 1.15).
It always said it couldn't find a GPU when trying to start training (or gave other errors, such as failing to import cublas.10 libraries while I had CUDA 11 installed instead). I have an RTX 3090 Founders Edition GPU.
I tried different approaches, reinstalling things, and wasted more than 10 hours; it never worked for me. It did work on my Titan RTX, though, on a few different computer rigs.
Finally, since the maintainers claimed it works on their end for RTX 3000, I thought I could try their Docker container.
It didn't work initially; then I realized I had a few more steps to do, so I installed nvidia-docker2 (nvidia-container-toolkit), thinking that it should certainly work. Unfortunately, it causes errors again:

Output directory: ./results/00015-jjl_1024-mirror-24gb-gpu-bg-resumeffhq1024
Training data: ./datasets/jjl_1024
Training length: 25000 kimg
Resolution: 1024
Number of GPUs: 1

Creating output directory...
Loading training set...
Image shape: [3, 1024, 1024]
Label shape: [0]

Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Compiling... Failed!
Traceback (most recent call last):
File "train.py", line 591, in
main()
File "train.py", line 583, in main
run_training(**vars(args))
File "train.py", line 473, in run_training
training_loop.training_loop(**training_options)
File "/var/www/training/training_loop.py", line 123, in training_loop
Gs = G.clone('Gs')
File "/var/www/dnnlib/tflib/network.py", line 457, in clone
net.copy_vars_from(self)
File "/var/www/dnnlib/tflib/network.py", line 490, in copy_vars_from
src_net._get_vars()
File "/var/www/dnnlib/tflib/network.py", line 297, in _get_vars
self._vars = OrderedDict(self._get_own_vars())
File "/var/www/dnnlib/tflib/network.py", line 286, in _get_own_vars
self._init_graph()
File "/var/www/dnnlib/tflib/network.py", line 151, in _init_graph
out_expr = self._build_func(*self._input_templates, **build_kwargs)
File "/var/www/training/networks.py", line 231, in G_main
num_layers = components.synthesis.input_shape[1]
File "/var/www/dnnlib/tflib/network.py", line 232, in input_shape
return self.input_shapes[0]
File "/var/www/dnnlib/tflib/network.py", line 219, in input_shapes
self._input_shapes = [t.shape.as_list() for t in self.input_templates]
File "/var/www/dnnlib/tflib/network.py", line 267, in input_templates
self._init_graph()
File "/var/www/dnnlib/tflib/network.py", line 151, in _init_graph
out_expr = self._build_func(*self._input_templates, **build_kwargs)
File "/var/www/training/networks.py", line 439, in G_synthesis
x = layer(x, layer_idx=0, fmaps=nf(1), kernel=3)
File "/var/www/training/networks.py", line 392, in layer
x = modulated_conv2d_layer(x, dlatents_in[:, layer_idx], fmaps=fmaps, kernel=kernel, up=up, resample_kernel=resample_kernel, fused_modconv=fused_modconv)
File "/var/www/training/networks.py", line 105, in modulated_conv2d_layer
s = apply_bias_act(s, bias_var='mod_bias', trainable=trainable) + 1 # [BI] Add bias (initially 1).
File "/var/www/training/networks.py", line 50, in apply_bias_act
return fused_bias_act(x, b=tf.cast(b, x.dtype), act=act, gain=gain, clamp=clamp)
File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 72, in fused_bias_act
return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain, clamp=clamp)
File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 132, in _fused_bias_act_cuda
cuda_op = _get_plugin().fused_bias_act
File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 18, in _get_plugin
return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
File "/var/www/dnnlib/tflib/custom_ops.py", line 159, in get_plugin
_run_cmd(nvcc_cmd + ' "%s" --shared -o "%s" --keep --keep-dir "%s"' % (cuda_file, tmp_file, tmp_dir))
File "/var/www/dnnlib/tflib/custom_ops.py", line 69, in _run_cmd
raise RuntimeError('NVCC returned an error. See below for full command line and output log:\n\n%s\n\n%s' % (cmd, output))
RuntimeError: NVCC returned an error. See below for full command line and output log:

nvcc --compiler-options '-fPIC' --compiler-options '-I/usr/local/lib/python3.6/dist-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=0' --linker-options '-L/usr/local/lib/python3.6/dist-packages/tensorflow -l:libtensorflow_framework.so.1' --gpu-architecture=sm_86 --use_fast_math --disable-warnings --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/protobuf_archive/src" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/com_google_absl" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/eigen_archive" 2>&1 "/var/www/dnnlib/tflib/ops/fused_bias_act.cu" --shared -o "/tmp/tmp4dn1nm6o/fused_bias_act_tmp.so" --keep --keep-dir "/tmp/tmp4dn1nm6o"

nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'

From googling, I found that similar errors (e.g. sm_75) occur when there are code/CUDA/driver compatibility issues; at least that's what people say.
Please help with a decent working container version at least.

dimension error when resuming model from stylegan2

I trained a model in stylegan2. When I try to --resume training with that model using stylegan2-ada, I get the following error:

Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Loading... Done.
Resuming from "/datadrive/stylegan2_results/00003-stylegan2-wikiart-128-1gpu-config-f/network-snapshot-002286.pkl"
Traceback (most recent call last):
  File "train.py", line 561, in <module>
    main()
  File "train.py", line 553, in main
    run_training(**vars(args))
  File "train.py", line 451, in run_training
    training_loop.training_loop(**training_options)
  File "/datadrive/stylegan2-ada/training/training_loop.py", line 128, in training_loop
    G.copy_vars_from(rG)
  File "/datadrive/stylegan2-ada/dnnlib/tflib/network.py", line 511, in copy_vars_from
    self._components[name].copy_vars_from(src_comp)
  File "/datadrive/stylegan2-ada/dnnlib/tflib/network.py", line 508, in copy_vars_from
    self.copy_own_vars_from(src_net)
  File "/datadrive/stylegan2-ada/dnnlib/tflib/network.py", line 481, in copy_own_vars_from
    tfutil.set_vars({self._get_vars()[name]: value for name, value in value_dict.items() if name in self._get_vars()})
  File "/datadrive/stylegan2-ada/dnnlib/tflib/tfutil.py", line 227, in set_vars
    run(ops, feed_dict)
  File "/datadrive/stylegan2-ada/dnnlib/tflib/tfutil.py", line 33, in run
    return tf.get_default_session().run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1149, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (3, 3, 512, 512) for Tensor 'G_synthesis/64x64/Conv0_up/weight/new_value:0', which has shape '(3, 3, 512, 256)'

My original model was trained on a 128x128 dataset using the following command:

docker exec -ti -w /datadrive/stylegan2 wonderful_driscoll python run_training.py --num-gpus=1 --data-dir=/datadrive/data/sg2-datasets --dataset=wikiart-128 --total-kimg 10000 --config=config-f

I'm running stylegan2-ada with this command:

docker exec -ti -w /datadrive/stylegan2-ada wonderful_driscoll python train.py --outdir=/datadrive/stylegan2ada_results/ --gpus=1 --data=/datadrive/data/sg2-datasets/wikiart-128/ --resume /datadrive/stylegan2_results/00003-stylegan2-wikiart-128-1gpu-config-f/network-snapshot-002286.pkl --metrics fid50k 

Am I correct in understanding that sg2-ada is supposed to be backwards compatible with sg2 datasets? If so, am I supposed to be doing something different or is there a bug here? Thanks!

weird dark fake init

Hi,

Why is my fake init so dark?

fakes_init

I used this command

python train.py --outdir=~/training-runs --gpus=1 --data=~/datasets/custom --resume=ffhq1024 --snap=10 --cfg=stylegan2

My dataset is 100 256x256 santa claus images cropped just like FFHQ 🎅


thanks

cuSolverDN error

Hi,

Thank you for releasing the code to the public. I tried to train the model using a custom dataset which I prepared using dataset_tool.py for image resolution 512. I am using default training options and encountered the following error. I am using Tensorflow 1.15, on ubuntu 18 with cuda version 10.0. I also tried with Tensorflow 1.14 and had the same issue. What version of cuda/cudnn should I be using?

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Error encountered:
F tensorflow/core/kernels/cuda_solvers.cc:94] Check failed: cusolverDnCreate(&cusolver_dn_handle) == CUSOLVER_STATUS_SUCCESS Failed to create cuSolverDN instance.

I would appreciate any guidance for resolving this error.

Thanks.

metrics take forever to complete, no progress and some errors, and more Q's

I am training my own model based on custom-made images for an art project.
When I use stylegan2 I get metrics in about 15 minutes. I did see the warning that the first stylegan2-ada metrics run can take a long time, but it has been running for 8 hours on a 48-core, V100-based machine and has not produced any output. Is there a debug option to see where it is stuck? In Linux I can only see that the Python process is waiting on some output from the GPU.

Training works as expected although quite slowly, with auto giving me quite small batch sizes. My GPU has 32 GB of memory, and by default stylegan2-ada only uses 35% of it; by tweaking the config to the 24 GB GPU setting I get 54%. I guess I can tweak some more.

python calc_metrics.py --metrics=fid50k_full --network=./results/00001-romi1-auto1-kimg15000/network-snapshot-000160.pkl
....
....
Calculating real image statistics for fid50k_full...
2020-12-24 23:00:25.724829: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.

Or is it not using my GPU at that moment? I ran it concurrently with the training at one point; the training was hardly impacted, so the metrics run was not doing much.

Where are the result images and model pkl?

Training has been running successfully for a long time, but I don't know where the results are. I created the outdir but it is still empty.

Does anybody know what my problem is? Please help me.

Generating Images without a GPU

Hi there, can I generate images (using some pre-trained model) with Stylegan2-ADA without a GPU?

I have a model trained, I just need to generate some images locally without a GPU to test the integration with a local system in a POC.

After the POC the company can afford a GPU.

Thank you!

Thank you for sharing this! I have just read the README:

  • ADA, mixed-precision support, and command-line tools are all appreciated,
  • access to transfer learning from the diverse 256x256 dog snapshot lsundog256 is much appreciated!

tensorflow/core/protobuf/

  • conda install tensorflow-gpu=1.15.0
  • then i use the "python3 generate.py --outdir=out --trunc=1 --seeds=0-35 --class=1 --network=pretrained/cifar10.pk"
  • RuntimeError: NVCC returned an error. See below for full command line and output log:nvcc --compiler-options '-fPIC' --compiler-options '...........................
  • error: cannot call member function ‘bool google::protobuf::Map<Key, T>::InnerMap::TableEntryIsList(google::protobuf::Map<Key, T>::size_type) const [with Key = std::__cxx11::basic_string; T = tensorflow::AttrValue; google::protobuf::Map<Key, T>::size_type = long unsigned int]’ without object
    return m_->TableEntryIsList(bucket_index_);

How to monitor real and fake losses?

I'm trying to monitor the losses with TensorBoard inside Colab but can't figure out what the logdir path is. I'm using the following line of code:

%tensorboard --logdir logs

If this is not possible with this repo, what is the simplest way to print out 'Loss/scores/real' and 'Loss/scores/fake' for each tick?

how to fix the C++11 error

Hi, I am using tensorflow==1.14.0, CUDA 10.0, and cuDNN 7.5.1 on Ubuntu 16.04,
but I am hitting a C++ error:

C++ versions less than C++11 are not supported
#error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options

But I do not know where I should add -std=c++11 or -std=gnu++11, or whether I should update my g++ version instead.
How do I fix it?

Generate.py on Colab? Can generate image locally, but not on Colab

Hi,

I'm trying to generate images from a pretrained network on Colab, but it didn't work. However, it went well locally.

Here was the command :

!python generate.py --outdir=out --trunc=1 --seeds=0-35 --class=1 --network="https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/cifar10.pkl"

I also tried with the command given in the README file to generate curated MetFaces images. Again, this works well locally but not on Colab.

Error:
generate.py: error: unrecognized arguments: --outdir=out --trunc=1 --seeds=0-35 --class=1 --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/cifar10.pkl

Config
Colab Pro
tensorflow_version 1.x (I also tried forcing version 1.14)
I double-checked a few times that I was on a GPU; more precisely, it's a Tesla V100-SXM2.

Customize new loss

Hi there,

Nice work! Could you give some instructions about how to customize new loss functions for the discriminator and the generator into your framework?

Training slows after a dozen ticks

Hi,

I am experiencing the following issue, as illustrated in the enclosed screenshot.

The training goes at a normal speed for a few ticks (about a dozen) and then slows considerably (from 100 sec/kimg to more than 500 sec/kimg). I notice this behavior every time I run the training. It happens when training at 512 resolution, but it did not happen at 256 resolution.

Training is done on 1 GPU (RTX 3090).
Has anyone had the same issue?

Thanks,

batch size

Hi,

How do I change batch size?
It works for me with 256x256 images but with 1024x1024 my gpu runs out of memory.

thanks

One moment

Hi

On Windows you need to use TensorFlow 1.14, as the standard 1.15 installation does not include necessary C++ headers.

Where can I find TensorFlow 1.14, and do you have any guide on how to set it up? Thanks.

Weird implementation of conditional case

Hi guys! First of all, thank you for the paper and the implementation, it helps a lot!

Concerning the issue: I'm not sure I understand your implementation of the conditional GAN case.
In the discriminator you perform label embedding:

# Label embedding and mapping.

Could you please explain what's the point of doing that? You don't seem to do it in the original stylegan2 implementation, by the way: https://github.com/NVlabs/stylegan2/blob/23f8bed55f4b220c69cff98139a000d4c77cd558/training/networks_stylegan2.py#L640

Further, at the end of the discriminator network, you output a tensor whose shape equals the size of the embedding space, which is the dimension of the latent space by default. Why?

Thanks in advance!

Other models for transfer learning

Hi,

I am training a GAN on images of buildings.
Until now, I used StyleGAN2 using transfer learning, starting from the model learnt on LSUN church dataset (and available on this repo https://github.com/NVlabs/stylegan2).
I would like to switch to StyleGAN2 with ADA because of the limited size of my dataset, but the LSUN church model available on StyleGAN2 repo is not compatible with StyleGAN2 with ADA.
Did anyone encounter the same issue ?

RuntimeError: NVCC returned an error. See below for full command line and output log:

'''
nvcc --compiler-options '-fPIC' --compiler-options '-I/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=1' --linker-options '-L/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow -ltensorflow_framework' --gpu-architecture=sm_70 --use_fast_math --disable-warnings --include-path "/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow/include" --include-path "/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow/include/external/protobuf_archive/src" --include-path "/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow/include/external/com_google_absl" --include-path "/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow/include/external/eigen_archive" 2>&1 "/home/gayrat/PycharmProjects/stylegan2-ada-main/dnnlib/tflib/ops/fused_bias_act.cu" --shared -o "/tmp/tmpxkquz3dr/fused_bias_act_tmp.so" --keep --keep-dir "/tmp/tmpxkquz3dr"

/bin/sh: nvcc: command not found
'''

Hi brother. I am trying to deal with it. Can you help me?

additional info:
NVIDIA-SMI 418.165.02 Driver Version: 418.165.02 CUDA Version: 10.1
centos7
tensorflow-gpu version 1.13.1

Suggestion to fix the AssertionError

Hello,

I would suggest that you remove the assert here:

# Render images for a given dlatent vector.
if dlatents_npz is not None:
    print(f'Generating images from dlatents file "{dlatents_npz}"')
    dlatents = np.load(dlatents_npz)['dlatents']
    assert dlatents.shape[1:] == (18, 512) # [N, 18, 512]

Indeed, if I train my own network, I can have different shapes for the latent vectors, depending on my config and depending on my image resolution. For instance, in my case (256x256 images), shape is [14, 512] (based on the output of projector.py).

I can use my trained network to project a real image (using your script projector.py). However, the assert prevents me from using my latent vector (obtained via projection) and my trained network to generate the projected image (using your script generate.py). Once the assert is commented out, everything works as intended.

Edit: As shown below, the AssertionError would arise with your own pre-trained networks at 256x256 resolution.

Training starts from 0 after resuming, even though the log says it resumed.

Hey, I tried to resume training the model I had trained; I specified the full path of the PKL after the --resume flag.
In the log it claimed that it loaded the PKL:


Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Loading... Done.
Resuming from "./results/00000-FLAGS1-mirror-mirrory-11gb-gpu-bg-resumeffhq512/network-snapshot-000032.pkl"

G                             Params    OutputShape         WeightShape     
---                           ---       ---                 ---             
latents_in                    -         (?, 512)            -               
labels_in                     -         (?, 0)              -               
G_mapping/Normalize           -         (?, 512)            -               
G_mapping/Dense0              262656    (?, 512)            (512, 512)      
G_mapping/Dense1              262656    (?, 512)            (512, 512)      
G_mapping/Broadcast           -         (?, 16, 512)        -               
dlatent_avg                   -         (512,)              -               
Truncation/Lerp               -         (?, 16, 512)        -               
G_synthesis/4x4/Const         8192      (?, 512, 4, 4)      (1, 512, 4, 4)  
G_synthesis/4x4/Conv          2622465   (?, 512, 4, 4)      (3, 3, 512, 512)
G_synthesis/4x4/ToRGB         264195    (?, 3, 4, 4)        (1, 1, 512, 3)  
G_synthesis/8x8/Conv0_up      2622465   (?, 512, 8, 8)      (3, 3, 512, 512)
G_synthesis/8x8/Conv1         2622465   (?, 512, 8, 8)      (3, 3, 512, 512)
G_synthesis/8x8/Upsample      -         (?, 3, 8, 8)        -               
G_synthesis/8x8/ToRGB         264195    (?, 3, 8, 8)        (1, 1, 512, 3)  
G_synthesis/16x16/Conv0_up    2622465   (?, 512, 16, 16)    (3, 3, 512, 512)
G_synthesis/16x16/Conv1       2622465   (?, 512, 16, 16)    (3, 3, 512, 512)
G_synthesis/16x16/Upsample    -         (?, 3, 16, 16)      -               
G_synthesis/16x16/ToRGB       264195    (?, 3, 16, 16)      (1, 1, 512, 3)  
G_synthesis/32x32/Conv0_up    2622465   (?, 512, 32, 32)    (3, 3, 512, 512)
G_synthesis/32x32/Conv1       2622465   (?, 512, 32, 32)    (3, 3, 512, 512)
G_synthesis/32x32/Upsample    -         (?, 3, 32, 32)      -               
G_synthesis/32x32/ToRGB       264195    (?, 3, 32, 32)      (1, 1, 512, 3)  
G_synthesis/64x64/Conv0_up    2622465   (?, 512, 64, 64)    (3, 3, 512, 512)
G_synthesis/64x64/Conv1       2622465   (?, 512, 64, 64)    (3, 3, 512, 512)
G_synthesis/64x64/Upsample    -         (?, 3, 64, 64)      -               
G_synthesis/64x64/ToRGB       264195    (?, 3, 64, 64)      (1, 1, 512, 3)  
G_synthesis/128x128/Conv0_up  1442561   (?, 256, 128, 128)  (3, 3, 512, 256)
G_synthesis/128x128/Conv1     721409    (?, 256, 128, 128)  (3, 3, 256, 256)
G_synthesis/128x128/Upsample  -         (?, 3, 128, 128)    -               
G_synthesis/128x128/ToRGB     132099    (?, 3, 128, 128)    (1, 1, 256, 3)  
G_synthesis/256x256/Conv0_up  426369    (?, 128, 256, 256)  (3, 3, 256, 128)
G_synthesis/256x256/Conv1     213249    (?, 128, 256, 256)  (3, 3, 128, 128)
G_synthesis/256x256/Upsample  -         (?, 3, 256, 256)    -               
G_synthesis/256x256/ToRGB     66051     (?, 3, 256, 256)    (1, 1, 128, 3)  
G_synthesis/512x512/Conv0_up  139457    (?, 64, 512, 512)   (3, 3, 128, 64) 
G_synthesis/512x512/Conv1     69761     (?, 64, 512, 512)   (3, 3, 64, 64)  
G_synthesis/512x512/Upsample  -         (?, 3, 512, 512)    -               
G_synthesis/512x512/ToRGB     33027     (?, 3, 512, 512)    (1, 1, 64, 3)   
---                           ---       ---                 ---             
Total                         28700647                                      


D                    Params    OutputShape         WeightShape     
---                  ---       ---                 ---             
images_in            -         (?, 3, 512, 512)    -               
labels_in            -         (?, 0)              -               
512x512/FromRGB      256       (?, 64, 512, 512)   (1, 1, 3, 64)   
512x512/Conv0        36928     (?, 64, 512, 512)   (3, 3, 64, 64)  
512x512/Conv1_down   73856     (?, 128, 256, 256)  (3, 3, 64, 128) 
512x512/Skip         8192      (?, 128, 256, 256)  (1, 1, 64, 128) 
256x256/Conv0        147584    (?, 128, 256, 256)  (3, 3, 128, 128)
256x256/Conv1_down   295168    (?, 256, 128, 128)  (3, 3, 128, 256)
256x256/Skip         32768     (?, 256, 128, 128)  (1, 1, 128, 256)
128x128/Conv0        590080    (?, 256, 128, 128)  (3, 3, 256, 256)
128x128/Conv1_down   1180160   (?, 512, 64, 64)    (3, 3, 256, 512)
128x128/Skip         131072    (?, 512, 64, 64)    (1, 1, 256, 512)
64x64/Conv0          2359808   (?, 512, 64, 64)    (3, 3, 512, 512)
64x64/Conv1_down     2359808   (?, 512, 32, 32)    (3, 3, 512, 512)
64x64/Skip           262144    (?, 512, 32, 32)    (1, 1, 512, 512)
32x32/Conv0          2359808   (?, 512, 32, 32)    (3, 3, 512, 512)
32x32/Conv1_down     2359808   (?, 512, 16, 16)    (3, 3, 512, 512)
32x32/Skip           262144    (?, 512, 16, 16)    (1, 1, 512, 512)
16x16/Conv0          2359808   (?, 512, 16, 16)    (3, 3, 512, 512)
16x16/Conv1_down     2359808   (?, 512, 8, 8)      (3, 3, 512, 512)
16x16/Skip           262144    (?, 512, 8, 8)      (1, 1, 512, 512)
8x8/Conv0            2359808   (?, 512, 8, 8)      (3, 3, 512, 512)
8x8/Conv1_down       2359808   (?, 512, 4, 4)      (3, 3, 512, 512)
8x8/Skip             262144    (?, 512, 4, 4)      (1, 1, 512, 512)
4x4/MinibatchStddev  -         (?, 513, 4, 4)      -               
4x4/Conv             2364416   (?, 512, 4, 4)      (3, 3, 513, 512)
4x4/Dense0           4194816   (?, 512)            (8192, 512)     
Output               513       (?, 1)              (512, 1)        
---                  ---       ---                 ---             
Total                28982849                                      

Exporting sample images...
Replicating networks across 1 GPUs...
Initializing augmentations...
Setting up optimizers...
Constructing training graph...
Finalizing training ops...
Initializing metrics...
Training for 25000 kimg...

tick 0     kimg 0.0      time 1m 40s       sec/tick 28.1    sec/kimg 877.83  maintenance 71.9   gpumem 10.0  augment 0.000

But as you can see, it also says "kimg 0.0". That means the training started from 0, and I can see it really did start from 0 by looking at the output images.

I tried multiple PKL snapshots, but the problem persists.

My full arguments were:

!python train.py --outdir ./results --snap=2 --cfg=auto --data=./datasets/secret --augpipe="bg" --mirror=True --mirrory=True --metrics=None --resume="./results/00000-FLAGS1-mirror-mirrory-11gb-gpu-bg-resumeffhq512/network-snapshot-000032.pkl" --augpipe="bg"

And I ran the training on Google Colab.
I'm looking for solutions
Thank you very much

FID scores on FFHQ 1K

Hi,

Thanks for sharing the implementation!

I cannot reproduce the FID scores of ADA (w/o bcr) on the FFHQ 1K dataset (21.29 according to Figure 7(c) in the paper).
Here are the commands I had:
python dataset_tool.py pack --num_train=1000 --num_validation=10000 --seed=123 --tfrecord_dir=~/datasets/ffhq1k --unpacked_dir=/tmp/ffhq-unpacked
python train.py --outdir=~/training-runs --gpus=8 --data=~/datasets/ffhq1k --cfg=paper256 --aug=ada --metricdata=~/ffhq-dataset/tfrecords/ffhq
I can only get a best FID score of around 24. Did I do something wrong?

Thanks!

How to finetune?

Hi, if I have some images (maybe 1k of them), how do I fine-tune with them?

Project image to latent space with condition

If the network was trained with labels, what's the solution to do latent vector interpolation with labels? Moreover, is there a way to project an image back to latent space with such conditional network?

Overfitting?!

Below I describe why I think you are overfitting and how you can test.

On CIFAR10 you report

fid(x_train_50k, fakes_50k)=2.42
fid(x_val_10k, fakes_10k)=7.01

But train/val data has higher FID

fid(x_train_50k, fid_val_10k)=3.1

It is possible that the difference from 3.1 to 2.42 is caused by the additional 40k samples. Could you compute

fid(x_train_10k, fakes_10k)

If this value is below 3.1 your CIFAR model is overfitting.

It is true that the FID authors recommend reporting FID on the training set. But if you are overfitting to the extent that the model distribution is closer to the training data than the validation set is, I think it is misleading to report training FID.

I'd be happy to hear any counter-arguments to this point, and apologize if I misunderstood anything.

Can't see the GPU on Colab

Hi, I tried to run the code in Google Colab with a GPU. However, it fails to load the pretrained network because it can't find the GPU.

The code I ran is below, as well as the error message. Thanks for the advice.
!python generate.py --outdir=out --trunc=1 --seeds=0-35 --class=1
--network='https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/cifar10.pkl'

File "/content/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 72, in fused_bias_act
return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain, clamp=clamp)
File "/content/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 132, in _fused_bias_act_cuda
cuda_op = _get_plugin().fused_bias_act
File "/content/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 18, in _get_plugin
return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
File "/content/stylegan2-ada/dnnlib/tflib/custom_ops.py", line 139, in get_plugin
compile_opts += f' --gpu-architecture={_get_cuda_gpu_arch_string()}'
File "/content/stylegan2-ada/dnnlib/tflib/custom_ops.py", line 60, in _get_cuda_gpu_arch_string
raise RuntimeError('No GPU devices found')
RuntimeError: No GPU devices found

RTX 30x0 Support

Your current docker image relies on an older version of CUDA.
The current 3080 and 3090 series GPUs are only supported under CUDA 11.1.
It would be wonderful if you could update the image or offer a workaround.

Error message when running with older CUDA:
nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'

How to augment training to focus more on backgrounds?

Hi, I wanna say first that I had great success with StyleGAN2, and was able to improve a YoloV2 model's performance with the synthetically generated data. However, I did notice that the GAN had trouble reproducing different types of backgrounds. For example, even when using an image the model was trained on, it was unable to find dlatent vectors that could produce rocks in the background.

Is there a way to change the training pipeline to focus more on synthetically generating different backgrounds? Or do I need to increase the resolution of my synthetic data so that it can learn to produce various backgrounds?

Augmentation strength always increasing - what is the rt heuristic doing?

Can someone give me some insight into the heuristic value (rt) that controls the augmentation strength in the ADA pipeline? In augment.py the code is the following:

strength += nimg_ratio * np.sign(rt - self.tune_target)

I am training using the default tune_target (0.6) and I've noticed that the strength is always increasing (since rt > tune_target most of the time). I'm close to an augmentation strength of 10 after 1000 kimg or so. What exactly does this mean? What is the reasoning behind rt? Does the augmentation strength keep growing because my dataset is not good enough, so the network feels the strength needs to keep increasing? Thanks in advance.

Cannot train with raw dataset

Hello, when I try to train using a raw dataset (even when using --use-raw=True), it gives the following errors (for every config, as well):

Traceback (most recent call last):
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot batch tensors with different shapes in component 0. First element had shape [3,512,512] and element 2 had shape [1,512,512].
         [[{{node Dataset_1/IteratorGetNext}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 620, in <module>
    main()
  File "train.py", line 612, in main
    run_training(**vars(args))
  File "train.py", line 501, in run_training
    training_loop.training_loop(**training_options)
  File "F:\Downloads 2\stylegan2-ada-main\training\training_loop.py", line 135, in training_loop
    grid_size, grid_reals, grid_labels = setup_snapshot_image_grid(training_set)
  File "F:\Downloads 2\stylegan2-ada-main\training\training_loop.py", line 33, in setup_snapshot_image_grid
    reals, labels = training_set.get_minibatch_np(gw * gh)
  File "F:\Downloads 2\stylegan2-ada-main\training\dataset.py", line 182, in get_minibatch_np
    return tflib.run(self._tf_minibatch_np)
  File "F:\Downloads 2\stylegan2-ada-main\dnnlib\tflib\tfutil.py", line 33, in run
    return tf.get_default_session().run(*args, **kwargs)
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
    run_metadata_ptr)
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
    run_metadata)
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot batch tensors with different shapes in component 0. First element had shape [3,512,512] and element 2 had shape [1,512,512].
         [[node Dataset_1/IteratorGetNext (defined at F:\Downloads 2\stylegan2-ada-main\training\dataset.py:165) ]]

Errors may have originated from an input operation.
Input Source operations connected to node Dataset_1/IteratorGetNext:
 Dataset/IteratorV2 (defined at F:\Downloads 2\stylegan2-ada-main\training\dataset.py:147)

Original stack trace for 'Dataset_1/IteratorGetNext':
  File "train.py", line 620, in <module>
    main()
  File "train.py", line 612, in main
    run_training(**vars(args))
  File "train.py", line 501, in run_training
    training_loop.training_loop(**training_options)
  File "F:\Downloads 2\stylegan2-ada-main\training\training_loop.py", line 135, in training_loop
    grid_size, grid_reals, grid_labels = setup_snapshot_image_grid(training_set)
  File "F:\Downloads 2\stylegan2-ada-main\training\training_loop.py", line 33, in setup_snapshot_image_grid
    reals, labels = training_set.get_minibatch_np(gw * gh)
  File "F:\Downloads 2\stylegan2-ada-main\training\dataset.py", line 180, in get_minibatch_np
    self._tf_minibatch_np = self.get_minibatch_tf()
  File "F:\Downloads 2\stylegan2-ada-main\training\dataset.py", line 165, in get_minibatch_tf
    images, labels = self._tf_iterator.get_next()
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 426, in get_next
    output_shapes=self._structure._flat_shapes, name=name)
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 1974, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
    op_def=op_def)
  File "F:\zPL Folder\a\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Thanks in advance for any help
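
The shapes in the error suggest the dataset mixes 3-channel (RGB) and 1-channel (grayscale) images, so the input pipeline cannot batch them together. Below is a minimal sketch that forces every image to RGB before building the dataset; the folder path and the use of Pillow are my own assumptions, not part of the repo's tooling:

# Force all images in a folder to 3-channel RGB so they batch consistently.
# Hypothetical preprocessing step, run before creating/using the dataset.
import os
from PIL import Image

src_dir = 'raw_images'   # assumed input folder
for name in os.listdir(src_dir):
    path = os.path.join(src_dir, name)
    img = Image.open(path)
    if img.mode != 'RGB':        # e.g. 'L' (grayscale) or 'RGBA'
        img.convert('RGB').save(path)

Alternatively, convert everything to single-channel ('L') if you actually want a grayscale model; the point is that every image must have the same number of channels.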

Memory Leak

Almost all of my training runs eventually (after 250k imgs or so) crash without an explicit error message...

how to create dataset with label

If I want to fine-tune stylegan2-ada with conditional labels, how do I prepare the dataset?

The readme only covers conditional training for the CIFAR-10 dataset, but how do I use it for my own dataset?

What is the data format?
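
As far as I can tell from how the CIFAR-10 example is built, the TFRecord datasets used here keep class labels in a separate *.labels file next to the *.tfrecords files: a float32 NumPy array of shape [num_images, label_size] holding one-hot rows in the same order the images were added, and dataset.py picks up any *.labels file it finds in the dataset directory. A minimal sketch, where the label source, file names, and class count are assumptions you would replace with your own:

# Sketch: write one-hot class labels for an existing TFRecords dataset.
# Assumes the images were packed with dataset_tool.py and that 'image_classes'
# lists one class id per image in exactly that order (hypothetical file).
import numpy as np

num_classes = 5
image_classes = np.loadtxt('my_labels.txt', dtype=np.int64)   # hypothetical: one class id per line
onehot = np.zeros((len(image_classes), num_classes), dtype=np.float32)
onehot[np.arange(len(image_classes)), image_classes] = 1.0

# dataset.py looks for a '*.labels' file in the TFRecords directory.
with open('datasets/my_dataset/my_dataset-rxx.labels', 'wb') as f:
    np.save(f, onehot)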

Generating image sequences

Hi, thanks for the contribution!

Using stylegan2-ada, is it possible to generate consecutive frames of a video (such as sequences in the KITTI/Cityscapes datasets)?
I am not talking about style transfer; I would like to generate completely new sequences.
Or do you happen to know a better GAN-based solution for this problem?

Thanks, Fabian

Projecting to transfer learned model seems broken

Sorry for bringing this here, but I can't find help on this anywhere else.

I've trained a model on top of FFHQ, with a very distinct style for human faces. Now I'm trying to project a real face into the model to get the stylized representation but the results are just... awful. It just looks like a blurry FFHQ representation, with very slight styling.

Am I doing something wrong?

Here are the steps to how I got to this point:

  1. Created a dataset of 500+ images of similar style, aligned the same way as FFHQ portraits.
  2. Ran training with StyleGAN2-ADA with --freezed set to 3, for ~24 hours, up to snapshot 000150 (~150 kimg). The results in the samples are really good and heavily stylized the way I want them to be.
  3. Created aligned image out of real picture.
  4. Ran the projector using the trained network & the aligned image.

The command I'm running:

!python projector.py --network=./models/network-snapshot-000150.pkl --target=./projection/aligned/a_01_aligned.png --outdir=./projection/generated

Any help would be greatly appreciated, thank you!
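
A workaround I have seen used for this (a sketch, not an official feature): project the aligned face into the original FFHQ network to get dlatents that actually reconstruct the face, then feed those dlatents through the fine-tuned generator to get the stylized version. Projecting directly into the fine-tuned network often fails because its latent space no longer covers ordinary photos well. All file names and the dlatents key below are assumptions; check what your projector run actually wrote to its output directory:

# Sketch: re-synthesize FFHQ-projected dlatents with a fine-tuned generator.
import pickle
import numpy as np
import dnnlib.tflib as tflib

tflib.init_tf()
with open('./models/network-snapshot-000150.pkl', 'rb') as f:       # fine-tuned generator
    _G, _D, Gs_finetuned = pickle.load(f)

# dlatents obtained by running projector.py against the *base* FFHQ pickle instead;
# the .npz name and 'dlatents' key are assumptions about the projector's output.
dlatents = np.load('./projection/generated/dlatents.npz')['dlatents']   # expected shape [1, 18, 512] at 1024x1024

fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
images = Gs_finetuned.components.synthesis.run(dlatents, randomize_noise=False, output_transform=fmt)
# images[0] is an HxWx3 uint8 array you can save with PIL or imageio.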

Dimension mismatch error (using gray-scale dataset)

I am now running the model on a gray-scale dataset. As a test, I try to run projector.py, but I get the following error.

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 572, in set_shape
    unknown_shape)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 1 in both shapes must be equal, but are 1 and 3. Shapes are [1,1,256,256] and [?,3,?,?].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "projector.py", line 288, in <module>
    main()
  File "projector.py", line 283, in main
    project(**vars(parser.parse_args()))
  File "projector.py", line 225, in project
    proj.set_network(Gs)
  File "projector.py", line 118, in set_network
    self._dist = self._lpips.get_output_for(proc_images_expr, self._target_images_var)
  File "/home/ubuntu/jyk_project/stylegan2-ada/dnnlib/tflib/network.py", line 371, in get_output_for
    out_expr = self._build_func(*final_inputs, **build_kwargs)
  File "<string>", line 140, in lpips_network
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 575, in set_shape
    raise ValueError(str(e))
ValueError: Dimension 1 in both shapes must be equal, but are 1 and 3. Shapes are [1,1,256,256] and [?,3,?,?].

Please, any help is appreciated.

Thank you.
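
The mismatch comes from the LPIPS (VGG16) distance used by projector.py: the generator trained on grayscale data outputs 1-channel images ([1,1,256,256]) while the perceptual network expects 3 channels. One possible workaround, offered as a sketch against the code path in the traceback rather than a supported fix, is to tile the single channel to three before the LPIPS call in Projector.set_network():

# Sketch of a patch inside Projector.set_network() in projector.py, right before
# the self._dist line from the traceback; verify names against your copy of the file.
import tensorflow as tf

if proc_images_expr.shape[1] == 1:                      # grayscale generator output (NCHW)
    proc_images_expr = tf.tile(proc_images_expr, [1, 3, 1, 1])
target_images_expr = self._target_images_var
if target_images_expr.shape[1] == 1:                    # grayscale target image
    target_images_expr = tf.tile(target_images_expr, [1, 3, 1, 1])
self._dist = self._lpips.get_output_for(proc_images_expr, target_images_expr)

You would likewise need to make sure the target image is loaded with a matching number of channels.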
