autonomousvision / stylegan_xl

[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

License: MIT License

Python 83.71% C++ 3.96% Cuda 12.33%

stylegan_xl's Introduction

This repository contains code for our SIGGRAPH'22 paper "StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets"

by Axel Sauer, Katja Schwarz, and Andreas Geiger.

If you find our code or paper useful, please cite

@Article{Sauer2021ARXIV,
  author    = {Axel Sauer and Katja Schwarz and Andreas Geiger},
  title     = {StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets},
  journal   = {arXiv.org},
  volume    = {abs/2201.00273},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.00273},
}

Related Projects

  • Projected GANs Converge Faster (NeurIPS'21)  -  Official Repo  -  Projected GAN Quickstart
  • StyleGAN-XL + CLIP (Implemented by CasualGANPapers)  -  StyleGAN-XL + CLIP
  • StyleGAN-XL + CLIP (Modified by Katherine Crowson to optimize in W+ space)  -  StyleGAN-XL + CLIP

Requirements

  • 64-bit Python 3.8 and PyTorch 1.9.0 (or later). See https://pytorch.org for PyTorch install instructions.
  • CUDA toolkit 11.1 or later.
  • GCC 7 or later. The recommended GCC version depends on your CUDA version; see, for example, the CUDA 11.4 system requirements.
  • If you run into problems when setting up the custom CUDA kernels, we refer to the Troubleshooting docs of the original StyleGAN3 repo and the following issues: #23.
  • Windows users struggling to install the environment might find #10 helpful.
  • Use the following commands with Miniconda3 to create and activate your PG Python environment:
    • conda env create -f environment.yml
    • conda activate sgxl

Data Preparation

For a quick start, you can download the few-shot datasets provided by the authors of FastGAN. You can download them here. To prepare the dataset at the respective resolution, run

python dataset_tool.py --source=./data/pokemon --dest=./data/pokemon256.zip \
  --resolution=256x256 --transform=center-crop

You need to follow our progressive growing scheme to get the best results. Therefore, you should prepare separate zips for each training resolution. You can get the datasets we used in our paper at their respective websites (FFHQ, ImageNet).
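
For example, to follow the progressive growing runs below, you would prepare one zip per training resolution. An illustrative sketch using the Pokemon data from above (same flags as the command above; paths are placeholders):

python dataset_tool.py --source=./data/pokemon --dest=./data/pokemon16.zip \
  --resolution=16x16 --transform=center-crop
python dataset_tool.py --source=./data/pokemon --dest=./data/pokemon32.zip \
  --resolution=32x32 --transform=center-crop
python dataset_tool.py --source=./data/pokemon --dest=./data/pokemon64.zip \
  --resolution=64x64 --transform=center-crop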

Training

For progressive growing, we train a stem on low resolution, e.g., 16² pixels. When the stem is finished, i.e., FID is saturating, you can start training the upper stages; we refer to these as superresolution stages.

Training the stem

Training StyleGAN-XL on Pokemon using 8 GPUs:

python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=./data/pokemon16.zip \
    --gpus=8 --batch=64 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 --syn_layers 10

--batch specifies the overall batch size; --batch-gpu specifies the batch size per GPU. If you use fewer GPUs, the training loop will automatically accumulate gradients until the overall batch size is reached.
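
To illustrate the accumulation logic (a minimal, self-contained sketch, not the repository's actual training loop), gradients from batch-gpu sized chunks are summed until one overall batch has been processed:

import torch
import torch.nn as nn

# Toy model standing in for the generator/discriminator pair (hypothetical).
model = nn.Linear(16, 1)
opt = torch.optim.Adam(model.parameters(), lr=2.5e-3, betas=(0.0, 0.99))

batch_size, batch_gpu = 64, 8        # mirrors --batch and --batch-gpu
num_accum = batch_size // batch_gpu  # gradient accumulation rounds per step

opt.zero_grad(set_to_none=True)
for _ in range(num_accum):
    x = torch.randn(batch_gpu, 16)   # one per-GPU chunk of the overall batch
    loss = model(x).pow(2).mean()    # placeholder loss
    (loss / num_accum).backward()    # scale so the sum matches a full-batch gradient
opt.step()                           # one optimizer step per overall batch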

Samples and metrics are saved in outdir. If you don't want to track metrics, set --metrics=none. You can inspect fid50k_full.json or run tensorboard in training-runs/ to monitor the training progress.
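
If you prefer to check the metrics programmatically instead of via tensorboard, a small sketch like the following reads the per-snapshot FID records from the metric file in a run directory (the exact file name and record layout are assumptions based on the usual StyleGAN training logs and may differ in your version):

import glob
import json
import os

run_dir = sorted(glob.glob('training-runs/pokemon/*'))[-1]                # latest run (assumed layout)
metric_file = glob.glob(os.path.join(run_dir, '*fid50k_full*.json*'))[0]  # e.g. metric-fid50k_full.jsonl

fids = []
with open(metric_file) as f:
    for line in f:  # one JSON record per network snapshot
        rec = json.loads(line)
        fids.append((rec['snapshot_pkl'], rec['results']['fid50k_full']))

best = min(fids, key=lambda t: t[1])
print(f'best snapshot: {best[0]} (FID {best[1]:.2f})')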

For a class-conditional dataset (ImageNet, CIFAR-10), add the flag --cond True. The dataset needs to contain the class labels; see the StyleGAN2-ADA repo on how to prepare class-conditional datasets.
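
For instance, a conditional stem run on an ImageNet zip could look like the following (paths and batch sizes are placeholders, not the exact settings used in the paper):

python train.py --outdir=./training-runs/imagenet --cfg=stylegan3-t --data=./data/imagenet16.zip \
    --gpus=8 --batch=256 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 --syn_layers 10 --cond True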

Training the super-resolution stages

Continuing with pretrained stem:

python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=./data/pokemon32.zip \
  --gpus=8 --batch=64 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 --syn_layers 10 \
  --superres --up_factor 2 --head_layers 7 \
  --path_stem training-runs/pokemon/00000-stylegan3-t-pokemon16-gpus8-batch64/best_model.pkl

--up_factor allows training several stages at once, i.e., with --up_factor=4 and a 16² stem you can directly train at resolution 64².
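
For example, to skip the 32² stage and train a 64² superresolution stage directly on top of the 16² stem (illustrative paths, assuming a prepared pokemon64.zip):

python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=./data/pokemon64.zip \
  --gpus=8 --batch=64 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 --syn_layers 10 \
  --superres --up_factor 4 --head_layers 7 \
  --path_stem training-runs/pokemon/00000-stylegan3-t-pokemon16-gpus8-batch64/best_model.pkl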

If you have enough compute, a good tactic is to train several stages in parallel and then restart the superresolution stage training once in a while. The current stage will then reload its previous stem's best_model.pkl. Performance can sometimes drop at first because of domain shift, but the superresolution stage quickly recovers and improves further.

Training recommendations for datasets other than ImageNet

The default settings are tuned for ImageNet. For smaller datasets (<50k images) or well-curated datasets (FFHQ), you can significantly decrease the model size, enabling much faster training. Recommended settings are: --cbase 16384 --cmax 256 --syn_layers 7 and, for superresolution stages, --head_layers 4.
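
Put together, a stem run on a small dataset with these settings might look like this (dataset path and GPU count are placeholders):

python train.py --outdir=./training-runs/mydataset --cfg=stylegan3-t --data=./data/mydataset16.zip \
    --gpus=1 --batch=64 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 \
    --cbase 16384 --cmax 256 --syn_layers 7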

Suppose you want to train as few stages as possible. We recommend training a 32x32 or 64x64 stem, then directly scaling to the final resolution (as described above, you must adjust --up_factor accordingly). However, progressive growing generally yields better results faster, as the throughput is much higher at lower resolutions; this is illustrated by the training-speed figure in Karras et al., 2017.

Generating Samples & Interpolations

To generate samples and interpolation videos, run

python gen_images.py --outdir=out --trunc=0.7 --seeds=10-15 --batch-sz 1 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon256.pkl

and

python gen_video.py --output=lerp.mp4 --trunc=0.7 --seeds=0-31 --grid=4x2 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon256.pkl

For class-conditional models, you can pass the class index via --class; an index-to-label dictionary for ImageNet can be found here. For interpolation between classes, provide, e.g., --cls=0-31 to gen_video.py. The list of classes has to be the same length as --seeds.
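
As an illustration (the class index 130 is chosen arbitrarily), conditional sampling and class interpolation could look like this:

python gen_images.py --outdir=out --trunc=0.7 --seeds=10-15 --batch-sz 1 --class=130 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet128.pkl

python gen_video.py --output=class_lerp.mp4 --trunc=0.7 --seeds=0-31 --cls=0-31 --grid=4x2 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet128.pkl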

To generate a conditional sample sheet, run

python gen_class_samplesheet.py --outdir=sample_sheets --trunc=1.0 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet128.pkl \
  --samples-per-class 4 --classes 0-32 --grid-width 32

For ImageNet models, we enable multi-modal truncation (proposed by Self-Distilled GAN). We generated 600k samples and found 10k cluster centroids via k-means. For a given sample, multi-modal truncation finds the closest centroid and interpolates towards it. To switch from uni-modal to multi-modal truncation, pass

--centroids-path=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet_centroids.npy
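
Conceptually, multi-modal truncation replaces the single global mean latent with the nearest of the precomputed centroids. A minimal, self-contained sketch of the idea (not the repository's exact implementation):

import numpy as np

def multimodal_truncate(w, centroids, psi=0.7):
    """Interpolate a latent w towards its closest centroid instead of the global mean.

    w:         (w_dim,) latent produced by the mapping network
    centroids: (num_centroids, w_dim) cluster centers, e.g. loaded from the
               imagenet_centroids.npy file linked above
    psi:       truncation strength (1.0 = no truncation)
    """
    dists = np.linalg.norm(centroids - w, axis=1)  # distance to every centroid
    nearest = centroids[np.argmin(dists)]          # pick the closest cluster center
    return nearest + psi * (w - nearest)           # interpolate towards it

# Toy usage with random data standing in for real latents/centroids.
rng = np.random.default_rng(0)
w = rng.normal(size=512)
centroids = rng.normal(size=(10, 512))
w_trunc = multimodal_truncate(w, centroids, psi=0.7)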

[Example images: no truncation vs. uni-modal truncation vs. multi-modal truncation]

Image Inversion

To invert a given image via latent optimization, and optionally use our reimplementation of Pivotal Tuning Inversion, run

python run_inversion.py --outdir=inversion_out \
  --target media/jay.png \
  --inv-steps 1000 --run-pti --pti-steps 350 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet512.pkl

Provide an image via --target; it is automatically resized and center-cropped to match the generator network. You do not need to provide a class for ImageNet models; we infer the class of a given sample via a pretrained classifier.

Image Editing

To use our reimplementation of StyleMC, and generate the example above, run

python run_stylemc.py --outdir=stylemc_out \
  --text-prompt "a chimpanzee | laughter | happiness | happy chimpanzee | happy monkey | smile | grin" \
  --seeds 0-256 --class-idx 367 --layers 10-30 --edit-strength 0.75 --init-seed 49 \
  --network=https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet128.pkl \
  --bigger-network https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet1024.pkl

Recommended workflow:

  • Sample images via gen_images.py.
  • Pick a sample and use it as the initial image for run_stylemc.py by providing --init-seed and --class-idx.
  • Find a direction in style space via --text-prompt.
  • Finetune --edit-strength, --layers, and the number of --seeds.
  • Once you have found a good setting, provide a larger model via --bigger-network. The script still optimizes the direction for the smaller model, but uses the bigger model for the final output.

Pretrained Models

We provide the following pretrained models (pass the URL as PATH_TO_NETWORK_PKL):

Dataset Res FID PATH
ImageNet 16² 0.73 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet16.pkl
ImageNet 32² 1.11 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet32.pkl
ImageNet 64² 1.52 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet64.pkl
ImageNet 128² 1.77 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet128.pkl
ImageNet 256² 2.26 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet256.pkl
ImageNet 512² 2.42 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet512.pkl
ImageNet 1024² 2.51 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/imagenet1024.pkl
CIFAR10 32² 1.85 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/cifar10.pkl
FFHQ 256² 2.19 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/ffhq256.pkl
FFHQ 512² 2.23 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/ffhq512.pkl
FFHQ 1024² 2.02 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/ffhq1024.pkl
Pokemon 256² 23.97 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon256.pkl
Pokemon 512² 23.82 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon512.pkl
Pokemon 1024² 25.47 https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/models/pokemon1024.pkl

Quality Metrics

By default, train.py tracks FID50k during training. To calculate metrics for a specific network snapshot, run

python calc_metrics.py --metrics=fid50k_full --network=PATH_TO_NETWORK_PKL

To see the available metrics, run

python calc_metrics.py --help

We provide precomputed FID statistics for all pretrained models:

wget https://s3.eu-central-1.amazonaws.com/avg-projects/stylegan_xl/gan-metrics.zip
unzip gan-metrics.zip -d dnnlib/

Further Information

This repo builds on the codebase of StyleGAN3 and our previous project Projected GANs Converge Faster.

stylegan_xl's People

Contributors

ak391, xl-sr

stylegan_xl's Issues

missing .pkl on first round of training

Working through the pokemon demo. I think there might be a step missing in the instructions to download pre-trained weights. Encountered the following error attempting to run train.py on first round of progressive training:

(sgxl) user@host:~/dmarx/stylegan_xl$ python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=../../datasets/pokemon16.zip     --gpus=4 --batch=64 --mirror=1 --snap 10 --batch-gpu 16 --kimg 10000 --stem --syn_layers 10

Training options:
{
  "G_kwargs": {
    "class_name": "training.networks_stylegan3_resetting.Generator",
    "z_dim": 64,
    "w_dim": 512,
    "mapping_kwargs": {
      "num_layers": 2,
      "rand_embedding": false
    },
    "channel_base": 65536,
    "channel_max": 1024,
    "magnitude_ema_beta": 0.9977843871238888,
    "num_layers": 10
  },
  "D_kwargs": {
    "class_name": "pg_modules.discriminator.ProjectedDiscriminator",
    "backbones": [
      "deit_base_distilled_patch16_224",
      "tf_efficientnet_lite0"
    ],
    "diffaug": true,
    "interp224": true,
    "backbone_kwargs": {
      "cout": 64,
      "expand": true,
      "proj_type": 1,
      "num_discs": 4,
      "cond": false
    }
  },
  "G_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08,
    "lr": 0.0025
  },
  "D_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08,
    "lr": 0.002
  },
  "loss_kwargs": {
    "class_name": "training.loss.ProjectedGANLoss",
    "blur_init_sigma": 2,
    "blur_fade_kimg": 300,
    "pl_weight": 2.0,
    "pl_no_weight_grad": true,
    "style_mixing_prob": 0.0,
    "cls_weight": 0.0,
    "cls_model": "deit_small_distilled_patch16_224",
    "train_head_only": false
  },
  "data_loader_kwargs": {
    "pin_memory": true,
    "prefetch_factor": 2,
    "num_workers": 3
  },
  "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "../../datasets/pokemon16.zip",
    "use_labels": false,
    "max_size": 4729,
    "xflip": true,
    "resolution": 16,
    "random_seed": 0
  },
  "num_gpus": 4,
  "batch_size": 64,
  "batch_gpu": 16,
  "metrics": [
    "fid50k_full"
  ],
  "total_kimg": 10000,
  "kimg_per_tick": 4,
  "image_snapshot_ticks": 10,
  "network_snapshot_ticks": 10,
  "random_seed": 0,
  "ema_kimg": 20.0,
  "restart_every": 999999999,
  "run_dir": "./training-runs/pokemon/00000-stylegan3-t-pokemon16-gpus4-batch64"
}

Output directory:    ./training-runs/pokemon/00000-stylegan3-t-pokemon16-gpus4-batch64
Number of GPUs:      4
Batch size:          64 images
Training duration:   10000 kimg
Dataset path:        ../../datasets/pokemon16.zip
Dataset size:        4729 images
Dataset resolution:  16
Dataset labels:      False
Dataset x-flips:     True

Creating output directory...
Launching processes...
Loading training set...

Num images:  9458
Image shape: [3, 16, 16]
Label shape: [0]

Constructing networks...
Traceback (most recent call last):
  File "train.py", line 362, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/home/user/miniconda/envs/sgxl/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/miniconda/envs/sgxl/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/user/miniconda/envs/sgxl/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/miniconda/envs/sgxl/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "train.py", line 347, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 106, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "/home/user/miniconda/envs/sgxl/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/user/miniconda/envs/sgxl/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/home/user/miniconda/envs/sgxl/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/user/miniconda/envs/sgxl/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/user/dmarx/stylegan_xl/train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/home/user/dmarx/stylegan_xl/training/training_loop.py", line 170, in training_loop
    G = dnnlib.util.construct_class_by_name(**G_kwargs, **common_kwargs).train().requires_grad_(False).to(device) # subclass of torch.nn.Module
  File "/home/user/dmarx/stylegan_xl/dnnlib/util.py", line 303, in construct_class_by_name
    return call_func_by_name(*args, func_name=class_name, **kwargs)
  File "/home/user/dmarx/stylegan_xl/dnnlib/util.py", line 298, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "/home/user/dmarx/stylegan_xl/torch_utils/persistence.py", line 104, in __init__
    super().__init__(*args, **kwargs)
  File "/home/user/dmarx/stylegan_xl/training/networks_stylegan3_resetting.py", line 583, in __init__
    self.mapping = MappingNetwork(z_dim=z_dim, c_dim=c_dim, w_dim=w_dim, num_ws=self.num_ws, **mapping_kwargs)
  File "/home/user/dmarx/stylegan_xl/torch_utils/persistence.py", line 104, in __init__
    super().__init__(*args, **kwargs)
  File "/home/user/dmarx/stylegan_xl/training/networks_stylegan3_resetting.py", line 138, in __init__
    with open(embed_path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'in_embeddings/tf_efficientnet_lite0.pkl'

Possible Metrics Issues

Training works great; however, when it gets to calculating metrics, I receive the following errors.

Calculating the stats for this dataset the first time

Saving them to ./dnnlib/gan-metrics/Solitude16-inception-2015-12-05-d3fec46f71c9675575ac92ce317e7832.pkl
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
0%| | 0/69 [00:00<?, ?it/s]
...
100%|##########| 69/69 [00:32<00:00, 2.10it/s]
Traceback (most recent call last):
  File "/content/drive/.shortcut-targets-by-id/1f-kIJloZwicV2WXZYcNGGXmZ9dyxIaeO/colab-sg3XL/stylegan_xl/dnnlib/util.py", line 45, in __getattr__
    return self[name]
KeyError: 'truncation_psi'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 332, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "train.py", line 317, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 104, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/content/drive/.shortcut-targets-by-id/1f-kIJloZwicV2WXZYcNGGXmZ9dyxIaeO/colab-sg3XL/stylegan_xl/training/training_loop.py", line 478, in training_loop
    dataset_kwargs=training_set_kwargs, num_gpus=num_gpus, rank=rank, device=device)
  File "/content/drive/.shortcut-targets-by-id/1f-kIJloZwicV2WXZYcNGGXmZ9dyxIaeO/colab-sg3XL/stylegan_xl/metrics/metric_main.py", line 43, in calc_metric
    results = _metric_dict[metric](opts)
  File "/content/drive/.shortcut-targets-by-id/1f-kIJloZwicV2WXZYcNGGXmZ9dyxIaeO/colab-sg3XL/stylegan_xl/metrics/metric_main.py", line 83, in fid50k_full
    fid = frechet_inception_distance.compute_fid(opts, max_real=None, num_gen=50000)
  File "/content/drive/.shortcut-targets-by-id/1f-kIJloZwicV2WXZYcNGGXmZ9dyxIaeO/colab-sg3XL/stylegan_xl/metrics/frechet_inception_distance.py", line 35, in compute_fid
    rel_lo=0, rel_hi=1, capture_mean_cov=True, max_items=num_gen, sfid=sfid).get_mean_cov()
  File "/content/drive/.shortcut-targets-by-id/1f-kIJloZwicV2WXZYcNGGXmZ9dyxIaeO/colab-sg3XL/stylegan_xl/metrics/metric_utils.py", line 331, in compute_feature_stats_for_generator
    w = gen_utils.get_w_from_seed(G, batch_gen, opts.device, truncation_psi=opts.G_kwargs.truncation_psi,
  File "/content/drive/.shortcut-targets-by-id/1f-kIJloZwicV2WXZYcNGGXmZ9dyxIaeO/colab-sg3XL/stylegan_xl/dnnlib/util.py", line 47, in __getattr__
    raise AttributeError(name)
AttributeError: truncation_psi

Error with stylegan3-t config

When I run the stylegan3-t config I now get an error that it's missing layer_kwargs['conv_kernel'] and the use_radial_filters argument as well. I assume radial filters should be set to False but am unsure what conv_kernel should be set to.

perhaps this? (line starting around 231 in train.py)

else:
        c.G_kwargs.class_name = 'training.networks_stylegan3_resetting.Generator'
        c.G_kwargs.magnitude_ema_beta = 0.5 ** (c.batch_size / (20 * 1e3))
        c.G_kwargs.channel_base *= 2  # increase for StyleGAN-XL
        c.G_kwargs.channel_max *= 2   # increase for StyleGAN-XL
        c.G_kwargs.conv_kernel = 3
        c.G_kwargs.use_radial_filters = False

        if opts.cfg == 'stylegan3-r':
            c.G_kwargs.use_radial_filters = True
            c.G_kwargs.conv_kernel = 1 
            c.G_kwargs.channel_base *= 2
            c.G_kwargs.channel_max *= 2

code and pretrained models

Dear StyleGAN_xl team,

Thank you for your great work. The results are amazing.

Do you plan to release the code and pretrained models? When will you release them?

Thank you for your help.

Best Wishes,

Zongze

How to use the Discriminator of the pre-trained model

There is a clear statement on how to use the generator of the pretrained model, but I want to use the discriminator of your pretrained model, so I could input an image and use D to evaluate it. Can I ask how to do that?
Or are the pre-trained models all generators? Do you plan to release D?

PTI inversion code

Hi,

Is there an expected date of release for the PTI code for inversion?

Non-saturating loss or least-square GAN loss

I noticed that the discriminator loss is the least-squares GAN loss rather than the non-saturating loss of StyleGAN3. I wonder whether the least-squares GAN loss achieves better results and where I can find the discussion of the GAN loss in the paper. Thank you a lot!

imagenet_centroids

Dear stylegan_xl group,

Thank you for sharing this great work, I really like it.

If I understand correctly, the imagenet_centroids in this link is only for resolution 128, is this correct? If so, could you share the imagenet_centroids for other resolutions?

Thank you for your help.

Best Wishes,

Zongze

About input c

Sorry to bother you with naive questions, but I haven't been able to figure this out after two days (it is also not clear to me in StyleGAN2-ADA). I would really appreciate it if you could reply!
I wrote the following to feed an image with a label into the discriminator of your pretrained model:

import torch
import dnnlib
import pickle
import legacy
import PIL
import numpy as np
network_pkl ='./pretrained/imagenet64.pkl'
device = torch.device('cuda')
with dnnlib.util.open_url(network_pkl) as f:
        D = legacy.load_network_pkl(f)['D']
        D = D.eval().requires_grad_(False).to(device)
image =torch.from_numpy(np.array(PIL.Image.open( './data/01.JPEG'))).float().permute(2,0,1)
image = image.reshape((1,3,64,64))
image = image.to(device)

c = torch.randn([1,1000,3,3],dtype=torch.float).cuda()
print(D(image,c))

But I still don't know how to write the label. Could you give an example of the input c (suppose it is class 0)?
Thank you very much!

multiple nodes

Hello, I would like to ask whether training on multiple machines is supported, i.e., using multiple nodes, because it looks like only single-node multi-GPU training is supported.

why clip produces bad results

To my knowledge, CLIP is used to find the image in the latent space of the network that most closely matches the one described in the prompt. So I cannot understand why, when I type "a yellow tiger" in StyleGAN-XL + CLIP, this is what Colab gives me back:
[attached output image]
Doesn't ImageNet have a class for tigers? Shouldn't it generate it better? Even when I write "a cat", it gives me this result back.
It sure doesn't look like a cat.
[attached output image]
Am I missing something, or is there an explanation for these results not being as good as the ones obtained by sampling?

NameError: name 'Normalize' is not defined

New crash started today using the following:

!python train.py --outdir=/content/drive/MyDrive/results/Mushrooms --cfg=stylegan3-t --data=/content/drive/MyDrive/datasets/Mushrooms/Mushrooms64 \
  --gpus=1 --batch=32 --mirror=1 --snap 10 --batch-gpu 8 --kimg 1000 --syn_layers 10 \
  --superres --up_factor 2 --head_layers 7 \
  --path_stem /content/drive/MyDrive/results/Mushrooms/00001-stylegan3-t-Mushrooms32-gpus1-batch32/best_model.pkl

I tried doing a fresh pull of the repo but no love.


Setting up augmentation...
Distributing across 1 GPUs...
Setting up training phases...
Traceback (most recent call last):
  File "train.py", line 336, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "train.py", line 321, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 104, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/content/drive/MyDrive/colab-sg3XL/stylegan_xl/training/training_loop.py", line 232, in training_loop
    loss = dnnlib.util.construct_class_by_name(device=device, G=G, G_ema=G_ema, D=D, augment_pipe=augment_pipe, **loss_kwargs) # subclass of training.loss.Loss
  File "/content/drive/MyDrive/colab-sg3XL/stylegan_xl/dnnlib/util.py", line 303, in construct_class_by_name
    return call_func_by_name(*args, func_name=class_name, **kwargs)
  File "/content/drive/MyDrive/colab-sg3XL/stylegan_xl/dnnlib/util.py", line 298, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "/content/drive/MyDrive/colab-sg3XL/stylegan_xl/training/loss.py", line 61, in __init__
    self.norm = Normalize(normstats['mean'], normstats['std'])
NameError: name 'Normalize' is not defined

a question

When I try to train your great project, something goes wrong. Do you know the reason for RuntimeError: No such operator aten::cudnn_convolution_backward_weight? Thank you very much~

Error Running Demo

After following the installation instructions, I get the following error running Cuda 11.6 on an RTX 2080ti

Traceback (most recent call last):
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 332, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 317, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "/home/alex/Spring-2022/CV/DogeGAN/resources/stylegan_xl/train.py", line 104, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/training/training_loop.py", line 339, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/training/loss.py", line 121, in accumulate_gradients
    loss_Gmain.backward()
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/torch_utils/ops/conv2d_gradfix.py", line 144, in backward
    grad_weight = Conv2dGradWeight.apply(grad_output, input)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/torch_utils/ops/conv2d_gradfix.py", line 173, in forward
    return torch._C._jit_get_operation(name)(weight_shape, grad_output, input, padding, stride, dilation, groups, *flags)
RuntimeError: No such operator aten::cudnn_convolution_transpose_backward_weight

high resolution model for ImageNet

Dear StyleGAN-XL team,

Thank you for sharing this wonderful project. Do you plan to release pretrained models for ImageNet at high resolution (256, 512, 1024)?

Thank you for your help.

Best Wishes,

Zongze

add web demo/model to Huggingface

Hi, would you be interested in adding stylegan_xl to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. There is already an autonomousvision organization on Huggingface (https://huggingface.co/autonomousvision) to add models/datasets/spaces (web demos) to.

Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

and here are guides for adding spaces/models/datasets to your org

How to add a Space: https://huggingface.co/blog/gradio-spaces
how to add models: https://huggingface.co/docs/hub/adding-a-model
uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

a question about expected training time

Hello, xl-sr, thank you for your great project.
When I train the stem, even though the resolution is only 16, training on one V100 with batch_size = 8 gives about 120 sec/kimg.
Is that right? When I train the same zip with StyleGAN2-ADA and batch_size = 32, I get about 8 sec/kimg.
Is StyleGAN-XL really that much slower?
Can you answer my question? Thank you very much, I really need your help.

Possible to use transfer learning?

Is it possible to transfer learn from the largest resolution on a new dataset? This was a common trick that worked pretty well in other versions of StyleGAN and saved a ton of compute time.

I tried the following (but may have gotten some of the arguments wrong):

!python train.py --outdir=./training-runs/test --cfg=stylegan3-t --data=./data/my-dataset-256.zip \
  --gpus=1 --batch=64 --mirror=1 --snap 1 --batch-gpu 4 --kimg 10000 --syn_layers 10 --head_layers 4 \
  --resume=./pretrained/pokemon256.pkl --metrics=None

but I got the following error:
RuntimeError: output with shape [12] doesn't match the broadcast shape [12, 12]

Not an issue - why is the new StyleGAN-XL -t config so "heavy"?

What is the difference between the StyleGAN3 -r and -t generators and the StyleGAN-XL -r and -t generators? In StyleGAN3, both were comparable in training speed (±50%) with the same parameters; in StyleGAN-XL, -t is something like 5+ times slower to train than -r. Is that normal? I also noticed the StyleGAN-XL -t generator has roughly 80M parameters while -r has about 20M (in StyleGAN3, both -t and -r had 20-30M; I am quoting these figures from memory).

About Colab

Nice work! Could you please release Colab code? I want to have a try.

Order of Data Augmentation and Normalization

Hi,
first of all, thank you for your clean code; second, I think I found a (probably not that important) bug.
As far as I understand, based on the DA paper, one should apply DiffAugment to images while they are in the (-1, 1) range. I am saying this based on their codebase: they first scale the input from (0, 255) to (-1, 1) here and then apply their DA function here.
But in this repository, I think you first scale to (-1, 1) here, then scale to (0, 1) here, then normalize based on the pretrained feature extractor (which is correct and should be applied to (0, 1) images), and only then apply DA. Wouldn't it be more accurate to first apply DA, then move to (0, 1), and then normalize?

StyleGan2 support

Hi,

I am trying to train a model with stylegan2 as the base config with class labels.

  1. I got a couple of errors
    (TypeError: __init__() got an unexpected keyword argument ‘num_layers’)
    with the kwargs such as:
    c.G_kwargs.num_layers = opts.syn_layers
    c.G_kwargs.mapping_kwargs.rand_embedding = False
    I was unable to find usage of these kwargs in /training/networks_stylegan2.py. I commented them out and continued to get the following error.

  2. Also, the mapping network does not use pretrained class embedding as conditioning in stylegan2.py
    self.embed = FullyConnectedLayer(c_dim, embed_features). I modified this based on SG3 code but then got the following error in metric calculation.

  3. Also, self.w_avg in styleGAN2 mapping network is not label specific and this throws an error in metric calculation while synthesizing label specific w's.

Am I doing something wrong, or is there no support for class-conditional training with StyleGAN2 as the base?

Code Doesn't Run with cfg=stylegan Because Missing Stylegan2 Loss Function

loss.py is missing the StyleGAN2 loss function.

Using the command

python train.py --data ../datasets/scapes-faces-C-32px.zip --outdir results/stem1 --cfg stylegan3-t --gpus 1 --batch 64 --snap 10 --batch-gpu 16 --syn_layers 10

Results in the error:

Setting up augmentation...
Distributing across 1 GPUs...
Setting up training phases...
Traceback (most recent call last):
  File "/home/HDD/ml/stylegan_xl/train.py", line 362, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/home/HDD/anaconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/HDD/anaconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/HDD/anaconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/HDD/anaconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/HDD/ml/stylegan_xl/train.py", line 347, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "/home/HDD/ml/stylegan_xl/train.py", line 104, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "/home/HDD/ml/stylegan_xl/train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/home/HDD/ml/stylegan_xl/training/training_loop.py", line 232, in training_loop
    loss = dnnlib.util.construct_class_by_name(device=device, G=G, G_ema=G_ema, D=D, augment_pipe=augment_pipe, **loss_kwargs) # subclass of training.loss.Loss
  File "/home/HDD/ml/stylegan_xl/dnnlib/util.py", line 303, in construct_class_by_name
    return call_func_by_name(*args, func_name=class_name, **kwargs)
  File "/home/HDD/ml/stylegan_xl/dnnlib/util.py", line 296, in call_func_by_name
    func_obj = get_obj_by_name(func_name)
  File "/home/HDD/ml/stylegan_xl/dnnlib/util.py", line 289, in get_obj_by_name
    module, obj_name = get_module_from_obj_name(name)
  File "/home/HDD/ml/stylegan_xl/dnnlib/util.py", line 269, in get_module_from_obj_name
    get_obj_from_module(module, local_obj_name) # may raise AttributeError
  File "/home/HDD/ml/stylegan_xl/dnnlib/util.py", line 283, in get_obj_from_module
    obj = getattr(obj, part)
AttributeError: module 'training.loss' has no attribute 'StyleGAN2Loss'

encoder for stylegan-xl

Dear stylegan-xl group,

Thank you for sharing this great work, i really like it.

Have you tried to train an encoder for the StyleGAN-XL model pretrained on ImageNet? Maybe the pSp or the e4e encoder?

Thank you for your help.

Best Wishes,

Zongze

how to prepare imagenet dataset

Hello, your project mentions using the ImageNet dataset, but I have some problems reproducing it, because dataset_tool.py does not document how to prepare ImageNet.
Also, I would like to know how to effectively reproduce the SOTA results you report on Papers with Code. Do you have a training plan? For example, what settings are used for training the 16, 32, and 64 stages (the kind of settings usually written in a yaml file), and how much time does training ImageNet with these settings take in V100-days?
It may be a bit long.

Not an issue - question on out of the box use case

Basically, I had this idea to use this technology to take a PDF document and upgrade it to a higher quality.
Like a picture that's bad quality - I want to boost the quality of a music score.

If you see the artifacts here / the blacks are bad, etc. - I want a neural net to reimagine the document, not merely boost contrast.

[attached screenshot]

Could it just be a case of training with higher quality docs (using this library) - then sending in an image and finding the latent?

Not resize image to 224 for the ViT-based discriminator

https://github.com/autonomousvision/stylegan_xl/blob/819c225e7fd60114cde0bad79f3cb5a4ba8621cd/pg_modules/discriminator.py#L180

Thank you for the great work! I noticed that in pg_modules/discriminator.py, there is a line `bb_name += f"_{i}"` that changes bb_name from "deit_base_distilled_patch16_224" to "deit_base_distilled_patch16_224_0". Then, in the forward pass, the input image will not be resized to 224 for the ViT model. Not sure if that is intentional or if the image should be resized to 224 (especially for higher resolutions such as 512x512 or 1024x1024).

Thanks again!

What if Efficientnet-lite1 and DeiT are combined?

In Table 2 of the projected-gan paper, I noticed that Efficientnet-lite1 achieves the best FID, I wonder what would happen if Efficientnet-lite1 and DeiT are jointly used as the stylegan-xl discriminators - will it improve the performance? Thank you a lot!

Update requirements and `environment.yml`

Hi, thanks for the code!

I was unable to use the environment.yml file because of an error from the clip installation via pip. I fixed this by changing it to clip-by-openai, and then everything was installed correctly, assuming this is indeed the required clip. However, the Requirements are in conflict with the environment.yml file (the versions of Python and PyTorch), so I assume the latter is the correct one. Any clarification here would be greatly appreciated.

Replicate video demo results: transition from Chimpanzees to Human face

Hello,

Thank you for sharing this great work!

I am particularly wondering how to replicate the video demo shown in the twitter (https://twitter.com/autovisiongroup/status/1522533826324578304?s=21&t=6c_FXyqAbXGUecPlDZMQVQ). Specifically, between 0:11 and 0:16, the video shows a transition from Chimpanzees to Human face.

I realize the code to generate a video has already been released, but to generate such a transition, I assume we need a GAN trained on both chimpanzees and human faces. I guess the chimpanzee is from ImageNet and the human face is from FFHQ. Is there a GAN model that can generate FFHQ faces and ImageNet images simultaneously?

Thank you very much, and looking forward to hearing from you.

Extra ')' in feature_networks/pretrained_builder.py

On line 409, where it says

pretrained.CHANNELS, pretrained.RES_MULT = calc_dims(pretrained, is_vit=backbone in VITS) )

should be

pretrained.CHANNELS, pretrained.RES_MULT = calc_dims(pretrained, is_vit=backbone in VITS)

ResolvePackageNotFound on Windows

As I read, all these not-found packages should go under the pip section with different formatting; it is some kind of a bug, I understand (datitran/object_detector_app#41). It would be very time consuming to rename and reformat all these packages manually. Is there a way to make a requirements file that works on all platforms?

conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  • setuptools==52.0.0=py37h06a4308_0
  • cudatoolkit==11.0.221=h6bb024c_0
  • nbconvert==6.1.0=py37h89c1867_0
  • zeromq==4.3.4=h2531618_0
  • jpeg==9b=h024ee3a_2
  • jupyter_core==4.7.1=py37h89c1867_0
  • mistune==0.8.4=py37h4abf009_1002
  • lz4-c==1.9.3=h2531618_0
  • libprotobuf==3.13.0.1=h8b12597_0
  • readline==8.1=h27cfd23_0
  • argon2-cffi==20.1.0=py37h4abf009_2
  • libpng==1.6.37=hbc83047_0
  • jedi==0.18.0=py37h89c1867_2
  • libstdcxx-ng==9.1.0=hdf63c60_0
  • yaml==0.2.5=h516909a_0
  • numpy==1.20.2=py37h2d18471_0
  • cryptography==3.1.1=py37h1ba5d50_0
  • jupyter_nbextensions_configurator==0.4.1=py37h89c1867_2
  • libuv==1.40.0=h7b6447c_0
  • brotlipy==0.7.0=py37h7b6447c_1000
  • zstd==1.4.9=haebb681_0
  • libsodium==1.0.18=h36c2ea0_1
  • freetype==2.10.4=h5ab3b9f_0
  • sqlite==3.35.4=hdfb4753_0
  • intel-openmp==2021.2.0=h06a4308_610
  • zlib==1.2.11=h7b6447c_3
  • mkl==2021.2.0=h06a4308_296
  • tk==8.6.10=hbc83047_0
  • python==3.7.10=hdb3f193_0
  • ipython==7.26.0=py37h6531663_0
  • ca-certificates==2021.10.8=ha878542_0
  • watchdog==0.10.4=py37h89c1867_0
  • pytorch==1.7.1=py3.7_cuda11.0.221_cudnn8.0.5_0
  • libtiff==4.1.0=h2733197_1
  • ninja==1.10.2=hff7bd54_1
  • mpi4py==3.0.3=py37hd955b32_1
  • lcms2==2.12=h3be6417_0
  • matplotlib-base==3.3.1=py37h817c723_0
  • pandas==1.1.3=py37he6710b0_0
  • libgfortran4==7.5.0=ha8ba4b0_17
  • pandoc==2.14.1=h7f98852_0
  • mkl-service==2.3.0=py37h27cfd23_1
  • libgcc-ng==9.1.0=hdf63c60_0
  • xz==5.2.5=h7b6447c_0
  • shortuuid==1.0.1=py37h89c1867_4
  • pip==21.1.2=py37h06a4308_0
  • ld_impl_linux-64==2.33.1=h53a641e_7
  • mkl_random==1.2.1=py37ha9443f7_2
  • pyyaml==5.3.1=py37hb5d75c8_1
  • lxml==4.6.3=py37h9120a33_0
  • openmpi==4.0.2=hb1b8bf9_1
  • cffi==1.14.3=py37he30daa8_0
  • libgfortran-ng==7.5.0=ha8ba4b0_17
  • libxml2==2.9.10=hb55368b_3
  • certifi==2021.10.8=py37h89c1867_1
  • numpy-base==1.20.2=py37hfae3a4d_0
  • ncurses==6.2=he6710b0_1
  • openssl==1.1.1n=h7f8727e_0
  • promise==2.3=py37h89c1867_4
  • tornado==6.1=py37h4abf009_0
  • pillow==8.2.0=py37he98fc37_0
  • jupyter_highlight_selected_word==0.2.0=py37h89c1867_1002
  • libxslt==1.1.34=hc22bd24_0
  • kiwisolver==1.3.1=py37hc928c03_0
  • libffi==3.3=he6710b0_2
  • icu==58.2=hf484d3e_1000
  • mkl_fft==1.3.0=py37h42c9631_2

How to train next super-resolution stage

Hi,

I might be a bit dense but I'm not sure how to train the second super-resolution stage once I have completed training the first super-resolution stage. Taking the pokemon dataset example from the README.md if I start with:

python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=./data/pokemon16.zip \
    --gpus=8 --batch=64 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 --syn_layers 10

And then train

python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=./data/pokemon32.zip \
  --gpus=8 --batch=64 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 --syn_layers 10 \
  --superres --up_factor 2 --head_layers 7 \
  --path_stem training-runs/pokemon/00000-stylegan3-t-pokemon16-gpus8-batch64/best_model.pkl

Now how do I train the 64x64 model from the 32x32 model? Is the next path stem training-runs/pokemon/00001-stylegan3-t-pokemon32-gpus8-batch64/best_model.pkl from the 32x32 model or is it from the original 16x16 model? Is the next syn_layers parameter a 10 or a 7?

Thanks in advance.

Discriminator pre-train models missing parameters

I am using both the generator and discriminator. Generator works well but the discriminator doesn't.
Here is part of my code:

# load D and G
with dnnlib.util.open_url(network_pkl) as f:
    D = legacy.load_network_pkl(f)['D']
    D = D.eval().requires_grad_(False).to(device)

with dnnlib.util.open_url(network_pkl) as f:
    G = legacy.load_network_pkl(f)['G_ema']
    G = G.eval().requires_grad_(False).to(device)

img = G.synthesis(w, update_emas=False)
D(img, torch.empty([1, G.c_dim], device=device))

The error happens in the last line of code. I tried different pre-trained models; sometimes the error is AttributeError: 'Mlp' object has no attribute 'drop1', sometimes "mean" and "std" are missing for the normalization layer.

Here is one error message:

/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Traceback (most recent call last):
  File "/home/kaixuan/Projects/MyStyleGAN_XL/D_run_tmp.py", line 136, in <module>
    generate_images()  # pylint: disable=no-value-for-parameter
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/kaixuan/Projects/MyStyleGAN_XL/D_run_tmp.py", line 129, in generate_images
    D(img1, torch.empty([1, G.c_dim], device=device))
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kaixuan/Projects/MyStyleGAN_XL/pg_modules/discriminator.py", line 213, in forward
    features = feat(x_n)
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kaixuan/Projects/MyStyleGAN_XL/pg_modules/projector.py", line 114, in forward
    out0, out1, out2, out3 = forward_vit(self.pretrained, x)
  File "/home/kaixuan/Projects/MyStyleGAN_XL/feature_networks/vit.py", line 59, in forward_vit
    _ = pretrained.model.forward_flex(x)
  File "/home/kaixuan/Projects/MyStyleGAN_XL/feature_networks/vit.py", line 149, in forward_flex
    x = blk(x)
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/timm/models/vision_transformer.py", line 230, in forward
    x = x + self.drop_path(self.mlp(self.norm2(x)))
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/timm/models/layers/mlp.py", line 28, in forward
    x = self.drop1(x)
  File "/home/kaixuan/anaconda3/envs/GAN/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Mlp' object has no attribute 'drop1'

NotImplementedError on initial stem training

Hello,
coming from Projected GAN I wanted to test Stylegan-XL. Upon installing/updating all dependencies and wanting to train a model from scratch on my windows PC, I am getting the following Error with the following command:
python train.py --outdir=training-runs --cfg=stylegan3-t --data=T:\Set_sg_xl_v1.zip --gpus=1 --mirror=1 --batch=64 --snap=4 --stem --syn_layers=10

Last output of cmd shell:

Constructing networks...
loaded imagenet embeddings from in_embeddings/tf_efficientnet_lite0.pkl: Embedding(1000, 320)
Traceback (most recent call last):
  File "I:\Github\stylegan_xl\train.py", line 362, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "C:\Users\[username]\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\[username]\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "C:\Users\[username]\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\[username]\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "I:\Github\stylegan_xl\train.py", line 347, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "I:\Github\stylegan_xl\train.py", line 104, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "I:\Github\stylegan_xl\train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "I:\Github\stylegan_xl\training\training_loop.py", line 171, in training_loop
    D = dnnlib.util.construct_class_by_name(**D_kwargs, **common_kwargs).train().requires_grad_(False).to(device) # subclass of torch.nn.Module
  File "I:\Github\stylegan_xl\dnnlib\util.py", line 303, in construct_class_by_name
    return call_func_by_name(*args, func_name=class_name, **kwargs)
  File "I:\Github\stylegan_xl\dnnlib\util.py", line 298, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "I:\Github\stylegan_xl\pg_modules\discriminator.py", line 172, in __init__
    feat = F_RandomProj(bb_name, **backbone_kwargs)
  File "I:\Github\stylegan_xl\pg_modules\projector.py", line 109, in __init__
    self.normstats = get_backbone_normstats(backbone)
  File "I:\Github\stylegan_xl\pg_modules\projector.py", line 33, in get_backbone_normstats
    raise NotImplementedError

All dependencies as listed in the environment.yml are installed.
Additionally:

  • Python 3.9.10
  • PyTorch 1.10.2+cu113 with matching CUDA Toolkit (Was working 100% fine with projected_gan!)
  • Compiler is the previously working setup from ProjectedGAN with Visual Studio compilers

My extremely limited Python understanding tells me that the backbone variable in projector.py line 109 somehow does not get populated?

how to prepare imagenet dataset

Hello, thank you for your great work.
I can't find instructions for preparing the ImageNet dataset in dataset_tool.py.
If I have an ImageNet zip, how can I prepare this data?
Thank you for your help.

Training at 16x16

When starting a new training with 16x16 stylegan3-t it now errors out; however, older trainings that are further along run fine. It seems to only happen on the initial training without using a stem.

python train.py --outdir=/content/drive/MyDrive/results --cfg=stylegan3-t --data=/content/drive/MyDrive/datasets/16 \
  --gpus=1 --batch=32 --mirror=1 --snap 10 --batch-gpu 8 --kimg 280 --syn_layers 10 --metrics=none

Constructing networks...
Traceback (most recent call last):
  File "train.py", line 337, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "train.py", line 322, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 104, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/content/drive/MyDrive/colab-sg3XL/stylegan_xl/training/training_loop.py", line 170, in training_loop
    G = dnnlib.util.construct_class_by_name(**G_kwargs, **common_kwargs).train().requires_grad_(False).to(device) # subclass of torch.nn.Module
  File "/content/drive/MyDrive/colab-sg3XL/stylegan_xl/dnnlib/util.py", line 303, in construct_class_by_name
    return call_func_by_name(*args, func_name=class_name, **kwargs)
  File "/content/drive/MyDrive/colab-sg3XL/stylegan_xl/dnnlib/util.py", line 298, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "/content/drive/MyDrive/colab-sg3XL/stylegan_xl/torch_utils/persistence.py", line 104, in __init__
    super().__init__(*args, **kwargs)
  File "/content/drive/MyDrive/colab-sg3XL/stylegan_xl/training/networks_stylegan3_resetting.py", line 583, in __init__
    self.synthesis = SynthesisNetwork(w_dim=w_dim, img_resolution=img_resolution, img_channels=img_channels, **synthesis_kwargs)
  File "/content/drive/MyDrive/colab-sg3XL/stylegan_xl/torch_utils/persistence.py", line 104, in __init__
    super().__init__(*args, **kwargs)
  File "/content/drive/MyDrive/colab-sg3XL/stylegan_xl/training/networks_stylegan3_resetting.py", line 495, in __init__
    self.conv_kernel = layer_kwargs['conv_kernel']
KeyError: 'conv_kernel'

Question about normalization

Hi,
I notice that the normalization of input images differs from the traditional normalization x = (x - mean) / std:

def norm_with_stats(x, stats):
    x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.5 / stats['mean'][0]) + (0.5 - stats['std'][0]) / stats['mean'][0]
    x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.5 / stats['mean'][1]) + (0.5 - stats['std'][1]) / stats['mean'][1]
    x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.5 / stats['mean'][2]) + (0.5 - stats['std'][2]) / stats['mean'][2]
    x = torch.cat((x_ch0, x_ch1, x_ch2), 1)
    return x

Is there any paper or previous work introduces this kind of normalization method?

How can I change the output of the network?

Thank you for your excellent work. I am trying to get my hands on StyleGAN-XL and extract intermediate features from the pretrained model. However, I am a little confused about the loading procedure and the @persistence decorator.

As I understand, the source code saved in the pickle is used when loading a persistence class. I can refer to the following code to get the pretrained intermediate features: first load the pretrained StyleGAN-XL, then define a new network with modified source code that outputs the features.

import pickle
from torch_utils import misc  # StyleGAN's copy_params_and_buffers helper

with open('old_pickle.pkl', 'rb') as f:
    old_net = pickle.load(f)
new_net = MyNetwork(*old_net.init_args, **old_net.init_kwargs)
misc.copy_params_and_buffers(old_net, new_net, require_all=True)

However, I noticed that in networks_stylegan3.py, initializing the Generator relies on in_embeddings/tf_efficientnet_lite0.pkl. What does this pkl do? And how can I implement the above function?
