
dorarad / gansformer

1.3K stars · 38 watchers · 146 forks · 742 KB

Generative Adversarial Transformers

License: MIT License

Python 93.59% Cuda 5.31% C++ 1.10%
transformers gans generative-adversarial-networks image-generation scene-generation compositionality attention

gansformer's Introduction

Python 3.7 PyTorch 1.8 TensorFlow 1.14 cuDNN 7.3.1 License CC BY-NC

GANformer: Generative Adversarial Transformers

Drew A. Hudson & C. Lawrence Zitnick

Check out our new PyTorch version and the GANformer2 paper!

Update (Feb 21, 2022): We updated the weight initialization of the PyTorch version to the intended scale, leading to a substantial improvement in the model's learning speed!

This is an implementation of the GANformer model, a novel and efficient type of transformer, explored for the task of image generation. The network employs a bipartite structure that enables long-range interactions across the image while maintaining linear computational efficiency, so it can readily scale to high-resolution synthesis. The model iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and to encourage the emergence of compositional representations of objects and scenes. In contrast to the classic transformer architecture, it utilizes multiplicative integration that allows flexible region-based modulation, and can thus be seen as a generalization of the successful StyleGAN network.

1st Paper: https://arxiv.org/pdf/2103.01209
2nd Paper: https://arxiv.org/abs/2111.08960
Contact: [email protected]
Implementation: network.py (TF / Pytorch)

We now support both PyTorch and TF!

✅ Uploading initial code and readme
✅ Image sampling and visualization script
✅ Code clean-up and refactoring, adding documentation
✅ Training and data-preparation instructions
✅ Pretrained networks for all datasets
✅ Extra visualizations and evaluations
✅ Providing models trained for longer
✅ Releasing the PyTorch version
✅ Releasing pre-trained models for high-resolutions (up to 1024 x 1024)
⬜️ Releasing the GANformer2 model (supporting layout generation and conditional layout2image generation)

If you experience any issues or have suggestions for improvements or extensions, feel free to contact me either through the issues page or at [email protected].

Bibtex

@article{hudson2021ganformer,
  title={Generative Adversarial Transformers},
  author={Hudson, Drew A and Zitnick, C. Lawrence},
  journal={Proceedings of the 38th International Conference on Machine Learning, {ICML} 2021},
  year={2021}
}

@article{hudson2021ganformer2,
  title={Compositional Transformers for Scene Generation},
  author={Hudson, Drew A and Zitnick, C. Lawrence},
  journal={Advances in Neural Information Processing Systems {NeurIPS} 2021},
  year={2021}
}

Sample Images

Using the pre-trained models (generated after training for 5-7x fewer steps than the StyleGAN2 models! Training our models for longer will improve the image quality further):

Requirements

  • Python 3.6 and 3.7 are supported.
  • For the TF version: We recommend TensorFlow 1.14 which was used for development, but TensorFlow 1.15 is also supported.
  • For the Pytorch version: We support Pytorch >= 1.8.
  • The code was tested with CUDA 10.0 toolkit and cuDNN 7.5.
  • We have performed experiments on Titan V GPU. We assume 12GB of GPU memory (more memory can expedite training).
  • See requirements.txt (TF / Pytorch) for the required python packages and run pip install -r requirements.txt to install them.

Quickstart & Overview

Our repository supports both TensorFlow (at the main directory) and PyTorch (at pytorch_version). The two implementations follow a closely matching code and file structure and share the same interface. To switch from TF to PyTorch, simply enter the pytorch_version directory and install its requirements. Please feel free to open an issue or contact us with any questions or suggestions about the new implementation!

A minimal example of using a pre-trained GANformer can be found at generate.py (TF / Pytorch). When executed, the 10-line program downloads a pre-trained model and uses it to generate some images:

python generate.py --gpus 0 --model gdrive:bedrooms-snapshot.pkl --output-dir images --images-num 32

You can use --truncation-psi to control the generated images quality/diversity trade-off.
We recommend trying out different values in the range of 0.6-1.0.
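
For convenience, here is a small helper sketch (not part of the repository) that sweeps a few truncation values by invoking the documented generate.py command once per value; the model name and output directory names below are just placeholders.

# Hypothetical helper (not repo code): sweep several truncation values
# by calling the documented generate.py CLI once per value.
import subprocess

MODEL = "gdrive:bedrooms-snapshot.pkl"      # any snapshot from the model catalog

for psi in [0.6, 0.7, 0.8, 1.0]:
    out_dir = f"images_psi_{psi}"           # placeholder output directory name
    subprocess.run([
        "python", "generate.py",
        "--gpus", "0",
        "--model", MODEL,
        "--output-dir", out_dir,
        "--images-num", "8",
        "--truncation-psi", str(psi),
    ], check=True)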

Pretrained models and High resolutions

We provide pretrained models for resolution 256×256 for all datasets, as well as 1024×1024 for FFHQ and 1024×2048 for Cityscapes.

To generate images with the high-resolution models, run the following commands (we reduce the batch size to 1 so that they can load onto a single GPU):

python generate.py --gpus 0 --model gdrive:ffhq-snapshot-1024.pkl --output-dir ffhq_images --images-num 32 --batch-size 1
python generate.py --gpus 0 --model gdrive:cityscapes-snapshot-2048.pkl --output-dir cityscapes_images --images-num 32 --batch-size 1 --ratio 0.5 # 1024 x 2048 cityscapes currently supported in the TF version only

We can train and evaluate new or pretrained models, both quantitatively and qualitatively, with run_network.py (TF / Pytorch).
The model architecture can be found at network.py (TF / Pytorch). The training procedure is implemented at training_loop.py (TF / Pytorch).

Data preparation

We explored the GANformer model on 4 datasets for images and scenes: CLEVR, LSUN-Bedrooms, Cityscapes and FFHQ. The model can be trained on other datasets as well. We trained the model at 256x256 resolution; higher resolutions are supported too. The model will automatically adapt to the resolution of the images in the dataset.

The prepare_data.py script (TF / Pytorch) can either prepare the datasets from our catalog or create new datasets.

Default Datasets

To prepare the datasets from the catalog, run the following command:

python prepare_data.py --ffhq --cityscapes --clevr --bedrooms --max-images 100000

See table below for details about the datasets in the catalog.

Useful options:

  • --data-dir the output data directory (default: datasets)
  • --shards-num to select the number of shards for the data (default: adapted to each dataset)
  • --max-images to store only a subset of the dataset, in order to reduce the size of the stored tfrecord/image files (default: max).
    This can be particularly useful to save space for large datasets, such as LSUN-Bedrooms (which originally contains 3M images)

Custom Datasets

You can also use the script to create new custom datasets. For instance:

python prepare_data.py --task <dataset-name> --images-dir <source-dir> --format png --ratio 0.7 --shards-num 5

The script supports several formats: png, jpg, npy, hdf5, tfds and lmdb.
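
If your source images are not already uniformly sized, a small preprocessing sketch like the following (using Pillow; not part of the repository, and the directory names are placeholders) can center-crop and resize them to a square resolution before running prepare_data.py:

# Hypothetical preprocessing step (not repo code): center-crop and resize a
# folder of images to a uniform square resolution before running prepare_data.py.
import os
from PIL import Image

SRC_DIR, DST_DIR, SIZE = "raw_images", "images_256", 256   # placeholder paths

os.makedirs(DST_DIR, exist_ok=True)
for name in os.listdir(SRC_DIR):
    if not name.lower().endswith((".png", ".jpg", ".jpeg")):
        continue
    img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB")
    w, h = img.size
    side = min(w, h)                                 # largest centered square
    left, top = (w - side) // 2, (h - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((SIZE, SIZE), Image.LANCZOS)
    img.save(os.path.join(DST_DIR, os.path.splitext(name)[0] + ".png"))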

Dataset Catalog

Dataset         # Images    Resolution   Download Size   TFrecords Size   Gamma
FFHQ            70,000      256×256      13GB            13GB             10
CLEVR           100,015     256×256      18GB            15.5GB           40
Cityscapes      24,998      256×256      1.8GB           8GB              20
LSUN-Bedrooms   3,033,042   256×256      42.8GB          Up to 480GB      100

Use --max-images to reduce the size of the tfrecord files.

Training

Models are trained by using the --train option. To fine-tune a pretrained GANformer model:

python run_network.py --train --gpus 0 --ganformer-default --expname clevr-pretrained --dataset clevr \
  --pretrained-pkl gdrive:clevr-snapshot.pkl

We provide pretrained models for bedrooms, cityscapes, clevr and ffhq.

To train a GANformer in its default configuration from scratch:

python run_network.py --train --gpus 0 --ganformer-default --expname clevr-scratch --dataset clevr --eval-images-num 10000

By default, model training is resumed from the latest snapshot. Use --restart to start a new experiment, or --pretrained-pkl to select a particular snapshot to load.

For comparing to the state of the art, we compute metric scores using 50,000 sample images. To expedite training, we recommend setting --eval-images-num to a lower number. Note that this can impact the precision of the metrics, so use the lower value during training and increase it back up for the final evaluation.

We support a large variety of command-line options to adjust the model, training, and evaluation. Run python run_network.py -h for the full list of options!

We recommend exploring different values for --gamma when training on new datasets. If you train at resolution >= 512 and observe OOM issues, consider reducing --batch-gpu to a lower value.

Logging

  • During training, sample images and attention maps will be generated and stored at results/<expname>-<run-id> (--keep-samples).
  • Metrics will also be regularly computed and reported in a metric-<name>.txt file (see the parsing sketch after this list). --metrics can be set to fid for FID, is for Inception Score and pr for Precision/Recall.
  • Tensorboard logs are also created (--summarize) that track the metrics, loss values for the generator and discriminator, and other useful statistics over the course of training.
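
As an illustration only, the sketch below pulls FID values out of a metric-fid.txt file. It assumes a StyleGAN2-style log line such as "network-snapshot-000123  time 5m 30s  fid50k 9.24"; the actual file format and the run directory name are assumptions here, so adjust the path and regex to your output.

# Illustrative parser for metric-<name>.txt files (line format assumed, see above).
import os
import re

def read_fid_log(path):
    scores = []
    with open(path) as f:
        for line in f:
            # Assumed line format: "network-snapshot-000123 ... fid50k 9.24"
            m = re.search(r"(network-snapshot-\d+).*?fid\S*\s+([0-9.]+)", line)
            if m:
                scores.append((m.group(1), float(m.group(2))))
    return scores

log = "results/clevr-exp-000/metric-fid.txt"    # placeholder run directory
if os.path.exists(log):
    for snapshot, fid in read_fid_log(log):
        print(f"{snapshot}: FID = {fid:.2f}")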

Baseline models

The codebase supports multiple baselines in addition to the GANformer. For instance, to run a vanilla GAN model:

python run_network.py --train --gpus 0 --baseline GAN --expname clevr-gan --dataset clevr 
  • Vanilla GAN: --baseline GAN, a standard GAN without style modulation.
  • StyleGAN2: --baseline StyleGAN2, with one global latent that modulates the image features.
  • k-GAN: --baseline kGAN, which generates multiple image layers independently and then merges them into one shared image (supported only in the TF version).
  • SAGAN: --baseline SAGAN, which performs self-attention between all image features in a low-resolution layer (e.g. 32x32) (supported only in the TF version).

Evaluation

To evaluate a model, use the --eval option:

python run_network.py --eval --gpus 0 --expname clevr-exp --dataset clevr

Add --pretrained-pkl gdrive:<dataset>-snapshot.pkl to evaluate a pretrained model.

Below we provide the FID-50k scores for the GANformer (using the pretrained checkpoints above) as well as baseline models.
Note that these scores are different from the scores reported in the StyleGAN2 paper, since they run experiments for up to 7x more training steps (5k-15k kimg steps in our experiments over all models, which takes about 3-4 days with 4 GPUs, vs 50-70k kimg steps in their experiments, which take over 90 GPU-days).

Note regarding Generator/Discriminator: Following ablation experiments, we observed that incorporating the simplex and duplex attention into the generator only (rather than into both the generator and discriminator) improves the model's performance. Accordingly, we are releasing pretrained models that incorporate attention in the generator only, and we have updated the paper to reflect that!

Model       CLEVR   LSUN-Bedroom   FFHQ    Cityscapes
GAN         25.02   12.16          13.18   11.57
kGAN        28.28   69.9           61.14   51.08
SAGAN       26.04   14.06          16.21   12.81
StyleGAN2   16.05   11.53          16.21   8.35
GANformer   9.24    6.15           7.42    5.23

Model Change-log

Compared to the original GANformer depicted in the paper, this repository makes several additional improvements that contributed to the performance:

  • Use --mapping_ltnt2ltnt so that the latents communicate with each other directly through self-attention inside the mapping network before starting to generate the image.
  • Add an additional global latent (--style) to the k latent components, such that first the global latent modulates all the image features uniformly, and then the k latents modulate different regions based on the bipartite transformer's attention.
    The global latent is useful for coordinating holistic aspects of the image such as global lighting conditions, global style properties for e.g. faces, etc.
  • After making these changes, we observed no additional benefit from adding the transformer to the discriminator, and therefore for simplicity we disabled that.

Visualization

The code supports producing qualitative results and visualizations. For instance, to create attention maps for each layer:

python run_network.py --gpus 0 --vis --expname clevr-exp --dataset clevr --vis-layer-maps

Below you can see sample images and attention maps produced by the GANformer:

Command-line Options

In the following we list some of the most useful model options.

Training

  • --gamma: We recommend exploring different values for the chosen dataset (default: 10)
  • --truncation-psi: Controls the image quality/diversity trade-off. (default: 0.7)
  • --eval-images-num: Number of images to compute metrics over. We recommend selecting a lower number to expedite training (default: 50,000)
  • --restart: To restart training from scratch instead of resuming from the latest snapshot
  • --pretrained-pkl: To load a pretrained model, either a local one or from drive gdrive:<dataset>-snapshot.pkl for the datasets in the catalog.
  • --data-dir and --result-dir: Directory names for the datasets (tfrecords) and logging/results.

Model (most useful)

  • --transformer: To add transformer layers to the generator (GANformer)
  • --components-num: Number of latent components, which will attend to the image. We recommend values in the range of 8-16 (default: 1)
  • --latent-size: Overall latent size (default: 512). The size of each latent component will then be latent_size/components_num (see the small shape example after this list)
  • --num-heads: Number of attention heads (default: 1)
  • --integration: Integration of information in the transformer layer, e.g. add or mul (default: mul)
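
For intuition about the shapes, here is a tiny sketch (an illustration only, not the repository's sampling code) of how the overall latent budget divides across components; the extra global latent is the --style vector mentioned above, and the result matches the latents_in shape (?, 17, 32) reported further down this page.

# Shape illustration only: --latent-size 512 with --components-num 16 gives
# 512 / 16 = 32 dimensions per component, plus one global (--style) latent.
import torch

batch_size, latent_size, components_num = 4, 512, 16
component_dim = latent_size // components_num                     # 32
z = torch.randn(batch_size, components_num + 1, component_dim)    # +1 global latent
print(z.shape)                                                    # torch.Size([4, 17, 32])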

Model (others)

  • --g-start-res and --g-end-res: Start and end resolution for the transformer layers (default: all layers up to resolution 2^8 = 256)
  • --kmeans: Track and update image-to-latents assignment centroids, used in the duplex attention
  • --mapping-ltnt2ltnt: Perform self-attention over latents in the mapping network
  • --use-pos: Use trainable positional encodings for the latents.
  • --style False: To turn-off one-vector global style modulation (StyleGAN2).

Visualization

  • Sample images
    • --vis-images: Generate image samples
    • --vis-latents: Save source latent vectors
  • Attention maps
    • --vis-maps: Visualize attention maps of the last layer and first head
    • --vis-layer-maps: Visualize attention maps of all layers and heads
    • --blending-alpha: Alpha weight when visualizing a blending of images and attention maps
  • Image interpolations
    • --vis-interpolations: Generate interpolations between pairs of source latents
    • --interpolation-density: Number of samples in between two end points of an interpolation (default: 8)
  • Others
    • --vis-noise-var: Create noise variation visualization
    • --vis-style-mix: Create style mixing visualization

Run python run_network.py -h for the full options list.

Sample images (more examples)




CUDA / Installation

The model relies on custom TensorFlow/Pytorch ops that are compiled on the fly using NVCC.

To set up the environment e.g. for cuda-10.0:

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

To test that your NVCC installation is working correctly, run:

nvcc test_nvcc.cu -o test_nvcc -run
| CPU says hello.
| GPU says hello.

In the pytorch version, if you get the following repeating message:
"Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation"
make sure your CUDA and PyTorch versions match. If you have multiple CUDA versions installed, consider setting CUDA_HOME to the matching one, e.g.

export CUDA_HOME=/usr/local/cuda-10.1

Architecture Overview

The GANformer consists of two networks:

Generator: produces the images (x) given randomly sampled latents (z). The latent z has a shape [batch_size, component_num, latent_dim], where component_num = 1 by default (Vanilla GAN, StyleGAN) but is > 1 for the GANformer model. We define the latent components by splitting z along the second dimension to obtain the z_1,...,z_k latent components. The generator likewise consists of two parts:

  • Mapping network: converts the sampled latents from a normal distribution (z) to the intermediate space (w) through a series of feed-forward layers. The k latent components are either mapped independently from the z space to the w space or interact with each other through self-attention (optional flag).
  • Synthesis network: the intermediate latents w are used to guide the generation of new images. Image features begin from a small constant/sampled 4x4 grid, and then go through multiple layers of convolution and up-sampling until reaching the desired resolution (e.g. 256x256). After each convolution, the image features are modulated (meaning that their variance and bias are controlled) by the intermediate latent vectors w. While in the StyleGAN model there is one global w vector that controls all the features equally, the GANformer uses attention so that the k latent components specialize to control different regions in the image and create it cooperatively, and therefore performs better especially in generating images depicting multi-object scenes.
  • Attention can be used in several ways (see the minimal sketch after this list):
    • Simplex Attention: when attention is applied in one direction only from the latents to the image features (top-down).
    • Duplex Attention: when attention is applied in the two directions: latents to image features (top-down) and then image features back to latents (bottom-up), so that each representation informs the other iteratively.
    • Self-Attention between latents: can also be used to enable direct interactions between the latents.
    • Self-Attention between image features (SAGAN model): prior approaches used attention directly between the image features, but this does not scale well since the attention cost is quadratic in the number of features, which becomes very high at high resolutions.
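
To make the bipartite mechanism concrete, the sketch below shows minimal simplex and duplex updates between k latents and a flattened grid of image features in PyTorch. It is an illustration of the idea only, under simplifying assumptions (single attention head, no k-means centroids, identity mappings in place of the learned gamma/beta projections); it is not the repository's network.py implementation.

# Minimal illustration of bipartite attention (simplex / duplex); NOT the
# repository's implementation (single head, no k-means, simplified gating).
import torch
import torch.nn.functional as F

def attend(queries, keys, values):
    # Scaled dot-product attention: queries [B, Nq, D], keys/values [B, Nk, D].
    scores = queries @ keys.transpose(1, 2) / (queries.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ values

def simplex_step(latents, feats):
    # Top-down only: image features (as queries) gather information from the latents,
    # which then multiplicatively modulates the normalized features.
    update = attend(feats, latents, latents)       # [B, HW, D]
    gamma, beta = update, update                   # learned projections in the real model
    return gamma * F.layer_norm(feats, feats.shape[-1:]) + beta, latents

def duplex_step(latents, feats):
    # Bottom-up then top-down: latents first read from the image features,
    # then modulate them, so each representation informs the other.
    latents = F.layer_norm(latents + attend(latents, feats, feats), latents.shape[-1:])
    return simplex_step(latents, feats)

B, k, HW, D = 2, 16, 32 * 32, 32
latents = torch.randn(B, k, D)        # k latent components
feats = torch.randn(B, HW, D)         # flattened image feature grid
feats, latents = duplex_step(latents, feats)
print(feats.shape, latents.shape)     # torch.Size([2, 1024, 32]) torch.Size([2, 16, 32])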

Discriminator: Receives an image and has to predict whether it is real or fake, i.e. originating from the dataset or the generator. The model performs multiple layers of convolution and downsampling on the image, gradually reducing the representation's resolution until making a final prediction. Optionally, attention can be incorporated into the discriminator as well, where it has multiple (k) aggregator variables that use attention to adaptively collect information from the image while it is being processed. We observe small improvements in model performance when attention is used in the discriminator, although note that, based on our observations, most of the gain from using attention arises in the generator.

Codebase

This codebase builds on top of and extends the great StyleGAN2 and StyleGAN2-ADA repositories by Karras et al.

The GANformer model can also be seen as a generalization of StyleGAN: while StyleGAN has one global latent vector that controls the style of all image features globally, the GANformer has k latent vectors that cooperate through attention to control regions within the image, thereby better modeling images of multi-object and compositional scenes.

Acknowledgement

I wish to thank Christopher D. Manning for the fruitful discussions and constructive feedback in developing the Bipartite Transformer, especially when explored within the language representation area, as well as for providing the kind financial support that allowed this work to happen! 🌻

If you have questions, comments or feedback, please feel free to contact me at [email protected]. Thank you! :)

gansformer's People

Contributors

andy666fox, dorarad


gansformer's Issues

Ganformer2

Thanks for your brilliant work on ganformer and ganformer2! May I ask if there is a rough timeline for when the ganformer2 model will be released? Thanks for your time!

PyTorch implementation generates same image samples

Hi, I'm getting the same output image samples (see below) when I train the PyTorch implementation on FFHQ from scratch. The only changes I made (due to some memory issues mentioned in #33) were adding --batch-gpu 1 and removing saving attention map functionality (commenting out pytorch_version/training/visualize.py lines 167-206).

python run_network.py --train --gpus 0 --batch-gpu 1 --ganformer-default --expname ffhq-scratch --dataset ffhq
(attached sample images: 000120, 000240)

kernel error in generate.py

In a Python 3.7, tensorflow-gpu 1.15.0, CUDA 10.0 and cuDNN 7.5 environment, I get this error in generate.py (which appeared to require cuDNN 7.6.5, which brings a different error, see the second part). Any advice?

... Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file

...........
Total 35894608

Generate images...
0%| | 0/8 [00:01<?, ?image (1 batches of 8 images)/s]
Traceback (most recent call last):
File "/vulcanscratch/yaser/miniconda3/envs/yygentransformer/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/vulcanscratch/yaser/miniconda3/envs/yygentransformer/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/vulcanscratch/yaser/miniconda3/envs/yygentransformer/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'FusedBiasAct' used by {{node Gs/_Run/Gs/G_mapping/AttLayer_0/FusedBiasAct}}with these attrs: [gain=1, T=DT_FLOAT, axis=1, alpha=0, grad=0, act=1]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
device='GPU'; T in [DT_HALF]
device='GPU'; T in [DT_FLOAT]

     [[Gs/_Run/Gs/G_mapping/AttLayer_0/FusedBiasAct]]

CUDNN7.6.5 error
....
Total 35894608

Generate images...
0%| | 0/8 [00:01<?, ?image (1 batches of 8 images)/s]
Traceback (most recent call last):
File "/vulcanscratch/yaser/miniconda3/envs/yygentransformer/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/vulcanscratch/yaser/miniconda3/envs/yygentransformer/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/vulcanscratch/yaser/miniconda3/envs/yygentransformer/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: cudaErrorNoKernelImageForDevice
[[{{node Gs/_Run/Gs/G_mapping/global/Dense0_0/FusedBiasAct}}]]
[[Gs/_Run/Gs/maps_out/_3151]]
(1) Internal: cudaErrorNoKernelImageForDevice
[[{{node Gs/_Run/Gs/G_mapping/global/Dense0_0/FusedBiasAct}}]]
0 successful operations.
0 derived errors ignored.

`maps_in` in the graph?

Hi,

I'm trying to extract some intermediate values from the model. The graph requires 3 inputs, as listed by Gs.list_layers() and Gs.input_names:

('latents_in', <tf.Tensor 'Gs/latents_in:0' shape=(?, 17, 32) dtype=float32>, [])
('labels_in', <tf.Tensor 'Gs/labels_in:0' shape=(?, 0) dtype=float32>, [])
('maps_in', <tf.Tensor 'Gs/maps_in:0' shape=(?, 16, 256, 256) dtype=float32>, [])

latents_in is the sampled z vector, and it appears labels_in is not required (the last dimension is 0). However, I'm not able to understand what maps_in is. How can I get maps_in?

FID VQ-GAN

Thank you for open-sourcing your code :)

I was wondering about the generally very high FID values for the VQGAN. In the VQGAN paper, they report on, e.g., FFHQ 256x256 an FID of 11.4, whereas you report 63.1... Any idea why they are so different?

Thanks!

Training won't work, needs tensorflow.contrib which was removed in TF 2.x

When running: python3 run_network.py --train --ganformer-default --expname test --dataset plant --eval-images-num 10000
The following error appears:

I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-11 14:56:30.661744: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2022-10-11 14:56:30.690985: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-11 14:56:31.202500: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
2022-10-11 14:56:31.202557: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
2022-10-11 14:56:31.202565: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
File "/home/ali/gansformer/run_network.py", line 15, in
import pretrained_networks
File "/home/ali/gansformer/pretrained_networks.py", line 4, in
import dnnlib.tflib as tflib
File "/home/ali/gansformer/dnnlib/tflib/init.py", line 1, in
from . import autosummary
File "/home/ali/gansformer/dnnlib/tflib/autosummary.py", line 23, in
from . import tfutil
File "/home/ali/gansformer/dnnlib/tflib/tfutil.py", line 9, in
import tensorflow.contrib # requires TensorFlow 1.x!
ModuleNotFoundError: No module named 'tensorflow.contrib'

AssertionError with prepare_data.py

When using prepare_data.py to create a custom dataset for training, I keep encountering an AssertionError. I've looked at the code, but I'm not sure what exactly is causing this shape mismatch. I've also tried different environments to hopefully rule out anything related to that aspect.

Environments:

  • Official Tensorflow Docker: tensorflow/tensorflow:1.14.0-gpu-py3
  • Official Tensorflow Docker: tensorflow/tensorflow:1.15.5-gpu-py3-jupyter
  • Anaconda: CUDA10, Python 3.7.11, cudnn 7.6.5, TF GPU 1.14

Data:

  • COVIDx: 194,922 PNG images all preprocessed to 512x512

Command:
python prepare_data.py --task covidx --images-dir /data/2A_images --format png --ratio 0.7 --shards-num 20 --max-images 194922

Error:

Preparing the covidx dataset...
Loading images from /data/2A_images
  8%|██▋                               | 15340/194922 [09:52<1:55:31, 25.91it/s]
Traceback (most recent call last):
  File "prepare_data.py", line 217, in <module>
    run_cmdline(sys.argv)
  File "prepare_data.py", line 214, in run_cmdline
    prepare(**vars(args))
  File "prepare_data.py", line 185, in prepare
    shards_num = shards_num, max_imgs = max_images)
  File "prepare_data.py", line 78, in <lambda>
    "png": lambda tfdir, imgdir, **kwargs: dataset_tool.create_from_imgs(tfdir, imgdir, format = "png", **kwargs),
  File "/home/dev/gansformer/dataset_tool.py", line 696, in create_from_imgs
    tfr.add_img(img)
  File "/home/dev/gansformer/dataset_tool.py", line 84, in add_img
    assert img.shape == self.shape
AssertionError

The process always fails at exactly 15340. I've removed image 15340 (repeatedly down the line), but the error keeps happening.
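
A generic diagnostic sketch (independent of the repository code) is to scan the source directory and report any image whose decoded shape differs from the first one; if the expected shape is taken from the first image, as in the StyleGAN2-style dataset tool this repo builds on, a grayscale or RGBA PNG mixed in with RGB images could trigger exactly this kind of assertion.

# Generic diagnostic (not repo code): flag images whose decoded shape differs
# from the first image in the folder, e.g. grayscale or RGBA PNGs among RGB ones.
import os
import numpy as np
from PIL import Image

IMG_DIR = "/data/2A_images"     # path from the report above

ref_shape = None
for name in sorted(os.listdir(IMG_DIR)):
    if not name.lower().endswith(".png"):
        continue
    arr = np.asarray(Image.open(os.path.join(IMG_DIR, name)))
    if ref_shape is None:
        ref_shape = arr.shape
        print("reference shape:", ref_shape)
    elif arr.shape != ref_shape:
        print(f"{name}: shape {arr.shape} differs from {ref_shape}")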

Data Generation

Hi,
I have an image classification dataset, can I use this gansformer to generate synthetic data?

Custom Dataset Error

Hi!
Thank you for the amazing work!

I encountered a problem when preparing a custom dataset from a custom directory.

  File "prepare_data.py", line 211, in <module>
    run_cmdline(sys.argv)
  File "prepare_data.py", line 208, in run_cmdline
    prepare(**vars(args))
  File "prepare_data.py", line 156, in prepare
    print(misc.bold("Preparing the {} dataset...".format(c.name)))
AttributeError: 'dict' object has no attribute 'name'

I've followed the instruction in using the following commands, replacing the default values with my own values:

python prepare_data.py --task <dataset-name> --images-dir <source-dir> --format png --ratio 0.7 --shards-num 5

However, it seems that the dictionary passed by specifying a custom task name has no name attribute and lacks some other keys as well.

Any tips will be helpful!

Cannot utilize multiple CPU cores

Hi-

Thank you for making such a fascinating project available here!

I'm trying to run ganformer within a conda environment, but am having problems getting ganformer to utilize multiple CPU cores.

Using Ubuntu 20.04. Here is the setup for the conda environment used:

conda create --name cuda10 python=3.7
conda activate cuda10
conda install tensorflow-gpu=1.14
conda install pillow h5py requests tqdm termcolor seaborn
pip install opencv-python lmdb gdown easydict

To run it

python gansformer/run_network.py --train --pretrained-pkl None --gpus 0,1 --ganformer-default --expname myDS_256 --dataset myDS --data-dir /data/myDS_256_tf --keep-samples --metrics none --result-dir training_runs/256_c1/ --num-threads 24 --minibatch-size 16

Everything seems to be running correctly; there are no errors or crashes. The only problems are slow training initialization and low GPU utilization during training. System Monitor shows that only one CPU core is used at a time, so I'm guessing this is the cause of both issues. Do you have any idea what might be causing the restriction to a single CPU core?

I always try to avoid raising an issue when something obvious might be wrong on my end, but this is my first time using conda so it might be that I'm simply using it incorrectly, or that I'm using your program incorrectly. I appreciate your patience if that is the case.

Thank you for your attention to this issue!

Two typos in pytorch_version/training/loss.py

                loss_D_real = 0
                if D_main:
                    if self.d_loss == "logistic":
                        loss_D_real = torch.nn.functional.softplus(-real_logits) # -log(sigmoid(real_logits))
                    elif self.d_loss == "hinge":
                        loss_D_real = torch.clamp(1.0 - real_logits, min = 0)
                    elif self.d_loss == "wgan":
                        loss_D_real = -real_logits + tf.square(real_logits) * wgan_epsilon

                    training_stats.report("Loss/D/loss", loss_D_gen + loss_D_real)

In line 142 of loss.py, it should be loss_D_real = -real_logits + torch.square(real_logits) * self.wgan_epsilon instead.

CUDA_ERROR_OUT_OF_MEMORY

I always get out of memory errors even when using all defaults and training low resolution.
8 * V100 16GB

Data generation

Hi,

I have instance segmentation dataset for street view. Can I use this project to synthesize novel data using the annotated data?

Regards,
Rakesh

The explicit form of duplex attention

According to the pinned issue, the explicit form of duplex attention is:

K = Attention(K, X, X) or LayerNorm(K + Attention(K, X, X))
X = gamma(Attention(X, K, V)) * w(X) + beta(Attention(X, K, V))
where Y = (K, V).

Am I right?

Seems dlatents_in variable in G_synthesis are not updated correctly?

Hi, Thank you for sharing the code!

I am reading the code in network.py recently. It seems the dlatents_in variables for different layers with different resolutions are not updated correctly.

In line 1199, it shows the dlatents_in is a [Bs, latents_num, num_layers, dlatent_size] tensor, where latents_num = k local region latent component + 1 global latent component.

dlatents_in.set_shape([None, latents_num, num_layers, dlatent_size])

In line 1420~1422, the dlatents in the scale of "4x4" is updated.

gansformer/training/network.py

Lines 1420 to 1422 in 556bbdd

with tf.variable_scope("Conv"):
x, dlatents, att_map, att_vars = layer(x, dlatents, layer_idx = 0, dim = nf(1),
kernel = 3, att_vars = att_vars)

Since the dlatents is initialized as None at the beginning (line 1394 ), the global and local latent codes are extracted by using the layer_idx in the layer function. (See line 1256, and line 1260~1261)

imgs_out, dlatents, att_maps = None, None, []

dlatent_global = get_global(dlatents_in, res)[:, layer_idx + 1]

gansformer/training/network.py

Lines 1260 to 1261 in 556bbdd

if dlatents is None:
dlatents = dlatents_in[:, :-1, layer_idx + 1]

However, for resolution 8x8, the updated dlatents variable from 4x4 is fed into the block, and then into two layer structures. Since dlatents is not None, the k local region latent codes will not be extracted using the layer_idx for the 8x8 layers. It seems the updated dlatents variable from the "4x4" layer will be consistently injected and updated by the stacked block structures, so for the other scales of layers, the remaining dlatents variables in the dlatents_in tensor will never be extracted or updated.

gansformer/training/network.py

Lines 1429 to 1433 in 556bbdd

for res in range(3, resolution_log2 + 1):
with tf.variable_scope("%dx%d" % (2**res, 2**res)):
# Generator block: transformer, convolution and upsampling
x, dlatents, _att_maps, att_vars = block(x, res, dlatents, dim = nf(res-1), att_vars = att_vars)
att_maps += _att_maps

gansformer/training/network.py

Lines 1337 to 1345 in 556bbdd

def block(x, res, dlatents, dim, att_vars, up = True): # res = 3..resolution_log2
t = x
with tf.variable_scope("Conv0_up"):
x, dlatents, att_map1, att_vars = layer(x, dlatents, layer_idx = res*2-5,
dim = dim, kernel = 3, up = up, att_vars = att_vars)
with tf.variable_scope("Conv1"):
x, dlatents, att_map2, att_vars = layer(x, dlatents, layer_idx = res*2-4,
dim = dim, kernel = 3, att_vars = att_vars)

Is that a bug or is that a special setting in your model? Sorry for writing such a long question.

Thanks a lot! :)

Memory issue when training 1024 resolution

I'm trying to train a 1024x1024 database on a V100 GPU.
I tried both the tensorflow version and the pytorch version.
Despite setting batch-gpu to 1, the tensorflow version always runs out of system RAM (after the first tick; 51 GB total system RAM), and the pytorch version always runs out of CUDA memory (before the first tick).

Here are my training settings:

python run_network.py --train --metrics 'none' --gpus 0 --batch-gpu 1 --resolution 1024 \
 --ganformer-default --expname art1 --dataset 1024art

Also, I always encounter the warning:
tcmalloc: large alloc

CLEVR pretrained model gives FID 22

Hi, kudos for great work!

I've just noticed that with the recommended preprocessing and evaluation, the metrics on gdrive:cityscapes work as expected (FID ~5.2), while for CLEVR these exact same two lines:

python prepare_data.py --clevr --max-images 100000
python run_network.py --eval --gpus 0 --expname clevr-exp --dataset clevr --pretrained-pkl gdrive:clevr-snapshot.pkl

give ~22 FID, not 9.2. Can you please double-check that the provided snapshot is correct? Or am I missing something here?

Thanks in advance!

KMeans for Duplex

Thank you again for sharing the code!
If I understand correctly, when we use KMeans, k(Y) is deprecated. Another set of latent codes K is exploited to explicitly form spatially aware attention between the image X and the latent Y. I have some questions with regard to the intuition as well as the implementation:

  1. In Sec 3.1.2. A notation of K = a(Y, X) is used. I am super confused about this line.

However, from line https://github.com/dorarad/gansformer/blob/main/training/network.py#L688. The Y is not used to compute centroid. Actually, if I understand correctly, the centroid is computed by K and the HxW image features.

  2. A very detailed question: why do you concat _queries and queries - _queries in this line?

    from_elements = tf.concat([_queries, queries - _queries], axis = -1)

  3. It seems positional encoding for the latents is never used in the k-means + simplex setting? The gradient only flows to g_mapping.

Thank you again and looking forward to your reply!

data resolution

Hi, thanks for your repo. Does it support rectangular images for training, for example a 1024x512 image?

Setting up TensorFlow plugin 'fused_bias_act.cu': Loading... Failed!

Hi Drew, I'm getting the following error both when I train a GANformer model on the clevr dataset from scratch and when I fine-tune a pretrained model. I didn't have this issue before the repo was updated with the PyTorch implementation. I've also tried this and this without luck. Do you have any ideas?

Environment:
Python 3.6.13
tensorflow-gpu 1.14.0
CUDA 9.1
cudnn 7

Start model training from scratch
Local submit - run_dir: results/clevr-scratch-000
dnnlib: Running training.training_loop.training_loop() on localhost...
Streaming data using training.dataset.TFRecordDataset datasets...
Dataset shape:  [3, 256, 256]
Dynamic range:  [0, 255]
Constructing networks...
Setting up TensorFlow plugin 'fused_bias_act.cu': Loading... Failed!
Traceback (most recent call last):
  File "run_network.py", line 556, in <module>
    main()
  File "run_network.py", line 553, in main
    run(**vars(args))
  File "run_network.py", line 368, in run
    dnnlib.submit_run(**kwargs)
  File "/datadrive/kwhuang/gansformer/dnnlib/submission/submit.py", line 346, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/datadrive/kwhuang/gansformer/dnnlib/submission/internal/local.py", line 16, in submit
    return run_wrapper(submit_config)
  File "/datadrive/kwhuang/gansformer/dnnlib/submission/submit.py", line 254, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/datadrive/kwhuang/gansformer/training/training_loop.py", line 194, in training_loop
    label_size = dataset.label_size, **cG.args)
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/network.py", line 100, in __init__
    self._init_graph()
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/network.py", line 159, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 868, in Generator
    components.synthesis = tflib.Network("G_synthesis", func_name = globals()[synthesis_func], **kwargs)
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/network.py", line 100, in __init__
    self._init_graph()
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/network.py", line 159, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 1423, in G_synthesis
    kernel = 3, att_vars = att_vars)
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 1267, in layer
    resample_kernel = resample_kernel, fused_modconv = _fused_modconv, modulate = style, noconv = noconv)
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 390, in modulated_conv2d_layer
    s = dense_layer(y, dim = get_shape(x)[1], weight_var = mod_weight_var, bias_var = mod_bias_var) + 1 # [BI]
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 77, in dense_layer
    x = apply_bias_act(x, act, lrmul, bias_var, name)
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 85, in apply_bias_act
    return fused_bias_act(x, b = b, act = act)
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/ops/fused_bias_act.py", line 62, in fused_bias_act
    return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/ops/fused_bias_act.py", line 116, in _fused_bias_act_cuda
    cuda_kernel = _get_plugin().fused_bias_act
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/ops/fused_bias_act.py", line 10, in _get_plugin
    return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/custom_ops.py", line 156, in get_plugin
    plugin = tf.load_op_library(bin_file)
  File "/anaconda/envs/gansformer/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /datadrive/kwhuang/gansformer/dnnlib/tflib/_cudacache/fused_bias_act_1.14_.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Pytorch version

Hi,

This work is great!
I have a question that does the PyTorch version has full features (training, inference on all datasets) like the Tensorflow version?

Thanks!

Generation of attention maps problem

Hey, I've been trying to generate attention maps, but I've been stuck with some errors.

Here is the way I used your codebase after cloning it :

  • Download cityscapes dataset : 'python prepare_data.py --cityscapes --max-images 100000'
  • Download pre-trained network for this dataset (and generating some images at the same time) : 'python generate.py --gpus 0 --model gdrive:cityscapes-snapshot.pkl --output-dir images --images-num 32'

Then I wanted to generate some attention maps : 'python run_network.py --gpus 0 --vis --pretrained-pkl cityscapes-snapshot.pkl --dataset cityscapes --vis-layer-maps'

At first, the code executed correctly, but no file was saved; there was only an empty 'layer_maps' folder.
When forcing the parameter 'attention' to be True (visualize.py, l. 72), I got this error:

visualize.py", line 170, in vis
pallete = np.expand_dims(misc.get_colors(k - 1), axis = [2, 3])
numpy.AxisError: axis 3 is out of bounds for array of dimension 3

As a beginner, I don't know whether I'm using the codebase correctly or not.

About the layer ordering in the figure

Hi,

Thank you for your great work.

I just want to mention that the order of layers in the code seems different from the figure that you shared (figure attached).
Shouldn't there be a modulated convolutional layer before and after the attention layer?
Based on the code, a synthesizer block consists of conv0_up layer + attention + noise_adding + conv1 + attention + noise_adding + ResNet conv.

So why do you say there is (up-sampling x 2) and 2 x ResNet convolution after the attention layer in that figure?

Issues with docker

Hi,

I'm trying to dockerize using this image - tensorflow/tensorflow:1.14.0-gpu-py3.

FROM tensorflow/tensorflow:1.14.0-gpu-py3

ARG USER="test"
ARG WORK_DIR="/home/$USER"

WORKDIR $WORK_DIR

RUN apt-get update && apt-get install build-essential

RUN apt-get install ffmpeg libsm6 libxext6  -y

RUN pip install --upgrade pip setuptools wheel

COPY . ./

RUN pip install -r requirements.txt

RUN python generate.py --gpus 0 --model gdrive:bedrooms-snapshot.pkl --output-dir images --images-num 4

However, I am getting this error:

Downloading https://drive.google.com/uc?id=1-2L3iCBpP_cf6T2onf3zEQJFAAzxsQne .... done

2021-04-06 08:32:44 UTC -- Setting up TensorFlow plugin 'upfirdn_2d.cu': Preprocessing... Compiling... Loading... bin_file:  /home/test/dnnlib/tflib/_cudacache/upfirdn_2d_1.14_.so

2021-04-06 08:32:44 UTC -- Failed!

2021-04-06 08:32:44 UTC -- Traceback (most recent call last):

2021-04-06 08:32:44 UTC --   File "generate.py", line 49, in <module>

2021-04-06 08:32:44 UTC --     main()

2021-04-06 08:32:44 UTC --   File "generate.py", line 46, in main

2021-04-06 08:32:44 UTC --     run(**vars(args))

2021-04-06 08:32:44 UTC --   File "generate.py", line 22, in run

2021-04-06 08:32:44 UTC --     G, D, Gs = load_networks(model)                             # Load pre-trained network

2021-04-06 08:32:44 UTC --   File "/home/test/pretrained_networks.py", line 30, in load_networks

2021-04-06 08:32:44 UTC --     G, D, Gs = pickle.load(stream, encoding = "latin1")[:3]

2021-04-06 08:32:44 UTC --   File "/home/test/dnnlib/tflib/network.py", line 306, in __setstate__

2021-04-06 08:32:44 UTC --     self._init_graph()

2021-04-06 08:32:44 UTC --   File "/home/test/dnnlib/tflib/network.py", line 159, in _init_graph

2021-04-06 08:32:44 UTC --     out_expr = self._build_func(*self.input_templates, **build_kwargs)

2021-04-06 08:32:44 UTC --   File "<string>", line 2371, in G_synthesis_stylegan2

2021-04-06 08:32:44 UTC --   File "/home/test/dnnlib/tflib/ops/upfirdn_2d.py", line 229, in downsample_2d

2021-04-06 08:32:44 UTC --     return _simple_upfirdn_2d(x, k, down=factor, pad0=(p+1)//2, pad1=p//2, data_format=data_format, impl=impl)

2021-04-06 08:32:44 UTC --   File "/home/test/dnnlib/tflib/ops/upfirdn_2d.py", line 358, in _simple_upfirdn_2d

2021-04-06 08:32:44 UTC --     y = upfirdn_2d(y, k, upx=up, upy=up, downx=down, downy=down, padx0=pad0, padx1=pad1, pady0=pad0, pady1=pad1, impl=impl)

2021-04-06 08:32:44 UTC --   File "/home/test/dnnlib/tflib/ops/upfirdn_2d.py", line 61, in upfirdn_2d

2021-04-06 08:32:44 UTC --     return impl_dict[impl](x=x, k=k, upx=upx, upy=upy, downx=downx, downy=downy, padx0=padx0, padx1=padx1, pady0=pady0, pady1=pady1)

2021-04-06 08:32:44 UTC --   File "/home/test/dnnlib/tflib/ops/upfirdn_2d.py", line 139, in _upfirdn_2d_cuda

2021-04-06 08:32:44 UTC --     return func(x)

2021-04-06 08:32:44 UTC --   File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/custom_gradient.py", line 162, in decorated

2021-04-06 08:32:44 UTC --     return _graph_mode_decorator(f, *args, **kwargs)

2021-04-06 08:32:44 UTC --   File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/custom_gradient.py", line 183, in _graph_mode_decorator

2021-04-06 08:32:44 UTC --     result, grad_fn = f(*args)

2021-04-06 08:32:44 UTC --   File "/home/test/dnnlib/tflib/ops/upfirdn_2d.py", line 131, in func

2021-04-06 08:32:44 UTC --     y = _get_plugin().up_fir_dn2d(x=x, k=kc, upx=upx, upy=upy, downx=downx, downy=downy, padx0=padx0, padx1=padx1, pady0=pady0, pady1=pady1)

2021-04-06 08:32:44 UTC --   File "/home/test/dnnlib/tflib/ops/upfirdn_2d.py", line 14, in _get_plugin

2021-04-06 08:32:44 UTC --     return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')

2021-04-06 08:32:44 UTC --   File "/home/test/dnnlib/tflib/custom_ops.py", line 162, in get_plugin

2021-04-06 08:32:44 UTC --     plugin = tf.load_op_library(bin_file)

2021-04-06 08:32:44 UTC --   File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library

2021-04-06 08:32:44 UTC --     lib_handle = py_tf.TF_LoadLibrary(library_filename)

2021-04-06 08:32:44 UTC -- tensorflow.python.framework.errors_impl.NotFoundError: /home/test/dnnlib/tflib/_cudacache/upfirdn_2d_1.14_.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

2021-04-06 08:32:44 UTC -- error building image: error building stage: failed to execute command: waiting for process to exit: exit status 1

Please help to check and advise. Thanks!

Can I use FFHQ 1024 pre-trained model with PyTorch?

I executed pytorch_version/loader.py with ffhq-snapshot.pkl and it worked well.

However, it didn't work with ffhq-snapshot-1024.pkl.

How can I resolve this issue?

Error messages are below.

Loading ffhq-snapshot-1024.pkl...
synthesis.b1024.conv_last.weight [32, 32, 3, 3]
Traceback (most recent call last):
  File "loader.py", line 324, in <module>
    convert_network_pickle()
  File "/home/ubuntu/anaconda3/envs/pytorch1.7.1_p37/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch1.7.1_p37/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda3/envs/pytorch1.7.1_p37/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda3/envs/pytorch1.7.1_p37/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "loader.py", line 317, in convert_network_pickle
    data = load_network_pkl(f)
  File "loader.py", line 39, in load_network_pkl
    G = convert_tf_generator(tf_G)
  File "loader.py", line 233, in convert_tf_generator
    r".*\.grid_pos",                                    None,
  File "loader.py", line 83, in _populate_module_params
    assert found
AssertionError

Thank you in advance.

Pretrained Stylegan v2 Model on clevr and cityscapes dataset

Hi Drew,
I tried reaching out via email.
Is it possible to release the StyleGAN v2 pre-trained weights for the clevr and cityscapes datasets?
You use these models as baselines for the GANsformer model, but I had trouble replicating the StyleGAN v2 FID scores on these datasets.

Thanks,
Krish

Do we still need bidirectional interaction attention?

Hi Drew, Thank you a lot for sharing the code!

I notice that all checkpoints you released don't have AttLayer_n2l. So they are simplex + kmeans models?
I also notice that the numbers reported on GitHub are already a bit better than the paper. Does it mean we can safely ignore bidirectional attention for now? Does it further help and boost the Cityscapes FID beyond 5.23?

Conditional training

Hi, I want to give a condition when training a model, but I don't know where to start.

How can I solve this issue?

question on duplex attention (k means) code

First, thank you for this amazing work!

I am suspecting that an indentation is missing at the following position of the code:

# Compute attention scores based on dot products between

The reason it raises my suspicion is that, if the code is executed as it is, it seems like the actual key values (to_tensor) are never involved in the computation of the attention scores when k-means is enabled. If I am mistaken, would you mind explaining why line 787 replaces the original attention scores with the values computed here (where the embedding "to_centroids" seems to be initialized as a mapping of the queries)?

Metrics PR Error

Dear authors,

Thank you for your wonderful contribution!!!

When I tried to get precision and recall values during training by adding the option --metric pr, I got the following error:


\precision_recall.py", line 179, in _evaluate
feats = self._gen_feats(Gs, inception, minibatch_size, num_gpus, Gs_kwargs)
NameError: name 'inception' is not defined

So, I have changed the lines in precision_recall.py. After the modification, it seems to work.
I would greatly appreciate it if you could kindly review my modification.


def _evaluate(self, Gs, Gs_kwargs, num_gpus, num_imgs, paths = None, **kwargs):

       if paths is not None: 
           # Extract features for local sample image files (paths)
----->  eval_features = self._paths_to_feats(paths, feat_func, minibatch_size, num_gpus, num_imgs)
       else:
           # Extract features for newly generated fake imgs
----->  eval_features = self._gen_feats(Gs, feature_net, minibatch_size, num_imgs, num_gpus, Gs_kwargs)

       # Compute precision and recall
       state = knn_precision_recall_features(ref_features = ref_features, eval_features = eval_features,
           feature_net = feature_net, nhood_sizes = [self.nhood_size], row_batch_size = self.row_batch_size,
----->  col_batch_size = self.row_batch_size, num_gpus = num_gpus, num_imgs = num_imgs)
       self._report_result(state.knn_precision[0], suffix = "_precision")
       self._report_result(state.knn_recall[0], suffix = "_recall")

-------------------------------------------------------------------------

Ganformer2

Hi,
Are you planning to release ganformer2 anytime soon? By the way, the article is very interesting and I would love to test it!

Errors when running generate.py

Hi,

Thank you very much for your interesting work and for making your code public.

I ran generate.py but it resulted in errors related to dnnlib as shown below. I am using TF 1.14, cuda 10.0, cudnn 7.6.1, gcc 7.5.0.
Here are the things I have tried:

  1. I looked at your response to issue #5 and verified that nvcc test_nvcc.cu generates the expected response (Hello CPU and Hello GPU)
  2. Also I ran the fused_bias_act cmd you have mentioned in issue #5 and it completes without errors and generates several fused_bias_act* files.
  3. I have also run the original TF stylegan2 code (training and generation) on the same machine with the same version of TF and cudnn without any issues.

I would appreciate any help you can provide that can help fix the errors below.

Thank you

python generate.py --gpus 0 --model gdrive:bedrooms-snapshot.pkl --output-dir images --images-num 32

Loading networks...
Setting up TensorFlow plugin 'upfirdn_2d.cu': Preprocessing... Compiling... Loading... Failed!
Traceback (most recent call last):
File "generate.py", line 49, in
main()
File "generate.py", line 46, in main
run(**vars(args))
File "generate.py", line 22, in run
G, D, Gs = load_networks(model) # Load pre-trained network
File "/external_code/gan/gansformer/pretrained_networks.py", line 30, in load_networks
G, D, Gs = pickle.load(stream, encoding = "latin1")[:3]
File "/external_code/gan/gansformer/dnnlib/tflib/network.py", line 306, in setstate
self._init_graph()
File "/external_code/gan/gansformer/dnnlib/tflib/network.py", line 159, in _init_graph
out_expr = self._build_func(*self.input_templates, **build_kwargs)
File "", line 2371, in G_synthesis_stylegan2
File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 229, in downsample_2d
return _simple_upfirdn_2d(x, k, down=factor, pad0=(p+1)//2, pad1=p//2, data_format=data_format, impl=impl)
File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 358, in _simple_upfirdn_2d
y = upfirdn_2d(y, k, upx=up, upy=up, downx=down, downy=down, padx0=pad0, padx1=pad1, pady0=pad0, pady1=pad1, impl=impl)
File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 61, in upfirdn_2d
return impl_dict[impl](x=x, k=k, upx=upx, upy=upy, downx=downx, downy=downy, padx0=padx0, padx1=padx1, pady0=pady0, pady1=pady1)
File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 139, in _upfirdn_2d_cuda
return func(x)
File "/platforms/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 162, in decorated
return _graph_mode_decorator(f, *args, **kwargs)
File "/platforms/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 183, in _graph_mode_decorator
result, grad_fn = f(*args)
File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 131, in func
y = _get_plugin().up_fir_dn2d(x=x, k=kc, upx=upx, upy=upy, downx=downx, downy=downy, padx0=padx0, padx1=padx1, pady0=pady0, pady1=pady1)
File "/external_code/gan/gansformer/dnnlib/tflib/ops/upfirdn_2d.py", line 14, in _get_plugin
return custom_ops.get_plugin(os.path.splitext(file)[0] + '.cu')
File "/external_code/gan/gansformer/dnnlib/tflib/custom_ops.py", line 156, in get_plugin
plugin = tf.load_op_library(bin_file)
File "/platforms/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /external_code/gan/gansformer/dnnlib/tflib/cudacache/upfirdn_2d_1.14.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

About the Duplex attention

Hi, Thanks for sharing the code!

I have a few questions about Section 3.1.2. Duplex attention.

  1. I am confused by the notation in the section. For example, in this section, "Y=(K^{P\times d}, V^{P\times d}), where the values store the content of the Y variables (e.g. the randomly sampled latents for the case of GAN)". Does it mean that V^{P\times d} is sampled from the original variable Y? how to set the number of P in your code?

  2. "keys track the centroids of the attention-based assignments from X to Y, which can be computed as K=a_b(Y, X)", does it mean K is calculated by using the self-attention module but with (Y, X) as input? If so, how to understand “the keys track the centroid of the attention-based assignments from X to Y”? BTW, how to get the centroids?

  3. For the update rule in duplex attention, what does the a() function mean? Does it denote a self-attention module like a_b() in Section 3.1.1, where X as query, K as keys, and V as values, if so, K is calculated from another self-attention module as mentioned in question 2, so the output of a_b(Y, X) will be treated as Keys, so the update rule contains two self-attention operations? is that right? Does it mean ’Duplex‘ attention?

  4. But finally I find I may be wrong when I read the last paragraph in this section. As mentioned in this section, "to support bidirectional interaction between elements, we can chain two reciprocal simplex attentions from X to Y and from Y to X, obtaining the duplex attention" So, does it mean, first, we calculate the Y by using a simplex attention module u^a(Y, X), and then use this Y as input of u^d(X, Y) to update X? Does it mean the duplex attention module contains three self-attention operations?

Thanks a lot! :)

GANformer2?

Is GANformer2 code currently included in this repo? I can't seem to find anything on the two-stage method anywhere in the code. If it is not included at the moment, is there a timeline or estimate for when it will be released?

Can gansformer be used as an image-to-image translation model?

Hi! I notice that in the paper, gansformer is compared with many image-to-image translation models such as SPADE. I find that in the pytorch-version code, the model can be fed conditioning information; I guess this is something like CGAN because it asserts len(self.label_shape) == 1. But can this condition be something like a semantic mask? If so, could this model be used for image-to-image translation?
Thanks for your help :)

Hosting models on Hugging Face

Hello! Thank you for open-sourcing this work, this is amazing 😊 I was wondering if you'd be interested in mirroring the pretrained model weights over on the Hugging Face model hub. I'm sure our community would love to see your work, and (among other things) hosting checkpoints on the Hub helps a lot with discoverability. We've got a guide here on how to upload models, but I'm also happy to help out with it if you'd like!

Something error in fused_bias_act

Thanks again to open your code.
But I encountered the following error in my environment. I'm not good at TF.

(error screenshot attached)

How Can I solve this??
My TF version is 1.14, CUDA is 11.2 , CUDNN is 10.0.

Do you have any plans to export a pytorch version?

Hi, I am not too familiar with tensorflow...
If there are no such plans currently, do you have quick pointers to:

  1. the GANsformer model, especially where and how you deal with the latents (based on your paper, you split the latents?)
  2. what kind of optimizers are you using? and how do you implemented it? Is it similar to what we did in NLP (warmup, etc);
  3. Did you ever try using the standard feed-forward after your duplex attention layer instead of the 3x3 convolution? Did it still work?

Thanks again for your kind attention!
Best,

Some Errors On Training

Thank you for your great work. I appreciate it a lot.

I just tried to train a model with your code; however, there are lots of undefined variables used. For example:

batch_size = get_shape(maps_in)[0]

It throws an undefined variable error for 'maps_in'. When I fix that with a constant, I get another error from

if gen_mod != "non" or gen_cond:

again gen_mod and gen_cond are not defined. When I fix that with a constant again, I get another error which says:

gansformer-main/gansformer-main/training/network.py", line 1127, in G_synthesis
grid_poses = get_positional_embeddings(resolution_log2, pos_dim or dlatent_size, pos_type, pos_directions_num, init = pos_init, **_kwargs)
TypeError: get_positional_embeddings() got an unexpected keyword argument 'label_size'

Am I missing something or is there a problem?

Projecting to the latent space experiment

Dear authors,

Thank you sharing code for the amazing work!

I wonder if you have tried finding a set of latent codes for a given image (i.e. latent space projection)?

Would be great to see how it should work!
