oasis's Introduction

You Only Need Adversarial Supervision for Semantic Image Synthesis

Official PyTorch implementation of the ICLR 2021 paper "You Only Need Adversarial Supervision for Semantic Image Synthesis". An extended version of the paper has been published in IJCV. The code allows the users to reproduce and extend the results reported in the study. Please cite the paper when reporting, reproducing or extending the results.

[ICLR website] [IJCV website] [Arxiv] [5min Video Summary]

Overview

This repository implements the OASIS model, which generates realistic-looking images from semantic label maps. In addition, many different images can be generated from any given label map by simply resampling a noise vector (first two rows of the figure below). The model also allows resampling only parts of the image (see the last two rows of the figure below). Check out the paper for details, as well as the appendix, which contains many additional examples.

Setup

First, clone this repository:

git clone https://github.com/boschresearch/OASIS.git
cd OASIS

The code is tested for Python 3.7.6 and the packages listed in oasis.yml. The basic requirements are PyTorch and Torchvision. The easiest way to get going is to install the oasis conda environment via

conda env create --file oasis.yml
source activate oasis

Datasets

For COCO-Stuff, Cityscapes or ADE20K, please follow the instructions for the dataset preparation as outlined in https://github.com/NVlabs/SPADE.

Training the model

To train the model, execute the training scripts in the scripts folder. In these scripts you first need to specify the path to the data folder. Via the --name parameter the experiment can be given a unique identifier. The experimental results are then saved in the folder ./checkpoints, where a new folder for each run is created with the specified experiment name. You can also specify another folder for the checkpoints using the --checkpoints_dir parameter. If you want to continue training, start the respective script with the --continue_train flag. Have a look at config.py for other options you can specify.
Training on 4 NVIDIA Tesla V100 GPUs (32 GB) is recommended. Tip: for significantly faster training, set the num_workers parameter of the dataloader to a higher number, e.g. 8 (the default is 0).
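
As a concrete example, an ADE20K run could be launched roughly like this (an illustrative sketch: --dataset_mode, --name and --dataroot mirror the test.py example further below, and it is assumed here that num_workers is exposed as a command-line option in config.py; check the provided scripts for the exact flags they use):

python train.py --dataset_mode ade20k --name oasis_ade20k \
--dataroot path_to/ADEChallenge2016 --num_workers 8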

Testing the model

To test a trained model, execute the testing scripts in the scripts folder. The --name parameter should correspond to the experiment name that you want to test, and --checkpoints_dir should be the folder where the experiment is saved (default: ./checkpoints). These scripts will generate images from a pretrained model in ./results/name/.

Measuring FID

The FID is computed on the fly during training, using the popular PyTorch FID implementation from https://github.com/mseitzer/pytorch-fid. At the beginning of training, the inception moments of the real images are computed before the actual training loop starts. How frequently the FID should be evaluated is controlled via the parameter --freq_fid, which is set to 5000 steps by default. The inception network used for FID computation automatically downloads a pre-trained checkpoint. If that automatic download fails, for instance because your server has restricted internet access, get the checkpoint named pt_inception-2015-12-05-6726825d.pth from here and place it in /utils/fid_folder/. In this case, do not forget to replace the load_state_dict_from_url call accordingly.
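
The replacement can be as simple as loading the local file with torch.load (a sketch only; the surrounding variable names in the vendored inception code may differ from those shown here):

import torch

# instead of: state_dict = load_state_dict_from_url(FID_WEIGHTS_URL, progress=True)
state_dict = torch.load('utils/fid_folder/pt_inception-2015-12-05-6726825d.pth')
inception_model.load_state_dict(state_dict)  # inception_model: the InceptionV3 instance being built (placeholder name)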

Pretrained models

The checkpoints for the pre-trained models are available here as zip files. Copy them into the checkpoints folder (the default is ./checkpoints, create it if it doesn't yet exist) and unzip them. The folder structure should be

checkpoints_dir
├── oasis_ade20k_pretrained                   
├── oasis_cityscapes_pretrained  
└── oasis_coco_pretrained

You can generate images with a pre-trained checkpoint via test.py. Using the example of ADE20K:

python test.py --dataset_mode ade20k --name oasis_ade20k_pretrained \
--dataroot path_to/ADEChallenge2016

This script will create a folder named ./results in which the resulting images are saved.

If you want to continue training from this checkpoint, use train.py with the same --name parameter and add --continue_train --which_iter best.
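
Continuing with the ADE20K example, the call would look roughly like this (assuming train.py accepts the same dataset flags as test.py, since both read config.py):

python train.py --dataset_mode ade20k --name oasis_ade20k_pretrained \
--dataroot path_to/ADEChallenge2016 --continue_train --which_iter best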

Additional information

Poster

Video Summary


Citation

If you use this work, please cite:

@inproceedings{
sch{\"o}nfeld2021you,
title={You Only Need Adversarial Supervision for Semantic Image Synthesis},
author={Edgar Sch{\"o}nfeld and Vadim Sushko and Dan Zhang and Juergen Gall and Bernt Schiele and Anna Khoreva},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=yvQKLaqNE6M}
}

License

This project is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.

For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.

Purpose of the project

This software is a research prototype, solely developed for and published as part of the publication cited above. It will neither be maintained nor monitored in any way.

Contact

Please feel free to open an issue or contact us personally if you have questions, need help, or need explanations. Write to one of the following email addresses, and maybe put one of the others in cc:

[email protected]
[email protected]
[email protected]
[email protected]

oasis's People

Contributors

edgarschnfld, SushkoVadim


oasis's Issues

Integrate OASIS in gan-compression

@edgarschnfld @SushkoVadim Hi there,
OASIS seems to have great potential,

though it still seems that it could achieve greater efficiency by being integrated into the GAN Compression workflow,
which could theoretically reduce the parameters by about 4x and the MACs by about 9x.
They already have an implementation of SPADE (GauGAN) which you could build on top of.

Waiting for your reply

ADE20K-outdoor

Hi @SushkoVadim @edgarschnfld ,

Thanks for the excellent work. Starting from SPADE, I noted that a series of papers provided benchmark results on the ADE20K-outdoor dataset, a subset of ADE20K that only contains outdoor scenes, used in Qi et al. In their repo, I only found the 1,035 ADE20K-outdoor test labels in their provided 'result.zip' file.

Could you please share how I can find the list/data of ADE20K-outdoor and how to train the model (train: full, test: subset; Or: train: subset, test: subset) to make a fair comparison? I ask this question only because I wish to test my model on ADE20K-outdoor.

Thank you very much for any help you may provide.

Replicate lpips results from paper

Hi, thanks for the great work! Could you please specify which LPIPS model and version you used to compute the LPIPS results you report in the paper?

how to add the local noise?

When I try to reproduce the local noise, I can't find the code, so how should I add the local noise to the segmentation image?

beta2 value reasoning

Hey!

Could you explain the reasoning behind beta2 being equal to 0.999 instead of 0.9 (like in SPADE)? I didn't find any mention of this in either the paper or the code.

Any idea on why OASIS discriminator can encourage repetitive patterns?

While doing my experiments I replaced SPADE's discriminator with the OASIS one and tried this with the VGG loss both disabled and enabled. In either case the images seem to be visually better (even though FID is higher on Cityscapes), but what is very noticeable (unfortunately I can't share images) is that the OASIS discriminator for some reason encourages repetitive patterns in the resulting images, specifically in broad semantic regions like road. Do you have any idea what could cause that, given that the only change I made in my model is replacing MultiscaleDiscriminator with OASISDiscriminator?

I'm sorry for the vague question, but I just wonder if there are any obvious things I can try to fight this. I also use the LabelMix loss with lambda = 10.

Train the model with custom dataset

Hi,

I really don't know what exactly I need to change in the model in order to run the training.
It would be very helpful if you could tell me how to prepare the dataset for the model.
I wonder what the semantic labels should look like.
Let's say we have 3 classes (pedestrian, cow, sheep); what should the target and label folders look like?
This is how I prepared the dataset (example):
Input (note: the mask can be in any color other than red, white for example): [example input mask "0002" omitted]
Target: [example target image omitted]

Your help is highly appreciated.
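
For illustration, if you mimic the ADE20K layout that --dataset_mode ade20k expects (a hypothetical sketch based on the ADEChallenge2016 organization, not an official specification for custom datasets), the dataroot could look like

custom_dataset
├── images
│   ├── training
│   └── validation
└── annotations
    ├── training
    └── validation

with the label maps stored as single-channel PNGs whose pixel values are class indices (e.g. 0 = background, 1 = pedestrian, 2 = cow, 3 = sheep) rather than RGB colors.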

number of iterations

Hi, I wanted to know how many iterations each epoch has. Can you help me, please?

thanks
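
As a back-of-the-envelope sketch (illustrative numbers only, not values taken from the repository):

# iterations per epoch is roughly the number of training images divided by the batch size,
# e.g. for Cityscapes (2,975 training images) with a batch size of 20:
num_train_images = 2975
batch_size = 20
iters_per_epoch = num_train_images // batch_size  # ~148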

Create label Map

How do I create a label map so that it can be given as input to the pre-trained model along with the original image?

The purpose of collecting running stats when updating EMA before FID computation, image or network saving

Hi @SushkoVadim @edgarschnfld,

Thanks for the excellent work. I noted that when updating EMA, you collect the running stats for BatchNorm before FID computation, image or network saving (see https://github.com/boschresearch/OASIS/blob/master/utils/utils.py#L133).

May I ask about the purpose and intuition of this operation? How significantly would it affect the model performance? Thank you in advance.
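
For context, refreshing BatchNorm running statistics before evaluation generally looks something like the following (a generic sketch of the technique, not the repository's exact code; netEMA, dataloader and num_upd are placeholders):

# forward a few batches through the EMA generator in train mode so that the
# BatchNorm running_mean / running_var buffers are updated, then switch to eval
netEMA.train()
with torch.no_grad():
    for i, data in enumerate(dataloader):
        if i >= num_upd:  # small, fixed number of batches
            break
        _ = netEMA(data['label'].cuda())
netEMA.eval()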

Question on SpectralNorm (Discriminator)

Hello @edgarschnfld, @SushkoVadim,

Thank you for sharing your work! I was wondering if you have explored the importance of the spectral norm in the discriminator in more detail. I also noticed that you applied the spectral norm to every layer except the last convolution, see https://github.com/boschresearch/OASIS/blob/master/models/discriminator.py#L23. Is this a specific design decision?

It would be great if you could share some insights here.
Thanks,

Nikolai

RuntimeError: Creating MTGP constants failed

Hi, I am trying to implement this repo.
I've downloaded the ade20k checkpoints and created a conda env following your yaml file.

When I run the testing command python test.py --name oasis_ade20k --dataset_mode ade20k --gpu_ids 0 --dataroot test_images --batch_size 1 I get the following error:

/opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [232,0,0], thread: [101,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    generated = model(None, label, "generate", None)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/azureuser/IM/OASIS/models/models.py", line 72, in forward
    fake = self.netEMA(label)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/azureuser/IM/OASIS/models/generator.py", line 36, in forward
    z = torch.randn(seg.size(0), self.opt.z_dim, dtype=torch.float32, device=dev)
RuntimeError: Creating MTGP constants failed. at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCTensorRandom.cu:35

I am running on the test_images folder, which contains some ADE20K images.

Any suggestion?
Thanks ;)

Generated images are very similar

Hi,

I appreciate your work very much! But when I run the test script multiple times, the model always generates very similar images for the same label map. Could you tell me why, and how to obtain diverse images from a label map?
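
For reference, the generator draws a fresh z inside its forward pass (see the traceback in the MTGP issue above), so repeated sampling should normally differ; if a fixed random seed is applied at the start of every test run, however, the same noise is drawn each time. A minimal sketch of forcing different noise across runs (the model call mirrors test.py; the re-seeding line is an assumption about where the fixed seed would otherwise apply):

import random
import torch

# re-seed differently before generation so the internally sampled z changes between runs
torch.manual_seed(random.randint(0, 2**31 - 1))
fake = model(None, label, "generate", None)  # same call as in test.py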

Identify trainable layers

Hi, I am trying to apply the StyleGAN-NADA CLIP method to OASIS, as suggested by the author. Can you help me identify which layers of OASIS are trainable?

Reproducing mIoU on ADE20K

Hello!
Thank you for the available code!
I'm trying to reproduce the results for ADE20K. For FID, I use the pretrained model and get results similar to the paper.
However, for mIoU, I ran UPerNet-101 on the previously generated data and got 42.76 mIoU, which is a small gap from the value reported in the paper. I think this is caused by a different testing configuration. I want to make sure whether you use the following settings when testing:
DATASET:
root_dataset: "./results/oasis_ade20k_pretrained/best/"
list_train: "./data/training.odgt"
list_val: "./data/validation_oasis.odgt"
num_class: 150
imgSizes: (256)
imgMaxSize: 1000
padding_constant: 32
segm_downsampling_rate: 4
random_flip: True

MODEL:
arch_encoder: "resnet101"
arch_decoder: "upernet"
fc_dim: 2048

TRAIN:
batch_size_per_gpu: 2
num_epoch: 40
start_epoch: 0
epoch_iters: 5000
optim: "SGD"
lr_encoder: 0.02
lr_decoder: 0.02
lr_pow: 0.9
beta1: 0.9
weight_decay: 1e-4
deep_sup_scale: 0.4
fix_bn: False
workers: 16
disp_iter: 20
seed: 304

VAL:
visualize: False
checkpoint: "epoch_50.pth"

TEST:
checkpoint: "epoch_50.pth"
result: "./"

DIR: "ckpt/ade20k-resnet101-upernet"
The content of the file oasis.odgt is as follows:
{'fpath_img': 'image/ADE_val_00000502.png', 'fpath_segm': 'label/ADE_val_00000502.png', 'width': 256, 'height': 256}

Thank you!

Reproduce mIoU of Cityscapes: checkpoint of DRN-D-105

Thank you for sharing your work. Having noticed that your DRN runs at 256x512, we have questions about how you trained your DRN.
We can reproduce the mIoU of SPADE with the official DRN-D-105 checkpoint by upsampling the generated images to 1024x2048, but we cannot reproduce your mIoU yet.
Can you share the DRN checkpoints or tell us your training details?
Thank you for your patience!

noisy data set

hello dear author @edgarschnfld

I downloaded the pre-trained model and added noise to the labels that are input to the network, but the test failed and the model didn't work on the noisy dataset at all.
What should I do to test the model on noisy labels?

Random crop for ADE and COCO are not used in training

When training the model, dataloader_train and dataloader_val share the same options object. So opt.load_size is first set to 286 in dataloader_train, and then overwritten with 256 in dataloader_val.

class Ade20kDataset(torch.utils.data.Dataset):
    def __init__(self, opt, for_metrics):
        if opt.phase == "test" or for_metrics:
            opt.load_size = 256
        else:
            opt.load_size = 286

As a result, both dataloaders end up with the same opt.load_size (256), and the random crop augmentation for training is not applied in this case.

# resize
new_width, new_height = (self.opt.load_size, self.opt.load_size)
image = TR.functional.resize(image, (new_width, new_height), Image.BICUBIC)
label = TR.functional.resize(label, (new_width, new_height), Image.NEAREST)
# crop
crop_x = random.randint(0, np.maximum(0, new_width -  self.opt.crop_size))
crop_y = random.randint(0, np.maximum(0, new_height - self.opt.crop_size))
image = image.crop((crop_x, crop_y, crop_x + self.opt.crop_size, crop_y + self.opt.crop_size))
label = label.crop((crop_x, crop_y, crop_x + self.opt.crop_size, crop_y + self.opt.crop_size))
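
One possible fix (a sketch only, not the maintainers' solution) is to stop mutating the shared opt object and keep the resolved size on the dataset instance instead, e.g.:

import random
import numpy as np
import torch
from PIL import Image
import torchvision.transforms as TR

class Ade20kDataset(torch.utils.data.Dataset):
    def __init__(self, opt, for_metrics):
        # resolve the load size per dataset instead of overwriting opt.load_size globally
        self.load_size = 256 if (opt.phase == "test" or for_metrics) else 286
        self.crop_size = opt.crop_size
        self.opt = opt

    def transforms(self, image, label):
        # resize to the per-instance load size, then random-crop to crop_size
        image = TR.functional.resize(image, (self.load_size, self.load_size), Image.BICUBIC)
        label = TR.functional.resize(label, (self.load_size, self.load_size), Image.NEAREST)
        crop_x = random.randint(0, np.maximum(0, self.load_size - self.crop_size))
        crop_y = random.randint(0, np.maximum(0, self.load_size - self.crop_size))
        image = image.crop((crop_x, crop_y, crop_x + self.crop_size, crop_y + self.crop_size))
        label = label.crop((crop_x, crop_y, crop_x + self.crop_size, crop_y + self.crop_size))
        return image, label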

Computation of the loss reweighting

Hey,
I've noticed that the loss reweighting used in the code doesn't match the paper's equation.
In the code, the formula is BxHxW / (num_pixels_of_class_i * num_of_non_zero_classes_in_batch), whereas the paper shows BxHxW / num_pixels_of_class_i.
Did I get something wrong? If not, can you clarify which one is the correct equation?
Thanks
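
For concreteness, the two variants side by side, using the formulas exactly as quoted above (illustrative numbers, not values from the paper or the code):

# a batch of 2 images at 256x256, with 10,000 pixels of class i and
# 20 distinct non-empty classes present in the batch
B, H, W = 2, 256, 256
num_pixels_of_class_i = 10_000
num_nonzero_classes_in_batch = 20

weight_code = (B * H * W) / (num_pixels_of_class_i * num_nonzero_classes_in_batch)  # as in the code (per this issue)
weight_paper = (B * H * W) / num_pixels_of_class_i                                  # as in the paper (per this issue)
print(weight_code, weight_paper)  # 0.655... vs 13.107..., i.e. the two differ by the class-count factor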

Subset for ADE20K

Thank you for your work!
Could you provide a list of images and classes that you used for training on ADE20K?

How do you compute FID?

To reproduce your results, there is one thing I am not sure about; could you please help me confirm it?

For the Cityscapes dataset, the data is prepared following https://github.com/NVlabs/SPADE, and FID is computed with https://github.com/mseitzer/pytorch-fid. Without doubt, the synthesized images come from your code.

But how about the real images?

  1. Resize: do you resize the real images to the same size as the synthesized ones (256 x 512), using nearest-neighbor downsampling?
  2. What are the real images? Only the validation images from Cityscapes (gtFine/val), or all images (gtFine/val, gtFine/test, gtFine/train)?

Thanks a lot.

Question on losses.py

Hello, Thank you for sharing your great work!
I noticed that on line 36 of the losses.py file,
coefficients = torch.reciprocal(class_occurence) * torch.numel(label) / (num_of_classes * label.shape[1])
it seems that label.shape[1] should be changed to label.shape[0] here so that the result is in line with the equation in the paper (HxW). I am not sure if I understand it correctly, or is there some detail I'm not noticing?
Thanks.

The labels output by the discriminator are inconsistent with the labels of the dataset

Hi, I'm trying to use the OASIS discriminator with the semantic segmentation of my own dataset. In short, my ground-truth one-hot segmentation has 3 channels, shaped like: channel 0: background, channel 1: human, and channel 2 for the fake pixel label, as the paper describes. But when calculating the cross-entropy loss, the target label seems to have been processed into a one-channel label like 1: background, 2: human when is_real=True, or 0: fake when is_real=False. The model still works properly, but the predicted semantic segmentation display order is not quite correct when visualizing.
Can you please tell me if my process is correct? Any suggestions would be very helpful, thanks.

Reproduce the mIoU for cityscapes

Hello,
Thank you for making the code available.
I'm trying to reproduce the mIoU for Cityscapes. I want to make sure how I should test the generated images using drn --ms.
What is the resolution when you calculate the mIoU using DRN? Do you upsample the generated results before testing mIoU, or downsample the labels?

Thanks again

Replicate the pretrained model

Hi, great work!

I tried to train the model with the default settings on the ADE20K dataset, but found that its performance is lower than that of the given pretrained model (FID/mIoU/acc: 29/37/80 vs. 27/45/82). Since the random seed is fixed, I'm not sure why the performance of the trained model differs from the pretrained one. Below are the loss curves of my experiment and of the pretrained one:
[loss curve images omitted]
Any idea why?

EMA, 3D noise

Hi. @edgarschnfld @SushkoVadim

I am a student studying semantic image synthesis. Thank you for the great work. I have two questions about differences between the paper and the code.

  1. EMA
    As you cite [Yaz et al., 2018], the exponential moving average is a good technique for training GANs. However, in your code (OASIS/utils/utils.py, lines 125 to 132 at commit 6e728ec):

    def update_EMA(model, cur_iter, dataloader, opt, force_run_stats=False):
        # update weights based on new generator weights
        with torch.no_grad():
            for key in model.module.netEMA.state_dict():
                model.module.netEMA.state_dict()[key].data.copy_(
                    model.module.netEMA.state_dict()[key].data * opt.EMA_decay +
                    model.module.netG.state_dict()[key].data * (1 - opt.EMA_decay)
                )

    I think the code below might need to be added:

    model.module.netG.state_dict()[key].data.copy_(
        model.module.netEMA.state_dict()[key].data
    )

If not, netG is not trained using EMA.

Yaz, Yasin, et al. "The unusual effectiveness of averaging in GAN training." International Conference on Learning Representations. 2018.

  2. 3D noise

If I do not misunderstand your paper, it says that the noise of OASIS is sampled from a 3D normal distribution, and that this is one of the main differences from SPADE.
However, in your code:

if not self.opt.no_3dnoise:
    dev = seg.get_device() if self.opt.gpu_ids != "-1" else "cpu"
    z = torch.randn(seg.size(0), self.opt.z_dim, dtype=torch.float32, device=dev)
    z = z.view(z.size(0), self.opt.z_dim, 1, 1)
    z = z.expand(z.size(0), self.opt.z_dim, seg.size(2), seg.size(3))
    seg = torch.cat((z, seg), dim=1)

Here, the noise is not sampled from a 3D normal distribution. It is sampled from a 1D normal distribution and then expanded to 3D, which replicates the same vector spatially.
In my opinion, this code should be replaced by

z = torch.randn(seg.shape, ...)
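
For illustration, a fully spatially-varying sampling along these lines could look as follows (a sketch, not the authors' code; note that the channel count stays self.opt.z_dim rather than following seg.shape exactly):

# sample an independent z_dim-dimensional noise vector per spatial location
z = torch.randn(seg.size(0), self.opt.z_dim, seg.size(2), seg.size(3),
                dtype=torch.float32, device=dev)
seg = torch.cat((z, seg), dim=1)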

I think both of these parts are pretty crucial for your paper. If there is a reason for these choices, or if I am mistaken somewhere, please let me know.

Thank you.
