Status: Archive (code is provided as-is, no updates expected)
Code for the paper "Improved Techniques for Training GANs"
MNIST, SVHN, CIFAR10 experiments in the mnist_svhn_cifar10 folder
ImageNet experiments in the imagenet folder
Home Page: https://arxiv.org/abs/1606.03498
In [1]: from model import DCGAN
Traceback (most recent call last):
File "/home/marcinic/miniconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
from model import DCGAN
File "/home/marcinic/hdd/projects/reinforcement/retro/improved-gan/imagenet/model.py", line 47
self.out_init_b = out_init_b
^
TabError: inconsistent use of tabs and spaces in indentation
A couple of lines in imagenet/model.py use tabs instead of spaces.
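A minimal sketch for locating such lines before running the code. The helper name find_tab_indented_lines and the sample string are hypothetical and not part of the repository; Python's standard tabnanny module can also flag ambiguous indentation.

```python
def find_tab_indented_lines(source):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is whatever lstrip() would remove from the front.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            hits.append(lineno)
    return hits


# Hypothetical sample: the third line is indented with a tab, the rest with spaces.
sample = "class A:\n    def f(self):\n\tself.x = 1\n        return self.x\n"
print(find_tab_indented_lines(sample))
```

To check a real file, read it with open(path).read() and pass the text to the helper, then re-indent the reported lines with spaces.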
I don't know if this question belongs here, but I am building a custom tf.keras GAN with feature matching loss, and I am struggling to understand when to run a model in training mode versus inference mode, that is, when layers like dropout should be active and when batch norm statistics should be updated. This applies to both the discriminator and the generator, since I understand that they should be trained separately.
What is disc_param_avg used for here? Why should its updates be included in the parameter updates? If the gradients are already obtained through disc_param_updates, why can't we apply them directly to the parameters in the layers?
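One possible reading, not confirmed by the authors: the gradient updates are still applied directly to the parameters, and disc_param_avg additionally tracks a running average of those parameters that can be swapped in at evaluation time. The sketch below illustrates that idea on a single scalar; the function names, learning rate, averaging rate and the oscillating gradient are all assumptions chosen for illustration.

```python
def sgd_step(param, grad, lr):
    """Plain gradient step applied directly to the parameter."""
    return param - lr * grad


def update_average(avg, param, rate):
    """Move the running average a small step towards the current parameter."""
    return avg + rate * (param - avg)


param, avg = 1.0, 1.0
for step in range(1000):
    # Hypothetical noisy gradient that flips sign every step.
    grad = 0.5 if step % 2 == 0 else -0.5
    param = sgd_step(param, grad, lr=0.1)
    avg = update_average(avg, param, rate=0.01)

print(param, avg)
```

The raw parameter keeps jumping between two values, while the averaged copy settles near their midpoint, which is why an averaged copy can be a steadier choice for test-time predictions.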
Thank you very much for your answer!
I am finding a difference between the loss function explained in the paper and the loss functions in the code.
For the supervised loss, I understand that minimizing loss_lab in the code is equivalent to making T.sum(T.exp(output_before_softmax_lab)) go to 1 and also making max D(x_lab) equal to 1 for the correct label.
However, what I don't understand is the expression for loss_unl. How is it equivalent to the loss function L_unsupervised in the paper, which aims to make the discriminator predict class K+1 when the data is fake and predict not K+1 when the data is unlabelled?
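A small numerical check may help here. It assumes that l_unl and l_gen in the code are the log-sum-exp of the K class logits, and that the logit of the extra class K+1 is fixed at 0, as the paper suggests for the over-parameterized classifier. Under those assumptions, -log(1 - p(K+1 | x)) equals -l + softplus(l) and -log p(K+1 | x) equals softplus(l). The helper names and logits below are made up for illustration.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))


def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def p_fake(logits):
    """Probability of class K+1 in a (K+1)-way softmax whose extra logit is 0."""
    z = sum(math.exp(v) for v in logits)
    return 1.0 / (z + 1.0)


logits = [1.2, -0.7, 0.3]  # K = 3 arbitrary class logits
l = log_sum_exp(logits)

# Unlabelled real data: the discriminator should predict "not K+1".
real_term = -math.log(1.0 - p_fake(logits))
# Generated data: the discriminator should predict "K+1".
fake_term = -math.log(p_fake(logits))

print(real_term, -l + softplus(l))
print(fake_term, softplus(l))
```

Each printed pair agrees, so the three terms of loss_unl can be read as the two parts of the unsupervised loss, up to the 0.5 weights.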
Edit: I accidentally clicked to submit issue before finishing writing it.
Edit: This is kind of similar to issue #14 which didn't receive any answer.
I am trying to reproduce the best Inception score in the paper. It seems the default parameter setup of the CIFAR minibatch discrimination script may not be the one used in the paper. Has anyone successfully reproduced the experimental result?
I ran train_mnist_feature_matching.py on a CPU, and after 8 hours there is still no output. Has anyone had a similar problem? Is there a mistake somewhere? Thanks!
I'm hitting the error below when using the code with a different generator and optimizer.
I'm using Python 3 and TensorFlow 1.3.0 with anywhere between 2 and 4 GPUs. The error doesn't appear with TensorFlow 0.12.
Any thoughts on how to fix it? Note that the layer at which it happens seems to change randomly, e.g. from enc_0 to enc_4.
ValueError: Variable g_ae/enc_0/W/RMSProp/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
Hello, when I run bash train_imagenet.sh, I get the following error:
TRAINING
train_imagenet.sh: line 5: 4668 Segmentation fault CUDA_VISIBLE_DEVICES=0 python train_${word}.py --dataset imagenet_train --is_train True --checkpoint_dir gan/checkpoint_${word} --image_size ${pixels} --is_crop True --sample_dir gan/samples_${word} --image_width ${pixels} --batch_size 16
I am trying to run train_mnist_feature_matching.py with Python 3.5 but get the error below:
File "[path]/lib/python3.5/site-packages/nn/tf.py", line 1, in
from tensorflow import *
AttributeError: 'module' object has no attribute 'absolute_import'
Is this a bug?
What is f2? I think f1 is the whole thing, based on the definition in the paper. Moreover, the second one will give an empty slice, which in my experience raises an exception in TensorFlow (I have not tried it yet).
def half(tens, second):
    m, n, _ = tens.get_shape()
    m = int(m)
    n = int(n)
    return tf.slice(tens, [0, 0, second * self.batch_size], [m, n, self.batch_size])
f1 = tf.reduce_sum(half(masked, 0), 2) / tf.reduce_sum(half(mask, 0))
f2 = tf.reduce_sum(half(masked, 1), 2) / tf.reduce_sum(half(mask, 1))
Thank you for clarification.
IMO, there are two ways to calculate the Inception score of conditional GANs.
The init_params() function seems to be buggy because it accepts an argument x_lab but does not use it at all. Also, the comment reads "data based initialization". How is this data-based initialization achieved? Is this input to the init_params function intentional?
Is there a difference between the implementations of batch normalization and ADAM in the mnist_svhn_cifar10.nn module and those in Lasagne?
The input of Inception is zero-meaned (see part 5 in the paper),
but the code has these conditions:
assert(np.max(images[0]) > 10)
assert(np.min(images[0]) >= 0.0)
Can you explain the difference?
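One reading of these asserts is that the scoring function expects raw pixel values in the range [0, 255]: a maximum above 10 rules out inputs scaled to [0, 1] or [-1, 1]. Whether any zero-centering then happens inside the Inception graph is not confirmed here. As a hedged sketch, generator outputs assumed to lie in [-1, 1] could be rescaled before scoring; the helper name to_pixel_range and the random stand-in images are hypothetical.

```python
import numpy as np


def to_pixel_range(images):
    """Map float images in [-1, 1] to float images in [0, 255]."""
    images = np.asarray(images, dtype=np.float32)
    return np.clip((images + 1.0) * 127.5, 0.0, 255.0)


# Random stand-in for generator output in [-1, 1], shape [N, H, W, C].
fake = np.random.RandomState(0).uniform(-1.0, 1.0, size=(4, 32, 32, 3))
pixels = to_pixel_range(fake)

# The same checks as in the quoted code.
assert np.max(pixels[0]) > 10
assert np.min(pixels[0]) >= 0.0
```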
Thanks!
Hi, I am getting this error when I run cifar_feature_matching or cifar_minibatch_discrimination but not when I run mnist. Please help.
Traceback (most recent call last):
File "train_cifar_feature_matching.py", line 51, in <module>
gen_dat = ll.get_output(gen_layers[-1])
File "/usr/local/lib/python2.7/dist-packages/lasagne/layers/helper.py", line 185, in get_output
all_outputs[layer] = layer.get_output_for(layer_inputs, **kwargs)
File "/home/bmi/Downloads/improved-gan-master/mnist_svhn_cifar10/nn.py", line 120, in get_output_for
op = T.nnet.abstract_conv.AbstractConv2d_gradInputs(imshp=self.target_shape, kshp=self.W_shape, subsample=self.stride, border_mode='half')
AttributeError: 'module' object has no attribute 'abstract_conv'
The supervised loss function used for MNIST feature matching is
loss_lab = -T.mean(l_lab) + T.mean(z_exp_lab)
Why not use a softmax loss function?
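The expression may already be a softmax cross-entropy in disguise. The check below assumes that l_lab is the logit of the correct class and that z_exp_lab is the mean log-sum-exp over the class logits; the logits, labels and helper names are invented for illustration.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))


def softmax_cross_entropy(logits, label):
    """Standard -log softmax(logits)[label]."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    return -math.log(exps[label] / sum(exps))


batch_logits = [[2.0, -1.0, 0.5], [0.1, 0.4, -0.3]]
labels = [0, 1]
n = len(labels)

# The form used in the issue: -mean(correct-class logit) + mean(log-sum-exp).
mean_l_lab = sum(row[y] for row, y in zip(batch_logits, labels)) / n
mean_lse = sum(log_sum_exp(row) for row in batch_logits) / n
loss_lab = -mean_l_lab + mean_lse

# The usual softmax cross-entropy, averaged over the batch.
reference = sum(softmax_cross_entropy(r, y) for r, y in zip(batch_logits, labels)) / n

print(loss_lab, reference)
```

Both values agree, because -log softmax(l)[y] = -l[y] + log-sum-exp(l).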
The loss with regard to the unlabeled data l_unl is defined as:
loss_unl = -0.5*T.mean(l_unl) + 0.5*T.mean(T.nnet.softplus(l_unl)) + 0.5*T.mean(T.nnet.softplus(l_gen))
But the first term seems to contradict the second term, which confuses me a lot.
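The two terms need not conflict once they are combined: -l + softplus(l) equals softplus(-l), which decreases monotonically as l grows. A short check of that identity follows; the 0.5 weights are left out and the sample values are arbitrary.

```python
import math


def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def unl_real_term(l):
    """First two terms of loss_unl for one sample, without the 0.5 weights."""
    return -l + softplus(l)


values = [-4.0, -1.0, 0.0, 1.0, 4.0]
terms = [unl_real_term(v) for v in values]

for v, t in zip(values, terms):
    print(v, t, softplus(-v))
```

The combined term shrinks as l_unl increases, so minimizing it simply pushes l_unl up for unlabeled real data.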
In train_mnist_feature_matching.py and other similar files, the generator uses nn.batch_norm, but it seems the updates in your implementation, self.bn_updates = [(self.avg_batch_mean, new_m), (self.avg_batch_var, new_v)], are never applied in train_mnist_feature_matching.py. (I saw that init_updates do get applied in the file.) I printed the avg_batch_mean values and they are always 0, and the avg_batch_var values are always 1.
That means the batch normalization in your code does not normalize inputs at test time. Is it a bug? Thanks.
Has anyone managed to reproduce the exact results for semi-supervised learning using train_cifar_feature_matching.py? With the default hyperparameters and 4000 labeled examples, I'm overfitting: I get 32% test error after 48 epochs, with 0.5% training error. The paper claims a test error of only 18.6% on this task.
Do I need to train longer (for the full 1200 epochs?), or are others having this same problem?
The Inception Score calculation has 3 mistakes.
It turns out that the 1008 size softmax output is an artifact of dimension back-compatibility with an older, Google-internal system. Newer versions of the inception model have 1001 output classes, where one is an "other" class used in training. You shouldn't need to pay any attention to the extra 8 outputs.
Fix: See link for the new inception Model.
scipy.stats.entropy should be used.
kl = part * (np.log(part) - np.log(np.expand_dims(np.mean(part, 0), 0)))
kl = np.mean(np.sum(kl, 1))
Fix: Replace the above with something along the lines of the following:
py = np.mean(part, axis=0)
l = np.mean([entropy(part[i, :], py) for i in range(part.shape[0])])
Here is the code in inception_score.py which does this:
scores.append(np.exp(kl))
return np.mean(scores), np.std(scores)
This is clearly problematic, as can easily be seen in a very simple case with a random variable x ~ Bernoulli(0.5): E[e^x] = .5(e^(0) + e^(1)) != e^(.5(0)+.5(1)) = e^(E[x]). This can further be seen in an example with a uniform random variable, where the split-mean overestimates the exponential.
import numpy as np
data = np.random.uniform(low=0., high=15., size=1000)
split_data = np.split(data, 10)
np.mean([np.exp(np.mean(x)) for x in split_data]) # 1608.25
np.exp(np.mean(data)) # 1477.25
Fix: Do not calculate the mean of the exponential of the split, and instead calculate the exponential of the mean of the KL-divergence over all 50,000 inputs.
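A minimal sketch of the proposed fix, assuming preds is an [N, classes] array of softmax outputs. The function name, the eps smoothing term and the toy inputs are illustrative choices; the smoothing is a stand-in for the scipy.stats.entropy call suggested above, not a copy of it.

```python
import numpy as np


def inception_score(preds, eps=1e-12):
    """exp of the mean KL(p(y|x) || p(y)), taken over all rows of preds at once."""
    preds = np.asarray(preds, dtype=np.float64)
    marginal = preds.mean(axis=0, keepdims=True)  # p(y), shape [1, classes]
    # eps keeps log() finite when a probability is exactly zero.
    kl = np.sum(preds * (np.log(preds + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))


# Toy inputs: identical uniform predictions, and confident predictions
# spread evenly over three classes.
uniform = np.full((6, 3), 1.0 / 3.0)
one_hot = np.eye(3)[[0, 1, 2, 0, 1, 2]]

print(inception_score(uniform), inception_score(one_hot))
```

The uniform case gives the minimum score of 1, and the evenly spread one-hot case gives a score close to the number of classes, which matches the usual interpretation of the metric.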
I have a custom dataset similar to the CIFAR10 dataset. How do I specify the number of classes or labels?
We can pass it to y_dim, but it is not used anywhere.
Hi, could you please share your script train_imagenet.sh to launch training on ImageNet?
It is mentioned in the ImageNet README, but is not present in the repo.
Thanks!
With TensorFlow 1.6.0, the line pred = sess.run(softmax, {'ExpandDims:0': inp}) raises an error because the input to the softmax in the Inception classifier expects a batch dimension of 1:
ValueError: Cannot feed value of shape (100, 32, 32, 3) for Tensor u'ExpandDims:0', which has shape '(1, ?, ?, 3)'
I assume this didn't use to be the case? For now I've patched my local fork to hardcode bs = 1, but I assume that's not the optimal fix here.
If I drop a breakpoint right before that line to show what I'm calling this with:
ipdb> len(images)
1000
ipdb> inp.shape
(100, 32, 32, 3)
ipdb> tf.get_default_graph().get_tensor_by_name("ExpandDims:0")
<tf.Tensor 'ExpandDims:0' shape=(1, ?, ?, 3) dtype=float32>
These lines puzzle me:
w = sess.graph.get_operation_by_name("softmax/logits/MatMul").inputs[1]
logits = tf.matmul(tf.squeeze(pool3, [1, 2]), w)
softmax = tf.nn.softmax(logits)
I'm wondering, why not just use sess.graph.get_tensor_by_name('softmax:0')? Why bother to manually do the matrix multiplication and apply softmax? Also, why not add the bias term?
In the calculation of the Inception score, after pool3 = sess.graph.get_tensor_by_name('pool_3:0'), I get pool3 with a shape of [?, 2048], which makes the other line, tf.matmul(tf.squeeze(pool3, [1, 2]), w), hard to understand. Why do you need to squeeze pool3?
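The squeeze only makes sense if the pooling output is 4-D, for example [N, 1, 1, 2048], in which case the two singleton spatial axes must be dropped before a matrix multiplication. The NumPy sketch below shows the shape arithmetic under that assumption; both array shapes are assumed for illustration and are not read from the actual graph.

```python
import numpy as np

# Assumed 4-D pooling output: batch of 5, 1x1 spatial grid, 2048 features.
pool3_4d = np.zeros((5, 1, 1, 2048), dtype=np.float32)
# Assumed weight matrix of the final fully connected layer.
w = np.zeros((2048, 1008), dtype=np.float32)

# Drop the two singleton spatial axes so the features are [N, 2048].
features = np.squeeze(pool3_4d, axis=(1, 2))
logits = features @ w

print(features.shape, logits.shape)
```

If pool3 is already reported as [?, 2048], as in the question, there are no axes 1 and 2 of size one left to squeeze, so that line would only be needed for the 4-D case.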
Hi, when I run "python train_cifar_minibatch_discrimination.py", I get the error "Theano said out of memory, allocated 5000000 bytes error", even though there is plenty of memory left.
How can I solve this problem? Thank you.
Hi, I've been trying to reproduce a minibatch discrimination GAN for MNIST based on the paper, but I keep getting poor results. Would it be possible for "train_mnist_minibatch_discrimination.py" to be uploaded to the repo? I assume it exists, since MNIST digits generated via minibatch discrimination were shown in the paper. Thanks for your time!
Quoting Nicholas Carlini:
attack = FastGradientMethod(model, sess)
adv_1 = attack.generate_np(test_data, eps=.5)
adv_2 = attack.generate_np(test_data, eps=.2)
will result in adv_1 == adv_2, a rather unexpected result.
This is because generate_np just stores one TensorFlow graph. It needs to have something like a dictionary mapping from argument values to graphs.
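The dictionary idea can be sketched without TensorFlow. CachedAttack and build_shift below are toy stand-ins invented for illustration and are not the library's API; the point is only that the cache key is derived from the argument values, so different eps values lead to different built objects.

```python
class CachedAttack:
    """Toy stand-in that builds one callable per distinct set of arguments."""

    def __init__(self, build_fn):
        self._build_fn = build_fn
        self._cache = {}  # maps argument values to built callables
        self.builds = 0

    def generate(self, x, **params):
        # Sort so the key does not depend on keyword order; values must be hashable.
        key = tuple(sorted(params.items()))
        if key not in self._cache:
            self._cache[key] = self._build_fn(**params)
            self.builds += 1
        return self._cache[key](x)


def build_shift(eps):
    """Placeholder for expensive graph construction: returns a function of x."""
    return lambda x: [v + eps for v in x]


attack = CachedAttack(build_shift)
adv_1 = attack.generate([0.0, 1.0], eps=0.5)
adv_2 = attack.generate([0.0, 1.0], eps=0.2)
adv_3 = attack.generate([0.0, 1.0], eps=0.5)  # reuses the eps=0.5 entry

print(adv_1, adv_2, attack.builds)
```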
I see most code here is under the MIT License, but what is the copyright status of the paper published on arXiv? Is it under any copyleft license?
I would love to upload and distribute it on my website, but cannot do so unless the copyright allows it.
I tried to find an implementation of historical averaging anywhere, but no one seems to have implemented it.
Will it be implemented in this repo?
Hello,
I am trying to use your code to train a generator on a different dataset besides imagenet. I have prepared my data in the same format by rewriting the convert to records tool.
Because I want to generate images of size 256, I added two new layers to the generator and discriminator so that they can take correctly sized inputs.
Unfortunately, my results have been very poor. Can you explain why the num_classes = 1001 in the discriminator as opposed to 1000? Imagenet has 1000 classes, not 1001 and the class indices are from 0 to 999. So what is the last one for?
Besides simply adding those new layers, is there anything else I need to do to handle data of this size and work on a different dataset? I should point out that my code is training; I just can't seem to get any meaningful results. I have manually verified that data is loading correctly in and out of the models.
I am also using a batch size of 8 due to memory issues with images of size 256.
Thanks
Chris
What is the meaning of the parameter splits (default=10)? Can we choose an appropriate value of splits according to the obtained inception_score?
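As far as the quoted scoring code suggests, the predictions are divided into splits groups, a score is computed per group, and the mean and standard deviation across groups are returned. The sketch below mirrors that behaviour on synthetic predictions; the function name, the eps term and the random Dirichlet data are assumptions made for illustration.

```python
import numpy as np


def split_scores(preds, splits=10, eps=1e-12):
    """Score each of `splits` chunks separately, then report mean and std."""
    preds = np.asarray(preds, dtype=np.float64)
    scores = []
    for part in np.array_split(preds, splits):
        marginal = part.mean(axis=0, keepdims=True)
        kl = np.sum(part * (np.log(part + eps) - np.log(marginal + eps)), axis=1)
        scores.append(np.exp(kl.mean()))
    return float(np.mean(scores)), float(np.std(scores))


# Synthetic softmax outputs: 1000 rows over 10 classes.
rng = np.random.RandomState(0)
preds = rng.dirichlet(np.ones(10), size=1000)

mean_score, std_score = split_scores(preds, splits=10)
print(mean_score, std_score)
```

Under this reading, splits mainly controls how the spread of the estimate is measured, so picking it after looking at the resulting score would make results harder to compare across papers.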
The README is not very specific. Which file path should I edit? Should I download the dataset and set the file path in the code?
Hoping for help.
When running on ImageNet, I got this error.
ValueError: Variable d_h0_conv/w/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
After searching online for a while, I found that this issue is introduced by using TensorFlow 0.12.
While using tf.variable_scope() solves the issue on DCGAN, I found it difficult to apply the same technique to improved-gan.
If anyone has faced the same issue, can you please share how you solved it?
In your implementation, you choose to multiply the feature map by a 3D tensor and then derive a proxy for the closeness between samples. Have you tried anything else (a 2D tensor, etc.)? If so, what were the results?
Could you please upload the trained Imagenet models?
shape = [s.value for s in shape]
File "D:\python_anaconda\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 481, in __iter__
raise ValueError("Cannot iterate over a shape with unknown rank.")
ValueError: Cannot iterate over a shape with unknown rank.