openai / iaf

Code for reproducing key results in the paper "Improving Variational Inference with Inverse Autoregressive Flow"

Home Page: https://arxiv.org/abs/1606.04934

License: MIT License


iaf's Introduction

Status: Archive (code is provided as-is, no updates expected)

Improving Variational Inference with Inverse Autoregressive Flow

Code for reproducing key results in the paper Improving Variational Inference with Inverse Autoregressive Flow by Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling.

Prerequisites

  1. Make sure that recent versions of the following are installed:

    • Python (version 2.7 or higher)
    • Numpy (e.g. pip install numpy)
    • Theano (e.g. pip install Theano)
  2. Set floatX = float32 in the [global] section of Theano config (usually ~/.theanorc). Alternatively you could prepend THEANO_FLAGS=floatX=float32 to the python commands below.
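For example, a minimal ~/.theanorc:

[global]
floatX = float32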

  3. Clone this repository, e.g.:

git clone https://github.com/openai/iaf.git
  4. Download the CIFAR-10 dataset (get the Python version) and create an environment variable CIFAR10_PATH that points to the subdirectory with the CIFAR-10 data. For example:
export CIFAR10_PATH="$HOME/cifar-10"
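For reference, the Python version of the dataset can be fetched and unpacked like this (the tarball extracts into cifar-10-batches-py/; point CIFAR10_PATH at the directory holding the data batches):

curl -O https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
tar -xzf cifar-10-python.tar.gz
export CIFAR10_PATH="$PWD/cifar-10-batches-py"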

Syntax of train.py

Example:

python train.py with problem=cifar10 n_z=32 n_h=64 depths=[2,2,2] margs.depth_ar=1 margs.posterior=down_iaf2_NL margs.kl_min=0.25

problem is the dataset to train on; I have only tested cifar10 for this release.

n_z is the number of stochastic featuremaps in each layer.

n_h is the number of deterministic featuremaps used throughout the model.

depths is an array of integers that denotes the depths of the levels in the model. Each level is a sequence of layers. Each subsequent level operates over spatially smaller featuremaps. In case of CIFAR-10, the first level operates over 16x16 featuremaps, the second over 8x8 featuremaps, etc.

Some possible choices for margs.posterior are:

  • up_diag: bottom-up factorized Gaussian
  • up_iaf1_nl: bottom-up IAF, mean-only perturbation
  • up_iaf2_nl: bottom-up IAF
  • down_diag: top-down factorized Gaussian
  • down_iaf1_nl: top-down IAF, mean-only perturbation
  • down_iaf2_nl: top-down IAF

margs.depth_ar is the number of hidden layers within IAF, and can be any non-negative integer.

margs.kl_min: the minimum information constraint. Should be a non-negative float (where 0 is no constraint).
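This is the "free bits" objective from the paper: each featuremap's KL term is clamped from below at kl_min, so the optimizer gains nothing by collapsing a unit further. A minimal NumPy sketch of the idea (not the repo's code):

import numpy as np

def free_bits_kl(kl_per_featuremap, kl_min):
    # Clamp each featuremap's KL contribution from below at kl_min, then sum.
    # With kl_min = 0 this reduces to the ordinary KL term.
    return np.maximum(kl_per_featuremap, kl_min).sum()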

Results of Table 3

(3.28 bits/dim)

python train.py with problem=cifar10 n_h=160 depths=[10,10] margs.depth_ar=2 margs.posterior=down_iaf2_nl margs.prior=diag margs.kl_min=0.25

More instructions will follow.

Multi-GPU TensorFlow implementation

Prerequisites

Make sure that recent versions of the following are installed:

  • Python (version 2.7 or higher)
  • TensorFlow
  • tqdm

The CIFAR10_PATH environment variable should point to the dataset location.

Syntax of tf_train.py

Training script:

python tf_train.py --logdir <logdir> --hpconfig depth=1,num_blocks=20,kl_min=0.1,learning_rate=0.002,batch_size=32 --num_gpus 8 --mode train

It will run the training procedure on the given number of GPUs. Model checkpoints will be stored in the <logdir>/train directory along with TensorBoard summaries that are useful for monitoring and debugging issues.

Evaluation script:

python tf_train.py --logdir <logdir> --hpconfig depth=1,num_blocks=20,kl_min=0.1,learning_rate=0.002,batch_size=32 --num_gpus 1 --mode eval_test

It will run the evaluation on the test set using a single GPU and produce a TensorBoard summary with the results and generated samples.

To start TensorBoard:

tensorboard --logdir <logdir>

For the description of hyper-parameters, take a look at get_default_hparams function in tf_train.py.
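As a rough illustration of the pattern (a hypothetical sketch, not the repo's code; the fields shown are just the ones exercised by the --hpconfig strings above, and the real defaults live in tf_train.py):

import tensorflow as tf

def get_default_hparams():
    # Hypothetical defaults for illustration; see tf_train.py for the real ones.
    return tf.contrib.training.HParams(
        depth=1, num_blocks=20, kl_min=0.1,
        learning_rate=0.002, batch_size=32)

hps = get_default_hparams()
hps.parse("depth=1,num_blocks=20,kl_min=0.1")  # how an --hpconfig string is applied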

Loading from the checkpoint

The best IAF model trained on CIFAR-10 reached 3.15 bits/dim when evaluated with a single sample. With 10,000 samples, the estimation of log likelihood is 3.111 bits/dim. The checkpoint is available at link. Steps to use it:

  • download the file
  • create directory <logdir>/train/ and copy the checkpoint there
  • run the following command:
python tf_train.py --logdir <logdir> --hpconfig depth=1,num_blocks=20,kl_min=0.1,learning_rate=0.002,batch_size=32 --num_gpus 1 --mode eval_test

The script will run the evaluation on the test set and generate samples, stored in a TensorFlow events file that can be accessed using TensorBoard.

iaf's People

Contributors

christopherhesse, dpkingma, pukkapies, timsalimans


iaf's Issues

Possible bug in the source

On lines 373 and 374 of models.py, you have "if posterior_conv3 != None: modules.append(posterior_conv4)", which looks strongly like a bug based on the surrounding context. I might be mistaken, since I have only started to look at the source, but I wanted to make you aware in case it is a bug. The context of these lines is below:

361:    def postup(updates, w):
362-        modules = [up_conv1,up_conv2,down_conv1,down_conv2]
363-        if downsample and downsample_type == 'conv':
364-            modules += [up_conv3,down_conv3]
365-        if prior_conv1 != None:
366-            modules.append(prior_conv1)
367-        if posterior_conv1 != None:
368-            modules.append(posterior_conv1)
369-        if posterior_conv2 != None:
370-            modules.append(posterior_conv2)
371-        if posterior_conv3 != None:
372-            modules.append(posterior_conv3)
373-        if posterior_conv3 != None:
374-            modules.append(posterior_conv4)
375-        for m in modules:
376-            updates = m.postup(updates, w)
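Presumably the check on line 373 was meant to test posterior_conv4 (an assumption based on the surrounding pattern, not a confirmed fix):

373:        if posterior_conv4 != None:
374:            modules.append(posterior_conv4)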

Small bug in tf_utils/layers.py

In line 70 I think there is a small mistake; it should be:
int(input_shape[2] * strides[2]), int(input_shape[3] * strides[3])]
It's not a major bug, since a mismatched output shape would throw an error; I think it currently runs OK only because the input is square.

#8 causes NaNs almost immediately during training

If I run the TensorFlow version of this code (tf_train.py) with #8 applied, I get a NaN within the first few iterations and training stops. If I remove that change, training proceeds fine. @pukkapies were you ever able to get the model training appropriately with your changes applied? If so, what hyperparameter settings were you using?

Update to support TF 1.1+

After some fixes to the summary & split calls (they were refactored in TF 1.0), I still can't get this code to work:

(.venv) ➜  iaf git:(master) ✗ CIFAR10_PATH="./CIFAR10" optirun -b primus python tf_train.py --logdir ./logs --hpconfig depth=1,num_blocks=20,kl_min=0.1,learning_rate=0.002,batch_size=32 --num_gpus 1 --mode train
                      
Traceback (most recent call last):
  File "tf_train.py", line 397, in <module>
    tf.app.run()
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "tf_train.py", line 391, in main
    run(hps)
  File "tf_train.py", line 237, in run
    model = CVAE1(hps, "train", x)
  File "tf_train.py", line 152, in __init__
    self.train_op = opt.apply_gradients(grad, global_step=self.global_step)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
    self._create_slots([_get_variable_for(v) for v in var_list])
  File "/home/jramapuram/Dropbox/projects/iaf/tf_utils/adamax.py", line 37, in _create_slots
    self._zeros_slot(v, "m", self._name)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
    named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
    dtype)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
    validate_shape=validate_shape)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable model/model/dec_log_stdv/Adamax/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

The same result occurs utilizing opt = tf.train.AdamOptimizer(hps.learning_rate):

(.venv) ➜  iaf git:(master) ✗ CIFAR10_PATH="./CIFAR10" optirun -b primus python tf_train.py --logdir ./logs --hpconfig depth=1,num_blocks=20,kl_min=0.1,learning_rate=0.002,batch_size=32 --num_gpus 1 --mode train                                                                                                                                                                      
                      
Traceback (most recent call last):
  File "tf_train.py", line 397, in <module>
    tf.app.run()
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "tf_train.py", line 391, in main
    run(hps)
  File "tf_train.py", line 237, in run
    model = CVAE1(hps, "train", x)
  File "tf_train.py", line 152, in __init__
    self.train_op = opt.apply_gradients(grad, global_step=self.global_step)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
    self._create_slots([_get_variable_for(v) for v in var_list])
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/adam.py", line 122, in _create_slots
    self._zeros_slot(v, "m", self._name)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
    named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
    dtype)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
    validate_shape=validate_shape)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable model/model/dec_log_stdv/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

I tried setting reuse=None to no avail. I'm probably missing something stupid here.
Here is my fork with the changes: https://github.com/jramapuram/iaf/tree/hotfix/tf1.0

RuntimeError: curand error generating random normals 102

Using gpu device 0: GeForce GTX 980 (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5105)
/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
warnings.warn(warn)
[graphy] floatX = float32
Logpath: /media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/log
WARNING - Deep VAE - No observers have been added to this run
INFO - Deep VAE - Running command 'train'
INFO - Deep VAE - Started
CVAE1 with {'depths': [2, 2, 2], 'nl': u'elu', 'n_h2': 64, 'n_z': 32, 'shape_x': [3, 32, 32], 'optim': u'adamax', 'weightsharing': False, 'px': u'logistic', 'kernel_x': [5, 5], 'n_h1': 64, 'prior': u'diag', 'posterior': u'down_iaf2_nl', 'pad_x': 0, 'beta2': 0.001, 'beta1': 0.1, 'depth_ar': 1, 'alpha': 0.002, 'kl_min': 0.25, 'downsample_type': u'nn', 'kernel_h': [3, 3]}
ERROR - Deep VAE - Failed after 0:00:01!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/train.py", line 185, in train
    model = construct_model(data_init)
  File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/train.py", line 128, in construct_model
    model = models.cvae1(**margs)
  File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/models.py", line 543, in cvae1
    f_encode_decode(w)
  File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/models.py", line 446, in f_encode_decode
    h = layers[i][j].up(h, w)
  File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/models.py", line 144, in up
    qz[0] = N.rand.gaussian_diag(qz_mean, 2*qz_logsd)
  File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/graphy/nodes/rand.py", line 81, in gaussian_diag
    eps = G.rng_curand.normal(size=mean.shape)
  File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/sandbox/cuda/rng_curand.py", line 368, in normal
    self.next_seed())
  File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/sandbox/cuda/rng_curand.py", line 108, in new_auto_update
    o_gen, sample = self(generator, cast(v_size, 'int32'))
  File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/gof/op.py", line 668, in __call__
    required = thunk()
  File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/gof/op.py", line 883, in rval
    fill_storage()
  File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/gof/cc.py", line 1707, in __call__
    reraise(exc_type, exc_value, exc_trace)
  File "<string>", line 2, in reraise
RuntimeError: curand error generating random normals 102
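For what it's worth, curand status 102 is CURAND_STATUS_ALLOCATION_FAILED; since CNMeM is pre-allocating 80% of GPU memory here, one plausible (unverified) mitigation is to lower the reservation in ~/.theanorc:

[lib]
cnmem = 0.5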

Memory space is increasing

I am executing tf_train.py (num_gpus=1). In the forward function, with its two nested for loops, it runs fine for i=0,j=0 and i=0,j=1, but it continuously consumes huge amounts of memory. At i=1,j=0 the loop calls sub_layer.up, which in turn calls conv2d; at that point execution becomes very slow and memory usage keeps increasing at line 49 of layers.py in the conv2d function. Could anyone please help me resolve this increasing memory issue?

Command Line for MNIST

What command was passed to the Lasagne implementation for the MNIST experiment in the paper? In particular, what option modifies the fully connected layer to have 450 neurons?

errors upon running the project

Hi, I was trying to test the project implementation but I'm running into some errors with the following command: python train.py with problem=cifar10 n_z=32 n_h=64 depths=[2,2,2] margs.depth_ar=1 margs.posterior=down_iaf2_NL margs.kl_min=0.25

[graphy] floatX = float32
Traceback (most recent call last):
  File "train.py", line 1, in <module>
    import graphy as G
  File "/home/user/projects/python/theano/iaf/graphy/__init__.py", line 45, in <module>
    import misc.data
  File "/home/user/projects/python/theano/iaf/graphy/misc/data.py", line 6, in <module>
    basepath = os.environ['ML_DATA_PATH']
  File "/home/user/anaconda2/lib/python2.7/UserDict.py", line 40, in __getitem__
    raise KeyError(key)
KeyError: 'ML_DATA_PATH'

Any suggestions much appreciated!
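A likely workaround (inferred from the traceback, not a documented fix): graphy/misc/data.py reads an ML_DATA_PATH environment variable at import time, so setting it to a data directory should get past the KeyError, e.g.:

export ML_DATA_PATH="$HOME/ml-data"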

Constant variance for the generating network of autoencoder

Why are we using a constant variance for the generating network of the autoencoder, instead of learning it from the network itself like the mean? What advantage does this have over a learnable variance? This is done in models.py at lines 473 and 681:

mean_x = T.clip(output+.5, 0+1/512., 1-1/512.)
logsd_x = 0*mean_x + w['logsd_x']

10 Python 3 syntax errors

flake8 testing of https://github.com/openai/iaf on Python 3.6.3

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./models.py:339:49: E999 SyntaxError: invalid syntax
            print "TODO: SAMPLES FROM MADE PRIOR"
                                                ^
./train.py:5:29: E999 SyntaxError: invalid syntax
from __builtin__ import False
                            ^
./graphy/__init__.py:15:26: E999 SyntaxError: invalid syntax
print '[graphy] floatX = '+floatX
                         ^
./graphy/function.py:16:44: E999 SyntaxError: invalid syntax
                print '*** NaN detected ***'
                                           ^
./graphy/ndict.py:157:19: E999 SyntaxError: invalid syntax
            print d.keys()
                  ^
./graphy/misc/data.py:49:33: E999 SyntaxError: invalid syntax
        print "Full training set"
                                ^
./graphy/misc/optim.py:10:15: E999 SyntaxError: invalid syntax
    print 'SGD', 'alpha:',alpha
              ^
./graphy/nodes/__init__.py:72:68: E999 SyntaxError: invalid syntax
        print 'WARNING: constant rescale, these weights arent saved'
                                                                   ^
./graphy/nodes/ar.py:61:68: E999 SyntaxError: invalid syntax
        print 'WARNING: constant rescale, these weights arent saved'
                                                                   ^
./graphy/nodes/conv.py:87:115: E999 SyntaxError: invalid syntax
        print 'new code, requires that the minibatch size "x.tag.test_value.shape[0]" is the same during execution' 
                                                                                                                  ^
10    E999 SyntaxError: invalid syntax
10
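Nine of the ten are Python 2 print statements, which 2to3 can rewrite mechanically; the from __builtin__ import False in train.py would still need deleting by hand, since False is a keyword in Python 3:

$ 2to3 -w .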

Incorrect initialisation in tf_utils/layers.py

In line 93 of the above file, it looks like the normalisation is performed over the out_channels instead of the in_channels. I think it should instead be:
v_norm = tf.nn.l2_normalize(v, [0, 1, 3])

conv2d running very slowly

Hi,

I am running the tf_train.py and tf_utils code out of the box. Our TensorFlow version is 1.3.0 and the GPU is a GeForce GTX TITAN X. The conv2d function in tf_utils/layers.py is running very slowly. Specifically, the following two lines in conv2d take a long time:
_ = tf.get_variable("g", initializer=tf.log(scale_init) / 3.0)
_ = tf.get_variable("b", initializer=-m_init * scale_init)

I think due to lazy evaluation, what is actually taking time is this line:
m_init, v_init = tf.nn.moments(x_init, [0, 2, 3])
as both m_init and scale_init depend on the moments.

When running conv2d, nvidia-smi shows 'No running processes found' and 'GPU-Util Compute M.' is 0%. The CPU utilization is ~ 95%, which means it isn't exploiting the multi-core CPU architecture either. I wonder how I can speed it up.

Thank you!
