
brain-research / acai


Code for "Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer"

License: Apache License 2.0

Languages: Python 71.60%, Jupyter Notebook 19.81%, Shell 8.59%

acai's Introduction

Adversarially Constrained Autoencoder Interpolations (ACAI)

Code for the paper "Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer" by David Berthelot, Colin Raffel, Aurko Roy, and Ian Goodfellow.

This is not an officially supported Google product.

Setup

Configure with virtualenv

sudo apt install virtualenv

cd <path_to_code>
virtualenv --system-site-packages env2
. env2/bin/activate
pip install -r requirements.txt

Configure environment variables

Choose a folder in which to save the datasets, for example ~/Data:

export AE_DATA=~/Data

Installing datasets

python create_datasets.py

Training

CUDA_VISIBLE_DEVICES=0 python acai.py \
--train_dir=TEMP \
--latent=16 --latent_width=2 --depth=16 --dataset=celeba32

All training commands from the paper can be found in the runs folder.

Models

These are the maintained models:

  • aae.py
  • acai.py
  • baseline.py
  • denoising.py
  • dropout.py
  • vae.py
  • vqvae.py

Classifiers / clustering

  • classifier_fc.py: fully connected single layer from raw pixels, see runs/classify.sh for examples.
  • Auto-encoder classification is trained at the same time as the auto-encoder.
  • cluster.py: K-means clustering, see runs/cluster.sh for examples.

Utilities

  • create_datasets.py: see Installing datasets for more info.

Unofficial implementations

  • Kyle McDonald created a PyTorch version of ACAI here.

acai's People

Contributors

craffel


acai's Issues

equation 2 lambda

Sorry, this is more of a question than a bug. I'm trying to understand the connection between the paper and the code.

In equation 2 you refer to a lambda term, but as far as I can tell it does not appear in the code (lambda = 1). Is that correct?

[screenshot of equation 2 from the paper]

Edit: sorry, I found it... it's just a few lines down as advweight: https://github.com/brain-research/acai/blob/master/acai.py#L89
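For readers who hit the same question, here is a minimal sketch of how such a coefficient enters the objective (the variable names below are illustrative stand-ins, not copied from acai.py): advweight plays the role of lambda from equation 2, and setting it to 1 recovers the unweighted sum the question describes.

    advweight = 0.5  # lambda from equation 2
    loss_rec = 0.12  # stand-in for the reconstruction term ||x - g(f(x))||^2
    loss_adv = 0.03  # stand-in for the critic term on interpolants
    loss = loss_rec + advweight * loss_adv
    print(loss)  # 0.135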

Why so many xxx<<10?

Hi, dear author:
I notice you use a lot of xxx << 10 in your source, as in:

            summary_hook = tf.train.SummarySaverHook(
                save_steps=(report_kimg << 10) // batch_size,
                output_dir=self.summary_dir,
                summary_op=tf.summary.merge_all())
            stop_hook = tf.train.StopAtStepHook(last_step=1 + (FLAGS.total_kimg << 10) // batch_size)
            report_hook = utils.HookReport(report_kimg << 10, batch_size)

I don't understand this, because both report_kimg (1 << 6) and FLAGS.total_kimg (1 << 14) are already given as real numbers.
Why do you enlarge them by << 10 once again?
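For reference, x << 10 is x * 1024. A plausible reading (an assumption, not an author reply): the *_kimg flags count images in units of 1024 ("kimg"), so << 10 converts kimg to a raw image count, which is then divided by the batch size to get a step count. A sketch of the arithmetic:

    # Sketch, assuming "kimg" means 1024 images.
    report_kimg = 1 << 6   # 64 kimg = 64 * 1024 images
    total_kimg = 1 << 14   # 16384 kimg
    batch_size = 64

    save_steps = (report_kimg << 10) // batch_size    # 1024 steps between summaries
    last_step = 1 + (total_kimg << 10) // batch_size  # 262145
    print(save_steps, last_step)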

The function to get the embedding of a dataset

I want to use ACAI for pretraining, so I just want to use ACAI to generate embeddings of datasets (such as celeba), and I wonder what the procedure is.

Another question: which parameter, or combination of parameters, sets the dimension of the embedding?

Also, I didn't find runs/classify.sh. Looking forward to your feedback!
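A hedged pointer for anyone with the same question (an inference from the flags, not a confirmed answer): the encoder in this code base appears to produce a tensor of shape [batch, latent_width, latent_width, latent], so the flattened embedding dimension would be latent_width**2 * latent:

    # Sketch, assuming the encoder output shape is
    # [batch, latent_width, latent_width, latent].
    latent, latent_width = 16, 2  # values from the README training example
    embedding_dim = latent_width * latent_width * latent
    print(embedding_dim)  # 64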

error message (FailedPreconditionError) from train.py via latest tensorflow_gpu

Hi,

I tried to run your code on the MNIST dataset on a cluster where I cannot install tensorflow_gpu==1.8 (ImportError: libcublas.so.9.0: cannot open shared object file). So I installed the latest tensorflow_gpu, and then this error message popped up:

2019-07-06 11:50:03.069848: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
I0706 11:50:03.083035 140122140165952 session_manager.py:500] Running local_init_op.
I0706 11:50:03.100641 140122140165952 session_manager.py:502] Done running local_init_op.
I0706 11:50:03.797316 140122140165952 basic_session_run_hooks.py:606] Saving checkpoints for 0 into TRAIN/mnist32/AEBaseline_depth16_latent16_scales3/tf/model.ckpt.
2019-07-06 11:50:04.067226: W tensorflow/core/framework/op_kernel.cc:1479] OP_REQUIRES failed at flat_map_dataset_op.cc:36 : Failed precondition: Could not find required function definition __inference_Dataset_flat_map_read_one_file_11
2019-07-06 11:50:04.067304: E tensorflow/core/common_runtime/executor.cc:641] Executor failed to create kernel. Failed precondition: Could not find required function definition __inference_Dataset_flat_map_read_one_file_11
[[{{node OptimizeDataset/FlatMapDataset}}]]
2019-07-06 11:50:04.067379: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at iterator_ops.cc:973 : Failed precondition: Could not find required function definition __inference_Dataset_flat_map_read_one_file_11
[[{{node OptimizeDataset/FlatMapDataset}}]]
Traceback (most recent call last):
File "/home/wuy/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/wuy/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/wuy/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find required function definition __inference_Dataset_flat_map_read_one_file_11
[[{{node OptimizeDataset/FlatMapDataset}}]]
[[OneShotIterator]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "baseline.py", line 104, in
app.run(main)
File "/home/wuy/.local/lib/python3.6/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/wuy/.local/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "baseline.py", line 94, in main
model.train()
File "/scratch/wuy/acai-master/lib/train.py", line 162, in train
self.train_step(data_in, ops)
File "/scratch/wuy/acai-master/lib/train.py", line 93, in train_step
x = self.tf_sess.run(data)
File "/home/wuy/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/wuy/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/home/wuy/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/home/wuy/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find required function definition __inference_Dataset_flat_map_read_one_file_11
[[{{node OptimizeDataset/FlatMapDataset}}]]
[[OneShotIterator]]

What's the minimum GPU memory size required?

Hi, when I run python create_datasets.py, an OUT OF GPU MEMORY error occurs.
I wonder what the minimum GPU memory size required is.
My own hardware is a GTX 1050 with 4 GB of memory.

Application in MusicVAE

Hey

Could this be applied to MusicVAE? I'm not sure whether this can help with interpolation of sequences of MIDI melodies. Any advice on where in the code I could look for that?

Thanks

Model not converging on 224x224

Dear code author, it's really great to see that you have answered all queries so far. I was wondering if you could kindly help me a bit too.
I am trying to train ACAI on 224x224 images with the following args, but after 50-70 epochs the loss becomes constant, and even after 1000+ epochs the generated images are hazy. I have played with various learning rates, latent sizes, etc., but nothing has helped. Could you please advise how to train on larger (and possibly colored) images with this model?

args = {
    'epochs': 10000,
    'width': 224,
    'latent_width': 10,
    'depth': 16,
    'advdepth': 16,
    'advweight': 0.5,
    'reg': 0.2,
    'latent': 2,
    'colors': 1,
    'lr': 0.01,
    'batch_size': 64,
    'device': 'cuda'
}

Appreciate you giving your time to read this.
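One thing worth checking here (an observation about the TensorFlow implementation in this repo, which may or may not carry over to the port used above): the encoder halves the spatial resolution once per scale, with the number of scales derived from log2(width / latent_width), so width / latent_width should be a power of two. The settings above give 224 / 10 = 22.4, which is not:

    import math

    # Sketch of the constraint, assuming scales = log2(width / latent_width).
    width, latent_width = 224, 10
    print(math.log2(width / latent_width))  # ~4.49, not an integer

    width, latent_width = 256, 16           # a shape that satisfies it
    print(math.log2(width / latent_width))  # 4.0 -> 4 upscaling stages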

Logic of op training order

Dear author:
I found a potential logic problem. In baseline.py, line 58:

with tf.control_dependencies(update_ops):

you train train_op = train_op.minimize(loss + xloss, tf.train.get_global_step()) at the same time. But from my personal perspective, I think you should train the loss op for some iterations and THEN train a single-layer classifier with the xloss op to test classification accuracy.
Do you feel the same way? Thanks.

Question on self.sess and self.tf_sess

Hi, dear author:
I wonder what the difference is between self.sess and self.tf_sess.
I don't understand why you use x = self.tf_sess.run(data) to run the data op while using self.sess.run(ops.train_op, feed_dict={ops.x: x, ops.label: label}) to run ops.train_op.
Could you give me some explanation?

In file lib/train.py:

    @property
    def tf_sess(self):
        return self.sess._tf_sess()

    def train_step(self, data, ops):
        x = self.tf_sess.run(data)
        x, label = x['x'], x['label']
        self.sess.run(ops.train_op, feed_dict={ops.x: x, ops.label: label})
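A plausible explanation (an assumption based on the quoted code, not an author reply): self.sess is a monitored session, whose run() also fires the registered hooks (summary saving, checkpointing, stop conditions), while _tf_sess() exposes the raw tf.Session underneath, whose run() executes only the requested ops. Fetching the input batch through the raw session would then avoid triggering the hooks twice per training step. A toy illustration (TF1 API):

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    global_step = tf.train.get_or_create_global_step()
    inc = tf.assign_add(global_step, 1)

    with tf.train.MonitoredTrainingSession(
            hooks=[tf.train.StopAtStepHook(last_step=10)]) as sess:
        raw = sess._tf_sess()  # the underlying tf.Session
        raw.run(inc)           # runs the op; hooks never see it
        sess.run(inc)          # runs the op AND lets hooks observe the step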

Why is no reuse=True set?

Dear author:
I'm sorry, as this is a TensorFlow technical question, but I would greatly appreciate it if you could answer it.
In lib/train.py, you get a variable named name and put it into a summary. However, my question is: you have not set reuse=True, so could that be a potential problem?

Thanks.

    @staticmethod
    def add_summary_var(name):
        """
        add variable name into summary, and name it 'name' in summary
        :param name: 
        :return: 
        """
        v = tf.get_variable(name, [], trainable=False, initializer=tf.initializers.zeros())
        tf.summary.scalar(name, v)
        return v

You call it without setting reuse=True:

        with tf.Graph().as_default():

            data_in = self.train_data.make_one_shot_iterator().get_next()
            global_step = tf.train.get_or_create_global_step()

            self.latent_accuracy = self.add_summary_var('latent_accuracy')
            self.mean_smoothness = self.add_summary_var('mean_smoothness')
            self.mean_distance = self.add_summary_var('mean_distance')

The code could not be run under TF 2.0

I tried to change the code "data_in = self.train_data.make_one_shot_iterator().get_next()" to "for data_in in self.train_data:", and then the error below occurred:

/content/gdrive/MyDrive/Colab Notebooks/acai-master/lib/train.py:136 train  *
    for data_in in self.train_data:
/usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/operators/control_flow.py:422 for_stmt
    iter_, extra_test, body, get_state, set_state, symbol_names, opts)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/operators/control_flow.py:727 _tf_dataset_for_stmt
    _verify_loop_init_vars(init_vars, symbol_names)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/operators/control_flow.py:191 _verify_loop_init_vars
    raise ValueError(error_msg)

ValueError: 'self.sess' may not be None before the loop.
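A minimal sketch of one way around this, assuming the goal is to run this TF1-style code base under TF 2.x rather than rewrite the input pipeline as a Python for-loop: TF2 ships a compatibility layer that restores graph mode, sessions, and one-shot iterators.

    import tensorflow.compat.v1 as tf

    tf.disable_v2_behavior()  # re-enables graph mode, sessions, iterators

    dataset = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0])
    data_in = tf.data.make_one_shot_iterator(dataset).get_next()

    with tf.Session() as sess:
        print(sess.run(data_in))  # 1.0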

No tf.sigmoid on last layer of layers.decode

Hi, dear author:
I notice you omit the final activation function tf.sigmoid on the last layer of layers.decode. I guess you noticed this and did it on purpose.
Could you give some explanation?

def decoder(x, scales, depth, colors, scope):
    """Upscaling decoder: doubles the spatial resolution once per scale.

    :param x: latent tensor, e.g. [?, 4, 4, 16]
    :param scales: number of 2x upscaling stages
    :param depth: base channel count for the convolutions
    :param colors: number of output channels
    :param scope: variable scope name
    :return: decoded image tensor, e.g. [?, 32, 32, colors]
    """
    # [?, 4, 4, 16]
    activation = tf.nn.leaky_relu
    conv_op = functools.partial(tf.layers.conv2d, padding='same', kernel_initializer=MyInit(0.2))
    y = x
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        for scale in range(scales - 1, -1, -1):
            #          input, filter,    kernel_size
            y = conv_op(y, depth << scale, 3, activation=activation)
            y = conv_op(y, depth << scale, 3, activation=activation)
            y = upscale2d(y, 2)
        y = conv_op(y, depth, 3, activation=activation)
        y = conv_op(y, colors, 3) # [?, 32, 32, 1]
        return y
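A hedged note for anyone with the same question (an observation, not the author's confirmed rationale): leaving the final convolution linear is a common choice, for example when the loss is computed on logits or when inputs are scaled to a symmetric range. If your data lie in [0, 1] and you want bounded reconstructions, one standard pattern is to fold the sigmoid into the loss and apply it only when producing images:

    import tensorflow.compat.v1 as tf

    # Sketch of one common pattern (an assumption, not this repo's confirmed
    # design): keep the decoder's last conv linear ("logits"), use a sigmoid
    # cross-entropy reconstruction loss, and apply tf.sigmoid only for output.
    x = tf.random_uniform([4, 32, 32, 1])      # stand-in for inputs in [0, 1]
    logits = tf.random_normal([4, 32, 32, 1])  # stand-in for the linear decoder output
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=x, logits=logits))
    images = tf.sigmoid(logits)                # bounded images for visualization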
