
importance-sampling's Introduction

Importance Sampling

This Python package provides a library that accelerates the training of arbitrary neural networks created with Keras by using importance sampling.

# Keras imports

from importance_sampling.training import ImportanceTraining

# load_data() and create_keras_model() stand for your own
# data-loading and model-building code
x_train, y_train, x_val, y_val = load_data()
model = create_keras_model()
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)

ImportanceTraining(model).fit(
    x_train, y_train,
    batch_size=32,
    epochs=10,
    verbose=1,
    validation_data=(x_val, y_val)
)

model.evaluate(x_val, y_val)

Importance sampling for Deep Learning is an active research field, and this library is under development, so your mileage may vary.
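A minimal, library-independent sketch of the general idea (not this package's implementation): draw each minibatch with probabilities p_i proportional to a per-sample importance score, and weight each sampled loss by w_i = 1/(N p_i) so that the weighted minibatch mean stays an unbiased estimate of the mean loss over all N samples. The function name `importance_sample` and the toy scores are hypothetical.

```python
import numpy as np


def importance_sample(scores, batch_size, rng):
    """Sample a minibatch with probability proportional to per-sample scores.

    Hypothetical helper. Returns the sampled indices and the weights
    w_i = 1 / (N * p_i), which make the weighted minibatch mean loss an
    unbiased estimate of the mean loss over all N samples.
    Scores are assumed to be strictly positive.
    """
    scores = np.asarray(scores, dtype=np.float64)
    n = len(scores)
    p = scores / scores.sum()                     # sampling distribution
    idxs = rng.choice(n, size=batch_size, replace=True, p=p)
    weights = 1.0 / (n * p[idxs])                 # importance weights
    return idxs, weights


# Usage: per-sample losses and (hypothetical) importance scores for N = 4 samples
losses = np.array([0.1, 0.5, 2.0, 0.4])
scores = np.array([1.0, 1.0, 2.0, 4.0])
idxs, weights = importance_sample(scores, batch_size=2, rng=np.random.default_rng(0))
# Weighted minibatch mean: in expectation equal to losses.mean()
estimate = np.mean(weights * losses[idxs])
```

Sampling with replacement keeps the weights simple; the closer the scores track each sample's contribution, the lower the variance of the estimate.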

Relevant Research

Ours

  • Not All Samples Are Created Equal: Deep Learning with Importance Sampling [preprint]
  • Biased Importance Sampling for Deep Neural Network Training [preprint]

By others

  • Stochastic optimization with importance sampling for regularized loss minimization [pdf]
  • Variance reduction in SGD by distributed importance sampling [pdf]

Dependencies & Installation

Normally, if you already have a functional Keras installation, you just need to run pip install keras-importance-sampling.

  • Keras > 2
  • A Keras backend: TensorFlow, Theano, or CNTK
  • blinker
  • numpy
  • matplotlib, seaborn, scikit-learn are optional (used by the plot scripts)

Documentation

The module has a dedicated documentation site, but you can also read the source code and the examples to get an idea of how the library should be used and extended.

Examples

In the examples folder you can find some Keras examples that have been edited to use importance sampling.

Code examples

In this section we will showcase part of the API that can be used to train neural networks with importance sampling.

# Import what is needed to build the Keras model
from keras import backend as K
from keras.layers import Dense, Activation, Flatten
from keras.models import Sequential

# Import a toy dataset and the importance training
from importance_sampling.datasets import MNIST
from importance_sampling.training import ImportanceTraining


def create_nn():
    """Build a simple fully connected NN"""
    model = Sequential([
        Flatten(input_shape=(28, 28, 1)),
        Dense(40, activation="tanh"),
        Dense(40, activation="tanh"),
        Dense(10),
        Activation("softmax") # Needs to be separate to automatically
                              # get the preactivation outputs
    ])

    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"]
    )

    return model


if __name__ == "__main__":
    # Load the data
    dataset = MNIST()
    x_train, y_train = dataset.train_data[:]
    x_test, y_test = dataset.test_data[:]

    # Create the NN and keep the initial weights
    model = create_nn()
    weights = model.get_weights()

    # Train with uniform sampling
    K.set_value(model.optimizer.lr, 0.01)
    model.fit(
        x_train, y_train,
        batch_size=64, epochs=10,
        validation_data=(x_test, y_test)
    )

    # Train with importance sampling
    model.set_weights(weights)
    K.set_value(model.optimizer.lr, 0.01)
    ImportanceTraining(model).fit(
        x_train, y_train,
        batch_size=64, epochs=2,
        validation_data=(x_test, y_test)
    )

Using the script

The following terminal commands train a small VGG-like network to ~0.65% error on MNIST (the timings are from a CPU):

$ # Train a small cnn with mnist for 500 mini-batches using importance
$ # sampling with bias to achieve ~ 0.65% error (on the CPU).
$ time ./importance_sampling.py \
>   small_cnn \
>   oracle-gnorm \
>   model \
>   predicted \
>   mnist \
>   /tmp/is \
>   --hyperparams 'batch_size=i128;lr=f0.003;lr_reductions=I10000' \
>   --train_for 500 --validate_every 500
real    1m41.985s
user    8m14.400s
sys     0m35.900s
$
$ # And with uniform sampling to achieve ~ 0.9% error.
$ time ./importance_sampling.py \
>   small_cnn \
>   oracle-loss \
>   uniform \
>   unweighted \
>   mnist \
>   /tmp/uniform \
>   --hyperparams 'batch_size=i128;lr=f0.003;lr_reductions=I10000' \
>   --train_for 3000 --validate_every 3000
real    9m23.971s
user    47m32.600s
sys     3m4.188s

importance-sampling's People

Contributors

angeloskath


importance-sampling's Issues

Error when running mnist example

I tried to run the mnist example and got the following error. It seems to be related to transparent_keras.
I've installed transparent-keras 0.2. How can I fix that?

transparent_keras/transparent_model.pyc in _get_train_updates(self)
77 updates += model.optimizer.get_updates(
78 model._collected_trainable_weights,
---> 79 model.constraints,
80 model.total_loss
81 )

AttributeError: 'Model' object has no attribute 'constraints'

Compatibility with tensorflow = 2.5.0 and Keras = 2.4

I tried to run the example code below and received the following error:

model = tf.keras.Sequential(layers=tf.keras.layers.Dense(units=2), name="Linear")
model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer="adam")
ImportanceTraining(model).fit(
    x_train, y_train,
    batch_size=32, epochs=10, verbose=1,
    validation_data=(x_val, y_val)
)

The error is below:

---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
in
----> 1 ImportanceTraining(model).fit(
2 x_train, y_train,
3 batch_size=32,
4 epochs=10,
5 verbose=1,validation_data=(x_val, y_val)

~/myvenv/mykears3.9/lib/python3.9/site-packages/importance_sampling/training.py in __init__(self, model, presample, tau_th, forward_batch_size, score, layer)
368
369 # Call the parent to wrap the model
--> 370 super(ImportanceTraining, self).__init__(model, score, layer)
371
372 def sampler(self, dataset, batch_size, steps_per_epoch, epochs):

~/myvenv/mykears3.9/lib/python3.9/site-packages/importance_sampling/training.py in __init__(self, model, score, layer)
338 self._reweighting = BiasedReweightingPolicy(1.0) # no bias
339
--> 340 super(_UnbiasedImportanceTraining, self).__init__(model, score, layer)
341
342 @property

~/myvenv/mykears3.9/lib/python3.9/site-packages/importance_sampling/training.py in __init__(self, model, score, layer)
33 self._check_model(model)
34 self.original_model = model
---> 35 self.model = OracleWrapper(
36 model,
37 self.reweighting,

~/myvenv/mykears3.9/lib/python3.9/site-packages/importance_sampling/model_wrappers.py in __init__(self, model, reweighting, score, layer)
167 # Augment the model with reweighting, scoring etc
168 # Save the new model and the training functions in member variables
--> 169 self._augment_model(model, score, reweighting)
170
171 def _gnorm_layer(self, model, layer):

~/myvenv/mykears3.9/lib/python3.9/site-packages/importance_sampling/model_wrappers.py in _augment_model(self, model, score, reweighting)
203 loss = model.loss
204 optimizer = model.optimizer.__class__(**model.optimizer.get_config())
--> 205 output_shape = model.get_output_shape_at(0)[1:]
206 if isinstance(loss, str) and loss.startswith("sparse"):
207 output_shape = output_shape[:-1] + (1,)

~/myvenv/mykears3.9/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py in get_output_shape_at(self, node_index)
1986 RuntimeError: If called in Eager mode.
1987 """
-> 1988 return self._get_node_attribute_at_index(node_index, 'output_shapes',
1989 'output shape')
1990

~/myvenv/mykears3.9/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py in _get_node_attribute_at_index(self, node_index, attr, attr_name)
2582 """
2583 if not self._inbound_nodes:
-> 2584 raise RuntimeError('The layer has never been called '
2585 'and thus has no defined ' + attr_name + '.')
2586 if not len(self._inbound_nodes) > node_index:

RuntimeError: The layer has never been called and thus has no defined output shape.

How do I get the serial numbers of important samples?

Hi,
I read examples/mnist_cnn. The algorithm judges whether the current sample is important and decides whether or not to use importance sampling. How do I get the serial numbers of the important samples? Where should I add a print() call?
Thanks.

First impressions

Hi, thanks for this cool package. I enjoyed reading your paper.

I noticed that Keras >= 2.0.8 is required so that the clone_model method is available.

Also, when running your Quick start code, the model doesn't learn anything 😕
I get an accuracy of 0.09 in all epochs, which is roughly chance level. I don't have an explanation at the moment. I get the same result whether or not I use your ImportanceTraining class, so perhaps it isn't related to your code specifically.

ModuleNotFoundError: No module named 'training'

@angeloskath I am trying to use the importance sampling and am getting the following error:

\AppData\Local\Continuum\Anaconda3\envs\tensorflow-gpu\lib\site-packages\importance_sampling\__init__.py

in <module>()
14 __version__ = "0.2"
15
---> 16 from training import ImportanceTraining, ApproximateImportanceTraining

ModuleNotFoundError: No module named 'training'

I installed it in my conda environment using "pip install keras-importance-sampling". Almost all requirements were already met; it only had to install blinker and transparent-keras.

Any idea what might be going wrong? I am using Windows; could that be a compatibility issue?
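The failing line 16 of `__init__.py`, `from training import ...`, is a Python 2 style implicit relative import. Python 3 no longer resolves those inside a package; it needs the explicit form `from .training import ...`. This suggests the installed 0.2 release expects Python 2, rather than a Windows problem. A minimal sketch that reproduces both behaviours with a throwaway package (`demo_pkg` and `import_demo_package` are hypothetical names, not part of the library):

```python
import importlib
import os
import sys
import tempfile


def import_demo_package(init_source):
    """Build a throwaway package 'demo_pkg' containing training.py, then try
    to import it using the given __init__.py source."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "demo_pkg")
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "training.py"), "w") as f:
            f.write("class ImportanceTraining:\n    pass\n")
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write(init_source)

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_pkg")
            return "ok"
        except ModuleNotFoundError as exc:
            return str(exc)
        finally:
            sys.path.remove(root)
            # Forget the package so the next call starts from scratch
            for name in ("demo_pkg", "demo_pkg.training"):
                sys.modules.pop(name, None)


# Python 2 style implicit relative import: fails on Python 3
print(import_demo_package("from training import ImportanceTraining\n"))
# Explicit relative import: works on Python 3
print(import_demo_package("from .training import ImportanceTraining\n"))
```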

A confusion when reading your paper

Hi,

Thanks for your awesome work!

For Figure 3 in your paper, the test error curve of 'uniform' is below that of Loshchilov & Hutter (2015). However, in your analysis in Section 4.2, you said 'The results are depicted in figure 3. We observe that in the relatively easy CIFAR10 dataset, all methods can provide some speedup over uniform sampling'. This seems to conflict with Figure 3.

For Figure 6, the 'uniform' test error curve almost matches your 'upper-bound' curve, and it beats the other compared methods.

I am a little confused. 'Uniform' refers to the basic routine of training neural networks (sampling mini-batches uniformly from the data). Why does it beat the other compared methods? I think the other methods are also designed to improve on this 'uniform' baseline, but the results show the contrary. Could you please give some explanation?

Adjust Learning rate

I tried to use LearningRateScheduler to adjust the learning rate when running ImportanceTraining(model).fit, but the actual learning rate seems to stay unchanged during the whole training process and equals the lr in optimizer=SGD(lr). Could you please show me how to adjust the learning rate correctly? I use tf=1.4.0 and keras=2.2.0.
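One plausible cause, offered as an assumption rather than a confirmed diagnosis: the traceback in the TensorFlow 2.5 issue on this page shows the wrapper rebuilding the optimizer with `model.optimizer.__class__(**model.optimizer.get_config())`. If the version in use does the same, a scheduler that updates the original model's optimizer would never reach the copy that actually trains. The toy class below (`ToyOptimizer` is hypothetical, not a Keras class) only illustrates that a copy built from a config does not follow later changes to the original:

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer exposing get_config()."""

    def __init__(self, lr=0.01):
        self.lr = lr

    def get_config(self):
        # Return the hyperparameters as a plain dict, in the style of Keras optimizers
        return {"lr": self.lr}


original = ToyOptimizer(lr=0.1)
# Rebuild a second optimizer from the first one's config, mirroring
# model.optimizer.__class__(**model.optimizer.get_config())
rebuilt = original.__class__(**original.get_config())

# A scheduler that only knows about `original` changes its lr ...
original.lr = 0.01
# ... but the rebuilt optimizer keeps the value it was created with
print(original.lr, rebuilt.lr)  # 0.01 0.1
```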

Port to tensorflow 2

Running this module using tensorflow 2.0.0 yields an error:
AttributeError: module 'tensorflow' has no attribute 'get_default_graph'

Are there any plans to port the library to tensorflow 2.0.0?

Perhaps all you have to do is change import keras to from tensorflow import keras and perhaps use fully qualified names when using layers...

Would it be possible to calculate b from tau_threshold?

Hi,

I'm trying to adapt your method to my deep learning problem. I noticed that you set b as a hard-coded parameter. Would it actually be possible to solve for b from the inequality tau > (B+3b)/(3b) and use that? I tried this, but it seems to perform worse than the hard-coded setting. Do you have any idea why this might fail?

best, Juuso
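Rearranging the quoted inequality, with B and b as in the question: since (B+3b)/(3b) = 1 + B/(3b), the condition tau > (B+3b)/(3b) is equivalent to b > B/(3(tau - 1)) for tau > 1. It therefore only gives a lower bound on b, not a single value. A small sketch (the helper name is hypothetical) that returns the smallest integer b satisfying it:

```python
import math


def min_batch_size_for_tau(B, tau):
    """Smallest integer b with tau > (B + 3b) / (3b), i.e. b > B / (3 (tau - 1))."""
    if tau <= 1:
        # (B + 3b) / (3b) = 1 + B / (3b) > 1, so no b works when tau <= 1
        raise ValueError("tau must be greater than 1")
    b = math.floor(B / (3 * (tau - 1))) + 1  # strict inequality
    # Guard against floating-point rounding right at the boundary
    while not tau > (B + 3 * b) / (3 * b):
        b += 1
    while b > 1 and tau > (B + 3 * (b - 1)) / (3 * (b - 1)):
        b -= 1
    return b


print(min_batch_size_for_tau(300, 2.0))  # 101
```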

TypeError: _standardize_user_data() got an unexpected keyword argument 'check_batch_axis'

I trained one of the example models as shown below but got the following error:

TypeError: _standardize_user_data() got an unexpected keyword argument 'check_batch_axis'

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

from importance_sampling.training import ImportanceTraining

batch_size = 128
num_classes = 10
epochs = 3

# the data, shuffled and split between train and test sets

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = ImportanceTraining(model, forward_batch_size=1024).fit(
    x_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1
)
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Had this error:

TypeError Traceback (most recent call last)
in ()
47 batch_size=batch_size,
48 epochs=epochs,
---> 49 verbose=1
50 )
51 score = model.evaluate(x_test, y_test, verbose=0)

/content/.local/lib/python3.6/site-packages/importance_sampling/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, steps_per_epoch)
97 steps_per_epoch=steps_per_epoch,
98 verbose=verbose,
---> 99 callbacks=callbacks
100 )
101

/content/.local/lib/python3.6/site-packages/importance_sampling/training.py in fit_dataset(self, dataset, steps_per_epoch, batch_size, epochs, verbose, callbacks)
208
209 # Importance sampling is done here
--> 210 idxs, (x, y), w = sampler.sample(batch_size)
211 # Train on the sampled data
212 loss, metrics, scores = self.model.train_batch(x, y, w)

/content/.local/lib/python3.6/site-packages/importance_sampling/samplers.py in sample(self, batch_size)
60 def sample(self, batch_size):
61 # Get the importance scores of some samples
---> 62 idxs1, scores, xy = self._get_samples_with_scores(batch_size)
63
64 # Sample from the available ones

/content/.local/lib/python3.6/site-packages/importance_sampling/samplers.py in _get_samples_with_scores(self, batch_size)
126 idxs = np.random.choice(self.N, self.large_batch)
127 x, y = self.dataset.train_data[idxs]
--> 128 scores = self.model.score(x, y, batch_size=self.forward_batch_size)
129
130 return (

/content/.local/lib/python3.6/site-packages/importance_sampling/model_wrappers.py in score(self, x, y, batch_size)
93 result = np.hstack([
94 self.score_batch(xi, yi).T
---> 95 for xi, yi in self._iterate_batches(x, y, batch_size)
96 ]).T
97

/content/.local/lib/python3.6/site-packages/importance_sampling/model_wrappers.py in (.0)
93 result = np.hstack([
94 self.score_batch(xi, yi).T
---> 95 for xi, yi in self._iterate_batches(x, y, batch_size)
96 ]).T
97

/content/.local/lib/python3.6/site-packages/importance_sampling/model_wrappers.py in score_batch(self, x, y)
228 dummy_target = np.zeros((y.shape[0], 1))
229 inputs = _tolist(x) + [y, dummy_weights]
--> 230 outputs = self.model.test_on_batch(inputs, dummy_target)
231
232 return outputs[self.SCORE].ravel()

/content/.local/lib/python3.6/site-packages/transparent_keras/transparent_model.py in test_on_batch(self, x, y, sample_weight)
187 x, y,
188 sample_weight=sample_weight,
--> 189 check_batch_axis=True
190 )
191

TypeError: _standardize_user_data() got an unexpected keyword argument 'check_batch_axis'

Support of networks with multiple outputs

Hi!
First thanks for this nice package, really enjoyed the paper too.
Is there an extension planned to use the library for multi-output networks?
I would also like to contribute and have already thought about how to do it. I would be happy to discuss it with you.
Thanks and best regards,
Max

All layer names must be unique

Hi, I tried using importance sampling with a Keras pretrained Xception model, and this error happens. Is this normal?
RuntimeError: ('The name "input_1" is used 2 times in the model. All layer names should be unique. Layer names: ', ['input_1', 'block1_conv1', 'block1_conv1_bn', 'block1_conv1_act', 'block1_conv2', 'block1_conv2_bn', 'block1_conv2_act', 'block2_sepconv1', 'block2_sepconv1_bn', 'block2_sepconv2_act', 'block2_sepconv2', 'block2_sepconv2_bn', 'conv2d_1', 'block2_pool', 'batch_normalization_1', 'add_1', 'block3_sepconv1_act', 'block3_sepconv1', 'block3_sepconv1_bn', 'block3_sepconv2_act', 'block3_sepconv2', 'block3_sepconv2_bn', 'conv2d_2', 'block3_pool', 'batch_normalization_2', 'add_2', 'block4_sepconv1_act', 'block4_sepconv1', 'block4_sepconv1_bn', 'block4_sepconv2_act', 'block4_sepconv2', 'block4_sepconv2_bn', 'conv2d_3', 'block4_pool', 'batch_normalization_3', 'add_3', 'block5_sepconv1_act', 'block5_sepconv1', 'block5_sepconv1_bn', 'block5_sepconv2_act', 'block5_sepconv2', 'block5_sepconv2_bn', 'block5_sepconv3_act', 'block5_sepconv3', 'block5_sepconv3_bn', 'add_4', 'block6_sepconv1_act', 'block6_sepconv1', 'block6_sepconv1_bn', 'block6_sepconv2_act', 'block6_sepconv2', 'block6_sepconv2_bn', 'block6_sepconv3_act', 'block6_sepconv3', 'block6_sepconv3_bn', 'add_5', 'block7_sepconv1_act', 'block7_sepconv1', 'block7_sepconv1_bn', 'block7_sepconv2_act', 'block7_sepconv2', 'block7_sepconv2_bn', 'block7_sepconv3_act', 'block7_sepconv3', 'block7_sepconv3_bn', 'add_6', 'block8_sepconv1_act', 'block8_sepconv1', 'block8_sepconv1_bn', 'block8_sepconv2_act', 'block8_sepconv2', 'block8_sepconv2_bn', 'block8_sepconv3_act', 'block8_sepconv3', 'block8_sepconv3_bn', 'add_7', 'block9_sepconv1_act', 'block9_sepconv1', 'block9_sepconv1_bn', 'block9_sepconv2_act', 'block9_sepconv2', 'block9_sepconv2_bn', 'block9_sepconv3_act', 'block9_sepconv3', 'block9_sepconv3_bn', 'add_8', 'block10_sepconv1_act', 'block10_sepconv1', 'block10_sepconv1_bn', 'block10_sepconv2_act', 'block10_sepconv2', 'block10_sepconv2_bn', 'block10_sepconv3_act', 'block10_sepconv3', 'block10_sepconv3_bn', 'add_9', 
'block11_sepconv1_act', 'block11_sepconv1', 'block11_sepconv1_bn', 'block11_sepconv2_act', 'block11_sepconv2', 'block11_sepconv2_bn', 'block11_sepconv3_act', 'block11_sepconv3', 'block11_sepconv3_bn', 'add_10', 'block12_sepconv1_act', 'block12_sepconv1', 'block12_sepconv1_bn', 'block12_sepconv2_act', 'block12_sepconv2', 'block12_sepconv2_bn', 'block12_sepconv3_act', 'block12_sepconv3', 'block12_sepconv3_bn', 'add_11', 'block13_sepconv1_act', 'block13_sepconv1', 'block13_sepconv1_bn', 'block13_sepconv2_act', 'block13_sepconv2', 'block13_sepconv2_bn', 'conv2d_4', 'block13_pool', 'batch_normalization_4', 'add_12', 'block14_sepconv1', 'block14_sepconv1_bn', 'block14_sepconv1_act', 'block14_sepconv2', 'block14_sepconv2_bn', 'block14_sepconv2_act', 'avg_pool', 'dropout_1', 'input_1', 'predictions', 'loss_layer_2', 'input_2', 'loss_layer_1', 'external_reweighting_1', 'multiply_1']) Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x7f6b32fd88d0>> Traceback (most recent call last):

High GPU memory usage

Hello,

Is it normal that training with ImportanceTraining takes more GPU memory than training without it? If so, how much overhead should be expected?

I am trying to finetune EfficientNetB3 on an RTX 2070 (8GB), and while it normally works fine (up to a batch size of ~60), it is unusable when using importance sampling (even with batch size of 1). Am I doing anything wrong?

It does work fine when training on CPU, and EfficientNetB0 does work on GPU with IS, which leads me to believe it is not a bug in my code...

Multiple inputs with Functional API error

I got this error when running models with multiple inputs built with the functional Model() API in Keras 2.0.6:

/lib/python2.7/site-packages/importance_sampling/training.pyc in __init__(self, model, k, smooth, adaptive_smoothing, presample, forward_batch_size)
263
264 # Call the parent to wrap the model
--> 265 super(ImportanceTraining, self).__init__(model)
266
267 # Create the sampler factory, the workhorse of the whole deal :-)

/lib/python2.7/site-packages/importance_sampling/training.pyc in __init__(self, model)
26 # and can be used in an importance sampling training scheme
27 self.original_model = model
---> 28 self.model = OracleWrapper(model, self.reweighting)
29
30 @property

/lib/python2.7/site-packages/importance_sampling/model_wrappers.pyc in __init__(self, model, reweighting, score)
144
145 def __init__(self, model, reweighting, score="loss"):
--> 146 self.model = self._augment_model(model, score, reweighting)
147 self.reweighting = reweighting
148

/lib/python2.7/site-packages/importance_sampling/model_wrappers.pyc in _augment_model(self, model, score, reweighting)
192 inputs=[model.input, y_true, pred_score],
193 outputs=[weighted_loss],
--> 194 observed_tensors=[loss_tensor, weighted_loss, score_tensor] + metrics
195 )
196 new_model.compile(

/lib/python2.7/site-packages/transparent_keras/transparent_model.pyc in __init__(self, inputs, outputs, observed_tensors, **kwargs)
23 inputs=inputs,
24 outputs=outputs,
---> 25 **kwargs
26 )
27

/keras/engine/topology.pyc in __init__(self, inputs, outputs, name)
1498
1499 # Check for redundancy in inputs.
-> 1500 if len(set(self.inputs)) != len(self.inputs):
1501 raise ValueError('The list of inputs passed to the model '
1502 'is redundant. '

TypeError: unhashable type: 'list'

How to solve it?
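The traceback shows the wrapper building `inputs=[model.input, y_true, pred_score]`. For a multi-input functional model, `model.input` is itself a list, so the inputs become a nested list, and the redundancy check `len(set(self.inputs))` fails because a list is unhashable. A minimal sketch of the problem and of the flattening a fix would need; plain strings stand in for tensors, and `as_list` / `check_unique_inputs` are hypothetical helpers, not a patch to the library:

```python
def as_list(x):
    """Hypothetical helper: wrap a single input in a list, leave lists unchanged."""
    return list(x) if isinstance(x, (list, tuple)) else [x]


def check_unique_inputs(inputs):
    # Mirrors the check in the traceback: len(set(inputs)) != len(inputs)
    return len(set(inputs)) == len(inputs)


# For a multi-input model, model.input is a list of tensors
model_input = ["input_a", "input_b"]

nested = [model_input, "y_true", "pred_score"]   # shape of the inputs in the traceback
try:
    check_unique_inputs(nested)
except TypeError as exc:
    print("nested:", exc)                         # unhashable type: 'list'

flat = as_list(model_input) + ["y_true", "pred_score"]
print("flat:", check_unique_inputs(flat))         # True
```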

Question regarding Eq. 29

Hi,

May I ask how you get the square out of the expectation in Eq. 29 in your appendix?

With a naïve computation, plugging in gi, I get:

sum gi * (wi)^2 * ||Gi||^2 = sum ||Gi||/(B^2)

Could you tell me how to calculate your expectation? Thanks!
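A worked version of the computation, assuming (as the naïve plug-in suggests) that g_i = ||G_i|| / Σ_j ||G_j|| and w_i = 1/(B g_i); these definitions are assumptions and may not match the paper's exact notation. Keeping the normalizer of g_i, the sum turns into a square:

```latex
\sum_i g_i w_i^2 \|G_i\|^2
  = \sum_i g_i \frac{1}{B^2 g_i^2} \|G_i\|^2
  = \frac{1}{B^2} \sum_i \frac{\|G_i\|^2}{g_i}
  = \frac{1}{B^2} \sum_i \|G_i\| \sum_j \|G_j\|
  = \frac{1}{B^2} \Big( \sum_i \|G_i\| \Big)^2
```

The expression in the question corresponds to dropping the factor Σ_j ||G_j|| that comes from normalizing g_i.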

5 x slower in RNN

Just FYI. I haven't read the paper yet, but when I applied ImportanceTraining out of the box to my RNN language model, the training time went from 16 hours to 83 hours... :(
I'll read the paper and the code later.
Cheers

Memory leak?

I am running multiple ImportanceSampling trainings in a row to compare the performance of different models. It looks like the memory is not freed after the training of each model is over (main memory, not video memory).

Here is how it looks after training 4 models (on a system with 100gb of RAM):

[Screenshot of memory usage, 2020-03-12]

If I don't use ImportanceSampling, the memory usage remains stable, and I can train hundreds of successive models in a row.

The result seems okay, but there are a few confusing things

Hi,

I happened to come across the topic of importance sampling and then found your project. Thanks for providing a great project so we can try out this idea.

I modified a very standard MNIST classification script from this link (note that I also dropped the Dropout layer, for a reason I explain later): https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py

The result seems okay; at least, importance sampling seems to reach a good result faster. But it is not so clear whether it improves the final result.

Baseline (note that I dropped the Dropout layer in the baseline as well):
Epoch 1 - val_accuracy: 0.9815
Epoch 2 - val_accuracy: 0.9839
Epoch 3 - val_accuracy: 0.9897
Epoch 4 - val_accuracy: 0.9916
Epoch 5 - val_accuracy: 0.9897

Using Importance Sampling
Epoch 1 - val_accuracy: 0.9828
Epoch 2 - val_accuracy: 0.9893
Epoch 3 - val_accuracy: 0.9874
Epoch 4 - val_accuracy: 0.9902
Epoch 5 - val_accuracy: 0.9897

I wanted to ask three questions:

  1. Did you observe significant improvements when using importance sampling compared to not using it? Or is the main goal of importance sampling rather to reach a good result faster (at least in terms of needing fewer epochs)?
  2. Training with importance sampling seems pretty slow in terms of running time per epoch. Is there any chance to speed this up?
  3. Finally, I noticed that we should be careful when using Dropout and BatchNormalization with importance sampling, as: "Those layers may affect the importance calculations and you are advised to exchange them for LayerNormalization or BatchNormalization in test mode and L2 regularization." Why is this the case?

Many thanks in advance.

Best,


Just in case, here is my code for training with importance sampling:


import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
from importance_sampling.training import ImportanceTraining

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes))
model.add(Activation("softmax"))


model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

ImportanceTraining(model).fit(
    x_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(x_test, y_test)
)
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Some confusion about the fast grads calculation when converting to PyTorch

Hello,

Many thanks for your great efforts. After reading your paper and code, I found that it is a nice and solid piece of work, and I really enjoyed it.

To use this method in my own model training, I am trying to implement it in the PyTorch framework. I noticed that you use the following code to calculate the gradient norm in a fast mode:

if self.fast:
    grads = K.sqrt(sum([
        self._sum_per_sample(K.square(g))
        for g in K.gradients(losses, self.parameter_list)
    ]))

As far as I understand, the expression [self._sum_per_sample(K.square(g)) for g in K.gradients(losses, self.parameter_list)] already computes the squared gradients and sums them per sample. Why not apply K.sqrt() directly to get the gradient norm of each sample, instead of introducing another sum() inside the K.sqrt()?

Besides, I checked the results of sum([self._sum_per_sample(K.square(g)) for g in K.gradients(losses, self.parameter_list)]) and [self._sum_per_sample(K.square(g)) for g in K.gradients(losses, self.parameter_list)] and found that they were equal, which surprised me. And if I remove the sum() inside the K.sqrt(), it raises a data type error. So does this sum() only convert the data type without performing any summation?

I look forward to your reply, and I will share my PyTorch implementation once it is ready.

Best,
Shun Lu
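A NumPy sketch of what the quoted expression computes, assuming each entry of the gradient list has a leading batch axis (which is what `_sum_per_sample` implies); `sum_per_sample` and `per_sample_grad_norm` are hypothetical names. The list comprehension yields one vector of per-sample squared norms per tensor in `parameter_list`, and Python's built-in `sum()` adds those vectors element-wise across tensors before the square root, giving the norm of each sample's full gradient. If `parameter_list` holds a single tensor, `sum([x])` is just `0 + x`, which would explain why the two results matched.

```python
import numpy as np


def sum_per_sample(x):
    """Sum over every axis except the leading batch axis -> shape (batch,)."""
    return x.reshape(x.shape[0], -1).sum(axis=1)


def per_sample_grad_norm(grads):
    """grads: list with one array per tensor, each of shape (batch, ...)."""
    # One (batch,) vector of squared norms per tensor in the list
    squared = [sum_per_sample(np.square(g)) for g in grads]
    # Built-in sum() adds these vectors element-wise across tensors
    return np.sqrt(sum(squared))


batch = 2
grads = [np.ones((batch, 3)), 2.0 * np.ones((batch, 2, 2))]
print(per_sample_grad_norm(grads))  # sqrt(3 + 16) for each sample

# Same as flattening each sample's gradients into one vector and taking its norm
flat = np.concatenate([g.reshape(batch, -1) for g in grads], axis=1)
print(np.linalg.norm(flat, axis=1))

# With a single tensor, sum([v]) == 0 + v == v, so the outer sum() changes nothing
v = sum_per_sample(np.square(grads[0]))
print(np.array_equal(sum([v]), v))  # True
```

In PyTorch the same reduction can be written with per-sample gradients summed over all non-batch dimensions for each tensor and then added across tensors.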

About the complete importance sampling code

Hello, after reading the article "Not All Samples Are Created Equal: Deep Learning with Importance Sampling", I experimented with your importance sampling code. What does the number 937 mean? Do you have the complete code for the MNIST handwritten digit recognition experiment mentioned in the article? I'm looking forward to your reply. Thanks very much.

Failed when using ModelCheckpoint callback

I got an error after the first training epoch when trying to save the current model using the ModelCheckpoint callback:

File "/home/iarganda/anaconda2/envs/env_tfmDani/lib/python3.6/site-packages/keras/callbacks.py", line 79, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/home/iarganda/anaconda2/envs/env_tfmDani/lib/python3.6/site-packages/keras/callbacks.py", line 446, in on_epoch_end
self.model.save(filepath, overwrite=True)
File "/home/iarganda/anaconda2/envs/env_tfmDani/lib/python3.6/site-packages/keras/engine/network.py", line 1090, in save
save_model(self, filepath, overwrite, include_optimizer)
File "/home/iarganda/anaconda2/envs/env_tfmDani/lib/python3.6/site-packages/keras/engine/saving.py", line 382, in save_model
_serialize_model(model, f, include_optimizer)
File "/home/iarganda/anaconda2/envs/env_tfmDani/lib/python3.6/site-packages/keras/engine/saving.py", line 134, in _serialize_model
'loss': model.loss,
AttributeError: 'Model' object has no attribute 'loss'

Any clue what's going on?

Upstream contribution

Hi, I've opened an upstream issue at tensorflow/tensorflow#17509, now that the Keras API is officially part of tf.keras, because I think this is an important topic and it needs to be exposed to a broad audience as an API.

Do you have a plan to contribute an upstream PR to keras or to tf.estimator?

Fail when metric given as function

I have a model that I compile with

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=[matthews_correlation])

and it fails after the first epoch

17/18 [===========================>..] - ETA: 0s - loss: 14.9240 - <function matthews_correlation at 0x7f61611d0c80>: 0.0000e+00
Traceback (most recent call last):
  File "<stdin>", line 19, in <module>
  File "/opt/conda/lib/python3.6/site-packages/importance_sampling/training.py", line 137, in fit
    on_scores=on_scores
  File "/opt/conda/lib/python3.6/site-packages/importance_sampling/training.py", line 289, in fit_dataset
    batch_size=batch_size
  File "/opt/conda/lib/python3.6/site-packages/importance_sampling/model_wrappers.py", line 75, in evaluate
    for xi, yi in self._iterate_batches(x, y, batch_size)
  File "/opt/conda/lib/python3.6/site-packages/importance_sampling/model_wrappers.py", line 75, in <listcomp>
    for xi, yi in self._iterate_batches(x, y, batch_size)
  File "/opt/conda/lib/python3.6/site-packages/importance_sampling/model_wrappers.py", line 298, in evaluate_batch
    print(len(outputs))
  File "/opt/conda/lib/python3.6/site-packages/numpy/core/shape_base.py", line 288, in hstack
    return _nx.concatenate(arrs, 1)
ValueError: all the input arrays must have same number of dimensions

when I run with 'accuracy' as a metric e.g.

model.compile(loss='binary_crossentropy', optimizer='adam',metrics=['accuracy'])

everything is fine.

I've even tried it with a dummy metric

import tensorflow as tf

def test_metric(y_true, y_pred):
    # Dummy metric that ignores its inputs and always returns 1.0
    return tf.constant(1.0, dtype=tf.float32)

and it also fails.

I am using CUDA 10.0, TF '1.13.0-rc1', Keras 2.2.4 and the latest keras-importance-sampling installed via pip. Without ImportanceTraining everything runs fine.
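The last frame of the traceback is numpy's hstack, which suggests (a guess from the traceback, not confirmed by the maintainers) that the wrapper stacks per-sample outputs side by side, and a metric that returns a single scalar per batch, like a Matthews correlation or the dummy constant above, does not fit that layout. This standalone numpy sketch reproduces the same ValueError and shows broadcasting the scalar to a per-sample column as one possible workaround; the array names and shapes are illustrative assumptions, not a patch for the library.

```python
import numpy as np

batch = 4
loss = np.random.rand(batch, 1)               # per-sample loss, shape (batch, 1)
per_sample_metric = np.random.rand(batch, 1)  # e.g. a per-sample accuracy
scalar_metric = np.float32(0.0)               # a batch-level metric, e.g. a custom MCC

# Stacking per-sample columns works: result has shape (4, 2).
ok = np.hstack([loss, per_sample_metric])

# Mixing in a scalar reproduces the error seen in the traceback.
error_message = ""
try:
    np.hstack([loss, scalar_metric])
except ValueError as exc:
    error_message = str(exc)

# One workaround: broadcast the scalar to a per-sample column before stacking.
fixed = np.hstack([loss, np.full((batch, 1), scalar_metric)])
```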

ImportError: No module named 'transparent_model'

I used "pip install keras-importance-sampling" to install it under Python 3.5. My environment is:
Keras 2.08
Windows 10, 64-bit
Anaconda2
Python 3.5

When I tried to run the example mnist_mlp.py, I got the error "No module named 'transparent_model'". I then tried "pip install transparent_model", but that did not work. Did I miss something?

Import error with resnet

Hi,

I am using keras 2.2.4, tensorflow 1.8.0, python 3.5.2 in a virtualenv.

I encountered two issues with your framework when trying the command-line examples and the cifar10_resnet example. (I am using the source code directly rather than the pip package.)

Here are the errors I get:

File "/home/gpu_user/antoine/importance-sampling/examples/importance_sampling/pretrained.py", line 8, in <module>
    from keras.applications.resnet50 import WEIGHTS_PATH as RESNET50_WEIGHTS_PATH
ImportError: cannot import name 'WEIGHTS_PATH'

I am not sure whether this is an issue with my Keras installation or with your code, but I fixed it by modifying importance_sampling/pretrained.py.
I replaced:

from keras.applications.resnet50 import WEIGHTS_PATH as RESNET50_WEIGHTS_PATH

With:

import keras.applications.resnet50 as RNET
RESNET50_WEIGHTS_PATH = RNET.resnet50.WEIGHTS_PATH
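Since WEIGHTS_PATH moved between Keras releases, a version-tolerant lookup can avoid hard-coding one layout. A minimal sketch: `resolve_attr` is a hypothetical helper, and the candidate module paths in the commented usage are assumptions about older versus newer Keras 2.x layouts (the second one mirrors the fix above).

```python
import importlib


def resolve_attr(candidates):
    """Return the first attribute reachable from a list of (module, attr_path) pairs.

    attr_path may be dotted, e.g. "resnet50.WEIGHTS_PATH". Candidates are tried
    in order; ImportError is raised if none of them resolves.
    """
    errors = []
    for module_name, attr_path in candidates:
        try:
            obj = importlib.import_module(module_name)
            for part in attr_path.split("."):
                obj = getattr(obj, part)
            return obj
        except (ImportError, AttributeError) as exc:
            errors.append("{}:{} ({})".format(module_name, attr_path, exc))
    raise ImportError("no candidate resolved: " + "; ".join(errors))


# Hypothetical usage in pretrained.py (module layouts are assumptions):
# RESNET50_WEIGHTS_PATH = resolve_attr([
#     ("keras.applications.resnet50", "WEIGHTS_PATH"),
#     ("keras.applications.resnet50", "resnet50.WEIGHTS_PATH"),
#     ("keras_applications.resnet50", "WEIGHTS_PATH"),
# ])
```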

I also had the following issue:
File "cifar10_resnet.py", line 102, in <module>
    model = wide_resnet(args.depth, args.width)(dset.shape, dset.output_size)
File "/home/gpu_user/antoine/importance-sampling/examples/importance_sampling/models.py", line 438, in wide_resnet_impl
    x = group1(x)
File "/home/gpu_user/antoine/importance-sampling/examples/importance_sampling/models.py", line 420, in inner
    for i in range(n):
TypeError: 'float' object cannot be interpreted as an integer

I fixed it by changing line 427 of importance_sampling/models.py to:

n = int((L-4)/6)

It ran fine afterwards.
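The TypeError comes from Python 3's true division: `/` always returns a float, so `(28 - 4) / 6` is `4.0`, which `range()` rejects. Floor division keeps the result an integer. A minimal sketch, with a hypothetical helper name and the assumption that the depth follows the usual Wide ResNet rule depth = 6n + 4:

```python
def blocks_per_group(depth):
    """Number of residual blocks in each of the three groups of a Wide ResNet.

    Uses floor division so the result is an int that range() accepts.
    """
    if (depth - 4) % 6 != 0:
        raise ValueError("depth should satisfy depth = 6n + 4, got {}".format(depth))
    return (depth - 4) // 6


n = blocks_per_group(28)  # WRN-28-k: 4 blocks per group
for i in range(n):        # works because n is an int, not 4.0
    pass
```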

Otherwise, I enjoyed reading your paper (Not All Samples Are Created Equal: Deep Learning with Importance Sampling), but I have a couple of questions/observations:

  • In equation 6 of your paper, I think some transpose operators are missing in the last two lines.
  • In the same paper, I am not sure I understand how the \omega_i parameter is used in SGD, especially when using importance sampling.
  • You mention that you let the network warm up for a while before starting importance sampling. Did you do a grid search to find optimal values? Can you recommend a number of epochs for the warm-up? Or did you use the \tau_{th} defined in the paper as the condition that triggers importance sampling?
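For readers following the \omega_i question, here is a minimal sketch of the textbook unbiased reweighting used in importance-sampled SGD: if sample i is drawn with probability p_i out of N samples, scaling its gradient by w_i = 1 / (N p_i) makes the weighted gradient an unbiased estimate of the uniform mean gradient. This generic scheme is an assumption for illustration; the paper's exact definition of \omega_i (for instance over a presampled batch) may differ, and the function names are hypothetical.

```python
import numpy as np


def importance_weights(probs):
    """Unbiased importance-sampling weights w_i = 1 / (N * p_i)."""
    probs = np.asarray(probs, dtype=np.float64)
    return 1.0 / (len(probs) * probs)


def weighted_gradient_estimate(per_sample_grads, probs, idxs):
    """Average of w_i * g_i over the sampled indices (one SGD step's gradient)."""
    w = importance_weights(probs)
    return np.mean(w[idxs, None] * per_sample_grads[idxs], axis=0)


rng = np.random.default_rng(0)
grads = rng.normal(size=(5, 3))               # toy per-sample gradients g_i
probs = np.array([0.4, 0.3, 0.1, 0.1, 0.1])   # non-uniform sampling distribution
idxs = rng.choice(len(probs), size=4, p=probs)
step_grad = weighted_gradient_estimate(grads, probs, idxs)

# Exact expectation of one weighted draw: sum_i p_i * w_i * g_i = mean_i g_i.
expected = (probs[:, None] * importance_weights(probs)[:, None] * grads).sum(axis=0)
```

Without the weights, samples with large p_i would be over-represented and the update would be biased toward them.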

Thanks,

Antoine

RuntimeError: The layer has never been called and thus has no defined output shape.

When I run "python examples/mnist_cnn.py" as described at "https://www.idiap.ch/~katharas/importance-sampling/examples/", I get the errors below.
(By comparison, "python examples/mnist_cnn.py --uniform" runs fine.)

Do you have any idea to fix it?

==============================
Traceback (most recent call last):
  File "examples/mnist_cnn.py", line 67, in <module>
    wrapped = ConstantTimeImportanceTraining(model)
  File "C:\Users\danie\AppData\Roaming\Python\Python37\site-packages\importance_sampling\training.py", line 444, in __init__
    layer
  File "C:\Users\danie\AppData\Roaming\Python\Python37\site-packages\importance_sampling\training.py", line 340, in __init__
    super(_UnbiasedImportanceTraining, self).__init__(model, score, layer)
  File "C:\Users\danie\AppData\Roaming\Python\Python37\site-packages\importance_sampling\training.py", line 39, in __init__
    layer=layer
  File "C:\Users\danie\AppData\Roaming\Python\Python37\site-packages\importance_sampling\model_wrappers.py", line 169, in __init__
    self._augment_model(model, score, reweighting)
  File "C:\Users\danie\AppData\Roaming\Python\Python37\site-packages\importance_sampling\model_wrappers.py", line 205, in _augment_model
    output_shape = model.get_output_shape_at(0)[1:]
  File "C:\Users\danie\anaconda3\envs\importance-sampling\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 2030, in get_output_shape_at
    'output shape')
  File "C:\Users\danie\anaconda3\envs\importance-sampling\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 2603, in _get_node_attribute_at_index
    'and thus has no defined ' + attr_name + '.')
RuntimeError: The layer has never been called and thus has no defined output shape.

Creating a New Sampler

Hi,
First of all, great work!
I am trying to do something similar in my research: implementing entropy-based sampling. I came up with the following:

import numpy as np
from importance_sampling.samplers import ModelSampler


class EntropySampler(ModelSampler):
    """EntropySampler uses the entropy of the samples to do importance sampling

    Arguments
    ---------
    dataset: The dataset to sample from
    reweighting: The reweighting scheme
    model: The model to be used for scoring
    forward_batch_size: The batch size used when scoring samples
    recompute: Compute the scores for the whole dataset every recompute batches
    """
    def __init__(self, dataset, reweighting, model, forward_batch_size=128,
                 recompute=2):
        super(EntropySampler, self).__init__(
            dataset,
            reweighting,
            model,
            forward_batch_size=forward_batch_size
        )

        # The configuration of EntropySampler
        self.recompute = recompute

        # Mutable variables holding the state of the sampler
        self._batch = 0
        self._scores = np.ones((len(dataset.train_data),))
        self._unseen = np.ones(len(dataset.train_data), dtype=bool)
        self._seen = np.zeros_like(self._unseen)

    def _entropy(self, x, px):
        assert len(x) == len(px)
        # Normalise the scores into a distribution and return each sample's
        # entropy term -p * log(p); the epsilon avoids log(0)
        px = px / px.sum()
        return -px * np.log(px + 1e-12)

    def _get_samples_with_scores(self, batch_size):
        return (
            np.arange(len(self._scores)),
            self._scores,
            None
        )

    def update(self, idxs, results):
        # Update the scores of the seen samples
        self._scores[idxs] = results.ravel()
        self._unseen[idxs] = False
        self._seen[idxs] = True
        self._scores[self._unseen] = self._scores[self._seen].mean()

        # Recompute all the scores if needed
        self._batch += 1
        if self._batch % self.recompute == 0:
            for i in range(0, len(self.dataset.train_data), 1024*64):
                x, y = self.dataset.train_data[i:i+1024*64]
                scores = self.model.score(
                    x, y,
                    batch_size=self.forward_batch_size
                ).ravel()
                self._scores[i:i+1024*64] = self._entropy(x, scores).ravel()
            self._seen[:] = True
            self._unseen[:] = False
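The `_entropy` helper above treats the normalised scores as a distribution over samples. Another reading of entropy-based sampling, common in active learning, scores each sample by the entropy of its own predicted class distribution. The sketch below shows that alternative, assuming the model outputs softmax probabilities (rows summing to 1); the function names are hypothetical and not part of the library API.

```python
import numpy as np


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy H_i = -sum_k p_ik * log(p_ik) of each row of class probabilities."""
    probs = np.asarray(probs, dtype=np.float64)
    return -np.sum(probs * np.log(probs + eps), axis=1)


def sampling_distribution(scores):
    """Turn non-negative per-sample scores into sampling probabilities."""
    scores = np.asarray(scores, dtype=np.float64)
    return scores / scores.sum()


probs = np.array([
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain: highest entropy
    [0.97, 0.01, 0.01, 0.01],  # confident: low entropy
    [0.50, 0.50, 0.00, 0.00],  # split between two classes
])
h = predictive_entropy(probs)
p = sampling_distribution(h)
idxs = np.random.default_rng(0).choice(len(p), size=2, p=p)  # uncertain samples favoured
```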

But I am not sure how to create a BaseImportanceTraining subclass for this sampler. In particular, I do not understand how partial is used. Could you help me with this?
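For readers with the same question about partial: `functools.partial` freezes some arguments of a callable and returns a new callable that takes the rest. That is a common way to hand a training class a "sampler factory" whose configuration is fixed up front, while the dataset, reweighting and model are supplied once they exist. The sketch uses a hypothetical `ToySampler` with the same constructor arguments as the EntropySampler above; how the library itself wires partial into BaseImportanceTraining is not shown here and may differ.

```python
from functools import partial


class ToySampler(object):
    """Hypothetical stand-in with the same constructor shape as EntropySampler."""
    def __init__(self, dataset, reweighting, model, forward_batch_size=128,
                 recompute=2):
        self.dataset = dataset
        self.reweighting = reweighting
        self.model = model
        self.forward_batch_size = forward_batch_size
        self.recompute = recompute


# Bind the sampler-specific settings now...
make_sampler = partial(ToySampler, forward_batch_size=256, recompute=5)

# ...and let the training code supply the remaining positional arguments later.
sampler = make_sampler("dataset", "reweighting", "model")
```

The training class only needs to know the remaining call signature `(dataset, reweighting, model)`, so any sampler can be plugged in without the trainer knowing its extra options.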
