Comments (17)
Keras models absolutely do support batch training; the CIFAR10 example demonstrates this.
What's more, you can use the image preprocessing module (data augmentation and normalization) on batches as well. Here's a quick example:
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    featurewise_center=True,             # set input mean to 0 over the dataset
    samplewise_center=False,             # set each sample mean to 0
    featurewise_std_normalization=True,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,                 # apply ZCA whitening
    rotation_range=20,                   # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.2,               # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.2,              # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,                # randomly flip images horizontally
    vertical_flip=False)                 # randomly flip images vertically

# compute the normalization/whitening statistics on a small-ish but
# statistically representative sample of your data
datagen.fit(X_sample)

# let's say you have an ImageNet generator that yields ~10k samples at a time
for e in range(nb_epoch):
    print("epoch %d" % e)
    for X_train, Y_train in ImageNet():  # these are chunks of ~10k pictures
        for X_batch, Y_batch in datagen.flow(X_train, Y_train, batch_size=32):  # these are chunks of 32 samples
            loss = model.train_on_batch(X_batch, Y_batch)

# Alternatively, without data augmentation / normalization:
for e in range(nb_epoch):
    print("epoch %d" % e)
    for X_train, Y_train in ImageNet():  # these are chunks of ~10k pictures
        model.fit(X_train, Y_train, batch_size=32, nb_epoch=1)
from keras.
If you have a huge dataset stored as an HDF5 file, you can use keras.utils.io_utils.HDF5Matrix to load it. It only reads one batch at a time into memory, but there are some limitations (e.g., you cannot read shuffled data from the file, only sequentially). A workaround is to shuffle the data before you store it to disk (but you would still get the same batches on every epoch).
Here is a short example of how to do this. It assumes all of your samples are in the same HDF5 file, with features and targets stored in HDF5 datasets named 'features' and 'targets':
from keras.utils.io_utils import HDF5Matrix

def load_data(datapath, train_start, test_start, n_training_examples, n_test_examples):
    # normalize_data is a user-defined preprocessing function applied to each batch as it is read
    X_train = HDF5Matrix(datapath, 'features', train_start, train_start + n_training_examples, normalizer=normalize_data)
    y_train = HDF5Matrix(datapath, 'targets', train_start, train_start + n_training_examples)
    X_test = HDF5Matrix(datapath, 'features', test_start, test_start + n_test_examples, normalizer=normalize_data)
    y_test = HDF5Matrix(datapath, 'targets', test_start, test_start + n_test_examples)
    return X_train, y_train, X_test, y_test
The returned variables here are not real Numpy arrays, but they implement the same interface, so everything works transparently in Keras (as long as you don't try to read shuffled indices).
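For what it's worth, here is a minimal sketch of how the returned matrices could then be fed to a model (the file name, offsets, and counts are made up, and the model is assumed to be already defined and compiled; shuffle=False keeps reads sequential, which is what the HDF5-backed matrices require):
X_train, y_train, X_test, y_test = load_data('dataset.h5', 0, 80000, 80000, 20000)
model.fit(X_train, y_train, batch_size=32, nb_epoch=10, shuffle=False)  # HDF5Matrix objects can be passed to fit directly
score = model.evaluate(X_test, y_test, batch_size=32)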
from keras.
No, no, model.fit does NOT reset the weights of your model. It starts from
the previous state of the model. You can definitely call model.fit multiple
times.
The difference between model.fit and model.train_on_batch is mainly that
model.fit will break up your data into small batches whereas
model.train_on_batch will use the data it gets as a single batch, running a
single gradient update.
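To make that concrete, a small sketch (assuming X and y are Numpy arrays and the model is already compiled; the nb_epoch-style argument follows the old API used throughout this thread):
model.fit(X, y, batch_size=32, nb_epoch=10)   # fit slices X/y into mini-batches internally: many gradient updates
loss = model.train_on_batch(X[:32], y[:32])   # the array passed is treated as one batch: a single gradient update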
from keras.
model.fit() can be used to update models, right? That's what I've understood it to do. It doesn't train from scratch each time it is called.
Yes. You're starting from the previous model state (i.e. the weights are not re-initialized).
The batch_size seems to be a really crucial parameter. Is there some rule of thumb for selecting the batch size?
In general, smaller batches give better results; however, larger batches make training faster. There is a compromise to strike somewhere in between. I generally use 16 or 32, unless there is very little data, in which case I go full stochastic (batch_size = 1).
from keras.
You should not call model.fit, as it does exactly what you said it does: reset and train a model from scratch. If you're training on successive batches of data, use model.train_on_batch instead. You can also write a wrapper for your data using the Python generator interface and then use model.fit_generator, which behaves like model.fit but will use whatever generator you pass to it instead of a Numpy array.
from keras.
Oops, my bad. The behaviour @hadi-ds is seeing is then probably due to the model overfitting a bit to the different slices of the dataset. Definitely the best practice with larger-than-memory datasets is to use either model.fit_generator with a "smart" generator or HDF5Matrix.
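For reference, a hedged sketch of what such a "smart" generator could look like, reusing the ImageNet() chunk reader from the example above (the samples_per_epoch-style signature follows the older Keras API; exact arguments vary between versions, and the dataset size below is a placeholder):
def imagenet_generator():
    while True:  # fit_generator expects a generator that loops indefinitely
        for X_chunk, Y_chunk in ImageNet():  # ~10k samples per chunk, as above
            for i in range(0, len(X_chunk), 32):
                yield X_chunk[i:i + 32], Y_chunk[i:i + 32]

model.fit_generator(imagenet_generator(),
                    samples_per_epoch=1280000,  # hypothetical dataset size
                    nb_epoch=10)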
from keras.
I learned the opposite way: small batch sizes will approach a minimum faster, but using large batches will better approximate the distribution of the training data, thus giving a better result.
Depends on what you mean by fast: stochastic learning gets to a minimum in fewer epochs/samples seen, but you're doing many more gradient updates, and the average time per sample increases dramatically. So the computing time will be longer (unless you are using very large batch sizes, in which case you're doing redundant computations). Just try it: batch_size = 1 vs. batch_size = 32 on, say, the CIFAR10 example. 32 will be much faster (computationally).
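A quick, hedged way to check this yourself (X_train/Y_train and the model are assumed to come from the CIFAR10 example; absolute timings will of course vary):
import time

for bs in (1, 32):
    start = time.time()
    model.fit(X_train, Y_train, batch_size=bs, nb_epoch=1)
    print("batch_size=%d: %.1f seconds per epoch" % (bs, time.time() - start))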
LeCun has long argued that the result obtained with stochastic learning is almost always better, thanks to the random noise it introduces. See: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
Stochastic learning also often results in better solutions because of the noise in the updates. Nonlinear networks usually have multiple local minima of differing depths. The goal of training is to locate one of these minima. Batch learning will discover the minimum of whatever basin the weights are initially placed. In stochastic learning, the noise present in the updates can result in the weights jumping into the basin of another, possibly deeper, local minimum. This has been demonstrated in certain simplified cases.
from keras.
In general, smaller batches give better results; however, larger batches make training faster. There is a compromise to strike somewhere in between. I generally use 16 or 32, unless there is very little data, in which case I go full stochastic (batch_size = 1).
There's much debate there. I learned the opposite way: small batch sizes will approach a minimum faster, but using large batches will better approximate the distribution of the training data, thus giving a better result.
I go full stochastic (batch_size = 1).
This is online learning. Full stochastic would be 1 random sample per update, such that after training on nb_samples samples there is no guarantee that all of the training data has appeared once; some samples wouldn't have been trained on yet, and some would have been trained on twice. Yeah?
from keras.
Depends on what you mean by fast
Fair enough - I was referring to computational time.
Batch learning will discover the minimum of whatever basin the weights are initially placed.
With pre-training (although that just raises the point again: batches or not during pre-training?) this may not be a bad thing. Random noise is random, so there's always the possibility of jumping out of a global optimum, but such is life for those in the machine learning field.
I think we're arguing different points here, though: LeCun is referring to batch training (batch_size == nb_examples), not to mini-batch learning (batch_size < nb_examples). I think mini-batch is the best of both worlds, and I usually train with stochastic mini-batches (a random subset of the data comprises a pass, and an epoch is just an arbitrary number of these passes, such that I can save the model frequently).
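A hedged sketch of that stochastic mini-batch loop (nb_epoch, the number of passes, and the checkpoint name are arbitrary placeholders; train_on_batch and save_weights are the standard calls):
import numpy as np

batch_size = 32
nb_passes = 1000  # an "epoch" here is just an arbitrary number of passes

for e in range(nb_epoch):
    for _ in range(nb_passes):
        idx = np.random.choice(len(X_train), batch_size, replace=False)  # random subset of the data
        model.train_on_batch(X_train[idx], Y_train[idx])
    model.save_weights('checkpoint_%d.h5' % e, overwrite=True)  # save the model frequently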
from keras.
Thanks for your reply. Actually, I'm having a hard time even getting a "toy program" to work. Maybe I've done something wrong. Both data and labels are n x d and n x c Numpy arrays (where d is the dimension of the data and c is the number of classes), right?
I wasn't able to get this code working correctly on the 'iris data set'. get_data() is a function that reads from file.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD
from keras.utils import np_utils, generic_utils
import numpy, scipy, scipy.io
import sys
model = Sequential()
model.add(Dense(4, 4, init='uniform'))
model.add(Activation('tanh'))
model.add(Dense(4, 3, init='uniform'))
model.add(Activation('softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.1, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
train_data, train_labels = get_data('iris_training.dat')
test_data, test_labels = get_data('iris_test.dat')
valid_data, valid_labels = get_data('iris_validation.dat')
nb_classes = 3
t = test_labels
train_labels = np_utils.to_categorical(train_labels, nb_classes)
test_labels = np_utils.to_categorical(test_labels, nb_classes)
valid_labels = np_utils.to_categorical(valid_labels, nb_classes)
model.fit(train_data, train_labels, nb_epoch=5, batch_size = 10, show_accuracy = True)
score = model.evaluate(valid_data, valid_labels)
print model.predict_classes(valid_data) # records output of program
This code outputs the following (which shows it isn't learning anything at all)
Epoch 0
75/75 [==============================] - 0s - loss: 0.2223 - acc.: 0.2636
Epoch 1
75/75 [==============================] - 0s - loss: 0.2222 - acc.: 0.4439
Epoch 2
75/75 [==============================] - 0s - loss: 0.2222 - acc.: 0.4167
Epoch 3
75/75 [==============================] - 0s - loss: 0.2222 - acc.: 0.4030
Epoch 4
75/75 [==============================] - 0s - loss: 0.2221 - acc.: 0.4030
37/37 [==============================] - 0s - loss: 0.2228
37/37 [==============================] - 0s
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
from keras.
A toy example should be at least a properly formulated ML problem; otherwise the point is lost. You should look at MNIST; it's a good toy example. https://www.kaggle.com/users/123235/fchollet/digit-recognizer/simple-deep-mlp-with-keras
Here's a simpler version of your code, and its output. You can see it is in fact learning the training data, but with only a hundred or so samples, it starts overfitting from the start.
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.utils import np_utils
from sklearn import datasets
iris = datasets.load_iris()
print iris.data.shape
print iris.target.shape
model = Sequential()
model.add(Dense(4, 3, init='uniform'))
model.add(Activation('softmax'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
labels = np_utils.to_categorical(iris.target)
model.fit(iris.data, labels, nb_epoch=5, batch_size=1, show_accuracy=True, validation_split=0.3)
Train on 105 samples, validate on 45 samples
Epoch 0
105/105 [==============================] - 0s - loss: 0.2116 - acc.: 0.3714 - val. loss: 0.3828 - val. acc.: 0.0000
Epoch 1
105/105 [==============================] - 0s - loss: 0.1659 - acc.: 0.5048 - val. loss: 0.4688 - val. acc.: 0.0000
Epoch 2
105/105 [==============================] - 0s - loss: 0.1428 - acc.: 0.7905 - val. loss: 0.5031 - val. acc.: 0.0000
Epoch 3
105/105 [==============================] - 0s - loss: 0.1258 - acc.: 0.9524 - val. loss: 0.5391 - val. acc.: 0.0000
Epoch 4
105/105 [==============================] - 0s - loss: 0.1113 - acc.: 0.9524 - val. loss: 0.5564 - val. acc.: 0.0000
from keras.
The iris data is sorted by label. In this case, validation_split in model.fit does not work correctly (the last 30% of the samples, used for validation, contain only one class).
We should shuffle the data before training.
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.utils import np_utils
from sklearn import datasets
import numpy as np
iris = datasets.load_iris()
shuffle = np.arange(len(iris.data))
np.random.shuffle(shuffle)
iris.data = iris.data[shuffle]
iris.target = iris.target[shuffle]
print iris.data.shape
print iris.target.shape
model = Sequential()
model.add(Dense(4, 3, init='uniform'))
model.add(Activation('softmax'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
labels = np_utils.to_categorical(iris.target)
model.fit(iris.data, labels, nb_epoch=5, batch_size=1, show_accuracy=True, validation_split=0.3)
(150, 4)
(150,)
Train on 105 samples, validate on 45 samples
Epoch 0
105/105 [==============================] - 0s - loss: 0.2135 - acc.: 0.3524 - val. loss: 0.2137 - val. acc.: 0.2667
Epoch 1
105/105 [==============================] - 0s - loss: 0.2004 - acc.: 0.4190 - val. loss: 0.2051 - val. acc.: 0.5778
Epoch 2
105/105 [==============================] - 0s - loss: 0.1891 - acc.: 0.6952 - val. loss: 0.1956 - val. acc.: 0.6000
Epoch 3
105/105 [==============================] - 0s - loss: 0.1787 - acc.: 0.6952 - val. loss: 0.1842 - val. acc.: 0.6000
Epoch 4
105/105 [==============================] - 0s - loss: 0.1686 - acc.: 0.6952 - val. loss: 0.1757 - val. acc.: 0.6000
from keras.
I have a couple of other questions:
- model.fit() can be used to update models, right? That's what I've understood it to do. It doesn't train from scratch each time it is called.
- The batch_size seems to be a really crucial parameter. Is there some rule of thumb for selecting the batch size?
from keras.
Got it. Thanks, everyone!
from keras.
Hi there, I have a related issue regarding consecutive calls to the .fit method on batches of the data. For example, in the following:
for e in range(nb_epoch):
    print("epoch %d" % e)
    for X_train, Y_train in ImageNet():  # these are chunks of ~10k pictures
        model.fit(X_train, Y_train, batch_size=32, nb_epoch=1)
It seems like each time fit is invoked, the model is fit to the given batch of data, but the next fit resets the model and starts fitting to the new batch (instead of starting from the weights at the end of the previous round).
The reason I am saying so is that, in my case, I have a large data set that I am reading 10K lines at a time and using to fit the model. I see that the loss decreases steadily while fit runs, but when the next fit starts on the next chunk, the loss jumps back up and starts decreasing as before.
Therefore, what I do instead is to use as much data as I can and run several epochs through that.
Can someone comment on whether this is the expected behaviour every time fit is applied?
from keras.
Will these images get loaded in main memory or GPU memory?
from keras.
@fchollet, if I want to use your approach to do augmentation for a large dataset using model.train_on_batch, how can I feed the validation data to the network?
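One hedged way to handle that (not something prescribed in this thread): run the augmented training loop as above, and between chunks or epochs call evaluate (or test_on_batch) on held-out data, which computes the loss without updating any weights. X_val/Y_val are placeholders:
for e in range(nb_epoch):
    for X_train, Y_train in ImageNet():
        for X_batch, Y_batch in datagen.flow(X_train, Y_train, batch_size=32):
            model.train_on_batch(X_batch, Y_batch)
    val_loss = model.evaluate(X_val, Y_val, batch_size=32)  # no gradient update, just measurement
    print("epoch %d, validation loss: %s" % (e, val_loss))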
from keras.