Comments (17)
Keras models absolutely do support batch training; the CIFAR10 example demonstrates this.
What's more, you can use the image preprocessing module (data augmentation and normalization) on batches as well. Here's a quick example:
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    featurewise_center=True,             # set input mean to 0 over the dataset
    samplewise_center=False,             # set each sample mean to 0
    featurewise_std_normalization=True,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,                 # apply ZCA whitening
    rotation_range=20,                   # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.2,               # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.2,              # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,                # randomly flip images horizontally
    vertical_flip=False)                 # randomly flip images vertically

# compute the normalization/whitening statistics on a small-ish but
# statistically representative sample of your data
datagen.fit(X_sample)

# let's say you have an ImageNet generator that yields ~10k samples at a time
for e in range(nb_epoch):
    print("epoch %d" % e)
    for X_train, Y_train in ImageNet():  # these are chunks of ~10k pictures
        for X_batch, Y_batch in datagen.flow(X_train, Y_train, batch_size=32):  # these are chunks of 32 samples
            loss = model.train_on_batch(X_batch, Y_batch)

# Alternatively, without data augmentation / normalization:
for e in range(nb_epoch):
    print("epoch %d" % e)
    for X_train, Y_train in ImageNet():  # these are chunks of ~10k pictures
        model.fit(X_train, Y_train, batch_size=32, nb_epoch=1)
from keras.
If you have a huge dataset stored as an HDF5 file, you can use keras.utils.io_utils.HDF5Matrix to load it. It only reads one batch at a time into memory, but there are some limitations (e.g., you cannot read shuffled data from the file, only sequentially). A workaround is to shuffle the data before you store it to disk (but you would still get the same batches on every epoch).
Here is a short example of how to do this. It assumes all of your samples are in the same HDF5 file, with features and targets stored in HDF5 datasets named 'features' and 'targets':
from keras.utils.io_utils import HDF5Matrix

def load_data(datapath, train_start, test_start, n_training_examples, n_test_examples):
    # normalize_data is a user-defined preprocessing function applied to each batch as it is read
    X_train = HDF5Matrix(datapath, 'features', train_start, train_start + n_training_examples, normalizer=normalize_data)
    y_train = HDF5Matrix(datapath, 'targets', train_start, train_start + n_training_examples)
    X_test = HDF5Matrix(datapath, 'features', test_start, test_start + n_test_examples, normalizer=normalize_data)
    y_test = HDF5Matrix(datapath, 'targets', test_start, test_start + n_test_examples)
    return X_train, y_train, X_test, y_test
The returned variables here are not real Numpy arrays, but they implement the same interface, so everything works transparently in Keras (as long as you don't try to read shuffled indices).
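For what it's worth, here is a minimal sketch of how the returned matrices could then be fed to a model (the file name, offsets, and counts are made up, and the model is assumed to be already defined and compiled; shuffle=False keeps reads sequential, which is what the HDF5-backed matrices require):
X_train, y_train, X_test, y_test = load_data('dataset.h5', 0, 80000, 80000, 20000)
model.fit(X_train, y_train, batch_size=32, nb_epoch=10, shuffle=False)  # HDF5Matrix objects can be passed to fit directly
score = model.evaluate(X_test, y_test, batch_size=32)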
from keras.
No, no, model.fit does NOT reset the weights of your model. It starts from
the previous state of the model. You can definitely call model.fit multiple
times.
The difference between model.fit and model.train_on_batch is mainly that
model.fit will break up your data into small batches whereas
model.train_on_batch will use the data it gets as a single batch, running a
single gradient update.
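To make that concrete, a small sketch (assuming X and y are Numpy arrays and the model is already compiled; the nb_epoch-style argument follows the old API used throughout this thread):
model.fit(X, y, batch_size=32, nb_epoch=10)   # fit slices X/y into mini-batches internally: many gradient updates
loss = model.train_on_batch(X[:32], y[:32])   # the array passed is treated as one batch: a single gradient update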
from keras.
model.fit() can be used to update models, right? That's what I've understood it to do. It doesn't train from scratch each time it is called.
Yes. You're starting from the previous model state (i.e. the weights are not re-initialized).
The batch_size seems to be a really crucial parameter. Is there some rule of thumb for selecting the batch size?
In general, smaller batches give better results; however, larger batches make training faster. There is a compromise to strike somewhere in between. I generally use 16 or 32, unless there is very little data, in which case I go full stochastic (batch_size = 1).
from keras.
You should not call model.fit, as it does exactly what you said it does: reset and train a model from scratch. If you're training on successive batches of data, use model.train_on_batch instead. You can also write a wrapper for your data using the Python generator interface and then use model.fit_generator, which behaves like model.fit but will use whatever generator you pass to it instead of a Numpy array.
from keras.
Oops, my bad. The behaviour @hadi-ds is seeing is then probably due to the model overfitting a bit to the different slices of the dataset. Definitely the best practice with larger-than-memory datasets is to use either model.fit_generator with a "smart" generator or HDF5Matrix.
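For reference, a hedged sketch of what such a "smart" generator could look like, reusing the ImageNet() chunk reader from the example above (the samples_per_epoch-style signature follows the older Keras API; exact arguments vary between versions, and the dataset size below is a placeholder):
def imagenet_generator():
    while True:  # fit_generator expects a generator that loops indefinitely
        for X_chunk, Y_chunk in ImageNet():  # ~10k samples per chunk, as above
            for i in range(0, len(X_chunk), 32):
                yield X_chunk[i:i + 32], Y_chunk[i:i + 32]

model.fit_generator(imagenet_generator(),
                    samples_per_epoch=1280000,  # hypothetical dataset size
                    nb_epoch=10)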
from keras.
I learned the opposite way: small batch sizes will approach a minimum faster, but using large batches will better approximate the distribution of the training data, thus giving a better result.
Depends on what you mean by fast: stochastic learning gets to a minimum in fewer epochs/samples seen, but you're doing many more gradient updates, and the average time per sample increases dramatically. So the computing time will be longer (unless you are using very large batch sizes, in which case you're doing redundant computations). Just try it: batch_size = 1 vs. batch_size = 32 on, say, the CIFAR10 example. 32 will be much faster (computationally).
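A quick, hedged way to check this yourself (X_train/Y_train and the model are assumed to come from the CIFAR10 example; absolute timings will of course vary):
import time

for bs in (1, 32):
    start = time.time()
    model.fit(X_train, Y_train, batch_size=bs, nb_epoch=1)
    print("batch_size=%d: %.1f seconds per epoch" % (bs, time.time() - start))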
LeCun has long argued that the result obtained with stochastic learning is almost always better, thanks to the random noise it introduces. See: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
Stochastic learning also often results in better solutions because of the noise in the updates. Nonlinear networks usually have multiple local minima of differing depths. The goal of training is to locate one of these minima. Batch learning will discover the minimum of whatever basin the weights are initially placed. In stochastic learning, the noise present in the updates can result in the weights jumping into the basin of another, possibly deeper, local minimum. This has been demonstrated in certain simplified cases.
from keras.
In general, smaller batches give better results; however, larger batches make training faster. There is a compromise to strike somewhere in between. I generally use 16 or 32, unless there is very little data, in which case I go full stochastic (batch_size = 1).
There's much debate there. I learned the opposite way: small batch sizes will approach a minimum faster, but using large batches will better approximate the distribution of the training data, thus giving a better result.
I go full stochastic (batch_size = 1).
This is online learning. Full stochastic would be 1 random sample per update, such that after training on nb_samples samples there is no guarantee that all of the training data has appeared once; some samples wouldn't have been trained on yet, and some would have been trained on twice. Yeah?
from keras.
Depends on what you mean by fast
Fair enough - I was referring to computational time.
Batch learning will discover the minimum of whatever basin the weights are initially placed.
With pre-training (although that just raises the point again: batches or not during pre-training?) this may not be a bad thing. Random noise is random, so there's always the possibility of jumping out of a global optimum, but such is life for those in the machine learning field.
I think we're arguing different points here, though: LeCun is referring to batch training (batch_size == nb_examples), not to mini-batch learning (batch_size < nb_examples). I think mini-batch is the best of both worlds, and I usually train with stochastic mini-batches (a random subset of the data comprises a pass, and an epoch is just an arbitrary number of these passes, such that I can save the model frequently).
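A hedged sketch of that stochastic mini-batch loop (nb_epoch, the number of passes, and the checkpoint name are arbitrary placeholders; train_on_batch and save_weights are the standard calls):
import numpy as np

batch_size = 32
nb_passes = 1000  # an "epoch" here is just an arbitrary number of passes

for e in range(nb_epoch):
    for _ in range(nb_passes):
        idx = np.random.choice(len(X_train), batch_size, replace=False)  # random subset of the data
        model.train_on_batch(X_train[idx], Y_train[idx])
    model.save_weights('checkpoint_%d.h5' % e, overwrite=True)  # save the model frequently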
from keras.
Thanks for your reply. Actually, I'm having a hard time even getting a "toy program" to work. Maybe I've done something wrong. Both data and labels are n x d and n x c Numpy arrays (where d is the dimension of the data and c is the number of classes), right?
I wasn't able to get this code working correctly on the 'iris data set'. get_data() is a function that reads from file.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD
from keras.utils import np_utils, generic_utils
import numpy, scipy, scipy.io
import sys
model = Sequential()
model.add(Dense(4, 4, init='uniform'))
model.add(Activation('tanh'))
model.add(Dense(4, 3, init='uniform'))
model.add(Activation('softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.1, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
train_data, train_labels = get_data('iris_training.dat')
test_data, test_labels = get_data('iris_test.dat')
valid_data, valid_labels = get_data('iris_validation.dat')
nb_classes = 3
t = test_labels
train_labels = np_utils.to_categorical(train_labels, nb_classes)
test_labels = np_utils.to_categorical(test_labels, nb_classes)
valid_labels = np_utils.to_categorical(valid_labels, nb_classes)
model.fit(train_data, train_labels, nb_epoch=5, batch_size = 10, show_accuracy = True)
score = model.evaluate(valid_data, valid_labels)
print model.predict_classes(valid_data) # records output of program
This code outputs the following (which shows it isn't learning anything at all)
Epoch 0
75/75 [==============================] - 0s - loss: 0.2223 - acc.: 0.2636
Epoch 1
75/75 [==============================] - 0s - loss: 0.2222 - acc.: 0.4439
Epoch 2
75/75 [==============================] - 0s - loss: 0.2222 - acc.: 0.4167
Epoch 3
75/75 [==============================] - 0s - loss: 0.2222 - acc.: 0.4030
Epoch 4
75/75 [==============================] - 0s - loss: 0.2221 - acc.: 0.4030
37/37 [==============================] - 0s - loss: 0.2228
37/37 [==============================] - 0s
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
from keras.
A toy example should be at least a properly formulated ML problem; otherwise the point is lost. You should look at MNIST; it's a good toy example. https://www.kaggle.com/users/123235/fchollet/digit-recognizer/simple-deep-mlp-with-keras
Here's a simpler version of your code, and its output. You can see it is in fact learning the training data, but with only a hundred or so samples, it starts overfitting from the start.
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.utils import np_utils
from sklearn import datasets
iris = datasets.load_iris()
print iris.data.shape
print iris.target.shape
model = Sequential()
model.add(Dense(4, 3, init='uniform'))
model.add(Activation('softmax'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
labels = np_utils.to_categorical(iris.target)
model.fit(iris.data, labels, nb_epoch=5, batch_size=1, show_accuracy=True, validation_split=0.3)
Train on 105 samples, validate on 45 samples
Epoch 0
105/105 [==============================] - 0s - loss: 0.2116 - acc.: 0.3714 - val. loss: 0.3828 - val. acc.: 0.0000
Epoch 1
105/105 [==============================] - 0s - loss: 0.1659 - acc.: 0.5048 - val. loss: 0.4688 - val. acc.: 0.0000
Epoch 2
105/105 [==============================] - 0s - loss: 0.1428 - acc.: 0.7905 - val. loss: 0.5031 - val. acc.: 0.0000
Epoch 3
105/105 [==============================] - 0s - loss: 0.1258 - acc.: 0.9524 - val. loss: 0.5391 - val. acc.: 0.0000
Epoch 4
105/105 [==============================] - 0s - loss: 0.1113 - acc.: 0.9524 - val. loss: 0.5564 - val. acc.: 0.0000
from keras.
The iris data is sorted by label. In this case, validation_split in model.fit does not work correctly (the last 30% of the samples, used for validation, contain only one class).
We should shuffle the data before training.
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.utils import np_utils
from sklearn import datasets
import numpy as np
iris = datasets.load_iris()
shuffle = np.arange(len(iris.data))
np.random.shuffle(shuffle)
iris.data = iris.data[shuffle]
iris.target = iris.target[shuffle]
print iris.data.shape
print iris.target.shape
model = Sequential()
model.add(Dense(4, 3, init='uniform'))
model.add(Activation('softmax'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
labels = np_utils.to_categorical(iris.target)
model.fit(iris.data, labels, nb_epoch=5, batch_size=1, show_accuracy=True, validation_split=0.3)
(150, 4)
(150,)
Train on 105 samples, validate on 45 samples
Epoch 0
105/105 [==============================] - 0s - loss: 0.2135 - acc.: 0.3524 - val. loss: 0.2137 - val. acc.: 0.2667
Epoch 1
105/105 [==============================] - 0s - loss: 0.2004 - acc.: 0.4190 - val. loss: 0.2051 - val. acc.: 0.5778
Epoch 2
105/105 [==============================] - 0s - loss: 0.1891 - acc.: 0.6952 - val. loss: 0.1956 - val. acc.: 0.6000
Epoch 3
105/105 [==============================] - 0s - loss: 0.1787 - acc.: 0.6952 - val. loss: 0.1842 - val. acc.: 0.6000
Epoch 4
105/105 [==============================] - 0s - loss: 0.1686 - acc.: 0.6952 - val. loss: 0.1757 - val. acc.: 0.6000
from keras.
I have a couple of other questions:
- model.fit() can be used to update models, right? That's what I've understood it to do. It doesn't train from scratch each time it is called.
- The batch_size seems to be a really crucial parameter. Is there some rule of thumb for selecting the batch size?
from keras.
Got it. Thanks, everyone!
from keras.
Hi there, I have a related issue regarding consecutive calls to the .fit method on batches of the data. For example, in the following:
for e in range(nb_epoch):
    print("epoch %d" % e)
    for X_train, Y_train in ImageNet():  # these are chunks of ~10k pictures
        model.fit(X_train, Y_train, batch_size=32, nb_epoch=1)
It seems like each time fit is invoked, the model is fit to the given batch of data, but the next fit resets the model and starts fitting to the new batch (instead of starting from the weights at the end of the previous round).
The reason I am saying so is that, in my case, I have a large data set that I am reading 10K lines at a time and using to fit the model. I see that the loss decreases steadily while fit runs, but when the next fit starts on the next chunk, the loss jumps back up and starts decreasing as before.
Therefore, what I do instead is to use as much data as I can and run several epochs through that.
Can someone comment on whether this is the expected behaviour every time fit is applied?
from keras.
Will these images get loaded in main memory or GPU memory?
from keras.
@fchollet, if I want to use your approach to do augmentation for a large dataset using model.train_on_batch, how can I feed the validation data to the network?
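One hedged way to handle that (not something prescribed in this thread): run the augmented training loop as above, and between chunks or epochs call evaluate (or test_on_batch) on held-out data, which computes the loss without updating any weights. X_val/Y_val are placeholders:
for e in range(nb_epoch):
    for X_train, Y_train in ImageNet():
        for X_batch, Y_batch in datagen.flow(X_train, Y_train, batch_size=32):
            model.train_on_batch(X_batch, Y_batch)
    val_loss = model.evaluate(X_val, Y_val, batch_size=32)  # no gradient update, just measurement
    print("epoch %d, validation loss: %s" % (e, val_loss))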
from keras.