
deeplearningframeworks's People

Contributors

banibrata-de, botev, bwasti, crissman, danielleodean, denizyuret, kashif, marktab, miguelgfierro, mitmul, msalvaris, paulshealy1, thomasdelteil, vapaunic, yangqing, yytdfc


deeplearningframeworks's Issues

Adding neon

It would be interesting to see how this performs as well; they have very clear CNN and LSTM examples to reproduce.

Tensorflow CudnnGRU

I'm trying to replace the basic GRU Cell I currently have:

cell = tf.contrib.rnn.GRUCell(NUMHIDDEN)
outputs, states = tf.contrib.rnn.static_rnn(cell, word_list, dtype=tf.float32)

With the CuDNN version:

cudnn_cell = tf.contrib.cudnn_rnn.CudnnGRU(num_layers=1, 
                                           num_units=NUMHIDDEN, 
                                           input_size=EMBEDSIZE)    # Set params
params_size_t = cudnn_cell.params_size()
params = tf.Variable(tf.ones([params_size_t]), validate_shape=False)   
input_h = tf.Variable(tf.ones([1, BATCHSIZE, NUMHIDDEN]))

outputs, states = cudnn_cell(is_training=True,
                             input_data=word_list,
                             input_h=input_h,
                             params=params)

However, when I do this, my model starts to predict randomly; accuracy drops from 0.86 to 0.5.
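
One thing worth checking (an assumption on my part, not something confirmed in this thread): the opaque parameter buffer above is initialised to all ones, whereas the GRUCell baseline gets a proper random initialisation, and the CuDNN RNN ops expect time-major input. A minimal sketch of a random initialisation with the same TF 1.x contrib API:

# Sketch only: NUMHIDDEN and EMBEDSIZE come from common.params as in the notebook
cudnn_cell = tf.contrib.cudnn_rnn.CudnnGRU(num_layers=1,
                                           num_units=NUMHIDDEN,
                                           input_size=EMBEDSIZE)
params_size_t = cudnn_cell.params_size()
# Small random weights rather than tf.ones - all-ones weights can leave the
# gates saturated and the model predicting at chance
params = tf.Variable(tf.random_uniform([params_size_t], -0.1, 0.1),
                     validate_shape=False)
# input_data should also be time-major: [max_time, batch_size, embed_size]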

What changed for TensorFlow?

I remember the TensorFlow VGG-style notebook taking around 300s to train, and now it is 173s. Is this due to a config or version change?

MXNet MultiGPU

@ThomasDelteil I have been trying to re-run the MXNet example on V100s; however, I still end up with the same error as on the P100s:

MXNetError: [11:35:49] /home/travis/build/dmlc/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

MXNet:  1.3.0
GPU:  ['Tesla V100-PCIE-16GB', 'Tesla V100-PCIE-16GB', 'Tesla V100-PCIE-16GB', 'Tesla V100-PCIE-16GB']
CUDA Version 9.0.176
CuDNN Version  7.0.5

Also, do you know if any further updates to MXNet have reduced the need for the boilerplate code, e.g. "Hot fixing DataLoader for multi-processing and RecordFileDataset"? And perhaps a way to avoid using tfrecords and just use the raw images, as with the other frameworks? It would be cool to match the conciseness of the other frameworks (e.g. PyTorch).
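
As a possible illustration of the "raw images" approach, here is a hedged sketch using the Gluon data API rather than anything from this repo; the directory path and transform settings are hypothetical:

from mxnet import gluon
from mxnet.gluon.data.vision import transforms

# Hypothetical folder of raw images laid out as root/class_name/*.png
dataset = gluon.data.vision.ImageFolderDataset('/data/chestxray/train')
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
# Multi-process loading straight from the image files, no RecordIO/tfrecords
loader = gluon.data.DataLoader(dataset.transform_first(transform),
                               batch_size=64, shuffle=True, num_workers=4)
for data, label in loader:
    pass  # feed each batch to the trainer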

Knet Inference

@kirnap @denizyuret Would it be possible at some point to directly compare the inference speed of Knet on a pre-trained resnet50 model similar to these notebooks?

I'm not sure if there is a converter for, say, Caffe models to Knet format?

The CNN and RNN training times are very impressive!

TensorFlow dropout code is wrong

The current code also applies dropout during testing. Corrected code is shown below.
I'm also not sure whether dropout should be applied when computing the train accuracies.

High-level TF Example

import numpy as np
import os
import sys
import tensorflow as tf
from common.params import *
from common.utils import *
print("OS: ", sys.platform)
print("Python: ", sys.version)
print("Numpy: ", np.__version__)
print("Tensorflow: ", tf.__version__)
OS:  linux
Python:  3.6.0 (default, May  9 2017, 15:45:21) 
[GCC 5.4.0 20160609]
Numpy:  1.13.1
Tensorflow:  1.3.0
def create_symbol(training):
    conv1 = tf.layers.conv2d(X, filters=50, kernel_size=(3, 3), padding='same')
    relu1 = tf.nn.relu(conv1)
    conv2 = tf.layers.conv2d(relu1, filters=50, kernel_size=(3, 3), padding='same')
    relu2 = tf.nn.relu(conv2)
    pool1 = tf.layers.max_pooling2d(relu2, pool_size=(2, 2), strides=(2, 2), padding='valid')
    drop1 = tf.layers.dropout(pool1, 0.25, training=training)
    
    conv3 = tf.layers.conv2d(drop1, filters=100, kernel_size=(3, 3), padding='same')
    relu3 = tf.nn.relu(conv3)
    conv4 = tf.layers.conv2d(relu3, filters=100, kernel_size=(3, 3), padding='same')
    relu4 = tf.nn.relu(conv4)
    pool2 = tf.layers.max_pooling2d(relu4, pool_size=(2, 2), strides=(2, 2), padding='valid')
    drop2 = tf.layers.dropout(pool2, 0.25, training=training)
    
    flatten = tf.reshape(drop2, shape=[-1, 100*8*8])
    fc1 = tf.layers.dense(flatten, 512, activation=tf.nn.relu)
    drop4 = tf.layers.dropout(fc1, 0.5, training=training)
    logits = tf.layers.dense(drop4, N_CLASSES, name='output')
    return logits
def init_model(m):
    # Single-class labels, don't need dense one-hot
    # Expects unscaled logits, not output of tf.nn.softmax
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=m, labels=y)
    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.MomentumOptimizer(learning_rate=LR, momentum=MOMENTUM)
    training_op = optimizer.minimize(loss)
    return training_op
%%time
# Data into format for library
#x_train, x_test, y_train, y_test = mnist_for_library(channel_first=False)
x_train, x_test, y_train, y_test = cifar_for_library(channel_first=False)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
print(x_train.dtype, x_test.dtype, y_train.dtype, y_test.dtype)
Preparing train set...
Preparing test set...
Done.
(50000, 32, 32, 3) (10000, 32, 32, 3) (50000,) (10000,)
float32 float32 int32 int32
CPU times: user 544 ms, sys: 224 ms, total: 768 ms
Wall time: 766 ms
%%time
# Place-holders
X = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
y = tf.placeholder(tf.int32, shape=[None])
training = tf.placeholder(tf.bool)
# Initialise model
sym = create_symbol(training)
CPU times: user 76 ms, sys: 4 ms, total: 80 ms
Wall time: 78.4 ms
%%time
model = init_model(sym)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
# Accuracy logging
correct = tf.nn.in_top_k(sym, y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
CPU times: user 360 ms, sys: 632 ms, total: 992 ms
Wall time: 1.04 s
%%time
for j in range(EPOCHS):
    for data, label in yield_mb(x_train, y_train, BATCHSIZE, shuffle=True):
        sess.run(model, feed_dict={X: data, y: label, training: True})
    # Log
    acc_train = sess.run(accuracy, feed_dict={X: data, y: label, training: True})
    print(j, "Train accuracy:", acc_train)
0 Train accuracy: 0.546875
1 Train accuracy: 0.484375
2 Train accuracy: 0.671875
3 Train accuracy: 0.65625
4 Train accuracy: 0.609375
5 Train accuracy: 0.765625
6 Train accuracy: 0.765625
7 Train accuracy: 0.796875
8 Train accuracy: 0.90625
9 Train accuracy: 0.734375
CPU times: user 1min 21s, sys: 11.9 s, total: 1min 33s
Wall time: 1min 20s
%%time
n_samples = (y_test.shape[0]//BATCHSIZE)*BATCHSIZE
y_guess = np.zeros(n_samples, dtype=np.int)
y_truth = y_test[:n_samples]
c = 0
for data, label in yield_mb(x_test, y_test, BATCHSIZE):
    pred = tf.argmax(sym,1)
    output = sess.run(pred, feed_dict={X: data, training: False})
    y_guess[c*BATCHSIZE:(c+1)*BATCHSIZE] = output
    c += 1
CPU times: user 3.83 s, sys: 152 ms, total: 3.98 s
Wall time: 3.58 s
print("Accuracy: ", sum(y_guess == y_truth)/len(y_guess))
Accuracy:  0.770532852564
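
On the open question above about dropout and train accuracies: if the logged train accuracy should not be affected by dropout, one option (a sketch using the same placeholders) is to keep training=True for the optimisation step but evaluate accuracy with training=False:

# Inside the epoch loop, after the sess.run(model, ...) update steps
acc_train = sess.run(accuracy, feed_dict={X: data, y: label, training: False})
print(j, "Train accuracy:", acc_train)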

[what to do next] What problem you would like to see next?

We compared some of the top deep learning frameworks using CNNs and CIFAR. It would be great if the community could vote on which problem they would like to see next. I've added some options:

  • RNNs
  • LSTMs on time series data
  • LSTMs on text
  • CNNs on ImageNet/bigger dataset
  • CNNs on text
  • Reinforcement Learning
  • GANs / VAEs
  • Neural art
  • STOP DOING THIS, AI IS TOO DANGEROUS!!!
  • Other??

cc @botev, @souptc, @YusukeSuzuki, @Yangqing, @ppwwyyxx, @miguelvr, @msalvaris, @ilkarman, @piiswrong, @soumith, @n17s

How about adding Gluon and Dynet?

Recently MXNet's new high-level framework Gluon has become popular; it supports hybrid imperative and symbolic networks. Could you add tests for Gluon and DyNet?
Finally, thank you for your work in giving us a clear picture.

IMDB in MXNet only unrolls 1 step but the other frameworks unroll 150 steps

That's why MXNet is so fast in your benchmark. My corrected code is here:

def create_symbol():
    # https://mxnet.incubator.apache.org/api/python/rnn.html
    data = mx.symbol.Variable('data')
    embedded_step = mx.symbol.Embedding(data=data, input_dim=MAXFEATURES, output_dim=EMBEDSIZE)
    gru_cell = mx.rnn.GRUCell(num_hidden=NUMHIDDEN)
    # Initialize its hidden and memory states.
    # 'begin_state' method takes an initialization function, and uses 'zeros' by default.
    begin_state = gru_cell.begin_state()
    # Call the cell to get the output of one time step for a batch.
    output, states = gru_cell.unroll(length=MAXLEN, inputs=embedded_step, merge_outputs=False)
    # output, states = gru_cell(embedded_step, begin_state) ***WRONG***
    # FC out
    fc1 = mx.symbol.FullyConnected(data=output[-1], num_hidden=2) 
    # Label
    input_y = mx.symbol.Variable('softmax_label')  
    m = mx.symbol.SoftmaxOutput(data=fc1, label=input_y, name="softmax")
    return m

Chainer MultiGPU

@mitmul Thank you for highlighting my typo in your PR; I wanted to raise two further issues I am facing here:

  1. Toggling between single and multi-GPU (4x) improves the time taken from 47min15s to 14min43s; however, for some reason the AUC also drops from 0.8028 (which matches all the other examples) to 0.56. This does not happen, for example, with PyTorch. There is also a difference in validation/main/loss, which ends at 0.23 for multi-GPU but 0.15 for single-GPU.

  2. I wondered if there was an update to the pre-trained DenseNet model so that I no longer have to override CaffeFunction with a class to reduce the memory footprint? The custom __call__ lets me use a batch size of 56 instead of 32; however, I am still not able to reach the low memory footprint of the other frameworks, which would let me run a batch size of 64.

Chainer:  4.1.0
CuPy:  4.1.0
Numpy:  1.14.1
GPU:  ['Tesla V100-PCIE-16GB', 'Tesla V100-PCIE-16GB', 'Tesla V100-PCIE-16GB', 'Tesla V100-PCIE-16GB']
CUDA Version 9.0.176
CuDNN Version  7.0.5

Theano test performance

I've just noticed your remark about the flags for test and train in PyTorch and TensorFlow. In fact, there is a similar thing for Lasagne which I totally forgot about. Could you change block 9 to:

%%time
# Compile functions
train_func = theano.function([X.input_var, y], [loss, accuracy], updates=updates)
pred = L.get_output(net, deterministic=True)
pred_func = theano.function([X.input_var], T.argmax(pred, axis=1))

Fix Keras to allow use with 2.1.6

Hey @ilkarman, there is mention of an incompatibility with Keras >2.1.4; can this be fixed? I'd like to try the keras-mxnet backend to see if there is any difference.

Thanks!

multi-gpu CUDA 9

Let's do the multi-GPU notebooks using CUDA 9 + cuDNN 7 and updated frameworks (e.g. TF 1.6 rather than 1.4) instead of CUDA 8.

The performance of Caffe

Nice repo, thanks a lot to the authors for their work. Could we also add results from Caffe for comparison?

Could you include Caffe (not Caffe2)

I am sorry for asking this, because I know Caffe is a pain to work with. However, you could probably use MMdnn to quickly create the networks.

Caffe2 inference timings (CPU/GPU)

Caffe2 appears to be optimised for CPU inference using Intel's MKL library. In terms of GPU training-times it's one of the fastest frameworks. However, for inference I can't get a lot of speed out of it (both GPU and CPU).

You can see here that timings for feature extraction on a resnet-50 model are:

DL Library      GPU (images/s)   CPU (images/s)
Tensorflow      155              11
MXNet (w/MKL)   129              25
MXNet           130              8
PyTorch         130              6
CNTK            117              8
Chainer         107              3
Keras (TF)      98               5
ONNX_Caffe2     75               6
Caffe2          71               6
Keras (CNTK)    46               4

So I have also tried it with a different model (PyTorch converted to ONNX), but it's still not as fast as I would expect. This is the same environment that had blazingly fast CNN training times, so I wonder if I'm just running inference in a non-optimal way?

It's an amazing framework, so these results are very surprising to me.
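
For anyone who wants to reproduce the measurement, a minimal sketch of how forward passes can be timed in Caffe2 (the blob and net names are hypothetical, and the nets are assumed to have already been created in the workspace):

import time
import numpy as np
from caffe2.python import workspace

# Hypothetical: init_net/predict_net already created in the workspace,
# with the input blob named 'data'
batch = np.random.rand(16, 3, 224, 224).astype(np.float32)
workspace.FeedBlob('data', batch)

start = time.time()
for _ in range(100):
    workspace.RunNet('predict_net')
elapsed = time.time() - start
print("Images/s:", 100 * batch.shape[0] / elapsed)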

Theano - theano.config.floatX

So I've just noticed here that the default value for theano.config.floatX is float64. Thus, unless you have edited your .theanorc, the notebooks might have been running in float64 rather than float32. If you don't want to edit the environment file, you can set the value (similar to the cuDNN flags) by adding theano.config.floatX = "float32". For good practice, I also suggest adding theano.config.warn_float64 = "warn".
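
For concreteness, the suggested flags at the top of the notebook would look roughly like this:

import theano
# Run in float32 rather than the float64 default
theano.config.floatX = "float32"
# Warn whenever a float64 sneaks into the graph
theano.config.warn_float64 = "warn"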

Test Keras_CNTK with channel_first format

The current Keras_CNTK test case runs with the channels_last format, which leads to a performance degradation in CNTK; that's why you see the warning:

/anaconda/envs/py35/lib/python3.5/site-packages/cntk/core.py:82: RuntimeWarning: data is not C contiguous; rearrange your data/computation to avoid costly data conversions
RuntimeWarning)

Could we test with the 'channels_first' format in Keras? I manually ran it on my box, and Keras_CNTK is only around 20% slower than the native CNTK implementation.
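
If it helps, a minimal sketch of switching the Keras notebook (assuming the layers do not hard-code a data format, and remembering to load the data with channel_first=True):

import keras.backend as K
# Use NCHW ordering so CNTK receives C-contiguous data
K.set_image_data_format('channels_first')
print(K.image_data_format())  # 'channels_first'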

Extension to multi-GPU, multi-node

@ilkarman Love the benchmarks.

It would be interesting to see which of the current platforms is able to scale the best to take advantage of cloud resources. Have you considered expanding to multi-GPU and to multi-node benchmarks?

Chainer Multi-GPU

I'm having some issues with the Chainer Multi-GPU examples and I was hoping someone could give me some guidance to fix it. @Crissman if you get a chance I would really appreciate your feedback.

  1. The single-GPU example works (and gets to an AUC of 0.81, which matches the rest); however, there were three modifications I had to make (I am using a pre-trained model from shicai):

First, I had to truncate the batch-norm eps (however, in the prototxt they are already 1e-5, so I'm not sure why they become smaller when imported):

def truncate_bn(sym):
    # Need to truncate batchnorm - eps
    for layer in list(sym._children):
        if "bn" in layer:
            if sym.__dict__[layer].eps < 1e-5:
                sym.__dict__[layer].eps = 1e-5

Second, I had to update to 4.0.0b3 to handle the average pooling layer in the pre-trained model.

Third, I modified the chainer.links.caffe.CaffeFunction as noted here to save only the layers that are needed for the final computation, not all of them:

import collections
from chainer.links.caffe import CaffeFunction

class CaffeFunctionDenseNet121(CaffeFunction):
        
    # Standard function saves all variables so cannot use big batch
    # This lets me run BATCH of 56 over 32 - still can't get to 64
    # https://github.com/chainer/chainer/blob/master/chainer/links/caffe/caffe_function.py#L176
    def __call__(self, inputs, **kwargs):
        variables = dict(inputs)
        # Pools not to save
        # These layers are not concatenated
        _NOSAVE = set(['pool5', 'concat_5_16', 'concat_4_24', 'concat_3_12', 'concat_2_6'])
        # Forward through all layers
        for func_name, bottom, top in self.layers:

            func = self.forwards[func_name]
            # Concat ops require some previous layers that are saved
            if "concat" in func_name:
                input_vars = tuple([variables[bottom[0]], variables['data']])
            else:
                input_vars = tuple([variables['data']])
            output_vars = func(*input_vars)
            # Delete layers for concat once used
            if "concat" in func_name:
                del variables[bottom[0]]
            if not isinstance(output_vars, collections.Iterable):
                output_vars = output_vars,
            # Save to dict
            variables['data'] = output_vars[0]
            top = top[0]
            # Save for concat
            if ("pool" in top) and (top not in _NOSAVE):
                variables[top] = output_vars[0]
            elif ("concat" in top) and (top not in _NOSAVE):
                variables[top] = output_vars[0]
                
        return tuple([variables['data']])

With these three changes I am able to train DenseNet-121 with a batch size of 56 before exhausting GPU VRAM; without the modification to the __call__ method I can only run 32, and the larger batch speeds up training by around 30 minutes (over 5 epochs). I believe the reason it is still slower than the rest is that the batch size is still too small, but I'm not sure what else I can do. For comparison, PyTorch runs at half the memory usage this currently does.

It seems that CaffeFunction is not the preferred way of loading and fine-tuning pre-trained models? I think it would be possible to copy the weights into a model that has already been defined, like this; however, the names and structure are different, so it would almost be easier to write a new implementation that matches the names from the pre-trained Caffe model.

Transfer learning appears quite popular, so it would be great if the above were possible. I'm not sure if I'm doing something wrong.

  2. For some reason, running with multi-GPU takes longer to complete 5 epochs. I checked that all my GPUs are used and raised an issue. I'm not sure if this is specific to just the CaffeFunction?

  3. Also, the multi-GPU runs have a much lower AUC (0.7 vs 0.8 for the other frameworks). I adopt a linear LR scaling rule (see the sketch below) and get 0.8 running Chainer single-GPU and all other frameworks single- and multi-GPU, so I'm not sure what is happening.

I would appreciate any help finalising this notebook since aside from the three points above, I really do like the interface.
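
For reference, the linear LR scaling rule mentioned in point 3 is just the base (single-GPU) learning rate multiplied by the number of GPUs, since the effective batch size grows by the same factor; a sketch, with LR taken from common.params:

NUM_GPUS = 4
# Effective batch size is BATCHSIZE * NUM_GPUS, so scale the learning rate to match
scaled_lr = LR * NUM_GPUS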

Quick performance note

Hi - thanks for setting up the benchmark. I want to quickly add a performance note so folks are not surprised when they see the perf numbers.

First, if one measures a speed difference on basic models like CNNs, something is wrong with how one uses the frameworks :)

In this case, the major performance difference comes from I/O, not the framework itself. If you look at the TensorFlow and Caffe2 examples, the data is provided with a feed approach - this is usually bad for performance, and instead one should use a db or input iterator to do so. Under the hood, this makes prefetching and other optimizations possible and is particularly important for performance.
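
To make the point concrete, here is a minimal sketch of an input-iterator approach in TensorFlow 1.x (an illustration only, not code from this repo; x_train/y_train and the constants are the ones used in the notebooks):

import tensorflow as tf

# Build a dataset from the in-memory arrays instead of feeding via feed_dict
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.shuffle(buffer_size=50000).repeat(EPOCHS)
dataset = dataset.batch(BATCHSIZE)
dataset = dataset.prefetch(1)  # overlap data preparation with compute
next_X, next_y = dataset.make_one_shot_iterator().get_next()
# next_X/next_y are wired directly into the graph, so no feed_dict is needed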

A few suggestions

  1. The last line can be changed to:
    print("Accuracy: ", sum(y_guess == y_test)/float(len(y_guess)))
    since I got 0 in Python 2.7 (integer division).
  2. Increase EPOCHS to 20 or more to get a fairer accuracy comparison, given the differences across platforms (such as hyper-parameters), weight initialization, and the randomness of SGD.
  3. Does the data need zero-mean normalization first to get a more reasonable result? (See the sketch below.)
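
A minimal sketch of the zero-mean normalization asked about in point 3 (per-channel statistics computed on the training set only; this is just an illustration, assuming NHWC arrays as in the notebook above):

import numpy as np

# Per-channel mean/std from the training set, applied to both splits
mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
std = x_train.std(axis=(0, 1, 2), keepdims=True)
x_train = (x_train - mean) / (std + 1e-7)
x_test = (x_test - mean) / (std + 1e-7)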

MXNet high api

Redo the MXNet high-level API example using .fit() to match TensorFlow and CNTK.

tf.nn.dynamic_rnn

We should really use tf.nn.dynamic_rnn(), not tf.contrib.rnn.static_rnn().

Note the difference in input shape (from the TensorFlow docs):

inputs: The RNN inputs. If time_major == False (default), this must be a Tensor of shape: [batch_size, max_time, ...], or a nested tuple of such elements. If time_major == True, this must be a Tensor of shape: [max_time, batch_size, ...], or a nested tuple of such elements. This may also be a (possibly nested) tuple of Tensors satisfying this property. The first two dimensions must match across all the inputs, but otherwise the ranks and other shape components may differ. In this case, input to cell at each time-step will replicate the structure of these tuples, except for the time dimension (from which the time is taken). The input to cell at each time step will be a Tensor or (possibly nested) tuple of Tensors each with dimensions [batch_size, ...].
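
For example, a batch-major call would look roughly like this (a sketch that assumes word_list has already been stacked into a single [batch_size, max_time, embed_size] tensor, rather than the list of per-step tensors that static_rnn takes):

cell = tf.contrib.rnn.GRUCell(NUMHIDDEN)
# time_major=False (the default): inputs are [batch_size, max_time, embed_size]
outputs, state = tf.nn.dynamic_rnn(cell, word_list, dtype=tf.float32)
# outputs: [batch_size, max_time, NUMHIDDEN]; use the last step for classification
last_output = outputs[:, -1, :]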

Tensorflow MultiGPU

The way that TF does checkpointing with:

tf.estimator.train_and_evaluate(nn, train_spec, eval_spec)

Seems to result in a lot of I/O lag: it saves the params to disk after every epoch, runs validation, then loads the model again, and repeats.

Is there an easier way to just keep this in memory (like other frameworks, e.g. PyTorch) and just save to disk once at the end?

For example, running on pure NumPy arrays:

nn.train(tf.estimator.inputs.numpy_input_fn(
    fake_X,
    fake_y,
    shuffle=False,
    num_epochs=EPOCHS,
    batch_size=BATCHSIZE))

Takes 14min30s with TF and 16min52s with Keras. However, the train_and_evaluate loop takes 21min49s with TF and 20min16s with Keras.
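
One workaround (a sketch, assuming per-epoch validation is not strictly required; the *_test array names are hypothetical) is to skip train_and_evaluate, train for the full run in one call, and evaluate once at the end:

nn.train(tf.estimator.inputs.numpy_input_fn(
    fake_X, fake_y, shuffle=False, num_epochs=EPOCHS, batch_size=BATCHSIZE))
nn.evaluate(tf.estimator.inputs.numpy_input_fn(
    fake_X_test, fake_y_test, shuffle=False, num_epochs=1, batch_size=BATCHSIZE))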

Benchmark with suboptimal performance

import numpy as np
import tensorflow as tf
from tensorpack import *
from common.params import *
from common.utils import *


def create_symbol(X, training, n_classes=N_CLASSES):
    # Tensorflow requires a flag for training in dropout
    conv1 = tf.layers.conv2d(X, activation=tf.nn.relu, filters=50, kernel_size=(3, 3),
                             padding='same', data_format='channels_first')
    conv2 = tf.layers.conv2d(conv1, filters=50, kernel_size=(3, 3),
                             padding='same', data_format='channels_first')
    pool1 = tf.layers.max_pooling2d(conv2, pool_size=(2, 2), strides=(2, 2),
                                    padding='valid', data_format='channels_first')
    relu2 = tf.nn.relu(pool1)
    drop1 = tf.layers.dropout(relu2, 0.25, training=training)

    conv3 = tf.layers.conv2d(drop1, activation=tf.nn.relu, filters=100, kernel_size=(3, 3),
                             padding='same', data_format='channels_first')
    conv4 = tf.layers.conv2d(conv3, filters=100, kernel_size=(3, 3),
                             padding='same', data_format='channels_first')
    pool2 = tf.layers.max_pooling2d(conv4, pool_size=(2, 2), strides=(2, 2),
                                    padding='valid', data_format='channels_first')
    relu4 = tf.nn.relu(pool2)
    drop2 = tf.layers.dropout(relu4, 0.25, training=training)

    flatten = tf.reshape(drop2, shape=[-1, 100*8*8])
    fc1 = tf.layers.dense(flatten, 512, activation=tf.nn.relu)
    drop3 = tf.layers.dropout(fc1, 0.5, training=training)
    logits = tf.layers.dense(drop3, n_classes, name='output')
    return logits


def tower_func(x, y):
    logits = create_symbol(x, training=get_current_tower_context().is_training)
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)
    loss = tf.reduce_mean(xentropy)

    # Accuracy logging
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name='accuracy')
    tf.summary.scalar('train_accuracy', accuracy)
    return loss

def get_optimizer(learning_rate=LR, momentum=MOMENTUM):
    return tf.train.MomentumOptimizer(learning_rate, momentum)


x_train, x_test, y_train, y_test = cifar_for_library(channel_first=True)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
print(x_train.dtype, x_test.dtype, y_train.dtype, y_test.dtype)

def generator(train=True):
    if train:
        while True:
            yield from yield_mb(x_train, y_train, BATCHSIZE, shuffle=True)
    else:
        while True:
            yield from yield_mb(x_test, y_test, BATCHSIZE)

df_train = PrefetchDataZMQ(DataFromGenerator(generator(True)), 1)
df_test = FixedSizeData(DataFromGenerator(generator(False)), len(x_test) // BATCHSIZE)

trainer = SimpleTrainer()
trainer.setup_graph(
    inputs_desc=[InputDesc(tf.float32, [None, 3, 32, 32], 'image'),
                 InputDesc(tf.int32, [None], 'label')],
    input=QueueInput(df_train),
    get_cost_fn=tower_func,
    get_opt_fn=get_optimizer
)

trainer.train_with_defaults(
    callbacks=[
        PeriodicCallback(
            InferenceRunner(df_test, [ScalarStats('accuracy')]),
            every_k_epochs=10)
    ],
    steps_per_epoch=len(x_train) // BATCHSIZE,
    max_epoch=EPOCHS
)

The above code, based on tensorpack, is equivalent to Tensorflow_CNN.ipynb but runs 20% faster on my machine (CUDA 9, TF 1.5, cuDNN 7, GTX 1080).
