galeone / dynamic-training-bench

Simplify the training and tuning of Tensorflow models

License: Mozilla Public License 2.0

Python 100.00%

tensorflow training tensorboard neural-network convolutional-neural-networks dataset models

dynamic-training-bench's Introduction

Dynamic Training Bench: DyTB

Stop wasting your time rewriting the training, evaluation & visualization procedures for your ML model: let DyTB do the work for you!

DyTB is compatible with: Tensorflow 1.x & Python >= 3.5

Features

Dramatically easy to use
Object Oriented: models and inputs are interfaces to implement
End-to-end training of ML models
Fine tuning
Transfer learning
Easy model comparison
Metrics visualization
Easy statistics
Hyperparameters oriented: change hyperparameters to see how they affect the performance
Automatic checkpoint save of the best model with respect to a metric
Usable as a library or a CLI tool

Getting started: python library

TL;DR: pip install dytb + python-notebook with a complete example.

The standard workflow is extremely simple:

Define or pick a predefined model
Define or pick a predefined dataset
Train!

Define or pick a predefined Model

DyTB comes with some common ML models, like LeNet & VGG. If you want to test how these models perform when trained on different datasets and/or with different hyperparameters, just use them.

If, instead, you want to define your own model, just implement one of the available interfaces, depending on the kind of ML model you want to build. The available interfaces are:

Classifier
Autoencoder
Regressor
Detector

It's recommended, but not strictly required, to use the wrappers built around the Tensorflow methods to define the model: these wrappers create logs and visualizations for you. The wrappers are documented and intuitive: you can find them in the dytb/models/utils.py file.

DyTB provides different models that can be used alone or can be used as examples of correct implementations. Every model in the dytb/models/predefined/ folder is a valid example.

In general, the model definition is just the implementation of 2 methods:

get, in which you implement the model itself
loss, in which you implement the loss function

It's strictly required to return the parameters that the method documentation requires, even if they're unused by your model.

E.g.: even if you never use a is_training_ boolean placeholder in your model definition, define it and return it anyway.
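
The contract above can be sketched without DyTB or Tensorflow installed. The classes below are hypothetical stand-ins, not DyTB's interfaces: the names ToyModelInterface and MeanPredictor, the method signatures, and the returned tuple are assumptions chosen only to mirror the text, so check the real method documentation before relying on any of them.

```python
from abc import ABC, abstractmethod


class ToyModelInterface(ABC):
    """Hypothetical stand-in for a DyTB interface (not the real class)."""

    @abstractmethod
    def get(self, inputs, is_training=False):
        """Build the model; must return (is_training_, predictions)."""

    @abstractmethod
    def loss(self, predictions, targets):
        """Return a scalar loss."""


class MeanPredictor(ToyModelInterface):
    def get(self, inputs, is_training=False):
        # is_training_ is unused by this model, but the (assumed) interface
        # documentation asks for it, so it is defined and returned anyway.
        is_training_ = is_training
        mean = sum(inputs) / len(inputs)
        predictions = [mean for _ in inputs]
        return is_training_, predictions

    def loss(self, predictions, targets):
        # Mean squared error between predictions and targets.
        return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


model = MeanPredictor()
flag, preds = model.get([1.0, 2.0, 3.0])
print(flag, preds, model.loss(preds, [1.0, 2.0, 3.0]))
```

The point of the sketch is only the shape of the contract: two methods, and every documented return value present even when the model ignores it.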

Define or pick a predefined Input

DyTB comes with some common ML benchmarks, like Cifar10, Cifar100 & MNIST. You can use them to train and measure the performance of your model, or you can define your own input source by implementing the Input interface that you can find here:

dytb/inputs/interfaces.py

The interface implementation should follow these points:

Implement the __init__ method: this method must download the dataset and apply the desired transformations to its elements. There are some utility functions defined in the inputs/utils.py file that can be used. This method is executed as the first operation when the dataset object is created, so caching the results is recommended.
Implement the num_classes method: this method must return the number of classes of the dataset. If your dataset has no labels, just return 0.
Implement the num_examples(input_type) method: this method accepts an InputType enumeration, defined in inputs/utils.py. This enumeration has 3 possible values: InputType.train, InputType.validation, InputType.test. The method must return the number of examples for each possible value of this enumeration.
Implement the inputs method. The inputs method is a general method that should return the real values of the dataset, related to the InputType passed, without any augmentation. The augmentations are defined at training time.

Note: inputs must return a Tensorflow queue of value, label pairs.

The best way to understand how to build the input source is to look at the examples in the dytb/inputs/predefined/ folder. A small, working example worth looking at is Cifar10: dytb/inputs/predefined/Cifar10.py.
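
The four points of the Input interface can be sketched in plain Python. Everything below is a stand-in under stated assumptions: this InputType only imitates the enumeration the text places in inputs/utils.py, ToyInput is a hypothetical class, the method signatures are guesses, and a plain list replaces the Tensorflow queue that DyTB actually expects from inputs.

```python
from enum import Enum


class InputType(Enum):
    """Stand-in for the enumeration described in the text."""
    train = "train"
    validation = "validation"
    test = "test"


class ToyInput:
    """Hypothetical input source following the four points above."""

    def __init__(self):
        # A real implementation would download and transform the dataset
        # here and cache the result; this sketch builds tiny in-memory splits.
        self._data = {
            InputType.train: [([0.0, 0.1], 0), ([0.9, 1.0], 1), ([0.2, 0.0], 0)],
            InputType.validation: [([0.8, 0.9], 1)],
            InputType.test: [([0.1, 0.2], 0)],
        }

    def num_classes(self):
        # Would return 0 for a dataset without labels.
        return 2

    def num_examples(self, input_type):
        return len(self._data[InputType(input_type)])

    def inputs(self, input_type):
        # Raw (value, label) pairs, with no augmentation applied.
        return list(self._data[InputType(input_type)])


dataset = ToyInput()
print(dataset.num_classes(), dataset.num_examples(InputType.train))
print(dataset.inputs(InputType.test))
```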

Train

Training while measuring predefined metrics is extremely easy. Let's see a complete example:

import pprint
import tensorflow as tf
from dytb.inputs.predefined import Cifar10
from dytb.train import train
from dytb.models.predefined.VGG import VGG

# Instantiate the model
vgg = VGG()

# Instantiate the CIFAR-10 input source
cifar10 = Cifar10.Cifar10()

# 1: Train VGG on Cifar10 for 50 epochs
# Place the train process on GPU:0
device = '/gpu:0'
with tf.device(device):
    info = train(
        model=vgg,
        dataset=cifar10,
        hyperparameters={
            "epochs": 50,
            "batch_size": 50,
            "regularizations": {
                "l2": 1e-5,
                "augmentation": {
                    "name": "FlipLR",
                    "fn": tf.image.random_flip_left_right,
                    # factor is the estimated amount of augmentation
                    # that "fn" introduces.
                    # In this case, "fn" doubles the training set size.
                    # Thus, an epoch is now seen as the original
                    # training set size * 2
                    "factor": 2,
                }
            },
            "gd": {
                "optimizer": tf.train.AdamOptimizer,
                "args": {
                    "learning_rate": 1e-3,
                    "beta1": 0.9,
                    "beta2": 0.99,
                    "epsilon": 1e-8
                }
            }
        })

Finish!

At the end of the training process, info will contain some useful information. Let's (pretty) print it:

pprint.pprint(info, indent=4)

{   'args': {   'batch_size': 50,
                'checkpoint_path': '',
                'comment': '',
                'dataset': <dytb.inputs.Cifar10.Cifar10 object at 0x7f896c19a1d0>,
                'epochs': 2,
                'exclude_scopes': '',
                'force_restart': False,
                'gd': {   'args': {   'beta1': 0.9,
                                      'beta2': 0.99,
                                      'epsilon': 1e-08,
                                      'learning_rate': 0.001},
                          'optimizer': <class 'tensorflow.python.training.adam.AdamOptimizer'>},
                'lr_decay': {'enabled': False, 'epochs': 25, 'factor': 0.1},
                'model': <dytb.models.VGG.VGG object at 0x7f896c19a128>,
                'regularizations': {   'augmentation': <function random_flip_left_right at 0x7f89109cb0d0>,
                                       'l2': 1e-05},
                'trainable_scopes': ''},
    'paths': {   'best': '/mnt/data/pgaleone/dytb_work/examples/log/VGG/CIFAR-10_Adam_l2=1e-05_fliplr/best',
                 'current': '/mnt/data/pgaleone/dytb_work/examples',
                 'log': '/mnt/data/pgaleone/dytb_work/examples/log/VGG/CIFAR-10_Adam_l2=1e-05_fliplr'},
    'stats': {   'dataset': 'CIFAR-10',
                 'model': 'VGG',
                 'test': 0.55899998381733895,
                 'train': 0.5740799830555916,
                 'validation': 0.55899998381733895},
    'steps': {'decay': 25000, 'epoch': 1000, 'log': 100, 'max': 2000}}
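
The 'steps' entry in the output above is consistent with simple arithmetic. The sketch below is a reconstruction from the printed numbers, not DyTB's code: it assumes CIFAR-10's 50,000 training images, that the printed run used 2 epochs and no augmentation factor (so factor is 1), and that the decay step is lr_decay epochs times the steps per epoch. The function name training_steps is hypothetical.

```python
def training_steps(num_examples, batch_size, epochs, factor=1, lr_decay_epochs=25):
    """Reconstruct the step counters from the hyperparameters (assumed formula)."""
    # An epoch covers the training set enlarged by the augmentation factor.
    steps_per_epoch = num_examples * factor // batch_size
    return {
        "epoch": steps_per_epoch,
        "max": steps_per_epoch * epochs,
        "decay": steps_per_epoch * lr_decay_epochs,
    }


# The printed run: batch_size 50, epochs 2, no factor.
steps = training_steps(num_examples=50000, batch_size=50, epochs=2)
print(steps)

# The code example: 50 epochs with "factor": 2 doubles the steps per epoch.
doubled = training_steps(num_examples=50000, batch_size=50, epochs=50, factor=2)
print(doubled)
```

Under these assumptions the first call reproduces the printed 'epoch', 'max' and 'decay' values; the 'log' value is not reconstructed here.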

Here you can see a complete example of training, continuing an interrupted training, fine tuning & transfer learning: python-notebook with a complete example.

Getting started: CLI

The only prerequisite is to install DyTB via pip.

pip install --upgrade dytb

DyTB adds to your $PATH two executables: dytb_train and dytb_evaluate.

The CLI workflow is the same as the library one, with 2 differences:

1. Interface implementations

If you define your own input source / model, it must be placed into the appropriate folder:

For models: scripts/models/
For inputs: scripts/inputs/

Rule: the class name must be equal to the file name. E.g.: class LeNet into LeNet.py file.

If you want to use a predefined input/model you don't need to do anything.
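
The naming rule makes it possible to resolve a class from nothing more than its name. The sketch below shows one way such a lookup could work with the standard library; load_class is a hypothetical helper and is not taken from DyTB, and a temporary folder stands in for scripts/models/.

```python
import importlib.util
import os
import tempfile


def load_class(folder, name):
    """Load class `name` from `folder/name.py` (file and class share a name)."""
    path = os.path.join(folder, name + ".py")
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # This lookup only works because the class is named after the file.
    return getattr(module, name)


# Demonstration with a throwaway folder standing in for scripts/models/.
with tempfile.TemporaryDirectory() as models_dir:
    with open(os.path.join(models_dir, "LeNet.py"), "w") as handle:
        handle.write("class LeNet:\n    pass\n")
    model_class = load_class(models_dir, "LeNet")
    print(model_class.__name__)
```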

2. Train via CLI

Every single hyperparameter definable in the Python version (except for the augmentations) can be passed as a CLI argument to the dytb_train script.

A single model can be trained using various hyper-parameters, such as the learning rate, the weight decay penalty applied, the exponential learning rate decay, the optimizer and its parameters, ...

DyTB allows training a model with different hyper-parameters and automatically logs every training process, allowing the developer to visually compare them.

Moreover, if a training process is interrupted, DyTB automatically resumes it from the last saved training step.

Example

# LeNet: no regularization
dytb_train --model LeNet --dataset MNIST

# LeNet: L2 regularization with value 1e-5
dytb_train --model LeNet --dataset MNIST --l2_penalty 1e-5

# LeNet: L2 regularization with value 1e-2
dytb_train --model LeNet --dataset MNIST --l2_penalty 1e-2

# LeNet: L2 regularization with value 1e-2, initial learning rate of 1e-4
# The default optimization algorithm is MomentumOptimizer, so we can change the momentum value
# The optimizer parameters are passed as a json string
dytb_train --model LeNet --dataset MNIST --l2_penalty 1e-2 \
    --optimizer_args '{"learning_rate": 1e-4, "momentum": 0.5}'

# If, for some reason, we interrupt this training process, rerunning the same command
# will resume the training process from the last saved training step.
# If we want to delete every saved model and log, we can pass the --restart flag
dytb_train --model LeNet --dataset MNIST --l2_penalty 1e-2 \
    --optimizer_args '{"learning_rate": 1e-4, "momentum": 0.5}' --restart
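
The --optimizer_args value in the commands above is a JSON string. A minimal sketch of how such an argument can be parsed with the standard library follows; the parser below is a toy (prog name toy_train is made up) and is not DyTB's actual argument handling, and the default shown is only an assumption for illustration.

```python
import argparse
import json

parser = argparse.ArgumentParser(prog="toy_train")
parser.add_argument("--l2_penalty", type=float, default=0.0)
# json.loads turns the quoted JSON object into a Python dict.
parser.add_argument("--optimizer_args", type=json.loads,
                    default={"learning_rate": 1e-2, "momentum": 0.9})

args = parser.parse_args([
    "--l2_penalty", "1e-2",
    "--optimizer_args", '{"learning_rate": 1e-4, "momentum": 0.5}',
])
print(args.l2_penalty, args.optimizer_args)
```

Single quotes around the JSON on the shell side keep the inner double quotes intact, which is why the commands above quote the value that way.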

The commands above will create 4 different models. Every model has its own log folder, and all of them share the same root folder.

In particular, the log folder will contain a LeNet folder, and within it there will be 4 more folders, each with a name that contains the hyper-parameters previously defined. This allows visualizing the 4 models in the same graphs, using Tensorboard, and easily understanding which one performs better.

No matter what interface has been implemented, the script to run is always train.py: it's capable of identifying the type of the model and using the right training procedure.

A complete list of the available tunable parameters can be obtained by running dytb_train --help.

For reference, a part of the output of dytb_train --help:

usage: train.py [-h] --model --dataset
  -h, --help            show this help message and exit
  --model {<list of models in the models/ folder, without the .py suffix>}
  --dataset {<list of inputs in the inputs/folder, without the .py suffix}
  --batch_size BATCH_SIZE
  --restart             restart the training process DELETING the old
                        checkpoint files
  --lr_decay            enable the learning rate decay
  --lr_decay_epochs LR_DECAY_EPOCHS
                        decay the learning rate every lr_decay_epochs epochs
  --lr_decay_factor LR_DECAY_FACTOR
                        decay of lr_decay_factor the initial learning rate
                        after lr_decay_epochs epochs
  --l2_penalty L2_PENALTY
                        L2 penalty term to apply ad the trained parameters
  --optimizer {<list of tensorflow available optimizers>}
                        the optimizer to use
  --optimizer_args OPTIMIZER_ARGS
                        the optimizer parameters
  --epochs EPOCHS       number of epochs to train the model
  --train_device TRAIN_DEVICE
                        the device on which place the the model during the
                        trining phase
  --comment COMMENT     comment string to preprend to the model name
  --exclude_scopes EXCLUDE_SCOPES
                        comma separated list of scopes of variables to exclude
                        from the checkpoint restoring.
  --checkpoint_path CHECKPOINT_PATH
                        the path to a checkpoint from which load the model

Best models & results

No matter whether the CLI or the library version is used, DyTB saves in the log folder of every model the "best" model with respect to the default metric used for the trained model.

For example, for the LeNet model created with the first command in the previous script, the following directory structure is created:

log/LeNet/
|---MNIST_Momentum
|-----best
|-----train
|-----validation

The train and validation folders contain the logs used by Tensorboard to display train and validation metrics in the same graphs.

The best folder contains one single checkpoint file that is the model with the highest quality obtained during the training phase.

This model is used at the end of the training process to evaluate the model performance.
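
The "keep only the best checkpoint" idea can be sketched as a running comparison over validation scores. This is an illustration of the general technique, not DyTB's trainer: track_best is a hypothetical function, the scores are made up, and the comment marks where a real trainer would write the checkpoint.

```python
def track_best(validation_scores, higher_is_better=True):
    """Return (step, score) of the best validation score seen so far."""
    best_step, best_score = None, None
    for step, score in enumerate(validation_scores, start=1):
        if best_score is None:
            improved = True
        elif higher_is_better:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            best_step, best_score = step, score
            # A real trainer would overwrite the single checkpoint
            # in the best/ folder at this point.
    return best_step, best_score


# Accuracy-like metric: higher is better.
print(track_best([0.91, 0.95, 0.93, 0.97, 0.96]))
# Loss-like metric: lower is better.
print(track_best([0.5, 0.3, 0.4], higher_is_better=False))
```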

Moreover, it's possible to run the evaluation of any checkpoint file (in the log/<MODEL> folder or in the log/<MODEL>/best folder) using the dytb_evaluate script.

For example:

# Evaluate the validation accuracy
dytb_evaluate --model LeNet \
              --dataset MNIST \
              --checkpoint_path log/LeNet/MNIST_Momentum/
# outputs something like: validation accuracy = 0.993

# Evaluate the test accuracy
dytb_evaluate --model LeNet \
              --dataset MNIST \
              --checkpoint_path log/LeNet/MNIST_Momentum/ \
              --test
# outputs something like: test accuracy = 0.993

Fine Tuning & network surgery

A trained model can be used to build a new model exploiting the learned parameters: this helps to speed up the learning process of new models.

DyTB allows restoring a model from its checkpoint file, removing the layers that are not necessary for the new model, and adding new layers to train.

For example, a VGG model trained on the Cifar10 dataset can be used to train a VGG model on the Cifar100 dataset.

The examples are for the CLI version, but the same parameters can be used in the Python library.

dytb_train \
    --model VGG \
    --dataset Cifar100 \
    --checkpoint_path log/VGG/Cifar10_Momentum/best/ \
    --exclude_scopes softmax_linear

This training process loads the "best" VGG model weights trained on Cifar10 from the checkpoint_path. The weights are then used to initialize the VGG model (so the VGG model must be compatible with the loaded model, at least for the non-excluded scopes), except for the layers under the exclude_scopes list.

Then the softmax_linear layers are replaced with the ones defined in the VGG model, which, when trained on Cifar100, adapt themselves to output 100 classes instead of 10.

So the above command starts a new training from the pre-trained model and trains the new output layer (with 100 outputs) that the VGG model defines, while also refining every other imported weight.

If you don't want to train the imported weights, you have to point out which scopes to train, using trainable_scopes:

dytb_train \
    --model VGG \
    --dataset Cifar100 \
    --checkpoint_path log/VGG/Cifar10_Momentum/best/ \
    --exclude_scopes softmax_linear \
    --trainable_scopes softmax_linear

With the above command you're instructing DyTB to exclude the softmax_linear scope from the checkpoint file and to train only the scope named softmax_linear in the newly defined model.
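
The effect of the two options can be sketched as two filters over variable names. This is a plain-Python illustration, not DyTB's implementation: in_scopes and split_variables are hypothetical helpers, the variable names are made up, and treating an empty trainable_scopes as "train everything" is an assumption.

```python
def in_scopes(variable_name, scopes):
    """True if any scope is a path component of the variable name."""
    parts = variable_name.split(":")[0].split("/")
    return any(scope in parts for scope in scopes)


def split_variables(variable_names, exclude_scopes=(), trainable_scopes=()):
    # Variables outside the excluded scopes are restored from the checkpoint.
    to_restore = [v for v in variable_names if not in_scopes(v, exclude_scopes)]
    # No trainable scopes given: train everything (assumed default).
    if trainable_scopes:
        to_train = [v for v in variable_names if in_scopes(v, trainable_scopes)]
    else:
        to_train = list(variable_names)
    return to_restore, to_train


names = [
    "VGG/conv1/W:0",
    "VGG/conv1/b:0",
    "VGG/softmax_linear/W:0",
    "VGG/softmax_linear/b:0",
]
restore, train = split_variables(names, ["softmax_linear"], ["softmax_linear"])
print(restore)
print(train)
```

With both options set to softmax_linear, the convolutional weights are restored and frozen while only the new output layer is trained; dropping trainable_scopes trains all four variables.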

Data visualization

Running tensorboard

tensorboard --logdir log/<MODEL>

It's possible to visualize the trend of the loss, the validation measures, the input values and so on. To see some of the produced output, have a look at the implementation of the Convolutional Autoencoder, described here: https://pgaleone.eu/neural-networks/deep-learning/2016/12/13/convolutional-autoencoders-in-tensorflow/#visualization

dynamic-training-bench's Issues

SyntaxError

I upgraded the library but I got the similar problem..
When I type the following command,
dytb_train --model LeNet --dataset MNIST

File "/home/dongwonshin/Desktop/venv/bin/dytb_train", line 58
row = {**info["stats"], "path": info["paths"]["best"], "time": time.strftime("%Y-%m-%d %H:%M")}
^
SyntaxError: invalid syntax

I got this message.

This is very weird: I even created a virtual environment using python3 and installed Tensorflow in it.

inception v3 predefined

Wrt #13, is a predefined Inception V3 model planned?

invalid syntax error

When I try to execute dytb_train, I encounter this error message..
Among them, I think the last error may cause the problem..
Could you give me a clue to solve this problem?

Traceback (most recent call last):
File "/usr/local/bin/dytb_train", line 4, in <module>
__import__('pkg_resources').run_script('dytb==0.4.4', 'dytb_train')
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 738, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1498, in run_script
code = compile(source, script_filename, 'exec')
File "/usr/local/lib/python2.7/dist-packages/dytb-0.4.4-py2.7.egg/EGG-INFO/scripts/dytb_train", line 58
row = {**info["stats"], "path": info["paths"]["best"], "time": time.strftime("%Y-%m-%d %H:%M")}
^
SyntaxError: invalid syntax

How to use?

Train By my own data sets?

Hi,
I want to use a model pretrained with the SAE on the CIFAR10 dataset and fine-tune it on my own dataset, which contains data for only one label (airplane). How should I organize the input dataset, and how do I fine-tune the model?

Please give me some instructions or steps; it would really help me.

thx !

cifar example fails for me

I put this into a script and executed it: (from your website) except that
from dytb.inputs import Cifar10 ===>>> from dytb.inputs.predefined import Cifar10
and
from dytb.models.VGG import VGG ===>>> from dytb.models.predefined.VGG import VGG
It failed. Error is below.

import pprint
import tensorflow as tf
from dytb.inputs.predefined import Cifar10
from dytb.train import train
from dytb.models.predefined.VGG import VGG

# Instantiate the model
vgg = VGG()

# Instantiate the CIFAR-10 input source
cifar10 = Cifar10.Cifar10()

# 1: Train VGG on Cifar10 for 50 epochs
# Place the train process on GPU:0
device = '/gpu:0'
with tf.device(device):
    info = train(
        model=vgg,
        dataset=cifar10,
        hyperparameters={
            "epochs": 50,
            "batch_size": 50,
            "regularizations": {
                "l2": 1e-5,
                "augmentation": {
                    "name": "FlipLR",
                    "fn": tf.image.random_flip_left_right
                }
            },
            "gd": {
                "optimizer": tf.train.AdamOptimizer,
                "args": {
                    "learning_rate": 1e-3,
                    "beta1": 0.9,
                    "beta2": 0.99,
                    "epsilon": 1e-8
                }
            }
        })

python cifar10test.py
Traceback (most recent call last):
File "cifar10test.py", line 36, in <module>
"epsilon": 1e-8
File ".../anaconda3/lib/python3.6/site-packages/dytb/train.py", line 188, in train
"regularizations"]["augmentation"]["factor"] / args["batch_size"]
KeyError: 'factor'

Any ideas? Thanks.
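
Note on the traceback above: it ends in dytb/train.py reading ["regularizations"]["augmentation"]["factor"], while the script defines only "name" and "fn" for the augmentation. The README example sets "factor": 2 for FlipLR. A minimal sketch of a guard that fills in a missing factor before calling train follows; with_default_factor is a hypothetical helper, not part of DyTB, and the default of 1 (the augmentation does not enlarge the training set) is an assumption.

```python
import copy


def with_default_factor(hyperparameters, default=1):
    """Return a copy in which the augmentation entry has a "factor" key."""
    params = copy.deepcopy(hyperparameters)
    augmentation = params.get("regularizations", {}).get("augmentation")
    if augmentation is not None:
        # Only fills the key when it is missing; an explicit value is kept.
        augmentation.setdefault("factor", default)
    return params


hyperparameters = {
    "epochs": 50,
    "batch_size": 50,
    "regularizations": {
        "l2": 1e-5,
        # "fn" would be tf.image.random_flip_left_right; a placeholder
        # keeps this sketch runnable without Tensorflow.
        "augmentation": {"name": "FlipLR", "fn": lambda image: image},
    },
}
fixed = with_default_factor(hyperparameters)
print(fixed["regularizations"]["augmentation"]["factor"])
```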

can't run example - no trainable_scope

Any ideas? (python 3.6/TF1.3)

$ dytb_train --model LeNet --dataset MNIST
...loads data
Args: { 'batch_size': 128,
'checkpoint_path': '',
'comment': '',
'dataset': 'MNIST',
'epochs': 150,
'exclude_scopes': [],
'l2_penalty': 0.0,
'lr_decay': False,
'lr_decay_epochs': 25,
'lr_decay_factor': 0.1,
'model': 'LeNet',
'optimizer': 'MomentumOptimizer',
'optimizer_args': {'learning_rate': 0.01, 'momentum': 0.9},
'restart': False,
'train_device': '/gpu:0',
'trainable_scopes': []}
<tf.Variable 'LeNet/conv1/W:0' shape=(5, 5, 1, 32) dtype=float32_ref>
<tf.Variable 'LeNet/conv1/b:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'LeNet/conv2/W:0' shape=(5, 5, 32, 64) dtype=float32_ref>
<tf.Variable 'LeNet/conv2/b:0' shape=(64,) dtype=float32_ref>
<tf.Variable 'LeNet/fc1/W:0' shape=(3136, 1024) dtype=float32_ref>
<tf.Variable 'LeNet/fc1/b:0' shape=(1024,) dtype=float32_ref>
<tf.Variable 'LeNet/softmax_linear/W:0' shape=(1024, 10) dtype=float32_ref>
<tf.Variable 'LeNet/softmax_linear/b:0' shape=(10,) dtype=float32_ref>
Model LeNet: trainable parameters: 3274634. Size: 13098.536 KB
Traceback (most recent call last):
File "...anaconda3/bin/dytb_train", line 66, in <module>
sys.exit(main())
File "...anaconda3/bin/dytb_train", line 56, in main
comment=ARGS.comment)
File "...tensorflow_tools/dtb/dytb/train.py", line 217, in train
return Trainer(model, dataset, args, steps, paths).train()
File "...tensorflow_tools/dtb/dytb/trainer/Trainer.py", line 112, in train
var_list=variables_to_train(self._args["trainable_scopes"]))
File "...anaconda3/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 315, in minimize
grad_loss=grad_loss)
File "...anaconda3/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 380, in compute_gradients
raise ValueError("No variables to optimize.")
ValueError: No variables to optimize.

How to compute the GPU memory that the graph needs?

Hi,
I ran into a question when I wanted to dispatch my job to different machines.
I know the machines' capabilities, but how can I estimate the GPU memory that the graph needs?
Please give me some ideas, thanks a lot.

thx !

At least two variables have the same name: global_step

System information
TensorFlow version: 1.5
Python version: 3.5
CUDA/cuDNN version: CUDA 9.0, CuDNN 7
Command to reproduce:

    info = train(
        model=model,
        dataset=dataset,
        hyperparameters={
            "batch_size": 128,
            "epochs": 20,
            "regularizations": {
                "l2": 1e-4,
                "augmentation": {
                    "name": "noise_brightness",
                    "fn": lambda image: tf.image.random_brightness(aug(image), max_delta=32./255.),
                    "factor": 100
                }
            },
            "gd": {
                "optimizer": tf.train.AdamOptimizer,
                "args": {
                    "learning_rate": 1e-3
                }
            },
            "seed": None,
        },
        comment='test',
        force_restart=False)

After upgrading to Tensorflow 1.5 I get this error in train function:

At least two variables have the same name: global_step

keras equivalent to extract features

In keras, there is an example of how to extract features from an arbitrary intermediate layer with a VGG19 model at https://keras.io/applications/#usage-examples-for-image-classification-models. It is:

from keras.applications.vgg19 import VGG19
from keras.preprocessing import image
from keras.applications.vgg19 import preprocess_input
from keras.models import Model
import numpy as np

base_model = VGG19(weights='imagenet')
model = Model(inputs=base_model.input, outputs=base_model.get_layer('block4_pool').output)

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

block4_pool_features = model.predict(x)

What would the equivalent be with dytb?

SingleLayerCAE fails

Hi, I'm having a similar problem to the last one I got when running an example model:
Any ideas? Thanks!

dytb_train --model SingleLayerCAE --dataset MNIST --optimizer AdamOptimizer --optimizer_args '{"learning_rate": 1e-5}' --train_device "/gpu:1" --batch_size 1024 --l2_penalty 1e-9
Extracting ...anaconda3/lib/python3.6/site-packages/dytb/inputs/predefined/data/MNIST/train-images-idx3-ubyte.gz
Extracting ...anaconda3/lib/python3.6/site-packages/dytb/inputs/predefined/data/MNIST/train-labels-idx1-ubyte.gz
Extracting ...anaconda3/lib/python3.6/site-packages/dytb/inputs/predefined/data/MNIST/t10k-images-idx3-ubyte.gz
Extracting ...anaconda3/lib/python3.6/site-packages/dytb/inputs/predefined/data/MNIST/t10k-labels-idx1-ubyte.gz
Args: { 'batch_size': 1024,
'checkpoint_path': '',
'comment': '',
'dataset': 'MNIST',
'epochs': 150,
'exclude_scopes': None,
'l2_penalty': 1e-09,
'lr_decay': False,
'lr_decay_epochs': 25,
'lr_decay_factor': 0.1,
'model': 'SingleLayerCAE',
'optimizer': 'AdamOptimizer',
'optimizer_args': {'learning_rate': 1e-05},
'restart': False,
'train_device': '/gpu:1',
'trainable_scopes': None}
Traceback (most recent call last):
File "...anaconda3/bin/dytb_train", line 66, in <module>
sys.exit(main())
File "...anaconda3/bin/dytb_train", line 56, in main
comment=ARGS.comment)
File "...anaconda3/lib/python3.6/site-packages/dytb/train.py", line 217, in train
return Trainer(model, dataset, args, steps, paths).train()
File "...anaconda3/lib/python3.6/site-packages/dytb/trainer/Trainer.py", line 76, in train
l2_penalty=self._args["regularizations"]["l2"])
TypeError: get() got multiple values for argument 'train_phase'

some puzzled to trouble you 😁

Hi,
Recently I've been learning RNNs and LSTMs in Tensorflow, but something puzzles me.
When I use tf.__version__ == "v1.0.0", I use this code in my model:

cell = tf.contrib.rnn.LSTMCell(state_size, state_is_tuple=True)
cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
states_series, current_state = tf.nn.dynamic_rnn(cell=cell,
                                                 inputs=batchX_placeholder,
                                                 initial_state=rnn_tuple_state)

But when I run this code on versions "v1.1.0" and "v1.2.0", I get this error:

Traceback (most recent call last):
  File "/Users/liuguiyang/Documents/CodeProj/PyProj/MLCourse/source/LSTM/approximation_sin/RNN_LSTM_example.py", line 90, in <module>
    initial_state=rnn_tuple_state)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py", line 574, in dynamic_rnn
    dtype=dtype)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py", line 737, in _dynamic_rnn_loop
    swap_memory=swap_memory)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2770, in while_loop
    result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2599, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2549, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py", line 722, in _time_step
    (output, new_state) = call_cell()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py", line 708, in <lambda>
    call_cell = lambda: cell(input_t, state)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 180, in __call__
    return super(RNNCell, self).__call__(inputs, state)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 441, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 916, in call
    cur_inp, new_state = cell(cur_inp, cur_state)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 180, in __call__
    return super(RNNCell, self).__call__(inputs, state)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 441, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 542, in call
    lstm_matrix = _linear([inputs, m_prev], 4 * self._num_units, bias=True)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1017, in _linear
    initializer=kernel_initializer)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 360, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1405, in wrapped_custom_getter
    *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 183, in _rnn_get_variable
    variable = getter(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 183, in _rnn_get_variable
    variable = getter(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
    use_resource=use_resource)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 669, in _get_single_variable
    found_var.get_shape()))
ValueError: Trying to share variable rnn/multi_rnn_cell/cell_0/lstm_cell/kernel, but specified shape (8, 16) and found shape (9, 16).

Process finished with exit code 1

I googled for an answer and found this:

def lstm_cell():
    cell = tf.contrib.rnn.NASCell(
        state_size, reuse=tf.get_variable_scope().reuse)
    return tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=0.8)


cell = tf.contrib.rnn.MultiRNNCell(
    [lstm_cell() for _ in range(num_layers)], state_is_tuple=True)

# cell = tf.contrib.rnn.LSTMCell(state_size, state_is_tuple=True)
# cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
states_series, current_state = tf.nn.dynamic_rnn(cell=cell,
                                                 inputs=batchX_placeholder,
                                                 initial_state=rnn_tuple_state)

This really fixes the problem, but I don't know why. Could you help me understand?

guiyang
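
One likely explanation, sketched in plain Python: [cell] * num_layers puts the same cell object in every position, while the list comprehension builds a separate cell per layer. From Tensorflow 1.1 on, a cell object owns its variables, so a shared cell asks two layers with different input widths to share one kernel. ToyCell below is a stand-in, not a Tensorflow class; the 4 units and input size 5 are assumptions chosen to match the shapes in the error message.

```python
class ToyCell:
    """Stand-in for an LSTM cell that creates its kernel on first use."""

    def __init__(self, units):
        self.units = units
        self.kernel_shape = None

    def build(self, input_size):
        # LSTM kernel: (input_size + units) rows, 4 * units columns.
        shape = (input_size + self.units, 4 * self.units)
        if self.kernel_shape is None:
            self.kernel_shape = shape
        elif self.kernel_shape != shape:
            raise ValueError("specified shape %s and found shape %s"
                             % (shape, self.kernel_shape))
        return self.units  # output size fed to the next layer


def stack(cells, input_size):
    size = input_size
    for cell in cells:
        size = cell.build(size)
    return [cell.kernel_shape for cell in cells]


shared = [ToyCell(4)] * 2                      # one object, listed twice
separate = [ToyCell(4) for _ in range(2)]      # two independent objects
print(shared[0] is shared[1], separate[0] is separate[1])

try:
    stack(shared, input_size=5)
except ValueError as error:
    print("shared cell fails:", error)

print(stack(separate, input_size=5))
```

The first layer sees the input plus its state (5 + 4 rows) while the second sees the previous layer's output plus its state (4 + 4 rows), so only independent cells can hold both kernels.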

How to use the model the extract feature?

Hi,
I trained a SingleLayerCAE model using the Cifar10 dataset, and I want to use this model to extract features. How can I use your skeleton to test this model?
Or should I write a program to do it myself?

thx

Thanks A Lot !

Dear Sir,
Your open source code helped me a lot!

thx !

Feature extraction from the convolutional autoencoder

Hello,
I could put my training data to the CAE model and I trained the model.
After training, how can I extract a feature for an image?
In the SingleLayerCAE.py, there was the "get" function.
and I found out that "encoding" variable in the "encode" namespace.
Is this related?

how to read the image.list by pipeline

hi,
I'm sorry to bother you.
I want to make my own dataset, but there is a problem that has held me up the whole day. This is the code:

import os
import sys
import tensorflow as tf
import cv2

readfile = "/Users/liuguiyang/Documents/CodeProj/PyProj/dtb/scripts/inputs/data/airplane/positive/test.list"
filenames = []
with open(readfile, 'r') as h:
    filenames = [f.strip() for f in h.readlines()]
# print(filenames)

def readMyFileFormatImg(fileNameQueue):
    reader = tf.TextLineReader(skip_header_lines=False)
    key, value = reader.read(fileNameQueue)
    raw_img = tf.read_file(key)
    # features = tf.image.decode_png(tf.read_file(value), channels=3)
    # features.set_shape((32,32,3))
    label = tf.stack([1])
    # label.set_shape((1,))
    return raw_img, label

def inputPipeLine(fileNames, batchSize = 4, numEpochs = None):
    fileNameQueue = tf.train.string_input_producer(fileNames, num_epochs = numEpochs)
    example, label = readMyFileFormatImg(fileNameQueue)
    # min_after_dequeue = 8
    # capacity = min_after_dequeue + 3 * batchSize
    # exampleBatch, labelBatch = tf.train.shuffle_batch([example, label], 
    #                                                    batch_size = batchSize, 
    #                                                    num_threads = 3,  
    #                                                    capacity = capacity, 
    #                                                    min_after_dequeue = min_after_dequeue)
    return example, label


featureBatch, labelBatch = inputPipeLine(filenames, batchSize = 4)
init = [
            tf.variables_initializer(tf.global_variables() +
                                     tf.local_variables()),
            tf.tables_initializer()
        ]
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(init)
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    # Retrieve a single instance:
    try:
        # while not coord.should_stop():
        while True:
            example, label = sess.run([featureBatch, labelBatch])
            print(example)
    except tf.errors.OutOfRangeError:
        print('Done reading')
    finally:
        coord.request_stop()

    coord.join(threads)
    sess.close()

The error message is:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: /Users/liuguiyang/Documents/CodeProj/PyProj/dtb/scripts/inputs/data/airplane/positive/32size/0335.png:1
	 [[Node: ReadFile = ReadFile[_device="/job:localhost/replica:0/task:0/cpu:0"](ReaderReadV2)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tune-airplane.py", line 60, in <module>
    example, label = sess.run([featureBatch, labelBatch])
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: /Users/liuguiyang/Documents/CodeProj/PyProj/dtb/scripts/inputs/data/airplane/positive/32size/0335.png:1
	 [[Node: ReadFile = ReadFile[_device="/job:localhost/replica:0/task:0/cpu:0"](ReaderReadV2)]]

Caused by op 'ReadFile', defined at:
  File "tune-airplane.py", line 44, in <module>
    featureBatch, labelBatch = inputPipeLine(filenames, batchSize = 4)
  File "tune-airplane.py", line 33, in inputPipeLine
    example, label = readMyFileFormatImg(fileNameQueue)
  File "tune-airplane.py", line 24, in readMyFileFormatImg
    raw_img = tf.read_file(key)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 202, in read_file
    result = _op_def_lib.apply_op("ReadFile", filename=filename, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): /Users/liuguiyang/Documents/CodeProj/PyProj/dtb/scripts/inputs/data/airplane/positive/32size/0335.png:1
	 [[Node: ReadFile = ReadFile[_device="/job:localhost/replica:0/task:0/cpu:0"](ReaderReadV2)]]

Please give me some pointers.

Thanks, guiyang
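
Judging from the traceback, a likely cause is the reader choice: `tf.TextLineReader` returns a key of the form `filename:line_number`, so `tf.read_file(key)` is asked to open `.../0335.png:1`, which does not exist. The sketch below is one possible fix, not a confirmed answer from the maintainers. It assumes TensorFlow 1.x, PNG files, and 32x32 RGB images (taken from the commented-out lines of the posted code). The names `read_image_list` and `build_image_pipeline` are hypothetical.

```python
def read_image_list(list_path):
    """Return the non-empty, stripped lines of a .list file (one image path per line)."""
    with open(list_path, "r") as handle:
        return [line.strip() for line in handle if line.strip()]


def build_image_pipeline(image_paths, num_epochs=None):
    """Build a TF 1.x queue pipeline that yields one (image, label) pair per run."""
    # Imported here so that read_image_list stays usable without TensorFlow.
    import tensorflow as tf  # assumption: TensorFlow 1.x queue-based input API

    filename_queue = tf.train.string_input_producer(image_paths, num_epochs=num_epochs)
    # WholeFileReader yields (filename, file contents). TextLineReader would
    # yield ("filename:line_number", line text), which is not a valid path.
    reader = tf.WholeFileReader()
    _, raw_bytes = reader.read(filename_queue)
    image = tf.image.decode_png(raw_bytes, channels=3)
    image.set_shape((32, 32, 3))  # assumption: every image is 32x32 RGB
    label = tf.constant([1])      # same constant label as in the posted code
    return image, label
```

With this in place, `build_image_pipeline(read_image_list(readfile))` would replace the call to `inputPipeLine(filenames, batchSize = 4)`; the session, coordinator, and queue-runner code can stay as posted.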

pip installs an old, non-working version of the software (installed version: 0.7.3)

System information
TensorFlow version: 1.5
Python version: 3.5
CUDA/cuDNN version: CUDA 9.0, CuDNN 7

Command to reproduce:
Install dytb through pip
import dytb

Error:

Traceback (most recent call last):
  File "CNNANGLESPECIFIC.py", line 14, in <module>
    from dytb.models.utils import variables_to_restore
  File "/home/pier/piervenv/local/lib/python3.5/site-packages/dytb/models/utils.py", line 12, in <module>
    from .collections import MODEL_SUMMARIES, REQUIRED_NON_TRAINABLES
ImportError: cannot import name 'MODEL_SUMMARIES'

Syntax error on Python 3.4 when merging one dict into another using **

I am trying to run the example given here, but this line:

https://github.com/galeone/dynamic-training-bench/blob/master/dytb/train.py#L202

throws a syntax error on Python 3.4. Looking into it, I found that this syntax is only supported from Python 3.5 onward.

This was confusing because the description says that dytb is compatible with Python 3.4.

Please correct the code.
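
For reference, the `{**a, **b}` form comes from PEP 448 and needs Python 3.5 or later. A portable equivalent is sketched below; `merge_dicts` and the sample hyperparameter names are hypothetical, and this is not necessarily the change made in `dytb/train.py`.

```python
def merge_dicts(*dicts):
    """Merge dicts left to right; later keys win, like {**a, **b} on Python 3.5+."""
    merged = {}
    for current in dicts:
        merged.update(current)  # copies into a new dict, inputs stay untouched
    return merged


# Hypothetical hyperparameter dicts, for illustration only.
defaults = {"epochs": 50, "batch_size": 128}
overrides = {"batch_size": 64}
print(merge_dicts(defaults, overrides))
```

This runs on Python 3.4 as well, at the cost of a helper function in place of the literal syntax.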

How to design a distributed TensorFlow wrapper API?

Hi,
Sorry to bother you.
I want to write a wrapper around TensorFlow's distributed training API.
How should I design it?

Thanks
