
indicodatasolutions / finetune


Scikit-learn style model finetuning for NLP

Home Page: https://finetune.indico.io

License: Mozilla Public License 2.0

Python 95.39% Shell 0.24% CMake 0.24% Cuda 1.82% C++ 2.31%

finetune's People

Contributors

ameasure, benleetownsend, chris-wells-13, concomitant, dimidd, eito-fis, fitzworkhub, guillermogsjc, jacobmanderson, jkgenser, johndpope, lhz1029, madisonmay, matthewbayer, newmu, pastap, rdedhia


finetune's Issues

Possible bug with cached_predict

Describe the bug
I can't get cached_predict() to work properly. After the first call to predict(), inference is lightning fast, but the value from the first call is returned for all inference from then on. (see attached screenshot)

Minimum Reproducible Example

model = Classifier().load("path_to_model_file")
with model.cached_predict():
    print(model.predict_proba(["some text"]))
    print(model.predict_proba(["some other text for different probs"]))

Expected behavior
Inference should be significantly faster starting with the second predict call, and each call should return predictions for its own input.

Additional context
This seems to work on CPU, but not on GPU. This was run on several AWS instance types for CPU testing and on a p2.xlarge for GPU testing.
I'm using the Deep Learning AMI (Ubuntu) Version 19.0 - ami-05bc59103c52af154
Then I pull the development branch of finetune.
I tried with and without upgrading TensorFlow, with no difference in behavior.

Screen Shot

  • Notice that in the first example all predicted probabilities are different, while in the second example all predicted probabilities match the first call. (screenshot: cached_predict_bug)

Early Stopping/Save Best Val Weights?

Thanks for writing this library to wrap the OpenAI code!! I don't think the current codebase does this (I was looking in the base model's _train_loop and validation_hook) but I could be wrong.

Is your feature request related to a problem? Please describe.

When I am fitting the classifier, I would like to save the weights of the model with the best validation data loss. In particular -- I call:

model = finetune.Classifier(n_epochs=10,
                            test_size=.3,
                            verbose=True,
                            batch_size=8,
                            lm_loss_coef=0,
                            val_interval=25)
model.fit(train_data, train_labels)

I watch as the validation loss goes down, and then goes back up again...

...
Train loss: 0.8263243668572684   Validation loss: 0.9696971137060008
Train loss: 0.7893467535291913   Validation loss: 0.9315027973123593
Train loss: 0.7428553240288371   Validation loss: 0.8978088509162531
Train loss: 0.6532491042988868   Validation loss: 0.8657591236030956
Train loss: 0.5872814164979386   Validation loss: 0.8302381457714927
Train loss: 0.5188518893581162   Validation loss: 0.8055230206469525
Train loss: 0.46836010163226455  Validation loss: 0.7790362483260943
Train loss: 0.42655090732702466  Validation loss: 0.7963167028100855
Train loss: 0.37630249256415615  Validation loss: 0.7668756079191992
Train loss: 0.3127427928071437   Validation loss: 0.7535473302166583
Train loss: 0.26105406358681316  Validation loss: 0.752195528951159
Train loss: 0.2315318344735824   Validation loss: 0.74845844850349
Train loss: 0.20082835943056582  Validation loss: 0.7339546502212082
Train loss: 0.16591417155263366  Validation loss: 0.7224668118229861
Train loss: 0.16113676263676516  Validation loss: 0.71280302448007
Train loss: 0.1351519643111266   Validation loss: 0.7155087849984496
Train loss: 0.1122367973046417   Validation loss: 0.7097793683152129
Train loss: 0.08857986326043485  Validation loss: 0.7209393355041678
Train loss: 0.07383205768643403  Validation loss: 0.7290630419633354
Train loss: 0.05744698858779037  Validation loss: 0.737265006848874
Train loss: 0.04475240544373843  Validation loss: 0.7473114921713067
Train loss: 0.03484421883801572  Validation loss: 0.7718343255340118
...

Also -- the point at which the minimum is reached varies run-to-run, so I can't simply set the number of epochs a priori.

Describe the solution you'd like
It would be great to have an option to cache the temporary weights and then, at the end of the training loop, have the best weights according to validation loss reloaded automatically.

Describe alternatives you've considered
I could write my own training loop where I call fit(n_epochs=1) and save each epoch, but caching the weights and reloading the best ones according to validation loss in a single call would be extremely convenient.
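
A rough sketch of that alternative, using only fit, predict_proba, and save; the validation split, loss computation, and file name here are my own hypothetical choices, and I'm assuming predict_proba returns a per-example mapping from label to probability:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
import finetune

texts, labels = ["positive text", "negative text"] * 50, ["pos", "neg"] * 50  # placeholder data
train_X, val_X, train_Y, val_Y = train_test_split(texts, labels, test_size=0.3)

model = finetune.Classifier(n_epochs=1, batch_size=8, lm_loss_coef=0)
best_loss = np.inf
for epoch in range(10):
    model.fit(train_X, train_Y)              # one epoch per call
    probas = model.predict_proba(val_X)      # assumed: one {label: probability} dict per example
    classes = sorted(probas[0].keys())
    val_loss = log_loss(val_Y, [[p[c] for c in classes] for p in probas], labels=classes)
    if val_loss < best_loss:
        best_loss = val_loss
        model.save("best_model.bin")         # keep only the best-so-far weights on disk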

Tensorboard event files are very large

Describe the bug
Tensorboard event files are 800MB+ per run

Minimum Reproducible Example

import finetune

x = ["foo", "bar", "baz"]
y = [1, 0, 0]

classifier = finetune.Classifier(tensorboard_folder="./tb_test")

classifier.fit(x, y)

I'm not sure what exactly is contained in the log files, but this is pretty annoying because it means that after ~30 runs tensorboard is using 40+ GB of RAM for me. It looks like the graph definition is huge for some reason (reading the file with tf.train.summary_iterator('./events.out....')).

index is out of bounds for axis 1

Describe the bug
Hello,
When using the model.fit function, I get an unexpected index-out-of-bounds exception. The full trace follows:

len(df_train["text"])= 20 type= <class 'pandas.core.series.Series'>
len(df_train["label"])= 20 type= <class 'pandas.core.series.Series'>

IndexError                                Traceback (most recent call last)
<ipython-input-34-7fa88e2450d2> in <module>()
      2 print("len(df_train[\"text\"])=", len(df_train["text"]), "type=", type(df_train["text"]))
      3 print("len(df_train[\"label\"])=", len(df_train["label"]), "type=", type(df_train["label"]))
----> 4 model.fit(df_train['text'], df_train['label'])    # Finetune base model on custom data
      5 model.save("model_repairType.md5")                   # Serialize the model to disk

/home/nico/anaconda3/envs/py36/lib/python3.6/site-packages/finetune/base.py in fit(self, *args, **kwargs)
    308     def fit(self, *args, **kwargs):
    309         """ An alias for finetune. """
--> 310         return self.finetune(*args, **kwargs)
    311 
    312     def _predict(self, Xs, max_length=None):

/home/nico/anaconda3/envs/py36/lib/python3.6/site-packages/finetune/classifier.py in finetune(self, X, Y, batch_size)
     55                            corresponds to the number of training examples provided to each GPU.
     56         """
---> 57         return super().finetune(X, Y=Y, batch_size=batch_size)
     58 
     59     def get_eval_fn(cls):

/home/nico/anaconda3/envs/py36/lib/python3.6/site-packages/finetune/base.py in finetune(self, Xs, Y, batch_size)
    199             arr_encoded,
    200             Y=Y,
--> 201             batch_size=batch_size,
    202         )
    203 

/home/nico/anaconda3/envs/py36/lib/python3.6/site-packages/finetune/base.py in _training_loop(self, arr_encoded, Y, batch_size)
    215         else:
    216             Y = np.asarray(Y)
--> 217             train_Y = self.label_encoder.fit_transform(Y[train_idxs])
    218             val_Y = self.label_encoder.transform(Y[val_idxs])
    219             target_dim = self.label_encoder.target_dim

IndexError: index 26 is out of bounds for axis 1 with size 20

Minimum Reproducible Example
This is the code I run:

model = Classifier(n_epochs=2, tensorboard_folder='.tensorboard', chunk_long_sequences=True)
print("len(df_train[\"text\"])=", len(df_train["text"]), "type=", type(df_train["text"]))
print("len(df_train[\"label\"])=", len(df_train["label"]), "type=", type(df_train["label"]))
model.fit(df_train['text'], df_train['label'])
model.save("model_repairType.md5")

Expected behavior
classification of 'text' elements to 'label'

Additional context
no

Many thanks,

Nicolas

download_data_if_required not called from Model.load

from finetune import Classifier
model = Classifier.load(model_fname)

fails with No such file or directory: encoder_bpe_40000.json

Fixed by calling this once:

import finetune.download
finetune.download.download_data_if_required()

Shapes are always computed; don't use the compute_shapes as it has no effect.

Describe the bug
I get the following warning both in training and prediction:

WARNING:tensorflow:From ~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/function.py:986: calling Graph.create_op (from tensorflow.python.framework.ops) with compute_shapes is deprecated and will be removed in a future version.
Instructions for updating:
Shapes are always computed; don't use the compute_shapes as it has no effect.

Minimum Reproducible Example

[... train a classifier as in the examples and save it ...]
model = ft.Classifier.load(PATH)
model.predict(X_test)

Expected behavior
No warnings

Unable to train a loaded model

Describe the bug
After I load a saved model, I'm unable to train it again. I've reproduced the bug on two different datasets.

ValueError: Operation name: "NoOp_1"
op: "NoOp"
 is not an element of this graph.

Minimum Reproducible Example

from finetune import Classifier
from sklearn.datasets import fetch_20newsgroups
    
dataset = fetch_20newsgroups()
trainX, trainY = dataset.data[:100], dataset.target[:100]
model = Classifier(n_epochs=1)
model.fit(trainX, trainY)
model.save('repro')
model = Classifier.load('repro')
model.fit(trainX, trainY)

Exception training Comparison model.

Describe the bug

    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 24576 values, but the requested shape has 0
	 [[{{node OptimizeLoss/gradients/model/target/Sum_1_grad/Reshape}} = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](OptimizeLoss/gradients/model/target/Abs_1_grad/mul, OptimizeLoss/gradients/model/target/Sum_1_grad/DynamicStitch/_2401)]]
	 [[{{node OptimizeLoss/control_dependency/_2705}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_15696_OptimizeLoss/control_dependency", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "finetune/datasets/quora_similarity.py", line 51, in <module>
    model.fit(list(zip(trainX1, trainX2)), trainY)
  File "/root/code/indico/finetune/finetune/base.py", line 362, in fit
    return self.finetune(*args, **kwargs)
  File "/root/code/indico/finetune/finetune/classifier.py", line 69, in finetune
    return super().finetune(X, Y=Y, batch_size=batch_size)
  File "/root/code/indico/finetune/finetune/base.py", line 236, in finetune
    estimator.train(train_input_fn, hooks=train_hooks, steps=num_steps)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 354, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
    saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1471, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 671, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1156, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1240, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1312, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1076, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 24576 values, but the requested shape has 0
	 [[node OptimizeLoss/gradients/model/target/Sum_1_grad/Reshape (defined at /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/optimizers.py:239)  = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](OptimizeLoss/gradients/model/target/Abs_1_grad/mul, OptimizeLoss/gradients/model/target/Sum_1_grad/DynamicStitch/_2401)]]
	 [[{{node OptimizeLoss/control_dependency/_2705}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_15696_OptimizeLoss/control_dependency", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'OptimizeLoss/gradients/model/target/Sum_1_grad/Reshape', defined at:
  File "finetune/datasets/quora_similarity.py", line 51, in <module>
    model.fit(list(zip(trainX1, trainX2)), trainY)
  File "/root/code/indico/finetune/finetune/base.py", line 362, in fit
    return self.finetune(*args, **kwargs)
  File "/root/code/indico/finetune/finetune/classifier.py", line 69, in finetune
    return super().finetune(X, Y=Y, batch_size=batch_size)
  File "/root/code/indico/finetune/finetune/base.py", line 236, in finetune
    estimator.train(train_input_fn, hooks=train_hooks, steps=num_steps)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 354, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1237, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/root/code/indico/finetune/finetune/model.py", line 154, in _model_fn
    summaries=summaries
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/optimizers.py", line 239, in optimize_loss
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 814, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 408, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 814, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_grad.py", line 83, in _SumGrad
    grad = array_ops.reshape(grad, output_shape_kept_dims)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 6482, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'model/target/Sum_1', defined at:
  File "finetune/datasets/quora_similarity.py", line 51, in <module>
    model.fit(list(zip(trainX1, trainX2)), trainY)
[elided 6 identical lines from previous traceback]
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/root/code/indico/finetune/finetune/model.py", line 100, in _model_fn
    target_model_state = target_model_op(featurizer_state=featurizer_state, Y=Y, params=params, mode=mode)
  File "/root/code/indico/finetune/finetune/model.py", line 73, in target_model_op
    class_weights=weighted_tensor
  File "/root/code/indico/finetune/finetune/comparison.py", line 55, in _target_model
    featurizer_state["features"] = tf.abs(tf.reduce_sum(featurizer_state["features"], 1))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 1345, in reduce_sum
    name=name))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 8389, in _sum
    name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 24576 values, but the requested shape has 0
	 [[node OptimizeLoss/gradients/model/target/Sum_1_grad/Reshape (defined at /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/optimizers.py:239)  = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](OptimizeLoss/gradients/model/target/Abs_1_grad/mul, OptimizeLoss/gradients/model/target/Sum_1_grad/DynamicStitch/_2401)]]
	 [[{{node OptimizeLoss/control_dependency/_2705}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_15696_OptimizeLoss/control_dependency", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Minimum Reproducible Example
Issue training the following model
model = Comparison(low_memory_mode=True, n_epochs=5, batch_size=32, early_stopping_steps=10000)

Additional context
I am unable to reproduce this exception, but I've logged it here so we can build a picture of what is going on if it happens again.

Keep outputting '0it [00:00, ?it/s]'

Describe the bug
I use the following code to run a demo on the SNLI dataset.
It keeps outputting '0it [00:00, ?it/s]'.

The output file looks like this:

 FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]

Minimum Reproducible Example

def trim(string):
    try:
        string = ' '.join(string.split(' ')[:256]).rstrip('\n')
        return string       
    except:
        raise ValueError(f'{string}')

def read_file(file):
    with open(file) as f:
        lines=[]
        for line in f:
            line = trim(line)
            lines.append(line)
    return lines

if __name__ == "__main__":

    trainX1 = read_file('premise_snli_1.0_train.txt')
    trainX2 = read_file('hypothesis_snli_1.0_train.txt')
    trainY = read_file('label_snli_1.0_train.txt')

    testX1 = read_file('premise_snli_1.0_test.txt')
    testX2 = read_file('hypothesis_snli_1.0_test.txt')
    testY = read_file('label_snli_1.0_test.txt')

    model = Entailment(verbose=True)
    model.fit(trainX1, trainX2, trainY)
    model.save('./saved_snli_model')
    pred_result = model.predict(testX1, testX2)

premise_snli_1.0_train.txt is a file where each line is a sentence.
In the config.py file I set max_length to 258 and batch_size to 8.

Siamese model

Is it possible to build a Siamese model for fine-tuning, specifically for the text inference task?

Wrong indices in oversampling code?

Describe the bug

In these lines:

def resampling(self, Xs, Y):
    if self.config.oversample:
        idxs, Ys = RandomOverSampler().fit_sample([[i] for i in range(len(Xs))], Y)
        return [Xs[i[0]] for i in idxs], [Ys[i[0]] for i in idxs]

It seems to me that the return statement should either be

return [Xs[i[0]] for i in idxs], [Y[i[0]] for i in idxs]

(Y instead of Ys)

or

return [Xs[i[0]] for i in idxs], Ys

Since, as it is, you're basically applying the originalIndex -> sampledIndex mapping twice.

I'm not sure if this actually causes wrong labels, but it looks like it could.
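
A tiny illustration of the suspected double mapping, with made-up data and assuming fit_sample behaves like imbalanced-learn's resampler (newer releases call the method fit_resample):

import numpy as np
from imblearn.over_sampling import RandomOverSampler

Xs = ["a", "b", "c"]
Y = np.array([0, 1, 1])

idxs, Ys = RandomOverSampler().fit_sample([[i] for i in range(len(Xs))], Y)
# idxs[k][0] is the original index chosen for resampled position k,
# and Ys[k] is already the label aligned with that position.
texts = [Xs[i[0]] for i in idxs]           # correct: map once
labels_current = [Ys[i[0]] for i in idxs]  # current code: maps the resampled labels again
labels_fixed = [Y[i[0]] for i in idxs]     # proposed fix: index the original labels once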

Cuda error when running nvidia-docker with CUDA 8

I am trying to run training with nvidia-docker. I have CUDA 8 and get the following error when running the training file.

Here is the image of my terminal output : https://ibb.co/eRoj5p

Here is the image of nvidia-smi and nvidia-docker on terminal : https://ibb.co/jEtOy9

I am not able to figure out what the problem is. I have Docker and a compatible version of nvidia-docker, I have NVIDIA GPUs and CUDA 8, and I have verified the setup with the nvidia-smi test inside nvidia-docker. But when I run the finetune Docker image and then run the training file, it gives the error shown in the screenshot.

I would really appreciate it if someone could point me in the right direction to solve this.

Kernel dying when running the SST classification example.

Hi! I'm trying to run the SST classification example in a Jupyter notebook, but the kernel keeps dying as soon as the TensorFlow variables are initialized.

I guess that maybe the model is too big to fit in memory. But I tried to lower the batch size to 1 and the max_length to 10, and the kernel still died anyway.

I tried this on a machine with 16 GB of RAM and 8 CPUs, as well as on the same machine using two GeForce GTX 970s with 4 GB of memory each.

Do I need more memory to be able to use the classification model?

Comparison.predict_proba doesn't return a probability distribution

Describe the bug
The Comparison.predict_proba method returns the most likely class rather than class probabilities.

Minimum Reproducible Example
A short code snippet which reproduces the exception

from finetune import Comparison

model = Comparison.load(model_path)
model.predict_proba("my sentence", "other sentence")

Expected behavior
{0: float, 1: float}

Additional context
Actual return type: [int]

Error in atexit._run_exitfuncs when running in Windows 10

Describe the bug
Problem deleting temporary files while running on Windows 10.

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\Users\yang.liu\AppData\Local\conda\conda\envs\python36\lib\site-packages\finetune\base.py", line 537, in __del__
    shutil.rmtree(file_or_folder)
  File "C:\Users\yang.liu\AppData\Local\conda\conda\envs\python36\lib\shutil.py", line 494, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Users\yang.liu\AppData\Local\conda\conda\envs\python36\lib\shutil.py", line 384, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Users\yang.liu\AppData\Local\conda\conda\envs\python36\lib\shutil.py", line 393, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "C:\Users\yang.liu\AppData\Local\conda\conda\envs\python36\lib\shutil.py", line 391, in _rmtree_unsafe
    os.rmdir(path)
OSError: [WinError 145] The directory is not empty: 'C:\\Users\\yang.liu\\AppData\\Local\\Temp\\Finetunerxoy0m5o\\eval'

after running this code:

def train(X_train, Y_train):
    model = Classifier()
    model.fit(X_train, Y_train)
    print('trained model')
    out_dir = os.path.join('models', 'finetune.model')
    model.save(out_dir)
    return model

Expected behavior
Expected the temporary files to be deleted, even when running as administrator.

Additional context
Please don't ask why I use Windows. I know Linux works.

Load a model trained to predict next character or subword instead of next word

Hi,

Character-based language modeling has its advantages over word-level prediction, and I'm wondering whether I'll be able to use this wrapper.

My plan is to train a model using Google's T2T as documented here. The model can be trained using subword encoding (the default), character-level, or word-level encodings. If I were to use any of these options, would the saved model work out of the box with finetune? Should I be aware of any details when training the model?

The repo looks very well made; I hope this will be seamless. Does anyone know?

Can I use to generate text?

Hi, this seems like great work by the team.
According to the documentation, I understand that every model uses a pre-trained language model.
Can I use it for the following scenarios, and if so, how?

  1. Fine-tune the pre-trained language model on my own text corpus and then generate (sample) text.
  2. Fine-tune the pre-trained language model on my own text corpus and then score any given text/sentence.

Thanks.
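
As a hedged sketch of scenario 1 only, combining the unsupervised fit call requested in other issues here with the generate_text method; whether fitting on an unlabelled corpus actually updates the language model in the released version is part of what I'm asking:

from finetune import Classifier

corpus = ["an unlabelled sentence from my domain", "another unlabelled sentence"]  # placeholder corpus

model = Classifier()
model.fit(corpus)                        # no labels: intended to finetune only the language model
print(model.generate_text("potatoes"))   # sample a continuation from the (hopefully) adapted model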

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_3' with dtype float

Describe the bug

  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_3' with dtype float
	 [[Node: Placeholder_3 = Placeholder[dtype=DT_FLOAT, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
	 [[Node: Mean_3/_943 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3032_Mean_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 18, in <module>
    model.fit(train_X, train_y)
  File "/home/tingkai/finetune/finetune/lm_base.py", line 169, in fit
    return self.finetune(*args, **kwargs)
  File "/home/tingkai/finetune/finetune/lm_classifier.py", line 47, in finetune
    return self._finetune(X, Y=Y, batch_size=batch_size)
  File "/home/tingkai/finetune/finetune/lm_base.py", line 130, in _finetune
    summary = self.sess.run(self.summaries, {self.X: xmb, self.M: mmb, self.Y: ymb})
  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_3' with dtype float
	 [[Node: Placeholder_3 = Placeholder[dtype=DT_FLOAT, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
	 [[Node: Mean_3/_943 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3032_Mean_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Placeholder_3', defined at:
  File "test.py", line 18, in <module>
    model.fit(train_X, train_y)
  File "/home/tingkai/finetune/finetune/lm_base.py", line 169, in fit
    return self.finetune(*args, **kwargs)
  File "/home/tingkai/finetune/finetune/lm_classifier.py", line 47, in finetune
    return self._finetune(X, Y=Y, batch_size=batch_size)
  File "/home/tingkai/finetune/finetune/lm_base.py", line 111, in _finetune
    self._build_model(n_updates_total=n_updates_total, target_dim=self.target_dim)
  File "/home/tingkai/finetune/finetune/lm_base.py", line 357, in _build_model
    self._construct_graph(n_updates_total, target_dim, train=train)
  File "/home/tingkai/finetune/finetune/lm_base.py", line 277, in _construct_graph
    self._define_placeholders()
  File "/home/tingkai/finetune/finetune/lm_base.py", line 386, in _define_placeholders
    self.do_dropout = tf.placeholder(tf.float32)  # 1 for do dropout and 0 to not do dropout
  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1808, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 4848, in placeholder
    "Placeholder", dtype=dtype, shape=shape, name=name)
  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/tingkai/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_3' with dtype float
	 [[Node: Placeholder_3 = Placeholder[dtype=DT_FLOAT, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
	 [[Node: Mean_3/_943 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3032_Mean_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Minimum Reproducible Example

I was running the example code with my own data:
train_X is a list of strings,
train_y is a NumPy array of integer labels.

model = LanguageModelClassifier()
model.fit(train_X, train_y)
predictions = model.predict(test_X)
model.save('test_model/model_0') 

class_weight of type log produces 0.0 weight on most frequent class

This means training ignores the most frequent class.

Reproduce example:

import numpy as np
import pandas as pd
from finetune.imbalance import compute_class_weights

np.random.seed(0)
y = pd.Series(np.random.choice(a=[0, 1, 2], size=1000, p=[0.3, 0.6, 0.1]))
print(y.value_counts(normalize=True))
print(compute_class_weights('log', y))

Support for pre-training the language model

Is your feature request related to a problem? Please describe.
In order to use the classifier on different languages / specific domains it would be useful to be able to pretrain the language model.

Describe the solution you'd like
Calling .fit on a corpus (i.e. no labels) should train the language model.

model.fit(corpus)

Describe alternatives you've considered
Use the original repo, which doesn't have a simple-to-use interface.

Validation fails with unsupervised pretraining

Describe the bug

When I attempt to pretrain a model using unlabeled data, I get an error if I have 50 or more examples:

ValueError: Cannot feed value of shape (2, 0) for Tensor 'Placeholder_3:0', which has shape '(?, 1)'

I'm guessing this is related to validation:

:param val_size: Validation set size as a percentage of all training data.  Validation will not be run by default if n_examples < 50.

Minimum Reproducible Example

It seems to be dataset independent, however this does the trick:

from sklearn.datasets import fetch_20newsgroups
dataset = fetch_20newsgroups()

from finetune import Classifier
model = Classifier()
model.fit(dataset.data[:50])

Note that if I instead take the slice dataset.data[:49] training succeeds.

A different way of doing the similarity/comparison task?

Hey! Thanks for the awesome work. I was wondering if I could use and update finetune to do the following:

Instead of using (Start, Text1, Delim, Text2, Extract) and (Start, Text2, Delim, Text1, Extract) as in the paper, can we use (Start, Text1, Extract) and (Start, Text2, Extract) separately through the transformer?

This could be thought of as obtaining sentence/document embeddings for Text1 and Text2 separately. Upon doing that, I would like to compare their similarity using a distance metric such as cosine distance. (i.e. train the transformer as a siamese network.)

Would you suggest I build such a model on top of a fork of finetune?
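
As a minimal sketch of the "encode separately, then compare" idea using the existing featurize API (frozen features only, no Siamese training; the cosine computation is plain NumPy, not part of finetune):

import numpy as np
from finetune import Classifier

model = Classifier()

# featurize returns an array of shape (n_examples, embedding_size)
emb1 = model.featurize(["A man is playing a guitar."])[0]
emb2 = model.featurize(["Someone is strumming an instrument."])[0]

cosine_similarity = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
print(cosine_similarity)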

Speed issue

Looking at #107, there is a notebook reporting on the order of 10k iterations per second on the GPU (see this notebook), while I'm getting <10 iterations per second (see this notebook).

Any ideas of what is going on? I tried 0.3.1 but that didn't speed anything up.

Numerical features

Sorry if this is a stupid question.

I'm curious whether it's possible to use numerical features with this model. The documentation says that X should be an array of text.

Thanks for your time, and really nice project!
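
One hedged workaround: since the library expects text, use featurize to turn each document into a fixed-size vector, concatenate the numerical columns, and train an ordinary scikit-learn model on top. featurize and its output shape are documented in other issues here; the rest is my own assumption:

import numpy as np
from sklearn.linear_model import LogisticRegression
from finetune import Classifier

texts = ["first document", "second document"]        # placeholder text
numeric = np.array([[0.3, 12.0], [1.7, 3.0]])        # placeholder numerical features
labels = [0, 1]

model = Classifier()
text_features = model.featurize(texts)               # shape (n_examples, embedding_size)
combined = np.hstack([text_features, numeric])       # embeddings + numeric columns side by side

clf = LogisticRegression().fit(combined, labels)
print(clf.predict(combined))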

implementation of ExponentialMovingAverage is not correct

def get_ema_if_exists(v, gvs):
    name = v.name.split(':')[0]
    ema_name = name+'/ExponentialMovingAverage:0'
    ema_v = [v for v in gvs if v.name == ema_name]
    if len(ema_v) == 0:
        ema_v = [v]
    return ema_v[0]

def get_ema_vars(*vs):
    if tf.get_variable_scope().reuse:
        gvs = tf.global_variables()
        vs = [get_ema_if_exists(v, gvs) for v in vs]
    if len(vs) == 1:
        return vs[0]
    else:
        return vs

In g, b = get_ema_vars(g, b), I think g and b end up as the original tensors, not the EMA variables.

Possibility to finetune language model only (unsupervised)

Hi all,

Is it possible to finetune in an unsupervised way, i.e. the language model first (or only)?

I have a lot of unlabelled data and only a few hundred labelled examples, so I'd like to first finetune the LM in an unsupervised way and then finetune on the specific supervised task. Such a process is described in the ULMFiT paper.

It might also be pretty useful for getting better deep representations if you don't have labelled examples at all.
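
A sketch of the two-stage workflow I have in mind, assuming (as the pre-training feature request above suggests) that calling fit with no labels finetunes only the language model; whether the current release supports this is exactly the question:

from finetune import Classifier

unlabelled_texts = ["lots of in-domain text", "more in-domain text"]  # large unlabelled corpus (placeholder)
labelled_texts = ["a labelled example"]                               # the few hundred labelled examples (placeholder)
labels = ["some_class"]

model = Classifier()
model.fit(unlabelled_texts)          # stage 1: unsupervised language-model finetuning
model.fit(labelled_texts, labels)    # stage 2: supervised finetuning on the target task
model.save("two_stage_model.bin")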

Regressor predict output is None

Describe the bug
After fitting the Regressor model, calling predict returns None instead of list or array of predictions.

Minimum Reproducible Example
import numpy as np
from finetune import Regressor

x_test = np.array(['the quick fox jumped over the lazy brown dog'] * 100)
y_test = np.random.random(100)
model_test = Regressor(n_epochs=1, val_interval=100/2/3)
model_test.fit(x_test, y_test)
model_test.predict(x_test) # Returns None

Additional context
Finetune was installed from source, 0.3.1 master branch. The environment is Google Colab Python 3 (GPU).

Model load fails if tensorboard directory is inaccessible

I can't load a model on a different computer, because the home directory is different, which causes this:

if self.config.tensorboard_folder is not None:
    self.estimator_dir = os.path.abspath(
        os.path.join(self.config.tensorboard_folder, str(int(time.time())))
    )
    pathlib.Path(self.estimator_dir).mkdir(parents=True, exist_ok=True)
    self.cleanup_glob = None

to fail with Permission denied.

I worked around it with this script:

import joblib
import sys
p = sys.argv[1]
a, b = joblib.load(p)

b.config.tensorboard_folder = None

joblib.dump((a, b), p + ".export")

but I feel like this case should be handled by the library.

Out of Memory on Small Dataset

Describe the bug
When attempting to train a classifier on a small dataset of 8,000 documents, I get an out of memory error and the script stops running.

Minimum Reproducible Example
Version of finetune = 0.4.1
Version of tensorflow-gpu = 1.8.0
Version of cuda = release 9.0, V9.0.176
Windows 10 Pro
Load a dataset of documents (X_train) and labels (Y_train), where each document and label is simply a string.
model = finetune.Classifier(max_length=256, batch_size=1)  # tried reducing the memory footprint
model.fit(X_train, Y_train)

Expected behavior
I expected the model to train, but it doesn't manage to start training.

Additional context
I get the following warnings in the jupyter notebook:

C:\Users...\Python35\site-packages\finetune\encoding.py:294: UserWarning: Some examples are longer than the max_length. Please trim documents or increase max_length. Fallback behaviour is to use the first 254 byte-pair encoded tokens
"Fallback behaviour is to use the first {} byte-pair encoded tokens".format(max_length - 2)
C:\Users...\Python35\site-packages\finetune\encoding.py:233: UserWarning: Document is longer than max length allowed, trimming document to 256 tokens.
max_length
C:\Users...\tensorflow\python\ops\gradients_impl.py: 100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From C:\Users...\tensorflow\python\util\tf_should_use.py:118: initialize_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.variables_initializer instead.

And then I get the following diagnostic info showing up in the command prompt:

2018-10-04 17:26:36.920118: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-10-04 17:26:37.716883: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: Quadro M1200 major: 5 minor: 0 memoryClockRate(GHz): 1.148
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.35GiB
2018-10-04 17:26:37.725637: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-04 17:26:38.412484: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-04 17:26:38.417413: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2018-10-04 17:26:38.419392: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2018-10-04 17:26:38.421353: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 3083 MB memory) -> physical GPU (device: 0, name: Quadro M1200, pci bus id: 0000:01:00.0, compute capability: 5.0)
[I 17:28:26.081 NotebookApp] Saving file at /projects/language-models/Finetune Package.ipynb
2018-10-04 17:29:14.118663: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-04 17:29:14.123595: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-04 17:29:14.127649: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2018-10-04 17:29:14.135411: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2018-10-04 17:29:14.138698: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3083 MB memory) -> physical GPU (device: 0, name: Quadro M1200, pci bus id: 0000:01:00.0, compute capability: 5.0)
2018-10-04 17:30:06.881174: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.00MiB. Current allocation summary follows.
2018-10-04 17:30:06.900550: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (256):
Total Chunks: 60, Chunks in use: 60. 15.0KiB allocated for chunks. 15.0KiB in use in bin. 312B client-requested in use in bin.
2018-10-04 17:30:06.929551: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (512):
Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:06.964647: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (1024): Total Chunks: 2, Chunks in use: 2. 2.5KiB allocated for chunks. 2.5KiB in use in bin. 2.0KiB client-requested in use in bin.
2018-10-04 17:30:06.995394: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (2048): Total Chunks: 532, Chunks in use: 532. 1.56MiB allocated for chunks. 1.56MiB in use in bin. 1.56MiB client-requested in use in bin.
2018-10-04 17:30:07.031613: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:07.061013: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (8192): Total Chunks: 137, Chunks in use: 137. 1.39MiB allocated for chunks. 1.39MiB in use in bin. 1.39MiB client-requested in use in bin.
2018-10-04 17:30:07.093603: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:07.130530: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:07.170321: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:07.212730: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:07.246329: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (262144): Total Chunks: 2, Chunks in use: 2. 512.0KiB allocated for chunks. 512.0KiB in use in bin. 512.0KiB client-requested in use in bin.
2018-10-04 17:30:07.288640: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:07.303248: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:07.332990: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (2097152): Total Chunks: 71, Chunks in use: 71. 159.75MiB allocated for chunks. 159.75MiB in use in bin. 159.75MiB client-requested in use in bin.
2018-10-04 17:30:07.364897: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (4194304): Total Chunks: 69, Chunks in use: 68. 466.99MiB allocated for chunks. 459.00MiB in use in bin. 459.00MiB client-requested in use in bin.
2018-10-04 17:30:07.396862: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (8388608): Total Chunks: 140, Chunks in use: 140. 1.23GiB allocated for chunks. 1.23GiB in use in bin. 1.23GiB client-requested in use in bin.
2018-10-04 17:30:07.428029: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:07.464813: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:07.494067: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (67108864): Total Chunks: 10, Chunks in use: 10. 1.17GiB allocated for chunks. 1.17GiB in use in bin. 1.17GiB client-requested in use in bin.
2018-10-04 17:30:07.524156: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:07.550345: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:630] Bin (268435456): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-10-04 17:30:07.578392: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:646] Bin for 9.00MiB was 8.00MiB, Chunk State:
2018-10-04 17:30:07.600123: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000801980000 of size 1280
2018-10-04 17:30:07.629493: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000801980500 of size 1280
2018-10-04 17:30:07.649189: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000801980A00 of size 125144064
2018-10-04 17:30:07.676965: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 00000008090D9600 of size 7077888
2018-10-04 17:30:07.699245: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 0000000809799600 of size 3072
2018-10-04 17:30:07.718738: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:665] Chunk at 000000080979A200 of size 3072

...and so on. This is, in my opinion, a pretty small dataset, and I've made the maximum length quite small, so I don't think this is a hardware limitation but a bug.

Loading a model from 0.4.1 in 0.5.11

Describe the bug
After saving a model on 0.5.10 using Classifier.save("my_model.bin") and upgrading to 0.5.11,
loading with Classifier.load("my_model.bin") results in KeyError: 'base_model_path'.

Serving a model

Is your feature request related to a problem? Please describe.
Serving a trained model in production.

Describe the solution you'd like
I'd like to understand how to interface with tensorflow.

Describe alternatives you've considered
I'm able to save and load a model, but not sure how to restore and serve it using TF.
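
In the absence of a documented TensorFlow Serving path, one hedged alternative I've considered is keeping the saved model behind a small HTTP service and calling predict directly; Flask here is purely illustrative and the model path is hypothetical:

from flask import Flask, request, jsonify
from finetune import Classifier

app = Flask(__name__)
model = Classifier.load("my_model.bin")   # hypothetical path to a model saved with Classifier.save

@app.route("/predict", methods=["POST"])
def predict():
    texts = request.get_json()["texts"]   # expects a JSON body like {"texts": ["...", "..."]}
    preds = model.predict(texts)
    # Cast NumPy scalars to plain Python types so they are JSON serialisable.
    return jsonify(predictions=[p.item() if hasattr(p, "item") else p for p in preds])

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)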

Slow unsupervised training

Thank you for your library; the supervised finetuning works very well. However, when I try to train on unlabelled data ( model.fit(unlabeledX) ), training is much slower (9 s/it) compared to supervised training (1.7 s/it). This is on a single K80 GPU. I am not sure why unsupervised training is slower; doesn't supervised training tune the language model as well?

Very slow inference in 0.5.11

After training a default classifier, then saving and loading it, model.predict("lorem ipsum") and model.predict_proba take on average 14 seconds, even on a hefty server such as an AWS p3.16xlarge.

Cannot specify max_length more than 512

Hello,

I've tried to use a max_length greater than 512 to featurize text:

model = finetune.Classifier()
trn_X_q_vecs = model.featurize(trn_X_q, max_length=1000)

But I got the following exception:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-d3d9e8b820e5> in <module>()
----> 1 trn_X_q_vecs = model.featurize(trn_X_q, max_length=1000)

/opt/conda/lib/python3.6/site-packages/finetune/classifier.py in featurize(self, X, max_length)
     24         :returns: np.array of features of shape (n_examples, embedding_size).
     25         """
---> 26         return super().featurize(X, max_length=max_length)
     27 
     28     def predict(self, X, max_length=None):

/opt/conda/lib/python3.6/site-packages/finetune/base.py in featurize(self, *args, **kwargs)
    386         These features are the same that are fed into the target_model.
    387         """
--> 388         return self._featurize(*args, **kwargs)
    389 
    390     @classmethod

/opt/conda/lib/python3.6/site-packages/finetune/base.py in _featurize(self, Xs, max_length)
    371             warnings.filterwarnings("ignore")
    372             max_length = max_length or self.config.max_length
--> 373             for xmb, mmb in self._infer_prep(Xs, max_length=max_length):
    374                 feature_batch = self.sess.run(self.features, {
    375                     self.X: xmb,

/opt/conda/lib/python3.6/site-packages/finetune/base.py in _infer_prep(self, Xs, max_length)
    400     def _infer_prep(self, Xs, max_length=None):
    401         max_length = max_length or self.config.max_length
--> 402         arr_encoded = self._text_to_ids(Xs, max_length=max_length)
    403         n_batch_train = self.config.batch_size * max(len(self.config.visible_gpus), 1)
    404         self._build_model(n_updates_total=0, target_dim=self.target_dim, train=False)

/opt/conda/lib/python3.6/site-packages/finetune/base.py in _text_to_ids(self, Xs, Y, max_length)
    156         else:
    157             encoder_out = self.encoder.encode_multi_input(Xs, Y=Y, max_length=max_length)
--> 158             return self._array_format(encoder_out)
    159 
    160 

/opt/conda/lib/python3.6/site-packages/finetune/base.py in _array_format(self, encoded_output)
    421         for i, seq_length in enumerate(seq_lengths):
    422             # BPE embedding
--> 423             x[i, :seq_length, 0] = encoded_output.token_ids[i]
    424             # masking: value of 1 means "consider this in cross-entropy LM loss"
    425             mask[i, 1:seq_length] = 1

ValueError: cannot copy sequence with size 667 to array axis with dimension 512

Default model no longer seems coherent

Describe the bug

It doesn't seem like the default model is pretrained in the latest development branch.

In prior versions, the default model generated coherent text with generate_text() using a wide variety of seed words. With the current default model I haven't been able to generate any coherent text at all. This includes seeding with many different words.

I'm mostly trying to use this as a sanity check that things are working. I don't mean that the generated text would need to be the same as prior versions, but this is giving me the impression that either the model is no longer pretrained or something went wrong in loading the model. Is it still expected that the default model is pretrained?

Minimum Reproducible Example

>>> import finetune
>>> finetune.__version__
'0.5.9'

The current version outputs things along these lines, regardless of seed word:

>>> from finetune import Classifier
>>> model = Classifier()
>>> model.generate_text('potatoes')
`'_start_potatoes " \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n greyson \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n greyson \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n , \n \n \n \n \n greyson \n \n greyson greyson greyson greyson greyson greyson greyson greyson greyson \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n greyson \n \n \n \n \n \n \n \n '`

Expected behavior

>>> import finetune
>>> finetune.__version__
'0.4.1'

This older version would generate text along these lines with a wide variety of seed words:

>>> from finetune import Classifier
>>> model = Classifier()
>>> model.generate_text('potatoes')
`'_start_potatoes , " she said . \n " i do n\'t know what you mean . " \n " you \'re the one who said you wanted to be a chef . " \n " i did ? " \n " yes . " \n " i do n\'t know what you \'re talking about . " \n " you do n\'t have to . " \n " i do n\'t ? " \n " no . " \n " i do n\'t ? " \n " no . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not ? " \n " because i do n\'t want to . " \n " why not '`

Gradual Unfreezing Question

I am hoping to implement gradual unfreezing while finetuning for an article classification task. I see the config setting called num_layers_trained. I thought I could change this setting after each epoch of finetuning, but it seems like the setting is only used during initialization. Is there a recommended way to accomplish this? Thanks!
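
I haven't found a supported path either. As a heavily hedged sketch, one workaround might be to save between stages, reload, and bump num_layers_trained on the loaded model's config before the next fit call; whether the setting is re-read after loading (rather than only at graph construction), and whether refitting a loaded model works at all (see the "Unable to train a loaded model" issue above), are open questions:

from finetune import Classifier

# Placeholder article data.
train_texts = ["an article about sports", "an article about finance"]
train_labels = ["sports", "finance"]

model = Classifier(n_epochs=1, num_layers_trained=3)   # stage 1: train only the top layers
model.fit(train_texts, train_labels)
model.save("stage_1.bin")

model = Classifier.load("stage_1.bin")
model.config.num_layers_trained = 12                   # stage 2: unfreeze everything (untested assumption)
model.fit(train_texts, train_labels)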
