tensorflow / skflow
Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning
License: Apache License 2.0
Currently, if a dataset has categorical variables such as gender or education, it requires additional processing to get them into one-hot form. Embeddings allow IDs to be used to look up distributed representations for categories. This should be easy to use and to combine with regular features.
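To make the request concrete, here is a minimal, library-free sketch of the difference between one-hot encoding and an embedding lookup, and of combining the looked-up vectors with a regular numeric feature. The column names, table size, and helper functions are illustrative assumptions, not skflow's API.

```python
import random

def one_hot(index, n_classes):
    """Return a one-hot list with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(n_classes)]

def make_embedding_table(n_classes, embedding_size, seed=0):
    """Create a small random table with one row per category ID."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(embedding_size)]
            for _ in range(n_classes)]

def embed(ids, table):
    """Look up the table row for each integer category ID."""
    return [table[i] for i in ids]

# Hypothetical "education" column, already mapped to IDs 0..3.
education_ids = [0, 2, 1, 3, 2]
table = make_embedding_table(n_classes=4, embedding_size=2)
embedded = embed(education_ids, table)

# A regular numeric feature (hypothetical "age") is appended per row.
ages = [23.0, 41.0, 35.0, 58.0, 29.0]
features = [vec + [age] for vec, age in zip(embedded, ages)]
```

In a real model the table rows would be trained along with the other weights; here they are only random placeholders.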
The first time I ran python setup.py install
it went well, but after I ran python setup.py install
again, all the modules that skflow depends on were broken. The details are as follows.
Python 2.7.11 (default, Dec 23 2015, 12:23:20)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/__init__.py", line 180, in <module>
from . import add_newdocs
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
from numpy.lib import add_newdoc
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/lib/__init__.py", line 8, in <module>
from .type_check import *
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/lib/type_check.py", line 11, in <module>
import numpy.core.numeric as _nx
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/core/__init__.py", line 58, in <module>
from numpy.testing import Tester
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/testing/__init__.py", line 14, in <module>
from .utils import *
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/testing/utils.py", line 15, in <module>
from tempfile import mkdtemp
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/tempfile.py", line 32, in <module>
import io as _io
File "build/bdist.macosx-10.11-x86_64/egg/io/__init__.py", line 16, in <module>
File "skflow/__init__.py", line 17, in <module>
import tensorflow as tf
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/__init__.py", line 23, in <module>
from tensorflow.python import *
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 43, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 37, in <module>
from tensorflow.core.framework.graph_pb2 import *
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/core/framework/graph_pb2.py", line 8, in <module>
from google.protobuf import reflection as _reflection
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/google/protobuf/reflection.py", line 58, in <module>
from google.protobuf.internal import python_message as message_impl
File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 53, in <module>
from io import BytesIO
ImportError: cannot import name BytesIO
Tensorflow 0.6.0 is not officially released, but importing skflow with tensorflow 0.6.0 installed raises an error.
AttributeError Traceback (most recent call last)
in ()
----> 1 import skflow
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/__init__.py in ()
27
28 from skflow.trainer import TensorFlowTrainer
---> 29 from skflow import models, data_feeder
30 from skflow import preprocessing
31
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/models.py in ()
16 import tensorflow as tf
17
---> 18 from skflow.ops import mean_squared_error_regressor, softmax_classifier, dnn
19
20
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/ops/__init__.py in ()
16
17 from skflow.ops.conv_ops import *
---> 18 from skflow.ops.dnn_ops import *
19 from skflow.ops.embeddings_ops import *
20 from skflow.ops.losses_ops import *
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/ops/dnn_ops.py in ()
17
18 import tensorflow as tf
---> 19 from tensorflow.models.rnn import linear
20
21
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/models/rnn/linear.py in ()
24 import tensorflow as tf
25
---> 26 linear = tf.nn.linear
AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'linear'
Show an example of Neural Translation (Sequence to Sequence) model, which can showcase:
Hello
Today I tried to use the early_stopping feature (as here), and I found that my current skflow does not recognize the keyword early_stopping_rounds.
I tried to reinstall skflow, but it could not be upgraded because the versions are the same (0.0.1). I had to uninstall skflow first and then install it again.
So I think it would be good to bump the version of skflow to make upgrading easier.
I'm hoping that I just overlooked something, but I can't seem to recover the correct weights for simple linear test cases using TensorFlowLinearRegressor. Here's a gist of the full test file (with pytest) and a matching test case using scikit-learn LinearRegression. As the comment in the gist says, the scikit module recovers the weights faithfully while skflow returns odd weights.
Package details:
Python 3.5
tensorflow 0.6.0
skflow 0.0.1
I coded up a quick linear regression module myself using tensorflow as the backend (with the SGD optimizer), and it was able to recover the weights using the same iterations, learning rate, etc. as the default TensorFlowLinearRegressor, so I'm curious what I'm doing wrong with skflow (or the test case), or whether there's an issue somewhere.
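The kind of check described above (generate data from known weights, fit, compare) can be sketched without either library. This is plain stochastic gradient descent on squared error written from scratch; it is not the gist's test and not skflow's trainer, and the learning rate, step count, and data are assumptions chosen for illustration.

```python
import random

def fit_linear_sgd(xs, ys, learning_rate=0.1, steps=2000, seed=42):
    """Fit y = w . x + b with per-sample SGD on squared error."""
    rng = random.Random(seed)
    weights = [0.0] * len(xs[0])
    bias = 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))
        prediction = sum(w * x for w, x in zip(weights, xs[i])) + bias
        error = prediction - ys[i]
        # Gradient of 0.5 * error**2 with respect to each weight and the bias.
        weights = [w - learning_rate * error * x
                   for w, x in zip(weights, xs[i])]
        bias -= learning_rate * error
    return weights, bias

# Noise-free synthetic data from known weights (illustrative values).
data_rng = random.Random(0)
true_weights, true_bias = [2.0, -3.0], 0.5
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [sum(w * x for w, x in zip(true_weights, row)) + true_bias for row in X]

weights, bias = fit_linear_sgd(X, y)
```

Comparing `weights` and `bias` against the generating values with a small tolerance gives a reference point when a library estimator returns unexpected coefficients.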
Thanks for the comprehensive work on the library. Looking forward to using it a bunch.
The import of sklearn.utils.validation.NotFittedError in skflow/estimators/base.py is no longer correct with the latest version of scikit-learn. In particular, this commit to scikit-learn moves NotFittedError to sklearn.exceptions.NotFittedError.
Which versions of scikit-learn is skflow expected to support? (I would send a PR, but am not sure how you want to deal with backwards compatibility here.)
(Source: this StackOverflow question.)
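One common way to handle a moved import while still supporting older releases is a guarded import. The sketch below tries the new location first and then the old one; the final local fallback class is an assumption added only so the snippet runs when scikit-learn is not installed at all, and is not something skflow is known to do.

```python
try:
    # Newer scikit-learn releases: the exception lives in sklearn.exceptions.
    from sklearn.exceptions import NotFittedError
except ImportError:
    try:
        # Older scikit-learn releases: previous location.
        from sklearn.utils.validation import NotFittedError
    except ImportError:
        # Assumption for illustration: stand-in when scikit-learn is absent.
        class NotFittedError(ValueError, AttributeError):
            """Raised when an estimator is used before fitting."""


def check_fitted(estimator_is_fitted):
    """Raise NotFittedError unless the flag says the model was fitted."""
    if not estimator_is_fitted:
        raise NotFittedError("Call fit before predict.")
```

Which versions to cover is still the maintainers' decision; the pattern only keeps both import paths working.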
Restore is still not working for the text_classification.py example. I am getting the following exception:
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
Traceback (most recent call last):
File "/Users/harsimranb/PycharmProjects/TensorFlowTest/TextRNN.py", line 103, in <module>
runner.run_classification("amigo what time you close?")
File "/Users/harsimranb/PycharmProjects/TensorFlowTest/TextRNN.py", line 56, in run_classification
classifier = skflow.TensorFlowEstimator.restore(self.trained_model_path)
File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/skflow/__init__.py", line 332, in restore
estimator._restore(path)
File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/skflow/__init__.py", line 314, in _restore
self._saver.restore(self._session, checkpoint_path)
File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 891, in restore
sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 368, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 446, in _do_run
six.reraise(e_type, e_value, e_traceback)
File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 428, in _do_run
target_list)
tensorflow.python.pywrap_tensorflow.StatusNotOK: Internal: Unable to get element from the feed.
Related to #40
DBPedia classification like in "Character-level Convolutional Networks for Text Classification" is a good example to showcase:
Is there any way of storing a model? I'm trying to pickle it with pickle and dill but it does not work...
Thank you
Hello all
As far as I know, there are several ways to optimize a DNN. I think the most important parameter for a DNN is hidden_units.
Is it possible to somehow find the best hidden_units automatically?
Hi,
I'm trying to run the text_classification examples and I get the following error.
TypeError: Input 'values' of 'HistogramSummary' Op has type int64 that does not match expected type of float32.
The stack trace is below:
Traceback (most recent call last):
File "text_classification.py", line 82, in <module>
classifier.fit(X_train, y_train, logdir='/tmp/tf_examples/word_rnn')
File "/usr/local/lib/python2.7/site-packages/skflow/estimators/base.py", line 186, in fit
self._setup_training()
File "/usr/local/lib/python2.7/site-packages/skflow/estimators/base.py", line 119, in _setup_training
tf.histogram_summary("X", self._inp)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/summary_ops.py", line 39, in histogram_summary
tag=tag, values=values, name=scope)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_summary_ops.py", line 34, in _histogram_summary
name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 405, in apply_op
(prefix, types_lib.as_dtype(input_arg.type).name))
TypeError: Input 'values' of 'HistogramSummary' Op has type int64 that does not match expected type of float32.
Am I doing something wrong? I installed tensorflow and skflow as described in the Medium tutorial.
Reproducible example: (master branch with update for dask stuff)
from skflow.io import *
import skflow
from sklearn import datasets
import random
random.seed(42)
iris = datasets.load_iris()
data = pd.DataFrame(iris.data)
data = dd.from_pandas(data, npartitions=2)
labels = pd.DataFrame(iris.target)
labels = dd.from_pandas(labels, npartitions=2)
classifier = skflow.TensorFlowLinearClassifier(n_classes=3)
classifier.fit(data, labels)
gives:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-8e0c6d4b2deb> in <module>()
11 labels = dd.from_pandas(labels, npartitions=2)
12 classifier = skflow.TensorFlowLinearClassifier(n_classes=3)
---> 13 classifier.fit(data, labels)
/Library/Python/2.7/site-packages/skflow-0.0.1-py2.7.egg/skflow/estimators/base.pyc in fit(self, X, y, logdir)
185 if not self.continue_training or not self._initialized:
186 # Sets up model and trainer.
--> 187 self._setup_training()
188 # Initialize model parameters.
189 self._trainer.initialize(self._session)
/Library/Python/2.7/site-packages/skflow-0.0.1-py2.7.egg/skflow/estimators/base.pyc in _setup_training(self)
118 # Add histograms for X and y if they are floats.
119 if self._data_feeder.input_dtype in (np.float32, np.float64):
--> 120 tf.histogram_summary("X", self._inp)
121 if self._data_feeder.output_dtype in (np.float32, np.float64):
122 tf.histogram_summary("y", self._out)
/Library/Python/2.7/site-packages/tensorflow/python/ops/summary_ops.pyc in histogram_summary(tag, values, collections, name)
37 with ops.op_scope([tag, values], name, "HistogramSummary") as scope:
38 val = gen_summary_ops._histogram_summary(
---> 39 tag=tag, values=values, name=scope)
40 _Collect(val, collections, [ops.GraphKeys.SUMMARIES])
41 return val
/Library/Python/2.7/site-packages/tensorflow/python/ops/gen_summary_ops.pyc in _histogram_summary(tag, values, name)
32 """
33 return _op_def_lib.apply_op("HistogramSummary", tag=tag, values=values,
---> 34 name=name)
35
36
/Library/Python/2.7/site-packages/tensorflow/python/ops/op_def_library.pyc in apply_op(self, op_type_name, g, name, **keywords)
403 if input_arg.type != types_pb2.DT_INVALID:
404 raise TypeError("%s expected type of %s." %
--> 405 (prefix, types_lib.as_dtype(input_arg.type).name))
406 else:
407 raise TypeError(
TypeError: Input 'values' of 'HistogramSummary' Op has type float64 that does not match expected type of float32.
Not sure exactly what happened, but I tried to add more supported types in https://github.com/tensorflow/skflow/blob/master/skflow/estimators/base.py#L119 and I am still getting this error.
Anything I missed?
Hi,
I think it would be good to provide more examples and make the examples directory a list of recipes for using skflow.
Thanks for the tutorial. It was very easy to follow. Like many might do, I tested my new knowledge by playing with the other categorical variables, but this led to a stack trace that was difficult to understand.
To reproduce, just replace "Embarked" with "Pclass" in the lines where X and embarked_classes get assigned. When trying to fit the model this will throw index out of range errors.
I eventually fixed it by increasing n_classes to the number of unique classes plus 1. I think the error stems from the embedding wanting a row for nan (the Embarked column has missing values but Pclass does not), but I could not get my head around the code to confirm the actual source.
I am only mentioning this as it is a beginner's tutorial and the most basic fix is a comment in the code to explain that n_classes needs to account for nan (assuming I am understanding the error correctly). I also wonder if n_classes should just get handled inside categorical_variable().
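The "unique classes plus 1" fix can be illustrated with a small encoder that reserves ID 0 for missing or unseen values. This is a sketch of one plausible explanation for the index error, not a confirmed description of what categorical_variable() or the tutorial's preprocessing does; all helper names are hypothetical.

```python
import math

def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def build_vocabulary(values):
    """Map each distinct non-missing value to an ID starting at 1."""
    vocab = {}
    for value in values:
        if not is_missing(value) and value not in vocab:
            vocab[value] = len(vocab) + 1
    return vocab

def encode(values, vocab):
    """Encode values as IDs; 0 is reserved for missing or unseen values."""
    return [0 if is_missing(v) else vocab.get(v, 0) for v in values]

embarked = ["S", "C", float("nan"), "Q", "S"]
vocab = build_vocabulary(embarked)
ids = encode(embarked, vocab)

# One embedding row per known category, plus the reserved row 0.
n_classes = len(vocab) + 1
```

With this convention the largest ID is always smaller than n_classes, so a lookup table with n_classes rows cannot be indexed out of range.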
I'm looking at text_classification.py.
classifier.predict(X_test)
returns the class number with the highest probability, but I wonder how to get the probabilities for all the classes for each input.
Thanks in advance!
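For reference, the relationship between per-class scores, per-class probabilities, and the single predicted class can be sketched as follows. Whether the installed skflow version exposes the probabilities directly is not stated in this thread; the logits below are made-up numbers.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical scores for one input over three classes.
logits = [2.0, 0.5, -1.0]
probabilities = softmax(logits)
predicted_class = argmax(probabilities)
```

The class number returned by a predict call corresponds to `predicted_class`; the full `probabilities` list is the extra information being asked for.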
Hi,
I am using skflow.ops.dnn to classify a two-class dataset (True and False). The percentage of True examples is very small, so I have an imbalanced dataset.
It seems to me that one way to resolve the issue is to use weighted classes. However, when I look at the implementation of skflow.ops.dnn, I do not see how I could use weighted classes with a DNN.
Is it possible to do that with skflow, or is there another technique to deal with the imbalanced dataset problem in skflow?
Thanks
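As a library-independent illustration of class weighting, the sketch below computes weights inversely proportional to class frequency (the n_samples / (n_classes * count) heuristic) and applies them in a weighted log loss. How such weights would be wired into skflow.ops.dnn is not covered here; the function names and the 5/95 split are assumptions.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}

def weighted_log_loss(labels, prob_true, weights):
    """Weighted binary cross-entropy; prob_true[i] is P(label is True)."""
    total, weight_sum = 0.0, 0.0
    for label, p in zip(labels, prob_true):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        w = weights[label]
        total += -w * math.log(p if label else 1.0 - p)
        weight_sum += w
    return total / weight_sum

# Hypothetical imbalanced labels: 5 True, 95 False.
labels = [True] * 5 + [False] * 95
weights = balanced_class_weights(labels)
loss = weighted_log_loss(labels, [0.3] * len(labels), weights)
```

With these weights a mistake on a rare True example costs about as much in total as mistakes on the much larger False class.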
I just installed Scikit Flow and tried to execute the simple linear classification example. Unfortunately this does not work:
/home/chris/anaconda3/lib/python3.4/site-packages/skflow/io/pandas_io.py in extract_pandas_data(data)
27 def extract_pandas_data(data):
28 """Extract data from pandas.DataFrame for predictors"""
---> 29 if not isinstance(data, pd.DataFrame):
30 return data
31
AttributeError: 'module' object has no attribute 'DataFrame'
I'm using Python 3.4, TensorFlow 0.6, Scikit-Learn 0.17 and Pandas 0.16.2.
I am using the text_classification.py example. I have added the following code to save and restore a model:
To save:
classifier = skflow.TensorFlowEstimator(model_fn=self.rnn_model, n_classes=15,
    steps=1000, optimizer='Adam', learning_rate=0.01,
    continue_training=True)
To restore:
classifier = skflow.TensorFlowEstimator.restore(self.trained_model_path)
On restore, I'm getting the following error:
Traceback (most recent call last):
File "/Users/me/PycharmProjects/TensorFlowTest/TextRNN.py", line 56, in run_classification
classifier = skflow.TensorFlowEstimator.restore(self.trained_model_path)
File "/Users/me/Library/Python/2.7/lib/python/site-packages/skflow/__init__.py", line 317, in restore
estimator = eval(model_def)
File "<string>", line 2
model_fn=<bound method TextRNN.rnn_model of <__main__.TextRNN object at 0x10ef3f450>>,
^
SyntaxError: invalid syntax
Within the restore method, model_def looks like this:
TensorFlowEstimator(batch_size=32, continue_training=True, learning_rate=0.01,
model_fn=<bound method TextRNN.rnn_model of <__main__.TextRNN object at 0x10ef3f450>>,
n_classes=15, num_cores=4, optimizer='Adam', steps=1000,
tf_master='', tf_random_seed=42, verbose=1)
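The traceback shows restore calling eval on the saved model definition, and the SyntaxError points at the text of the bound method. A small standalone check illustrates why: the repr of a number or string is valid Python, while the repr of a method is not. The TextRNN class below is a stand-in that only mirrors the name in the report.

```python
class TextRNN:
    """Stand-in for the reporter's class; only the method matters here."""

    def rnn_model(self, X, y):
        return X


def can_round_trip(value):
    """True if eval(repr(value)) parses and evaluates without error."""
    try:
        eval(repr(value))
        return True
    except (SyntaxError, NameError):
        return False


simple_params_ok = all(can_round_trip(v) for v in (32, 0.01, "Adam", True))
method_ok = can_round_trip(TextRNN().rnn_model)
```

Any parameter whose repr starts with an angle bracket, such as a function or bound method, cannot be rebuilt by evaluating the stored text, so a custom model_fn would need to be supplied again by some other route.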
I am doing regression with DNN.
Final MSE for contiguous run of 200 steps: 1.45781016655
Final MSE for 4 runs with 50 steps each: 1.44524233948
Score for contiguous run:
Step #1, epoch #1, avg. loss: 27.95941
Step #21, epoch #21, avg. loss: 5.64051
Step #41, epoch #41, avg. loss: 1.78990
Step #61, epoch #61, avg. loss: 1.53639
Step #81, epoch #81, avg. loss: 1.49865
Step #101, epoch #101, avg. loss: 1.48255
Step #121, epoch #121, avg. loss: 1.47312
Step #141, epoch #141, avg. loss: 1.46747
Step #161, epoch #161, avg. loss: 1.46394
Step #181, epoch #181, avg. loss: 1.46122
Score for 4 runs 50 steps each:
Step #1, epoch #1, avg. loss: 27.95941
Step #6, epoch #6, avg. loss: 13.49244
Step #11, epoch #11, avg. loss: 4.11436
Step #16, epoch #16, avg. loss: 2.69326
Step #21, epoch #21, avg. loss: 2.26197
Step #26, epoch #26, avg. loss: 2.02976
Step #31, epoch #31, avg. loss: 1.79997
Step #36, epoch #36, avg. loss: 1.71287
Step #41, epoch #41, avg. loss: 1.61699
Step #46, epoch #46, avg. loss: 1.56702
Step #51, epoch #1, avg. loss: 1.52925
Step #56, epoch #6, avg. loss: 1.52344
Step #61, epoch #11, avg. loss: 1.51318
Step #66, epoch #16, avg. loss: 1.50661
Step #71, epoch #21, avg. loss: 1.50114
Step #76, epoch #26, avg. loss: 1.49584
Step #81, epoch #31, avg. loss: 1.49099
Step #86, epoch #36, avg. loss: 1.48698
Step #91, epoch #41, avg. loss: 1.48371
Step #96, epoch #46, avg. loss: 1.48097
Step #101, epoch #1, avg. loss: 1.47760
Step #106, epoch #6, avg. loss: 1.47609
Step #111, epoch #11, avg. loss: 1.47386
Step #116, epoch #16, avg. loss: 1.47201
Step #121, epoch #21, avg. loss: 1.47048
Step #126, epoch #26, avg. loss: 1.46914
Step #131, epoch #31, avg. loss: 1.46795
Step #136, epoch #36, avg. loss: 1.46686
Step #141, epoch #41, avg. loss: 1.46591
Step #146, epoch #46, avg. loss: 1.46506
Step #151, epoch #1, avg. loss: 1.46384
Step #156, epoch #6, avg. loss: 1.46348
Step #161, epoch #11, avg. loss: 1.46276
Step #166, epoch #16, avg. loss: 1.46212
Step #171, epoch #21, avg. loss: 1.46144
Step #176, epoch #26, avg. loss: 1.46086
Step #181, epoch #31, avg. loss: 1.46028
Step #186, epoch #36, avg. loss: 1.45976
Step #191, epoch #41, avg. loss: 1.45914
Step #196, epoch #46, avg. loss: 1.45857
NumPy arrays can be serialized to disk, and it is possible to do random seeks into them.
Implementing a DataFeeder for such a data format would remove the requirement to hold the full dataset in memory while still allowing random seeks for sampling batches.
In the README, the custom model uses logistic_classifier, but there is no such thing in the implementation.
All the passed parameter shapes are correct. I have verified that.
Is there any way of determining the number of cores on OS X? I think it is calculated automatically, but in my case I see in the system monitor a Python process with a %CPU of 150%, while other algorithms such as XGBoost reach over 300%. Could I set it manually?
Note that I'm executing the iris_custom_model algorithm with a larger dataset.
Thank you
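Detecting the number of logical cores from Python needs only the standard library. The estimator repr quoted in the save/restore report on this page lists a num_cores argument; passing the detected value there is an assumption about what that argument controls, so it appears only as a comment.

```python
import os

def detect_cores(default=1):
    """Number of logical CPUs, or `default` if it cannot be determined."""
    return os.cpu_count() or default

n_cores = detect_cores()

# Assumption, not verified against skflow: the value could then be passed
# explicitly, for example
#   skflow.TensorFlowEstimator(model_fn=..., n_classes=..., num_cores=n_cores)
```

A process showing 150% CPU on a machine where `n_cores` is 4 or more suggests that not all cores are kept busy, whatever the configured value is.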
As per title, it would be great to have the ability to build and train RNNs easily with this library.
Currently, feed dicts are built in the same thread as the main training loop, which slows the loop down by the time it takes to process and sample records.
A better option would be to run sampling in a separate thread that feeds a queue, so that the main thread just takes full batches out of the queue.
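The producer/consumer arrangement proposed above can be sketched with the standard library alone. The batch contents, queue size, and sentinel convention are illustrative assumptions; a real feeder would build feed dicts instead of plain lists.

```python
import queue
import random
import threading

def batch_producer(data, batch_size, n_batches, out_queue, seed=0):
    """Sample batches in a background thread and push them onto a queue."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = [data[rng.randrange(len(data))] for _ in range(batch_size)]
        out_queue.put(batch)  # blocks while the queue is full
    out_queue.put(None)       # sentinel: no more batches

data = list(range(100))
batches = queue.Queue(maxsize=4)  # bounded, so sampling cannot run far ahead
worker = threading.Thread(
    target=batch_producer, args=(data, 8, 10, batches), daemon=True)
worker.start()

consumed = []
while True:
    batch = batches.get()
    if batch is None:
        break
    consumed.append(batch)  # the training step would run here
worker.join()
```

The main loop only waits when the queue is empty, so sampling time overlaps with training time instead of adding to it.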
As a performance test, it would be interesting to get a match between the speed of sklearn.linear_model.LogisticRegression and skflow.TensorFlowLinearClassifier.
Support Saver:
Both scikit-learn and tensorflow support Python 3.X.
It would be great if skflow supported Python 3.X.
Hello
I tried DNN meta-parameter optimization using GridSearchCV, but it does not work.
Stack trace:
File "/usr/local/lib/python2.7/site-packages/sklearn/grid_search.py", line 804, in fit
return self._fit(X, y, ParameterGrid(self.param_grid))
File "/usr/local/lib/python2.7/site-packages/sklearn/grid_search.py", line 553, in _fit
for parameters in parameter_iterable
File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 804, in __call__
while self.dispatch_one_batch(iterator):
File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 662, in dispatch_one_batch
self._dispatch(tasks)
File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 570, in _dispatch
job = ImmediateComputeBatch(batch)
File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 183, in __init__
self.results = batch()
File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 72, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1531, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/usr/local/lib/python2.7/site-packages/skflow/__init__.py", line 163, in fit
self._setup_training()
File "/usr/local/lib/python2.7/site-packages/skflow/__init__.py", line 113, in _setup_training
self._inp, self._out)
File "/usr/local/lib/python2.7/site-packages/skflow/models.py", line 73, in dnn_estimator
layers = dnn(X, hidden_units)
File "/usr/local/lib/python2.7/site-packages/skflow/ops/dnn_ops.py", line 41, in dnn
for i, n_units in enumerate(hidden_units):
TypeError: 'NoneType' object is not iterable
Problem
"hidden_units" is NoneType in GridSearchCV.
"hidden_units" is a member of TensorFlowEstimator but not a member of TensorFlowDNNClassifier:
https://github.com/google/skflow/blob/master/skflow/__init__.py#L415
GridSearchCV can't find "hidden_units".
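One plausible mechanism for the None value: scikit-learn style cloning reads each constructor argument back from an attribute of the same name and builds a fresh estimator from those values. The sketch below imitates that convention in plain Python; it is a simplified stand-in rather than scikit-learn's actual code, and both classes are hypothetical.

```python
import inspect

def get_params_sketch(estimator):
    """Read each __init__ argument back from a same-named attribute."""
    signature = inspect.signature(type(estimator).__init__)
    names = [name for name in signature.parameters if name != "self"]
    return {name: getattr(estimator, name, None) for name in names}

def clone_sketch(estimator):
    """Build a new, unfitted estimator from the parameters read back."""
    return type(estimator)(**get_params_sketch(estimator))

class BrokenDNN:
    def __init__(self, hidden_units=None, n_classes=0):
        self._layers = hidden_units  # not stored under the argument name
        self.n_classes = n_classes

class FixedDNN:
    def __init__(self, hidden_units=None, n_classes=0):
        self.hidden_units = hidden_units  # stored under the argument name
        self.n_classes = n_classes

broken_copy = clone_sketch(BrokenDNN(hidden_units=[10, 20, 10], n_classes=3))
fixed_copy = clone_sketch(FixedDNN(hidden_units=[10, 20, 10], n_classes=3))
```

In the broken case the copy is constructed with hidden_units=None, which matches the "'NoneType' object is not iterable" error raised inside dnn().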
There are a number of problems that require multiple outputs (including RNNs).
Support needs to be added across all models and data feeders.
Hi, this is not an issue, but I wanted to get clear on some points, which may be dumb questions.
I couldn't see parameters for early stopping and epochs (maybe I missed them), but I saw steps. Is that equivalent to epochs or to iterations?
Also, is unsupervised learning supported, as in clustering, word2vec, or self-taught feature learning? Or are there future plans for it?
Hi,
I have a dataset with 24 inputs and 1 categorical output, so I am trying to adapt the example https://github.com/google/skflow/blob/master/examples/text_classification_character_cnn.py to my case.
However, in the example, I saw
byte_list = tf.reshape(skflow.ops.one_hot_matrix(X, 256),
[-1, MAX_DOCUMENT_LENGTH, 256, 1])
which I do not know how to adapt to my code. Could you please help?
My data looks like:
input1  input2  ...  input_n  output
2       1.2     ...  -0.44    "b"
1       0.2     ...  3.2      "f"
3       1       ...  2.1      "a"
Hi,
I suspect there is an error in _feed_dict_fn.
Why don't you save the sample indices that have already been used? They could be repeated when you call _feed_dict_fn() again in the same epoch.
sample = random.randint(0, self.X.shape[0] - 1)
inp[i, :] = self.X[sample, :]
It is not an epoch but only one step for one batch.
for step in xrange(steps):
    feed_dict = feed_dict_fn()
    global_step, loss, _ = sess.run([self.global_step, self.loss, self.trainer], feed_dict=feed_dict)
I think it should be something like this:
for step in xrange(steps):
    for i in xrange(X.shape[0]/batch_size):
        feed_dict = feed_dict_fn()
        global_step, loss, _ = sess.run([self.global_step, self.loss, self.trainer], feed_dict=feed_dict)
Am I right, or is my guess incorrect?
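The distinction raised here, sampling with replacement per step versus visiting every sample once per epoch, can be shown with a small index generator. This is a generic sketch of epoch-based batching and not the current or intended behaviour of _feed_dict_fn; the function name is hypothetical.

```python
import random

def epoch_batches(n_samples, batch_size, rng):
    """Yield index batches covering every sample exactly once per epoch."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

rng = random.Random(42)
seen = []
batch_sizes = []
for batch in epoch_batches(n_samples=10, batch_size=4, rng=rng):
    batch_sizes.append(len(batch))
    seen.extend(batch)
```

Each call to `epoch_batches` is one epoch; the last batch is smaller when the sample count is not a multiple of the batch size. Independent random draws, as in the quoted snippet, can repeat or skip samples within the same number of steps.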
HDF5 is a popular format for storing complicated datasets. It also supports various kinds of random access.
It would be awesome to have full support for it.
It would be nice if skflow had some support for validation sets, to be used for early stopping and for monitoring the validation loss during training. This could be realized fairly easily by adding a fraction_validationset parameter to the TensorFlowEstimator. Within fit, the given training set could then be split into two parts.
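A minimal sketch of the two pieces this proposal implies: splitting off a validation fraction, and a patience-based stopping rule on the validation losses. Only the name fraction_validationset comes from the proposal; the helper functions, the patience rule, and the loss values are assumptions.

```python
import random

def split_validation(X, y, fraction_validationset, seed=42):
    """Shuffle indices and hold out a fraction of the data for validation."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(X) * fraction_validationset)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(X, train_idx), pick(y, train_idx),
            pick(X, val_idx), pick(y, val_idx))

def should_stop(val_losses, patience):
    """True once the best validation loss is `patience` evaluations old."""
    if len(val_losses) <= patience:
        return False
    best_index = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_index >= patience

X = list(range(10))
y = [2 * v for v in X]
X_train, y_train, X_val, y_val = split_validation(X, y, 0.2)

stop_now = should_stop([1.0, 0.8, 0.9, 0.95], patience=2)
keep_going = should_stop([1.0, 0.8, 0.7], patience=2)
```

Inside a training loop, the validation loss would be appended after each evaluation and training would end once `should_stop` returns True.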
In your example for multi-output regression, it seems that you should add the following import so that the code works properly:
import skflow.ops
MP
Many datasets can't fit into memory, and TF doesn't actually require the whole dataset to be in memory.
Instead of loading the dataset into memory, it's possible to stream it using iterators and feed data that way.
Fit/predict functions in Estimators should be able to take X and y as iterators and read from them while processing data.
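Reading batches from iterators rather than from in-memory arrays can be sketched with itertools. The function below is a hypothetical helper, not an existing skflow feeder; it pulls only one batch worth of items from the iterators at a time.

```python
from itertools import islice

def batches_from_iterators(x_iter, y_iter, batch_size):
    """Yield (X_batch, y_batch) lists without materializing the dataset."""
    pairs = zip(x_iter, y_iter)
    while True:
        chunk = list(islice(pairs, batch_size))
        if not chunk:
            return
        xs, ys = zip(*chunk)
        yield list(xs), list(ys)

# Generators stand in for records streamed from disk or a database.
features = (float(i) for i in range(5))
targets = (i % 2 for i in range(5))
streamed = list(batches_from_iterators(features, targets, batch_size=2))
```

A fit function built on this pattern would run one training step per yielded batch and stop when the iterators are exhausted.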
Great effort by Google to simplify using TensorFlow.
I used the command
pip install git+git://github.com/google/skflow.git
to install it on Mac OS 10.11 inside a virtualenv where TF is also installed, but I can't import skflow.
I checked the virtualenv's Python site-packages and the skflow folder is there.
I will be happy if I can test it, because I'm also eager to provide a wrapper for TF and I want to contribute to this library.
>>> import skflow
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "skflow.py", line 15, in <module>
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 20, 10],
AttributeError: 'module' object has no attribute 'TensorFlowDNNClassifier'
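The traceback shows import skflow resolving to a file named skflow.py, which suggests that a local script with the same name may be shadowing the installed package. A quick way to see which file an import would load is sketched below; the helper name is made up for this example.

```python
import importlib.util

def module_origin(name):
    """Return the file a top-level import of `name` would load, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

json_origin = module_origin("json")
missing_origin = module_origin("no_such_module_for_this_example")

# In the reporter's environment one would call module_origin("skflow");
# a result ending in "skflow.py" inside the working directory, rather than
# a path under site-packages, would confirm the shadowing.
```

If that turns out to be the cause, renaming the script and removing any stale skflow.pyc next to it should let the installed package be imported.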
Currently, dropout is still applied at test time, which leads to incorrect results.
The proposed solution is to gather all dropout probability nodes and feed them a probability of 0 through the feed dict when running predict.
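The effect of feeding a drop probability of 0 at prediction time can be illustrated with a plain-Python version of inverted dropout. This is a generic sketch of the technique and not skflow's or TensorFlow's implementation; the function and values are illustrative.

```python
import random

def dropout(values, drop_prob, rng):
    """Inverted dropout: zero units with drop_prob, rescale the rest."""
    if drop_prob == 0.0:
        return list(values)  # prediction time: activations pass unchanged
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]

rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(activations, drop_prob=0.5, rng=rng)    # training
predict_out = dropout(activations, drop_prob=0.0, rng=rng)  # predict
```

During training each unit is either zeroed or scaled up, so outputs are noisy; with a drop probability of 0 the same input always yields the same output, which is what predict needs.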
Very good tutorial. I installed TensorFlow successfully using pip and tried to run the RNN examples, but that folder is not installed when using pip. So I was advised to install Bazel to compile and build the RNN examples. I installed Python and NumPy in order to run the TensorFlow examples, and I am able to compile and run the RNN examples successfully.
To follow your tutorial I need to install scipy, scikit-learn and finally skflow. I am using Python 2.7. Could you please tell us how to install scipy, scikit-learn and skflow with the proper dependencies (using pip) and, after installation, how to run the skflow examples using Bazel (the commands for compiling and running the skflow examples)?
I'm trying the tutorial on Medium (https://medium.com/@ilblackdragon/tensorflow-tutorial-part-2-9ffe47049c92#.608vwpu2a) and the first DNN raised this error:
Traceback (most recent call last):
File "skflow_test.py", line 32, in <module>
deep.fit(X_train, y_train)
File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/estimators/base.py", line 189, in fit
self._setup_training()
File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/estimators/base.py", line 128, in _setup_training
self._inp, self._out)
File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/estimators/dnn.py", line 76, in _model_fn
models.logistic_regression)(X, y)
File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/models.py", line 91, in dnn_estimator
layers = dnn(X, hidden_units)
File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/ops/dnn_ops.py", line 39, in dnn
tensor_in = tf.nn.rnn_cell.linear(tensor_in, n_units, True)
AttributeError: 'module' object has no attribute 'rnn_cell'
Is this a problem with my tensorflow installation? Or is there something different amiss?
I have an up-to-date installation of TF; this is what happens when I run pip install tensorflow --upgrade:
Requirement already up-to-date: tensorflow in /Users/metjush/anaconda/lib/python2.7/site-packages
Requirement already up-to-date: six>=1.10.0 in /Users/metjush/anaconda/lib/python2.7/site-packages (from tensorflow)
Requirement already up-to-date: numpy>=1.9.2 in /Users/metjush/anaconda/lib/python2.7/site-packages (from tensorflow)
This is my Python setup:
Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, Sep 15 2015, 14:29:08)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Hi, I love the idea of skflow and am trying to learn it :)
When I ran exactly the same code as in the blog post (https://medium.com/@ilblackdragon/tensorflow-tutorial-part-3-c5fc0662bc08#.jsxv1w8n9) locally, I got the error below. I thought that the class variables would need to be float instead of integer and tried that, but it didn't solve the problem. Could someone help?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-62de22c2cf78> in <module>()
15 classifier = skflow.TensorFlowEstimator(model_fn=categorical_model,
16 n_classes=2)
---> 17 classifier.fit(X_train, y_train)
18
19 print("Accuracy: {0}".format(metrics.accuracy_score(classifier.predict(X_test), y_test)))
/Users/a/anaconda/lib/python2.7/site-packages/skflow/estimators/base.pyc in fit(self, X, y, logdir)
166 if not self.continue_training or not self._initialized:
167 # Sets up model and trainer.
--> 168 self._setup_training()
169 # Initialize model parameters.
170 self._trainer.initialize(self._session)
/Users/a/anaconda/lib/python2.7/site-packages/skflow/estimators/base.pyc in _setup_training(self)
102
103 # Add histograms for X and y.
--> 104 tf.histogram_summary("X", self._inp)
105 tf.histogram_summary("y", self._out)
106
/Users/a/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/summary_ops.pyc in histogram_summary(tag, values, collections, name)
37 with ops.op_scope([tag, values], name, "HistogramSummary") as scope:
38 val = gen_summary_ops._histogram_summary(
---> 39 tag=tag, values=values, name=scope)
40 _Collect(val, collections, [ops.GraphKeys.SUMMARIES])
41 return val
/Users/a/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/gen_summary_ops.pyc in _histogram_summary(tag, values, name)
32 """
33 return _op_def_lib.apply_op("HistogramSummary", tag=tag, values=values,
---> 34 name=name)
35
36
/Users/a/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.pyc in apply_op(self, op_type_name, g, name, **keywords)
403 if input_arg.type != types_pb2.DT_INVALID:
404 raise TypeError("%s expected type of %s." %
--> 405 (prefix, types_lib.as_dtype(input_arg.type).name))
406 else:
407 raise TypeError(
TypeError: Input 'values' of 'HistogramSummary' Op has type int64 that does not match expected type of float32.
Hi,
In the text classification example with an RNN, I think the document is represented by the bag-of-words method.
I want to apply the Doc2Vec method instead, so X_train.shape is (20000, 500) and X_test.shape is (5000, 500), with float values.
Then I applied
def rnn_model(X, y):
    """Recurrent neural network model to predict from sequence of words
    to a class."""
    # Convert indexes of words into embeddings.
    # This creates embeddings matrix of [n_words, EMBEDDING_SIZE] and then
    # maps word indexes of the sequence into [batch_size, sequence_length,
    # EMBEDDING_SIZE].
    word_vectors = skflow.ops.categorical_variable(X, n_classes=n_words,
        embedding_size=EMBEDDING_SIZE, name='words')
    # Split into list of embedding per word, while removing doc length dim.
    # word_list results to be a list of tensors [batch_size, EMBEDDING_SIZE].
    word_list = skflow.ops.split_squeeze(1, MAX_DOCUMENT_LENGTH, word_vectors)
    # Create a Gated Recurrent Unit cell with hidden size of EMBEDDING_SIZE.
    cell = rnn_cell.GRUCell(EMBEDDING_SIZE)
    # Create an unrolled Recurrent Neural Networks to length of
    # MAX_DOCUMENT_LENGTH and passes word_list as inputs for each unit.
    _, encoding = rnn.rnn(cell, word_list, dtype=tf.float32)
    # Given encoding of RNN, take encoding of last step (e.g hidden size of the
    # neural network of last step) and pass it as features for logistic
    # regression over output classes.
    return skflow.models.logistic_regression(encoding[-1], y)

classifier = skflow.TensorFlowEstimator(model_fn=rnn_model, n_classes=15,
    steps=1000, optimizer='Adam', learning_rate=0.01, continue_training=True)

# Continuously train for 1000 steps & predict on test set.
while True:
    classifier.fit(X_train, y_train, logdir='/tmp/tf_examples/word_rnn')
    score = metrics.accuracy_score(classifier.predict(X_test), y_test)
    print('Accuracy: {0:f}'.format(score))
and got this error:
TypeError: DataType float32 for attr 'Tindices' not in list of allowed values: int32, int64
What is a better way to apply Doc2Vec with RNN using skflow?
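The error message suggests that the embedding step expects integer IDs (int32 or int64), while Doc2Vec output is already a dense float vector per document. The following is a minimal plain-Python sketch of that distinction, not skflow code: the table, the `lookup` helper, and all sizes are hypothetical and only illustrate why a lookup works with integer word IDs and fails with float features.

```python
import random

# Hypothetical sizes, chosen only for illustration.
N_WORDS = 5
EMBEDDING_SIZE = 3

rng = random.Random(0)
# Embedding table: one row of floats per integer word ID.
embedding_table = [[rng.random() for _ in range(EMBEDDING_SIZE)]
                   for _ in range(N_WORDS)]


def lookup(ids):
    """Map a sequence of integer word IDs to their embedding rows."""
    return [embedding_table[i] for i in ids]


# A document encoded as word IDs: the lookup succeeds.
word_ids = [0, 3, 1]
vectors = lookup(word_ids)

# A Doc2Vec-style document vector is already a dense float embedding,
# so using its values as table indices is a type error.
doc_vector = [0.12, -0.53, 0.88]
try:
    lookup(doc_vector)
    lookup_failed = False
except TypeError:
    lookup_failed = True

print(len(vectors), lookup_failed)
```

If this reading of the error is right, one option might be to skip the embedding lookup for Doc2Vec features and pass the float vectors to the rest of the model directly; whether that fits the RNN example is an open question, since each document is then a single vector rather than a sequence of words.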
The sparkit-learn package recently came out; it uses Spark to scale scikit-learn and aid grid search.
https://github.com/lensacom/sparkit-learn
Since skflow has a sklearn-like interface, it might be a good idea to show an example of using sparkit-learn to scale skflow with minor changes.
Blogpost can be found here: https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html
PRs are welcome for this.
Hi,
I've been working through some of the tutorials and they often use random.seed() at the beginning. I tried changing this value to see how it affected the output of a DNN, and it doesn't change anything.
A little digging found that tf_random_seed in dnn.py defaults to 42, and it must be specified when TensorFlowDNNClassifier() is created if you want anything other than 42.
I found this somewhat confusing and, unless there are other reasons for setting a default random seed in dnn.py (as opposed to the user doing this in their code), I would argue that the default behaviour should leave tf_random_seed undefined, e.g. tf_random_seed=None instead of tf_random_seed=42.
Thanks for making this awesome project and tutorials, I'm finding them really helpful in my exploration of machine learning!
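The seeding behaviour described in this issue can be sketched in plain Python. This is not skflow's implementation: `ToyEstimator` and its `seed` parameter are hypothetical stand-ins for an estimator whose constructor defaults to a fixed seed, and Python's `random` module stands in for the library's own random number generator.

```python
import random


class ToyEstimator:
    """Hypothetical estimator with a fixed default seed (not skflow code)."""

    def __init__(self, seed=42):
        self.seed = seed

    def initial_weights(self, n=3):
        # With a fixed seed the estimator uses its own generator and ignores
        # global seeding; with seed=None it falls back to the global one.
        rng = random if self.seed is None else random.Random(self.seed)
        return [rng.random() for _ in range(n)]


# Default seed of 42: changing the global seed has no effect.
random.seed(1)
a = ToyEstimator().initial_weights()
random.seed(2)
b = ToyEstimator().initial_weights()

# seed=None: the global seed now controls the result.
random.seed(1)
c = ToyEstimator(seed=None).initial_weights()
random.seed(2)
d = ToyEstimator(seed=None).initial_weights()
random.seed(1)
e = ToyEstimator(seed=None).initial_weights()

print(a == b, c == d, c == e)
```

In the toy version, a default of None leaves reproducibility in the caller's hands, which is the behaviour the issue argues for. Whether seeding Python's `random` module would influence TensorFlow's generator even with tf_random_seed=None is a separate question that this sketch does not address.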
Restoring a saved model fails with an error when I run the code below.
import skflow
from sklearn import datasets, metrics
iris = datasets.load_iris()
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 20, 10], n_classes=3)
classifier.fit(iris.data, iris.target)
classifier.save('test_model')
new_clf = skflow.TensorFlowEstimator.restore('test_model')
score = metrics.accuracy_score(new_clf.predict(iris.data), iris.target)
print("Accuracy: %f" % score)
Traceback (most recent call last):
File "dnn_save.py", line 8, in <module>
new_clf = skflow.TensorFlowEstimator.restore('test_model')
File "/usr/local/lib/python2.7/dist-packages/skflow/__init__.py", line 353, in restore
estimator._restore(path)
File "/usr/local/lib/python2.7/dist-packages/skflow/__init__.py", line 325, in _restore
self._saver.restore(self._session, checkpoint_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 864, in restore
sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 345, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 404, in _do_run
target_list)
tensorflow.python.pywrap_tensorflow.StatusNotOK: Internal: Unable to get element from the feed.
Hi,
Sorry for asking so many questions.
I think plotting is always a nice feature. Is it currently possible in skflow (or can we do it through TensorFlow directly)?
Add an example of a language model (RNN), for example a character-level model trained on a Shakespeare text (similar to https://github.com/sherjilozair/char-rnn-tensorflow).