tensorflow / skflow

Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning

License: Apache License 2.0

Python 89.12% Shell 10.88%

skflow's Issues

Support categorical variables out-of-the-box

Currently, if a dataset has categorical variables like gender or education, it requires additional processing to get them into one-hot form. Embeddings make it possible to use IDs to look up distributed representations for categories. This should be easy to use and to combine with regular features.
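
The two encodings mentioned above can be sketched in plain Python. This is only an illustration of the idea, not skflow's API: the helper names (build_vocab, one_hot), the sample data and the tiny hand-written embedding table are all assumptions made for the example.

```python
def build_vocab(values):
    """Map each distinct category to an integer ID (sorted for determinism)."""
    return {category: idx for idx, category in enumerate(sorted(set(values)))}


def one_hot(ids, n_classes):
    """Expand integer IDs into one-hot rows."""
    return [[1.0 if i == idx else 0.0 for i in range(n_classes)] for idx in ids]


# Hypothetical sample data: one categorical column and one numeric column.
education = ["masters", "bachelors", "phd", "bachelors"]
age = [34.0, 25.0, 41.0, 29.0]

vocab = build_vocab(education)           # {'bachelors': 0, 'masters': 1, 'phd': 2}
ids = [vocab[v] for v in education]      # [1, 0, 2, 0]

# Option 1: one-hot form, one column per category.
encoded = one_hot(ids, len(vocab))

# Option 2: embedding lookup, one (normally learned) row per category ID.
# The numbers here are placeholders, not trained values.
embedding_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
embedded = [embedding_table[i] for i in ids]

# Combine the embedded categorical feature with a regular numeric feature.
features = [emb + [a] for emb, a in zip(embedded, age)]
```

In a real model the embedding table would be a trainable variable; the lookup and concatenation steps stay the same.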

python setup.py install broke other packages on OS X

When I ran python setup.py install the first time it went well, but after I ran python setup.py install again, all the modules that skflow depends on were broken. The details are as follows.

Python 2.7.11 (default, Dec 23 2015, 12:23:20) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/__init__.py", line 180, in <module>
    from . import add_newdocs
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/core/__init__.py", line 58, in <module>
    from numpy.testing import Tester
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/testing/__init__.py", line 14, in <module>
    from .utils import *
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/testing/utils.py", line 15, in <module>
    from tempfile import mkdtemp
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/tempfile.py", line 32, in <module>
    import io as _io
  File "build/bdist.macosx-10.11-x86_64/egg/io/__init__.py", line 16, in <module>

  File "skflow/__init__.py", line 17, in <module>
    import tensorflow as tf
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 43, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 37, in <module>
    from tensorflow.core.framework.graph_pb2 import *
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/core/framework/graph_pb2.py", line 8, in <module>
    from google.protobuf import reflection as _reflection
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/google/protobuf/reflection.py", line 58, in <module>
    from google.protobuf.internal import python_message as message_impl
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 53, in <module>
    from io import BytesIO
ImportError: cannot import name BytesIO

Error raised when importing skflow with tensorflow 0.6.0 installed

TensorFlow 0.6.0 is not officially released yet, but importing skflow with TensorFlow 0.6.0 installed raises an error.

AttributeError Traceback (most recent call last)
in ()
----> 1 import skflow

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/__init__.py in ()
27
28 from skflow.trainer import TensorFlowTrainer
---> 29 from skflow import models, data_feeder
30 from skflow import preprocessing
31

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/models.py in ()
16 import tensorflow as tf
17
---> 18 from skflow.ops import mean_squared_error_regressor, softmax_classifier, dnn
19
20

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/ops/__init__.py in ()
16
17 from skflow.ops.conv_ops import *
---> 18 from skflow.ops.dnn_ops import *
19 from skflow.ops.embeddings_ops import *
20 from skflow.ops.losses_ops import *

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/ops/dnn_ops.py in ()
17
18 import tensorflow as tf
---> 19 from tensorflow.models.rnn import linear
20
21

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/models/rnn/linear.py in ()
24 import tensorflow as tf
25
---> 26 linear = tf.nn.linear

AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'linear'

Add Neural Translation example

Show an example of a neural translation (sequence-to-sequence) model, which can showcase:

Working with text and embeddings
Streaming of inputs
Multi-dimensional outputs

Error while using early_stopping

Hello

Today I tried to use the early_stopping feature (as here), and I found that my current skflow does not recognize the keyword early_stopping_rounds.

I tried to reinstall skflow, but it would not upgrade because the versions are the same (0.0.1). I had to uninstall skflow first and then install it again.

So I think it would be good to bump the skflow version so that upgrading is easier.

TensorFlowLinearRegressor unable to recover weights?

I'm hoping that I just overlooked something, but I can't seem to recover the correct weights for simple linear test cases using TensorFlowLinearRegressor. Here's a gist of the full test file (with pytest) and a matching test case using scikit-learn LinearRegression. As the comment in the gist says, the scikit-learn module recovers the weights faithfully while skflow returns odd weights.

Package details:
Python 3.5
tensorflow 0.6.0
skflow 0.0.1

I coded up a quick linear regression module myself using tensorflow as backend (using the SGD optimizer) and it was able to recover the weights using the same iterations, learning rate, etc... as default TensorFlowLinearRegressor so I'm curious what I'm doing wrong with skflow (or the test case) or if there's an issue somewhere.

Thanks for the comprehensive work on the library. Looking forward to using it a bunch.

ImportError: cannot import name NotFittedError

The import of sklearn.utils.validation.NotFittedError in skflow/estimators/base.py is no longer correct with the latest version of scikit-learn. In particular, this commit to scikit-learn moves NotFittedError to sklearn.exceptions.NotFittedError.

Which versions of scikit-learn is skflow expected to support? (I would send a PR, but am not sure how you want to deal with backwards compatibility here.)

(Source: this StackOverflow question.)
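
One common way to handle a name that moved between versions is to try the new location first and fall back to the old one. The sketch below wraps that in a small helper; import_first is a hypothetical name introduced for illustration, and the scikit-learn call is left commented out so the block runs without scikit-learn installed. The same idea is usually written as a plain try/except ImportError around the two import statements.

```python
import importlib


def import_first(attr_name, module_paths):
    """Return attr_name from the first module in module_paths that provides it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # module missing in this version, try the next location
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    raise ImportError(
        "cannot import name %s from any of %r" % (attr_name, module_paths))


# Intended use for the case in this issue (requires scikit-learn):
# NotFittedError = import_first(
#     "NotFittedError", ["sklearn.exceptions", "sklearn.utils.validation"])

# Stand-in demonstration with the standard library: the first module path
# does not exist, so the helper falls through to the second one.
OrderedDict = import_first(
    "OrderedDict", ["no_such_module_for_demo", "collections"])
```

Whether skflow should keep such a fallback depends on which scikit-learn versions it wants to support, which is the open question in the issue.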

TensorFlowEstimator.restore Error: tensorflow.python.pywrap_tensorflow.StatusNotOK

Restore is still not working for the text_classification.py example. I am getting the following exception:

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
Traceback (most recent call last):
  File "/Users/harsimranb/PycharmProjects/TensorFlowTest/TextRNN.py", line 103, in <module>
    runner.run_classification("amigo what time you close?")
  File "/Users/harsimranb/PycharmProjects/TensorFlowTest/TextRNN.py", line 56, in run_classification
    classifier = skflow.TensorFlowEstimator.restore(self.trained_model_path)
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/skflow/__init__.py", line 332, in restore
    estimator._restore(path)
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/skflow/__init__.py", line 314, in _restore
    self._saver.restore(self._session, checkpoint_path)
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 891, in restore
    sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 368, in run
    results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 446, in _do_run
    six.reraise(e_type, e_value, e_traceback)
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 428, in _do_run
    target_list)
tensorflow.python.pywrap_tensorflow.StatusNotOK: Internal: Unable to get element from the feed.

Related to #40

Add DBpedia classification example

DBPedia classification like in "Character-level Convolutional Networks for Text Classification" is a good example to showcase:

Character level inputs
CNN for characters
show how much simpler than https://github.com/zhangxiangxiao/Crepe

Model Persistence

Is there any way of storing a model? I'm trying to pickle it with pickle and dill but it does not work...

Thank you

Training hidden_units of DNN with skflow

Hello all

As far as I know, there are several ways to optimize a DNN. I think the most important parameter for a DNN is hidden_units.

Is it possible to somehow find the best hidden_units automatically?

Error while running text_classification*

Hi,

I'm trying to run the text_classification examples and I get the following error.

TypeError: Input 'values' of 'HistogramSummary' Op has type int64 that does not match expected type of float32.

The stack trace is below:

Traceback (most recent call last):
  File "text_classification.py", line 82, in <module>
    classifier.fit(X_train, y_train, logdir='/tmp/tf_examples/word_rnn')
  File "/usr/local/lib/python2.7/site-packages/skflow/estimators/base.py", line 186, in fit
    self._setup_training()
  File "/usr/local/lib/python2.7/site-packages/skflow/estimators/base.py", line 119, in _setup_training
    tf.histogram_summary("X", self._inp)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/summary_ops.py", line 39, in histogram_summary
    tag=tag, values=values, name=scope)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_summary_ops.py", line 34, in _histogram_summary
    name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 405, in apply_op
    (prefix, types_lib.as_dtype(input_arg.type).name))
TypeError: Input 'values' of 'HistogramSummary' Op has type int64 that does not match expected type of float32.

Am I doing something wrong? I installed TensorFlow and skflow as described in the Medium tutorial.

Input 'Values' of HistogramSummary Op Type Mismatch

Reproducible example: (master branch with update for dask stuff)

from skflow.io import *
import skflow
from sklearn import datasets
import random

random.seed(42)
iris = datasets.load_iris()
data = pd.DataFrame(iris.data)
data = dd.from_pandas(data, npartitions=2)
labels = pd.DataFrame(iris.target)
labels = dd.from_pandas(labels, npartitions=2)
classifier = skflow.TensorFlowLinearClassifier(n_classes=3)
classifier.fit(data, labels)

gives:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-8e0c6d4b2deb> in <module>()
     11 labels = dd.from_pandas(labels, npartitions=2)
     12 classifier = skflow.TensorFlowLinearClassifier(n_classes=3)
---> 13 classifier.fit(data, labels)

/Library/Python/2.7/site-packages/skflow-0.0.1-py2.7.egg/skflow/estimators/base.pyc in fit(self, X, y, logdir)
    185         if not self.continue_training or not self._initialized:
    186             # Sets up model and trainer.
--> 187             self._setup_training()
    188             # Initialize model parameters.
    189             self._trainer.initialize(self._session)

/Library/Python/2.7/site-packages/skflow-0.0.1-py2.7.egg/skflow/estimators/base.pyc in _setup_training(self)
    118             # Add histograms for X and y if they are floats.
    119             if self._data_feeder.input_dtype in (np.float32, np.float64):
--> 120                 tf.histogram_summary("X", self._inp)
    121             if self._data_feeder.output_dtype in (np.float32, np.float64):
    122                 tf.histogram_summary("y", self._out)

/Library/Python/2.7/site-packages/tensorflow/python/ops/summary_ops.pyc in histogram_summary(tag, values, collections, name)
     37   with ops.op_scope([tag, values], name, "HistogramSummary") as scope:
     38     val = gen_summary_ops._histogram_summary(
---> 39         tag=tag, values=values, name=scope)
     40     _Collect(val, collections, [ops.GraphKeys.SUMMARIES])
     41   return val

/Library/Python/2.7/site-packages/tensorflow/python/ops/gen_summary_ops.pyc in _histogram_summary(tag, values, name)
     32   """
     33   return _op_def_lib.apply_op("HistogramSummary", tag=tag, values=values,
---> 34                               name=name)
     35
     36

/Library/Python/2.7/site-packages/tensorflow/python/ops/op_def_library.pyc in apply_op(self, op_type_name, g, name, **keywords)
    403             if input_arg.type != types_pb2.DT_INVALID:
    404               raise TypeError("%s expected type of %s." %
--> 405                               (prefix, types_lib.as_dtype(input_arg.type).name))
    406             else:
    407               raise TypeError(

TypeError: Input 'values' of 'HistogramSummary' Op has type float64 that does not match expected type of float32.

Not sure exactly what happened, but I tried to add more supported types in https://github.com/tensorflow/skflow/blob/master/skflow/estimators/base.py#L119 and I am still getting this error.

Anything I missed?

Adding example of using LSTM for text classification

Hi,

I think it would be good to provide more examples and turn the examples directory into a list of recipes for using skflow.

Feedback on tutorial 3 - titanic embarked embedding

Thanks for the tutorial. It was very easy to follow. Like many might do, I tested my new knowledge by playing with the other categorical variables but this led to a difficult to understand stack trace.

To reproduce, just replace "Embarked" with "Pclass" in the lines where X and embarked_classes get assigned. Fitting the model will then throw index-out-of-range errors.

I eventually fixed it by increasing n_classes to the number of unique classes plus 1. I think the error stems from the embedding wanting a row for nan (the Embarked column has missing values but Pclass does not), but I could not get my head around the code to confirm the actual source.

I am only mentioning this because it is a beginner's tutorial, and the most basic fix is a comment in the code explaining that n_classes needs to account for nan (assuming I understand the error correctly). I also wonder whether n_classes should just be handled inside categorical_variable().
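
The off-by-one described above can be illustrated with a small encoder that always reserves ID 0 for missing values. This is a hypothetical encoding written for this note, under the reporter's assumption that a row is reserved for nan; it is not a confirmed description of what skflow's categorical processing does.

```python
import math


def is_missing(value):
    """Treat None and float nan as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_categories(values):
    """Assign IDs 1..K to the K observed categories; ID 0 is kept for missing."""
    observed = sorted({v for v in values if not is_missing(v)}, key=str)
    return {category: idx + 1 for idx, category in enumerate(observed)}


def transform(values, vocab):
    return [0 if is_missing(v) else vocab[v] for v in values]


embarked = ["S", "C", float("nan"), "Q", "S"]   # has a missing value
pclass = [3, 1, 2, 3, 1]                        # no missing values

embarked_vocab = fit_categories(embarked)
pclass_vocab = fit_categories(pclass)

embarked_ids = transform(embarked, embarked_vocab)
pclass_ids = transform(pclass, pclass_vocab)

# With a reserved row for missing values, the embedding needs one more row
# than there are observed categories, whether or not nan actually occurs.
embarked_n_classes = len(embarked_vocab) + 1
pclass_n_classes = len(pclass_vocab) + 1
```

Under this scheme the largest ID equals the number of observed categories, so an embedding sized to the number of unique values alone would be indexed out of range.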

Get probabilities for ALL the classes

I'm looking at the text_classification.py.

classifier.predict(X_test) returns the class number with the highest probability, but I wonder how to get the probabilities for all the classes for each input.

Thanks in advance!
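
For background, the relationship between per-class probabilities and the predicted class can be sketched as follows. This assumes the classifier ends in a softmax over per-class scores (logits); the function and the sample numbers are illustrative and are not taken from skflow.

```python
import math


def softmax(logits):
    """Turn raw per-class scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for one input over three classes.
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)                       # probabilities for all classes
predicted = max(range(len(probs)), key=probs.__getitem__)  # what predict returns
```

The class returned by predict is just the argmax of this probability vector, so exposing the vector itself gives the probabilities for all classes.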

Class weight support

Hi,

I am using skflow.ops.dnn to classify a two-class dataset (True and False). The percentage of True examples is very small, so I have an imbalanced dataset.

It seems to me that one way to resolve the issue is to use class weights. However, when I look at the implementation of skflow.ops.dnn, I do not see how I could apply class weights with a DNN.

Is it possible to do that with skflow, or is there another technique for dealing with imbalanced datasets in skflow?

Thanks
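
As a sketch of what class weighting means in practice, the block below computes weights with the widely used "balanced" heuristic, n_samples / (n_classes * class_count), and applies them to per-example losses. The function names and the toy labels are assumptions for illustration; nothing here is skflow API.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / float(n_classes * count)
            for cls, count in counts.items()}


def weighted_mean_loss(per_example_losses, labels, weights):
    """Average the losses after scaling each one by its class weight."""
    total = sum(loss * weights[y] for loss, y in zip(per_example_losses, labels))
    return total / len(labels)


# Toy imbalanced dataset: 9 negatives, 1 positive.
labels = [False] * 9 + [True]
weights = balanced_class_weights(labels)

# With equal per-example losses, both classes now contribute the same total.
loss = weighted_mean_loss([1.0] * len(labels), labels, weights)
```

A rare class therefore gets a proportionally larger weight, so mistakes on it cost as much in total as mistakes on the common class.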

simple linear classification example results in AttributeError in pandas_io.py

I just installed Scikit Flow and tried to execute the simple linear classification example. Unfortunately this does not work:

/home/chris/anaconda3/lib/python3.4/site-packages/skflow/io/pandas_io.py in extract_pandas_data(data)
     27 def extract_pandas_data(data):
     28     """Extract data from pandas.DataFrame for predictors"""
---> 29     if not isinstance(data, pd.DataFrame):
     30         return data
     31 

AttributeError: 'module' object has no attribute 'DataFrame'

I'm using Python 3.4, TensorFlow 0.6, scikit-learn 0.17 and pandas 0.16.2.

TensorFlowEstimator.restore error for text_classification.py example

I am using the text_classification.py example. I have added the following code to save and restore a model:

To save:

classifier = skflow.TensorFlowEstimator(model_fn=self.rnn_model, n_classes=15,
                                        steps=1000, optimizer='Adam', learning_rate=0.01,
                                        continue_training=True)

To restore:

classifier = skflow.TensorFlowEstimator.restore(self.trained_model_path)

On restore, I'm getting the following error:

Traceback (most recent call last):
  File "/Users/me/PycharmProjects/TensorFlowTest/TextRNN.py", line 56, in run_classification
    classifier = skflow.TensorFlowEstimator.restore(self.trained_model_path)
  File "/Users/me/Library/Python/2.7/lib/python/site-packages/skflow/__init__.py", line 317, in restore
    estimator = eval(model_def) 
  File "<string>", line 2
    model_fn=<bound method TextRNN.rnn_model of <__main__.TextRNN object at 0x10ef3f450>>,
             ^
SyntaxError: invalid syntax

Within the restore method, model_def looks like this:

TensorFlowEstimator(batch_size=32, continue_training=True, learning_rate=0.01,
          model_fn=<bound method TextRNN.rnn_model of <__main__.TextRNN object at 0x10ef3f450>>,
          n_classes=15, num_cores=4, optimizer='Adam', steps=1000,
          tf_master='', tf_random_seed=42, verbose=1)
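
The traceback shows restore calling eval on a textual model definition. The sketch below reproduces why that fails for model_fn: the repr of a bound method is not valid Python source. The TextRNN class is a stand-in with the same names as in the report, and can_round_trip is a helper invented for this illustration.

```python
class TextRNN(object):
    """Stand-in for the reporter's class; only the method name matters here."""

    def rnn_model(self, X, y):
        return X, y


def can_round_trip(value):
    """Return True if eval(repr(value)) parses, i.e. the repr is valid source."""
    try:
        eval(repr(value))
    except SyntaxError:
        return False
    return True


params = {
    "n_classes": 15,
    "optimizer": "Adam",
    "learning_rate": 0.01,
    "model_fn": TextRNN().rnn_model,  # repr looks like <bound method ...>
}

# Simple values survive repr/eval; the bound method does not.
round_trip = {name: can_round_trip(value) for name, value in params.items()}
```

Any persistence scheme based on repr and eval will therefore need to treat callables such as model_fn separately.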

Loss scores are different for contiguous run of fit() for 200 steps and 4 runs of fit() for 50 steps

I am doing regression with DNN.

Final MSE for contiguous run of 200 steps: 1.45781016655
Final MSE for 4 runs with 50 steps each: 1.44524233948

Score for contiguous run:
Step #1, epoch #1, avg. loss: 27.95941
Step #21, epoch #21, avg. loss: 5.64051
Step #41, epoch #41, avg. loss: 1.78990
Step #61, epoch #61, avg. loss: 1.53639
Step #81, epoch #81, avg. loss: 1.49865
Step #101, epoch #101, avg. loss: 1.48255
Step #121, epoch #121, avg. loss: 1.47312
Step #141, epoch #141, avg. loss: 1.46747
Step #161, epoch #161, avg. loss: 1.46394
Step #181, epoch #181, avg. loss: 1.46122

Score for 4 runs 50 steps each:
Step #1, epoch #1, avg. loss: 27.95941
Step #6, epoch #6, avg. loss: 13.49244
Step #11, epoch #11, avg. loss: 4.11436
Step #16, epoch #16, avg. loss: 2.69326
Step #21, epoch #21, avg. loss: 2.26197
Step #26, epoch #26, avg. loss: 2.02976
Step #31, epoch #31, avg. loss: 1.79997
Step #36, epoch #36, avg. loss: 1.71287
Step #41, epoch #41, avg. loss: 1.61699
Step #46, epoch #46, avg. loss: 1.56702

Step #51, epoch #1, avg. loss: 1.52925
Step #56, epoch #6, avg. loss: 1.52344
Step #61, epoch #11, avg. loss: 1.51318
Step #66, epoch #16, avg. loss: 1.50661
Step #71, epoch #21, avg. loss: 1.50114
Step #76, epoch #26, avg. loss: 1.49584
Step #81, epoch #31, avg. loss: 1.49099
Step #86, epoch #36, avg. loss: 1.48698
Step #91, epoch #41, avg. loss: 1.48371
Step #96, epoch #46, avg. loss: 1.48097

Step #101, epoch #1, avg. loss: 1.47760
Step #106, epoch #6, avg. loss: 1.47609
Step #111, epoch #11, avg. loss: 1.47386
Step #116, epoch #16, avg. loss: 1.47201
Step #121, epoch #21, avg. loss: 1.47048
Step #126, epoch #26, avg. loss: 1.46914
Step #131, epoch #31, avg. loss: 1.46795
Step #136, epoch #36, avg. loss: 1.46686
Step #141, epoch #41, avg. loss: 1.46591
Step #146, epoch #46, avg. loss: 1.46506

Step #151, epoch #1, avg. loss: 1.46384
Step #156, epoch #6, avg. loss: 1.46348
Step #161, epoch #11, avg. loss: 1.46276
Step #166, epoch #16, avg. loss: 1.46212
Step #171, epoch #21, avg. loss: 1.46144
Step #176, epoch #26, avg. loss: 1.46086
Step #181, epoch #31, avg. loss: 1.46028
Step #186, epoch #36, avg. loss: 1.45976
Step #191, epoch #41, avg. loss: 1.45914
Step #196, epoch #46, avg. loss: 1.45857

DataFeeder to read serialized numpy arrays

NumPy arrays can be serialized to disk, and it is possible to do random seeks into them.
Implementing a DataFeeder for this data format would remove the requirement to hold the full dataset in memory while still allowing random seeks for sampling batches.

Custom model used logistic_classifier that is nonexistent

In README, the custom model used logistic_classifier, but there is no such thing in the implementation.

DBpedia text classification shows "TypeError: data type not understood"

The shapes of all the passed parameters are correct; I have double-checked that.

Please add an example of a word-level CNN for text classification.

OSX Multi threading

Is there any way of determining the number of cores on OS X? I think it is calculated automatically, but in my case the system monitor shows a Python process at 150% CPU, while other algorithms like XGBoost reach over 300%. Could I set it manually?

Note that I'm executing the iris_custom_model algorithm with a larger dataset

Thank you

Add functionality to construct and train RNNs

As per title, it would be great to have the ability to build and train RNNs easily with this library.

Multi-threaded feed dict

Currently, feed dicts are built in the same thread as the main training loop, which slows down each training step by the time it takes to process and sample records.

A better option would be to run sampling in a separate thread that feeds a queue, so that the main thread just takes full batches out of the queue.

As a performance test, it would be interesting to see whether skflow.TensorFlowLinearClassifier can match the speed of sklearn.linear_model.LogisticRegression.
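
The producer/consumer arrangement proposed above can be sketched with the standard library alone. The function names, batch sizes and the None sentinel are choices made for this example; the training step is replaced by simply collecting the batches.

```python
import queue
import random
import threading


def sample_batch(data, batch_size, rng):
    """Draw one batch of records at random (with replacement)."""
    return [rng.choice(data) for _ in range(batch_size)]


def producer(data, batch_size, n_batches, out_queue, seed=42):
    """Background thread: sample batches and push them into the queue."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        out_queue.put(sample_batch(data, batch_size, rng))  # blocks when full
    out_queue.put(None)  # sentinel: no more batches


def consume(in_queue):
    """Main thread: take ready batches out of the queue until the sentinel."""
    batches = []
    while True:
        batch = in_queue.get()
        if batch is None:
            break
        batches.append(batch)  # a real loop would run one training step here
    return batches


data = list(range(100))
batch_queue = queue.Queue(maxsize=4)  # bounded, so sampling cannot run far ahead
worker = threading.Thread(target=producer, args=(data, 8, 5, batch_queue))
worker.daemon = True
worker.start()

batches = consume(batch_queue)
worker.join()
```

The bounded queue keeps memory use predictable: the sampling thread waits whenever it is more than a few batches ahead of training.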

Add Saver support

Support Saver:

Setting logdir for saving checkpoints.
Restoring model if checkpoints already exist.
Update examples like text_classification to use this.

skflow doesn't support Python3.X

Both scikit-learn and TensorFlow support Python 3.x.
It would be great if skflow supported Python 3.x too.

GridSearchCV does not work with TensorFlowDNNClassifier

Hello

I tried DNN meta-parameter optimization using GridSearchCV, but it does not work.

Stack trace:

  File "/usr/local/lib/python2.7/site-packages/sklearn/grid_search.py", line 804, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/usr/local/lib/python2.7/site-packages/sklearn/grid_search.py", line 553, in _fit
    for parameters in parameter_iterable
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 804, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 662, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 570, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 183, in __init__
    self.results = batch()
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python2.7/site-packages/skflow/__init__.py", line 163, in fit
    self._setup_training()
  File "/usr/local/lib/python2.7/site-packages/skflow/__init__.py", line 113, in _setup_training
    self._inp, self._out)
  File "/usr/local/lib/python2.7/site-packages/skflow/models.py", line 73, in dnn_estimator
    layers = dnn(X, hidden_units)
  File "/usr/local/lib/python2.7/site-packages/skflow/ops/dnn_ops.py", line 41, in dnn
    for i, n_units in enumerate(hidden_units):
TypeError: 'NoneType' object is not iterable

Problem

hidden_units is None inside GridSearchCV.

hidden_units is a member of TensorFlowEstimator but not a member of TensorFlowDNNClassifier:

https://github.com/google/skflow/blob/master/skflow/__init__.py#L415

GridSearchCV can't find hidden_units.
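
One plausible reading of this report is that the estimator is cloned by reading back its constructor arguments as attributes, and a parameter that is not stored under its own name comes back as None. The sketch below is a simplified mimic of that mechanism written for this note; get_params and clone here are stand-ins and not scikit-learn's actual implementation, and the two estimator classes are hypothetical.

```python
import inspect


def get_params(estimator):
    """Read __init__ argument names, then look each one up as an attribute."""
    signature = inspect.signature(type(estimator).__init__)
    names = [name for name, param in signature.parameters.items()
             if name != "self" and param.kind == param.POSITIONAL_OR_KEYWORD]
    # A missing attribute silently becomes None in this simplified version.
    return {name: getattr(estimator, name, None) for name in names}


def clone(estimator):
    """Build a fresh, unfitted copy from the discovered parameters."""
    return type(estimator)(**get_params(estimator))


class LossyEstimator(object):
    def __init__(self, hidden_units=None, steps=50):
        self._hidden = hidden_units   # stored under a different name
        self.steps = steps


class FaithfulEstimator(object):
    def __init__(self, hidden_units=None, steps=50):
        self.hidden_units = hidden_units  # stored under the argument's name
        self.steps = steps


lossy_copy = clone(LossyEstimator(hidden_units=[10, 20, 10]))
faithful_copy = clone(FaithfulEstimator(hidden_units=[10, 20, 10]))
```

If this is indeed the cause, storing every constructor argument as an attribute of the same name on TensorFlowDNNClassifier would let the search pass hidden_units through to each cloned estimator.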

Multi-output regression and classification

There are a number of problems that require multiple outputs (including RNNs).
Support needs to be added across all models and data feeders.

Early stopping, epochs and unsupervised learning

Hi, this is not an issue, but I wanted to clarify some points; these may be dumb questions.
I couldn't see parameters for early stopping or epochs (maybe I missed them), but I saw steps. Is that equivalent to epochs or to iterations?

Also, is unsupervised learning supported, as in clustering, word2vec, or self-taught feature learning? Or are there future plans for it?
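
On the steps question, a rough conversion can be sketched under the assumption that one step processes exactly one batch. The helper names and the sample numbers are illustrative only.

```python
def steps_to_epochs(steps, batch_size, n_samples):
    """Approximate number of passes over the data made by `steps` batches."""
    return steps * batch_size / float(n_samples)


def epochs_to_steps(epochs, batch_size, n_samples):
    """Steps needed for `epochs` full passes, rounding batches per epoch up."""
    batches_per_epoch = -(-n_samples // batch_size)  # ceiling division
    return epochs * batches_per_epoch


# Example: 200 steps with batches of 32 on a 150-row dataset.
approx_epochs = steps_to_epochs(200, 32, 150)

# Example: steps required for 10 passes over the same dataset.
needed_steps = epochs_to_steps(10, 32, 150)
```

Under that assumption a step is an iteration over a single batch, so an epoch corresponds to roughly n_samples / batch_size steps.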

How could I adapt the CNN model for 1-dimensional input datasets?

Hi,

I have a dataset with 24 inputs and 1 categorical output, so I am trying to adapt the example https://github.com/google/skflow/blob/master/examples/text_classification_character_cnn.py to my case.

However, in the example, I saw

byte_list = tf.reshape(skflow.ops.one_hot_matrix(X, 256), 
        [-1, MAX_DOCUMENT_LENGTH, 256, 1])

which I do not know how to adapt to my code. Could you please help?

My data looks like:

input1  input2  ...  input_n  output
2       1.2     ...  -0.44    "b"
1       0.2     ...  3.2      "f"
3       1       ...  2.1      "a"

DataFeeder: Sampling without replacement

Hi,

I suspect there is an error in _feed_dict_fn.
Why don't you save the sample indices that have already been used? They can be repeated when _feed_dict_fn() is called again in the same epoch.

    sample = random.randint(0, self.X.shape[0] - 1)
    inp[i, :] = self.X[sample, :]

This is not an epoch but only one step for one batch:

    for step in xrange(steps):
        feed_dict = feed_dict_fn()
        global_step, loss, _ = sess.run([self.global_step, self.loss, self.trainer], feed_dict=feed_dict)

I think it should be something like this:

    for step in xrange(steps):
        for i in xrange(X.shape[0] // batch_size):
            feed_dict = feed_dict_fn()
            global_step, loss, _ = sess.run([self.global_step, self.loss, self.trainer], feed_dict=feed_dict)

Am I right, or is my guess incorrect?
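
Sampling without replacement within an epoch is commonly done by shuffling the indices once and slicing them into batches. The generator below is a minimal sketch of that approach; the name epoch_batches and the sizes used are assumptions for the example and do not come from skflow.

```python
import random


def epoch_batches(n_samples, batch_size, rng):
    """Yield index batches that together cover every sample exactly once."""
    indices = list(range(n_samples))
    rng.shuffle(indices)  # new random order for this epoch
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]  # last batch may be shorter


rng = random.Random(0)

# One epoch over 10 samples in batches of 4: batch sizes are 4, 4 and 2.
batches = list(epoch_batches(10, 4, rng))
seen = [index for batch in batches for index in batch]
```

Calling the generator once per epoch guarantees no index repeats inside an epoch, while different epochs still see the data in different orders.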

Support reading HDF5

HDF5 is a popular format for storing complicated datasets. It also supports random seeks.
It would be awesome to have full support for it.

Add support for validation sets

It would be nice if skflow had some support for validation sets, to be used for early stopping and for monitoring validation set loss during training. This could be realized fairly easily by adding a fraction_validationset parameter to the TensorFlowEstimator. Within fit, the given training set could then be split into two parts.
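
The proposal can be sketched as two small helpers: one that splits off a validation fraction, and one that decides when to stop. The parameter name fraction_validationset is taken from the text above; everything else (function names, the patience rule, the toy data) is an assumption made for illustration.

```python
import random


def split_validation(X, y, fraction_validationset, seed=42):
    """Shuffle once, then hold out the given fraction as a validation set."""
    if not 0.0 <= fraction_validationset < 1.0:
        raise ValueError("fraction_validationset must be in [0, 1)")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(len(X) * fraction_validationset))
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return pick(X, train_idx), pick(y, train_idx), pick(X, val_idx), pick(y, val_idx)


def should_stop(validation_losses, patience):
    """Stop once the best validation loss is at least `patience` checks old."""
    if len(validation_losses) <= patience:
        return False
    best_index = validation_losses.index(min(validation_losses))
    return len(validation_losses) - 1 - best_index >= patience


# Toy data where y is always 2 * x, so pairing is easy to verify.
X = list(range(10))
y = [2 * v for v in X]
X_train, y_train, X_val, y_val = split_validation(X, y, 0.2)

# Validation loss stopped improving two checks ago, so training would stop.
stop_now = should_stop([1.0, 0.8, 0.9, 0.95], patience=2)
```

Inside fit, the validation loss would be evaluated every few steps and appended to the list passed to should_stop.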

missing import

In your example for multi-output regression, it seems that you should add the following import so that the code works properly:

import skflow.ops

Implement fit/predict to support iterators

Many datasets can't fit into memory, and TF doesn't actually require the whole dataset to be in memory.
Instead of loading the dataset into memory, it's possible to stream it using iterators and feed data that way.

Fit/predict functions in Estimators should be able to take X and y as iterators and read from them while processing data.
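
A minimal sketch of the streaming idea: pull a fixed number of items from the X and y iterators at a time, so that only one batch is ever held in memory. The generator names and the toy streams are hypothetical; a real implementation would feed each batch to a training step.

```python
import itertools


def iter_batches(X_iter, y_iter, batch_size):
    """Yield (X_batch, y_batch) lists read lazily from two iterators."""
    pairs = zip(X_iter, y_iter)  # lazy in Python 3
    while True:
        chunk = list(itertools.islice(pairs, batch_size))
        if not chunk:
            return  # both streams exhausted
        X_batch, y_batch = zip(*chunk)
        yield list(X_batch), list(y_batch)


def stream_rows(n):
    """Toy feature stream: rows are produced one at a time."""
    for i in range(n):
        yield [float(i), float(i) * 2.0]


def stream_labels(n):
    """Toy label stream aligned with stream_rows."""
    for i in range(n):
        yield i % 2


batches = list(iter_batches(stream_rows(5), stream_labels(5), batch_size=2))
```

Because the batches are produced on demand, the same loop works for data read from files, databases or network sources of unknown length.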

Can't import skflow or run tutorials

Great effort by Google to simplify using TensorFlow.
I used the command
pip install git+git://github.com/google/skflow.git
to install it on Mac OS 10.11 inside a virtualenv where TF is also installed, but I can't import skflow.
I checked the virtualenv's Python site-packages and the skflow folder is there.

I would be happy to test it, because I'm also eager to provide a wrapper for TF and I want to contribute to this library.

>>> import skflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "skflow.py", line 15, in <module>
    classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 20, 10],
AttributeError: 'module' object has no attribute 'TensorFlowDNNClassifier'

Turn off dropout at predict time

Currently, dropout is still applied at test time, which leads to incorrect results.

The proposed solution is to gather all dropout probability nodes and feed a probability of 0 through the feed dict when running predict.
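
To make the train/predict difference concrete, here is a plain-Python sketch of inverted dropout. It is parameterized by keep_prob, so "dropout probability 0" corresponds to keep_prob = 1.0; the function is an illustration written for this note, not skflow or TensorFlow code.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and rescale the survivors."""
    if keep_prob >= 1.0:
        return list(values)  # predict time: nothing is dropped
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]

# Training: units are randomly zeroed, survivors are scaled by 1 / keep_prob.
train_out = dropout(activations, keep_prob=0.5, rng=rng)

# Prediction: feed keep_prob = 1.0 so the output is deterministic.
predict_out = dropout(activations, keep_prob=1.0, rng=rng)
```

Feeding the "no dropout" value only at predict time leaves the training graph untouched while making predictions repeatable.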

How to install scipy, scikit-learn and skflow (using pip) with proper dependencies for Python 2.7, and how to compile skflow examples using Bazel

Very good tutorial. I installed TensorFlow successfully using pip and tried to run the RNN examples, but that folder is not installed when pip is used. I was advised to install Bazel to compile and build the RNN examples. I installed Python and NumPy in order to run the TensorFlow examples, and I am able to compile and run the RNN examples successfully.
To follow your tutorial I need to install scipy, scikit-learn and finally skflow. I am using Python 2.7. Could you please tell us how to install scipy, scikit-learn and skflow with proper dependencies (using pip) and, after installation, how to run the skflow examples using Bazel (the commands for compiling and running the skflow examples)?

'module' object has no attribute 'rnn_cell'

I'm trying the tutorial on Medium (https://medium.com/@ilblackdragon/tensorflow-tutorial-part-2-9ffe47049c92#.608vwpu2a) and the first DNN raised this error:

Traceback (most recent call last):
  File "skflow_test.py", line 32, in <module>
    deep.fit(X_train, y_train)
  File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/estimators/base.py", line 189, in fit
    self._setup_training()
  File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/estimators/base.py", line 128, in _setup_training
    self._inp, self._out)
  File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/estimators/dnn.py", line 76, in _model_fn
    models.logistic_regression)(X, y)
  File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/models.py", line 91, in dnn_estimator
    layers = dnn(X, hidden_units)
  File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/ops/dnn_ops.py", line 39, in dnn
    tensor_in = tf.nn.rnn_cell.linear(tensor_in, n_units, True)
AttributeError: 'module' object has no attribute 'rnn_cell'

Is this a problem with my TensorFlow installation, or is something else amiss?
I have an up-to-date installation of TF, this is what happens when I run pip install tensorflow --upgrade:

Requirement already up-to-date: tensorflow in /Users/metjush/anaconda/lib/python2.7/site-packages
Requirement already up-to-date: six>=1.10.0 in /Users/metjush/anaconda/lib/python2.7/site-packages (from tensorflow)
Requirement already up-to-date: numpy>=1.9.2 in /Users/metjush/anaconda/lib/python2.7/site-packages (from tensorflow)

This is my Python setup:

Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, Sep 15 2015, 14:29:08) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
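
One way to narrow this down is to check whether the installed TensorFlow actually exposes the attribute the traceback complains about. The sketch below is a generic check; the helper name has_dotted_attr is made up for illustration, and it is demonstrated on the standard-library os module so that it runs without TensorFlow installed.

```python
import os


def has_dotted_attr(obj, dotted_name):
    """Return True if every attribute along a dotted path exists on obj."""
    current = obj
    for part in dotted_name.split("."):
        if not hasattr(current, part):
            return False
        current = getattr(current, part)
    return True


# Demonstrated on the standard library so the snippet runs anywhere.
print(has_dotted_attr(os, "path.join"))
print(has_dotted_attr(os, "path.rnn_cell"))

# With TensorFlow installed (assumption: imported as tf), the same check
# would be: has_dotted_attr(tf, "nn.rnn_cell")
```

If the check fails for nn.rnn_cell, the installed TensorFlow and skflow versions may simply be out of step with each other. That is a guess from the traceback, not a confirmed diagnosis.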

An error following TensorFlow Tutorial — Part 3

Hi, I love the idea of skflow and am trying to learn it :)
When I ran exactly the same code as in the blog post (https://medium.com/@ilblackdragon/tensorflow-tutorial-part-3-c5fc0662bc08#.jsxv1w8n9) locally, I got the error below. I thought the class variables might need to be floats instead of integers and tried that, but it didn't solve the problem. Could someone help?

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-62de22c2cf78> in <module>()
     15 classifier = skflow.TensorFlowEstimator(model_fn=categorical_model,
     16     n_classes=2)
---> 17 classifier.fit(X_train, y_train)
     18 
     19 print("Accuracy: {0}".format(metrics.accuracy_score(classifier.predict(X_test), y_test)))

/Users/a/anaconda/lib/python2.7/site-packages/skflow/estimators/base.pyc in fit(self, X, y, logdir)
    166         if not self.continue_training or not self._initialized:
    167             # Sets up model and trainer.
--> 168             self._setup_training()
    169             # Initialize model parameters.
    170             self._trainer.initialize(self._session)

/Users/a/anaconda/lib/python2.7/site-packages/skflow/estimators/base.pyc in _setup_training(self)
    102 
    103             # Add histograms for X and y.
--> 104             tf.histogram_summary("X", self._inp)
    105             tf.histogram_summary("y", self._out)
    106 

/Users/a/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/summary_ops.pyc in histogram_summary(tag, values, collections, name)
     37   with ops.op_scope([tag, values], name, "HistogramSummary") as scope:
     38     val = gen_summary_ops._histogram_summary(
---> 39         tag=tag, values=values, name=scope)
     40     _Collect(val, collections, [ops.GraphKeys.SUMMARIES])
     41   return val

/Users/a/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/gen_summary_ops.pyc in _histogram_summary(tag, values, name)
     32   """
     33   return _op_def_lib.apply_op("HistogramSummary", tag=tag, values=values,
---> 34                               name=name)
     35 
     36 

/Users/a/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.pyc in apply_op(self, op_type_name, g, name, **keywords)
    403             if input_arg.type != types_pb2.DT_INVALID:
    404               raise TypeError("%s expected type of %s." %
--> 405                               (prefix, types_lib.as_dtype(input_arg.type).name))
    406             else:
    407               raise TypeError(

TypeError: Input 'values' of 'HistogramSummary' Op has type int64 that does not match expected type of float32.

How to use Doc2Vec instead of default document preprocessing of skflow

Hi,

In the text classification example with an RNN, I think each document is represented using the bag-of-words method.

I want to use Doc2Vec instead, which gives me X_train with shape (20000, 500) and X_test with shape (5000, 500), both holding float values.

Then I ran:

def rnn_model(X, y):
    """Recurrent neural network model to predict from sequence of words
    to a class."""
    # Convert indexes of words into embeddings.
    # This creates embeddings matrix of [n_words, EMBEDDING_SIZE] and then
    # maps word indexes of the sequence into [batch_size, sequence_length,
    # EMBEDDING_SIZE].
    word_vectors = skflow.ops.categorical_variable(X, n_classes=n_words,
        embedding_size=EMBEDDING_SIZE, name='words')
    # Split into list of embedding per word, while removing doc length dim.
    # word_list results to be a list of tensors [batch_size, EMBEDDING_SIZE].
    word_list = skflow.ops.split_squeeze(1, MAX_DOCUMENT_LENGTH, word_vectors)
    # Create a Gated Recurrent Unit cell with hidden size of EMBEDDING_SIZE.
    cell = rnn_cell.GRUCell(EMBEDDING_SIZE)
    # Create an unrolled Recurrent Neural Networks to length of
    # MAX_DOCUMENT_LENGTH and passes word_list as inputs for each unit.
    _, encoding = rnn.rnn(cell, word_list, dtype=tf.float32)
    # Given encoding of RNN, take encoding of last step (e.g hidden size of the
    # neural network of last step) and pass it as features for logistic
    # regression over output classes.
    return skflow.models.logistic_regression(encoding[-1], y)

classifier = skflow.TensorFlowEstimator(model_fn=rnn_model, n_classes=15,
    steps=1000, optimizer='Adam', learning_rate=0.01, continue_training=True)

# Continuously train for 1000 steps & predict on test set.
while True:
    classifier.fit(X_train, y_train, logdir='/tmp/tf_examples/word_rnn')
    score = metrics.accuracy_score(classifier.predict(X_test), y_test)
    print('Accuracy: {0:f}'.format(score))

and got this error:

TypeError: DataType float32 for attr 'Tindices' not in list of allowed values: int32, int64

What is a better way to apply Doc2Vec with RNN using skflow?
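
The error message suggests that the embedding step in the model is looking rows up by index, which only works with integer IDs, whereas Doc2Vec output is already a dense float vector. The sketch below illustrates that mismatch in plain Python; EMBEDDINGS and lookup are hypothetical names for illustration and are not part of skflow.

```python
# Toy embedding table: row i is the vector for word id i.
EMBEDDINGS = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]


def lookup(ids):
    """Map a sequence of integer ids to their embedding rows."""
    return [EMBEDDINGS[i] for i in ids]


# Integer word ids work as row indexes.
print(lookup([2, 0]))

# Float features (such as Doc2Vec output) cannot be used as indexes.
try:
    lookup([0.25, 1.7])
except TypeError as err:
    print("lookup failed:", err)
```

Since Doc2Vec already yields one dense vector per document, the embedding lookup is probably unnecessary here. Feeding the 500-dimensional vectors straight into a feed-forward model such as TensorFlowDNNClassifier may be a more natural fit than an RNN over word IDs, though that suggestion has not been tested against this data.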

Add Example of Using Spark with skflow to Easily Scale

The sparkit-learn package recently came out; it uses Spark to scale scikit-learn and to help with grid search.
https://github.com/lensacom/sparkit-learn

Since skflow has a sklearn-like interface, it might be a good idea to show an example of using sparkit-learn to scale skflow with minor changes.

Blogpost can be found here: https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html

PRs are welcome for this.

Default random seed is 42 rather than defaulting to a random random seed

Hi,

I've been working through some of the tutorials and they often call random.seed() at the beginning. I tried playing with this value to see how it affected the output of a DNN, and it doesn't change anything.

A little digging showed that tf_random_seed in dnn.py defaults to 42, and it must be specified when TensorFlowDNNClassifier() is created if you want anything other than 42.

I found this somewhat confusing and, unless there are other reasons for setting a default random seed in dnn.py (as opposed to the user doing this in their code), I would argue that the default behaviour should leave tf_random_seed undefined, e.g. tf_random_seed=None instead of tf_random_seed=42.

Thanks for making this awesome project and tutorials, I'm finding them really helpful in my exploration of machine learning!
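
The behaviour described above can be reproduced in miniature. This is a sketch only: init_weights is a hypothetical stand-in for an estimator with a seed parameter, and it uses Python's random module rather than TensorFlow, so it shows the pattern and not skflow's actual code.

```python
import random


def init_weights(n, seed=42):
    """Draw n pseudo-random weights; seed=None means 'do not fix the seed'."""
    rng = random.Random(seed)  # Random(None) seeds itself from the system
    return [rng.random() for _ in range(n)]


# Changing the global seed has no effect while the default seed=42 is used.
random.seed(1)
first = init_weights(3)
random.seed(2)
second = init_weights(3)
print(first == second)

# Passing different explicit seeds is what actually changes the result.
print(init_weights(3, seed=1) == init_weights(3, seed=2))
```

With a default of None instead of 42, each call would draw different weights unless the caller asked for reproducibility explicitly, which is the behaviour proposed in this issue.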

Need to add cross_validation module to import in iris.py

Error on TensorFlowEstimator.restore (tensorflow.python.pywrap_tensorflow.StatusNotOK: Internal: Unable to get element from the feed.)

There's an error when restoring a model with the code below.

import skflow
from sklearn import datasets, metrics

iris = datasets.load_iris()
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 20, 10], n_classes=3)
classifier.fit(iris.data, iris.target)
classifier.save('test_model')
new_clf = skflow.TensorFlowEstimator.restore('test_model')
score = metrics.accuracy_score(new_clf.predict(iris.data), iris.target)
print("Accuracy: %f" % score)

Traceback (most recent call last):
  File "dnn_save.py", line 8, in <module>
    new_clf = skflow.TensorFlowEstimator.restore('test_model')
  File "/usr/local/lib/python2.7/dist-packages/skflow/__init__.py", line 353, in restore
    estimator._restore(path)
  File "/usr/local/lib/python2.7/dist-packages/skflow/__init__.py", line 325, in _restore
    self._saver.restore(self._session, checkpoint_path)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 864, in restore
    sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 345, in run
    results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 404, in _do_run
    target_list)
tensorflow.python.pywrap_tensorflow.StatusNotOK: Internal: Unable to get element from the feed.

Plotting neural network built by skflow

Hi,

Sorry for asking so many questions.

I think plotting is always a nice feature. Is it currently possible in skflow (or can we do it through TensorFlow directly)?

Example of language model

Add an example of a language model (RNN), for example a character-level model trained on a Shakespeare text (similar to https://github.com/sherjilozair/char-rnn-tensorflow).
