tobegit3hub / tensorflow_template_application
TensorFlow template application for deep learning
License: Apache License 2.0
[x] Bug (Typo)
Semi-automated issue generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md
To avoid wasting CI processing resources a branch with the fix has been
prepared but a pull request has not yet been created. A pull request fixing
the issue can be prepared from the link below, feel free to create it or
request @timgates42 create the PR.
https://github.com/timgates42/tensorflow_template_application/pull/new/bugfix_typos
Thanks.
Hi tobe,
I want to use the Python and Java clients to get data back from a TF Serving server on another PC, 10.10.10.229. It seems Bazel and TensorFlow are not necessary for this.
But when I run predict_client.py, the connection fails. How can I fix this?
~/Desktop/tensorflow_template_application/python_predict_client$ ./predict_client.py --host 10.10.10.229 --port 9000 --model_name dense --model_version 1
Traceback (most recent call last):
File "./predict_client.py", line 51, in <module>
main()
File "./predict_client.py", line 46, in main
result = stub.Predict(request, request_timeout)
File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 309, in __call__
self._request_serializer, self._response_deserializer)
File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 195, in _blocking_unary_unary
raise _abortion_error(rpc_error_call)
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.UNAVAILABLE, details="Connect Failed")
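A "Connect Failed" status usually means the client could not open a TCP connection at all, so one first step is to check reachability independently of gRPC. The sketch below is a minimal check using only the standard library; the helper name can_connect is hypothetical, and the host and port in the comment are taken from the command above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call for the setup in this issue (not executed here):
# can_connect("10.10.10.229", 9000)
```

If this returns False from the client machine, the problem is likely in the network path, the firewall, or the address the server is bound to, rather than in predict_client.py.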
distributed/cancer_classifier.py works only when the ps and the worker run in the same Docker container.
It works in one container:
# both in 127.17.0.3
python cancer_classifier.py --ps_hosts=127.17.0.3:8222 --worker_hosts=127.17.0.3:8223 --job_name=ps --task_index=0
python cancer_classifier.py --ps_hosts=127.17.0.3:8222 --worker_hosts=127.17.0.3:8223 --job_name=worker --task_index=0
But it does not work across two containers:
# ps in 127.17.0.3
python cancer_classifier.py --ps_hosts=127.17.0.3:8222 --worker_hosts=127.17.0.4:8223 --job_name=ps --task_index=0
# worker in 127.17.0.4
python cancer_classifier.py --ps_hosts=127.17.0.3:8222 --worker_hosts=127.17.0.4:8223 --job_name=worker --task_index=0
The error message I got in the worker:
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job ps -> {0 -> 127.17.0.3:8222}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job worker -> {0 -> localhost:8222}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:211] Started server with target: grpc://localhost:8222
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py:344 in __init__.: __init__ (from tensorflow.python.training.summary_io) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.FileWriter. The interface and behavior is the same; this is just a rename.
I tensorflow/core/distributed_runtime/master_session.cc:993] Start master session 91acfc1008531f4d with config:
Traceback (most recent call last):
File "cancer_classifier_new.py", line 241, in <module>
tf.app.run(main=main)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "cancer_classifier_new.py", line 209, in main
with sv.managed_session(server.target) as sess:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 974, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 802, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 386, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 963, in managed_session
start_standard_services=start_standard_services)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 720, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 227, in prepare_session
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 173, in _restore_checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1388, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device to node 'save/RestoreV2_8': Could not satisfy explicit device specification '/job:ps/task:0/device:CPU:0' because no devices matching that specification are registered in this process; available devices: /job:worker/replica:0/task:0/cpu:0
[[Node: save/RestoreV2_8 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:ps/task:0/device:CPU:0"](save/Const, save/RestoreV2_8/tensor_names, save/RestoreV2_8/shape_and_slices)]]
Caused by op u'save/RestoreV2_8', defined at:
File "cancer_classifier_new.py", line 241, in <module>
tf.app.run(main=main)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "cancer_classifier_new.py", line 191, in main
saver = tf.train.Saver()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1000, in __init__
self.build()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1030, in build
restore_sequentially=self._restore_sequentially)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 624, in build
restore_sequentially, reshape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 361, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 200, in restore_op
[spec.tensor.dtype])[0])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 441, in restore_v2
dtypes=dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'save/RestoreV2_8': Could not satisfy explicit device specification '/job:ps/task:0/device:CPU:0' because no devices matching that specification are registered in this process; available devices: /job:worker/replica:0/task:0/cpu:0
[[Node: save/RestoreV2_8 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:ps/task:0/device:CPU:0"](save/Const, save/RestoreV2_8/tensor_names, save/RestoreV2_8/shape_and_slices)]]
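After converting tens of millions of records to TFRecords format, training with the default configuration of sparse_classifier.py finishes in less than 1 minute, and it seems that not all of the samples were read. What is the reason?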
When I send a POST request to /cancer_predict/predict/, an error occurs at: File "/Users/terry/Downloads/deep_recommend_system-master/http_service/cancer_predict/views.py", line 38, in predict feed_dict[v] = np.array(examples[k])
I printed examples; its value is:
{u'features': u'10,10,10,8,6,1,8,9,1;6,2,1,1,1,1,7,1,1'}
and the (k, v) values are k: features, v: Placeholder:0.
How can I fix the error? Thanks for your help.
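The printed value shows that features arrives as a single string, so one possible cause (the error message itself is not shown) is that the string is fed to the placeholder without being converted to numbers. The sketch below parses the "row;row" string into a list of float rows; the helper name parse_features is hypothetical and is not part of the project.

```python
def parse_features(text):
    """Parse 'a,b,c;d,e,f' into a list of rows, each a list of floats."""
    rows = []
    for row in text.split(";"):
        row = row.strip()
        if row:
            rows.append([float(x) for x in row.split(",")])
    # All rows must have the same width to form a rectangular batch.
    widths = {len(r) for r in rows}
    if len(widths) > 1:
        raise ValueError("rows have different lengths: %s" % sorted(widths))
    return rows


# The value printed in the issue above.
examples = {u"features": u"10,10,10,8,6,1,8,9,1;6,2,1,1,1,1,7,1,1"}
batch = parse_features(examples["features"])
```

The resulting nested list has two rows of nine values and can then be passed to np.array before being placed in feed_dict.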
Hello,
the following imports fail to resolve:
import org.tensorflow.framework.TensorProto;
import org.tensorflow.framework.TensorShapeProto;
import tensorflow.serving.Model;
import tensorflow.serving.Predict;
import tensorflow.serving.PredictionServiceGrpc;
I do not have these packages, even after trying mvn clean install. How can I solve this? Thank you!
In DensePredictClient.java, the project can't find these modules:
import org.tensorflow.framework.DataType;
import org.tensorflow.framework.TensorProto;
import org.tensorflow.framework.TensorShapeProto;
I have included tensorflow in pom.xml (groupId org.tensorflow, artifactId tensorflow, version 1.3.0).
It also can't find these:
import tensorflow.serving.Model;
import tensorflow.serving.Predict;
import tensorflow.serving.PredictionServiceGrpc;
Hello, I could not find the code for the online_train example at the link above. Could you share the approach for online training with TensorFlow? Thanks.
I use the command python manage.py runserver 0.0.0.0:8000 to start the HTTP service, but how can I use a REST client to predict on my own data? In other words, when I post an HTTP request, what should the Content-Type (Content-Type: application/x-www-form-urlencoded?) and the HTTP body (cancer_features="10,10,10,8,6,1,8,9,1;6,2,1,1,1,1,7,1,1"?) of the request be? Thank you for your guidance.
Does this project support "Python gRPC server"? I opened the link on the main page but got a 404 error.
After fixing the issue in #15, I hit a more complicated issue in assert_is_compatible_with:
Traceback (most recent call last):
File "./cancer_classifier.py", line 241, in <module>
tf.app.run()
File "/Library/Python/2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "./cancer_classifier.py", line 174, in main
concated = tf.concat(1, [indices, sparse_labels])
File "/Library/Python/2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1032, in concat
).assert_is_compatible_with(tensor_shape.scalar())
File "/Library/Python/2.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 735, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 1024, 1) and () are incompatible
my tensorflow version:
python -c 'import tensorflow; print tensorflow.__version__'
1.1.0-rc0
I entered this command after the Run Training step:
$ ./dense_classifier.py --mode export
And I got this response.
Searching the code for the flag, I found no flag called "export" at line 66.
The other flag also could not export the report.
Is this a bug, or did I make a mistake?
Hello, I found a performance issue in the definition of main in tobegit3hub/tensorflow_template_application/blob/master/sparse_classifier.py: train_dataset.map is called without num_parallel_calls.
I think adding it would make your program more efficient.
The same issue also exists in validation_dataset.map, train_dataset.map and three other places.
The TensorFlow documentation supports this change.
Looking forward to your reply. Btw, I am very glad to create a PR to fix it if you are too busy.
I construct it using BoolVal: []bool{false}, with a shape whose dim size = 1, but it fails with the following error:
The second input must be a scalar, but it has shape [1]
I don't know how to solve it.
Dear Team,
I am new to the TensorFlow API. With this code I am trying to build the model using the command below:
./dense_classifier.py --batch_size 1024 --epoch_number 1000 --steps_to_validate 10 --optimizer adagrad --model dnn --dnn_struct "128 32 8"
With the old code I was able to generate the model successfully, but after the code changes I am no longer able to.
As far as I can see, there is no export_model method in the code, which may be why the model is not being generated; please correct me if I am wrong.
Please also let me know if any other step is required to generate the model folder.
Eagerly waiting for your reply. Thank you very much for providing such a good project for practice.
Thanking you
I ran the sparse_classifier.py program and got this error in the export_model() function.
INFO:tensorflow:./sparse_model/00000001-tmp/export is not in all_model_checkpoint_paths. Manually adding it.
*** Error in `python': double free or corruption (!prev): 0x0000000000825660 ***
I want to compute the AUC over the entire validation set, but setting validation_batch to the size of the validation set is too slow. So I tried to export the model and predict over the entire validation set, and hit this problem.
Any suggestions?
I tried to train the CNN using the command given in README.md:
./dense_classifier.py --train_file ./data/lung/fa7a21165ae152b13def786e6afc3edf.dcm.csv.tfrecords --validate_file ./data/lung/fa7a21165ae152b13def786e6afc3edf.dcm.csv.tfrecords --feature_size 262144 --label_size 2 --batch_size 2 --validate_batch_size 2 --epoch_number -1 --model cnn
I get the following error:
File "/home/root1/.virtualenv/tensorflow_template_application/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key conv0/bias not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
Caused by op u'save/RestoreV2', defined at:
File "./dense_classifier.py", line 580, in <module>
main()
File "./dense_classifier.py", line 438, in main
saver = tf.train.Saver()
NotFoundError (see above for traceback): Key conv0/bias not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
Hi. I have a dataset in libsvm format where feaid is uint64. Can you give me an example of how to feed this data to a TensorFlow model, e.g. sparse logistic regression? Thanks.
label feaid:value feaid:value ...
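As a starting point, the line format above can be parsed with plain Python before any TensorFlow code is involved. The sketch below is a minimal parser; the helper names parse_libsvm_line and hash_id are hypothetical. Because TensorFlow sparse indices are int64, ids above 2^63 - 1 would not fit directly, so the modulo bucketing shown here is one assumed workaround and not something the project prescribes.

```python
def parse_libsvm_line(line):
    """Split 'label id:value id:value ...' into (label, ids, values)."""
    parts = line.split()
    label = float(parts[0])
    ids, values = [], []
    for item in parts[1:]:
        fea_id, value = item.split(":", 1)
        # Python ints have arbitrary precision, so uint64 ids are kept exactly.
        ids.append(int(fea_id))
        values.append(float(value))
    return label, ids, values


def hash_id(fea_id, num_buckets):
    """Map a large id into [0, num_buckets) (assumed bucketing step)."""
    return fea_id % num_buckets
```

With ids reduced to a bounded range, they can be written to TFRecords as int64 features alongside the values.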
I want to output the accuracy on all of the training and test data, not just on one batch_size batch. But when I change "batch_xtrain" to "xtrain", I get a shape mismatch error. Why? What is the right way to do this?
train_sigmoid = train_accuracy_logits
train_correct_prediction = tf.equal(tf.argmax(train_sigmoid, 1), tf.argmax(batch_ytrain, 1))
train_accuracy = tf.reduce_mean(tf.cast(train_correct_prediction, tf.float32))
ValueError: Shape must be rank 2 but is rank 1 for 'layer1_1/MatMul' (op: 'MatMul') with input shapes: [440], [440,500].
When I run "./predict_client.py ... ..." or "cloudml models predict ... ...", this error appears: grpc.framework.interfaces.face.face.ExpirationError: ExpirationError(code=StatusCode.DEADLINE_EXCEEDED, details="Deadline Exceeded")
Hello! I've found a performance issue in your project: batch() should be called before map(), which could make your program more efficient. The TensorFlow documentation supports this.
Detailed description is listed below:
.batch(FLAGS.train_batch_size) should be called before .map(parse_tfrecords_function).
.batch(FLAGS.validation_batch_size) should be called before .map(parse_tfrecords_function).
.batch(FLAGS.train_batch_size) should be called before .map(parse_tfrecords_function).
.batch(FLAGS.train_batch_size) should be called before .map(parse_csv_function).
.batch(FLAGS.validation_batch_size) should be called before .map(parse_tfrecords_function).
.batch(FLAGS.validation_batch_size) should be called before .map(parse_csv_function).
Besides, you need to check whether the function called in map() (e.g., parse_csv_function) is affected, so that the changed code works properly. For example, if parse_csv_function needs data with shape (x, y, z) as its input before the fix, it would require data with shape (batch_size, x, y, z) afterwards.
Looking forward to your reply. Btw, I am very glad to create a PR to fix it if you are too busy.
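The shape change described above can be illustrated without TensorFlow. The sketch below uses plain Python lists in place of tf.data: once batching happens first, the parse function receives a whole batch of lines and has to return batched outputs. All function names are hypothetical, and the label-first column layout is an assumption made only for this example.

```python
def parse_csv_record(line):
    """Per-element parser: one CSV line -> (label, features)."""
    fields = [float(x) for x in line.split(",")]
    # Assumes the label is the first column (illustrative layout only).
    return fields[0], fields[1:]


def parse_csv_batch(lines):
    """Batched parser: a list of CSV lines -> (labels, feature_rows)."""
    labels, features = [], []
    for line in lines:
        label, row = parse_csv_record(line)
        labels.append(label)
        features.append(row)
    return labels, features


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


records = ["1,0.5,0.2", "0,0.1,0.9", "1,0.3,0.3"]
# "batch, then map": the parser is called once per batch, not once per record.
batch_then_map = [parse_csv_batch(chunk) for chunk in batched(records, 2)]
```

In tf.data the batched parser would operate on a vector of strings in one call, which is where the efficiency gain claimed in the issue would come from.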
Hi tobe,
# Read TFRecords file for validation
validate_filename_queue = tf.train.string_input_producer(
tf.train.match_filenames_once("data/cancer_test.csv.tfrecords"),
num_epochs=epoch_number)
validate_label, validate_features = read_and_decode(validate_filename_queue)
validate_batch_labels, validate_batch_features = tf.train.shuffle_batch(
[validate_label, validate_features],
batch_size=validate_batch_size,
num_threads=thread_number,
capacity=capacity,
min_after_dequeue=min_after_dequeue)
In the code, the validation batches are also generated by a QueueRunner. So the coordinator will manage these validation batch threads together with the training QueueRunner, right?
But what if I have different batch_size settings for the training and validation datasets? Their total sizes are also different (most of the time). How should the coordinator behave then?
I also found that you set num_epochs to the same value in the training and validation queue runners. Why is that?
I think the two num_epochs values should not be equal?
flags.DEFINE_integer('epoch_number', None, 'Number of epochs to run trainer.')
filename_queue = tf.train.string_input_producer(
tf.train.match_filenames_once("data/cancer_train.csv.tfrecords"),
num_epochs=epoch_number)
validate_filename_queue = tf.train.string_input_producer(
tf.train.match_filenames_once("data/cancer_test.csv.tfrecords"),
num_epochs=epoch_number)
Thanks!
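One way to reason about the question is to count how many batches each queue can supply before it is exhausted, since a dequeue from an exhausted queue raises tf.errors.OutOfRangeError. The arithmetic sketch below assumes that only full batches are produced and that the training loop draws one validation batch every steps_to_validate training steps; both are assumptions about the script, and the function names are hypothetical.

```python
def batches_available(num_examples, num_epochs, batch_size):
    """Number of full batches a queue yields over num_epochs passes."""
    return (num_examples * num_epochs) // batch_size


def first_queue_to_run_out(train_batches, validate_batches, steps_to_validate):
    """Return which queue is exhausted first under the assumed loop."""
    # One validation batch is consumed every steps_to_validate train steps.
    validate_batches_needed = train_batches // steps_to_validate
    if validate_batches_needed > validate_batches:
        return "validate"
    return "train"
```

For example, with illustrative numbers, 1000 training examples at batch size 100 and 200 validation examples at batch size 50, both over 10 epochs, give 100 and 40 batches; validating every 10 steps needs only 10 validation batches, so the training queue ends first.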
I read in an image with dtype float32 and shape [95,92,1]; its value is a 3-dimensional [][][]float32 array.
Now I want to call TensorFlow Serving over gRPC:
request := &pb.PredictRequest{
    ModelSpec: &pb.ModelSpec{
        Name:          "mnist",
        SignatureName: "predict_images",
        Version: &google_protobuf.Int64Value{
            Value: int64(1),
        },
    },
    Inputs: map[string]*tf_core_framework.TensorProto{
        "images": &tf_core_framework.TensorProto{
            Dtype: tf_core_framework.DataType_DT_FLOAT,
            TensorShape: &tf_core_framework.TensorShapeProto{
                Dim: []*tf_core_framework.TensorShapeProto_Dim{
                    &tf_core_framework.TensorShapeProto_Dim{
                        Size: tensor.Shape()[0],
                    },
                    &tf_core_framework.TensorShapeProto_Dim{
                        Size: tensor.Shape()[1],
                    },
                    &tf_core_framework.TensorShapeProto_Dim{
                        Size: tensor.Shape()[2],
                    },
                },
            },
            FloatVal: tensor.Value().([]float32),
        },
    },
}
At FloatVal: tensor.Value().([]float32), tensor.Value() is of type [][][]float32. How do I convert it?
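The float_val field of a TensorProto holds the values flattened in row-major order, so the nested array has to be flattened into one flat sequence with the last dimension varying fastest. The sketch below shows that flattening in Python for illustration; the helper name flatten_3d is hypothetical, and in Go the same result would come from three nested loops appending to a []float32.

```python
def flatten_3d(tensor):
    """Flatten a nested [d0][d1][d2] list in row-major order.

    Returns (shape, flat_values).
    """
    shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    # The innermost index varies fastest, which is row-major order.
    flat = [v for plane in tensor for row in plane for v in row]
    # Catches inputs whose element count does not match the shape.
    if len(flat) != shape[0] * shape[1] * shape[2]:
        raise ValueError("input is not a rectangular 3-D array")
    return shape, flat


image = [[[1.0], [2.0]], [[3.0], [4.0]], [[5.0], [6.0]]]
shape, flat = flatten_3d(image)
```

The shape tuple maps onto the three Dim entries in the request above, and the flat list maps onto FloatVal.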
When input_unit is a sparse feature of size 20000000 and the network is 1286464*32, training the model fails with "Tensor slice is too large to serialize".
Did you try this repo in a production system with large data?
I would appreciate some help solving this. Thanks.
I test-ran deep_recommend_system-master/cancer_classifier.py, but it reports that 'module' object has no attribute 'streaming_auc' at line 203. I don't know why, since the tensorflow_examples run smoothly on my PC.
Thanks for your answers.
When running dense_classifier or sparse_classifier, I get the error below.
What is the problem? Is it the Python version?
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 1684, in sparse_softmax_cross_entropy_with_logits
labels, logits)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 1533, in _ensure_xent_args
"named arguments (labels=..., logits=..., ...)" % name)
ValueError: Only call `sparse_softmax_cross_entropy_with_logits` with named arguments (labels=..., logits=..., ...)
Hi there, I downloaded deep_recommend_system for study, but failed to run cancer_classifier.py. The following is what I got. Could anybody advise what the problem is? Thanks!
Use the model: wide_and_deep
Use the optimizer: adagrad
Use the model: wide_and_deep
Traceback (most recent call last):
File "cancer_classifier.py", line 203, in <module>
_, auc_op = tf.contrib.metrics.streaming_auc(validate_softmax,
AttributeError: 'module' object has no attribute 'streaming_auc'
Hi, in the file sparse_classifier.py, the feature size is set to 124 for the a8a dataset.
flags.DEFINE_integer("feature_size", 124, "Number of feature size")
However, I think the feature size should be 123 according to this website.
Am I right?
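One possible explanation, offered only as a guess about the author's intent, is that libsvm feature ids start at 1. If the ids are used directly as column indices without shifting them to start at 0, a dense vector needs max_id + 1 columns, which gives 124 for ids running up to 123. The helper name dense_width below is hypothetical.

```python
def dense_width(feature_ids, shift_to_zero_based):
    """Columns needed to index every id in a dense feature vector."""
    max_id = max(feature_ids)
    # Unshifted 1-based ids leave column 0 unused, so one extra column is needed.
    return max_id if shift_to_zero_based else max_id + 1
```

Under this reading both numbers are consistent: 123 distinct features, and 124 columns when column 0 is left unused.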
run ./predict_client.py
Error: No module named tensorflow_serving.apis
In README.md, shouldn't inference_server.py and inference_client.py be predict_server.py and predict_client.py?
Traceback (most recent call last):
File "./dense_classifier.py", line 420, in <module>
main()
File "./dense_classifier.py", line 349, in main
inference_data = np.genfromtxt(inference_test_file_name, delimiter=",")
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 1451, in genfromtxt
fhd = iter(np.lib._datasource.open(fname, 'rbU'))
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/_datasource.py", line 151, in open
return ds.open(path, mode)
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/_datasource.py", line 501, in open
raise IOError("%s not found." % path)
IOError: ./data/cancer_test.csv not found.
The path should be './data/cancer/cancer_test.csv', not './data/cancer_test.csv'.
Cannot find file: ./tensorflow_model_server
When is_train is set, batch normalization is added, but at test time it is ignored. Based on the documentation at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py, the logic should follow the description below:
is_training: Whether or not the layer is in training mode. In training mode it would accumulate the statistics of the moments into moving_mean and moving_variance using an exponential moving average with the given decay. When it is not in training mode then it would use the values of the moving_mean and the moving_variance.
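The behavior quoted above can be sketched for a single feature in plain Python. This is an illustration of the two modes only, not the library's implementation: it omits the learned scale and offset, and the function names, default decay and epsilon are assumptions chosen for the example.

```python
def update_moving_average(moving, batch_value, decay):
    """Exponential moving average update with the given decay."""
    return decay * moving + (1.0 - decay) * batch_value


def batch_norm_1d(xs, moving_mean, moving_var, is_training,
                  decay=0.999, eps=1e-3):
    """Normalize one feature; returns (normed, moving_mean, moving_var)."""
    if is_training:
        # Training mode: use batch statistics and accumulate moving ones.
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        moving_mean = update_moving_average(moving_mean, mean, decay)
        moving_var = update_moving_average(moving_var, var, decay)
    else:
        # Inference mode: use the accumulated moving statistics unchanged.
        mean, var = moving_mean, moving_var
    normed = [(x - mean) / (var + eps) ** 0.5 for x in xs]
    return normed, moving_mean, moving_var
```

The point relevant to the issue is that the layer stays in the graph at test time and only the source of the statistics changes.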