Comments (10)
OK, good. In that case there is no need to force it onto the CPU; that was only a debugging suggestion. If you continue to have problems, try explicitly setting batch_size to something smaller than the default.
from rude-carnie.
Hmmm, I haven't heard of this problem before. Just from eyeballing it, your command-line invocation looks okay to me. Since it found no images, you might want to double-check your media directory. I suggest opening the age_train.txt file and, from the aligned/ directory, making sure that each file reference resolves to a valid PNG or JPG:
dpressel@dpressel:~/dev/work/3csi-rd/dpressel/sh$ ls /data/xdata/age-gender/aligned
100003415@N08 101295462@N02 10280355@N07 10743505@N04 113528649@N08 113846810@N03 115021413@N07 18142498@N06 28754132@N06 34622581@N02 45666944@N00 62282816@N03 7380406@N04 8072696@N04
10001312@N04 10129575@N03 10328235@N07 10747684@N00 113564294@N02 113984426@N05 115033594@N04 19393853@N00 29671106@N00 35953373@N04 45668969@N05 62501130@N02 7398884@N04 8073752@N03
100014826@N03 10148140@N07 10354155@N05 10792106@N03 113603634@N04 114041079@N03 115046815@N06 20245009@N06 30601258@N03 37303189@N08 46113291@N03 63153065@N07 7411850@N04 8147776@N04
10008401@N05 101515718@N03 10391859@N05 10897942@N03 113605644@N05 114776843@N02 115111634@N07 20254529@N04 30872264@N00 37404707@N08 48135726@N02 63164355@N03 7464014@N04 8187011@N06
100346410@N05 101532586@N07 10406201@N05 11008464@N06 113650443@N02 114841417@N06 115126086@N07 20272804@N04 31040257@N06 37920461@N06 48647239@N03 64504106@N06 7610270@N03 8200563@N04
10044155@N06 101560979@N02 10440927@N07 110095806@N05 113705978@N06 114918674@N02 115152228@N06 20316685@N02 31183835@N08 39347094@N04 50458575@N08 64574820@N06 7636528@N03 82152000@N00
10058630@N06 101591466@N03 10466455@N02 111700049@N08 113707938@N08 114970707@N08 115153697@N06 20487016@N02 31442459@N00 39411334@N02 50739822@N00 66870968@N06 7648211@N03 8410632@N03
10062073@N07 101636677@N08 104937236@N08 112114373@N07 113715068@N06 11497677@N05 115178119@N08 20632896@N03 31885615@N05 39615950@N00 54030085@N03 68094148@N04 7651777@N03 86629393@N00
10069023@N00 10171175@N06 10543088@N02 112599447@N03 113728563@N05 114978798@N03 115321157@N03 20696814@N02 33592376@N08 39957446@N00 54263201@N07 68666269@N03 7890646@N03 9017386@N06
101071073@N04 10241064@N08 10580682@N07 113417044@N07 113771355@N07 114987449@N03 15772432@N00 22815721@N06 33627988@N04 40410287@N02 60251856@N05 68825596@N05 7895525@N04 98075207@N04
10113099@N03 102455446@N08 10611527@N03 113445054@N07 113804525@N05 115002895@N05 16166376@N00 26112397@N05 34158582@N02 43999398@N00 60616055@N03 7153718@N04 8007224@N07 9855553@N08
10123180@N04 10255165@N05 10693681@N00 113525713@N07 113830953@N04 115019194@N04 16886060@N03 28468602@N06 34350525@N03 44824649@N05 61777259@N08 7285955@N06 8034587@N06 9965452@N08
dpressel@dpressel:~/dev/work/3csi-rd/dpressel/sh$ vi /home/dpressel/dev/work/AgeGenderDeepLearning/Folds/train_val_txt_files_per_fold/test_fold_is_0/age_train.txt
dpressel@dpressel:~/dev/work/3csi-rd/dpressel/sh$ head /home/dpressel/dev/work/AgeGenderDeepLearning/Folds/train_val_txt_files_per_fold/test_fold_is_0/age_train.txt
10069023@N00/landmark_aligned_face.1924.10335948845_0d22490234_o.jpg 5
7464014@N04/landmark_aligned_face.961.10109081873_8060c8b0a5_o.jpg 4
28754132@N06/landmark_aligned_face.608.11546494564_2ec3e89568_o.jpg 2
10543088@N02/landmark_aligned_face.662.10044788254_2091a56ec3_o.jpg 3
66870968@N06/landmark_aligned_face.1227.11326221064_32114bf26a_o.jpg 4
7464014@N04/landmark_aligned_face.963.10142314254_8e96a97459_o.jpg 4
113525713@N07/landmark_aligned_face.1016.11784555666_8d43b6c493_o.jpg 3
30872264@N00/landmark_aligned_face.603.9575166089_f5f9cecc8c_o.jpg 5
10897942@N03/landmark_aligned_face.633.10372582914_382144ffe8_o.jpg 3
10792106@N03/landmark_aligned_face.522.11039121906_b047c90cc1_o.jpg 3
Just to double-check that nothing broke recently, I reran it (I am using TF 1.0). Here is the command line that I ran, and the outcome:
python2.7 preproc.py --fold_dir /home/dpressel/dev/work/AgeGenderDeepLearning/Folds/train_val_txt_files_per_fold/test_fold_is_0 --train_list age_train.txt --valid_list age_val.txt --data_dir /data/xdata/age-gender/aligned --output_dir /home/dpressel/dev/work/AgeGenderDeepLearning/Folds/tf/age_test_fold_is_0
It worked OK for me.
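The file check suggested above can also be scripted instead of done by eye. The following is a minimal sketch, not part of rude-carnie: the helper name `find_unresolved` is hypothetical, and it assumes each line of the list file has the form `<relative path> <label>`, as in the `head` output above.

```python
import os

# Extensions treated as valid images (the thread mentions PNG and JPG).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def find_unresolved(list_path, data_dir):
    """Return the entries of list_path that do not resolve to an image file under data_dir.

    Hypothetical helper: assumes each non-blank line is '<relative path> <label>'.
    """
    unresolved = []
    with open(list_path) as f:
        for line in f:
            # Split the trailing label off the relative path.
            parts = line.strip().rsplit(None, 1)
            if not parts:
                continue  # skip blank lines
            rel_path = parts[0]
            full_path = os.path.join(data_dir, rel_path)
            is_image = rel_path.lower().endswith(IMAGE_EXTENSIONS)
            if not (is_image and os.path.isfile(full_path)):
                unresolved.append(rel_path)
    return unresolved
```

With the paths from this thread, `find_unresolved(".../test_fold_is_0/age_train.txt", "/data/xdata/age-gender/aligned")` returning an empty list would suggest that the list file and the --data_dir agree; any entries it returns are references that preproc.py would presumably fail to find.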
from rude-carnie.
Hi, thanks for the reply. Sorry, it was a mistake on my part; it is now working. Thanks again.
from rude-carnie.
Although I now have the following issue:
`E tensorflow/core/common_runtime/executor.cc:390] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: OptimizeLoss/zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [5,5,96,256] values: [[[0 0 0]]]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "train.py", line 185, in <module>
tf.app.run()
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "train.py", line 124, in main
tf.global_variables_initializer().run(session=sess)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1449, in run
_run_using_default_session(self, feed_dict, self.graph, session)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3668, in _run_using_default_session
session.run(operation, feed_dict)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: OptimizeLoss/zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [5,5,96,256] values: [[[0 0 0]]]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'OptimizeLoss/zeros_2', defined at:
File "train.py", line 185, in <module>
tf.app.run()
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "train.py", line 117, in main
train_op = optimizer(FLAGS.optim, FLAGS.eta, total_loss)
File "train.py", line 81, in optimizer
return tf.contrib.layers.optimize_loss(loss_fn, global_step, eta, optz, clip_gradients=4., learning_rate_decay_fn=lr_decay_fn)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/optimizers.py", line 275, in optimize_loss
name="train")
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 393, in apply_gradients
self._create_slots(var_list)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/training/momentum.py", line 51, in _create_slots
self._zeros_slot(v, "momentum", self._name)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 593, in _zeros_slot
named_slots[var] = slot_creator.create_zeros_slot(var, op_name)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 106, in create_zeros_slot
val = array_ops.zeros(primary.get_shape().as_list(), dtype=dtype)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1437, in zeros
output = constant(zero, shape=shape, dtype=dtype, name=name)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 169, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: OptimizeLoss/zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [5,5,96,256] values: [[[0 0 0]]]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
`
from rude-carnie.
I did get two out-of-memory warnings earlier, but the first two commands completed successfully. Do you see an issue with running this on a GTX 750 Ti?
from rude-carnie.
I have not seen this problem either. It works on mine with this command line:
python2.7 train.py --train_dir /home/dpressel/dev/work/AgeGenderDeepLearning/Folds/tf/age_test_fold_is_0
I guess it could be the card as well; it looks like you have 1GB of RAM (I have 8GB on my lappy). You could probably test this by hacking it to run with tf.device('/cpu:0') explicitly and seeing whether you get the same error.
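As a rough sanity check on the memory theory, the size of the tensor named in the error can be worked out from its shape. This is only a sketch: it assumes 4 bytes per element (the error reports DT_FLOAT), and `tensor_bytes` is a hypothetical helper, not a TensorFlow function.

```python
from functools import reduce
from operator import mul

BYTES_PER_FLOAT32 = 4  # DT_FLOAT is a 32-bit float


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the storage needed for a dense tensor of the given shape."""
    return reduce(mul, shape, 1) * bytes_per_element


# Shape of OptimizeLoss/zeros_2 as reported in the traceback.
slot_shape = [5, 5, 96, 256]
size = tensor_bytes(slot_shape)
print("%d bytes (%.2f MiB)" % (size, size / (1024.0 * 1024.0)))
# prints "2457600 bytes (2.34 MiB)"
```

The momentum slot itself is small, so if memory is the cause, the more likely explanation is that the card was already nearly full when the slot was allocated, not that this particular tensor is too large for it.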
from rude-carnie.
My specs are:
Intel Core 2 Quad
8 GB RAM
500 GB HDD
2 GB GDDR5 GTX 750 Ti
from rude-carnie.
You can try limiting the batch size or the hack I suggested above. The default batch size is 128; pass in --batch_size (e.g. 16), which should reduce the GPU memory it's using, if you think that is the problem.
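To give a feel for why a smaller batch helps, here is a back-of-the-envelope sketch of how the input batch alone scales with batch size. The 227x227x3 float32 image shape is an assumption made for illustration and is not stated in this thread; `input_batch_bytes` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4

# Assumed input image shape (height, width, channels); not taken from this thread.
ASSUMED_IMAGE_SHAPE = (227, 227, 3)


def input_batch_bytes(batch_size, image_shape=ASSUMED_IMAGE_SHAPE):
    """Return the bytes needed to hold one float32 input batch."""
    height, width, channels = image_shape
    return batch_size * height * width * channels * BYTES_PER_FLOAT32


for batch_size in (128, 16):
    mib = input_batch_bytes(batch_size) / (1024.0 * 1024.0)
    print("batch_size=%d: %.1f MiB for the input batch" % (batch_size, mib))
```

The intermediate activations of each layer also grow linearly with the batch size, while the weights and the optimizer slots do not, so going from 128 to 16 shrinks the batch-dependent part of the footprint by a factor of 8 but leaves the rest unchanged.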
from rude-carnie.
Hi, thanks. I restarted the PC and ran everything again. I no longer got the memory errors on the initial data scripts, and this time it is training. I did get a warning about memory, but it only mentioned performance gains with more memory. So far it is training without modification; if it raises errors again, I will try your suggestions.
from rude-carnie.
Are we good to close this?
from rude-carnie.