Comments (10)

dpressel commented on August 19, 2024

OK, good. In that case there is no need to hack it onto the CPU; that was just a suggestion for debugging. If you continue to have problems, explicitly setting batch_size smaller than the default might be in order.

from rude-carnie.

dpressel commented on August 19, 2024

Hmmm, I haven't heard of this problem before. Just from eyeballing it, your command-line invocation looks okay to me. Since it found no images, you might want to double-check your media directory. I suggest opening the age_train.txt file and making sure that each file reference resolves, relative to the aligned/ directory, to a valid PNG or JPG:

dpressel@dpressel:~/dev/work/3csi-rd/dpressel/sh$ ls /data/xdata/age-gender/aligned
100003415@N08  101295462@N02  10280355@N07   10743505@N04   113528649@N08  113846810@N03  115021413@N07  18142498@N06  28754132@N06  34622581@N02  45666944@N00  62282816@N03  7380406@N04  8072696@N04
10001312@N04   10129575@N03   10328235@N07   10747684@N00   113564294@N02  113984426@N05  115033594@N04  19393853@N00  29671106@N00  35953373@N04  45668969@N05  62501130@N02  7398884@N04  8073752@N03
100014826@N03  10148140@N07   10354155@N05   10792106@N03   113603634@N04  114041079@N03  115046815@N06  20245009@N06  30601258@N03  37303189@N08  46113291@N03  63153065@N07  7411850@N04  8147776@N04
10008401@N05   101515718@N03  10391859@N05   10897942@N03   113605644@N05  114776843@N02  115111634@N07  20254529@N04  30872264@N00  37404707@N08  48135726@N02  63164355@N03  7464014@N04  8187011@N06
100346410@N05  101532586@N07  10406201@N05   11008464@N06   113650443@N02  114841417@N06  115126086@N07  20272804@N04  31040257@N06  37920461@N06  48647239@N03  64504106@N06  7610270@N03  8200563@N04
10044155@N06   101560979@N02  10440927@N07   110095806@N05  113705978@N06  114918674@N02  115152228@N06  20316685@N02  31183835@N08  39347094@N04  50458575@N08  64574820@N06  7636528@N03  82152000@N00
10058630@N06   101591466@N03  10466455@N02   111700049@N08  113707938@N08  114970707@N08  115153697@N06  20487016@N02  31442459@N00  39411334@N02  50739822@N00  66870968@N06  7648211@N03  8410632@N03
10062073@N07   101636677@N08  104937236@N08  112114373@N07  113715068@N06  11497677@N05   115178119@N08  20632896@N03  31885615@N05  39615950@N00  54030085@N03  68094148@N04  7651777@N03  86629393@N00
10069023@N00   10171175@N06   10543088@N02   112599447@N03  113728563@N05  114978798@N03  115321157@N03  20696814@N02  33592376@N08  39957446@N00  54263201@N07  68666269@N03  7890646@N03  9017386@N06
101071073@N04  10241064@N08   10580682@N07   113417044@N07  113771355@N07  114987449@N03  15772432@N00   22815721@N06  33627988@N04  40410287@N02  60251856@N05  68825596@N05  7895525@N04  98075207@N04
10113099@N03   102455446@N08  10611527@N03   113445054@N07  113804525@N05  115002895@N05  16166376@N00   26112397@N05  34158582@N02  43999398@N00  60616055@N03  7153718@N04   8007224@N07  9855553@N08
10123180@N04   10255165@N05   10693681@N00   113525713@N07  113830953@N04  115019194@N04  16886060@N03   28468602@N06  34350525@N03  44824649@N05  61777259@N08  7285955@N06   8034587@N06  9965452@N08
dpressel@dpressel:~/dev/work/3csi-rd/dpressel/sh$ vi /home/dpressel/dev/work/AgeGenderDeepLearning/Folds/train_val_txt_files_per_fold/test_fold_is_0/age_train.txt
dpressel@dpressel:~/dev/work/3csi-rd/dpressel/sh$ head /home/dpressel/dev/work/AgeGenderDeepLearning/Folds/train_val_txt_files_per_fold/test_fold_is_0/age_train.txt
10069023@N00/landmark_aligned_face.1924.10335948845_0d22490234_o.jpg 5
7464014@N04/landmark_aligned_face.961.10109081873_8060c8b0a5_o.jpg 4
28754132@N06/landmark_aligned_face.608.11546494564_2ec3e89568_o.jpg 2
10543088@N02/landmark_aligned_face.662.10044788254_2091a56ec3_o.jpg 3
66870968@N06/landmark_aligned_face.1227.11326221064_32114bf26a_o.jpg 4
7464014@N04/landmark_aligned_face.963.10142314254_8e96a97459_o.jpg 4
113525713@N07/landmark_aligned_face.1016.11784555666_8d43b6c493_o.jpg 3
30872264@N00/landmark_aligned_face.603.9575166089_f5f9cecc8c_o.jpg 5
10897942@N03/landmark_aligned_face.633.10372582914_382144ffe8_o.jpg 3
10792106@N03/landmark_aligned_face.522.11039121906_b047c90cc1_o.jpg 3
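
The check described above can also be scripted. The following is a minimal sketch and not part of rude-carnie: the function name `find_missing_images` is hypothetical, and it assumes every line of the list file has the form `<relative path> <label>`, as in the `head` output above. It only tests that each file exists, not that it decodes as a valid image.

```python
import os


def find_missing_images(list_file, data_dir):
    """Return the relative paths in list_file that are not files under data_dir."""
    missing = []
    with open(list_file) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # The label is the last whitespace-separated token;
            # everything before it is the relative image path.
            rel_path = line.rsplit(None, 1)[0]
            if not os.path.isfile(os.path.join(data_dir, rel_path)):
                missing.append(rel_path)
    return missing
```

Calling it with the age_train.txt path and /data/xdata/age-gender/aligned as data_dir should return an empty list when every reference resolves.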

Just to double-check that nothing broke recently, I reran it (I am using TF 1.0). Here is the CL that I ran, and its result:

python2.7 preproc.py --fold_dir /home/dpressel/dev/work/AgeGenderDeepLearning/Folds/train_val_txt_files_per_fold/test_fold_is_0 --train_list age_train.txt --valid_list age_val.txt --data_dir /data/xdata/age-gender/aligned --output_dir /home/dpressel/dev/work/AgeGenderDeepLearning/Folds/tf/age_test_fold_is_0

Worked OK for me.

AdamMiltonBarker commented on August 19, 2024

Hi, thanks for the reply. Sorry, it was a mistake on my part; it is now working. Thanks again.

AdamMiltonBarker commented on August 19, 2024

Although I now have the following issue:

`E tensorflow/core/common_runtime/executor.cc:390] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: OptimizeLoss/zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [5,5,96,256] values: [[[0 0 0]]]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "train.py", line 185, in <module>
tf.app.run()
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "train.py", line 124, in main
tf.global_variables_initializer().run(session=sess)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1449, in run
_run_using_default_session(self, feed_dict, self.graph, session)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3668, in _run_using_default_session
session.run(operation, feed_dict)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: OptimizeLoss/zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [5,5,96,256] values: [[[0 0 0]]]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op u'OptimizeLoss/zeros_2', defined at:
File "train.py", line 185, in <module>
tf.app.run()
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "train.py", line 117, in main
train_op = optimizer(FLAGS.optim, FLAGS.eta, total_loss)
File "train.py", line 81, in optimizer
return tf.contrib.layers.optimize_loss(loss_fn, global_step, eta, optz, clip_gradients=4., learning_rate_decay_fn=lr_decay_fn)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/optimizers.py", line 275, in optimize_loss
name="train")
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 393, in apply_gradients
self._create_slots(var_list)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/training/momentum.py", line 51, in _create_slots
self._zeros_slot(v, "momentum", self._name)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 593, in _zeros_slot
named_slots[var] = slot_creator.create_zeros_slot(var, op_name)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 106, in create_zeros_slot
val = array_ops.zeros(primary.get_shape().as_list(), dtype=dtype)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1437, in zeros
output = constant(zero, shape=shape, dtype=dtype, name=name)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 169, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/tinn/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()

InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: OptimizeLoss/zeros_2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [5,5,96,256] values: [[[0 0 0]]]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
`
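
For scale, the tensor named in the error can be sized by hand from its shape. This is a small sketch with a hypothetical helper name; it assumes DT_FLOAT means 32-bit floats, i.e. 4 bytes per element.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element for float32."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


size = tensor_bytes([5, 5, 96, 256])
print(size)                      # 2457600
print(size / (1024.0 * 1024.0))  # 2.34375
```

The momentum slot being created is therefore only about 2.3 MiB. If the failure is memory related, which is a commonly reported cause of this error, the card was probably close to full before this allocation rather than this one tensor being too large for it.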

AdamMiltonBarker commented on August 19, 2024

I did get two out-of-memory warnings earlier, but the first two commands completed successfully. Do you see an issue with running this on a GTX 750 Ti?

dpressel commented on August 19, 2024

I have not seen this problem either. It works on my machine with this CL:

python2.7 train.py --train_dir /home/dpressel/dev/work/AgeGenderDeepLearning/Folds/tf/age_test_fold_is_0

I guess it could be the card as well; it looks like you have 1GB of RAM (I have 8GB on my lappy). You could probably test this by hacking it to run with tf.device('/cpu:0') explicitly and seeing whether you get the same error.
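
A way to run the same test without editing train.py is to hide the GPU from the process. Note that this is a different technique from the tf.device('/cpu:0') hack: it relies on the CUDA_VISIBLE_DEVICES environment variable, which CUDA applications, TensorFlow included, honor. A minimal sketch follows; the helper name `cpu_only_env` is hypothetical.

```python
import os


def cpu_only_env(base_env=None):
    """Return a copy of the environment in which no CUDA device is visible."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""  # an empty value hides every GPU from CUDA
    return env
```

The result can be passed as the `env` argument of `subprocess.call` when launching train.py with the same --train_dir as before. If the error disappears on the CPU, GPU memory is the likely culprit.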

AdamMiltonBarker commented on August 19, 2024

My specs are:

Intel Core 2 Quad
8 GB RAM
500 GB HDD
2GB GDDR5 GTX 750 Ti

dpressel commented on August 19, 2024

You can try limiting the batch size, or the hack I suggested above. The default batch size is 128; if you think memory is the problem, pass in --batch_size (e.g. 16), which should reduce the GPU memory it's using.
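
To see why a smaller batch helps: the memory needed for a batch of inputs, and for every activation computed from it, grows linearly with the batch size. Here is a rough sketch; the 227x227 RGB float32 input size is an assumption chosen for illustration and is not taken from this thread, and the function name is hypothetical.

```python
def input_batch_bytes(batch_size, height=227, width=227, channels=3,
                      bytes_per_element=4):
    """Bytes needed to hold one batch of dense float32 images."""
    return batch_size * height * width * channels * bytes_per_element


default_batch = input_batch_bytes(128)
small_batch = input_batch_bytes(16)
print(default_batch // small_batch)  # 8
```

Going from 128 to 16 cuts these per-example buffers by a factor of 8. Weights and optimizer slots do not shrink with the batch size, so the total saving is smaller than that factor suggests.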

AdamMiltonBarker commented on August 19, 2024

Hi, thanks. I restarted the PC and ran everything again. I no longer got the memory errors on the initial data scripts, and this time it is training. I did get a warning about memory, but it just mentioned performance gains with more memory. So far it is training without modification, but if it raises errors again I will try your suggestions.

dpressel commented on August 19, 2024

Are we good to close this?
