
Comments (14)

b2renger commented on June 23, 2024

@cvalenzuela Victory \O/ !


Training complete. For evaluation:
    `python evaluate.py --checkpoint checkpoints/ ...`
Converting model to ml5js
Writing manifest to models/manifest.json
Done! Checkpoint saved. Visit https://ml5js.org/docs/StyleTransfer for more information

Now I'll test the model and start on documenting the docker process.

from training-styletransfer.

cvalenzuela commented on June 23, 2024

It seems to be running out of memory. Are you running any other scripts? And which version of CUDA are you using?

b2renger commented on June 23, 2024

OK, everything should be the latest version, as I am starting from a fresh install. (Sorry, I can't check right now; I'll get back to you with the exact versions of everything when I have access to the machine.)

So I guess CUDA 9.2 on Windows.

The Ubuntu drivers for the GTX 1080 are 396.xx, and the rest should be:
cudatoolkit: 9.0-h13b8566_0
cudnn: 7.1.2-cuda9.0_0
cupti: 9.0.176-0

Nothing else is running.

Are there specific versions of cudatoolkit, cudnn, etc. that I should be aware of?

BTW, I noticed that a requirement is missing to run everything: moviepy. It is not available via a plain 'conda install moviepy'; users should run 'conda install -c conda-forge moviepy' instead.

Installing everything is quite tricky. Maybe we should focus on the Docker documentation, especially if we can run on the GPU using Docker? What do you think?

cvalenzuela commented on June 23, 2024

Your specs look right. All the dependencies, including moviepy, are installed here: https://github.com/ml5js/training-styletransfer/blob/master/Dockerfile#L20

Can you try using the Docker image instead? Either build it from the Dockerfile or pull this image from Docker Hub: cvalenzuelab/styletransfer

b2renger commented on June 23, 2024

Thanks, @cvalenzuela. I'll look into it.

cvalenzuela commented on June 23, 2024

lmk how it goes!

b2renger commented on June 23, 2024

@cvalenzuela

I may need some help to actually run the Docker container; if I can get through this, I may be able to write some docs about it. I've been looking at the Docker documentation, and here's what I think I should do:

First, I need to pull the Docker image:
docker pull cvalenzuelab/styletransfer:latest
or I can build it locally:
docker build -t cvalenzuelab/styletransfer -f Dockerfile .

Then I need to run it, and if I am not mistaken it should go like this:
docker run -it -p 8888:8888 -p 6006:6006 -v /Users/:/root/sharedfolder cvalenzuelab/styletransfer bash

After that, I should navigate to the folder where I've cloned this repo and run the scripts (from step 3 of the README). Am I right?

cvalenzuela commented on June 23, 2024

Yes, that's right! It would probably be a good idea to add those instructions as well.

b2renger commented on June 23, 2024

OK, so for future reference: running Docker on Linux with access to the home folder is a bit trickier than the command above (because of users and permissions). It looks like this instead:
sudo docker run -e USER=$USER -e USERID=$UID -v $PWD:$PWD -w=$PWD -it -p 8888:8888 -p 6006:6006 -v ~:/root/sharedfolder cvalenzuelab/styletransfer bash

to be continued ...
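
For anyone who wants to script this, here is a minimal sketch that assembles the same argument list as the Linux command above without running it. The helper name `build_docker_run` and its parameters are hypothetical, made up for illustration. The flags themselves (`-e USER`, `-e USERID`, `-v`, `-w`, `-it`, `-p`) are copied from the command in this comment, and `os.getuid()` stands in for the shell's `$UID`.

```python
import os
import shlex


def build_docker_run(image, runtime="docker", workdir=None, ports=(8888, 6006)):
    """Return the argument list for the Linux `docker run` call used in this thread.

    Hypothetical helper: it only builds the list, it does not start a container.
    """
    workdir = workdir or os.getcwd()
    cmd = [
        "sudo", runtime, "run",
        "-e", "USER=" + os.environ.get("USER", ""),   # same as -e USER=$USER
        "-e", "USERID=" + str(os.getuid()),           # same as -e USERID=$UID
        "-v", workdir + ":" + workdir,                # mount the current folder at the same path
        "-w", workdir,                                # and start the shell inside it
        "-it",
    ]
    for port in ports:
        cmd += ["-p", "{0}:{0}".format(port)]         # publish each port on the same number
    cmd += ["-v", os.path.expanduser("~") + ":/root/sharedfolder"]
    cmd += [image, "bash"]
    return cmd


if __name__ == "__main__":
    args = build_docker_run("cvalenzuelab/styletransfer")
    # Print a copy-pasteable command line instead of executing it.
    print(" ".join(shlex.quote(a) for a in args))
```

Passing `runtime="nvidia-docker"` would produce the GPU variant of the same command.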

b2renger commented on June 23, 2024

@cvalenzuela
On Ubuntu, inside the container (started with the command above) and after running the setup.py script, I get this error when I try to run run.sh or call style.py directly with python:

root@287cf95c435c:/home/lasseter/git/training-styletransfer# bash run.sh
Traceback (most recent call last):
  File "style.py", line 6, in <module>
    from optimize import optimize
  File "src/optimize.py", line 3, in <module>
    import vgg, pdb, time
  File "src/vgg.py", line 3, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 72, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

cvalenzuela commented on June 23, 2024

There seems to be a problem with your CUDA runtime.
Are you using nvidia-docker?
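
A quick way to confirm this from inside the container is to ask the dynamic loader for the same library the traceback names. This is only a sketch: `can_load_shared_library` is a hypothetical helper name, and it checks only whether `libcuda.so.1` can be opened, not whether the GPU itself is usable.

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    # libcuda.so.1 is the file named in the ImportError above.
    print("libcuda.so.1 loadable:", can_load_shared_library("libcuda.so.1"))
```

If this prints False in a container started with plain docker, the NVIDIA driver library is likely not being exposed to the container.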

b2renger commented on June 23, 2024

No, I'm not!

So, @cvalenzuela, I installed nvidia-docker following these instructions: https://github.com/NVIDIA/nvidia-docker

The nvidia-docker --version command returns:
Docker version 18.06.1-ce, build e68fc7a

I ran it with:
sudo nvidia-docker run -e USER=$USER -e USERID=$UID -v $PWD:$PWD -w=$PWD -it -p 8888:8888 -p 6006:6006 -v ~/home/lasseter/:/home cvalenzuelab/styletransfer bash

This ends with a ResourceExhaustedError again. Here is the full log:

ml5.js Style Transfer Training!
Note: This traning will take a couple of hours.
Training is starting!...
Train set has been trimmed slightly..
(1, 1400, 1200, 3)
UID: 32
Traceback (most recent call last):
File "style.py", line 179, in <module>
main()
File "style.py", line 156, in main
for preds, losses, i, epoch in optimize(*args, **kwargs):
File "src/optimize.py", line 114, in optimize
train_step.run(feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2042, in run
_run_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4490, in _run_using_default_session
session.run(operation, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[20,4096,256]
[[Node: gradients/transpose_2_grad/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients/MatMul_2_grad/tuple/control_dependency, gradients/transpose_grad/InvertPermutation)]]

Caused by op u'gradients/transpose_2_grad/transpose', defined at:
File "style.py", line 179, in <module>
main()
File "style.py", line 156, in main
for preds, losses, i, epoch in optimize(*args, **kwargs):
File "src/optimize.py", line 91, in optimize
train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 343, in minimize
grad_loss=grad_loss)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 414, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_grad.py", line 505, in _TransposeGrad
return [array_ops.transpose(grad, array_ops.invert_permutation(p)), None]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1336, in transpose
ret = gen_array_ops.transpose(a, perm, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 5694, in transpose
"Transpose", x=x, perm=perm, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

...which was originally created as op u'transpose_2', defined at:
File "style.py", line 179, in <module>
main()
[elided 0 identical lines from previous traceback]
File "style.py", line 156, in main
for preds, losses, i, epoch in optimize(*args, **kwargs):
File "src/optimize.py", line 74, in optimize
feats_T = tf.transpose(feats, perm=[0,2,1])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1336, in transpose
ret = gen_array_ops.transpose(a, perm, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 5694, in transpose
"Transpose", x=x, perm=perm, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[20,4096,256]
[[Node: gradients/transpose_2_grad/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients/MatMul_2_grad/tuple/control_dependency, gradients/transpose_grad/InvertPermutation)]]

b2renger commented on June 23, 2024

FYI, I tried reducing the batch size to 16, and it seems to be working for now... I'll keep you posted in a few hours.
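
As a rough back-of-the-envelope check, the tensor named in the OOM message can be sized by hand. The sketch below assumes the leading dimension of shape[20,4096,256] is the batch size (an assumption, not confirmed in the log) and uses 4 bytes per element, since DT_FLOAT is 32-bit. `tensor_bytes` is a hypothetical helper name.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element for 32-bit floats."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


MIB = 1024 * 1024

# Shape taken from the error message; the first dimension is assumed to be the batch size.
size_batch_20 = tensor_bytes((20, 4096, 256))
size_batch_16 = tensor_bytes((16, 4096, 256))

print(size_batch_20 / MIB)  # 80.0
print(size_batch_16 / MIB)  # 64.0
```

A single 80 MiB tensor is small next to the GPU's memory, so the failure presumably comes from the sum of many activations and gradients of this kind. Under that assumption, each of them shrinks in proportion to the batch size, which would explain why going from 20 to 16 is enough.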

cvalenzuela commented on June 23, 2024

great! glad to hear
