Comments (8)

ptrblck commented on August 23, 2024

@DannyDannyDanny
If no GPU is detected on the system, you won't be able to use apex.
We should improve the error message shown when importing apex, and raise an exception if apex methods are used without a GPU.
A workaround would be to guard the apex import with if torch.cuda.is_available().
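
A minimal sketch of that guard, for illustration only. The helper names `cuda_is_available` and `import_apex_if_gpu` are hypothetical and are not part of apex or of this codebase; the sketch also assumes torch itself may be missing, so it checks for it before importing.

```python
import importlib
import importlib.util


def cuda_is_available() -> bool:
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def import_apex_if_gpu():
    """Hypothetical helper: import apex only on a CUDA machine, else return None."""
    if not cuda_is_available():
        # Skip the import entirely, so apex's import-time CUDA checks never run.
        return None
    try:
        return importlib.import_module("apex")
    except ImportError:
        # apex is simply not installed.
        return None


apex = import_apex_if_gpu()
if apex is None:
    print("apex disabled: no CUDA device or apex not installed")
```

Callers would then check `apex is not None` before using any apex feature.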

Your suggestion won't work, since torch.version.cuda returns the CUDA version (e.g. 10.0.130), while torch.__version__ returns the PyTorch version (e.g. 1.3.0.dev20190923).
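
To illustrate the difference, here is a small sketch that parses the two example strings above. `parse_cuda_version` is a hypothetical stand-in, not apex's actual function; it takes the version string as an argument so it can be run without torch installed, and it treats `None` (what torch.version.cuda holds in the traceback reported in this thread) as "no CUDA".

```python
from typing import Optional, Tuple


def parse_cuda_version(cuda_version: Optional[str]) -> Optional[Tuple[int, ...]]:
    """Parse a dotted CUDA version string; return None when no CUDA version is set."""
    if cuda_version is None:
        return None
    return tuple(int(part) for part in cuda_version.split("."))


print(parse_cuda_version("10.0.130"))  # (10, 0, 130)
print(parse_cuda_version(None))        # None

# The PyTorch version string is not a drop-in replacement: the
# "dev20190923" component cannot be converted to an int.
try:
    parse_cuda_version("1.3.0.dev20190923")
except ValueError as err:
    print("not a CUDA version:", err)
```

Even for a release string that happens to parse, the result would be the PyTorch version, so comparing it against a CUDA version such as (9, 1, 0) would be meaningless.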

from transfer-learning-conv-ai.

thomwolf commented on August 23, 2024

You don't need apex to use the codebase; it's only required if you want to do fp16 training.
The codebase also runs on CPU, but I'm not sure training is practical there, since it would be very slow.
If you only want to do inference (the interact.py script), it should work.
The interact.py script works fine on my laptop on CPU.
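
One way to keep apex optional, as described above, is to import it only when fp16 is actually requested. The following is a sketch under that assumption; `load_amp_if_requested` is a hypothetical helper and is not claimed to be how train.py is written.

```python
import importlib.util


def load_amp_if_requested(fp16: bool):
    """Return apex.amp when fp16 is requested; otherwise return None without importing apex."""
    if not fp16:
        # CPU inference and fp32 training never touch apex.
        return None
    if importlib.util.find_spec("apex") is None:
        raise RuntimeError("fp16 training was requested, but apex is not installed")
    from apex import amp
    return amp


amp = load_amp_if_requested(fp16=False)
print("apex loaded" if amp is not None else "running without apex")
```

With this layout, a CPU-only machine only hits apex's import-time checks if the user explicitly asks for fp16.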

tonyhqanguyen commented on August 23, 2024

Yeah, I suppose I won't be able to run the code because training would take too long. But when I run train.py, I get AttributeError: 'NoneType' object has no attribute 'split' on the line return tuple(int(x) for x in torch.version.cuda.split('.')), and I'm guessing that's because there is no CUDA on my laptop. I think the problem is that you guys use apex in pytorch_pretrained_bert to implement OpenAIGPTDoubleHeadsModel and some other things that are imported in the module.

thomwolf commented on August 23, 2024

In which file is this line? (return tuple(int(x) for x in torch.version.cuda.split('.')))
I can't find it in our code base.
By the way, if you have installed apex and don't have a GPU, you should uninstall it. It doesn't cope well with having no GPUs.

tonyhqanguyen commented on August 23, 2024

It's not in your code, it's in apex's code, which gets imported in modeling.py, which is imported by the pytorch_pretrained_bert package's __init__.py, which is imported in train.py.

Thanks for the advice, I'll uninstall it.

edit:

Here is the full traceback if it's helpful:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/tnguyen/Desktop/recourse-nlp/transfer-learning-conv-ai/train.py", line 19, in <module>
    from pytorch_pretrained_bert import (OpenAIAdam, OpenAIGPTDoubleHeadsModel, OpenAIGPTTokenizer,
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytorch_pretrained_bert/__init__.py", line 7, in <module>
    from .modeling import (BertConfig, BertModel, BertForPreTraining,
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 228, in <module>
    from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/__init__.py", line 2, in <module>
    from . import amp
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/__init__.py", line 1, in <module>
    from .amp import init, half_function, float_function, promote_function,
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/amp.py", line 3, in <module>
    from .lists import functional_overrides, torch_overrides, tensor_overrides
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/lists/torch_overrides.py", line 69, in <module>
    if utils.get_cuda_version() >= (9, 1, 0):
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/utils.py", line 9, in get_cuda_version
    return tuple(int(x) for x in torch.version.cuda.split('.'))
AttributeError: 'NoneType' object has no attribute 'split'

DannyDannyDanny commented on August 23, 2024

It's not in your code, it's in apex's code, which gets imported in modeling.py, which is imported by the pytorch_pretrained_bert package's __init__.py, which is imported in train.py.

Thanks for the advice, I'll uninstall it...

I'm getting the same error. The last two lines indicate that torch.version.cuda is returning None. The problem is that the function get_cuda_version in .../python3.7/site-packages/apex/amp/utils.py, on line 9, looks like:

def get_cuda_version():
    return tuple(int(x) for x in torch.version.cuda.split('.'))

...where instead it should be:

def get_cuda_version():
    return tuple(int(x) for x in torch.__version__.split('.'))

This is an issue with torch.

shamoons commented on August 23, 2024

Is there any older version that works without a GPU?

Frank-Dz commented on August 23, 2024

If you are using a 3090, CUDA 11.0 does not seem to work with apex, but CUDA 11.1 does.
I ran the following and successfully installed apex 0.1:

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
