GithubHelp home page GithubHelp logo

Comments (8)

artemZholus avatar artemZholus commented on June 10, 2024

Hi, @Liuxueyi !

Could you please give more context on your setup? OS, python version, cuda version and how it was installed?
It looks like you have an issue with conflicting CUDA/CUDNN installation paths.

Also can you show the output of

import jax
m = jax.numpy.array([1,])
m@m

? (at least say whether it runs or fails)

Finally, is using docker or singularity an option for you?

from recall2imagine.

Liuxueyi avatar Liuxueyi commented on June 10, 2024

Thank you very much for your quick reply!
Here is my context version: Ubuntu 20.04, python 3.8, cuda 11.8. The cuda and cudnn are installed with jaxlib. I have built the docker with your instruction and try to run the code in docker.
I tried your example code, and the error is:

>>> import jax
>>> m = jax.numpy.array([1,])
>>> m@m
2024-03-27 00:14:27.594128: E external/xla/xla/stream_executor/cuda/cuda_dnn.cc:427] Loaded runtime CuDNN library: 8.5.0 but source was compiled with: 8.6.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/numpy/array_methods.py", line 258, in deferring_binary_op
    return binary_op(*args)
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 166, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/pjit.py", line 250, in cache_miss
    outs, out_flat, out_tree, args_flat, jaxpr = _python_pjit_helper(
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/pjit.py", line 163, in _python_pjit_helper
    out_flat = pjit_p.bind(*args_flat, **params)
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/core.py", line 2677, in bind
    return self.bind_with_trace(top_trace, args, params)
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/core.py", line 383, in bind_with_trace
    out = trace.process_primitive(self, map(trace.full_raise, args), params)
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/core.py", line 815, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/pjit.py", line 1203, in _pjit_call_impl
    return xc._xla.pjit(name, f, call_impl_cache_miss, [], [], donated_argnums,
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/pjit.py", line 1187, in call_impl_cache_miss
    out_flat, compiled = _pjit_call_impl_python(
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/pjit.py", line 1120, in _pjit_call_impl_python
    compiled = _pjit_lower(
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/interpreters/pxla.py", line 2323, in compile
    executable = UnloadedMeshExecutable.from_hlo(
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/interpreters/pxla.py", line 2645, in from_hlo
    xla_executable, compile_options = _cached_compilation(
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/interpreters/pxla.py", line 2555, in _cached_compilation
    xla_executable = dispatch.compile_or_get_cached(
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/dispatch.py", line 497, in compile_or_get_cached
    return backend_compile(backend, computation, compile_options,
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/profiler.py", line 314, in wrapper
    return func(*args, **kwargs)
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/dispatch.py", line 465, in backend_compile
    return backend.compile(built_c, compile_options=options)
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

--------------------

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lxy/anaconda3/envs/r2i/lib/python3.8/site-packages/jax/_src/numpy/array_methods.py", line 258, in deferring_binary_op
    return binary_op(*args)
jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.

from recall2imagine.

Liuxueyi avatar Liuxueyi commented on June 10, 2024

When I use docker to run the code, there also is an error about GPU:
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.) Traceback (most recent call last): File "recall2imagine/train.py", line 241, in <module> main() File "recall2imagine/train.py", line 63, in main agent = agt.Agent(env.obs_space, env.act_space, step, config) File "/code/recall2imagine/jaxagent.py", line 20, in __init__ super().__init__(agent_cls, *args, **kwargs) File "/code/recall2imagine/jaxagent.py", line 37, in __init__ available = jax.devices(self.config.platform) File "/conda/miniconda/envs/py38/lib/python3.8/site-packages/jax/_src/xla_bridge.py", line 758, in devices return get_backend(backend).devices() File "/conda/miniconda/envs/py38/lib/python3.8/site-packages/jax/_src/xla_bridge.py", line 692, in get_backend return _get_backend_uncached(platform) File "/conda/miniconda/envs/py38/lib/python3.8/site-packages/jax/_src/xla_bridge.py", line 675, in _get_backend_uncached platform = canonicalize_platform(platform) File "/conda/miniconda/envs/py38/lib/python3.8/site-packages/jax/_src/xla_bridge.py", line 548, in canonicalize_platform raise RuntimeError(f"Unknown backend: '{platform}' requested, but no " RuntimeError: Unknown backend: 'gpu' requested, but no platforms that are instances of gpu are present. Platforms present are: cpu Gracefully stopping...
Is there any error about the installation of cuda in the base environment?
Thank you for your patience.

from recall2imagine.

artemZholus avatar artemZholus commented on June 10, 2024

@Liuxueyi did you create the conda environment from scratch or did you use an existing one? Also, if it's a new environment for just created for R2I, did you follow the installation instructions in our repo? This is the link: https://github.com/chandar-lab/Recall2Imagine?tab=readme-ov-file#conda

from recall2imagine.

Liuxueyi avatar Liuxueyi commented on June 10, 2024

Yes, I just followed the instructions and created a new conda environment for R2I. I think there may be any errors of cuda in my base environment. I will check it. And if I have some progress, I will share it. Thank you very much!

from recall2imagine.

artemZholus avatar artemZholus commented on June 10, 2024

Sure. Will be happy to help.

from recall2imagine.

artemZholus avatar artemZholus commented on June 10, 2024

@Liuxueyi , I am closing the issue for now since it's not related to our codebase. Please reopen it if you face an issue related to the codebase.

from recall2imagine.

Liuxueyi avatar Liuxueyi commented on June 10, 2024

When I add the instruction --gpus all , the error with docker is resolved.

from recall2imagine.

Related Issues (3)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.