
Comments (9)

miesav commented on August 28, 2024

Commits c1bf7af and 7f0b5ad solve this issue with the Dockerfile changes. Thanks a lot, @chrisroat.

from alphafold.

tfgg commented on August 28, 2024

Could you try changing https://github.com/deepmind/alphafold/blob/main/docker/Dockerfile#L15 to CUDA 11.1? According to https://en.wikipedia.org/wiki/CUDA, 11.1+ is required for CC 8.6 (some of the more recent Ampere series but not the A100).

We will be looking at upgrading the CUDA version in this repository, but this requires careful benchmarking to ensure that there are no performance or accuracy regressions, so it will be faster for you to try this locally for now.
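For anyone checking their own card, here is a minimal sketch of the version check implied above. The table only covers the two Ampere compute capabilities mentioned in this thread, and the names `MIN_CUDA_FOR_CC`, `parse_version` and `toolkit_supports` are made up for illustration; nothing here is part of the AlphaFold repository.

```python
# Minimum CUDA toolkit release per compute capability (CC).
# Only the two Ampere entries discussed in this thread are listed.
MIN_CUDA_FOR_CC = {
    (8, 0): (11, 0),  # GA100, e.g. A100
    (8, 6): (11, 1),  # GA102, e.g. RTX A6000, RTX 3090
}


def parse_version(text):
    """Turn '11.1' into (11, 1) so versions compare numerically."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(cuda_version, compute_capability):
    """Return True if the toolkit is new enough for the given CC."""
    required = MIN_CUDA_FOR_CC[parse_version(compute_capability)]
    return parse_version(cuda_version) >= required


print(toolkit_supports("11.0", "8.6"))  # False
print(toolkit_supports("11.1", "8.6"))  # True
```

Comparing tuples instead of strings avoids surprises such as "11.10" sorting before "11.2".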


miesav commented on August 28, 2024

Thank you @tfgg. I experimented a bit with the Dockerfile initially. I can confirm that the current master works with NVIDIA GPUs based on the GA100 chip (A100, A30). I upgraded the CUDA toolkit to 11.2 for GPUs based on the GA102 chip (A10, A40, RTX AX000), and everything has worked fine so far.


RodenLuo commented on August 28, 2024

Hi, I think I'm hitting a similar error. I'm on an NVIDIA RTX A6000. After pulling the latest code I still get the error. Could you please check whether it can be solved? Thanks. I've attached the last few lines of the log below. If more info is needed, please let me know.

EEAQITYSWKKDSSPVEGSTNVYTVDTSSVGSQTIEVTATVTAADYNPVTVTKTGNVTVTAKVAPEPEGELPYVHPLPHRSSAYIWCGWWVMDEIQKMTEEGKDWKTDDPDSKYYLHRYTLQKMMKDYPEVDVQESRNGYIIHKTALETGIIYTYP, template: VLEVDTQGTVVCSLDGLFPVSEAQVHLALGDQRLNPTVTYGNDSFSAKASVSVTAEDEGTQRLTCAVILGNQSQETLQTVTIYSFPAPNVILTKPEVSEGTEVTVKCEAHPRAKVTLNGVPAQPLGPRAQLLLKATPEDNGRSFSCSATLEVAGQLIHKNQTRELRVLYGPRLDERDCPGNWTWPENSQQTPMCQAWGNPLPELKCLKDGTFPLPIGESVTVTRDLEGTYLCRAR
I0729 16:15:52.818250 23060273533120 run_docker.py:193] I0729 13:15:52.817516 140379006277376 templates.py:271] Found an exact template match 1z7z_I.
I0729 16:15:53.303489 23060273533120 run_docker.py:193] I0729 13:15:53.302619 140379006277376 run_alphafold.py:141] Running model model_1
I0729 16:15:55.840626 23060273533120 run_docker.py:193] 2021-07-29 13:15:55.839955: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
I0729 16:15:55.840924 23060273533120 run_docker.py:193] 2021-07-29 13:15:55.840622: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
I0729 16:15:55.841116 23060273533120 run_docker.py:193] Skipping registering GPU devices...
I0729 16:16:00.499923 23060273533120 run_docker.py:193] I0729 13:16:00.498721 140379006277376 model.py:132] Running predict with shape(feat) = {'aatype': (32, 376), 'residue_index': (32, 376), 'seq_length': (32,), 'template_aatype': (32, 4, 376), 'template_all_atom_masks': (32, 4, 376, 37), 'template_all_atom_positions': (32, 4, 376, 37, 3), 'template_sum_probs': (32, 4, 1), 'is_distillation': (32,), 'seq_mask': (32, 376), 'msa_mask': (32, 508, 376), 'msa_row_mask': (32, 508), 'random_crop_to_size_seed': (32, 2), 'template_mask': (32, 4), 'template_pseudo_beta': (32, 4, 376, 3), 'template_pseudo_beta_mask': (32, 4, 376), 'atom14_atom_exists': (32, 376, 14), 'residx_atom14_to_atom37': (32, 376, 14), 'residx_atom37_to_atom14': (32, 376, 37), 'atom37_atom_exists': (32, 376, 37), 'extra_msa': (32, 5120, 376), 'extra_msa_mask': (32, 5120, 376), 'extra_msa_row_mask': (32, 5120), 'bert_mask': (32, 508, 376), 'true_msa': (32, 508, 376), 'extra_has_deletion': (32, 5120, 376), 'extra_deletion_value': (32, 5120, 376), 'msa_feat': (32, 508, 376, 49), 'target_feat': (32, 376, 22)}
I0729 16:16:01.432996 23060273533120 run_docker.py:193] 2021-07-29 13:16:01.432244: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:235] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.6
I0729 16:16:01.433186 23060273533120 run_docker.py:193] 2021-07-29 13:16:01.432302: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:238] Used ptxas at ptxas
I0729 16:16:01.435479 23060273533120 run_docker.py:193] 2021-07-29 13:16:01.434778: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:625] failed to get PTX kernel "shift_right_logical_3" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
I0729 16:16:01.435665 23060273533120 run_docker.py:193] 2021-07-29 13:16:01.434875: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2040] Execution of replica 0 failed: Internal: Could not find the corresponding function
I0729 16:16:01.440179 23060273533120 run_docker.py:193] Traceback (most recent call last):
I0729 16:16:01.440402 23060273533120 run_docker.py:193] File "/app/alphafold/run_alphafold.py", line 302, in <module>
I0729 16:16:01.440572 23060273533120 run_docker.py:193] app.run(main)
I0729 16:16:01.440726 23060273533120 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I0729 16:16:01.440872 23060273533120 run_docker.py:193] _run_main(main, args)
I0729 16:16:01.441013 23060273533120 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I0729 16:16:01.441150 23060273533120 run_docker.py:193] sys.exit(main(argv))
I0729 16:16:01.441284 23060273533120 run_docker.py:193] File "/app/alphafold/run_alphafold.py", line 284, in main
I0729 16:16:01.441416 23060273533120 run_docker.py:193] random_seed=random_seed)
I0729 16:16:01.441550 23060273533120 run_docker.py:193] File "/app/alphafold/run_alphafold.py", line 148, in predict_structure
I0729 16:16:01.441683 23060273533120 run_docker.py:193] prediction_result = model_runner.predict(processed_feature_dict)
I0729 16:16:01.441817 23060273533120 run_docker.py:193] File "/app/alphafold/alphafold/model/model.py", line 133, in predict
I0729 16:16:01.441951 23060273533120 run_docker.py:193] result = self.apply(self.params, jax.random.PRNGKey(0), feat)
I0729 16:16:01.442085 23060273533120 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/_src/random.py", line 75, in PRNGKey
I0729 16:16:01.442246 23060273533120 run_docker.py:193] k1 = convert(lax.shift_right_logical(seed_arr, lax._const(seed_arr, 32)))
I0729 16:16:01.442380 23060273533120 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 382, in shift_right_logical
I0729 16:16:01.442511 23060273533120 run_docker.py:193] return shift_right_logical_p.bind(x, y)
I0729 16:16:01.442642 23060273533120 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 264, in bind
I0729 16:16:01.442774 23060273533120 run_docker.py:193] out = top_trace.process_primitive(self, tracers, params)
I0729 16:16:01.442907 23060273533120 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 604, in process_primitive
I0729 16:16:01.443041 23060273533120 run_docker.py:193] return primitive.impl(*tracers, **params)
I0729 16:16:01.443177 23060273533120 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 262, in apply_primitive
I0729 16:16:01.443312 23060273533120 run_docker.py:193] return compiled_fun(*args)
I0729 16:16:01.443447 23060273533120 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 378, in _execute_compiled_primitive
I0729 16:16:01.443582 23060273533120 run_docker.py:193] out_bufs = compiled.execute(input_bufs)
I0729 16:16:01.443713 23060273533120 run_docker.py:193] RuntimeError: Internal: Could not find the corresponding function
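
In case it helps others compare their own logs against this one, a rough sketch that scans a log dump for the three tell-tale messages above. The regular expressions are copied from the wording in this log, the sample lines are abbreviated versions of it, and `PATTERNS` and `diagnose` are hypothetical names, not an existing tool.

```python
import re

# Patterns taken from the warnings and errors in the log above.
PATTERNS = {
    "missing_library": re.compile(r"Could not load dynamic library '([^']+)'"),
    "unsupported_cc": re.compile(r"ptxas does not support CC (\d+\.\d+)"),
    "missing_kernel": re.compile(r'failed to get PTX kernel "([^"]+)"'),
}


def diagnose(log_text):
    """Return {symptom: sorted list of matched details} for a log dump."""
    findings = {}
    for name, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(log_text)))
        if matches:
            findings[name] = matches
    return findings


# Abbreviated copies of three lines from the log above.
sample = (
    "W dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'\n"
    "W asm_compiler.cc:235] ptxas does not support CC 8.6\n"
    'E cuda_driver.cc:625] failed to get PTX kernel "shift_right_logical_3"\n'
)
print(diagnose(sample))
```

An "unsupported_cc" hit means the ptxas in the image does not know the GPU's compute capability, which points to the toolkit version in the image.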


amrhamedp commented on August 28, 2024

I have the same problem with Ampere cards. Can you please fix this problem?


RodenLuo commented on August 28, 2024

Thank you! I confirm CUDA 11.1 works with NVIDIA RTX A6000. Changing the following line

https://github.com/deepmind/alphafold/blob/b88f8dacef5d94e4d3d49613d08523feb20caec1/docker/Dockerfile#L15

to ARG CUDA=11.1 and then rebuilding the Docker image solves the error I reported above.
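
If you would rather script the edit than change the file by hand, something like the following should work. It is only a sketch: `set_cuda_arg` is a hypothetical helper, and the example text is a stand-in rather than the real contents of docker/Dockerfile (the starting value 11.0 is an assumption).

```python
import re


def set_cuda_arg(dockerfile_text, version):
    """Replace the value of the first `ARG CUDA=...` line."""
    patched, count = re.subn(
        r"^(ARG CUDA=).*$",
        lambda m: m.group(1) + version,
        dockerfile_text,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no 'ARG CUDA=' line found")
    return patched


# Stand-in text, not the actual Dockerfile from the repository.
example = "ARG CUDA=11.0\n# ... rest of the Dockerfile ...\n"
print(set_cuda_arg(example, "11.1"))
```

The image still has to be rebuilt afterwards for the change to take effect.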


amrhamedp commented on August 28, 2024

I believe 11.2 has compatibility issues. 11.1 works.


jucastil commented on August 28, 2024

I got rid of the error as well. Thanks! Running Docker on CentOS 7.9.2009 (Core), with an NVIDIA GeForce RTX 3090.


Augustin-Zidek commented on August 28, 2024

This was fixed in 57a2455.

