hpcaitech / fastfold
Optimizing AlphaFold Training and Inference on GPU Clusters
License: Apache License 2.0
It looks like "predicted aligned error" and pTM are not included in the prediction_results dict when using openfold/fastfold. Are there plans to add them?
I tried to predict a 5 subunit complex (in total ~5000 aa) and get the following error with various settings (1-4x A100 80GB, w/ and w/o --inplace, w/ and w/o --chunk_size 1-32). The error seems to be associated with exceeding the GPU memory and I am not sure if this is normal at the given sequence length and available GPU memory. I installed fastfold from the recent commit 930a58a into a clean conda environment and built triton from source. For a smaller complex (~2000 aa) it ran without errors.
terminate called after throwing an instance of 'c10::Error'
what(): NCCL error in: /opt/conda/conda-bld/pytorch_1659484810403/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:173, unhandled cuda error, NCCL version 2.10.3
Process Group destroyed on rank 1
Exception raised from ncclCommAbort at /opt/conda/conda-bld/pytorch_1659484810403/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:173 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f43cf264497 in .../fastfold/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f43cf23bc94 in .../fastfold/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x19ea61 (0x7f44092e2a61 in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x118 (0x7f44092c6098 in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x9 (0x7f44092c6369 in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #5: <unknown function> + 0x9d7799 (0x7f440f4fd799 in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x354732 (0x7f440ee7a732 in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x3555ff (0x7f440ee7b5ff in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x116878 (0x55a0881ca878 in .../fastfold/bin/python3)
frame #9: <unknown function> + 0x11699d (0x55a0881ca99d in .../fastfold/bin/python3)
frame #10: <unknown function> + 0x1fd471 (0x55a0882b1471 in .../fastfold/bin/python3)
frame #11: <unknown function> + 0x10e937 (0x55a0881c2937 in .../fastfold/bin/python3)
frame #12: _PyGC_CollectNoFail + 0x2b (0x55a0882b134b in .../fastfold/bin/python3)
frame #13: PyImport_Cleanup + 0x371 (0x55a0882b11b1 in .../fastfold/bin/python3)
frame #14: Py_FinalizeEx + 0x7a (0x55a0882aff9a in .../fastfold/bin/python3)
frame #15: Py_Exit + 0x8 (0x55a0881454bc in .../fastfold/bin/python3)
frame #16: <unknown function> + 0x9141b (0x55a08814541b in .../fastfold/bin/python3)
frame #17: <unknown function> + 0x910ee (0x55a0881450ee in .../fastfold/bin/python3)
frame #18: PyRun_SimpleStringFlags + 0x4a (0x55a088141f12 in .../fastfold/bin/python3)
frame #19: Py_RunMain + 0x27b (0x55a0882abc1b in .../fastfold/bin/python3)
frame #20: Py_BytesMain + 0x39 (0x55a088283619 in .../fastfold/bin/python3)
frame #21: __libc_start_main + 0xf5 (0x7f444b239555 in /lib64/libc.so.6)
frame #22: <unknown function> + 0x1cf525 (0x55a088283525 in .../fastfold/bin/python3)
Traceback (most recent call last):
File ".../.../FastFold/inference.py", line 519, in <module>
main(args)
File ".../.../FastFold/inference.py", line 149, in main
inference_multimer_model(args)
File ".../.../FastFold/inference.py", line 282, in inference_multimer_model
torch.multiprocessing.spawn(inference_model, nprocs=args.gpus, args=(args.gpus, result_q, batch, args))
File ".../fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File ".../fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File ".../fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedExceptio
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File ".../fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File ".../.../FastFold/inference.py", line 136, in inference_model
out = model(batch)
File ".../fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File ".../.../FastFold/fastfold/model/hub/alphafold.py", line 507, in forward
outputs, m_1_prev, z_prev, x_prev = self.iteration(
File ".../.../FastFold/fastfold/model/hub/alphafold.py", line 264, in iteration
template_embeds = self.template_embedder(
File ".../fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File ".../.../FastFold/fastfold/model/fastnn/embedders_multimer.py", line 351, in forward
self.template_single_embedder(
File ".../fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File ".../.../FastFold/fastfold/model/fastnn/embedders_multimer.py", line 238, in forward
all_atom_multimer.compute_chi_angles(
File ".../.../FastFold/fastfold/utils/all_atom_multimer.py", line 403, in compute_chi_angles
chi_atom_indices = get_chi_atom_indices(aatype.device)
File ".../.../FastFold/fastfold/utils/all_atom_multimer.py", line 365, in get_chi_atom_indices
return torch.tensor(chi_atom_indices, device=device)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Hello, I am very happy to see that FastFold has added multimer support, but I have a problem. In monomer mode, I still cannot predict all 5 models in one run. Is there a solution for this?
Here is my script:
#!/bin/bash
#DSUB --job_type cosched
#DSUB -n fastfold
#DSUB -A root.bingxing2.gpuuser001
#DSUB -q root.default
#DSUB -R 'cpu=12;gpu=2;mem=90000'
#DSUB -l wuhanG5500
#DSUB -N 1
#DSUB -e %J.out
#DSUB -o %J.out
######################Monitor GPU utilization################################################
STATE_FILE="state_${BATCH_JOB_ID}"
/usr/bin/touch ${STATE_FILE}
function gpus_collection(){
while [[ `cat "${STATE_FILE}" | grep "over" | wc -l` == "0" ]]; do
/usr/bin/sleep 1
/usr/bin/nvidia-smi >> "gpu_${BATCH_JOB_ID}.log"
done
}
gpus_collection &
#####################AF2 computation section###################################################
module load anaconda/2021.11
module load cuda/11.3.0-gcc-4.8.5-oaa
module load gcc/9.3.0-gcc-4.8.5-bxl
source activate fastfold
af2Root=/home/bingxing2/public
python inference.py mono.fasta $af2Root/alphafold2.2.0/pdb_mmcif/mmcif_files \
--output_dir ./mono_out \
--uniref90_database_path $af2Root/uniref90/uniref90.fasta \
--mgnify_database_path $af2Root/mgnify/mgy_clusters.fa \
--pdb70_database_path $af2Root/pdb70/pdb70 \
--param_path $af2Root/alphafold2.2.0/params/params_model_1.npz \
--uniclust30_database_path $af2Root/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path $af2Root/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--jackhmmer_binary_path `which jackhmmer` \
--hhblits_binary_path `which hhblits` \
--hhsearch_binary_path `which hhsearch` \
--kalign_binary_path `which kalign` \
--gpus 2 \
--enable_workflow \
--chunk_size 1 \
--inplace
echo "over" >> "${STATE_FILE}"
Hi, I'd like to run FastFold for a long sequence with no templates. Is this setting supported by FastFold?
I'm building an MSA only from Uniref90 DB for starters.
The most recent update, AlphaFold v2.3.0, includes updated parameters.
Running inference.py with these updated parameters (v3) throws the following error. The same command succeeds with parameters from previous versions.
Multimer command
python ~/FastFold/inference.py multimer_query.fasta \
~/alphafold-2.3.0_data/pdb_mmcif/mmcif_files/ \
--use_precomputed_alignments ./alignments \
--output_dir ./multimer_query_fastfold_v3 \
--gpus 1 --model_preset multimer \
--uniref90_database_path ~/alphafold-2.3.0_data/uniref90/uniref90.fasta \
--mgnify_database_path ~/alphafold-2.3.0_data/mgnify/mgy_clusters_2022_05.fa \
--pdb70_database_path ~/alphafold-2.3.0_data/pdb70/pdb70 \
--uniclust30_database_path ~/alphafold-2.3.0_data/uniref30/UniRef30_2021_03 \
--bfd_database_path ~/alphafold-2.3.0_data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniprot_database_path ~/alphafold-2.3.0_data/uniprot/uniprot.fasta \
--pdb_seqres_database_path ~/alphafold-2.3.0_data/pdb_seqres/pdb_seqres.txt \
--param_path ~/alphafold-2.3.0_data/params/params_model_1_multimer_v3.npz \
--model_name model_1_multimer_v3 \
--jackhmmer_binary_path `which jackhmmer` \
--hhblits_binary_path `which hhblits` \
--hhsearch_binary_path `which hhsearch` \
--kalign_binary_path `which kalign` \
--chunk_size 8 --inplace
Error is pasted below
[12/22/22 13:28:14] INFO colossalai - colossalai - INFO: ~/conda/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557
set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default
parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: ~/conda/envs/fastfold/lib/python3.8/site-packages/colossalai/initialize.py:117 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 1
Traceback (most recent call last):
File "~/FastFold/inference.py", line 513, in <module>
main(args)
File "~/FastFold/inference.py", line 148, in main
inference_multimer_model(args)
File "~/FastFold/inference.py", line 276, in inference_multimer_model
torch.multiprocessing.spawn(inference_model, nprocs=args.gpus, args=(args.gpus, result_q, batch, args))
File "~/conda/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "~/conda/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "~/conda/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "~/conda/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "~/FastFold/inference.py", line 123, in inference_model
import_jax_weights_(model, args.param_path, version=args.model_name)
File "~/FastFold/fastfold/utils/import_weights.py", line 580, in import_jax_weights_
assert len(incorrect) == 0
AssertionError
I followed the Anaconda installation instructions but get an error about fastfold_softmax_cuda. The machine has CUDA version 11.6.2 installed.
Colossalai should be built with cuda extension to use the FP16 optimizer
If you want to activate cuda mode for MoE, please install with cuda_ext!
Traceback (most recent call last):
File "inference.py", line 25, in <module>
from fastfold.model.hub import AlphaFold
File "/scratch/FastFold/fastfold/model/hub/__init__.py", line 1, in <module>
from .alphafold import AlphaFold
File "/scratch/FastFold/fastfold/model/hub/alphafold.py", line 20, in <module>
from fastfold.utils.feats import (
File "/scratch/FastFold/fastfold/utils/__init__.py", line 1, in <module>
from .inject_fastnn import inject_fastnn
File "/scratch/FastFold/fastfold/utils/inject_fastnn.py", line 9, in <module>
from fastfold.model.fastnn import MSAStack, OutProductMean, PairStack
File "/scratch/FastFold/fastfold/model/fastnn/__init__.py", line 1, in <module>
from .msa import MSAStack
File "/scratch/FastFold/fastfold/model/fastnn/msa.py", line 6, in <module>
from fastfold.model.fastnn.kernel import LayerNorm
File "/scratch/FastFold/fastfold/model/fastnn/kernel/__init__.py", line 3, in <module>
from .cuda_native.softmax import softmax, scale_mask_softmax, scale_mask_bias_softmax
File "/scratch/FastFold/fastfold/model/fastnn/kernel/cuda_native/softmax.py", line 7, in <module>
fastfold_softmax_cuda = importlib.import_module("fastfold_softmax_cuda")
File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'fastfold_softmax_cuda'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2184436) of binary: /share/siegellab/software/kschu/anaconda3/envs/fastfold/bin/python
Traceback (most recent call last):
File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')())
File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/run.py", line 724, in main
run(args)
File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/run.py", line 715, in run
elastic_launch(
File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
inference.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2022-05-04_06:49:56
host : kakawa-1
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 2184436)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Hi, when I read the AlphaFold2 supplementary information, I got confused by section "1.11.8 Reducing the memory consumption". It says that during training, gradient checkpointing reduces memory consumption from cubic to quadratic. It also says that at inference time, processing the layers in chunks likewise reduces memory from cubic to quadratic. I don't understand why this works. Can anyone give me a hand?
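Not an authoritative answer, but the inference-time part of the question can be illustrated with simple counting. The sketch below assumes a triangle-attention-style layer over an N x N pair representation: each of the N rows produces N x N attention logits per head, so holding all logits at once takes N * H * N * N elements, which is cubic in N. If rows are processed in chunks of a fixed size c, only c * H * N * N logits exist at any moment, which is quadratic in N, while the layer output stays N x N. The function name, the head count of 4 and the chunk size are illustrative assumptions, not values taken from the supplement.

```python
def attention_logit_elements(n_res, n_heads, chunk_size=None):
    """Count attention-logit elements held in memory at one time.

    Assumes one attention layer over an (n_res x n_res) pair tensor in
    which every row attends over n_res x n_res logits per head.
    """
    # Without chunking, all n_res rows are processed together.
    rows_in_flight = n_res if chunk_size is None else min(chunk_size, n_res)
    return rows_in_flight * n_heads * n_res * n_res


n_res, n_heads = 1000, 4  # illustrative values only

full = attention_logit_elements(n_res, n_heads)                   # grows like N^3
chunked = attention_logit_elements(n_res, n_heads, chunk_size=4)  # grows like N^2

print("all rows at once:", full)
print("chunks of 4 rows:", chunked)
print("reduction factor:", full // chunked)
```

The trade-off is that chunks are processed one after another, so a smaller chunk size lowers peak memory at the cost of run time.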
Hi, I was trying to test H1044.fasta on an NVIDIA A100. When I set nproc_per_node=2, I ran into the following memory problem:
RuntimeError: CUDA out of memory. Tried to allocate 77.40 GiB (GPU 0; 38.61 GiB total capacity; 13.21 GiB already allocated; 8.81 GiB free; 26.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Is that normal? Thanks for your reply ^^
Hello.
I need to predict a protein of length 2038 aa. AlphaFold can't meet my needs, but I already have AlphaFold's databases. Can I use them directly with FastFold?
I'm confused about dap,
https://github.com/microsoft/DeepSpeed
For even faster training :)
Your work clearly achieves high performance in AlphaFold2 training. I am wondering about the accuracy of your inference results. Have you benchmarked this against AlphaFold2? Or could you publish your models for us to use?
Enable saving .pkl files as they contain pLDDT and pTM values. This information is useful in downstream analysis.
Hi!
Alphafold's multimer model produces iptm scores that are saved within the pickle file as below:
distogram
experimentally_resolved
masked_msa
predicted_aligned_error
predicted_lddt
structure_module
plddt
aligned_confidence_probs
max_predicted_aligned_error
ptm
iptm
ranking_confidence
However, it seems that FastFold's output does not contain this data:
msa
pair
single
sm
final_atom_positions
final_atom_mask
final_affine_tensor
lddt_logits
plddt
distogram_logits
masked_msa_logits
experimentally_resolved_logits
tm_logits
predicted_tm_score
aligned_confidence_probs
predicted_aligned_error
max_predicted_aligned_error
Is it possible to get the iptm scores?
Thanks!
running in multimer mode...
Finished running alignment for 1_1
Finished running alignment for 1_2
Finished running alignment for 1_3
Finished running alignment for 1_4
Traceback (most recent call last):
File "inference.py", line 548, in <module>
main(args)
File "inference.py", line 164, in main
inference_multimer_model(args)
File "inference.py", line 281, in inference_multimer_model
feature_dict = data_processor.process_fasta(
File "/home/fy/FastFold-main/fastfold/data/data_pipeline.py", line 1165, in process_fasta
chain_features = self._process_single_chain(
File "/home/fy/FastFold-main/fastfold/data/data_pipeline.py", line 1114, in _process_single_chain
chain_features = self._monomer_data_pipeline.process_fasta(
File "/home/fy/FastFold-main/fastfold/data/data_pipeline.py", line 936, in process_fasta
hits = self._parse_template_hits(
File "/home/fy/FastFold-main/fastfold/data/data_pipeline.py", line 884, in _parse_template_hits
hits = parsers.parse_hmmsearch_sto(
File "/home/fy/FastFold-main/fastfold/data/parsers.py", line 656, in parse_hmmsearch_sto
template_hits = parse_hmmsearch_a3m(
File "/home/fy/FastFold-main/fastfold/data/parsers.py", line 627, in parse_hmmsearch_a3m
metadata = _parse_hmmsearch_description(hit_description)
File "/home/fy/FastFold-main/fastfold/data/parsers.py", line 589, in _parse_hmmsearch_description
raise ValueError(f'Could not parse description: "{description}".')
ValueError: Could not parse description: "0000|3jqh_A/8-65 [subseq from] mol:protein length:167 C-type lectin domain family 4 member M".
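The description that fails above starts with a "0000|" prefix before the PDB id. A minimal sketch of a more tolerant parser is shown below, assuming the only difference from the expected format is an optional "<prefix>|" in front of "<pdb>_<chain>/<start>-<end>". The pattern, the function name and the returned keys are hypothetical and are not FastFold's actual parser. Where the prefix comes from is not established in this thread.

```python
import re

# Hypothetical pattern: "<pdb>_<chain>/<start>-<end> ... protein length:<n> <text>",
# optionally preceded by ">" and by a "<prefix>|" such as "0000|".
_DESCRIPTION = re.compile(
    r"^>?(?:[^|\s]*\|)?([a-z0-9]+)_(\w+)/([0-9]+)-([0-9]+)"
    r".*protein length:([0-9]+) *(.*)$"
)


def parse_description(description):
    """Parse one hmmsearch hit description into a plain dict."""
    match = _DESCRIPTION.match(description.strip())
    if match is None:
        raise ValueError(f'Could not parse description: "{description}".')
    pdb_id, chain, start, end, length, text = match.groups()
    return {
        "pdb_id": pdb_id,
        "chain": chain,
        "start": int(start),
        "end": int(end),
        "length": int(length),
        "text": text,
    }


line = ("0000|3jqh_A/8-65 [subseq from] mol:protein length:167 "
        "C-type lectin domain family 4 member M")
print(parse_description(line))
```

Descriptions without the prefix still match, because the prefix group is optional.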
I installed FastFold from a fresh clone according to the instructions in the README and ran the following:
import torch
from fastfold.model.fastnn.kernel import softmax
seq = 2500
h, n, c = 8, 384, 2
dtype = torch.float32
q = torch.rand([seq, h, n, c], device="cuda:0", dtype=dtype, requires_grad=True)
k = torch.rand([seq, h, n, c], device="cuda:0", dtype=dtype, requires_grad=True)
s = softmax(torch.matmul(q, k.transpose(-1, -2)))
print(s)
on an A100 using CUDA version 11.4. This consistently generates the following error:
Traceback (most recent call last):
File "test_softmax.py", line 11, in <module>
print(s)
File "/mnt/home/dberenberg/gustaf_stuff/experiments/softmax/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/torch/_tensor.py", line 305, in __repr__
return torch._tensor_str._str(self)
File "/mnt/home/dberenberg/gustaf_stuff/experiments/softmax/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/torch/_tensor_str.py", line 434, in _str
return _str_intern(self)
File "/mnt/home/dberenberg/gustaf_stuff/experiments/softmax/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/torch/_tensor_str.py", line 409, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/mnt/home/dberenberg/gustaf_stuff/experiments/softmax/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/torch/_tensor_str.py", line 264, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/mnt/home/dberenberg/gustaf_stuff/experiments/softmax/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/torch/_tensor_str.py", line 100, in __init__
nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
It can be made to work by decreasing seq. For my particular setup, the threshold seems to be around 2000. Setting CUDA_LAUNCH_BLOCKING doesn't seem to do anything. Any tips for getting this working for tensors as large as [5120, 8, 384, 384], which appear in the AlphaFold pipeline?
Thanks for making this available. Just a few comments on #installation.
I've been trying to get the conda environment to work on my systems.
If I use the default environment I get the following errors:
Colossalai should be built with cuda extension to use the FP16 optimizer
If you want to activate cuda mode for MoE, please install with cuda_ext!
/proj/berzelius-2021-29/users/x_arnel/.conda/envs/FastFold/lib/python3.8/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
I guess this is because the PyPI version of colossalai is not compiled with CUDA, see https://colossalai.org/download
Instead I had to do a post-install step:
pip install colossalai==0.1.6+torch1.11cu11.3 -f https://release.colossalai.org
(Perhaps this can be included in the .yml file?)
But that created problems, so I had to recompile, which works after some minor tweaks (wrong nvcc paths and environment variables).
The Dockerfile and environment.yml pin OpenMM to 7.5.1. That's an old release that isn't supported anymore. Could they be updated to the latest release, or alternatively could the pin be removed? I don't think any code changes are needed, although there are some deprecated module names that could be updated to avoid a deprecation warning. The patch in openmm.patch also isn't needed anymore. That change has been merged upstream.
Hi, I tried different dap_size values, such as dap_size=2 and dap_size=4, but GPU memory usage does not decrease noticeably as dap_size increases. Have you tried this?
Maybe some merge and check:
fastfold.np.relax -> fastfold.relax
fastfold.np.residue_constants -> fastfold.common.residue_constants
fastfold.np.protein -> fastfold.common.protein
Hi,
On our cluster, the current method of generating the path for the Ray directory is not working. It seems that os.getlogin() does not return a meaningful value.
storage_dir = "file:///tmp/ray/" + os.getlogin() + "/workflow_data"
I get back an OS error saying that the device does not exist.
After removing os.getlogin() the workflow initiates properly.
Would it be possible to define the Ray storage dir via a flag, like the output dir?
Thank you and all the best,
Dominik
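One possible shape for such a change is sketched below. os.getlogin() raises OSError when the process has no controlling terminal, which is common for batch jobs, while getpass.getuser() checks the LOGNAME, USER, LNAME and USERNAME environment variables before falling back to the password database. The function names, the base_dir parameter and the "unknown" placeholder are hypothetical, not part of FastFold.

```python
import getpass
import os


def current_user():
    """Return a user name, trying os.getlogin() first and then falling back."""
    for lookup in (os.getlogin, getpass.getuser):
        try:
            return lookup()
        except Exception:
            # os.getlogin() raises OSError without a controlling terminal;
            # getpass.getuser() can also fail if no user entry exists.
            continue
    return "unknown"  # hypothetical last-resort placeholder


def ray_storage_uri(base_dir="/tmp/ray", user=None):
    """Build the file:// URI for the Ray workflow storage directory."""
    user = user or current_user()
    return "file://" + base_dir.rstrip("/") + "/" + user + "/workflow_data"


print(ray_storage_uri())
```

Exposing base_dir through a command-line option would cover the request for a configurable storage directory.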
Hi, I hit a parse error when running hmmsearch against pdb_seqres.txt. The log is:
(WorkflowManagementActor pid=7511) RuntimeError: hmmsearch failed:
(WorkflowManagementActor pid=7511) stdout:
(WorkflowManagementActor pid=7511) # hmmsearch :: search profile(s) against a sequence database
(WorkflowManagementActor pid=7511) # HMMER 3.3.2 (Nov 2020); http://hmmer.org/
(WorkflowManagementActor pid=7511) # Copyright (C) 2020 Howard Hughes Medical Institute.
(WorkflowManagementActor pid=7511) # Freely distributed under the BSD open source license.
(WorkflowManagementActor pid=7511) # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(WorkflowManagementActor pid=7511) # query HMM file: /tmp/tmph1wus5n3/query.hmm
(WorkflowManagementActor pid=7511) # target sequence database: /uniprot/pdb_seqres.txt
(WorkflowManagementActor pid=7511) # MSA of all hits saved to file: ./alignments/5ZNG_1|Chain A|NBS-LRR type protein|Oryza sativa subsp. japonica (39947)/hmm_output.sto
(WorkflowManagementActor pid=7511) # show alignments in output: no
(WorkflowManagementActor pid=7511) # sequence reporting threshold: E-value <= 100
(WorkflowManagementActor pid=7511) # domain reporting threshold: E-value <= 100
(WorkflowManagementActor pid=7511) # sequence inclusion threshold: E-value <= 100
(WorkflowManagementActor pid=7511) # domain inclusion threshold: E-value <= 100
(WorkflowManagementActor pid=7511) # MSV filter P threshold: <= 0.1
(WorkflowManagementActor pid=7511) # Vit filter P threshold: <= 0.1
(WorkflowManagementActor pid=7511) # Fwd filter P threshold: <= 0.1
(WorkflowManagementActor pid=7511) # number of worker threads: 12
(WorkflowManagementActor pid=7511) # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(WorkflowManagementActor pid=7511)
(WorkflowManagementActor pid=7511) Query: query [M=137]
(WorkflowManagementActor pid=7511)
(WorkflowManagementActor pid=7511)
(WorkflowManagementActor pid=7511) stderr:
(WorkflowManagementActor pid=7511) Parse failed (sequence file /uniprot/pdb_seqres.txt):
(WorkflowManagementActor pid=7511) Line 1366526: illegal character 0
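To locate the offending entries before rerunning hmmsearch, a small diagnostic along these lines may help. It is a sketch that assumes the failure comes from non-letter characters (such as the "0" in the log) inside sequence lines of a FASTA-style file. The function name and the sample records are made up for illustration.

```python
import io


def find_suspect_lines(handle, max_hits=10):
    """Return (line_number, bad_characters) for sequence lines with non-letters."""
    hits = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            continue  # header lines may contain any characters
        bad = sorted({ch for ch in line if not ch.isalpha()})
        if bad:
            hits.append((lineno, bad))
            if len(hits) >= max_hits:
                break
    return hits


# Made-up sample in the same header style as pdb_seqres.txt.
sample = io.StringIO(
    ">101m_A mol:protein length:3  MYOGLOBIN\n"
    "MVL\n"
    ">1abc_B mol:na length:4  EXAMPLE\n"
    "AC0T\n"
)
print(find_suspect_lines(sample))
```

For the real file, pass an open file object, for example `open("pdb_seqres.txt")`, instead of the in-memory sample.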
Can you provide the initial-training and fine-tuning settings?
I mean the model config and data config, not just Table 1 in the paper.
E.g., whether the struct model is used, whether templates are enabled, and whether extra MSA is used.
Thanks.
This line should be:
return grad_input.contiguous(), None, grad_bias, None
Here is the full error I get when installing from the cloned repo:
python setup.py install
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/package/_directory_reader.py:17: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:68.)
_dtype_to_storage = {data_type(0).dtype: data_type for data_type in _storages}
torch.version = 1.10.0+cu111
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
from /usr/local/cuda-11.1/bin
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/dist.py:717: UserWarning: Usage of dash-separated 'index-url' will not be supported in future versions. Please use the underscore name 'index_url' instead
warnings.warn(
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/dist.py:487: UserWarning: Normalizing '0.1.0-beta' to '0.1.0b0'
warnings.warn(tmpl.format(**locals()))
running install
running bdist_egg
running egg_info
writing fastfold.egg-info/PKG-INFO
writing dependency_links to fastfold.egg-info/dependency_links.txt
writing requirements to fastfold.egg-info/requires.txt
writing top-level names to fastfold.egg-info/top_level.txt
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/utils/cpp_extension.py:381: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'fastfold.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'fastfold.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'fastfold_layer_norm_cuda' extension
gcc -pthread -B /root/anaconda3/envs/fastfold/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data/FastFold/fastfold/model/kernel/cuda_native/csrc/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/TH -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.1/include -I/root/anaconda3/envs/fastfold/include/python3.8 -c fastfold/model/kernel/cuda_native/csrc/layer_norm_cuda.cpp -o build/temp.linux-x86_64-3.8/fastfold/model/kernel/cuda_native/csrc/layer_norm_cuda.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=fastfold_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda-11.1/bin/nvcc -I/data/FastFold/fastfold/model/kernel/cuda_native/csrc/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/TH -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.1/include -I/root/anaconda3/envs/fastfold/include/python3.8 -c fastfold/model/kernel/cuda_native/csrc/layer_norm_cuda_kernel.cu -o build/temp.linux-x86_64-3.8/fastfold/model/kernel/cuda_native/csrc/layer_norm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -std=c++14 -maxrregcount=50 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=fastfold_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::CrossMapLRN2dImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::EmbeddingBagImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::EmbeddingImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ParameterDictImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::SequentialImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ModuleListImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ModuleDictImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::TransformerDecoderImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::TransformerEncoderImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::TransformerDecoderLayerImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::TransformerEncoderLayerImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::GroupNormImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::LocalResponseNormImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::LayerNormImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::MultiheadAttentionImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ThresholdImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::LogSoftmaxImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::SoftminImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::SoftmaxImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::GRUCellImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::LSTMCellImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::RNNCellImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::GRUImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::LSTMImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::RNNImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::FractionalMaxPool3dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::FractionalMaxPool2dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ZeroPad2dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::UnfoldImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::FoldImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ConvTranspose3dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ConvTranspose2dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::ConvTranspose1dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptr<torch::nn::Module> >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::Conv3dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::Conv2dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::Conv1dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::AdaptiveLogSoftmaxWithLossImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::BilinearImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::UnflattenImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::LinearImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1
Dear author:
I ran FastFold on a 4-GPU machine; each GPU has 24 GiB of memory.
I ran inference.py on a FASTA sequence of length 1805 AA (without Triton), with the parameter --gpus 3,
and the error printed was:
RuntimeError: CUDA out of memory. Tried to allocate 29.26 GiB (GPU 0; 23.70 GiB total capacity; 9.63 GiB already allocated; 11.79 GiB free; 10.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
My questions are:
Why is only one GPU (GPU 0, rather than GPU 0, GPU 1, and GPU 2) used when calculating the total memory? What should I do to get past this?
Is there a way to run an extremely long FASTA sequence, such as 4000 AA?
I appreciate your reply. Thank you.
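A note on the numbers in the message above: the single allocation that failed (29.26 GiB) is larger than the total capacity reported for the device (23.70 GiB), so allocator tuning such as max_split_size_mb cannot satisfy it on its own. The message also names only the device on which that allocation was attempted. A minimal sketch, using only the Python standard library, that extracts these figures from the message text; the function name parse_oom is hypothetical, and the pattern assumes the message wording quoted above:

```python
import re

# Pattern assumes the PyTorch OOM wording quoted in this report.
_OOM = re.compile(
    r"Tried to allocate (?P<req>[\d.]+) GiB \(GPU (?P<gpu>\d+); "
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<alloc>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free"
)

def parse_oom(message):
    """Return the GiB figures from an OOM message, or None if it does not match."""
    m = _OOM.search(message)
    if m is None:
        return None
    info = {k: float(v) for k, v in m.groupdict().items() if k != "gpu"}
    info["gpu"] = int(m.group("gpu"))
    # True when the one request alone is bigger than the whole device.
    info["exceeds_device"] = info["req"] > info["total"]
    return info

msg = ("RuntimeError: CUDA out of memory. Tried to allocate 29.26 GiB "
       "(GPU 0; 23.70 GiB total capacity; 9.63 GiB already allocated; "
       "11.79 GiB free; 10.65 GiB reserved in total by PyTorch)")
result = parse_oom(msg)
print(result["exceeds_device"])
```

When exceeds_device is true, the request has to shrink (for example through chunking or a shorter input) or be spread over more devices before it can fit.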
When trying to install the requirements, pip failed to install ColossalAI, which seems to be no longer available. When trying to install ColossalAI directly from ColossalAI, there is a mismatch in PyTorch versions.
Traceback (most recent call last):
File "~FastFold/inference.py", line 527, in <module>
main(args)
File "~FastFold/inference.py", line 153, in main
inference_multimer_model(args)
File "~FastFold/inference.py", line 268, in inference_multimer_model
feature_dict = data_processor.process_fasta(
File "~FastFold/fastfold/data/data_pipeline.py", line 1165, in process_fasta
chain_features = self._process_single_chain(
File "~FastFold/fastfold/data/data_pipeline.py", line 1114, in _process_single_chain
chain_features = self._monomer_data_pipeline.process_fasta(
File "~FastFold/fastfold/data/data_pipeline.py", line 942, in process_fasta
template_features = make_template_features(
File "~FastFold/fastfold/data/data_pipeline.py", line 76, in make_template_features
templates_result = template_featurizer.get_templates(
File "~FastFold/fastfold/data/templates.py", line 1163, in get_templates
result = _process_single_hit(
File "~FastFold/fastfold/data/templates.py", line 885, in _process_single_hit
"%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: "
TypeError: must be real number, not NoneType
Since the update to AlphaFold v2.3, the upstream alphafold/data/templates.py has changed. The lines linked below are now required to avoid this error.
This line does not seem correct:
row_d_input = 0;
row_d_input is a pointer; after it is zeroed, we write to an invalid location.
I also encountered this problem. https://github.com/hpcaitech/FastFold/issues/106
I get the following error message:
`
running in multimer mode...
Traceback (most recent call last):
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/templates.py", line 859, in _process_single_hit
features, realign_warning = _extract_template_features(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/templates.py", line 651, in _extract_template_features
raise TemplateAtomMaskAllZerosError(
fastfold.data.templates.TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 6qx9_A1. Residue range: 49-100
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "inference.py", line 546, in <module>
main(args)
File "inference.py", line 165, in main
inference_multimer_model(args)
File "inference.py", line 280, in inference_multimer_model
feature_dict = data_processor.process_fasta(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/data_pipeline.py", line 1165, in process_fasta
chain_features = self._process_single_chain(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/data_pipeline.py", line 1114, in _process_single_chain
chain_features = self._monomer_data_pipeline.process_fasta(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/data_pipeline.py", line 942, in process_fasta
template_features = make_template_features(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/data_pipeline.py", line 76, in make_template_features
templates_result = template_featurizer.get_templates(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/templates.py", line 1167, in get_templates
result = _process_single_hit(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/templates.py", line 888, in _process_single_hit
"%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: "
TypeError: must be real number, not NoneType
`
this here is the input file:
input.txt
In the fastfold/data/templates.py file, I changed line 884 to warning=None, which solved this problem.
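The TypeError in the traceback is what Python's %-formatting raises when a numeric conversion such as %.2f or %d receives None. A minimal sketch of the failure and one way to guard against it; the function name describe_hit and its arguments are hypothetical and are not taken from fastfold/data/templates.py:

```python
def describe_hit(name, chain, sum_probs, rank):
    """Format a template-hit label, tolerating a missing sum_probs."""
    # %.2f requires a real number, so substitute a placeholder for None.
    probs = "n/a" if sum_probs is None else "%.2f" % sum_probs
    return "%s_%s (sum_probs: %s, rank: %d)" % (name, chain, probs, rank)

# Reproduce the error from the traceback with a None value.
error_text = ""
try:
    "%s_%s (sum_probs: %.2f, rank: %d)" % ("6qx9", "A", None, 1)
except TypeError as exc:
    error_text = str(exc)

print(error_text)
print(describe_hit("6qx9", "A", None, 1))
```

Guarding the formatted value keeps the original exception (here TemplateAtomMaskAllZerosError) visible instead of masking it with a second error raised while building the message.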
Hi, I saw that you uploaded inference.py, and I assume it supports multi-GPU inference. How should I set the "--model-device" parameter? Thanks so much.
How can I fix this error? I ran the command: torchrun --nproc_per_node=1 perf.py --msa-length 128 --res-length 256. The following error then appeared.
The versions of Pytorch, Python, and CUDA are 1.10, 3.8, and 11.3, respectively.
Training in distributed mode with multiple processes, 1 GPU per process. Process 0, total 1.
initialize tensor model parallel with size 1
initialize data parallel with size 1
Traceback (most recent call last):
File "perf.py", line 191, in <module>
main()
File "perf.py", line 156, in main
layer_inputs = attn_layers[lyr_idx].forward(*layer_inputs, node_mask, pair_mask)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/fastfold-0.1.0b0-py3.8-linux-x86_64.egg/fastfold/model/evoformer.py", line 17, in forward
node = self.msa_stack(node, pair, node_mask)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/fastfold-0.1.0b0-py3.8-linux-x86_64.egg/fastfold/model/msa.py", line 99, in forward
node = self.MSARowAttentionWithPairBias(node, pair, node_mask_row)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/fastfold-0.1.0b0-py3.8-linux-x86_64.egg/fastfold/model/msa.py", line 43, in forward
Z = self.layernormZ(Z)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/fastfold-0.1.0b0-py3.8-linux-x86_64.egg/fastfold/model/kernel/cuda_native/layer_norm.py", line 69, in forward
return FusedLayerNormAffineFunction.apply(input, self.weight, self.bias,
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/fastfold-0.1.0b0-py3.8-linux-x86_64.egg/fastfold/model/kernel/cuda_native/layer_norm.py", line 22, in forward
output, mean, invvar = fastfold_layer_norm_cuda.forward_affine(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 132) of binary: /root/miniconda3/envs/myconda/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/myconda/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.10.0+cu113', 'console_scripts', 'torchrun')())
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/distributed/run.py", line 719, in main
run(args)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
perf.py FAILED
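The message "no kernel image is available for execution on the device" generally means that the compiled CUDA extension contains no code for the compute capability of the GPU in use. A minimal sketch for reporting the architecture name to compare against the -gencode flags used when the extension was built; the helper sm_name is hypothetical, and the torch import is optional so that the snippet also runs on a machine without PyTorch:

```python
def sm_name(capability):
    """Turn a (major, minor) compute capability into an nvcc arch name."""
    major, minor = capability
    return "sm_%d%d" % (major, minor)

def current_device_arch():
    """Return the arch name of GPU 0, or None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # get_device_capability returns a (major, minor) tuple, e.g. (8, 0).
    return sm_name(torch.cuda.get_device_capability(0))

print(sm_name((8, 0)))          # an A100 reports capability (8, 0)
print(current_device_arch())
```

If the reported name is missing from the build flags, rebuilding the extension with that architecture included is the usual remedy.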
Hi, I see that the README describes how to use DAP, as follows:
from fastfold.distributed import init_dap
torch.distributed.init_process_group(backend='nccl', init_method='env://')
init_dap(args.dap_size)
I want to know whether it is possible to use DAP instead of DeepSpeed in OpenFold.
Dear author:
I am trying to test FastFold after following the "Installation Using Conda" instructions (I don't think there is a command to test for a successful installation).
I ran inference.py with the following command:
#################################
conda activate fastfold
python /home/FastFold/inference.py used.fasta /database/alphafold2-data/pdb_mmcif/mmcif_files/ \
--output_dir /mydir/output \
--cpus 80 \
--gpus 3 \
--param_path /database/alphafold2-data/params/params_model_1.npz \
--uniref90_database_path /database/alphafold2-data/uniref90/uniref90.fasta \
--mgnify_database_path /database/alphafold2-data/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path /database/alphafold2-data/pdb70/pdb70 \
--uniclust30_database_path /database/alphafold2-data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path /database/alphafold2-data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--jackhmmer_binary_path /home/Software/miniconda3/envs/fastfold/bin/jackhmmer \
--hhblits_binary_path /home/Software/miniconda3/envs/fastfold/bin/hhblits \
--hhsearch_binary_path /home/Software/miniconda3/envs/fastfold/bin/hhsearch \
--kalign_binary_path /home/Software/miniconda3/envs/fastfold/bin/kalign
#################################
It seems to run correctly through the jackhmmer→hhsearch→jackhmmer→hhblits steps,
but then I get the error output shown below.
What does it indicate, and what should I do to run FastFold properly?
##########error message##################
/tmp/tmp4wm30exa/main.c:2:10: fatal error: cuda.h: No such file or directory
2 | #include "cuda.h"
| ^~~~~~~~
/tmp/tmp65558a3s/main.c:2:10: fatal error: cuda.h: No such file or directory
2 | #include "cuda.h"
| ^~~~~~~~
compilation terminated.
compilation terminated.
Traceback (most recent call last):
File "/home/FastFold/inference.py", line 513, in <module>
main(args)
File "/home/FastFold/inference.py", line 150, in main
inference_monomer_model(args)
File "/home/FastFold/inference.py", line 415, in inference_monomer_model
torch.multiprocessing.spawn(inference_model, nprocs=args.gpus, args=(args.gpus, result_q, batch, args))
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "<string>", line 21, in _layer_norm_fwd_fused
KeyError: ('2-.-0-.-0-d82511111ad128294e9d31a6ac684238-7929002797455b30efce6e41eddc6b57-3aa563e00c5c695dd945e23b09a86848-bb0203f280ee2aaa28bc6e4eff4090f3-ff946bd4b3b4a4cbdf8cedc6e1c658e0-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, 'i32', 'i32', 'fp32'), (256,), (True, True, True, True, True, True, (True, False), (True, False), (False,)))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/FastFold/inference.py", line 135, in inference_model
out = model(batch)
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/FastFold/fastfold/model/hub/alphafold.py", line 507, in forward
outputs, m_1_prev, z_prev, x_prev = self.iteration(
File "/home/FastFold/fastfold/model/hub/alphafold.py", line 232, in iteration
m_1_prev, z_prev = self.recycling_embedder(
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/FastFold/fastfold/model/fastnn/ops.py", line 1097, in forward
m_update = self.layer_norm_m(m)
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/FastFold/fastfold/model/fastnn/kernel/layer_norm.py", line 52, in forward
return self.kernel_forward(input)
File "/home/FastFold/fastfold/model/fastnn/kernel/layer_norm.py", line 56, in kernel_forward
return LayerNormTritonFunc.apply(input, self.normalized_shape, self.weight, self.bias,
File "/home/FastFold/fastfold/model/fastnn/kernel/triton/layer_norm.py", line 164, in forward
_layer_norm_fwd_fused[(M,)](
File "/home/triton/python/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "<string>", line 41, in _layer_norm_fwd_fused
File "/home/triton/python/triton/compiler.py", line 1239, in compile
so = _build(fn.name, src_path, tmpdir)
File "/home/triton/python/triton/compiler.py", line 1169, in _build
ret = subprocess.check_call(cc_cmd)
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp65558a3s/main.c', '-O3', '-I/usr/local/cuda/include', '-I/home/Software/miniconda3/envs/fastfold/include/python3.8', '-I/tmp/tmp65558a3s', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmp65558a3s/_layer_norm_fwd_fused.cpython-38-x86_64-linux-gnu.so', '-L/usr/lib/x86_64-linux-gnu']' returned non-zero exit status 1.
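The compiler error above says that gcc could not find cuda.h in the include directories it was given (the command shows -I/usr/local/cuda/include). A minimal sketch that checks a few candidate locations for the header; the function names find_cuda_header and default_candidates are hypothetical, and treating the CUDA_HOME and CONDA_PREFIX environment variables as candidates is an assumption about the local setup:

```python
import os

def find_cuda_header(candidates):
    """Return the first directory in candidates that contains cuda.h, else None."""
    for directory in candidates:
        if directory and os.path.isfile(os.path.join(directory, "cuda.h")):
            return directory
    return None

def default_candidates(env=os.environ):
    """Build a candidate list from common locations (assumed, not exhaustive)."""
    dirs = ["/usr/local/cuda/include"]
    for var in ("CUDA_HOME", "CONDA_PREFIX"):
        root = env.get(var)
        if root:
            dirs.append(os.path.join(root, "include"))
    return dirs

found = find_cuda_header(default_candidates())
print(found if found else "cuda.h not found in the candidate directories")
```

If the header turns up somewhere other than /usr/local/cuda/include, the CUDA toolkit is probably installed in a non-default location, and the build needs to be pointed at it.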
Hey there! I just wanted to flag that we recently fixed a bunch of bugs in OpenFold's template code (as of OpenFold commit #591d10d), which I noticed are also present here. These have a significant effect on the quality of predictions, especially for chains with small MSAs.
Hi! I installed the conda environment according to the README and then wrote a script to infer the protein structure. The content of the script is as follows. After submitting the script, the error below is reported. Could you please help me find out what caused it?
content of script:
#DSUB -q root.default
#DSUB -R 'cpu=12;gpu=2;mem=96000'
#DSUB -l wuhanG5500
#DSUB -N 1
#DSUB -e %J.out
#DSUB -o %J.out
module load anaconda/2020.11
module load cuda/11.5.0-gcc-4.8.5-atd
module load gcc/8.3.0-gcc-4.8.5-cpp
source activate fastfold
af2Root=/home/bingxing2/public/alphafold2.1.1
torchrun --nproc_per_node=2 ./inference.py multi.fasta $af2Root/pdb_mmcif/mmcif_files \
--output_dir ./out \
--model_name model_1 \
--param_path $af2Root/params/params_model_1_multimer.npz \
--cpus 2 \
--uniref90_database_path $af2Root/uniref90/uniref90.fasta \
--mgnify_database_path $af2Root/mgnify/mgy_clusters.fa \
--pdb70_database_path $af2Root/pdb70/pdb70 \
--uniclust30_database_path $af2Root/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path $af2Root/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--jackhmmer_binary_path `which jackhmmer` \
--hhblits_binary_path `which hhblits` \
--hhsearch_binary_path `which hhsearch` \
--kalign_binary_path `which kalign`
error:
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[06/29/2022 01:17:31 AM] INFO colossalai - colossalai - INFO: /home/bingxing
2/gpuuser001/.conda/envs/fastfold/lib/python3.
8/site-packages/colossalai/context/parallel_co
ntext.py:519 set_device
INFO colossalai - colossalai - INFO: process rank 1
is bound to device 1
[06/29/2022 01:17:31 AM] INFO colossalai - colossalai - INFO: /home/bingxing
2/gpuuser001/.conda/envs/fastfold/lib/python3.
8/site-packages/colossalai/context/parallel_co
ntext.py:519 set_device
INFO colossalai - colossalai - INFO: process rank 0
is bound to device 0
[06/29/2022 01:17:33 AM] INFO colossalai - colossalai - INFO: /home/bingxing
2/gpuuser001/.conda/envs/fastfold/lib/python3.
8/site-packages/colossalai/context/parallel_co
ntext.py:555 set_seed
INFO colossalai - colossalai - INFO: initialized
seed on rank 1, numpy: 1024, python random:
1024, ParallelMode.DATA: 1024,
ParallelMode.TENSOR: 1025,the default parallel
seed is ParallelMode.DATA.
[06/29/2022 01:17:33 AM] INFO colossalai - colossalai - INFO: /home/bingxing
2/gpuuser001/.conda/envs/fastfold/lib/python3.
8/site-packages/colossalai/context/parallel_co
ntext.py:555 set_seed
INFO colossalai - colossalai - INFO: initialized
seed on rank 0, numpy: 1024, python random:
1024, ParallelMode.DATA: 1024,
ParallelMode.TENSOR: 1024,the default parallel
seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: /home/bingxing
2/gpuuser001/.conda/envs/fastfold/lib/python3.
8/site-packages/colossalai/initialize.py:112
launch
INFO colossalai - colossalai - INFO: Distributed
environment is initialized, data parallel
size: 1, pipeline parallel size: 1, tensor
parallel size: 2
Traceback (most recent call last):
File "./inference.py", line 266, in <module>
main(args)Traceback (most recent call last):
File "./inference.py", line 266, in <module>
File "./inference.py", line 82, in main
main(args)
import_jax_weights_(model, args.param_path, version=args.model_name) File "./inference.py", line 82, in main
import_jax_weights_(model, args.param_path, version=args.model_name)
File "/home/bingxing2/gpuuser001/zhou/FastFold/fastfold/utils/import_weights.py", line 445, in import_jax_weights_
File "/home/bingxing2/gpuuser001/zhou/FastFold/fastfold/utils/import_weights.py", line 445, in import_jax_weights_
assert len(incorrect) == 0
AssertionErrorassert len(incorrect) == 0
AssertionError
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 13987) of binary: /home/bingxing2/gpuuser001/.conda/envs/fastfold/bin/python
Traceback (most recent call last):
File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')())
File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/lib/python3.8/site-packages/torch/distributed/run.py", line 724, in main
run(args)
File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/lib/python3.8/site-packages/torch/distributed/run.py", line 715, in run
elastic_launch(
File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./inference.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2022-06-29_01:17:42
host : gpu09
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 13988)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2022-06-29_01:17:42
host : gpu09
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 13987)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
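One thing worth checking in the script above: --model_name is model_1, while --param_path points to params_model_1_multimer.npz. AlphaFold parameter files are named params_<model_name>.npz, so the two do not match, which may be why the weight-import assertion fails. A minimal sketch of such a check; the function name params_match_model is hypothetical:

```python
import os

def params_match_model(param_path, model_name):
    """True when the file name follows the params_<model_name>.npz convention."""
    expected = "params_%s.npz" % model_name
    return os.path.basename(param_path) == expected

print(params_match_model("params/params_model_1_multimer.npz", "model_1"))
print(params_match_model("params/params_model_1.npz", "model_1"))
```

Running a check like this before loading weights turns a bare AssertionError into an earlier, more descriptive failure.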
Hi,
Can I use FastFold to run inference only?
For example, I don't need jackhmmer to run the MSA search, because I already have my MSA search result (an .a3m file) ready. I also have a GPU available for model inference. How should I do this with FastFold, and which arguments are needed? Thanks.
Hi, maybe this line
FastFold/fastfold/model/hub/alphafold.py
Line 321 in f55dca9
Hi, I stumbled over an issue when using monomer_ptm model. The function LinearParams used here
FastFold/fastfold/utils/import_weights.py
Line 566 in f55dca9
is defined in a different scope (L144) and cannot be accessed in this place.
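To illustrate the kind of scoping problem described (all names here are hypothetical and only mirror the report): a helper defined inside one function is local to that function, so another function cannot see it. Moving the helper to module level makes it visible to both.

```python
def build_monomer():
    # Helper defined locally: visible only inside build_monomer.
    def local_params(layer):
        return {"weights": layer}
    return local_params("linear")

def build_ptm_broken():
    # local_params is neither in this function's scope nor at module level.
    return local_params("tm")

def shared_params(layer):
    """Module-level helper, visible to every function in the module."""
    return {"weights": layer}

def build_ptm_fixed():
    return shared_params("tm")

try:
    build_ptm_broken()
except NameError as exc:
    print("NameError:", exc)

print(build_ptm_fixed())
```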
Hello, how can I fix this error? I ran the following command, which looks correct to me: torchrun --nproc_per_node=4 inference.py ../testFasta/PsCrtW-HpCrtZ.fasta ../databases/pdb_mmcif/mmcif_files/ --output_dir ./output_4gpu --uniref90_database_path ../databases/uniref90/uniref90.fasta --mgnify_database_path ../databases/mgnify/mgy_clusters_2018_12.fa --pdb70_database_path ../databases/pdb70/pdb70 --param_path ../databases/params/params_model_1.npz --model_name model_1 --cpus 24 --uniclust30_database_path ../databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08 --bfd_database_path ../databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt --jackhmmer_binary_path `which jackhmmer` --hhblits_binary_path `which hhblits` --hhsearch_binary_path `which hhsearch` --kalign_binary_path `which kalign`
[07/25/22 10:45:46] INFO colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
ges/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 2 is bound to device 2
[07/25/22 10:45:46] INFO colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
ges/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 0 is bound to device 0
[07/25/22 10:45:46] INFO colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
ges/colossalai/context/parallel_context.py:521 set_device
[07/25/22 10:45:46] INFO colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
ges/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 1 is bound to device 1
INFO colossalai - colossalai - INFO: process rank 3 is bound to device 3
[07/25/22 10:45:50] INFO colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
ges/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python
random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default parallel
seed is ParallelMode.DATA.
[07/25/22 10:45:50] INFO colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
ges/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO:
/anaconda/envs/fastfold_py38/lib/python3.8/site-packages/colossalai/initialize.py:117
launch
INFO colossalai - colossalai - INFO: initialized seed on rank 1, numpy: 1024, python
random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1025,the default parallel
seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel
size: 1, pipeline parallel size: 1, tensor parallel size: 4
[07/25/22 10:45:50] INFO colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
ges/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 2, numpy: 1024, python
random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1026,the default parallel
seed is ParallelMode.DATA.
[07/25/22 10:45:50] INFO colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
ges/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 3, numpy: 1024, python
random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1027,the default parallel
seed is ParallelMode.DATA.
Generating features...
[07/25/22 10:46:12] INFO colossalai - root - INFO: Launching subprocess
"/anaconda/envs/fastfold_py38/bin/jackhmmer -o /dev/null -A
/tmp/tmphz8bnmj6/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001
-E 0.0001 --cpu 24 -N 1 ./output_4gpu/tmp.fasta ../databases/uniref90/uniref90.fasta"
INFO colossalai - root - INFO: Started Jackhmmer (uniref90.fasta) query
[07/25/22 11:11:02] INFO colossalai - root - INFO: Finished Jackhmmer (uniref90.fasta) query in 1489.951
seconds
INFO colossalai - root - INFO: Launching subprocess
"/anaconda/envs/fastfold_py38/bin/hhsearch -i /tmp/tmp8xdycgi4/query.a3m -o
/tmp/tmp8xdycgi4/output.hhr -maxseq 1000000 -cpu 24 -d ../databases/pdb70/pdb70"
INFO colossalai - root - INFO: Started HHsearch query
[07/25/22 11:11:43] INFO colossalai - root - INFO: Finished HHsearch query in 41.348 seconds
[07/25/22 11:11:44] INFO colossalai - root - INFO: Launching subprocess
"/anaconda/envs/fastfold_py38/bin/jackhmmer -o /dev/null -A
/tmp/tmpnul0i_bi/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001
-E 0.0001 --cpu 24 -N 1 ./output_4gpu/tmp.fasta
../databases/mgnify/mgy_clusters_2018_12.fa"
INFO colossalai - root - INFO: Started Jackhmmer (mgy_clusters_2018_12.fa) query
[E ProcessGroupNCCL.cpp:719] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3, OpType=BROADCAST, Timeout(ms)=1800000) ran for 1804531 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:719] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3, OpType=BROADCAST, Timeout(ms)=1800000) ran for 1804731 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:719] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3, OpType=BROADCAST, Timeout(ms)=1800000) ran for 1804734 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:406] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:406] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:406] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 21309 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 1 (pid: 21310) of binary: /anaconda/envs/fastfold_py38/bin/python
Traceback (most recent call last):
File "/anaconda/envs/fastfold_py38/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')())
File "/anaconda/envs/fastfold_py38/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/anaconda/envs/fastfold_py38/lib/python3.8/site-packages/torch/distributed/run.py", line 724, in main
run(args)
File "/anaconda/envs/fastfold_py38/lib/python3.8/site-packages/torch/distributed/run.py", line 715, in run
elastic_launch(
File "/anaconda/envs/fastfold_py38/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/anaconda/envs/fastfold_py38/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
======================================================
inference.py FAILED
------------------------------------------------------
Failures:
[1]:
time : 2022-07-25_11:16:19
host : localhost
rank : 2 (local_rank: 2)
exitcode : -6 (pid: 21311)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 21311
[2]:
time : 2022-07-25_11:16:19
host : localhost
rank : 3 (local_rank: 3)
exitcode : -6 (pid: 21312)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 21312
------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2022-07-25_11:16:19
host : localhost
rank : 1 (local_rank: 1)
exitcode : -6 (pid: 21310)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 21310
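A note on reading the log above: the watchdog lines report a BROADCAST that waited about 1804 s against a 1800 s limit, while the Jackhmmer query against uniref90 alone took 1489.951 s. This suggests, though the log does not prove it, that ranks 1-3 were blocked waiting for another rank that was still generating features. Below is a minimal sketch that pulls these numbers out of the watchdog lines using only the standard library. The name `parse_watchdog_timeouts` is hypothetical and is not part of FastFold or PyTorch.

```python
import re

# Matches the watchdog lines printed by ProcessGroupNCCL in the log above.
WATCHDOG_RE = re.compile(
    r"\[Rank (?P<rank>\d+)\] Watchdog caught collective operation timeout: "
    r"WorkNCCL\(SeqNum=(?P<seq>\d+), OpType=(?P<op>\w+), "
    r"Timeout\(ms\)=(?P<limit>\d+)\) ran for (?P<ran>\d+) milliseconds"
)


def parse_watchdog_timeouts(log_text):
    """Return one dict per watchdog timeout line found in log_text."""
    events = []
    for m in WATCHDOG_RE.finditer(log_text):
        events.append({
            "rank": int(m["rank"]),
            "op": m["op"],
            "limit_s": int(m["limit"]) / 1000,  # configured timeout, seconds
            "ran_s": int(m["ran"]) / 1000,      # time actually waited, seconds
        })
    return events


# Two of the lines from the log above, copied verbatim.
log = """
[E ProcessGroupNCCL.cpp:719] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3, OpType=BROADCAST, Timeout(ms)=1800000) ran for 1804531 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:719] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3, OpType=BROADCAST, Timeout(ms)=1800000) ran for 1804734 milliseconds before timing out.
"""

events = parse_watchdog_timeouts(log)
for e in events:
    print(f"rank {e['rank']}: {e['op']} waited {e['ran_s']:.1f} s "
          f"(limit {e['limit_s']:.0f} s)")
```

If the wait really is feature generation, two things might help: precomputing the alignments (a `--use_precomputed_alignments` flag appears in another report on this page), or raising the `timeout` argument of `torch.distributed.init_process_group`. Whether FastFold exposes that timeout is not confirmed here.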
Hi,
I tried running the inference scripts on T1050.fasta (779 residues) on an NVIDIA A100. The following table shows the inference time.
| Type | Inference time |
|---|---|
| FastFold | 262.14789 s |
| OpenFold | 124.69651 s |
This differs from the results in the paper: FastFold is slower than OpenFold here. Is this expected?
Thanks.
Hi,
I am running the latest release of FastFold using this command:
python inference.py input.fa database/pdb_mmcif/mmcif_files/ \
--output_dir output/ \
--gpus 1 \
--model_preset multimer \
--uniref90_database_path database/uniref90/uniref90.fasta \
--mgnify_database_path database/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path database/pdb70/pdb70 \
--uniclust30_database_path database/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path database/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniprot_database_path database/uniprot/uniprot_sprot.fasta \
--pdb_seqres_database_path database/pdb_seqres/pdb_seqres.txt \
--param_path database/params/params_model_1_multimer_v2.npz \
--model_name model_1_multimer_v2 \
--jackhmmer_binary_path `which jackhmmer` \
--hhblits_binary_path `which hhblits` \
--hhsearch_binary_path `which hhsearch` \
--kalign_binary_path `which kalign`
I came across a problem during the template generation. I get the following error message:
[11/03/22 14:41:37] INFO colossalai - root - INFO: Invalid resolution format: ['.']
INFO colossalai - root - INFO: Found an exact template match 6v8o_I.
Traceback (most recent call last):
File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/templates.py", line 859, in _process_single_hit
features, realign_warning = _extract_template_features(
File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/templates.py", line 651, in _extract_template_features
raise TemplateAtomMaskAllZerosError(
fastfold.data.templates.TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 6v8o_I. Residue range: 415-475
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "inference.py", line 513, in <module>
main(args)
File "inference.py", line 148, in main
inference_multimer_model(args)
File "inference.py", line 263, in inference_multimer_model
feature_dict = data_processor.process_fasta(
File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/data_pipeline.py", line 1165, in process_fasta
chain_features = self._process_single_chain(
File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/data_pipeline.py", line 1114, in _process_single_chain
chain_features = self._monomer_data_pipeline.process_fasta(
File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/data_pipeline.py", line 942, in process_fasta
template_features = make_template_features(
File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/data_pipeline.py", line 76, in make_template_features
templates_result = template_featurizer.get_templates(
File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/templates.py", line 1163, in get_templates
result = _process_single_hit(
File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/templates.py", line 885, in _process_single_hit
"%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: "
TypeError: must be real number, not NoneType
The lines in the hhblits output corresponding to the ID raising the error are:
hmm_output.sto:#=GS 6v8o_I/416-476 DE [subseq from] mol:protein length:557 Chromatin structure-remodeling complex protein RSC8
hmm_output.sto:6v8o_I/416-476 -----EISEKYIEESQAIIQEL.VKLTMEKLESKF.TKLCDLETQlEMEKLKYVKES..eK.M.lN...D....RLSLS-....--------......-.--------..--....------------------------------------------------------------------------------------
hmm_output.sto:#=GR 6v8o_I/416-476 PP .....56799************.************.**9998865378888876554..24.4.25...5....65555.........................................................................................................................
hmm_output.sto:6v8o_I/416-476 -----
hmm_output.sto:#=GR 6v8o_I/416-476 PP .....
Is this a bug, or is the problem on my side?
Another question: is it correct to change the parameter file name to params_model_1_multimer_v2.npz? Your README uses params_model_1_multimer.npz, but that file is not included in the downloaded parameter tar file.
All the best and thank you,
Dominik
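The final `TypeError` in the traceback above is consistent with a `%.2f` placeholder receiving `None`. The log message in `_process_single_hit` formats `sum_probs` with `%.2f`, and the error text is what Python raises in that case. Here is a minimal sketch that reproduces the message and shows one defensive way to format it. `format_hit_error` and its arguments are hypothetical and are not FastFold's code, and the assumption that `sum_probs` is the `None` value is not confirmed by the report.

```python
def format_hit_error(pdb_id, chain_id, sum_probs, rank, error):
    """Build a template-hit error message that tolerates sum_probs=None.

    Hypothetical helper for illustration; not part of FastFold.
    """
    # Assumption: some hits carry no sum_probs value at all.
    probs = "n/a" if sum_probs is None else "%.2f" % sum_probs
    return "%s_%s (sum_probs: %s, rank: %d): feature extracting errors: %s" % (
        pdb_id, chain_id, probs, rank, error,
    )


# Reproduce the failure mode: %.2f cannot format None.
try:
    "%s_%s (sum_probs: %.2f, rank: %d)" % ("6v8o", "I", None, 0)
except TypeError as exc:
    reproduced = str(exc)
    print("reproduced:", reproduced)

# The guarded version produces a message instead of raising.
message = format_hit_error(
    "6v8o", "I", None, 0, "Template all atom mask was all zeros"
)
print(message)
```

With a guard like this, the original `TemplateAtomMaskAllZerosError` would be reported as intended instead of being masked by a second exception raised while building the message.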
When I try to run inference on a multimer sequence, I get the following error after the alignment and template search finish:
Traceback (most recent call last):
File "inference.py", line 513, in <module>
main(args)
File "inference.py", line 148, in main
inference_multimer_model(args)
File "inference.py", line 268, in inference_multimer_model
processed_feature_dict = feature_processor.process_features(
File "/home/FastFold/fastfold/data/feature_pipeline.py", line 124, in process_features
return np_example_to_features(
File "/home/FastFold/fastfold/data/feature_pipeline.py", line 96, in np_example_to_features
features = input_pipeline_multimer.process_tensors_from_config(
File "/home/FastFold/fastfold/data/input_pipeline_multimer.py", line 107, in process_tensors_from_config
tensors = compose(nonensembled)(tensors)
File "/home/FastFold/fastfold/data/data_transforms.py", line 76, in <lambda>
return lambda x: f(x, *args, **kwargs)
File "/home/FastFold/fastfold/data/input_pipeline_multimer.py", line 124, in compose
x = f(x)
File "/home/FastFold/fastfold/data/data_transforms_multimer.py", line 298, in make_msa_profile
batch['msa_mask'][..., None],
KeyError: 'msa_mask'
I've tried several sequences but get the same error. The sequence is:
7M5F_1|Chain A|CdiI|Serratia marcescens (615)
MKEIKLMADYHCYPLWGTTPDDFGDISPDELPISLGLKNSLEAWAKRYDAILNTDDPALSGFKSVEEEKLFIDDGYKLAELLQEELGSAYKVIYHADY
7M5F_2|Chain B[auth C]|Toxin CdiA|Serratia marcescens (615)
MHHHHHHENLYFQSNAAKNSLTTKSLFKEMTIQGIKFTPENVVGAAKDNSGKIIFLEKGNSKSGLQHIVEEHGDQFAQIGVSEARIPDVVMKAVTDGKIVGYQGAGAGRPIYETMIDGKKYNIAVTVGSNGYVVGANLRGSVK
Hi!
I am running a few test cases and encountered some problems that I think are related to processing the templates.
I have a working example that led to a pdb output:
python /global/scratch/users/skyungyong/Software/FastFold/inference.py --output_dir ./ --model_preset multimer --use_precomputed_alignments Alignments --enable_workflow --inplace --param_path /global/scratch/users/skyungyong/Software/FastFold/data/params/params_model_1_multimer_v3.npz -model_name model_1_multimer AT3G18790-AT3G18790.fasta /global/scratch/users/skyungyong/Software/alphafold-multimer-v2.2.2-080922/Database/pdb_mmcif/mmcif_files/
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
running in multimer mode...
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
[02/21/23 20:04:34] INFO colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 0 is bound to device 0
[02/21/23 20:04:35] INFO colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/initialize.py:116 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 1
Inference time: 144.00418956391513
These are some of the runs that produced error messages:
python /global/scratch/users/skyungyong/Software/FastFold/inference.py --output_dir ./ --model_preset multimer --use_precomputed_alignments Alignments --enable_workflow --inplace --param_path /global/scratch/users/skyungyong/Software/FastFold/data/params/params_model_1_multimer_v3.npz --model_name model_1_multimer AT1G23170-AT1G23170.fasta /global/scratch/users/skyungyong/Software/alphafold-multimer-v2.2.2-080922/Database/pdb_mmcif/mmcif_files/
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
running in multimer mode...
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
[02/21/23 20:16:03] INFO colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 0 is bound to device 0
[02/21/23 20:16:12] INFO colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/initialize.py:116 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 1
Traceback (most recent call last):
File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 548, in <module>
main(args)
File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 164, in main
inference_multimer_model(args)
File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 293, in inference_multimer_model
torch.multiprocessing.spawn(inference_model, nprocs=args.gpus, args=(args.gpus, result_q, batch, args))
File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 151, in inference_model
out = model(batch)
File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/model/hub/alphafold.py", line 522, in forward
outputs, m_1_prev, z_prev, x_prev = self.iteration(
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/model/hub/alphafold.py", line 270, in iteration
template_embeds = self.template_embedder(
File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/model/fastnn/embedders_multimer.py", line 368, in forward
self.template_single_embedder(
File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/model/fastnn/embedders_multimer.py", line 238, in forward
all_atom_multimer.compute_chi_angles(
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/utils/all_atom_multimer.py", line 441, in compute_chi_angles
chi_angle_atoms_mask = torch.prod(chi_angle_atoms_mask, dim=-1)
RuntimeError: CUDA driver error: invalid argument
python /global/scratch/users/skyungyong/Software/FastFold/inference.py --output_dir ./ --model_preset multimer --use_precomputed_alignments Alignments --enable_workflow --inplace --param_path /global/scratch/users/skyungyong/Software/FastFold/data/params/params_model_1_multimer_v3.npz --model_name model_1_multimer AT1G13220-AT1G13220.fasta /global/scratch/users/skyungyong/Software/alphafold-multimer-v2.2.2-080922/Database/pdb_mmcif/mmcif_files/
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
running in multimer mode...
Traceback (most recent call last):
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/templates.py", line 859, in _process_single_hit
features, realign_warning = _extract_template_features(
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/templates.py", line 651, in _extract_template_features
raise TemplateAtomMaskAllZerosError(
fastfold.data.templates.TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 6zmi_CE. Residue range: 4-304
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 548, in <module>
main(args)
File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 164, in main
inference_multimer_model(args)
File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 281, in inference_multimer_model
feature_dict = data_processor.process_fasta(
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/data_pipeline.py", line 1165, in process_fasta
chain_features = self._process_single_chain(
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/data_pipeline.py", line 1114, in _process_single_chain
chain_features = self._monomer_data_pipeline.process_fasta(
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/data_pipeline.py", line 942, in process_fasta
template_features = make_template_features(
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/data_pipeline.py", line 76, in make_template_features
templates_result = template_featurizer.get_templates(
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/templates.py", line 1166, in get_templates
result = _process_single_hit(
File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/templates.py", line 888, in _process_single_hit
"%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: "
TypeError: must be real number, not NoneType
Are these caused by problems with the homologous templates themselves, or is there a fix for this?
Thank you!
The last commit moved loss.py, but this script still references the old location:
FastFold/fastfold/relax/amber_minimize.py
Line 26 in ceee81d