hpcaitech / fastfold Goto Github PK

View Code? Open in Web Editor NEW

557.0 17.0 86.0 1.29 MB

Optimizing AlphaFold Training and Inference on GPU Clusters

License: Apache License 2.0

Python 91.74% C 0.02% Cuda 4.18% C++ 2.15% Shell 1.88% Dockerfile 0.03%

alphafold2 protein-structure pytorch gpu evoformer parallelism protein-folding cuda habana-gaudi

fastfold's People

Contributors

Stargazers

Watchers

Forkers

binmakeswell yuxuan-lou fastalgo chivee ver217 sze-qq liujuncheng helena2021lc linghuichen superxiang tron-x xinchunran fangliangji linhduongtuan hutao965 webclinic017 weiplanet qiuzhuang aerinko082 mengyanyuan luxpal khatvangi joskid shenghusang seanren666 daasin foss-archives dongcf jnbai517 sailfish009 ii0 marcus-arcadius plasmas ycsos robotcator xgreat8 fazziekey jeteveux researchmore asclepiusinformatica oahzxl clvnmng kashyapchhatbar leozhao-intel bkbonde rnaimehaom gy-lu double-vin ai-framwork xiaojun207 00mjk hbcbh1999 techthiyanes ai-and-ml camille7777 lawchingman oceantalk isuyu mcale6 nikolahuang fruitboy1226 tiandiao123 kiri2002 lifanwww jangocheng williamlisci wurining 5l1v3r1 applib-sg applib-sg-1009 findbug2019 chuhan-ouyang lihuibng strategist922 ccoulombe kimbioinfostudio fengeuler zalaivankov tzhang-nmdp carolzhangxy machinelearningsystem conglesolutionx

fastfold's Issues

Confidence metrics

It looks like that "predicted aligned error" and pTM are not included in the prediction_results dict in case of openfold/fastfold. Is this planned to be added?

Illegal memory access error after get_chi_atom_indices

I tried to predict a 5 subunit complex (in total ~5000 aa) and get the following error with various settings (1-4x A100 80GB, w/ and w/o --inplace, w/ and w/o --chunk_size 1-32). The error seems to be associated with exceeding the GPU memory and I am not sure if this is normal at the given sequence length and available GPU memory. I installed fastfold from the recent commit 930a58a into a clean conda environment and built triton from source. For a smaller complex (~2000 aa) it ran without errors.

terminate called after throwing an instance of 'c10::Error'
  what():  NCCL error in: /opt/conda/conda-bld/pytorch_1659484810403/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:173, unhandled cuda error, NCCL version 2.10.3
Process Group destroyed on rank 1
Exception raised from ncclCommAbort at /opt/conda/conda-bld/pytorch_1659484810403/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:173 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f43cf264497 in .../fastfold/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f43cf23bc94 in .../fastfold/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x19ea61 (0x7f44092e2a61 in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x118 (0x7f44092c6098 in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x9 (0x7f44092c6369 in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #5: <unknown function> + 0x9d7799 (0x7f440f4fd799 in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x354732 (0x7f440ee7a732 in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x3555ff (0x7f440ee7b5ff in .../fastfold/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x116878 (0x55a0881ca878 in .../fastfold/bin/python3)
frame #9: <unknown function> + 0x11699d (0x55a0881ca99d in .../fastfold/bin/python3)
frame #10: <unknown function> + 0x1fd471 (0x55a0882b1471 in .../fastfold/bin/python3)
frame #11: <unknown function> + 0x10e937 (0x55a0881c2937 in .../fastfold/bin/python3)
frame #12: _PyGC_CollectNoFail + 0x2b (0x55a0882b134b in .../fastfold/bin/python3)
frame #13: PyImport_Cleanup + 0x371 (0x55a0882b11b1 in .../fastfold/bin/python3)
frame #14: Py_FinalizeEx + 0x7a (0x55a0882aff9a in .../fastfold/bin/python3)
frame #15: Py_Exit + 0x8 (0x55a0881454bc in .../fastfold/bin/python3)
frame #16: <unknown function> + 0x9141b (0x55a08814541b in .../fastfold/bin/python3)
frame #17: <unknown function> + 0x910ee (0x55a0881450ee in .../fastfold/bin/python3)
frame #18: PyRun_SimpleStringFlags + 0x4a (0x55a088141f12 in .../fastfold/bin/python3)
frame #19: Py_RunMain + 0x27b (0x55a0882abc1b in .../fastfold/bin/python3)
frame #20: Py_BytesMain + 0x39 (0x55a088283619 in .../fastfold/bin/python3)
frame #21: __libc_start_main + 0xf5 (0x7f444b239555 in /lib64/libc.so.6)
frame #22: <unknown function> + 0x1cf525 (0x55a088283525 in .../fastfold/bin/python3)

Traceback (most recent call last):
  File ".../.../FastFold/inference.py", line 519, in <module>
    main(args)
  File ".../.../FastFold/inference.py", line 149, in main
    inference_multimer_model(args)
  File ".../.../FastFold/inference.py", line 282, in inference_multimer_model
    torch.multiprocessing.spawn(inference_model, nprocs=args.gpus, args=(args.gpus, result_q, batch, args))
  File ".../fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File ".../fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File ".../fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedExceptio

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File ".../fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File ".../.../FastFold/inference.py", line 136, in inference_model
    out = model(batch)
  File ".../fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../.../FastFold/fastfold/model/hub/alphafold.py", line 507, in forward
    outputs, m_1_prev, z_prev, x_prev = self.iteration(
  File ".../.../FastFold/fastfold/model/hub/alphafold.py", line 264, in iteration
    template_embeds = self.template_embedder(
  File ".../fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../.../FastFold/fastfold/model/fastnn/embedders_multimer.py", line 351, in forward
    self.template_single_embedder(
  File ".../fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../.../FastFold/fastfold/model/fastnn/embedders_multimer.py", line 238, in forward
    all_atom_multimer.compute_chi_angles(
  File ".../.../FastFold/fastfold/utils/all_atom_multimer.py", line 403, in compute_chi_angles
    chi_atom_indices = get_chi_atom_indices(aatype.device)
  File ".../.../FastFold/fastfold/utils/all_atom_multimer.py", line 365, in get_chi_atom_indices
    return torch.tensor(chi_atom_indices, device=device)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

How can I test 5 models at once

hello，I am very happy to see that fastfold has added the multimer function, but I have a problem. When using the monomer function, I still cannot predict 5 models at once. Is there a solution for this?
here is my scripts：

#!/bin/bash
#DSUB --job_type cosched
#DSUB -n fastfold
#DSUB -A root.bingxing2.gpuuser001
#DSUB -q root.default
#DSUB -R 'cpu=12;gpu=2;mem=90000'
#DSUB -l wuhanG5500
#DSUB -N 1
#DSUB -e %J.out
#DSUB -o %J.out
######################查看gpu利用率################################################
STATE_FILE="state_${BATCH_JOB_ID}"
/usr/bin/touch ${STATE_FILE}
function gpus_collection(){
while [[ cat "${STATE_FILE}" | grep "over" | wc -l == "0" ]]; do
/usr/bin/sleep 1
/usr/bin/nvidia-smi >> "gpu_${BATCH_JOB_ID}.log"
done
}
gpus_collection &
#####################AF2计算部分###################################################
module load anaconda/2021.11
module load cuda/11.3.0-gcc-4.8.5-oaa
module load gcc/9.3.0-gcc-4.8.5-bxl
source activate fastfold
af2Root=/home/bingxing2/public

add '--gpus [N]' to use N gpus for inference

add '--enable_workflow' to use parallel workflow for data processing

add '--use_precomputed_alignments [path_to_alignments]' to use precomputed msa

add '--chunk_size [N]' to use chunk to reduce peak memory

add '--inplace' to use inplace to save memory

python inference.py mono.fasta $af2Root/alphafold2.2.0/pdb_mmcif/mmcif_files
--output_dir ./mono_out
--uniref90_database_path $af2Root/uniref90/uniref90.fasta
--mgnify_database_path $af2Root/mgnify/mgy_clusters.fa
--pdb70_database_path $af2Root/pdb70/pdb70
--param_path $af2Root/alphafold2.2.0/params/params_model_1.npz
--uniclust30_database_path $af2Root/uniclust30/uniclust30_2018_08/uniclust30_2018_08
--bfd_database_path $af2Root/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
--jackhmmer_binary_path which jackhmmer
--hhblits_binary_path which hhblits
--hhsearch_binary_path which hhsearch
--kalign_binary_path which kalign
--gpus 2
--enable_workflow
--chunk_size 1
--inplace
echo "over" >> "${STATE_FILE}"

Is the use of template a must for running FastFold?

Hi, I'd like to run FastFold for a long sequence with no templates. Is this setting supported by FastFold?

I'm building an MSA only from Uniref90 DB for starters.

Error when model parameters from AlphaFold v2.3.0 (multimer_v3.npz) are used

The most recent update from AlphaFold v2.3.0 includes updated parameters

params_model_1_multimer_v3.npz
params_model_2_multimer_v3.npz
params_model_3_multimer_v3.npz
params_model_4_multimer_v3.npz
params_model_5_multimer_v3.npz

Running inference.py using these update parameters (v3) throws the following error. The same command is successful for parameters from previous versions.

Multimer command

python ~/FastFold/inference.py multimer_query.fasta \
        ~/alphafold-2.3.0_data/pdb_mmcif/mmcif_files/ \
        --use_precomputed_alignments ./alignments \
        --output_dir ./multimer_query_fastfold_v3 \
        --gpus 1 --model_preset multimer \
        --uniref90_database_path ~/alphafold-2.3.0_data/uniref90/uniref90.fasta \
        --mgnify_database_path ~/alphafold-2.3.0_data/mgnify/mgy_clusters_2022_05.fa \
        --pdb70_database_path ~/alphafold-2.3.0_data/pdb70/pdb70 \
        --uniclust30_database_path ~/alphafold-2.3.0_data/uniref30/UniRef30_2021_03 \
        --bfd_database_path ~/alphafold-2.3.0_data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
        --uniprot_database_path ~/alphafold-2.3.0_data/uniprot/uniprot.fasta \
        --pdb_seqres_database_path ~/alphafold-2.3.0_data/pdb_seqres/pdb_seqres.txt  \
        --param_path ~/alphafold-2.3.0_data/params/params_model_1_multimer_v3.npz \
        --model_name model_1_multimer_v3 \
        --jackhmmer_binary_path `which jackhmmer` \
        --hhblits_binary_path `which hhblits` \
        --hhsearch_binary_path `which hhsearch` \
        --kalign_binary_path `which kalign` \
        --chunk_size 8 --inplace

Error is pasted below

[12/22/22 13:28:14] INFO     colossalai - colossalai - INFO: ~/conda/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557
                             set_seed
                    INFO     colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default
                             parallel seed is ParallelMode.DATA.
                    INFO     colossalai - colossalai - INFO: ~/conda/envs/fastfold/lib/python3.8/site-packages/colossalai/initialize.py:117 launch
                    INFO     colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 1
Traceback (most recent call last):
  File "~/FastFold/inference.py", line 513, in <module>
    main(args)
  File "~/FastFold/inference.py", line 148, in main
    inference_multimer_model(args)
  File "~/FastFold/inference.py", line 276, in inference_multimer_model
    torch.multiprocessing.spawn(inference_model, nprocs=args.gpus, args=(args.gpus, result_q, batch, args))
  File "~/conda/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "~/conda/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "~/conda/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "~/conda/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "~/FastFold/inference.py", line 123, in inference_model
    import_jax_weights_(model, args.param_path, version=args.model_name)
  File "~/FastFold/fastfold/utils/import_weights.py", line 580, in import_jax_weights_
    assert len(incorrect) == 0
AssertionError

No module named 'fastfold_softmax_cuda'

I followed the installation instruction on anaconda but receive an error on fastfold_softmax_cuda. The machine has cuda version 11.6.2 installed.

Colossalai should be built with cuda extension to use the FP16 optimizer                                                                                                                                           
If you want to activate cuda mode for MoE, please install with cuda_ext!                                                                                                                                           
Traceback (most recent call last):                                                                                                                                                                                 
  File "inference.py", line 25, in <module>                                                                                                                                                                        
    from fastfold.model.hub import AlphaFold                                                                                                                                                                       
  File "/scratch/FastFold/fastfold/model/hub/__init__.py", line 1, in <module>                                                                                                                                     
    from .alphafold import AlphaFold                                                                                                                                                                               
  File "/scratch/FastFold/fastfold/model/hub/alphafold.py", line 20, in <module>
    from fastfold.utils.feats import (
  File "/scratch/FastFold/fastfold/utils/__init__.py", line 1, in <module>
    from .inject_fastnn import inject_fastnn
  File "/scratch/FastFold/fastfold/utils/inject_fastnn.py", line 9, in <module>
    from fastfold.model.fastnn import MSAStack, OutProductMean, PairStack
  File "/scratch/FastFold/fastfold/model/fastnn/__init__.py", line 1, in <module>
    from .msa import MSAStack
  File "/scratch/FastFold/fastfold/model/fastnn/msa.py", line 6, in <module>
    from fastfold.model.fastnn.kernel import LayerNorm
  File "/scratch/FastFold/fastfold/model/fastnn/kernel/__init__.py", line 3, in <module>
    from .cuda_native.softmax import softmax, scale_mask_softmax, scale_mask_bias_softmax
  File "/scratch/FastFold/fastfold/model/fastnn/kernel/cuda_native/softmax.py", line 7, in <module>
    fastfold_softmax_cuda = importlib.import_module("fastfold_softmax_cuda")
  File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'fastfold_softmax_cuda'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2184436) of binary: /share/siegellab/software/kschu/anaconda3/envs/fastfold/bin/python
Traceback (most recent call last):
  File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')())
  File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/run.py", line 724, in main
    run(args)
  File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/run.py", line 715, in run
    elastic_launch(
  File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/share/siegellab/software/kschu/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
inference.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-05-04_06:49:56
  host      : kakawa-1
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2184436)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

About the memory consumption

Hi, When i read the support information of AlphaFold2, I got confused about the "1.11.8 Reducing the memory consumption". It said that when using the technique called gradient checkpoint, the memory consumption can be reduced to square size from cubic size when training. And when making inference, the set of the chunk of layers can also change the memory from cubic size into square size. I don't know why this can be done? Can anyone give me a hand?

CUDA out of memory problem

Hi，i was tring to test H1044.fasta on NVIDIA A100，when i set the nproc_per_node=2，i met the memory problem as follows：

RuntimeError: CUDA out of memory. Tried to allocate 77.40 GiB (GPU 0; 38.61 GiB total capacity; 13.21 GiB already allocated; 8.81 GiB free; 26.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Is that normal？Thanks for your reply ^^

Differences between fastfold and alphfold about databases

Hello.
I need to predict a protein of length 2038aa, alphafold can't satisfy my needs, but I have alphafold's database, can I use it directly for FastFold?

data problem when use dap

I'm confused about dap,

Can the parameter dap_size only take 2? means row and column？
Is the input data complete or do I need to divide the data by dap_size as input？
thanks

Consider leveraging deepspeed

https://github.com/microsoft/DeepSpeed
For even faster training :)

About the inference performance

Your work surely achieved a high performance in training of AlphaFold2. i am wondering about the accuracy of your inference results. Have you made a benchmark of this part with AlphaFold2? Or can you publish your models for us to make a use of.

Save prediction output pickle file

Enable saving .pkl files as they contain pLDDT and pTM values. This information is useful in downstream analysis.

Multimer model scores

Hi!

Alphafold's multimer model produces iptm scores that are saved within the pickle file as below:

distogram
experimentally_resolved
masked_msa
predicted_aligned_error
predicted_lddt
structure_module
plddt
aligned_confidence_probs
max_predicted_aligned_error
ptm
iptm
ranking_confidence

However, it seems like fastfold's output does not contain this data

msa
pair
single
sm
final_atom_positions
final_atom_mask
final_affine_tensor
lddt_logits
plddt
distogram_logits
masked_msa_logits
experimentally_resolved_logits
tm_logits
predicted_tm_score
aligned_confidence_probs
predicted_aligned_error
max_predicted_aligned_error

Is it possible to get the iptm scores?

Thanks!

problem with multimer mode, ValueError: Could not parse description:

running in multimer mode...
Finished running alignment for 1_1
Finished running alignment for 1_2
Finished running alignment for 1_3
Finished running alignment for 1_4
Traceback (most recent call last):
File "inference.py", line 548, in
main(args)
File "inference.py", line 164, in main
inference_multimer_model(args)
File "inference.py", line 281, in inference_multimer_model
feature_dict = data_processor.process_fasta(
File "/home/fy/FastFold-main/fastfold/data/data_pipeline.py", line 1165, in process_fasta
chain_features = self._process_single_chain(
File "/home/fy/FastFold-main/fastfold/data/data_pipeline.py", line 1114, in _process_single_chain
chain_features = self._monomer_data_pipeline.process_fasta(
File "/home/fy/FastFold-main/fastfold/data/data_pipeline.py", line 936, in process_fasta
hits = self._parse_template_hits(
File "/home/fy/FastFold-main/fastfold/data/data_pipeline.py", line 884, in _parse_template_hits
hits = parsers.parse_hmmsearch_sto(
File "/home/fy/FastFold-main/fastfold/data/parsers.py", line 656, in parse_hmmsearch_sto
template_hits = parse_hmmsearch_a3m(
File "/home/fy/FastFold-main/fastfold/data/parsers.py", line 627, in parse_hmmsearch_a3m
metadata = _parse_hmmsearch_description(hit_description)
File "/home/fy/FastFold-main/fastfold/data/parsers.py", line 589, in _parse_hmmsearch_description
raise ValueError(f'Could not parse description: "{description}".')
ValueError: Could not parse description: "0000|3jqh_A/8-65 [subseq from] mol:protein length:167 C-type lectin domain family 4 member M".

CUDA error (Illegal memory access)

I installed FastFold from a fresh clone according to the instructions in the README and ran the following:

import torch
from fastfold.model.fastnn.kernel import softmax


seq = 2500
h, n, c = 8, 384, 2
dtype = torch.float32
q = torch.rand([seq, h, n, c], device="cuda:0", dtype=dtype, requires_grad=True)
k = torch.rand([seq, h, n, c], device="cuda:0", dtype=dtype, requires_grad=True)
s = softmax(torch.matmul(q, k.transpose(-1, -2)))
print(s)

on an A100 using CUDA version 11.4. This consistently generates the following error:

Traceback (most recent call last):
  File "test_softmax.py", line 11, in <module>
    print(s)
  File "/mnt/home/dberenberg/gustaf_stuff/experiments/softmax/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/torch/_tensor.py", line 305, in __repr__
    return torch._tensor_str._str(self)
  File "/mnt/home/dberenberg/gustaf_stuff/experiments/softmax/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/torch/_tensor_str.py", line 434, in _str
    return _str_intern(self)
  File "/mnt/home/dberenberg/gustaf_stuff/experiments/softmax/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/torch/_tensor_str.py", line 409, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/mnt/home/dberenberg/gustaf_stuff/experiments/softmax/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/torch/_tensor_str.py", line 264, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/mnt/home/dberenberg/gustaf_stuff/experiments/softmax/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/torch/_tensor_str.py", line 100, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

It can be made to work by decreasing seq. For my particular setup, the threshold seems to be around 2000. Setting CUDA_LAUNCH_BLOCKING doesn't seem to do anything. Any tips for getting this working for tensors as large as [5120, 8, 384, 384], which appear in the AlphaFold pipeline?

Conda environment - no CUDA

Thanks for making this available, just a few comments on #installation

I've been trying to get the CONDA environment to work on my systems,

If I use the default environment I get the following errors:

Colossalai should be built with cuda extension to use the FP16 optimizer
If you want to activate cuda mode for MoE, please install with cuda_ext!
/proj/berzelius-2021-29/users/x_arnel/.conda/envs/FastFold/lib/python3.8/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

I guess this is due to that the PyPI version of colossalai is not compiled with CUDA, see https://colossalai.org/download

Instead I had to do a postinstall:
pip install colossalai==0.1.6+torch1.11cu11.3 -f https://release.colossalai.org (perhaps this can be included in the .yml file ?)

but that created problems so I had to recompile which works after some minor tweaks (wrong nvcc paths and environmental variables,)

Update to newer OpenMM

The Dockerfile and environment.yml pin OpenMM to 7.5.1. That's an old release that isn't supported anymore. Could they be updated to the latest release, or alternatively could the pin be removed? I don't think any code changes are needed, although there are some deprecated module names that could be updated to avoid a deprecation warning. The patch in openmm.patch also isn't needed anymore. That change has been merged upstream.

different dap

hi, I try different dap_size, such as dap_size=2, dap_size=4, but with the increase of dap, the decrease of gpu memory is not obvious, have you tried this？

duplicate module fastfold.np and fastfold.common / fastfold.relax

Maybe some merge and check

fastfold.np.relax -> fastfold.relax
fastfold.np.residue_constants -> fastfold.common.residue_constants
fastfold.np.protein -> fastfold.common.protein

Ray storage dir cannot be created

Hi,

on our cluster the current method to generate the path for the ray directory is not working. It seems that os.getlogin() does not result in a meaningful value.
storage_dir = "file:///tmp/ray/" + os.getlogin() + "/workflow_data"
I get back an OS error that the device does not exist.

After removing os.getlogin() the workflow initiates properly.
Would it be possible to define the ray-storage dir via a flag like the ouput dir?

Thank you and all the best,
Dominik

pdb_seqres.txt parse error when execute hmmsearch

hi, i met a parse error when executing hmmsearch in pdb_seqres.txt. the log is:

(WorkflowManagementActor pid=7511) RuntimeError: hmmsearch failed:
(WorkflowManagementActor pid=7511) stdout:
(WorkflowManagementActor pid=7511) # hmmsearch :: search profile(s) against a sequence database
(WorkflowManagementActor pid=7511) # HMMER 3.3.2 (Nov 2020); http://hmmer.org/
(WorkflowManagementActor pid=7511) # Copyright (C) 2020 Howard Hughes Medical Institute.
(WorkflowManagementActor pid=7511) # Freely distributed under the BSD open source license.
(WorkflowManagementActor pid=7511) # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(WorkflowManagementActor pid=7511) # query HMM file: /tmp/tmph1wus5n3/query.hmm
(WorkflowManagementActor pid=7511) # target sequence database: /uniprot/pdb_seqres.txt
(WorkflowManagementActor pid=7511) # MSA of all hits saved to file: ./alignments/5ZNG_1|Chain A|NBS-LRR type protein|Oryza sativa subsp. japonica (39947)/hmm_output.sto
(WorkflowManagementActor pid=7511) # show alignments in output: no
(WorkflowManagementActor pid=7511) # sequence reporting threshold: E-value <= 100
(WorkflowManagementActor pid=7511) # domain reporting threshold: E-value <= 100
(WorkflowManagementActor pid=7511) # sequence inclusion threshold: E-value <= 100
(WorkflowManagementActor pid=7511) # domain inclusion threshold: E-value <= 100
(WorkflowManagementActor pid=7511) # MSV filter P threshold: <= 0.1
(WorkflowManagementActor pid=7511) # Vit filter P threshold: <= 0.1
(WorkflowManagementActor pid=7511) # Fwd filter P threshold: <= 0.1
(WorkflowManagementActor pid=7511) # number of worker threads: 12
(WorkflowManagementActor pid=7511) # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(WorkflowManagementActor pid=7511)
(WorkflowManagementActor pid=7511) Query: query [M=137]
(WorkflowManagementActor pid=7511)
(WorkflowManagementActor pid=7511)
(WorkflowManagementActor pid=7511) stderr:
(WorkflowManagementActor pid=7511) Parse failed (sequence file /uniprot/pdb_seqres.txt):
(WorkflowManagementActor pid=7511) Line 1366526: illegal character 0

About Training Settings

Can you provide init training and finetune training settings?

model config, data config, not just table1 in the paper.

e.g. whether use struct model? whether enable template? whether use extra msa?

thx.

Python Module bug

This line should be:
return grad_input.contiguous(), None, grad_bias, None

error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1

Here is the full error when I installing from cloned repo

python setup.py install

/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/package/_directory_reader.py:17: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:68.)
_dtype_to_storage = {data_type(0).dtype: data_type for data_type in _storages}

torch.version = 1.10.0+cu111

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
from /usr/local/cuda-11.1/bin

/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/dist.py:717: UserWarning: Usage of dash-separated 'index-url' will not be supported in future versions. Please use the underscore name 'index_url' instead
warnings.warn(
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/dist.py:487: UserWarning: Normalizing '0.1.0-beta' to '0.1.0b0'
warnings.warn(tmpl.format(**locals()))
running install
running bdist_egg
running egg_info
writing fastfold.egg-info/PKG-INFO
writing dependency_links to fastfold.egg-info/dependency_links.txt
writing requirements to fastfold.egg-info/requires.txt
writing top-level names to fastfold.egg-info/top_level.txt
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/utils/cpp_extension.py:381: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'fastfold.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'fastfold.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'fastfold_layer_norm_cuda' extension
gcc -pthread -B /root/anaconda3/envs/fastfold/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data/FastFold/fastfold/model/kernel/cuda_native/csrc/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/TH -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.1/include -I/root/anaconda3/envs/fastfold/include/python3.8 -c fastfold/model/kernel/cuda_native/csrc/layer_norm_cuda.cpp -o build/temp.linux-x86_64-3.8/fastfold/model/kernel/cuda_native/csrc/layer_norm_cuda.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=fastfold_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda-11.1/bin/nvcc -I/data/FastFold/fastfold/model/kernel/cuda_native/csrc/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/TH -I/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.1/include -I/root/anaconda3/envs/fastfold/include/python3.8 -c fastfold/model/kernel/cuda_native/csrc/layer_norm_cuda_kernel.cu -o build/temp.linux-x86_64-3.8/fastfold/model/kernel/cuda_native/csrc/layer_norm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -std=c++14 -maxrregcount=50 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=fastfold_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::CrossMapLRN2dImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::EmbeddingBagImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::EmbeddingImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::ParameterDictImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::SequentialImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::ModuleListImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::ModuleDictImpl]’:
/tmp/tmpxft_000042ce_00000000-6_layer_norm_cuda_kernel.compute_80.cudafe1.stub.c:70:27: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::TransformerDecoderImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::TransformerEncoderImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::TransformerDecoderLayerImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::TransformerEncoderLayerImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::GroupNormImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::LocalResponseNormImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::LayerNormImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::MultiheadAttentionImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::ThresholdImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::LogSoftmaxImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::SoftminImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::SoftmaxImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::GRUCellImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::LSTMCellImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::RNNCellImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::GRUImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::LSTMImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::RNNImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::FractionalMaxPool3dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::FractionalMaxPool2dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::ZeroPad2dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::UnfoldImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::FoldImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::ConvTranspose3dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::ConvTranspose2dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::ConvTranspose1dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::Conv3dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::Conv2dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::Conv1dImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::AdaptiveLogSoftmaxWithLossImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::BilinearImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::UnflattenImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptrtorch::nn::Module torch::nn::Cloneable::clone(const c10::optionalc10::Device&) const [with Derived = torch::nn::LinearImpl]’:
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/sgd.h:49:48: required from here
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string, at::Tensor>&’
/root/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >’ to type ‘torch::OrderedDict<std::basic_string, std::shared_ptrtorch::nn::Module >&’
error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1

CUDA out of memory

Dear author：

I run Fastfold in a 4 GPU device， each GPU have an 24GiB memory。

I run inference.py with an fasta lenght 1805AA (without triton), with parameter --gpus 3

and the error prints like:

RuntimeError: CUDA out of memory. Tried to allocate 29.26 GiB (GPU 0; 23.70 GiB total capacity; 9.63 GiB already allocated; 11.79 GiB free; 10.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

my questions is:

why there are only one GPU(GPU0 but not GPU0, GPU1,GPU2) used to calculate total memory? what should I do to get over this?
Is there a way to run an extremely long fasta files, like 4000AA?

appriciate your reply, thankyou.

ColossalAi missing from PyPi

When trying to install the requirements, pip failed trying to install ColossalAi and it seems it's no longer available. When trying to install ColoassalAi directly from ColossalAI, there's a mismatch in pytorch versioning.

TypeError in fastfold/data/templates.py

Traceback (most recent call last):
  File "~FastFold/inference.py", line 527, in <module>
    main(args)
  File "~FastFold/inference.py", line 153, in main
    inference_multimer_model(args)
  File "~FastFold/inference.py", line 268, in inference_multimer_model
    feature_dict = data_processor.process_fasta(
  File "~FastFold/fastfold/data/data_pipeline.py", line 1165, in process_fasta
    chain_features = self._process_single_chain(
  File "~FastFold/fastfold/data/data_pipeline.py", line 1114, in _process_single_chain
    chain_features = self._monomer_data_pipeline.process_fasta(
  File "~FastFold/fastfold/data/data_pipeline.py", line 942, in process_fasta
    template_features = make_template_features(
  File "~FastFold/fastfold/data/data_pipeline.py", line 76, in make_template_features
    templates_result = template_featurizer.get_templates(
  File "~FastFold/fastfold/data/templates.py", line 1163, in get_templates
    result = _process_single_hit(
  File "~FastFold/fastfold/data/templates.py", line 885, in _process_single_hit
    "%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: "
TypeError: must be real number, not NoneType

Since the update of alphafold v2.3, the upstream alphafold/data/templates.py has changed. The lines linked below are now required to avoid this error

https://github.com/deepmind/alphafold/blob/a3941673e90b8d1d75c60b16a4b3707ebf7ba527/alphafold/data/templates.py#L763-L764

CUDA kernel bug

This line does not seem correct:
row_d_input = 0;
row_d_input is a pointer, after it is zeroed we write to some bullshit location.

Template all atom mask was all zeros error

I also encountered this problem. https://github.com/hpcaitech/FastFold/issues/106

I get the following error message:
`
running in multimer mode...
Traceback (most recent call last):
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/templates.py", line 859, in _process_single_hit
features, realign_warning = _extract_template_features(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/templates.py", line 651, in _extract_template_features
raise TemplateAtomMaskAllZerosError(
fastfold.data.templates.TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 6qx9_A1. Residue range: 49-100

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "inference.py", line 546, in
main(args)
File "inference.py", line 165, in main
inference_multimer_model(args)
File "inference.py", line 280, in inference_multimer_model
feature_dict = data_processor.process_fasta(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/data_pipeline.py", line 1165, in process_fasta
chain_features = self._process_single_chain(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/data_pipeline.py", line 1114, in _process_single_chain
chain_features = self._monomer_data_pipeline.process_fasta(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/data_pipeline.py", line 942, in process_fasta
template_features = make_template_features(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/data_pipeline.py", line 76, in make_template_features
templates_result = template_featurizer.get_templates(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/templates.py", line 1167, in get_templates
result = _process_single_hit(
File "/public/home/FastFold/lib/conda/envs/fastfold/lib/python3.8/site-packages/fastfold/data/templates.py", line 888, in process_single_hit
"%s%s (sum_probs: %.2f, rank: %d): feature extracting errors: "
TypeError: must be real number, not NoneType
`

this here is the input file:
input.txt

In the fastfold/data/templates.py file，I modified the 884th line warning=None to solve this problem.

About the distributed inference

Hi, I saw you upload inference.py. I thought that it can support the inference on multi-gpu. So i wonder how to set the parameter on "--model-device". Thanks so much.

RuntimeError: CUDA error: no kernel image is available for execution on the device

How can i fix this error? I ran the command: torchrun --nproc_per_node=1 perf.py --msa-length 128 --res-length 256. Then the following error appeared.
The versions of Pytorch, Python, and CUDA are 1.10, 3.8, and 11.3, respectively.

Training in distributed mode with multiple processes, 1 GPU per process. Process 0, total 1.

initialize tensor model parallel with size 1
initialize data parallel with size 1
Traceback (most recent call last):
File "perf.py", line 191, in
main()
File "perf.py", line 156, in main
layer_inputs = attn_layers[lyr_idx].forward(*layer_inputs, node_mask, pair_mask)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/fastfold-0.1.0b0-py3.8-linux-x86_64.egg/fastfold/model/evoformer.py", line 17, in forward
node = self.msa_stack(node, pair, node_mask)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/fastfold-0.1.0b0-py3.8-linux-x86_64.egg/fastfold/model/msa.py", line 99, in forward
node = self.MSARowAttentionWithPairBias(node, pair, node_mask_row)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/fastfold-0.1.0b0-py3.8-linux-x86_64.egg/fastfold/model/msa.py", line 43, in forward
Z = self.layernormZ(Z)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/fastfold-0.1.0b0-py3.8-linux-x86_64.egg/fastfold/model/kernel/cuda_native/layer_norm.py", line 69, in forward
return FusedLayerNormAffineFunction.apply(input, self.weight, self.bias,
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/fastfold-0.1.0b0-py3.8-linux-x86_64.egg/fastfold/model/kernel/cuda_native/layer_norm.py", line 22, in forward
output, mean, invvar = fastfold_layer_norm_cuda.forward_affine(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 132) of binary: /root/miniconda3/envs/myconda/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/myconda/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==1.10.0+cu113', 'console_scripts', 'torchrun')())
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 345, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/distributed/run.py", line 719, in main
run(args)
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
perf.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2022-03-15_10:18:15
host : 69f885408067
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 132)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

use dap in openfold

hi, I see how to use dap is described in the readme, as follows,

from fastfold.distributed import init_dap

torch.distributed.init_process_group(backend='nccl', init_method='env://')
init_dap(args.dap_size)

I want to know is it possible to use dap instead of deepspeed in openfold

compare the predicted and actual structure

Hi, I was wondering which python library did you use to compare the predicted and actual structure.

something similar to this.

Thanks

fatal error: cuda.h: No such file or directory

Dear author:

I try to test Fastfold, after followed the Installation Using Conda, (i think there are no command to test for a successful installation)

I run inference.py with the following code:

#################################
conda activate fastfold
python /home/FastFold/inference.py used.fasta /database/alphafold2-data/pdb_mmcif/mmcif_files/
--output_dir /mydir/output
--cpus 80
--gpus 3
--param_path /database/alphafold2-data/params/params_model_1.npz
--uniref90_database_path /database/alphafold2-data/uniref90/uniref90.fasta
--mgnify_database_path /database/alphafold2-data/mgnify/mgy_clusters_2018_12.fa
--pdb70_database_path /database/alphafold2-data/pdb70/pdb70
--uniclust30_database_path /database/alphafold2-data/uniclust30/uniclust30_2018_08/uniclust30_2018_08
--bfd_database_path /database/alphafold2-data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
--jackhmmer_binary_path /home/Software/miniconda3/envs/fastfold/bin/jackhmmer
--hhblits_binary_path /home/Software/miniconda3/envs/fastfold/bin/hhblits
--hhsearch_binary_path /home/Software/miniconda3/envs/fastfold/bin/hhsearch
--kalign_binary_path /home/Software/miniconda3/envs/fastfold/bin/kalign
#################################

It seems right at the jackhmmer→hhsearch→jackhmmer→hhblits steps

then I meet error print as follow：

I woundering what they hints and what should i do to run fastfold properly?

##########error message##################

/tmp/tmp4wm30exa/main.c:2:10: fatal error: cuda.h: No such file or directory
2 | #include "cuda.h"
| ^~~~~~~~
/tmp/tmp65558a3s/main.c:2:10: fatal error: cuda.h: No such file or directory
2 | #include "cuda.h"
| ^~~~~~~~
compilation terminated.
compilation terminated.
Traceback (most recent call last):
File "/home/FastFold/inference.py", line 513, in
main(args)
File "/home/FastFold/inference.py", line 150, in main
inference_monomer_model(args)
File "/home/FastFold/inference.py", line 415, in inference_monomer_model
torch.multiprocessing.spawn(inference_model, nprocs=args.gpus, args=(args.gpus, result_q, batch, args))
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "", line 21, in _layer_norm_fwd_fused
KeyError: ('2-.-0-.-0-d82511111ad128294e9d31a6ac684238-7929002797455b30efce6e41eddc6b57-3aa563e00c5c695dd945e23b09a86848-bb0203f280ee2aaa28bc6e4eff4090f3-ff946bd4b3b4a4cbdf8cedc6e1c658e0-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, 'i32', 'i32', 'fp32'), (256,), (True, True, True, True, True, True, (True, False), (True, False), (False,)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/FastFold/inference.py", line 135, in inference_model
out = model(batch)
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/FastFold/fastfold/model/hub/alphafold.py", line 507, in forward
outputs, m_1_prev, z_prev, x_prev = self.iteration(
File "/home/FastFold/fastfold/model/hub/alphafold.py", line 232, in iteration
m_1_prev, z_prev = self.recycling_embedder(
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/FastFold/fastfold/model/fastnn/ops.py", line 1097, in forward
m_update = self.layer_norm_m(m)
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/FastFold/fastfold/model/fastnn/kernel/layer_norm.py", line 52, in forward
return self.kernel_forward(input)
File "/home/FastFold/fastfold/model/fastnn/kernel/layer_norm.py", line 56, in kernel_forward
return LayerNormTritonFunc.apply(input, self.normalized_shape, self.weight, self.bias,
File "/home/FastFold/fastfold/model/fastnn/kernel/triton/layer_norm.py", line 164, in forward
_layer_norm_fwd_fused[(M,)](
File "/home/triton/python/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "", line 41, in _layer_norm_fwd_fused
File "/home/triton/python/triton/compiler.py", line 1239, in compile
so = _build(fn.name, src_path, tmpdir)
File "/home/triton/python/triton/compiler.py", line 1169, in _build
ret = subprocess.check_call(cc_cmd)
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp65558a3s/main.c', '-O3', '-I/usr/local/cuda/include', '-I/home/Software/miniconda3/envs/fastfold/include/python3.8', '-I/tmp/tmp65558a3s', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmp65558a3s/_layer_norm_fwd_fused.cpython-38-x86_64-linux-gnu.so', '-L/usr/lib/x86_64-linux-gnu']' returned non-zero exit status 1.

Bugfixes need to be pulled from the OpenFold repo

Hey there! I just wanted to flag that we recently fixed a bunch of bugs in OpenFold's template code (as of OpenFold commit #591d10d), which I noticed are also present here. These have a significant effect on the quality of predictions, especially for chains with small MSAs.

About error of inference question

Hi! I installed the conda environment according to the content in the READMD, and then wrote a script to infer the protein structure. The content of the script is as follows. After submitting the script, the following error will be reported. Please help to find out what caused it?

content of script：

#DSUB -q root.default
#DSUB -R 'cpu=12;gpu=2;mem=96000'
#DSUB -l wuhanG5500
#DSUB -N 1
#DSUB -e %J.out
#DSUB -o %J.out
module load anaconda/2020.11
module load cuda/11.5.0-gcc-4.8.5-atd
module load gcc/8.3.0-gcc-4.8.5-cpp
source activate fastfold
af2Root=/home/bingxing2/public/alphafold2.1.1
torchrun --nproc_per_node=2 ./inference.py multi.fasta $af2Root/pdb_mmcif/mmcif_files \
 --output_dir ./out \
 --model_name model_1 \
 --param_path $af2Root/params/params_model_1_multimer.npz \
 --cpus 2 \
 --uniref90_database_path $af2Root/uniref90/uniref90.fasta \
 --mgnify_database_path $af2Root/mgnify/mgy_clusters.fa \
 --pdb70_database_path $af2Root/pdb70/pdb70 \
 --uniclust30_database_path $af2Root/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
 --bfd_database_path $af2Root/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
 --jackhmmer_binary_path `which jackhmmer` \
 --hhblits_binary_path `which hhblits` \
 --hhsearch_binary_path `which hhsearch` \
 --kalign_binary_path `which kalign`

error:

WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[06/29/2022 01:17:31 AM] INFO     colossalai - colossalai - INFO: /home/bingxing
                                  2/gpuuser001/.conda/envs/fastfold/lib/python3.
                                  8/site-packages/colossalai/context/parallel_co
                                  ntext.py:519 set_device
                         INFO     colossalai - colossalai - INFO: process rank 1
                                  is bound to device 1
[06/29/2022 01:17:31 AM] INFO     colossalai - colossalai - INFO: /home/bingxing
                                  2/gpuuser001/.conda/envs/fastfold/lib/python3.
                                  8/site-packages/colossalai/context/parallel_co
                                  ntext.py:519 set_device
                         INFO     colossalai - colossalai - INFO: process rank 0
                                  is bound to device 0
[06/29/2022 01:17:33 AM] INFO     colossalai - colossalai - INFO: /home/bingxing
                                  2/gpuuser001/.conda/envs/fastfold/lib/python3.
                                  8/site-packages/colossalai/context/parallel_co
                                  ntext.py:555 set_seed
                         INFO     colossalai - colossalai - INFO: initialized
                                  seed on rank 1, numpy: 1024, python random:
                                  1024, ParallelMode.DATA: 1024,
                                  ParallelMode.TENSOR: 1025,the default parallel
                                  seed is ParallelMode.DATA.
[06/29/2022 01:17:33 AM] INFO     colossalai - colossalai - INFO: /home/bingxing
                                  2/gpuuser001/.conda/envs/fastfold/lib/python3.
                                  8/site-packages/colossalai/context/parallel_co
                                  ntext.py:555 set_seed
                         INFO     colossalai - colossalai - INFO: initialized
                                  seed on rank 0, numpy: 1024, python random:
                                  1024, ParallelMode.DATA: 1024,
                                  ParallelMode.TENSOR: 1024,the default parallel
                                  seed is ParallelMode.DATA.
                         INFO     colossalai - colossalai - INFO: /home/bingxing
                                  2/gpuuser001/.conda/envs/fastfold/lib/python3.
                                  8/site-packages/colossalai/initialize.py:112
                                  launch
                         INFO     colossalai - colossalai - INFO: Distributed
                                  environment is initialized, data parallel
                                  size: 1, pipeline parallel size: 1, tensor
                                  parallel size: 2
Traceback (most recent call last):
  File "./inference.py", line 266, in <module>
    main(args)Traceback (most recent call last):

  File "./inference.py", line 266, in <module>
  File "./inference.py", line 82, in main
    main(args)
import_jax_weights_(model, args.param_path, version=args.model_name)  File "./inference.py", line 82, in main

    import_jax_weights_(model, args.param_path, version=args.model_name)
  File "/home/bingxing2/gpuuser001/zhou/FastFold/fastfold/utils/import_weights.py", line 445, in import_jax_weights_
  File "/home/bingxing2/gpuuser001/zhou/FastFold/fastfold/utils/import_weights.py", line 445, in import_jax_weights_
    assert len(incorrect) == 0
    AssertionErrorassert len(incorrect) == 0

AssertionError
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 13987) of binary: /home/bingxing2/gpuuser001/.conda/envs/fastfold/bin/python
Traceback (most recent call last):
  File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')())
  File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/lib/python3.8/site-packages/torch/distributed/run.py", line 724, in main
    run(args)
  File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/lib/python3.8/site-packages/torch/distributed/run.py", line 715, in run
    elastic_launch(
  File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/bingxing2/gpuuser001/.conda/envs/fastfold/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./inference.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2022-06-29_01:17:42
  host      : gpu09
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 13988)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-06-29_01:17:42
  host      : gpu09
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 13987)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

how to use .a3m file as input

Hi,
Can i use fastfold to run just inference?
for example, I dont need jackmmer to run the msa search, I already have my msa search result (.a3m file) ready. I also have GPU available for the model inference, how should i implement with fastfold? what arguments should be there? Thanks

wrong indentation?

Hi, maybe this line

FastFold/fastfold/model/hub/alphafold.py

Line 321 in f55dca9

del template_feats, template_embeds

needs to be shifted to the right into the if block? This at least solves an error when using an AF model trained without templates.

Any way to fold a multimer?

Function not accessible

Hi, I stumbled over an issue when using monomer_ptm model. The function LinearParams used here

FastFold/fastfold/utils/import_weights.py

Line 566 in f55dca9

"logits": LinearParams(model.aux_heads.tm.linear)

is defined in a different scope (L144) and cannot be accessed in this place.

The error of "Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels....."

Hello, How can i fix this error? When I ran the command, and the command is ok in my view: torchrun --nproc_per_node=4 inference.py ../testFasta/PsCrtW-HpCrtZ.fasta ../databases/pdb_mmcif/mmcif_files/ --output_dir ./output_4gpu --uniref90_database_path ../databases/uniref90/uniref90.fasta --mgnify_database_path ../databases/mgnify/mgy_clusters_2018_12.fa --pdb70_database_path ../databases/pdb70/pdb70 --param_path ../databases/params/params_model_1.npz --model_name model_1 --cpus 24 --uniclust30_database_path ../databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08 --bfd_database_path ../databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt --jackhmmer_binary_path which jackhmmer --hhblits_binary_path which hhblits --hhsearch_binary_path which hhsearch --kalign_binary_path which kalign

[07/25/22 10:45:46] INFO     colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
                             ges/colossalai/context/parallel_context.py:521 set_device                            
                    INFO     colossalai - colossalai - INFO: process rank 2 is bound to device 2                  
[07/25/22 10:45:46] INFO     colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
                             ges/colossalai/context/parallel_context.py:521 set_device                            
                    INFO     colossalai - colossalai - INFO: process rank 0 is bound to device 0                  
[07/25/22 10:45:46] INFO     colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
                             ges/colossalai/context/parallel_context.py:521 set_device                            
[07/25/22 10:45:46] INFO     colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
                             ges/colossalai/context/parallel_context.py:521 set_device                            
                    INFO     colossalai - colossalai - INFO: process rank 1 is bound to device 1                  
                    INFO     colossalai - colossalai - INFO: process rank 3 is bound to device 3                  
[07/25/22 10:45:50] INFO     colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
                             ges/colossalai/context/parallel_context.py:557 set_seed                              
                    INFO     colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python      
                             random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default parallel
                             seed is ParallelMode.DATA.                                                           
[07/25/22 10:45:50] INFO     colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
                             ges/colossalai/context/parallel_context.py:557 set_seed                              
                    INFO     colossalai - colossalai - INFO:                                                      
                             /anaconda/envs/fastfold_py38/lib/python3.8/site-packages/colossalai/initialize.py:117
                             launch                                                                               
                    INFO     colossalai - colossalai - INFO: initialized seed on rank 1, numpy: 1024, python      
                             random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1025,the default parallel
                             seed is ParallelMode.DATA.                                                           
                    INFO     colossalai - colossalai - INFO: Distributed environment is initialized, data parallel
                             size: 1, pipeline parallel size: 1, tensor parallel size: 4                          
[07/25/22 10:45:50] INFO     colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
                             ges/colossalai/context/parallel_context.py:557 set_seed                              
                    INFO     colossalai - colossalai - INFO: initialized seed on rank 2, numpy: 1024, python      
                             random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1026,the default parallel
                             seed is ParallelMode.DATA.                                                           
[07/25/22 10:45:50] INFO     colossalai - colossalai - INFO: /anaconda/envs/fastfold_py38/lib/python3.8/site-packa
                             ges/colossalai/context/parallel_context.py:557 set_seed                              
                    INFO     colossalai - colossalai - INFO: initialized seed on rank 3, numpy: 1024, python      
                             random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1027,the default parallel
                             seed is ParallelMode.DATA.                                                           
Generating features...
[07/25/22 10:46:12] INFO     colossalai - root - INFO: Launching subprocess                                       
                             "/anaconda/envs/fastfold_py38/bin/jackhmmer -o /dev/null -A                          
                             /tmp/tmphz8bnmj6/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001  
                             -E 0.0001 --cpu 24 -N 1 ./output_4gpu/tmp.fasta ../databases/uniref90/uniref90.fasta"
                    INFO     colossalai - root - INFO: Started Jackhmmer (uniref90.fasta) query                   
[07/25/22 11:11:02] INFO     colossalai - root - INFO: Finished Jackhmmer (uniref90.fasta) query in 1489.951      
                             seconds                                                                              
                    INFO     colossalai - root - INFO: Launching subprocess                                       
                             "/anaconda/envs/fastfold_py38/bin/hhsearch -i /tmp/tmp8xdycgi4/query.a3m -o          
                             /tmp/tmp8xdycgi4/output.hhr -maxseq 1000000 -cpu 24 -d ../databases/pdb70/pdb70"     
                    INFO     colossalai - root - INFO: Started HHsearch query                                     
[07/25/22 11:11:43] INFO     colossalai - root - INFO: Finished HHsearch query in 41.348 seconds                  
[07/25/22 11:11:44] INFO     colossalai - root - INFO: Launching subprocess                                       
                             "/anaconda/envs/fastfold_py38/bin/jackhmmer -o /dev/null -A                          
                             /tmp/tmpnul0i_bi/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001  
                             -E 0.0001 --cpu 24 -N 1 ./output_4gpu/tmp.fasta                                      
                             ../databases/mgnify/mgy_clusters_2018_12.fa"                                         
                    INFO     colossalai - root - INFO: Started Jackhmmer (mgy_clusters_2018_12.fa) query          
[E ProcessGroupNCCL.cpp:719] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3, OpType=BROADCAST, Timeout(ms)=1800000) ran for 1804531 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:719] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3, OpType=BROADCAST, Timeout(ms)=1800000) ran for 1804731 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:719] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3, OpType=BROADCAST, Timeout(ms)=1800000) ran for 1804734 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:406] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:406] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:406] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 21309 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 1 (pid: 21310) of binary: /anaconda/envs/fastfold_py38/bin/python
Traceback (most recent call last):
  File "/anaconda/envs/fastfold_py38/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')())
  File "/anaconda/envs/fastfold_py38/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/anaconda/envs/fastfold_py38/lib/python3.8/site-packages/torch/distributed/run.py", line 724, in main
    run(args)
  File "/anaconda/envs/fastfold_py38/lib/python3.8/site-packages/torch/distributed/run.py", line 715, in run
    elastic_launch(
  File "/anaconda/envs/fastfold_py38/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/anaconda/envs/fastfold_py38/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
======================================================
inference.py FAILED
------------------------------------------------------
Failures:
[1]:
  time      : 2022-07-25_11:16:19
  host      : localhost
  rank      : 2 (local_rank: 2)
  exitcode  : -6 (pid: 21311)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 21311
[2]:
  time      : 2022-07-25_11:16:19
  host      : localhost
  rank      : 3 (local_rank: 3)
  exitcode  : -6 (pid: 21312)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 21312
------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-07-25_11:16:19
  host      : localhost
  rank      : 1 (local_rank: 1)
  exitcode  : -6 (pid: 21310)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 21310

cuda error for inference with multimer model

Inference speed

Hi,
I try to run inference scripts using T1050.fasta(779 residues) on nvidia a100, the following table shows the in time.

Type	Inference time
FastFold	262.14789s
OpenFold	124.69651s

It's a little different from the data in the paper. I don't know whether it's normal.
Thanks.

type_extentions import error

Warning: Torch did not find available GPUs on this system.

Hi,

When I installed fastfold, I meet the following problem.

I think it's because I don't have a cuda environment on the cluster server (cpu node, non-root user) (but the code can be submitted to the gpu node to run). Is there any solution please?

Thanks！

Template all atom mask was all zeros error

Hi,

i am running the latest release of fastfold using this command:

python inference.py input.fa database/pdb_mmcif/mmcif_files/ \
    --output_dir output/ \
    --gpus 1 \
    --model_preset multimer \
    --uniref90_database_path database/uniref90/uniref90.fasta \
    --mgnify_database_path database/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path database/pdb70/pdb70 \
    --uniclust30_database_path database/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --bfd_database_path database/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniprot_database_path database/uniprot/uniprot_sprot.fasta \
    --pdb_seqres_database_path database/pdb_seqres/pdb_seqres.txt  \
    --param_path database/params/params_model_1_multimer_v2.npz \
    --model_name model_1_multimer_v2 \
    --jackhmmer_binary_path `which jackhmmer` \
    --hhblits_binary_path `which hhblits` \
    --hhsearch_binary_path `which hhsearch` \
    --kalign_binary_path `which kalign`

I came across a problem during the template generation. I get the following error message:

[11/03/22 14:41:37] INFO     colossalai - root - INFO: Invalid resolution format: ['.']
                    INFO     colossalai - root - INFO: Found an exact template match 6v8o_I.
Traceback (most recent call last):
  File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/templates.py", line 859, in _process_single_hit
    features, realign_warning = _extract_template_features(
  File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/templates.py", line 651, in _extract_template_features
    raise TemplateAtomMaskAllZerosError(
fastfold.data.templates.TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 6v8o_I. Residue range: 415-475

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "inference.py", line 513, in <module>
    main(args)
  File "inference.py", line 148, in main
    inference_multimer_model(args)
  File "inference.py", line 263, in inference_multimer_model
    feature_dict = data_processor.process_fasta(
  File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/data_pipeline.py", line 1165, in process_fasta
    chain_features = self._process_single_chain(
  File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/data_pipeline.py", line 1114, in _process_single_chain
    chain_features = self._monomer_data_pipeline.process_fasta(
  File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/data_pipeline.py", line 942, in process_fasta
    template_features = make_template_features(
  File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/data_pipeline.py", line 76, in make_template_features
    templates_result = template_featurizer.get_templates(
  File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/templates.py", line 1163, in get_templates
    result = _process_single_hit(
  File "/scratch-cbe/users/handler/fastfold/FastFold/fastfold/data/templates.py", line 885, in _process_single_hit
    "%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: "
TypeError: must be real number, not NoneType

The lines in the hhblits output corresponding to the ID raising the error are:

hmm_output.sto:#=GS 6v8o_I/416-476  DE [subseq from] mol:protein length:557  Chromatin structure-remodeling complex protein RSC8
hmm_output.sto:6v8o_I/416-476          -----EISEKYIEESQAIIQEL.VKLTMEKLESKF.TKLCDLETQlEMEKLKYVKES..eK.M.lN...D....RLSLS-....--------......-.--------..--....------------------------------------------------------------------------------------
hmm_output.sto:#=GR 6v8o_I/416-476  PP .....56799************.************.**9998865378888876554..24.4.25...5....65555.........................................................................................................................
hmm_output.sto:6v8o_I/416-476          -----
hmm_output.sto:#=GR 6v8o_I/416-476  PP .....

Is this a bug or is the problem here on my side.

Another question, is it correct to change the name of the model to params_model_1_multimer_v2.npz? In your readme you use params_model_1_multimer.npz but this is not included in the downloaded parameter tar file.

All the best and thank you,
Dominik

inference error when predicting multimer sequence

when i try to inference multimer sequence，i met the error after finishing running alignment and searching for templates：
Traceback (most recent call last):
File "inference.py", line 513, in
main(args)
File "inference.py", line 148, in main
inference_multimer_model(args)
File "inference.py", line 268, in inference_multimer_model
processed_feature_dict = feature_processor.process_features(
File "/home/FastFold/fastfold/data/feature_pipeline.py", line 124, in process_features
return np_example_to_features(
File "/home/FastFold/fastfold/data/feature_pipeline.py", line 96, in np_example_to_features
features = input_pipeline_multimer.process_tensors_from_config(
File "/home/FastFold/fastfold/data/input_pipeline_multimer.py", line 107, in process_tensors_from_config
tensors = compose(nonensembled)(tensors)
File "/home/FastFold/fastfold/data/data_transforms.py", line 76, in
return lambda x: f(x, *args, **kwargs)
File "/home/FastFold/fastfold/data/input_pipeline_multimer.py", line 124, in compose
x = f(x)
File "/home/FastFold/fastfold/data/data_transforms_multimer.py", line 298, in make_msa_profile
batch['msa_mask'][..., None],
KeyError: 'msa_mask'

i’ve tried several sequences，but met the same error . The sequence is

7M5F_1|Chain A|CdiI|Serratia marcescens (615)
MKEIKLMADYHCYPLWGTTPDDFGDISPDELPISLGLKNSLEAWAKRYDAILNTDDPALSGFKSVEEEKLFIDDGYKLAELLQEELGSAYKVIYHADY
7M5F_2|Chain B[auth C]|Toxin CdiA|Serratia marcescens (615)
MHHHHHHENLYFQSNAAKNSLTTKSLFKEMTIQGIKFTPENVVGAAKDNSGKIIFLEKGNSKSGLQHIVEEHGDQFAQIGVSEARIPDVVMKAVTDGKIVGYQGAGAGRPIYETMIDGKKYNIAVTVGSNGYVVGANLRGSVK

Issues with processing the templates

Hi!

I am running a few test cases and encountered some problems that I think are related to processing the templates.

I have a working example that led to a pdb output:

python /global/scratch/users/skyungyong/Software/FastFold/inference.py --output_dir ./ --model_preset multimer --use_precomputed_alignments Alignments --enable_workflow --inplace --param_path /global/scratch/users/skyungyong/Software/FastFold/data/params/params_model_1_multimer_v3.npz -model_name model_1_multimer AT3G18790-AT3G18790.fasta /global/scratch/users/skyungyong/Software/alphafold-multimer-v2.2.2-080922/Database/pdb_mmcif/mmcif_files/

WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
running in multimer mode...
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
[02/21/23 20:04:34] INFO     colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
                    INFO     colossalai - colossalai - INFO: process rank 0 is bound to device 0
[02/21/23 20:04:35] INFO     colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
                    INFO     colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default parallel seed is ParallelMode.DATA.
                    INFO     colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/initialize.py:116 launch
                    INFO     colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 1
Inference time: 144.00418956391513

These are the some of the runs that produced error messages

python /global/scratch/users/skyungyong/Software/FastFold/inference.py --output_dir ./ --model_preset multimer --use_precomputed_alignments Alignments --enable_workflow --inplace --param_path /global/scratch/users/skyungyong/Software/FastFold/data/params/params_model_1_multimer_v3.npz --model_name model_1_multimer AT1G23170-AT1G23170.fasta /global/scratch/users/skyungyong/Software/alphafold-multimer-v2.2.2-080922/Database/pdb_mmcif/mmcif_files/
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
running in multimer mode...
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
[02/21/23 20:16:03] INFO     colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
                    INFO     colossalai - colossalai - INFO: process rank 0 is bound to device 0
[02/21/23 20:16:12] INFO     colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
                    INFO     colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default parallel seed is ParallelMode.DATA.
                    INFO     colossalai - colossalai - INFO: /global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/initialize.py:116 launch
                    INFO     colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 1
Traceback (most recent call last):
  File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 548, in <module>
    main(args)
  File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 164, in main
    inference_multimer_model(args)
  File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 293, in inference_multimer_model
    torch.multiprocessing.spawn(inference_model, nprocs=args.gpus, args=(args.gpus, result_q, batch, args))
  File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 151, in inference_model
    out = model(batch)
  File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/model/hub/alphafold.py", line 522, in forward
    outputs, m_1_prev, z_prev, x_prev = self.iteration(
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/model/hub/alphafold.py", line 270, in iteration
    template_embeds = self.template_embedder(
  File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/model/fastnn/embedders_multimer.py", line 368, in forward
    self.template_single_embedder(
  File "/global/scratch/users/skyungyong/Software/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/model/fastnn/embedders_multimer.py", line 238, in forward
    all_atom_multimer.compute_chi_angles(
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/utils/all_atom_multimer.py", line 441, in compute_chi_angles
    chi_angle_atoms_mask = torch.prod(chi_angle_atoms_mask, dim=-1)
RuntimeError: CUDA driver error: invalid argument

python /global/scratch/users/skyungyong/Software/FastFold/inference.py --output_dir ./ --model_preset multimer --use_precomputed_alignments Alignments --enable_workflow --inplace --param_path /global/scratch/users/skyungyong/Software/FastFold/data/params/params_model_1_multimer_v3.npz --model_name model_1_multimer AT1G13220-AT1G13220.fasta /global/scratch/users/skyungyong/Software/alphafold-multimer-v2.2.2-080922/Database/pdb_mmcif/mmcif_files/
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
WARNING:root:Triton is not available, fallback to old kernel.
running in multimer mode...
Traceback (most recent call last):
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/templates.py", line 859, in _process_single_hit
    features, realign_warning = _extract_template_features(
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/templates.py", line 651, in _extract_template_features
    raise TemplateAtomMaskAllZerosError(
fastfold.data.templates.TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 6zmi_CE. Residue range: 4-304

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 548, in <module>
    main(args)
  File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 164, in main
    inference_multimer_model(args)
  File "/global/scratch/users/skyungyong/Software/FastFold/inference.py", line 281, in inference_multimer_model
    feature_dict = data_processor.process_fasta(
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/data_pipeline.py", line 1165, in process_fasta
    chain_features = self._process_single_chain(
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/data_pipeline.py", line 1114, in _process_single_chain
    chain_features = self._monomer_data_pipeline.process_fasta(
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/data_pipeline.py", line 942, in process_fasta
    template_features = make_template_features(
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/data_pipeline.py", line 76, in make_template_features
    templates_result = template_featurizer.get_templates(
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/templates.py", line 1166, in get_templates
    result = _process_single_hit(
  File "/global/scratch/users/skyungyong/Software/FastFold/fastfold/data/templates.py", line 888, in _process_single_hit
    "%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: "
TypeError: must be real number, not NoneType

Will these be due to problems associated with the homologous templates themselves, or would there be a fix for this?

Thank you!

Issues after last commit

Last commit moves loss.py . This script still references old location.

FastFold/fastfold/relax/amber_minimize.py

Line 26 in ceee81d

import fastfold.model.loss as loss