rosettacommons / rosettafold

This package contains deep learning models and related scripts for RoseTTAFold

License: MIT License

Python 97.64% Shell 2.36%

rosettafold's Introduction

RoseTTAFold

This package contains deep learning models and related scripts to run RoseTTAFold.
This repository is the official implementation of RoseTTAFold: Accurate prediction of protein structures and interactions using a 3-track network.

Installation

  1. Clone the package
git clone https://github.com/RosettaCommons/RoseTTAFold.git
cd RoseTTAFold
  2. Create conda environments using the RoseTTAFold-linux.yml and folding-linux.yml files. The latter is required only for the PyRosetta version (run_pyrosetta_ver.sh).
# create conda environment for RoseTTAFold
#   If your NVIDIA driver is compatible with cuda11
conda env create -f RoseTTAFold-linux.yml
#   If not (but compatible with cuda10)
conda env create -f RoseTTAFold-linux-cu101.yml

# create conda environment for pyRosetta folding & running DeepAccNet
conda env create -f folding-linux.yml
  3. Download network weights (under Rosetta-DL Software license -- please see below)
    While the code is licensed under the MIT License, the trained weights and data for RoseTTAFold are made available for non-commercial use only under the terms of the Rosetta-DL Software license. You can find details at https://files.ipd.uw.edu/pub/RoseTTAFold/Rosetta-DL_LICENSE.txt

[Update Nov/02/2021] The archive now includes the weights (RF2t.pt) for the RoseTTAFold-2track model used for yeast PPI screening. If you want to use it, please re-download the weights. The original RoseTTAFold weights are unchanged.

wget https://files.ipd.uw.edu/pub/RoseTTAFold/weights.tar.gz
tar xfz weights.tar.gz
  4. Download and install third-party software.
./install_dependencies.sh
  5. Download sequence and structure databases
# uniref30 [46G]
wget http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz
mkdir -p UniRef30_2020_06
tar xfz UniRef30_2020_06_hhsuite.tar.gz -C ./UniRef30_2020_06

# BFD [272G]
wget https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz
mkdir -p bfd
tar xfz bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz -C ./bfd

# structure templates (including *_a3m.ffdata, *_a3m.ffindex) [over 100G]
wget https://files.ipd.uw.edu/pub/RoseTTAFold/pdb100_2021Mar03.tar.gz
tar xfz pdb100_2021Mar03.tar.gz
# for CASP14 benchmarks, we used this one: https://files.ipd.uw.edu/pub/RoseTTAFold/pdb100_2020Mar11.tar.gz
  6. Obtain a PyRosetta licence and install the package in the newly created folding conda environment (link).

Usage

# For monomer structure prediction
cd example
../run_[pyrosetta, e2e]_ver.sh input.fa .

# For complex modeling
# please see README file under example/complex_modeling/README for details.
python network/predict_complex.py -i paired.a3m -o complex -Ls 218 310 

# For PPI screening using faster 2-track version (example input and output are at example/complex_2track)
python network_2track/predict_msa.py -msa [paired MSA file in a3m format] -npz [output npz file name] -L1 [Length of first chain]
e.g. python network_2track/predict_msa.py -msa input.a3m -npz complex.npz -L1 218
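
As a sanity check before running predict_complex.py, the chain lengths passed to -Ls can be compared against the query row of the paired MSA. The sketch below assumes that -Ls takes the length of each chain in order, that the first row of the a3m file is the concatenated query, and that each sequence occupies a single line. The function names are illustrative and are not part of RoseTTAFold.

```python
from typing import Sequence


def a3m_match_length(seq_line: str) -> int:
    """Count aligned columns in one a3m row.

    In a3m format lowercase letters are insertions relative to the query,
    so they do not count as alignment columns.
    """
    return sum(1 for ch in seq_line if not ch.islower())


def check_chain_lengths(a3m_text: str, chain_lengths: Sequence[int]) -> bool:
    """Return True if the query row has exactly sum(chain_lengths) columns."""
    rows = [
        ln.strip()
        for ln in a3m_text.splitlines()
        if ln.strip() and not ln.startswith(">")
    ]
    if not rows:
        raise ValueError("no sequence rows found in the a3m text")
    return a3m_match_length(rows[0]) == sum(chain_lengths)
```

For the example above, one would read paired.a3m and call check_chain_lengths(text, [218, 310]).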

Expected outputs

For the PyRosetta version, you will get five final models with the estimated CA RMS error in the B-factor column (model/model_[1-5].crderr.pdb).
For the end-to-end version, there will be a single PDB output with the estimated residue-wise CA-lDDT in the B-factor column (t000_.e2e.pdb).
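
The per-residue estimates can be read back from the B-factor column with a few lines of Python. This is a minimal sketch that relies only on the fixed-column PDB ATOM record layout (atom name in columns 13-16, residue number in columns 23-26, B-factor in columns 61-66). The function name is illustrative and is not part of RoseTTAFold.

```python
from typing import List, Tuple


def ca_bfactors(pdb_text: str) -> List[Tuple[int, float]]:
    """Return (residue number, B-factor) for every CA atom in a PDB string."""
    values = []
    for line in pdb_text.splitlines():
        # ATOM records use fixed columns; slices are 0-based and end-exclusive.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            b_factor = float(line[60:66])
            values.append((residue_number, b_factor))
    return values
```

For example, ca_bfactors(open("t000_.e2e.pdb").read()) would list the estimated CA-lDDT per residue for the end-to-end output.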

FAQ

  1. Segmentation fault while running hhblits/hhsearch
    For ease of installation, we use a statically compiled version of hhsuite (installed through conda). We are not yet sure what exactly causes the segmentation fault in some cases, but it may be resolved by compiling hhsuite from source and using that build instead of the conda version. For installation of hhsuite, please see here.

  2. Submitting jobs to computing nodes
    The modeling pipeline provided here (run_pyrosetta_ver.sh/run_e2e_ver.sh) is a guideline showing how RoseTTAFold works. For more efficient use of computing resources, you might want to modify the provided bash scripts to submit separate jobs, with proper dependencies, for each step (more CPUs/memory for hhblits/hhsearch, GPUs only for running the networks, etc.).

Credit to performer-pytorch and SE(3)-Transformer codes

The code in network/performer_pytorch.py is strongly based on this repo, which is a PyTorch implementation of the Performer architecture. The code in network/equivariant_attention is from the original SE(3)-Transformer repo, which accompanies the paper 'SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks' by Fuchs et al.

References

M. Baek, et al., Accurate prediction of protein structures and interactions using a three-track neural network, Science (2021). link

I.R. Humphreys, J. Pei, M. Baek, A. Krishnakumar, et al, Computed structures of core eukaryotic protein complexes, Science (2021). link

rosettafold's People

Contributors

huhlim, minkbaek, neilfleckscri, runitralph, shawncal

rosettafold's Issues

Source code changed after pretrained model was submitted

RuntimeError: Error(s) in loading state_dict for RoseTTAFoldModule_e2e:
Missing key(s) in state_dict: "templ_emb.encoder_L.layers.0.attn_L.to_q.weight", "templ_emb.encoder_L.layers.0.attn_L.to_k.weight",

Training dataset

Hi there,
Thanks for sharing your great work.
Do you plan to share your training dataset (22,000+ targets) publicly as well?
Thanks

Model folder is not created

Hi,
When I run 'run_pyrosetta_ver.sh input.fa .', the result contains only two folders, hhblits and log. I don't get the five final models. Why is this?

Thank you

Last step of installation

Hi,
I am trying to install RoseTTAFold on a Linux-based cluster, and I am at the last step of the installation:
6. Obtain a PyRosetta licence and install the package in the newly created folding conda environment (link).

I am a little confused about which package to use and how to configure the install at this step. Could you please clarify this a little?
Thanks,
Albert

No GPU available

Hi,
I want to use RoseTTAFold, but our Linux server has no GPU and only CPUs. Is it OK? Thank you!
Rusfell

Cuda out of memory error

Hello RoseTTAFold team,

Thank you for sharing the code and supporting the community.

I tested the code on our HPC cluster, following the README documentation, and hit a CUDA out of memory error.

$ python network/predict_complex.py -i example/complex_modeling/filtered.a3m -o complex -Ls 218 31
RuntimeError: CUDA out of memory. Tried to allocate 9.14 GiB (GPU 0; 15.78 GiB total capacity; 7.04 GiB already allocated; 6.08 GiB free; 8.36 GiB reserved in total by PyTorch)

From a Google search, I found this article: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch
I tried the following, but it does not help.
import torch
torch.cuda.empty_cache()

I also wanted to try the following, but I don't understand what the 'variables' are. Should I run this on the command line or embed it in the Python code?
import gc
del variables
gc.collect()

If you have any suggestions, I would much appreciate them!

symlink in example folder

First of all, great work folks.

And now the issue / my suggestion:
Can you get rid of the symlinks in the example folder (under model) or make them relative paths? They are making Git flip out.

Looking forward to seeing this project grow!

Is the md5 hash for the database files available?

Hello,

I am wondering whether md5 hashes are available for the large database files and the decompressed files, so that I can compare them. They took a long time to download and decompress, and I just want to make sure the files are intact.
I found one for the 272G BFD database file, and I am wondering whether hashes are available for the other two database files.

Thank you!
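
Once a reference hash is available, a downloaded archive can be checked without loading it into memory. A minimal sketch using Python's standard hashlib module, reading the file in chunks. The function names and the chunk size are illustrative choices, and the expected hash has to come from the database provider.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected_hex: str) -> bool:
    """Compare a file's md5 digest with a published value (case-insensitive)."""
    return file_md5(path) == expected_hex.strip().lower()
```

For a file of several hundred gigabytes this will still take a while, since every byte has to be read once.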

error in Install

Hello:
When I run the command 'conda env create -f RoseTTAFold-linux-cu101.yml' or 'conda env create -f RoseTTAFold-linux.yml', I get the following error in both cases:
Collecting package metadata (repodata.json): done
Solving environment: failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:
Is this a problem with my environment? How can I solve it?

Thank you!

OSError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory

Hi, thank you very much for this awesome work!

I tried to do complex modeling, but I encountered the following error.

$ python network/predict_complex.py -i example/complex_modeling/filtered2.a3m -o complex -Ls 218 310
Using backend: pytorch
Traceback (most recent call last):
  File "network/predict_complex.py", line 8, in <module>
    from RoseTTAFoldModel  import RoseTTAFoldModule_e2e
  File "/home/shengfa/RoseTTAFold/network/RoseTTAFoldModel.py", line 4, in <module>
    from Attention_module_w_str import IterativeFeatureExtractor
  File "/home/shengfa/RoseTTAFold/network/Attention_module_w_str.py", line 9, in <module>
    from InitStrGenerator import InitStr_Network
  File "/home/shengfa/RoseTTAFold/network/InitStrGenerator.py", line 6, in <module>
    import torch_geometric
  File "/home/shengfa/anaconda3/envs/RoseTTAFold/lib/python3.8/site-packages/torch_geometric/__init__.py", line 5, in <module>
    import torch_geometric.data
  File "/home/shengfa/anaconda3/envs/RoseTTAFold/lib/python3.8/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/home/shengfa/anaconda3/envs/RoseTTAFold/lib/python3.8/site-packages/torch_geometric/data/data.py", line 8, in <module>
    from torch_sparse import coalesce, SparseTensor
  File "/home/shengfa/anaconda3/envs/RoseTTAFold/lib/python3.8/site-packages/torch_sparse/__init__.py", line 14, in <module>
    torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
  File "/home/shengfa/.local/lib/python3.8/site-packages/torch/_ops.py", line 105, in load_library
    ctypes.CDLL(path)
  File "/home/shengfa/anaconda3/envs/RoseTTAFold/lib/python3.8/ctypes/__init__.py", line 381, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory

I have installed the environment using the below script:

#   If your NVIDIA driver compatible with cuda11
conda env create -f RoseTTAFold-linux.yml

And the drivers are as follows:

$ nvidia-smi 
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
$ conda list
# packages in environment at /home/shengfa/anaconda3/envs/RoseTTAFold:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    conda-forge
_openmp_mutex             4.5                       1_gnu  
biopython                 1.78             py38h497a2fe_2    conda-forge
blas                      1.0                         mkl    conda-forge
blast-legacy              2.2.26                        2    biocore
brotlipy                  0.7.0           py38h497a2fe_1001    conda-forge
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2021.5.25            h06a4308_1  
certifi                   2021.5.30        py38h06a4308_0  
cffi                      1.14.5           py38ha65f79e_0    conda-forge
chardet                   3.0.4                    pypi_0    pypi
cryptography              3.4.7            py38ha5dfef3_0    conda-forge
cudatoolkit               11.1.74              h6bb024c_0    nvidia
decorator                 4.4.2                    pypi_0    pypi
dgl-cu110                 0.6.1                    pypi_0    pypi
ffmpeg                    4.3                  hf484d3e_0    pytorch
freetype                  2.10.4               h5ab3b9f_0  
gmp                       6.2.1                h2531618_2  
gnutls                    3.6.15               he1e5248_0  
googledrivedownloader     0.4                pyhd3deb0d_1    conda-forge
hhsuite                   3.3.0           py38pl5262hc37a69a_2    bioconda
idna                      2.10               pyh9f0ad1d_0    conda-forge
intel-openmp              2021.2.0           h06a4308_610  
jinja2                    3.0.1              pyhd8ed1ab_0    conda-forge
joblib                    1.0.1              pyhd8ed1ab_0    conda-forge
jpeg                      9b                   h024ee3a_2  
lame                      3.100                h7b6447c_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.35.1               h7274673_9  
libblas                   3.9.0                     9_mkl    conda-forge
libcblas                  3.9.0                     9_mkl    conda-forge
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.3.0               h5101ec6_17  
libgfortran-ng            7.5.0               h14aa051_19    conda-forge
libgfortran4              7.5.0               h14aa051_19    conda-forge
libgomp                   9.3.0               h5101ec6_17  
libiconv                  1.15                 h63c8f33_5  
libidn2                   2.3.1                h27cfd23_0  
liblapack                 3.9.0                     9_mkl    conda-forge
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.3.0               hd4cf53a_17  
libtasn1                  4.16.0               h27cfd23_0  
libtiff                   4.2.0                h85742a9_0  
libunistring              0.9.10               h27cfd23_0  
libuv                     1.40.0               h7b6447c_0  
libwebp-base              1.2.0                h27cfd23_0  
lie-learn                 0.0.1.post1              pypi_0    pypi
lz4-c                     1.9.3                h2531618_0  
markupsafe                2.0.1            py38h497a2fe_0    conda-forge
mkl                       2021.2.0           h06a4308_296  
mkl-service               2.3.0            py38h27cfd23_1  
mkl_fft                   1.3.0            py38h42c9631_2  
mkl_random                1.2.1            py38ha9443f7_2  
ncurses                   6.2                  he6710b0_1  
nettle                    3.7.3                hbbd107a_1  
networkx                  2.5                        py_0    conda-forge
ninja                     1.10.2               hff7bd54_1  
numpy                     1.20.2           py38h2d18471_0  
numpy-base                1.20.2           py38hfae3a4d_0  
olefile                   0.46                       py_0    conda-forge
openh264                  2.1.0                hd408876_0  
openssl                   1.1.1k               h27cfd23_0  
packaging                 20.9               pyhd3eb1b0_0  
pandas                    1.2.5            py38h1abd341_0    conda-forge
perl                      5.26.2            h36c2ea0_1008    conda-forge
pillow                    8.2.0            py38he98fc37_0  
pip                       21.1.3           py38h06a4308_0  
psipred                   4.01                          1    biocore
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pyopenssl                 20.0.1             pyhd8ed1ab_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyrosetta                 2021.29+release.d8f55669792          pypi_0    pypi
pysocks                   1.7.1            py38h578d9bd_3    conda-forge
python                    3.8.10               h12debd9_8  
python-dateutil           2.8.1                      py_0    conda-forge
python-louvain            0.15               pyhd3deb0d_0    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytorch                   1.9.0           py3.8_cuda11.1_cudnn8.0.5_0    pytorch
pytorch-cluster           1.5.9           py38_torch_1.9.0_cu111    rusty1s
pytorch-geometric         1.7.2           py38_torch_1.9.0_cu111    rusty1s
pytorch-scatter           2.0.7           py38_torch_1.9.0_cu111    rusty1s
pytorch-sparse            0.6.10          py38_torch_1.9.0_cu111    rusty1s
pytorch-spline-conv       1.2.1           py38_torch_1.9.0_cu111    rusty1s
pytz                      2021.1             pyhd8ed1ab_0    conda-forge
readline                  8.1                  h27cfd23_0  
requests                  2.25.1             pyhd3deb0d_0    conda-forge
scikit-learn              0.24.2           py38ha9443f7_0  
scipy                     1.7.0                    pypi_0    pypi
setuptools                52.0.0           py38h06a4308_0  
six                       1.16.0             pyhd3eb1b0_0  
sqlite                    3.36.0               hc218d9a_0  
threadpoolctl             2.1.0              pyh5ca1d4c_0    conda-forge
tk                        8.6.10               hbc83047_0  
torchaudio                0.9.0                      py38    pytorch
torchvision               0.10.0               py38_cu111    pytorch
tqdm                      4.61.1             pyhd8ed1ab_0    conda-forge
typing_extensions         3.10.0.0           pyh06a4308_0  
urllib3                   1.25.11                  pypi_0    pypi
wheel                     0.36.2             pyhd3eb1b0_0  
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3  
zstd                      1.4.9                haebb681_0

How can I fix this error?
Thank you for any help you can offer.

commercial license?

Hello:
Can you please let me know what the conditions are to use this software in a commercial setting, i.e. for a drug discovery project?
Thank you,
Markus

Issue with running hhsearch to completion

I'm working through getting RoseTTAFold operational using the example sequence.

HHblits and PSIPRED run to completion, however I encounter an error during hhsearch. In the hhsearch.stderr logfile, it says that the sequence ss_pred contains no residues. I checked t000_.msa0.ss2.a3m and the fastas for >ss_pred and >ss_conf are indeed empty, however the other alignment sequences are still present.

I tried to fix by recompiling hhsuite using the source binaries as suggested in your FAQ, however that doesn't seem to fix the issue. Does anyone have suggestions on how to move forward?

CONDA Environment

Hi,
I followed the installation using conda and installed PyRosetta into the folding environment. However, I found that the provided run_e2e script runs conda activate RoseTTAFold rather than folding. I am wondering which of the conda environments I should use to run it.
Best,
Albert

hhsearch Segmentation fault

I tried these installation packages and a version compiled from source; all of them end in a segmentation fault.

#hh-suite
#wget https://github.com/soedinglab/hh-suite/releases/download/v3.3.0/hhsuite-3.3.0-SSE2-Linux.tar.gz
#tar xvfz hhsuite-3.3.0-SSE2-Linux.tar.gz

#wget https://github.com/soedinglab/hh-suite/releases/download/v3.3.0/hhsuite-3.3.0-AVX2-Linux.tar.gz
#tar xvfz hhsuite-3.3.0-AVX2-Linux.tar.gz

#wget https://mmseqs.com/hhsuite/hhsuite-linux-avx2.tar.gz --no-check-certificate
#tar xvfz hhsuite-linux-avx2.tar.gz

# -O sets the local file name so that it matches the unzip step below
wget https://github.com/soedinglab/hh-suite/archive/refs/heads/master.zip --no-check-certificate -O hh-suite-master.zip
unzip hh-suite-master.zip
mv hh-suite-master hhsuite
cd hhsuite
mkdir -p ./build && cd ./build
apt install cmake
cmake -DCMAKE_INSTALL_PREFIX=.. ..
make -j 12
make install

CUDA kernel errors

When I run 'run_pyrosetta_ver.sh input.fa .', the following error is reported in the 'network.stderr' file. Why is this?
Thanks.

Using backend: pytorch
Traceback (most recent call last):
  File "/home/ganjh/RoseTTAFold/network/predict_pyRosetta.py", line 199, in <module>
    pred = Predictor(model_dir=args.model_dir, use_cpu=args.use_cpu)
  File "/home/ganjh/RoseTTAFold/network/predict_pyRosetta.py", line 67, in __init__
    self.model = RoseTTAFoldModule(**MODEL_PARAM).to(self.device)
  File "/home/ganjh/.conda/envs/RoseTTAFold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 852, in to
    return self._apply(convert)
  File "/home/ganjh/.conda/envs/RoseTTAFold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/ganjh/.conda/envs/RoseTTAFold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/ganjh/.conda/envs/RoseTTAFold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 552, in _apply
    param_applied = fn(param)
  File "/home/ganjh/.conda/envs/RoseTTAFold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 850, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Example for more-than-2-proteins complex estimation

The example provided for complex estimation and the make_joint_MSA_bacterial.py script refer to just 2 proteins. In the cases where more than 2 proteins must be docked (the paper talks about two or more sequences), how does the pipeline work?
Should we run make_joint_MSA_bacterial.py iteratively adding one protein at a time (after the first 2 have been docked)?

Ask for help when predicting thousands of proteins

Hi, RoseTTAFold team~

I need to predict thousands of proteins, but the scripts (run_e2e_ver.sh and run_pyrosetta_ver.sh) seem to accept only one protein at a time. What should I do to speed up my analysis?

Currently, I use a for loop to run the PyRosetta version for each protein. Any suggestions?

Thanks!

Rusfell
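
One way to organise such a batch is to split a multi-FASTA file into one working directory per protein and build one command line per directory, which can then be run in a loop or handed to a job scheduler. This is a minimal sketch, assuming the usage shown in the README (script, input FASTA, working directory) and that the first word of each FASTA header is a filesystem-safe name. The helper names are hypothetical and are not part of RoseTTAFold.

```python
from pathlib import Path
from typing import List, Tuple


def split_fasta(fasta_text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            words = line[1:].split()
            # Fall back to a generated name if the header is empty.
            name = words[0] if words else "seq%d" % (len(records) + 1)
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def build_jobs(fasta_text: str, out_root: str,
               script: str = "../run_e2e_ver.sh") -> List[List[str]]:
    """Write one input.fa per protein and return one command per protein."""
    jobs = []
    for name, seq in split_fasta(fasta_text):
        workdir = Path(out_root) / name
        workdir.mkdir(parents=True, exist_ok=True)
        fasta_path = workdir / "input.fa"
        fasta_path.write_text(">%s\n%s\n" % (name, seq))
        jobs.append([script, str(fasta_path), str(workdir)])
    return jobs
```

Each returned command can be passed to subprocess.run or written into a scheduler submission script. Since the MSA and template searches are CPU-bound and only the network step needs a GPU, running several of these jobs side by side is usually where the time is gained.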

ValueError: Number of processes must be at least 1

Hi, RoseTTAFold team.

There was no PDB output in the model folder when running the PyRosetta version.

$ cat pick.stderr
Traceback (most recent call last):
  File "/data/Rusfell/03.RoseTTAFold/RoseTTAFold/DAN-msa/pick_final_models.div.py", line 113, in <module>
    pool = mp.Pool(n_core_pool)
  File "/home/Rusfell/anaconda3/envs/folding/lib/python3.7/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/home/Rusfell/anaconda3/envs/folding/lib/python3.7/multiprocessing/pool.py", line 169, in __init__
    raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1

Rusfell
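
multiprocessing.Pool raises this ValueError whenever it is asked for fewer than one worker, so the traceback suggests that n_core_pool evaluated to 0 (or less) on this run. How pick_final_models.div.py computes that value is not shown here, so the guard below is only a hypothetical sketch of how a pool size can be clamped to a valid range. The function name is illustrative.

```python
import multiprocessing as mp


def safe_pool_size(requested: int) -> int:
    """Clamp a requested worker count to the range [1, number of CPUs]."""
    available = mp.cpu_count()
    return max(1, min(requested, available))
```

A call such as mp.Pool(safe_pool_size(n_core_pool)) then always receives at least one process, even when no candidate models or cores were counted. If the count is zero because an earlier step produced no models, that earlier step is still the real problem to look into.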

Configuration file for the databases

I realized that the paths for the databases are hard-coded in the bash scripts. For example, in "run_pyrosetta_ver.sh", the PDB100 database is referred to as
DB="$PIPEDIR/pdb100_2021Mar03/pdb100_2021Mar03"
I suggest adding a configuration file for the database paths, which would make future database updates easier.
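
One possible shape for such a configuration is a table of default paths relative to the pipeline directory, each of which can be overridden by an environment variable. The sketch below is hypothetical: the RF_DB_* variable names are invented for illustration, only the PDB100 default comes from the script quoted above, and the UniRef30 and BFD file prefixes are assumptions based on the directory names used in the installation steps.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Defaults relative to the pipeline directory.
# Only the PDB100 entry is taken from run_pyrosetta_ver.sh; the other two are assumed.
DEFAULT_DB_PATHS = {
    "PDB100": "pdb100_2021Mar03/pdb100_2021Mar03",
    "UNIREF30": "UniRef30_2020_06/UniRef30_2020_06",
    "BFD": "bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt",
}


def resolve_db_paths(pipe_dir: str,
                     env: Optional[Mapping[str, str]] = None) -> Dict[str, Path]:
    """Return database paths, letting RF_DB_<NAME> variables override defaults."""
    env = os.environ if env is None else env
    paths = {}
    for key, relative in DEFAULT_DB_PATHS.items():
        override = env.get("RF_DB_" + key)  # hypothetical variable name
        paths[key] = Path(override) if override else Path(pipe_dir) / relative
    return paths
```

The same idea carries over to the bash scripts themselves, for example by sourcing a small file of variable assignments before the database paths are used.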

csblast-2.2.3 missing

When running ../run_pyrosetta_ver.sh input.fa . I encounter an error!

more make_ss.stderr
/training/nong/protein/RoseTTAFold/input_prep/make_ss.sh: line 11: /training/nong/protein/RoseTTAFold/csblast-2.2.3/bin/csbuild: No such file or directory
[makemat] FATAL ERROR: Unable to open file t000_.msa0.tmp.chk

Bad psipred pass1 file format!
rm: cannot remove '/training/nong/protein/RoseTTAFold/example2/t000_.msa0.a3m.csb.hhblits.ss2': No such file or directory

It seems csblast was not in the install folder

CUDA compatibility

I have an older version of the NVIDIA driver (450.36), which does not support CUDA toolkit >= 11.1 (CUDA compatibility).
In such cases, the network/predict_*.py scripts fail to run on GPUs because of the compatibility issue.

Do the scripts really need to be run with cudatoolkit=11.1? When I created another RoseTTAFold conda environment with

  • cudatoolkit=10.2
  • pytorch=1.8.1 (it could be 1.9.0, but I was using 1.8.1, so I tested with it)

the scripts seemed to work fine.
Anyone who struggles with the same issue may use this YAML file to create the RoseTTAFold conda environment.

no pdb outcome in the pyrosetta run version

The run looks normal, but there is no PDB file in the result folder.

$ run_pyrosetta_ver.sh XXX.fasta .
Running HHblits
Running PSIPRED
Running hhsearch
Predicting distance and orientations
Running parallel RosettaTR.py
$ ls
hhblits log parallel.fold.list pdb-3track t000_.3track.npz t000_.atab t000_.hhr t000_.msa0.a3m t000_.msa0.ss2.a3m t000_.ss2
$ ls pdb-3track

Any suggestions? The e2e version runs normally. Thanks.

Running predictions for Protein-Protein complexes

Hi, first of all thank you for open sourcing this amazing tool!

The paper mentions that

The final layer of the end-to-end version of our 3-track network generates 3D structure
models by combining features from discontinuous crops of the protein sequence (two segments
of the protein with a chain break between them). Because the network can seamlessly handle
chain breaks, it can be readily utilized to predict the structure of protein-protein complexes
directly from sequence information.

Looking at the code in Refine_module.py and predict_e2e.py, it looks like there are quite a lot of details to get right and many ways to screw up how the information from two disjoint chains should be merged. According to the supplementary material, there are also changes in the positional encoding needed for this to work (the +200 offset).

Is there an example script or some guidelines that you could share that shows how to properly do a structure prediction for a Protein-Protein complex?

Finally, I was wondering if any experiments were done trying to estimate binding affinity scores from the hidden layer activations of the Refine_Module? This would be a very useful extension for doing protein design...
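
The "+200 offset" mentioned above can be pictured as a jump in the residue numbering at each chain break. This is a minimal sketch, assuming the offset is simply added to the index of every residue after a break. The function name and the exact indexing scheme are illustrative assumptions and are not taken from predict_e2e.py or Refine_module.py.

```python
from typing import List, Sequence


def residue_index_with_chain_breaks(chain_lengths: Sequence[int],
                                    gap: int = 200) -> List[int]:
    """Number residues consecutively within a chain and skip `gap` between chains."""
    indices = []
    start = 0
    for length in chain_lengths:
        indices.extend(range(start, start + length))
        # The next chain starts `gap` positions after the end of this one.
        start += length + gap
    return indices
```

For two chains of lengths 3 and 2 this gives 0, 1, 2, 203, 204: the second chain keeps its internal spacing, but its distance in sequence from the first chain becomes large enough to signal a break.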

casp14 result

This is wonderful work, but I have two questions.

This picture is the result of the evaluation of casp14.

  1. Do you only use UniRef30_2020_06 and bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt database to search MSAs?
  2. Do you only use pdb100_2021Mar03 database to search templates(with above MSAs)?

Type error from pyrosetta

Hi,

I am using pyrosetta4 and line 86 in folding/utils.py :
spline = rosetta.core.scoring.func.SplineFunc("", 1.0, 0.0, step, x,y)
throws the following error:

TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. pyrosetta.rosetta.core.scoring.func.SplineFunc()
    2. pyrosetta.rosetta.core.scoring.func.SplineFunc(arg0: pyrosetta.rosetta.core.scoring.func.SplineFunc)
    3. pyrosetta.rosetta.core.scoring.func.SplineFunc(arg0: pyrosetta.rosetta.core.scoring.func.SplineFunc)

Is there a PyRosetta4 version issue here?

"buffer overflow detected" when using psipred.

here is the error:

./input_prep/make_ss.sh: line 18: 11775 Aborted (core dumped) psipred $ID.mtx $DATADIR/weights.dat $DATADIR/weights.dat2 $DATADIR/weights.dat3 > $ID.ss

I am running the FASTA file in example/input.fa. I think this might be caused by the gcc version. Can you help me? Which version of gcc do you use?

question: model output

It looks like the model only predicts backbone atoms (N, C-alpha, C), with no C-beta, O, or side chains. Is that right?

What happens if no complex is feasible?

I'm not sure if there's a way to tell from the output of the prediction if 2 or more proteins can't "merge".
This would be a nice feature to use for online predictions when testing combinations of proteins

Error in Uniref

Hi all. Excited to try this library. Right now I'm getting the following error:

  • 21:20:37.467 ERROR: In hh-suite/src/hhdatabase.cpp:446: getTemplateHMM:

  • 21:20:37.467 ERROR: Unrecognized HMM file format in '2703739471'.

  • 21:20:37.467 ERROR: Context:
    'EPEEEYMLAKRWVDHEDTEAAHRLVTSHLRLAAKIAMGYRGYGLPQAEVISEANVGLMQAVKRFDPEKGFRLATYAMWWIRASIQEYILRSWSLVKMGTTSAQKKLFFNLRKAKSKLGALEEgDlrpeNVKKIAHDLSVTEAEVIEMNRRLAGSDASLNAQLGgSEGEGGsEWM--EWLEDEDADQAGDYAERDEMDSRRALLAQALDVLNERERDILTERKLRDEPVTLEDLSTRYGVSRERIRQIEVRAFEKIQKRMKALARERGLLPAA--------------------------------------------------------------

  • 21:20:37.467 ERROR: >UniRef100_A0A2N3CLL8 RNA polymerase factor sigma-32 n=1 Tax=Alphaproteobacteria bacterium HGW-Alphaproteobacteria-2 TaxID=2013665 RepID=A0A2N3CLL8_9PROT

  • 21:20:37.467 ERROR: ---------------------------------------------------------------------------------------------------------------------MTKHLD--PERAFYRHAMAQELLDAETEADLARAWRDRRDEAALHRLITAYGRLALSIAQRYRRYQLPLEDIVQQAHLGLMRAADKFDPERGVRFSTYSAWWIKAAIQDYVMRNWSIVRGGATAAQKSLFFNLRRIHAEVERraqarGAVmTgeeIAEEIAGTLGVPLEQVRGMLGRVAGADLSLNATQRTEDGSREWQ--DLLEDDAPQAEEIVIEAAHRRRVTGALQAALRDLPARERHIVIERRLREEPRTLTDLGIELGVSKERVRQLEERALGRLRTAMAGLAEAGA-------------------------------------------------------------------'
Does anyone have any idea what I'm doing wrong?
Thank you.

AMD Driver ROCm

Could you provide a version of the code that is compatible with AMD drivers (ROCm)?

Unable to download the weights dataset

Hi,
I am trying to download the weights dataset using the command :
wget https://files.ipd.uw.edu/pub/RoseTTAFold/weights.tar.gz as suggested in readme.

but I get the following error:
--2021-08-05 21:50:01-- https://files.ipd.uw.edu/pub/RoseTTAFold/weights.tar.gz
Resolving files.ipd.uw.edu (files.ipd.uw.edu)... 2607:4000:406::160:134, 2607:4000:406::160:135, 128.95.160.135, ...
Connecting to files.ipd.uw.edu (files.ipd.uw.edu)|2607:4000:406::160:134|:443... connected.
OpenSSL: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Unable to establish SSL connection.

realpath: missing operand

Hello!
I ran the example using the command ../run_pyrosetta_ver.sh input.fa
but got this error: realpath: missing operand Try 'realpath --help' for more information.
What should I do?

BFD and UniRef30_2020_06 needed for viral protein study?

Hi RoseTTAFold team,

Thank you for the great work and the insight it provides for protein science!
If I am going to study viral protein complexes rather than bacterial enzymes, should I still download and use the BFD?
In addition, could I use the NCBI BLAST database instead of UniProt? If so, should I still download "UniRef30_2020_06"?

Thanks!
David

CUDA error help

I encountered this error with the e2e script, with my protein of 255AA in length. The error happens at the last step when building the 3D models. Any idea why this is happening and what can I do to fix it? I am using a GTX1660, a better setup is not possible in the near future.

Thanks,
Wenzhe

RuntimeError: CUDA out of memory. Tried to allocate 456.00 MiB (GPU 0; 5.80 GiB total capacity; 3.24 GiB already allocated; 420.88 MiB free; 3.66 GiB reserved in total by PyTorch)
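
The message carries its own arithmetic: the allocation that failed asked for slightly more than what was free at that moment. The sketch below only reads those numbers out of the message; it is not a fix, and a single failed allocation does not show how much memory the whole run needs. `oom_shortfall_mib` is a hypothetical helper.

```python
import re

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

def oom_shortfall_mib(message):
    """Return (requested, free, shortfall) in MiB from a PyTorch CUDA OOM message."""
    def to_mib(pattern):
        match = re.search(pattern, message)
        if match is None:
            raise ValueError("message does not look like a CUDA OOM error")
        return float(match.group(1)) * UNIT_TO_MIB[match.group(2)]

    requested = to_mib(r"Tried to allocate ([\d.]+) (MiB|GiB)")
    free = to_mib(r"([\d.]+) (MiB|GiB) free")
    return requested, free, requested - free

message = (
    "RuntimeError: CUDA out of memory. Tried to allocate 456.00 MiB "
    "(GPU 0; 5.80 GiB total capacity; 3.24 GiB already allocated; "
    "420.88 MiB free; 3.66 GiB reserved in total by PyTorch)"
)
requested, free, shortfall = oom_shortfall_mib(message)
print(f"requested={requested:.2f} MiB, free={free:.2f} MiB, shortfall={shortfall:.2f} MiB")
```

Here the shortfall for this one allocation is about 35 MiB.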

Proper release and build instructions

Hi,

Any plan to address these would be appreciated:

Make proper release versions

  1. Tag and release on Github with Semantic versioning: https://semver.org
  2. Provide instructions to install the code without the use of conda (thus a standard python virtual environment).
    2.1) Document software and hardware requirements

This would help install and distribute software, generally, and especially on HPC systems.

Unfinished folding error

Hi RoseTTAFold,

I ran into a problem after installing the software on my university's server and starting a single-model job.
I see four progress messages, as below:
Running HHblits
Running PSIPRED
Running hhsearch
Predicting distance and orientations

I thought it had finished, but I cannot find any PDB result. I then opened the log folder to figure out what happened and found something strange in network.stderr, as below:
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "/mnt/home/lipai/anaconda3/envs/RoseTTAFold/lib/python3.8/site-packages/torch_geometric/utils/softmax.py", line 41, in softmax
elif index is not None:
N = maybe_num_nodes(index, num_nodes)
src_max = scatter(src, index, dim, dim_size=N, reduce='max')
~~~~~~~ <--- HERE
src_max = src_max.index_select(dim, index)
out = (src - src_max).exp()
File "/mnt/home/lipai/anaconda3/envs/RoseTTAFold/lib/python3.8/site-packages/torch_scatter/scatter.py", line 161, in scatter
return scatter_min(src, index, dim, out, dim_size)[0]
elif reduce == 'max':
return scatter_max(src, index, dim, out, dim_size)[0]
~~~~~~~~~~~ <--- HERE
else:
raise ValueError
File "/mnt/home/lipai/anaconda3/envs/RoseTTAFold/lib/python3.8/site-packages/torch_scatter/scatter.py", line 73, in scatter_max
out: Optional[torch.Tensor] = None,
dim_size: Optional[int] = None) -> Tuple[torch.Tensor, torch.Tensor]:
return torch.ops.torch_scatter.scatter_max(src, index, dim, out, dim_size)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: CUDA error: a PTX JIT compilation failed
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Following that last recommendation, I first ran CUDA_LAUNCH_BLOCKING=1 at the shell prompt and then ran the code, but the same error happened.
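
One detail worth checking: entering `CUDA_LAUNCH_BLOCKING=1` on a line by itself sets a shell variable without exporting it, so a script started afterwards does not see it. The sketch below shows one way to pass the variable to a child process explicitly. `run_with_launch_blocking` is a hypothetical helper, and the probe command stands in for the real run script.

```python
import os
import subprocess
import sys

def run_with_launch_blocking(command):
    """Run `command` with CUDA_LAUNCH_BLOCKING=1 in the child's environment."""
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(command, env=env, capture_output=True, text=True, check=False)

# Harmless child process that echoes the variable it received.
probe = [sys.executable, "-c", "import os; print(os.environ.get('CUDA_LAUNCH_BLOCKING'))"]
result = run_with_launch_blocking(probe)
print(result.stdout.strip())
```

In a shell, the equivalent is to prefix the command with the assignment on the same line or to `export` the variable first. This makes the traceback more reliable but does not by itself address the PTX error.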

Here is what I did to get the failure:

  1. I open a session on my university's server and load conda.
  2. I activate my conda base environment, with all your packages previously installed.
  3. I load the CUDA 11.1.1 module and run nvcc --version to check that CUDA is loaded. It does report a CUDA version.
  4. I run run_pyrosetta_ver.sh.
  5. I also tried run_e2e_ver.sh, which gives a similar result. It seems that CUDA cannot be used?
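
A "PTX JIT compilation failed" error is often associated with a mismatch between the NVIDIA driver and the CUDA version the installed binaries were built against, though that is only a guess for this report. `nvcc --version` describes the toolkit module, not what PyTorch was built with, so a report from inside the Python environment can help. The sketch below assumes PyTorch may or may not be importable; `cuda_environment_report` is a hypothetical helper.

```python
def cuda_environment_report():
    """Collect PyTorch/CUDA facts that are useful when reporting a CUDA runtime error."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report

print(cuda_environment_report())
```

Running this inside the activated RoseTTAFold environment, alongside the driver version shown by `nvidia-smi`, gives maintainers the details they usually ask for.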

Here is a link to all the partial results from the failed run:
https://github.com/phylars/shared_files_for_bugs/raw/main/RosettaFold_error/to_be_folded.zip

Could you please help with it?

Results on CASP14 targets

I am running some experiments with RoseTTAFold on CASP14 targets, but am uncertain about a few detailed settings:

  1. Input FASTA sequences. Do you use per-chain FASTA sequences as provided on the CASP14 website (without official domain definitions), or per-domain FASTA sequences (cropped from per-chain sequences using official domain definitions)? If the latter, how do you deal with discontinuous domains, e.g. T1027-D1?
  2. Sequence/template databases. For experiments on CASP14 targets, is it correct to use the following databases (to prevent data leakage)?
  3. Do we need to modify run_e2e_ver.sh and/or run_pyrosetta_ver.sh (or any other scripts) to reproduce the results reported in the paper?

Could you clarify the questions above? Many thanks in advance.
