Upon investigation, the error is genuine, but the cause is different.
Obsolete PDB IDs were not being replaced by their newer versions, and not just for recent additions to obsolete.dat but for older entries as well; that is, in all cases.
FIXED:
Added a small piece of missing code to fix the bug.
NOTE: I recommend that everyone pull again, as this affects a lot of proteins.
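The intended behaviour can be sketched as follows. This is a minimal illustration, not OpenFold's actual code: `resolve_template_id` is a hypothetical helper, and it assumes obsolete.dat has already been loaded into a dict mapping each old ID to its replacement (or `None` for entries withdrawn without a replacement).

```python
from typing import Mapping, Optional, Set


def resolve_template_id(
    pdb_id: str,
    obsolete_pdbs: Mapping[str, Optional[str]],
) -> Optional[str]:
    """Follow obsolete.dat replacements until reaching a current PDB ID.

    Hypothetical helper for illustration; not OpenFold's API.
    Returns None if the entry was withdrawn without a replacement.
    """
    seen: Set[str] = set()
    current: Optional[str] = pdb_id.lower()
    # Follow chains of replacements (A -> B -> C); the `seen` set guards
    # against a malformed mapping that contains a cycle.
    while current is not None and current in obsolete_pdbs and current not in seen:
        seen.add(current)
        current = obsolete_pdbs[current]
    return current
```

For example, with `{"6ek0": "6qzp"}` loaded from obsolete.dat, `resolve_template_id("6ek0", ...)` returns `"6qzp"`, so the template loader would open `6qzp.cif` instead of the missing `6ek0.cif`. Following chains rather than doing a single lookup is a design choice of this sketch.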
I used the latest version of the code, but I still got the same error:
File "/mnt/smile1/protein_proj/codes/github/openfold_ori/openfold/openfold/data/data_modules.py", line 158, in __getitem__
path + ".cif", file_id, chain_id, alignment_dir
File "/mnt/smile1/protein_proj/codes/github/openfold_ori/openfold/openfold/data/data_modules.py", line 138, in _parse_mmcif
chain_id=chain_id,
File "/mnt/smile1/protein_proj/codes/github/openfold_ori/openfold/openfold/data/data_pipeline.py", line 463, in process_mmcif
query_release_date=to_date(mmcif.header["release_date"])
File "/mnt/smile1/protein_proj/codes/github/openfold_ori/openfold/openfold/data/data_pipeline.py", line 55, in make_template_features
hits=hits_cat,
File "/mnt/smile1/protein_proj/codes/github/openfold_ori/openfold/openfold/data/templates.py", line 1059, in get_templates
kalign_binary_path=self._kalign_binary_path,
File "/mnt/smile1/protein_proj/codes/github/openfold_ori/openfold/openfold/data/templates.py", line 827, in _process_single_hit
with open(cif_path, "r") as cif_file:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/smile1/protein_proj/codes/github/openfold/data/pdb_mmcif/mmcif_files/6ek0.cif'
Epoch 0: 11%|█ | 62/575 [14:25<1:59:18, 13.95s/it, loss=7.65, v_num=0]
Hi yuzhiguo07,
Sorry that you are facing this issue.
Can you please share some details about how you reproduced this? Namely, what was the specific protein for which the template generation failed?
Also, what version of pdb_mmcif/obsolete.dat are you using? Specifically, on what date did you download it?
I just attached pdb_mmcif/obsolete.dat; I downloaded it on Nov 16, 2021.
obsolete.dat.tar.gz
I'm still working on printing the PDB ID at each iteration before the error occurs. Could you give me some tips on where to print it (i.e., which Python file and which function)? Since the error occurs in the middle of training, it may take some time to catch it.
Thank you so much for your work and effort!
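Judging from the traceback above, each training example is loaded in `__getitem__` in `openfold/data/data_modules.py`, so that is a natural place to hook in. A generic alternative that does not require editing OpenFold is to wrap the dataset and log which item failed. This is a sketch assuming a map-style dataset (any object with `__getitem__` and `__len__`); `LoggingDataset` and its `describe` callback are hypothetical names, and by default it reports the index because how the dataset stores PDB IDs internally is not shown here.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class LoggingDataset:
    """Wrap a map-style dataset and report which item failed to load.

    `describe` optionally turns an index into a readable name
    (for example a "pdbid_chain" string) if one is available.
    """

    def __init__(self, dataset: Any, describe: Optional[Callable[[int], str]] = None):
        self.dataset = dataset
        self.describe = describe or (lambda idx: f"index {idx}")

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, idx: int) -> Any:
        try:
            return self.dataset[idx]
        except Exception:
            # Log only on failure so a long run is not flooded with one line
            # per sample; re-raise so the original traceback is preserved.
            logger.error("Failed while loading %s", self.describe(idx))
            raise
```

Logging only on failure keeps the output short even when the error appears hundreds of iterations into an epoch.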
I followed the DeepMind MSA generation pipeline, which takes a very long time, so I used only a small amount of data to try training OpenFold.
The failed protein is 6u4z_A. @sachinkadyan7
Thanks for letting us know.
It seems that, for some reason, the obsolete protein ID '6ek0' was not replaced by the newer ID '6qzp' (as listed in obsolete.dat).
Is '6u4z_A' the protein for which you were trying to run the MSAs and templates?
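To see the mapping that should have been applied, obsolete.dat can be parsed into a dict. The sketch below assumes each relevant line is a whitespace-separated `OBSLTE` record (record name, obsoletion date, old ID, then zero or more replacement IDs), skips every other line, and keeps only the first replacement when several are listed. `parse_obsolete_lines` and `load_obsolete` are hypothetical names, not OpenFold functions.

```python
from typing import Dict, Iterable, Optional


def parse_obsolete_lines(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map obsolete PDB IDs to their replacement (None if withdrawn)."""
    mapping: Dict[str, Optional[str]] = {}
    for line in lines:
        fields = line.split()
        # Assumed layout: OBSLTE <date> <old_id> [<new_id> ...]
        if len(fields) < 3 or fields[0] != "OBSLTE":
            continue  # header lines and blank lines are skipped
        old_id = fields[2].lower()
        # Keep only the first replacement when several are listed.
        mapping[old_id] = fields[3].lower() if len(fields) > 3 else None
    return mapping


def load_obsolete(path: str) -> Dict[str, Optional[str]]:
    """Read obsolete.dat from disk and parse it."""
    with open(path) as f:
        return parse_obsolete_lines(f)
```

With the obsolete.dat discussed in this thread, the resulting dict should contain `'6ek0': '6qzp'`, which is exactly the replacement that was not applied.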
Yes, 6u4z_A is the target protein.
A couple of questions to help figure out this issue:
- Are you passing the obsolete.dat path in the script call?
- Did you generate the release_dates file? (It can be generated by running `scripts/generate_mmcif_cache.py`.)
- Do you have the release_dates file in the correct path?
The only way the above issue can occur is if the release_dates file or the obsolete.dat file is missing.
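The checklist above can be scripted as a quick preflight before a long training run. `check_template_setup` is a hypothetical helper, not part of OpenFold: it only checks that the two files exist (and that the cache is valid JSON, without assuming anything about its schema), then tries to resolve one template ID through obsolete.dat, assuming whitespace-separated `OBSLTE` records and using only the first replacement.

```python
import json
import os
from typing import Dict, List


def check_template_setup(mmcif_dir: str, obsolete_path: str,
                         release_dates_path: str, pdb_id: str) -> List[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems: List[str] = []

    if not os.path.isfile(release_dates_path):
        problems.append(f"release dates cache not found: {release_dates_path}")
    else:
        try:
            with open(release_dates_path) as f:
                json.load(f)  # only checks that the file is valid JSON
        except json.JSONDecodeError as e:
            problems.append(f"release dates cache is not valid JSON: {e}")

    obsolete: Dict[str, str] = {}
    if not os.path.isfile(obsolete_path):
        problems.append(f"obsolete.dat not found: {obsolete_path}")
    else:
        with open(obsolete_path) as f:
            for line in f:
                fields = line.split()
                # Assumed layout: OBSLTE <date> <old_id> <new_id> [...]
                if len(fields) >= 4 and fields[0] == "OBSLTE":
                    obsolete[fields[2].lower()] = fields[3].lower()

    pdb_id = pdb_id.lower()
    cif = os.path.join(mmcif_dir, f"{pdb_id}.cif")
    if not os.path.isfile(cif):
        new_id = obsolete.get(pdb_id)
        if new_id is None:
            problems.append(f"{cif} is missing and {pdb_id} has no replacement in obsolete.dat")
        elif not os.path.isfile(os.path.join(mmcif_dir, f"{new_id}.cif")):
            problems.append(f"{cif} is missing and replacement {new_id}.cif is missing too")
    return problems
```

Run against the failing template ID (for example `6ek0`), an empty result means the files are in place and the replacement target exists, which points back at how the paths are passed to the script.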
Sorry for the late reply.
I did generate the release_dates file (mmcif_cache.json) and put it in the correct path.
I'm not sure whether I was passing the obsolete.dat path in the script call. The path is /mnt/smile1/protein_proj/dataset/open_fold/try_mmcif_files/obsolete.dat (following the default data path), and my command is:
python3 train_openfold.py /mnt/smile1/protein_proj/dataset/open_fold/try_mmcif_files/ /mnt/smile1/protein_proj/dataset/open_fold/try_alignments/ /mnt/smile1/protein_proj/codes/github/openfold/data/pdb_mmcif/mmcif_files /mnt/smile1/protein_proj/models/openfold/try 2021-10-10 --template_release_dates_cache_path mmcif_cache.json --precision 16 --gpus 1 --replace_sampler_ddp=True --seed 42 --deepspeed_config_path deepspeed_config.json
Hello, I ran into the same error.
FileNotFoundError: [Errno 2] No such file or directory: '/hdd/nas_157/dataset/pdbmmcif/mmcif_files/3wxw.cif'
3wxw.cif cannot be found. I downloaded pdb_mmcif using the script scripts/download_pdb_mmcif.sh.
My command is:
python -u train_openfold.py /hdd/nas_157/dataset/fold2/mmcif/ /hdd/nas_157/dataset/fold2/features/ /hdd/nas_157/dataset/pdbmmcif/mmcif_files/ output/ 2021-10-10 --template_release_dates_cache_path mmcif_cache.json --seed 42 --deepspeed_config_path deepspeed_config.json --gpus 1 --replace_sampler_ddp=True --precision 16
@hellofinch
It seems that your command does not pass the path of the obsolete.dat file to the script. On inspection, I found that this is because the training code specifically has no command-line parameter for the obsolete.dat file path (faulty code).
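Scripts built on Python's `argparse` reject any option they have not declared, which matches the `error: unrecognized arguments` message reported later in this thread. The sketch below assumes the training script uses `argparse` (the error format suggests it, but that is an assumption) and uses a hypothetical toy parser with only two of the real arguments to show the difference declaring the option makes. It is not the actual patch.

```python
import argparse


def build_parser(with_obsolete_flag: bool) -> argparse.ArgumentParser:
    """Toy stand-in for the training script's parser (hypothetical)."""
    parser = argparse.ArgumentParser(prog="train_openfold.py")
    parser.add_argument("template_mmcif_dir")
    parser.add_argument("--template_release_dates_cache_path", default=None)
    if with_obsolete_flag:
        # The kind of option the training script was missing.
        parser.add_argument("--obsolete_pdbs_path", default=None)
    return parser


argv = ["mmcif_files/", "--obsolete_pdbs_path", "obsolete.dat"]
print(build_parser(True).parse_args(argv).obsolete_pdbs_path)  # obsolete.dat

try:
    build_parser(False).parse_args(argv)
except SystemExit as e:
    # argparse prints "error: unrecognized arguments: ..." and exits with code 2.
    print("exit code", e.code)
```

Declaring the option alone does not change behaviour: the parsed path still has to be passed on to wherever template hits are loaded.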
I also analyzed the code that actually parses the file and uses it to replace obsolete entries, and there does not seem to be any way the issue could originate in that part of the code. If the release_dates and obsolete_pdbs files are present, obsolete hits should be replaced by their newer versions.
To verify, can you try running only the inference code through run_pretrained_openfold.py on the specific protein for which the training code failed? Make sure to add obsolete_pdbs_path and release_dates_path to the command.
@sachinkadyan7
I ran the inference code through run_pretrained_openfold.py on protein 3wxw, which is a multimer. The inference code works fine and produces the PDB files, but nothing is printed to the console.
My command is:
python run_pretrained_openfold.py ./3wxw.fasta /hdd/dataset/protein/uniref90/uniref90.fasta /hdd/dataset/protein/mgnify/mgy_clusters_2018_12.fa /hdd/dataset/protein/pdb70/pdb70 /hdd/nas_157/dataset/pdbmmcif/mmcif_files/ /hdd/dataset/protein/uniclust30/uniclust30_2018_08/uniclust30_2018_08 --output_dir ./output --bfd_database_path /hdd/dataset/protein/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt --model_device cuda:1 --jackhmmer_binary_path /usr/bin/jackhmmer --hhblits_binary_path /usr/bin/hhblits --hhsearch_binary_path /usr/bin/hhsearch --kalign_binary_path /usr/bin/kalign --obsolete_pdbs_path /hdd/nas_157/dataset/pdbmmcif/obsolete.dat --release_dates_path ./mmcif_cache.json
It already includes obsolete_pdbs_path and release_dates_path.
@hellofinch
Do you see the alignment files and the predicted structure in your output directory?
If the files are there, it means that there was no error during the execution and obsolete IDs were replaced.
@sachinkadyan7
I checked, and the alignment files are there. Does that mean my dataset is correct?
I also tried adding obsolete_pdbs_path to my training command, like this:
python -u train_openfold.py /hdd/nas_157/dataset/fold2/mmcif/ /hdd/nas_157/dataset/fold2/features/ /hdd/nas_157/dataset/pdbmmcif/mmcif_files/ output/ 2021-10-10 --template_release_dates_cache_path mmcif_cache.json --seed 42 --deepspeed_config_path deepspeed_config.json --gpus 1 --replace_sampler_ddp=True --precision 16 --obsolete_pdbs_path /hdd/nas_157/dataset/pdbmmcif/obsolete.dat
But it doesn't work; I get this error: train_openfold.py: error: unrecognized arguments: --obsolete_pdbs_path /hdd/nas_157/dataset/pdbmmcif/obsolete.dat
So should I not add this option?