sokrypton / colabfold
Making Protein folding accessible to all!
License: MIT License
As has been pointed out, AlphaFold will generally only give one conformation of a protein or complex. Is this simply because it tries to maximize contacts of coevolving residue pairs? In cases where we have prior structural knowledge, it might be helpful to have the option to suppress a predicted contact, for instance if we would like to visualize a ligand-activated complex and we know that a contact is present only in the unactivated/ligand-free state. I may see if simply replacing residues of this sort with U gives the desired results. Of course it may be easier to simply use templates in this case. Are you considering any better ways to increase the number of predicted conformations?
Amber-relax fails on some structures. For example:
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled_primitive(prim, compiled, result_handler, *args)
388 device, = compiled.local_devices()
389 input_bufs = list(it.chain.from_iterable(device_put(x, device) for x in args if x is not token))
--> 390 out_bufs = compiled.execute(input_bufs)
391 check_special(prim.name, out_bufs)
392 return result_handler(*out_bufs)
RuntimeError: Internal: Failed to load in-memory CUBIN: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Thanks for the nice scripts.
Do you have any idea how to implement the "inferencing-many-proteins" function mentioned here:
https://github.com/deepmind/alphafold#inferencing-many-proteins
In my test case of folding a 489-residue protein with an MSA of 225 sequences, compilation takes about as long as the prediction step itself.
It would be great to have an AlphaFold2_manyMSA_noTemplates_noMD script for making predictions on a large number of pre-computed MSAs.
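One way to avoid paying JAX compilation for every new target is to pad all per-residue feature arrays to one fixed length, so jit sees a single input shape and compiles only once. This is a sketch, not code from the notebook; `pad_features` and `PAD_LEN` are hypothetical names, and it assumes recompilation is indeed triggered by changing input shapes:

```python
# Sketch: pad every query's features to a common length so a jitted model
# compiles a single time and is reused for all targets in the batch.
import numpy as np

PAD_LEN = 512  # choose >= the longest query in the batch

def pad_features(seq_len: int, features: dict) -> dict:
    """Zero-pad any array dimension of size seq_len up to PAD_LEN."""
    padded = {}
    for name, arr in features.items():
        pad = [(0, PAD_LEN - seq_len) if d == seq_len else (0, 0)
               for d in arr.shape]
        padded[name] = np.pad(arr, pad)
    return padded
```

With all inputs padded to the same shape, the first model call compiles and subsequent calls run immediately; the padded tail can be masked out when parsing results.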
Hi!
I would like to test ColabFold on multi-chain protein structures. I found that only AlphaFold2_advanced can handle this, but the advanced notebook doesn't support templates. Is there an easy way to add template support to the advanced notebook?
I am trying the new pairing feature for the MSAs and get the following error:
found 0 pairs
47155 Sequences Found in Total
merging/filtering MSA using mmseqs2
7082 Sequences Found in Total (after filtering)
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:239: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:240: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-ef6f7c7fd665> in <module>()
239 gap_ = msa_ != "-"
240 qid_ = msa_ == np.array(list(sequence))
--> 241 gapid = np.stack([gap_[:,Ln[i]:Ln[i+1]].max(-1) for i in range(len(seqs))],-1)
242 seqid = np.stack([qid_[:,Ln[i]:Ln[i+1]].mean(-1) for i in range(len(seqs))],-1).sum(-1) / gapid.sum(-1)
243 non_gaps = gap_.astype(np.float)
<ipython-input-3-ef6f7c7fd665> in <listcomp>(.0)
239 gap_ = msa_ != "-"
240 qid_ = msa_ == np.array(list(sequence))
--> 241 gapid = np.stack([gap_[:,Ln[i]:Ln[i+1]].max(-1) for i in range(len(seqs))],-1)
242 seqid = np.stack([qid_[:,Ln[i]:Ln[i+1]].mean(-1) for i in range(len(seqs))],-1).sum(-1) / gapid.sum(-1)
243 non_gaps = gap_.astype(np.float)
TypeError: 'bool' object is not subscriptable
Big thanks for the amazing work by the way :)
While running UniProt ID P02744 (sequence pasted below) in the AlphaFold2 w/ MMseqs2 notebook with templates and amber selected, I get an error from amber saying at least one residue in the protein has no atoms, so it can't relax it.
I'm not sure why this is occurring. I don't HAVE to use the amber minimization (although it would be nice), but I'm worried about what is causing this empty residue in the first place.
ValueError Traceback (most recent call last)
in ()
49 Ls=[len(query_sequence)]*homooligomer,
50 model_params=model_params, use_model=use_model,
---> 51 do_relax=use_amber)
3 frames
/content/alphafold/relax/amber_minimize.py in _check_residues_are_well_defined(prot)
139 """Checks that all residues contain non-empty atom sets."""
140 if (prot.atom_mask.sum(axis=-1) == 0).any():
--> 141 raise ValueError("Amber minimization can only be performed on proteins with"
142 " well-defined residues. This protein contains at least"
143 " one residue with no atoms.")
ValueError: Amber minimization can only be performed on proteins with well-defined residues. This protein contains at least one residue with no atoms.
P02744 sequence:
LEEGEITSKVKFPPSSSPSFPRLVMVGTLPDLQEITLCYWFKVNQLKGTLGHSRVLBMFSYATAKKDNELLTFLDEQGDFLFNV
Hi, I'd like to use the GPU during inference in Jupyter, but it seems only the CPU is being used. How do I configure this?
Good morning,
I would like to propose sharing another extremely useful example of AF2 usage. Many scientists use protein embeddings for downstream tasks (e.g. function prediction). An AF2 issue describes the code path that exposes the protein embedding vector, but many users are not able to set it up by themselves.
I hope you will consider my idea: demonstrate how to load a minimal AF2 setup and execute only the embedding part of the workflow on Colab or a local machine. The most useful example would take an amino-acid sequence as input and return a fixed-length numerical vector (the averaged per-residue vector) as output.
Warm regards,
Piotr
For two-protein complexes, the current notebook works really nicely!
Is there a way to compute three-protein complexes? I know ABC would be more complicated than just AB.
Great thanks!
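For what it's worth, the ':' chain-separator syntax from the advanced notebook would extend naturally to a third chain; whether the pipeline actually handles it is exactly the open question above, and the sequences below are placeholders, not real inputs:

```python
# Hypothetical three-chain query using the ":" chain separator convention;
# the chain sequences are placeholders.
chain_a = "MKTAYIAKQR"
chain_b = "GSHMLEDPVA"
chain_c = "MNVKLQWERT"
query_sequence = ":".join([chain_a, chain_b, chain_c])
# One copy of each chain (assumption: same syntax as two-chain runs).
homooligomer = "1:1:1"
```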
The following sequence returns no hits when submitted to either af_mmseqs2 or af_advanced:
NVEPLNGQSEVTGMLDKDITLQWQITFLKGEMLQSHDIYLPNRTKIVSNQPPELTPVGKRMYGTRLVPVFDADAAVFKLTLKNVKFTDSSHNFTLVVAFERKDDFNRRTGVADINIVNVE
However, if I truncate it by one residue, I get 70 hits. This happens with both af_mmseqs2 and af_advanced:
NVEPLNGQSEVTGMLDKDITLQWQITFLKGEMLQSHDIYLPNRTKIVSNQPPELTPVGKRMYGTRLVPVFDADAAVFKLTLKNVKFTDSSHNFTLVVAFERKDDFNRRTGVADINIVNV
I am unable to tell if this is the same bug as reported in issue #49
Hi -
Trying to load AlphaFold2_complexes.ipynb, I get the message
There was an error loading this notebook. Ensure that the file is accessible and try again.
Check dependency list! Synchronous require cannot resolve module 'vs/platform/quickinput/common/quickInput'. This is the first mention of this module!
https://github.com/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb
Hello,
I am getting a memory error while trying to analyze my protein (sequence and error message attached below). I am using Colab Pro with 25 GB of memory. The prediction works if I cut the protein in half, but I would like to analyze the full-length protein if possible. Is there a way to get this to work?
Thanks!
Danny
Protein sequence:
EESAAPQVHLSILATTDIHANMMDYDYYSDKETADFGLARTAQLIQKHREQNPNTLLVDNGDLIQGNPLGEYAVKYQKDDIISGTKTHPIISVMNALKYDAGTLGNHEFNYGLDFLDGTIKGADFPIVNANVKTTSGENRYTPYVINEKTLIDENGNEQKVKVGYIGFVPPQIMTWDKKNLEGQVQVQDIVESANETIPKMKAEGADVIIALAHTGIEKQAQSSGAENAVFDLATKTKGIDAIISGHQHGLFPSAEYAGVAQFNVEKGTINGIPVVMPSSWGKYLGVIDLKLEKADGSWKVADSKGSIESIAGNVTSRNETVTNTIQQTHQNTLEYVRKPVGKTEADINSFFAQVKDDPSIQIVTDAQKWYAEKEMKDTEYKNLPILSAGAPFKAGGRNGANYYTNIPAGDLAIKNVGDLYLYDNTVQIVKLTGSEVKDWLEMSAGQFNQIDPAKGGDQALLNENFRSYNFDVIDGVTYQVDVTKPAKYNENGKVINADSSRIINLSYEGKPISPSQEFLVVTNNYRASGGGGFPHLTSDKIVHGSAVENRQVLMDYIIEQKTVNPKADNNWSIAPVSGTNLTFESSLLAKPFADKADDVAYVGKSANEGYGVYKLQFDDDSNPDPPKDGLWDLTVMHTNDTHAHLDDAARRMTKINEVRSETNHNILLDAGDVFSGDLYFTKWNGLADLKMMNMMGYDAMTFGNHEFDKGPTVLSDFLSGNSATVDPANRYHFEAPEFPIVSANVDVSNEPKLKSFVKKPQTFTAGEKKEAGIHPYILLDVDGEKVAVFGLTTEDTATTSSPGKSIVFNDAFETAQNTVKAIQEEEKVNKIIALTHIGHNRDLELAKKVKGIDLIIGGHTHTLVDKMEVVNNEEPTIVAQAKEYGQFLGRVDVAFDEKGVVQTDKSNLSVLPIDEHTEENPEAKQELDQFKNELEDVKNEKVGYTDVALDGQREHVRTKETNLGNFIADGMLAKAKEAAGARIAITNGGGIRAGIDKGDITLGEVLNVMPFGNTLYVADLTGKQIKEALEQGLSNVENGGGAFPQVAGIEYTFTLNNKPGHRVLEVKIESPNGDKVAINTDDTYRVATNNFVGAGGDGYSVFTEASHGEDLGYVDYEIFTEQLKKLGNKVSPKVEGRIKEVFLPTKQKDGSWTLDEDKFAIYAKNANTPFVYYGIHEGSQEKPINLKVKKDQVKLLKERESDPSLTMFNYWYSMKMPMANLKTADTAIGIKSTGELDVSLSDVYDFTVKQKGKEIKSFKEPVQLSLRMFDIEEAHNPAIYHVDRKKKAFTKTGHGSVDDDMVTGYTNHFSEYTILNSGSNNKPPAFPSDQPTGGDDGNHGGGSDKPGGKQPTDGNGGNDTPPGTQPTNGSGGNGSGGSGTDGPAGGLLPDT
Error message:
UnfilteredStackTrace Traceback (most recent call last)
in ()
50 model_params=model_params, use_model=use_model,
---> 51 do_relax=use_amber)
13 frames
UnfilteredStackTrace: RuntimeError: Resource exhausted: Out of memory while trying to allocate 14268435552 bytes.
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)
910 for i, x in enumerate(args)
911 if x is not token and i in kept_var_idx))
--> 912 out_bufs = compiled.execute(input_bufs)
913 check_special(xla_call_p.name, out_bufs)
914 return [handler(*bs) for handler, bs in zip(handlers, _partition_outputs(avals, out_bufs))]
RuntimeError: Resource exhausted: Out of memory while trying to allocate 14268435552 bytes.
This line was throwing an error in the Download cell; when I commented it out, the cell ran successfully:
text_file.write(f"num_relax={num_relax}\n")
NameError: name 'num_relax' is not defined
Hello,
I would like to run the MSA-building step of the Colab notebook locally, using the exact same set of databases, to do some comparisons against other databases.
Is it possible to get access to the set of databases the MMseqs2 server is using, as well as the MMseqs2 version and the specific command lines executed on the server?
In the slides you presented (awesome presentation!), you mentioned a 30%-identity clustered DB built from SMAG, MGnify, BFD, and MetaEuk. Do you provide a downloadable version of that master 30%-seq-id DB anywhere?
Thanks a lot!
Hi, I'm trying to get the oligomeric structure of a protein. I can get the monomer through AlphaFold Colab, but when I try to use the oligomeric feature it crashes.
Error: Your session crashed after using all available RAM. If you are interested in access to high-RAM runtimes, you may want to check out Colab Pro.
NameError Traceback (most recent call last)
in ()
5 use_model = {}
6 if "model_params" not in dir(): model_params = {}
----> 7 for model_name in ["model_1","model_2","model_3","model_4","model_5"][:num_models]:
8 use_model[model_name] = True
9 if model_name not in model_params:
NameError: name 'num_models' is not defined
When I try to use a local runtime, I get another error: a ModuleNotFoundError keeps appearing.
Please help if you have any solution for this.
Thanks
Pankaj
When I select msa_mod->custom I do not see an "upload box" at the end of the "Input Protein ..." box.
Hi,
I have some longer sequences I would like to try so I have switched to using a local runtime. Is there an easy way to restrict which GPUs are selected for processing? Currently it is trying to allocate memory on a GPU that is already maxed out by an unrelated process.
Thanks
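A common approach for a local runtime (this is general CUDA/JAX practice, not ColabFold-specific code) is to pin the process to a single free GPU via CUDA_VISIBLE_DEVICES before JAX initializes:

```python
import os

# Hide all GPUs except device 1 (an example index; pick an idle one from
# nvidia-smi). This must run before jax/jaxlib touches CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
# Optionally stop XLA from pre-allocating most of that GPU's memory up front.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
```

Put these lines in the very first cell of the notebook; once JAX has initialized the backend, changing the variables has no effect.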
When inputting a complex in AAA:BBB format, and custom MSA, I keep getting "ERROR: the length of msa does not match input sequence". My MSA has hyphens in it because some homologs have insertions; I have tried including the appropriate hyphens in the input sequence, but it looks like they are being ignored for the length calculation. Is it possible to use an MSA where my target is shorter than some of the homologs?
When I try to run the predefined example sequence (PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK) with templates, one model, and amber, it works in AlphaFold2_mmseqs2 but fails in AlphaFold2_batch with the following error:
ValueError Traceback (most recent call last)
<ipython-input-3-a76dac23e0b1> in <module>()
391 Ls=[len(query_sequence)], crop_len=crop_len,
392 model_params=model_params, use_model=use_model,
--> 393 do_relax=use_amber)
394
395 # gather MSA info
<ipython-input-3-a76dac23e0b1> in predict_structure(prefix, feature_dict, Ls, crop_len, model_params, use_model, do_relax, random_seed)
276 stiffness=10.0,exclude_residues=[],
277 max_outer_iterations=20)
--> 278 relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
279 relaxed_pdb_lines.append(relaxed_pdb_str)
280
/content/alphafold/relax/relax.py in process(self, prot)
62 tolerance=self._tolerance, stiffness=self._stiffness,
63 exclude_residues=self._exclude_residues,
---> 64 max_outer_iterations=self._max_outer_iterations)
65 min_pos = out['pos']
66 start_pos = out['posinit']
/content/alphafold/relax/amber_minimize.py in run_pipeline(prot, stiffness, max_outer_iterations, place_hydrogens_every_iteration, max_iterations, tolerance, restraint_set, max_attempts, checks, exclude_residues)
459 # `protein.to_pdb` will strip any poorly-defined residues so we need to
460 # perform this check before `clean_protein`.
--> 461 _check_residues_are_well_defined(prot)
462 pdb_string = clean_protein(prot, checks=checks)
463
/content/alphafold/relax/amber_minimize.py in _check_residues_are_well_defined(prot)
139 """Checks that all residues contain non-empty atom sets."""
140 if (prot.atom_mask.sum(axis=-1) == 0).any():
--> 141 raise ValueError("Amber minimization can only be performed on proteins with"
142 " well-defined residues. This protein contains at least"
143 " one residue with no atoms.")
ValueError: Amber minimization can only be performed on proteins with well-defined residues. This protein contains at least one residue with no atoms.
When I submitted a long sequence, I got an 'out of memory' error.
The error also states that the variable L is undefined.
Hi,
I was wondering whether the project could be integrated into a Singularity installation of AlphaFold2. If so, how would one go about it?
Best,
Pranav
I have a cyclic peptide sequence. I put it into the AlphaFold2 Colab, but I didn't get a cyclic structure. What should I do to connect the C-terminus and N-terminus for the subsequent dynamics simulation (GROMACS)?
Should I post-process the structure from the AlphaFold2 Colab for GROMACS dynamics? Does the input sequence need some pre-processing? Or can AlphaFold2 simply not predict cyclic peptides?
It's really important for my research, thanks for any help.
Hello, I am using this ipynb file on Colab:
https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb
At the "Gather input features, predict structure" step, I hit an error.
=============================================
Running model_1
InvalidArgumentError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs, op_def)
1879 try:
-> 1880 c_op = pywrap_tf_session.TF_FinishOperation(op_desc)
1881 except errors.InvalidArgumentError as e:
InvalidArgumentError: Cannot reshape a tensor with 2705220 elements to shape [6441,127,1] (818007 elements) for '{{node reshape_msa}} = Reshape[T=DT_INT32, Tshape=DT_INT32](Const_6, reshape_msa/shape)' with input shapes: [6441,420], [3] and with input tensors computed as partial shapes: input[1] = [6441,127,1].
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
12 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs, op_def)
1881 except errors.InvalidArgumentError as e:
1882 # Convert to ValueError for backwards compatibility.
-> 1883 raise ValueError(str(e))
1884
1885 return c_op
ValueError: Cannot reshape a tensor with 2705220 elements to shape [6441,127,1] (818007 elements) for '{{node reshape_msa}} = Reshape[T=DT_INT32, Tshape=DT_INT32](Const_6, reshape_msa/shape)' with input shapes: [6441,420], [3] and with input tensors computed as partial shapes: input[1] = [6441,127,1].
============================================================
My protein sequence is:
SMNPPPPETSNPNKPKRQTNQLQYLLRVVLKTLWKHQFAWPFQQPVDAVKLNLPDYYKIIKTPMDMGTIKKRLENNYYWNAQECIQDFNTMFTNCYIYNKPGDDIVLMAEALEKLFLQKINELPTEE
Summary: The Amber relaxation step fails because the number of atoms in one or more residues is zero (?).
The error message:
ValueError: Amber minimization can only be performed on proteins with well-defined residues. This protein contains at least one residue with no atoms.
[Resolved] There was an X in the sequence.
Hi, I've been running the MMseqs2 ColabFold for a specific type of protein sequence, and I always get the same set of ~4000 sequences in the .a3m file. Would it be possible to put the sequences from the .a3m file into one of the folders of a local AlphaFold2 installation and trick AlphaFold2 into always using them when running?
I have been playing with the new AlphaFold2_advanced rollout and have gotten through the whole pipeline for several heterologous protein-protein interactions. Suddenly this afternoon I started receiving this error, but I don't know where in the code the problem could be coming from or why it suddenly changed. I have kept all of my parameters the same, changing only the second amino-acid sequence in the input.
Many thanks for setting this up! I've found it really useful in my research. I'm looking into some FAD-dependent oxidases, but it seems that cofactors like FAD are not included in the predicted model. Is there a way to include them? Or do I have to dock them after generating the apo protein model?
Thanks!
Edit: this doesn't appear if use_turbo is unchecked.
In the "run alphafold" section of the advanced notebook, this error appears while the first model is being run:
UnfilteredStackTrace Traceback (most recent call last)
in ()
206 # predict
--> 207 prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed),"cpu")
208
11 frames
UnfilteredStackTrace: RuntimeError: Internal: CUBLAS_STATUS_EXECUTION_FAILED
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)
910 for i, x in enumerate(args)
911 if x is not token and i in kept_var_idx))
--> 912 out_bufs = compiled.execute(input_bufs)
913 check_special(xla_call_p.name, out_bufs)
914 return [handler(*bs) for handler, bs in zip(handlers, _partition_outputs(avals, out_bufs))]
RuntimeError: Internal: CUBLAS_STATUS_EXECUTION_FAILED
Using the AlphaFold2.0 Google Colab notebook, I am trying to predict the structure of a 700-800 amino acid protein chain. How long will the program take to run?
Got this error twice today at the AlphaFold step with different input sequences. Can you help?
Advanced notebook, input was a protein plus peptide (formatted as AAAAAAA:BBBBBB), genetic search succeeded.
Thanks for any help!
IndexError Traceback (most recent call last)
in ()
188
189 prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed),"cpu")
--> 190 outs[key] = parse_results(prediction_result, processed_feature_dict)
191
192 # report
3 frames
/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in _expand_bool_indices(idx, shape)
5400 expected_shape = shape[len(out): len(out) + _ndim(i)]
5401 if i_shape != expected_shape:
-> 5402 raise IndexError("boolean index did not match shape of indexed array in index "
5403 f"{dim_number}: got {i_shape}, expected {expected_shape}")
5404 out.extend(np.where(i))
IndexError: boolean index did not match shape of indexed array in index 2: got (63,), expected (64,)
Hello,
I am trying to refine my structure predictions using Amber relax (in the alphafold2_advanced notebook). However, I get the following error, both for my own structure and for the test sequence/structure:
UnfilteredStackTrace Traceback (most recent call last)
<ipython-input-16-404fe963ee1d> in <module>()
63 max_outer_iterations=20)
---> 64 relaxed_pdb_lines, _, _ = amber_relaxer.process(prot=outs[key]["unrelaxed_protein"])
65 with open(pred_output_path, 'w') as f:
33 frames
UnfilteredStackTrace: TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in _check_arraylike(fun_name, *args)
557 if not _arraylike(arg))
558 msg = "{} requires ndarray or scalar arguments, got {} at position {}."
--> 559 raise TypeError(msg.format(fun_name, type(arg), pos))
560
561 def _check_no_float0s(fun_name, *args):
TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.
Could you maybe help me figure out what's going on? Thank you very much!
If I choose mmseqs2 (UniRef + environmental), it should include the SMAG and MetaEuk databases. Could you tell me the exact links to these two databases? Thank you!
The alphafold2_mmseqs2 notebook throws an error if I give it a custom MSA in a3m format where the sequences are wrapped over several lines. It works if I manually edit the alignment so each sequence is on one line.
(This doesn't happen on the alphafold2_advanced notebook, as far as I can tell)
I can reproduce this behavior with the following alignments:
Works: alignment_single-line-seqs.a3m.txt
Throws error: alignment_multi-line-seqs.a3m.txt
Error text:
running model_1
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs, op_def)
1879 try:
-> 1880 c_op = pywrap_tf_session.TF_FinishOperation(op_desc)
1881 except errors.InvalidArgumentError as e:
InvalidArgumentError: Cannot reshape a tensor with 1710 elements to shape [15,14,1] (210 elements) for '{{node reshape_msa}} = Reshape[T=DT_INT32, Tshape=DT_INT32](Const_6, reshape_msa/shape)' with input shapes: [15,114], [3] and with input tensors computed as partial shapes: input[1] = [15,14,1].
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
12 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs, op_def)
1881 except errors.InvalidArgumentError as e:
1882 # Convert to ValueError for backwards compatibility.
-> 1883 raise ValueError(str(e))
1884
1885 return c_op
ValueError: Cannot reshape a tensor with 1710 elements to shape [15,14,1] (210 elements) for '{{node reshape_msa}} = Reshape[T=DT_INT32, Tshape=DT_INT32](Const_6, reshape_msa/shape)' with input shapes: [15,114], [3] and with input tensors computed as partial shapes: input[1] = [15,14,1].
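Until the notebook handles wrapped records, a workaround is to unwrap the a3m yourself before uploading, so each record becomes one header line plus one sequence line. This is a sketch; `unwrap_a3m` is a hypothetical helper, not part of ColabFold:

```python
# Collapse wrapped sequence lines in an a3m so every record is exactly
# one ">" header line followed by one sequence line.
def unwrap_a3m(text: str) -> str:
    out, seq = [], []
    for line in text.splitlines():
        if line.startswith(">"):
            if seq:                      # flush the previous record's sequence
                out.append("".join(seq))
                seq = []
            out.append(line)
        else:
            seq.append(line.strip())
    if seq:                              # flush the final record
        out.append("".join(seq))
    return "\n".join(out) + "\n"
```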
The Google Colab notebook terminates at this step and gives a "FileNotFoundError". Any help on this issue would be great.
UnfilteredStackTrace Traceback (most recent call last)
in ()
50 model_params=model_params, use_model=use_model,
---> 51 do_relax=use_amber)
35 frames
UnfilteredStackTrace: TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/jax/_src/numpy/lax_numpy.py in _check_arraylike(fun_name, *args)
557 if not _arraylike(arg))
558 msg = "{} requires ndarray or scalar arguments, got {} at position {}."
--> 559 raise TypeError(msg.format(fun_name, type(arg), pos))
560
561 def _check_no_float0s(fun_name, *args):
TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.
Great tool and interface, many thanks to everyone involved!
One small suggestion to help folks interpret and judge model quality quickly from the browser:
Would it be possible to add a color bar to the py3Dmol structure preview for the pLDDT colors? It would go a long way toward judging model quality, especially in regions of particular interest, without requiring users to have PyMOL expertise or to match residue numbers between the pLDDT plot and the structure.
Thanks again!
Hello!
I am getting an out-of-memory error in step 5 of "Gather input features, predict structure". I was originally running a sequence of ~480 amino acids as a 3-copy homooligomer. Thinking it was a sequence-length issue, I truncated the sequence to ~360 amino acids and a single copy, but I still get the error. A "Factory reset runtime" did not help either.
Could you please tell me how to run the notebooks over a FASTA file? I wish to loop through the FASTA file and generate .pdb files.
Starting today, after the progress bar fills completely for this step, we're hanging here (output after interrupting the kernel)
KeyboardInterrupt
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
/tmp/ipykernel_57/1853154669.py in <module>
53 prefix = os.path.join('tmp',prefix)
54 print(f"running mmseqs2")
---> 55 A3M_LINES = cf.run_mmseqs2(seqs, prefix, filter=True)
56
57 # filter sequences to 10K\n",
/app/alphafold/colabfold.py in run_mmseqs2(x, prefix, use_env, filter)
111 while out["status"] in ["UNKNOWN","RUNNING","PENDING"]:
112 t = 5 + random.randint(0,5)
--> 113 time.sleep(t)
114 out = status(ID)
115 pbar.set_description(out["status"])
According to the README.md, the memory limits are as follows:
The maximum length limit depends on which free GPU Google Colab provides (fingers crossed):
For GPU: Tesla T4 or Tesla P100 with ~16G the max length is ~1400
For GPU: Tesla K80 with ~12G the max length is ~1000
To check what GPU you got, open a new code cell and type !nvidia-smi
I am interested in structures of either (a) one single chain of 240-280 aa or (b) two different chains of ~120 + ~140 aa. What would be the minimal GPU that would allow us to run this locally?
I am thinking that, given our own custom MSAs, it wouldn't need to connect to MMseqs2 or download the 2 TB of sequence data, and could go straight to running the prediction from our internal MSAs inside the docker container.
Or am I missing something obvious that would still require Colab or something else remote?
Hi,
I submitted hExoI with variant residues found in one of the samples.
The run terminates with no error message at the "Gather input features, predict structure" step.
I also get NameErrors even after a factory reset and re-run.
NameError Traceback (most recent call last)
<ipython-input-2-1d8cadd9b758> in <module>()
1 #@title Gather input features, predict structure
2 # parse TEMPLATES
----> 3 if use_templates: template_features = mk_template(jobname)
4 else: template_features = mk_mock_template(query_sequence)
5
NameError: name 'use_templates' is not defined
Can you please let me know how to fix it?
Not sure if I'm doing something incorrectly.
A particular sequence seems to cause the MMseqs2 query step to hang, both in a batch run with the alphafold2_batch notebook and in a single run with the AlphaFold2_mmseqs2 notebook.
My current run has been stuck at 0% on the MSA (MMseqs2, UniRef+environmental) step for the past 1h 43m, and in the past it has been stuck even longer before I killed the process.
The raw sequence is:
MAQVQLVESGGGLVQAGGSLRLSCAVSGRPFSEYNLGWFRQAPGKEREFVARIRSSGTTVYTDSVKGRFSASRDNFLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRDAAEREGVMLIKKTAENIDEAAKELAKNMGYLQLNSLEPEDTAVYYCAMSRVDTDSPAFYDYWGQGTQVTVSTPRS
Other variations on this sequence have worked flawlessly with these settings, so I'm not sure what is wrong with this sequence.
Running the notebook off a colab instance, not locally.
LaM-8_NA73-O.txt
file extension changed from fasta to txt so that github would allow the upload
Hello,
I would like to build a pipeline to search MSAs using a local MMseqs2. However, there are some problems when I follow the script 'msa.sh' returned by the online MMseqs2 service. For example, when I run the command
"${MMSEQS}" expandaln "${BASE}/qdb" "${DBBASE}/${DB1}.idx" "${BASE}/res" "${DBBASE}/${DB1}.idx" "${BASE}/res_exp" --db-load-mode 2 ${EXPAND_PARAM}
the following error is returned:
Input database "/nfs/database/uniref30_mmseqs/uniref30.idx" has the wrong type (Generic)
Allowed input:
- Alignment
- Prefilter
- Bi-directional prefilter
- Clustering
I just want to figure out whether the MMseqs2 used by the service is the same as the latest one from GitHub.
If not, how can we get the version used by the service?
I am trying to use AlphaFold2_advanced.ipynb Colab with default parameters to predict the structure of the following sequence:
MQFSTVASVAFVALANFVAAESAAAISQITDGQIQATTTATTEATTTAAPSSTVETVSPSSTETISQQTENGAAKAAVGMGAGALAAAAMLL
However, as the model_1_ptm_seed_0 runs, AlphaFold2 fails with error:
UnfilteredStackTrace Traceback (most recent call last)
<ipython-input-4-0d880bbb1ecf> in <module>()
188
--> 189 prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed),"cpu")
190 outs[key] = parse_results(prediction_result, processed_feature_dict)
11 frames
UnfilteredStackTrace: RuntimeError: Internal: CUBLAS_STATUS_EXECUTION_FAILED
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)
910 for i, x in enumerate(args)
911 if x is not token and i in kept_var_idx))
--> 912 out_bufs = compiled.execute(input_bufs)
913 check_special(xla_call_p.name, out_bufs)
914 return [handler(*bs) for handler, bs in zip(handlers, _partition_outputs(avals, out_bufs))]
RuntimeError: Internal: CUBLAS_STATUS_EXECUTION_FAILED
Any insight is highly appreciated!
Best regards,
Looks like there is a difference between the mmseqs2/complexes notebooks and the advanced notebook. The mmseqs2/complexes notebooks define the following function for creating the hash for a given job:
import hashlib
def add_hash(x,y):
return x+"_"+hashlib.sha1(y.encode()).hexdigest()[:5]
However, in the advanced notebook this function is absent; there is instead a cf.get_hash method call, but the cf object doesn't seem to be defined. Here is the relevant traceback:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-1-f214cb658cff> in <module>()
12
13 # prediction directory
---> 14 output_dir = 'prediction_' + cf.get_hash(full_sequence)[:5]
15 os.makedirs(output_dir, exist_ok=True)
16 print(f"working directory: {output_dir}")
NameError: name 'cf' is not defined
Thanks
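As a stand-in until cf is defined, a function with the same sha1 behavior as the mmseqs2 notebook's add_hash can be dropped in; whether ColabFold's real get_hash is implemented exactly this way is an assumption:

```python
import hashlib

# Hypothetical replacement for cf.get_hash: the sha1 hexdigest of the
# sequence, mirroring what add_hash uses in the mmseqs2 notebook.
def get_hash(x: str) -> str:
    return hashlib.sha1(x.encode()).hexdigest()

# output_dir = 'prediction_' + get_hash(full_sequence)[:5]
```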
Hi guys,
First of all, thank you very much for a terrific job. Incredibly useful in making AlphaFold accessible.
I've run into a problem with the MSA construction: when I send a request to the API, I get "MMseqs2 server did not return a valid result" and an empty MSA file.
I thought I might have exceeded the API limits, but even after 24h I cannot submit a single request.
Thanks in advance,
Hi,
The run always crashes when the program starts prediction (calling parse_results(prediction_result, processed_feature_dict)) after the search step.
Is there any suggestion and solution for this? Many thanks!
See the error below:
..../python3.7/site-packages/jax/_src/numpy/lax_numpy.py in _expand_bool_indices(idx, shape)
5400 expected_shape = shape[len(out): len(out) + _ndim(i)]
5401 if i_shape != expected_shape:
-> 5402 raise IndexError("boolean index did not match shape of indexed array in index "
5403 f"{dim_number}: got {i_shape}, expected {expected_shape}")
5404 out.extend(np.where(i))
IndexError: boolean index did not match shape of indexed array in index 2: got (63,), expected (64,)
Hi there,
I was trying to model a bacterial protein complex following the suggestions at the top of the notebook, i.e. with the pair_msa and disable_mmseqs2_filter options on, when it ran out of memory at the "Gather input features, predict structure" step.
The two input proteins were 901 and 351 amino acids.
I thought it could be a good idea to report the error:
pairs found: 417
running model_1_ptm
---------------------------------------------------------------------------
UnfilteredStackTrace Traceback (most recent call last)
<ipython-input-7-0be664d1f433> in <module>()
57 }
---> 58 plddts, paes = predict_structure(jobname, feature_dict, Ls=Ls)
13 frames
UnfilteredStackTrace: RuntimeError: Resource exhausted: Out of memory while trying to allocate 12341783136 bytes.
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/jax/interpreters/xla.py in _execute_compiled(compiled, avals, handlers, kept_var_idx, *args)
891 for i, x in enumerate(args)
892 if x is not token and i in kept_var_idx))
--> 893 out_bufs = compiled.execute(input_bufs)
894 check_special(xla_call_p.name, out_bufs)
895 return [handler(*bs) for handler, bs in zip(handlers, _partition_outputs(avals, out_bufs))]
RuntimeError: Resource exhausted: Out of memory while trying to allocate 12341783136 bytes.