Comments (4)
Thanks everyone. @jadolfbr, based on the code you referenced, it looks like the logic is as follows for indexing:
- Find the smallest and largest residue indices in the PDB file, extracting from the coordinates section.
- Scan over the range of integers between the smallest and largest integers, recording masks/pads for missing residues.
Based on that logic, in my example with "NXXXXXN", the first "N" would be index 1 and the second would be index 7, assuming there are 5 missing residue numbers between them. If I added missing residues at the beginning or end, I would get the same indexing as in the first case, because those residues do not appear in the coordinates section of the PDB file.
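To make the logic above concrete, here is a minimal sketch of the scan as I understand it. This is not the ProteinMPNN parser itself: the function name `gapped_sequence`, the input dictionary, and the residue numbers are all hypothetical, and insertion codes are ignored.

```python
def gapped_sequence(residues, gap_char="X"):
    """Build a gap-padded sequence from resolved residues.

    residues: dict mapping PDB residue number -> one-letter code, holding
    only residues that appear in the coordinates section (hypothetical input).
    """
    lo, hi = min(residues), max(residues)
    numbers = range(lo, hi + 1)
    # Residue numbers with no coordinates become gap characters.
    seq = "".join(residues.get(n, gap_char) for n in numbers)
    # 1 where coordinates exist, 0 where the residue is missing.
    mask = [1 if n in residues else 0 for n in numbers]
    return seq, mask


# Made-up chain: residues 10 and 16 are resolved, 11-15 are missing.
seq, mask = gapped_sequence({10: "N", 16: "N"})
print(seq)   # NXXXXXN
print(mask)  # [1, 0, 0, 0, 0, 0, 1]
```

Residues missing before 10 or after 16 never enter the dictionary, so they cannot shift the positions in this scheme.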
@tony-res, I believe indexing should start at 1 for ProteinMPNN. Example 4 notes this in a comment, and when fixed positions are masked, 1 is subtracted from the input array.
I think it's important to note that the index in the code @jadolfbr copied is not the same as the index used for freezing positions. The index in the copied code comes directly from the PDB file, while the indices in the example I linked are 1-indexed positions of each residue relative to the start of the chain in the file.
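A small sketch of the two conversions described above, under the assumption that gaps are padded so every residue number between the first and last resolved residue occupies one position. Both helper names are hypothetical and are not part of ProteinMPNN.

```python
def pdb_to_relative(resnum, min_resnum):
    """PDB residue number -> 1-indexed position within the chain.

    Assumes gap padding, so numbering is contiguous from min_resnum.
    """
    if resnum < min_resnum:
        raise ValueError("residue number precedes the first resolved residue")
    return resnum - min_resnum + 1


def relative_to_array_index(position):
    """1-indexed position -> 0-based array index (the 'subtract 1' step)."""
    if position < 1:
        raise ValueError("positions are 1-indexed")
    return position - 1


# Made-up chain whose first resolved residue is numbered 10.
position = pdb_to_relative(16, min_resnum=10)
print(position)                           # 7
print(relative_to_array_index(position))  # 6
```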
from proteinmpnn.
From what I've seen, the second "N" would be at position 2. What I do is replace each "X" with "" and then count:
sequence_no_gaps = sequence_with_gaps.replace("X", "")
That said, I forget whether the index starts at 0 or at 1. I think it starts at 0, so
--position_list "0 1"
rather than
--position_list "1 2"
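For comparison, here is a rough sketch of the counting approach described in this comment: drop the gap characters, then count. The helper `gapless_position` is hypothetical. It only shows how the two counts differ and does not settle which convention ProteinMPNN expects. Note that `replace("X", "")` would also remove any genuine unknown residues written as "X".

```python
sequence_with_gaps = "NXXXXXN"
sequence_no_gaps = sequence_with_gaps.replace("X", "")


def gapless_position(seq_with_gaps, gapped_index, gap_char="X"):
    """Map a 0-based index in the gapped string to a 0-based index
    in the string with gap characters removed."""
    if seq_with_gaps[gapped_index] == gap_char:
        raise ValueError("index points at a gap, not a residue")
    # Subtract the number of gap characters that precede this index.
    return gapped_index - seq_with_gaps[:gapped_index].count(gap_char)


second_n = gapless_position(sequence_with_gaps, 6)
print(sequence_no_gaps)  # NN
print(second_n)          # 1 (0-based), i.e. position 2 when counting from 1
```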
As a sanity check, I always use a Python script to go through the output .fa file and count the changes. A good Python tool for that is Counter from the collections module.
For example,
>>> from collections import Counter
>>> word = "mississippi"
>>> counter = Counter(word)
>>> counter
Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
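A sketch of what such a sanity check could look like once the native and designed sequences have been read from the .fa file. Reading the file is left out here, and the two sequences and the function name `count_changes` are made up for illustration.

```python
from collections import Counter


def count_changes(native, designed):
    """Return the 1-indexed positions that differ and a Counter of
    (native residue, designed residue) substitutions."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    positions = []
    changes = Counter()
    for i, (a, b) in enumerate(zip(native, designed), start=1):
        if a != b:
            positions.append(i)
            changes[(a, b)] += 1
    return positions, changes


# Made-up sequences, not real ProteinMPNN output.
native = "MKTAYIAK"
designed = "MKSAYLAK"
positions, changes = count_changes(native, designed)
print(positions)  # [3, 6]
print(changes)    # Counter({('T', 'S'): 1, ('I', 'L'): 1})
```

If some positions were meant to be fixed, checking that none of them appear in `positions` is a quick way to see which indexing convention was actually applied.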
Some cool code to handle everything: https://huggingface.co/spaces/simonduerr/ProteinMPNN/blob/main/app.py#L382
from proteinmpnn.
In protein_mpnn_utils.py, around line 125, I commented out the places where gaps are added for sequence numbering.
The loop becomes this:
try:
    for resn in range(min_resn, max_resn + 1):
        if resn in seq:
            for k in sorted(seq[resn]): seq_.append(aa_3_N.get(seq[resn][k], 20))
        # else: seq_.append(20)
        if resn in xyz:
            for k in sorted(xyz[resn]):
                for atom in atoms:
                    if atom in xyz[resn][k]: xyz_.append(xyz[resn][k][atom])
                    else: xyz_.append(np.full(3, np.nan))
        # else:
        #     for atom in atoms: xyz_.append(np.full(3, np.nan))
    print("Seq_", seq_)
from proteinmpnn.