Hi,
Here is the code for the profile_with_prior feature. I hope it helps:
import numpy as np


def profile_with_prior(amino_acid_probs, prior_amino_probs,
                       s_ij, nc, beta=10.0):
    """Compute a profile that takes into account the priors and the BLOSUM matrix.

    Args:
      amino_acid_probs: the observed amino acid probabilities in a column of the MSA.
      prior_amino_probs: the prior amino acid probabilities for that column.
      s_ij: the substitution matrix going from amino acid i to j.
      nc: the mean number of different amino acid types.
      beta: weight of the prior, default set to 10.

    Returns:
      A profile of length 21 (22 when gaps are included).
    """
    # Ratio of observed to prior probability for every amino acid j.
    f_j_over_g_j = np.divide(amino_acid_probs, prior_amino_probs)
    # Pseudocount frequencies: sum over j of s_ij * (observed_j / prior_j).
    g_i = np.matmul(s_ij, f_j_over_g_j)
    # The weight of the observed frequencies grows with column diversity.
    alpha = nc - 1.0
    profiles = (alpha * amino_acid_probs + beta * g_i) / (alpha + beta)
    return profiles
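To see what the function does, here is a minimal sketch on a hypothetical 3-letter alphabet (not the real 21/22-symbol one); the frequencies and nc are made-up values. With every substitution score set to zero, s_ij reduces to the outer product of the background frequencies, g_i collapses to the background probability itself, and the profile becomes a plain weighted average of the observed and background frequencies.

```python
import numpy as np


def profile_with_prior(amino_acid_probs, prior_amino_probs, s_ij, nc, beta=10.0):
    """Same computation as above, repeated so this example runs standalone."""
    f_j_over_g_j = np.divide(amino_acid_probs, prior_amino_probs)
    g_i = np.matmul(s_ij, f_j_over_g_j)
    alpha = nc - 1.0
    return (alpha * amino_acid_probs + beta * g_i) / (alpha + beta)


# Hypothetical 3-letter alphabet, for illustration only.
observed = np.array([0.7, 0.2, 0.1])    # observed frequencies in one MSA column
background = np.array([0.5, 0.3, 0.2])  # prior (background) frequencies

# With all substitution scores equal to zero, exp(lambda * 0) == 1, so
# s_ij is just the outer product background_i * background_j.
scores = np.zeros((3, 3))
s_ij = np.outer(background, background) * np.exp(0.3176 * scores)

nc = 3.0  # made-up mean number of distinct residue types per column
profile = profile_with_prior(observed, background, s_ij, nc, beta=10.0)

# In this special case g_i == background_i, so the profile mixes observed and
# background frequencies with weights alpha = nc - 1 and beta.
alpha = nc - 1.0
expected = (alpha * observed + 10.0 * background) / (alpha + 10.0)
print(profile)
```

Because beta is fixed while alpha = nc - 1 grows with column diversity, alignments whose columns contain many distinct residue types lean more heavily on the observed frequencies.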
from deepmind-research.
Hi,
Thank you, that's helpful.
However, that is for profile_with_prior_without_gaps, which has 21 entries in the feature.
With gaps, the profile has length 22.
Additionally, I wonder how to calculate s_ij with gaps.
Could you please provide more information about the formula?
Thanks
Hi @newtonjoo,
sorry for the late reply.
> However, that is for profile_with_prior_without_gaps, which has 21 entries in the feature.

The code above should also work for the profile with gaps.

> Additionally, I wonder how to calculate s_ij with gaps.
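# 0.3176 is the BLOSUM62 scale parameter lambda, from www.ncbi.nlm.nih.gov/pmc/articles/PMC146917/pdf/253389.pdf
# prior must be a column vector here, so that np.matmul(prior, prior.T) is a square matrix.
s_ij = np.matmul(prior, prior.T) * np.exp(0.3176 * blosum)
s_ij_without_gaps = (np.matmul(prior_without_gaps, prior_without_gaps.T)
                     * np.exp(0.3176 * blosum[:-1, :-1]))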
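A minimal sketch of the shapes involved, under several assumptions not stated in the thread: prior is a column vector of background frequencies with shape (22, 1), blosum is a (22, 22) score matrix whose last row and column belong to the gap symbol, and prior_without_gaps is simply prior[:-1] with no renormalization. The random numbers below are placeholders, not real prior or BLOSUM values. Under these assumptions the gap-free matrix is exactly the top-left (21, 21) block of the matrix with gaps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder inputs (NOT the real prior or BLOSUM values).
num_symbols = 22  # 20 amino acids + unknown (X) + gap (-)
prior = rng.random((num_symbols, 1))
prior /= prior.sum()  # column vector of background frequencies
blosum = rng.integers(-4, 12, size=(num_symbols, num_symbols)).astype(float)
blosum = (blosum + blosum.T) / 2  # substitution scores are symmetric

# Same formulas as above; 0.3176 is the scale parameter lambda.
s_ij = np.matmul(prior, prior.T) * np.exp(0.3176 * blosum)

# Assumption: the gap-free prior is the first 21 entries, not renormalized.
prior_without_gaps = prior[:-1]
s_ij_without_gaps = (np.matmul(prior_without_gaps, prior_without_gaps.T)
                     * np.exp(0.3176 * blosum[:-1, :-1]))

print(s_ij.shape, s_ij_without_gaps.shape)  # (22, 22) (21, 21)
```

If prior_without_gaps is instead renormalized to sum to 1, the two matrices differ only by a constant factor of 1 / (1 - p_gap)^2, where p_gap is the gap entry of prior.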
Hi @Augustin-Zidek,
I wonder how you calculate the nc parameter (the mean number of different amino acid types) that you pass to the function.
Hi @alyosama,
You calculate nc as:
import numpy as np


def mean_unique_aa_types(sequence_matrix):
    """Compute the average number of unique amino acids in a column of the MSA."""
    num_res = sequence_matrix.shape[1]
    unique_aa = np.zeros(num_res)
    for i in range(num_res):
        # Number of distinct integer codes in column i.
        unique_aa[i] = len(np.unique(sequence_matrix[:, i]))
    return np.mean(unique_aa)


nc = mean_unique_aa_types(seq_matrix)
where seq_matrix is the MSA encoded as a matrix in which each amino acid has been replaced with a unique integer.
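The integer encoding itself is not shown in the thread. Here is a minimal sketch, assuming a hypothetical alphabet string that maps the 20 amino acids, unknown (X) and gap (-) to 0..21 (the real ordering is not stated), together with a vectorized variant of mean_unique_aa_types that counts distinct values per column by sorting each column and counting the points where the value changes.

```python
import numpy as np

# Hypothetical alphabet ordering; the thread does not specify the real one.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX-"
AA_TO_INT = {aa: i for i, aa in enumerate(ALPHABET)}


def encode_msa(sequences):
    """Turn equal-length aligned sequences into an integer matrix (num_seqs, num_res).

    Symbols outside ALPHABET are mapped to the unknown code X.
    """
    return np.array([[AA_TO_INT.get(aa, AA_TO_INT["X"]) for aa in seq]
                     for seq in sequences])


def mean_unique_aa_types_vectorized(sequence_matrix):
    """Same result as mean_unique_aa_types, without a Python loop over columns."""
    sorted_cols = np.sort(sequence_matrix, axis=0)
    # Each change between consecutive sorted values marks one more distinct symbol.
    changes = np.diff(sorted_cols, axis=0) != 0
    unique_per_column = changes.sum(axis=0) + 1
    return unique_per_column.mean()


seq_matrix = encode_msa(["ACD", "ACE", "AGE"])
nc = mean_unique_aa_types_vectorized(seq_matrix)  # 5/3: columns have 1, 2 and 2 distinct residues
print(nc)
```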
Hi @Augustin-Zidek, thanks for your fast response.
I was also wondering about the prior calculation. Are you taking the frequencies of amino acids in the whole MSA as the prior?
Hi @alyosama, the prior is a static matrix (i.e. it does not depend on the current MSA) that we calculated once over a set of high-quality proteins.
Thank you. Could you please provide the static matrix? It would be helpful.
Hi John,
here is pseudo-code to regenerate the prior matrix from a set of proteins with their HHblits MSAs:
import numpy as np

prior = np.zeros((512, 22))  # 20 amino acids + unknown (X) + gap (-)
for protein in proteins:
    msa = protein.msa  # Shape: (number_of_alignments, sequence_length)
    # Iterate over all sequences in the MSA.
    for seq in msa:
        # aa is the integer code of the residue at position i.
        for i, aa in enumerate(seq):
            prior[i, aa] += 1
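A runnable version of the counting loop above on toy data. The proteins iterable is replaced by a hypothetical list of integer-encoded MSAs. Two steps are assumptions added here rather than part of the pseudo-code: each row of counts is divided by its total to obtain probabilities, and MSAs wider than 512 columns are skipped (the thread only says the 512 limit applied when the prior was built).

```python
import numpy as np

MAX_LEN = 512     # rows of the prior matrix (one per residue position)
NUM_SYMBOLS = 22  # 20 amino acids + unknown (X) + gap (-)


def build_prior(msas, max_len=MAX_LEN, num_symbols=NUM_SYMBOLS):
    """Count symbol occurrences per position, then normalize each row.

    msas: iterable of integer arrays, each of shape (num_alignments, seq_length).
    """
    counts = np.zeros((max_len, num_symbols))
    for msa in msas:
        if msa.shape[1] > max_len:
            continue  # assumption: proteins longer than max_len are skipped
        for seq in msa:
            # Positions within one sequence are distinct, so this fancy-index
            # update adds exactly 1 at every (i, seq[i]).
            counts[np.arange(len(seq)), seq] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Assumption: normalize to probabilities; positions never observed stay all-zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


# Two toy MSAs (hypothetical data) with 3 and 2 positions.
msa_a = np.array([[0, 1, 21],
                  [0, 2, 21]])
msa_b = np.array([[0, 1]])
prior = build_prior([msa_a, msa_b])
print(prior[0, 0], prior[1, 1], prior[2, 21])
```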
Thanks for providing this.
The prior matrix has shape (512, 22); does that mean you only consider sequences of at most 512 residues?
> Does that mean you only consider sequences of at most 512 residues?

Correct, but that limit only applied when we constructed the prior matrix (which we did once).