Comments (11)

Augustin-Zidek commented on July 28, 2024

Hi,

Here is the code for the profile_with_prior feature, I hope that helps:

import numpy as np


def profile_with_prior(amino_acid_probs, prior_amino_probs,
                       s_ij, nc, beta=10.0):
  """Compute a profile that takes into account priors and the BLOSUM matrix.

  Args:
    amino_acid_probs: the observed amino acid probabilities in a column of MSA.
    prior_amino_probs: the prior amino acid probabilities for that column.
    s_ij: the substitution matrix going from amino acid i to j.
    nc: the mean number of different amino acid types per MSA column.
    beta: weight of the prior, default set to 10.

  Returns:
    A profile of the same length as amino_acid_probs (21 without gaps,
    22 with gaps).
  """
  # Ratio of observed to prior probability for each amino acid j.
  f_j_over_g_j = np.divide(amino_acid_probs, prior_amino_probs)
  # Pseudocount frequencies derived from the substitution matrix.
  g_i = np.matmul(s_ij, f_j_over_g_j)
  # Weight of the observed frequencies relative to the pseudocounts.
  alpha = nc - 1.0
  profiles = (alpha * amino_acid_probs + beta * g_i) / (alpha + beta)
  return profiles
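
As a quick sanity check on the function above, here is a minimal sketch on a toy three-letter alphabet. The toy numbers and both choices of s_ij are assumptions made purely for illustration; they do not come from the thread. With s_ij = diag(prior), no substitution is allowed, so the profile should equal the observed frequencies. With s_ij equal to the outer product of the prior (all substitution scores zero) and nc = 1, the profile should collapse to the prior.

```python
import numpy as np


def profile_with_prior(amino_acid_probs, prior_amino_probs, s_ij, nc, beta=10.0):
  """Same computation as the function posted above."""
  f_j_over_g_j = np.divide(amino_acid_probs, prior_amino_probs)
  g_i = np.matmul(s_ij, f_j_over_g_j)
  alpha = nc - 1.0
  return (alpha * amino_acid_probs + beta * g_i) / (alpha + beta)


# Hypothetical toy data: three symbols instead of 21 or 22.
observed = np.array([0.7, 0.2, 0.1])
background = np.array([0.5, 0.3, 0.2])

# Case 1: no substitutions allowed, so the pseudocounts equal the observations.
profile_no_substitution = profile_with_prior(
    observed, background, np.diag(background), nc=2.5)

# Case 2: all scores zero and nc = 1 (alpha = 0), so only the prior remains.
profile_prior_only = profile_with_prior(
    observed, background, np.outer(background, background), nc=1.0)

print(profile_no_substitution)
print(profile_prior_only)
```

Both cases are limits of the general formula, which makes them handy checks when wiring in a real substitution matrix.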

from deepmind-research.

newtonjoo commented on July 28, 2024

Hi,
Thank you, that's helpful.
However, this seems to be for profile_with_prior_without_gaps, which has 21 values in the feature.
With gaps, the profile has length 22.
I also wonder how to calculate s_ij with gaps.
Could you please provide more information about the formula?
Thanks

Augustin-Zidek commented on July 28, 2024

Hi @newtonjoo,

Sorry for the late reply.

> However, this seems to be for profile_with_prior_without_gaps, which has 21 values in the feature.

The code above should also work for the profile with gaps.

> I also wonder how to calculate s_ij with gaps.

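# The scaling factor 0.3176 is taken from
# www.ncbi.nlm.nih.gov/pmc/articles/PMC146917/pdf/253389.pdf
# With gaps: use the full prior and the full BLOSUM matrix.
s_ij = np.matmul(prior, prior.T) * np.exp(0.3176 * blosum)
# Without gaps: drop the last row and column of the BLOSUM matrix.
s_ij_without_gaps = (np.matmul(prior_without_gaps, prior_without_gaps.T)
                     * np.exp(0.3176 * blosum[:-1, :-1]))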
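
To make the shapes in the snippet above concrete, here is a minimal sketch on a toy three-symbol alphabet. It assumes that prior is a column vector of background frequencies with shape (n, 1), so that np.matmul(prior, prior.T) is an outer product. The thread does not state this, and the toy prior and score matrix are hypothetical stand-ins for the real prior and BLOSUM matrix. Dropping the last symbol by slicing is likewise an assumption about how prior_without_gaps is obtained.

```python
import numpy as np

LAMBDA = 0.3176  # Scaling factor quoted in the thread.

# Hypothetical background frequencies as a column vector, shape (3, 1).
prior = np.array([[0.5], [0.3], [0.2]])
# Hypothetical symmetric score matrix standing in for BLOSUM, shape (3, 3).
blosum = np.array([[4.0, -1.0, -2.0],
                   [-1.0, 5.0, 0.0],
                   [-2.0, 0.0, 6.0]])

# Full matrix: outer product of the prior times exponentiated scores.
s_ij = np.matmul(prior, prior.T) * np.exp(LAMBDA * blosum)

# Reduced matrix: drop the last symbol from both the prior and the scores.
prior_without_last = prior[:-1]
s_ij_without_last = (np.matmul(prior_without_last, prior_without_last.T)
                     * np.exp(LAMBDA * blosum[:-1, :-1]))

print(s_ij.shape, s_ij_without_last.shape)
```

Because both the outer product and the score matrix are symmetric here, s_ij is symmetric as well.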

alyosama commented on July 28, 2024

Hi @Augustin-Zidek,

How do you calculate the nc parameter (the mean number of different amino acid types) that you pass to the function?

Augustin-Zidek commented on July 28, 2024

Hi @alyosama,

You calculate nc as:

def mean_unique_aa_types(sequence_matrix):
  """Compute the mean number of unique amino acids per column of the MSA."""
  num_res = sequence_matrix.shape[1]
  unique_aa = np.zeros(num_res)
  for i in range(num_res):
    unique_aa[i] = len(np.unique(sequence_matrix[:, i]))
  return np.mean(unique_aa)

nc = mean_unique_aa_types(seq_matrix)

where seq_matrix is the MSA encoded as an integer matrix of shape (number of sequences, sequence length), in which each amino acid has been replaced with a unique integer.
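
For illustration, here is a small worked example of the nc calculation. The toy MSA and its integer encoding are hypothetical and not taken from the thread.

```python
import numpy as np


def mean_unique_aa_types(sequence_matrix):
  """Same computation as the function posted above."""
  num_res = sequence_matrix.shape[1]
  unique_aa = np.zeros(num_res)
  for i in range(num_res):
    unique_aa[i] = len(np.unique(sequence_matrix[:, i]))
  return np.mean(unique_aa)


# Hypothetical MSA with 3 sequences (rows) and 3 residue positions (columns).
toy_seq_matrix = np.array([[0, 1, 2],
                           [0, 1, 3],
                           [0, 2, 3]])

# Column 0 has 1 unique symbol, columns 1 and 2 have 2 each: (1 + 2 + 2) / 3.
toy_nc = mean_unique_aa_types(toy_seq_matrix)
print(toy_nc)
```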

alyosama commented on July 28, 2024

Hi @Augustin-Zidek, thanks for your fast response.

I was also wondering about the prior calculation. Are you taking the frequencies of amino acids in the whole MSA as the prior?

Augustin-Zidek commented on July 28, 2024

Hi @alyosama, the prior is a static matrix (i.e. it does not depend on the current MSA) that we calculated once over a set of high-quality proteins.

newtonjoo commented on July 28, 2024

Thank you. Could you please provide the static matrix? It would be helpful.

Augustin-Zidek commented on July 28, 2024

Hi John,

Here is pseudo-code to regenerate the prior matrix from a set of proteins with their HHBlits MSAs:

prior = np.zeros((512, 22))  # 20 amino acids + unknown (X) + gap (-)
for protein in proteins:
  msa = protein.msa  # Shape: (number_of_alignments, sequence_length)
  # Iterate over all sequences in the MSA.
  for seq in msa:
    for i, aa in enumerate(seq):
      prior[i, aa] += 1
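
A runnable sketch of the counting loop above, for readers who want to try it on their own data. Several things here are assumptions rather than part of the thread: the function name build_prior, the representation of each MSA as an integer array, the choice to skip MSAs longer than 512 columns, and the final per-position normalization of counts into frequencies (the pseudo-code only accumulates counts).

```python
import numpy as np

MAX_LEN = 512      # Maximum sequence length, as in the pseudo-code.
NUM_SYMBOLS = 22   # 20 amino acids + unknown (X) + gap (-).


def build_prior(msas, max_len=MAX_LEN, num_symbols=NUM_SYMBOLS):
  """Count symbols per position over many MSAs, then normalize each row."""
  counts = np.zeros((max_len, num_symbols))
  for msa in msas:
    msa = np.asarray(msa)  # Shape: (number_of_alignments, sequence_length).
    if msa.shape[1] > max_len:
      continue  # Assumption: proteins longer than max_len are left out.
    positions = np.arange(msa.shape[1])
    for seq in msa:
      # Add one count for the symbol observed at each position.
      np.add.at(counts, (positions, seq), 1)
  totals = counts.sum(axis=1, keepdims=True)
  # Assumption: normalize to frequencies; rows with no counts stay at zero.
  return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


# Hypothetical toy input: two tiny MSAs encoded as integers.
toy_msas = [np.array([[0, 1], [0, 2]]), np.array([[3, 1]])]
toy_prior = build_prior(toy_msas)
print(toy_prior[:2, :4])
```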

newtonjoo commented on July 28, 2024

Thanks for providing this.
The prior matrix has shape (512, 22). Does that mean you only consider sequences shorter than 512 residues?

Augustin-Zidek commented on July 28, 2024

> Does that mean you only consider sequences shorter than 512 residues?

Correct, but that applied only when we constructed the prior matrix (which we did once).
