
kalyan-immadisetty / af2_conformations


This project forked from delalamo/af2_conformations


A series of scripts that facilitate the prediction of protein structures in multiple conformations using AlphaFold2

License: MIT License



Prediction of alternative conformations using AlphaFold 2

This repository accompanies the manuscript "Sampling the conformational landscapes of transporters and receptors with AlphaFold2" by Diego del Alamo, Davide Sala, Hassane S. Mchaourab, and Jens Meiler. The code used to generate these models can be found in scripts/ and was derived from the closely related repository ColabFold. This repository also includes the scripts used to plot the data, which can be found in figures/, as well as the scripts used to analyze the data, which can be found in analyses_scripts. Finally, a Google Colab notebook is available for use in notebooks/.

The model generation code does not change the underlying AlphaFold v2.0.1 prediction pipeline (multimer prediction is not currently supported - we are working on that!). Therefore, please follow the installation instructions provided by DeepMind and review the AlphaFold2 license and disclaimer before use (additionally, please refer to the AlphaFold FAQ and ColabFold FAQ). The objective of the code contained here is to provide access to otherwise hard-to-reach settings that facilitate the generation of conformationally heterogeneous models of protein structures. Genetic and/or structural databases do not need to be downloaded - everything is accessible through the cloud via the MMseqs2 API.

De novo prediction of protein structures in multiple alternative conformations can usually be achieved for proteins that are absent from the AlphaFold2 training set, i.e. their structures were not determined prior to 30 April 2018. Predicting multiple conformations of proteins in the training set is, in our experience, sometimes but not usually possible.

We recommend sampling across several MSA depths. When MSAs are too shallow, the proteins are totally misfolded, whereas when they are too deep, the models are conformationally uniform. The "Goldilocks range" of MSA depths that achieves the maximum number of correctly folded but structurally diverse models seems to differ from protein to protein; in our experience, it appears to correlate with the number of amino acids. In any case, initial guesses for MSA depths can range from 32 to 128 sequences for proteins absent from the training set and from 8 to 64 sequences for proteins in the training set (note that this is far fewer than the 1000-5000 sequences that AlphaFold2 uses by default). Once generated, these models can be analyzed using any dimensionality reduction and/or clustering algorithm; in our study we use PCA and focus mainly on the models at either extreme.
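As an illustration of the PCA step mentioned above, the following is a minimal sketch that is not taken from the repository's analysis scripts. It assumes the models have already been superimposed and loaded into a NumPy array of shape (n_models, n_atoms, 3); the function name pca_extremes is hypothetical.

```python
import numpy as np


def pca_extremes(coords, n_components=2):
    """Project superimposed models onto principal components.

    coords: array of shape (n_models, n_atoms, 3). The models are assumed
    to be superimposed already; this sketch does no alignment itself.
    Returns the PC scores, the explained variance ratios, and the indices
    of the models at the two extremes of the first component.
    """
    coords = np.asarray(coords, dtype=float)
    n_models = coords.shape[0]

    # Flatten each model into one feature vector and center the data.
    flat = coords.reshape(n_models, -1)
    centered = flat - flat.mean(axis=0)

    # SVD of the centered data; the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]

    # Fraction of the total variance captured by each kept component.
    variance = s ** 2 / max(n_models - 1, 1)
    explained = variance[:n_components] / variance.sum()

    pc1 = scores[:, 0]
    return scores, explained, int(np.argmin(pc1)), int(np.argmax(pc1))
```

The sign of a principal component is arbitrary, so which of the two returned indices counts as the "low" or "high" end carries no meaning on its own; only the pair matters.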

How to use the code in this repository

Before importing the code contained in the scripts/ folder, the user needs to install the AlphaFold source code and download the parameters to a directory named params/. NumPy must also be installed; the logging module that the scripts use is part of the Python standard library and needs no separate installation.
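A quick way to confirm these prerequisites before running anything is sketched below. This helper is hypothetical and not part of the repository; it assumes the AlphaFold source is importable as a package named alphafold and that params/ sits in the current working directory.

```python
import importlib.util
import os


def check_setup(params_dir="params", modules=("numpy", "alphafold")):
    """Return a list of human-readable problems; an empty list means OK.

    params_dir and the module names are assumptions for this sketch and
    should be adjusted to match the local installation.
    """
    problems = []

    # The AlphaFold parameters are expected in a directory named params/.
    if not os.path.isdir(params_dir):
        problems.append(f"parameter directory not found: {params_dir}")

    # find_spec returns None when a top-level module cannot be located.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")

    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print(problem)
```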

The scripts can be imported and used out-of-the-box to fetch multiple sequence alignments and/or templates of interest:

from af2_conformations.scripts import mmseqs2

# Jobname for reference
jobname = 'T4_lysozyme'

# Amino acid sequence. Whitespace and inappropriate characters are automatically removed
sequence = ("MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNCNGVIT"
            "KDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCALINMVFQMGETGVAGFTNSL"
            "RMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL" )
            
# PDB IDs, written uppercase with chain ID specified
pdbs = ["6LB8_A",
        "6LB8_C",
        "4PK0_A",
        "6FW2_A"]

# Initializes the Runner object that queries the MMSeqs2 server
mmseqs2_runner = mmseqs2.MMSeqs2Runner( jobname, sequence )

# Fetches the data and saves to the appropriate directory
a3m_lines, template_path = mmseqs2_runner.run_job( templates = pdbs )
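The comments above state that the sequence is cleaned automatically and that templates are written as an uppercase PDB ID with a chain ID. For readers who want to check their inputs beforehand, here is a minimal sketch of what such checks could look like. The helper names and the exact rules (uppercasing the sequence, keeping only the 20 standard amino acids, accepting "_" or ":" as the chain separator) are assumptions and may differ from what the repository does internally.

```python
import re

# The 20 standard amino acids in one-letter code.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Uppercase the input and drop anything that is not a standard residue."""
    return "".join(ch for ch in raw.upper() if ch in VALID_AA)


def normalize_template_id(raw):
    """Turn strings such as '6lb8_A' or '6lb8:A' into '6LB8_A'.

    The four-character PDB ID is uppercased; the chain ID is kept as
    written because chain IDs are case-sensitive.
    """
    match = re.fullmatch(r"\s*([0-9][A-Za-z0-9]{3})[_:]([A-Za-z0-9]+)\s*", raw)
    if match is None:
        raise ValueError(f"expected '<PDB ID>_<chain>', got {raw!r}")
    pdb_id, chain = match.groups()
    return f"{pdb_id.upper()}_{chain}"
```

For example, a list of user-supplied IDs could be passed through normalize_template_id before being handed to run_job as the templates argument.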

The following code then runs a prediction without templates. Note that the max_msa_clusters and max_extra_msa options can be provided to reduce the size of the multiple sequence alignment. If these are not provided, the network's default values will be used. Additional options allow the number of recycles, as well as the number of loops through the recurrent Structure Module, to be specified.

from af2_conformations.scripts import predict

predict.predict_structure_no_templates( sequence, "out.pdb",
         a3m_lines, model_id = 1, max_msa_clusters = 16,
         max_extra_msa = 32, max_recycles = 1, n_struct_module_repeats = 8 )
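Because several MSA depths are usually worth sampling, it can help to generate the keyword arguments for a sweep up front. The sketch below is a hypothetical helper, not part of the repository. It assumes that "depth" refers to max_msa_clusters, that depths are doubled at each step, and that max_extra_msa is set to twice max_msa_clusters, mirroring the 16/32 pairing in the example above.

```python
def msa_depth_settings(min_clusters=8, max_clusters=128, extra_factor=2):
    """Build keyword-argument dicts for a sweep over MSA depths.

    Depths double at each step from min_clusters up to max_clusters.
    The doubling and the extra_factor of 2 are assumptions of this sketch.
    """
    if min_clusters < 1 or extra_factor < 1:
        raise ValueError("min_clusters and extra_factor must be at least 1")

    settings = []
    clusters = min_clusters
    while clusters <= max_clusters:
        settings.append({
            "max_msa_clusters": clusters,
            "max_extra_msa": clusters * extra_factor,
        })
        clusters *= 2
    return settings


def output_name(jobname, kwargs, model_id=1):
    """Hypothetical file-naming scheme that records the settings used."""
    return (f"{jobname}_model{model_id}"
            f"_msa{kwargs['max_msa_clusters']}"
            f"_extra{kwargs['max_extra_msa']}.pdb")
```

Each dictionary can then be unpacked into the call shown above, for example predict.predict_structure_no_templates( sequence, output_name( jobname, kwargs ), a3m_lines, model_id = 1, **kwargs ).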

To run a prediction with templates:

predict.predict_structure_from_templates( sequence, "out.pdb",
        a3m_lines, template_path = template_path,
        model_id = 1, max_msa_clusters = 16, max_extra_msa = 32,
        max_recycles = 1, n_struct_module_repeats = 8 )

There is also functionality to introduce mutations (e.g. alanines) across the entire MSA to remove the evolutionary evidence for specific interactions (see here and here on why you would want to do this). This can be achieved as follows:

from af2_conformations.scripts import util

# Define the mutations and introduce into the sequence and MSA
residues = [ 41,42,45,46,56,59,60,63,281,282,285,286,403,407 ]
muts = { r: "A" for r in residues }
mutated_msa = util.mutate_msa( a3m_lines, muts )
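To make the idea concrete, here is a minimal, self-contained sketch of what mutating a column across an A3M alignment involves. It is an illustration only and is not the repository's util.mutate_msa, whose behavior may differ. It assumes the alignment is a single A3M-formatted string, that positions are 1-based relative to the query sequence, and that gaps are left untouched; the function name mutate_a3m_sketch is hypothetical.

```python
def mutate_a3m_sketch(a3m_text, mutations):
    """Replace residues at the given query positions in every sequence.

    a3m_text:  alignment in A3M format as a single string.
    mutations: dict mapping 1-based query positions to one-letter residues.

    In A3M, lowercase letters are insertions relative to the query and do
    not occupy an alignment column, so they are copied through unchanged.
    """
    out = []
    for line in a3m_text.splitlines():
        # Header, comment, and empty lines are passed through as-is.
        if not line or line.startswith(">") or line.startswith("#"):
            out.append(line)
            continue

        column = 0
        chars = []
        for ch in line:
            if ch.islower():
                chars.append(ch)  # insertion: not an alignment column
                continue
            column += 1
            if column in mutations and ch != "-":
                chars.append(mutations[column])  # overwrite aligned residue
            else:
                chars.append(ch)  # untouched column, or a gap
        out.append("".join(chars))
    return "\n".join(out)
```

Leaving gaps in place is a design choice of this sketch: it changes only the residues that are actually aligned to the mutated positions and preserves the gap pattern of each homolog.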

Known issues

Here is a shortlist of known problems that we are currently working on:

  • The MMSeqs2 server queries the PDB70, rather than the full PDB. This can cause some structures to be missed if their sequences are nearly identical to those of other PDB files.
  • Multimer prediction is not currently supported.
  • Custom MSAs are not currently supported.

If you find any other issues please let us know in the "issues" tab above.

Citation

If the code in this repository has helped your scientific project, please consider citing our preprint:

@article {delAlamo2021.11.22.469536,
	author = {del Alamo, Diego and Sala, Davide and Mchaourab, Hassane S. and Meiler, Jens},
	title = {Sampling the conformational landscapes of transporters and receptors with AlphaFold2},
	elocation-id = {2021.11.22.469536},
	year = {2021},
	doi = {10.1101/2021.11.22.469536},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2021/11/22/2021.11.22.469536},
	eprint = {https://www.biorxiv.org/content/early/2021/11/22/2021.11.22.469536.full.pdf},
	journal = {bioRxiv}
}
