This project forked from steineggerlab/colabfold-protocol

Easy and accurate protein structure prediction using ColabFold

A step-by-step protocol for predicting protein monomer and multimer structures and conformations using ColabFold/AlphaFold2

Kim G, Lee S, Levy Karin E, Kim H, Moriwaki Y, Ovchinnikov S, Steinegger M and Mirdita M. Easy and accurate protein structure prediction using ColabFold. Protocol Exchange (2023) doi: 10.21203/rs.3.pex-2490/v1

Overview

ColabFold has two interfaces:

  1. Web-based Jupyter notebooks running on Google Colaboratory
  2. Command-line tools (local)

Our protocol guides readers through three scenarios using both interfaces: (1) monomer prediction, (2) complex prediction, and (3) conformation sampling.

We demonstrate the use of ColabFold with the human glycosylphosphatidylinositol transamidase (GPIT) for monomer and complex prediction, and with the human alanine serine cysteine transporter 2 (ASCT2), which adopts alternative conformations, for conformation sampling.

Equipment

The protein sequence queries (FASTA/CSV) used in this protocol can be found in the /query directory.

The corresponding experimental structures (PDB) can be found in the /ref directory.

Running web notebooks

To open a notebook in the Google Colaboratory environment, click the Open In Colab button at the top of the notebook.

While most of the notebooks can be executed for free, complex prediction (GPIT_complex.ipynb) requires a paid Colab Pro account because of the total sequence length of the complex.

Also note that the notebooks are provided as guides for tuning parameters and are not intended to be rerun, since the Google Colaboratory environment itself may change. To run the ColabFold-AlphaFold2 notebook from the beginning, navigate here.

The results of the notebooks can be found in the /web/result directory. They include the following outputs:

  • predicted structures (PDB)
  • generated MSAs (A3M)
  • confidence measures (JSON, log.txt): pLDDT, PAE, pTM, ipTM (for complex prediction)
  • images (PNG) for visualizing MSA sequence coverage and confidence measures

Running locally

For detailed instructions on installing ColabFold locally, refer to ColabFold or localcolabfold.

To run ColabFold locally, use the following command:

colabfold_batch input_seqs.fasta /path/to/results
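When several query files should be predicted separately, a small wrapper loop can help. The sketch below is not one of this repository's scripts: it assumes the query files end in `.fasta`, writes each prediction to its own subfolder, and uses a hypothetical `DRY_RUN=1` switch (a variable of this sketch, not a ColabFold option) to print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: run colabfold_batch once per FASTA file in a query directory.
# Assumptions (hypothetical, not part of this repository's scripts):
#   - query files end in .fasta
#   - each query gets its own result subdirectory named after the file
#   - DRY_RUN=1 only prints the commands instead of executing them
set -euo pipefail

run_all_queries() {
  local query_dir=$1 result_dir=$2
  local fasta name
  for fasta in "$query_dir"/*.fasta; do
    [ -e "$fasta" ] || continue          # the glob matched no files
    name=$(basename "$fasta" .fasta)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo colabfold_batch "$fasta" "$result_dir/$name"
    else
      mkdir -p "$result_dir/$name"
      colabfold_batch "$fasta" "$result_dir/$name"
    fi
  done
}
```

After sourcing the sketch, `DRY_RUN=1 run_all_queries query results` lists the commands and dropping `DRY_RUN=1` runs them. Recent colabfold_batch releases also accept a directory of FASTA files as the input argument, so the loop is mainly useful when one output folder per query is wanted.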

To run the procedures in this protocol locally, please follow the steps below.

# Clone this repository
git clone https://github.com/steineggerlab/colabfold-protocol.git

# Move to the batch directory
cd colabfold-protocol/batch
# Run ColabFold for each monomer
sh script/run_colabfold_GPIT_monomer.sh
# Predict the structure of the GPIT complex with alphafold2_multimer_v3
sh script/run_colabfold_GPIT_complex.sh
# Predict complex structures pairwise
sh script/run_colabfold_GPIT_pair.sh
# Validate the predicted structures by aligning them to the experimental structures
sh script/validate_colabfold_prediction.sh
# Render the structure alignment (ChimeraX required; `open` is macOS-specific,
# elsewhere run: chimerax script/render_structure_alignment.cxc)
open script/render_structure_alignment.cxc

The results of the local predictions can be found in the /batch/result directory, and are organized in the same way as the web notebook output.

Conformation prediction

This procedure generates multiple conformational states from a single input sequence by increasing the uncertainty of the AlphaFold2 network. To this end, we use two different strategies: (1) MSA depth reduction and (2) activation of dropout layers at inference time.

Structure prediction

For each strategy, set ColabFold parameters as follows:

  • MSA depth reduction: max_msa=32:64, num_seeds=16
  • Dropout: use-dropout, num_seeds=16
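On the command line, the two settings could be passed to colabfold_batch roughly as follows. This is a sketch assuming the flags are spelled `--max-msa`, `--num-seeds` and `--use-dropout`, as in recent colabfold_batch releases (check `colabfold_batch --help` for your version); `conformation_run` and `DRY_RUN` are hypothetical names. In `32:64`, the first number sets max-seq (clustered MSA sequences) and the second max-extra-seq (extra MSA sequences).

```shell
#!/usr/bin/env bash
# Sketch: run one of the two conformation-sampling strategies.
# Flag spellings are assumed from recent colabfold_batch releases;
# DRY_RUN=1 (hypothetical) prints the command instead of running it.
set -euo pipefail

conformation_run() {
  local strategy=$1 input=$2 outdir=$3
  local -a opts
  case "$strategy" in
    msa)     opts=(--max-msa 32:64 --num-seeds 16) ;;   # MSA depth reduction
    dropout) opts=(--use-dropout --num-seeds 16) ;;     # dropout at inference
    *)       echo "unknown strategy: $strategy" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo colabfold_batch "${opts[@]}" "$input" "$outdir"
  else
    colabfold_batch "${opts[@]}" "$input" "$outdir"
  fi
}
```

Keeping the two option sets in one `case` statement makes it easy to run both strategies on the same input with separate output directories.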

With the above settings, a total of 80 structures (5 AlphaFold2 models × 16 seeds) will be generated for each strategy, and the results can be found in the web/result/conformation and batch/result/conformation directories for web and local predictions, respectively. Because of their large total size, we only provide 20 representative structures for each strategy.

Identifying alternative conformations

This part applies principal component analysis (PCA) with CPPTRAJ to the alpha-carbon positions of each model. Its aim is to reduce the dimensionality from the 451 processed residues (451 alpha carbons, i.e. 3 × 451 = 1,353 Cartesian coordinates) to only a few components that capture most of the conformational movement. Multiple conformational states can then be identified from the PCA result by selecting the representative structures furthest apart from each other along the PC1 and/or PC2 axis, depending on the amount of variance captured by each PC.

The provided CPPTRAJ script (run with this bash script) performs the following steps. This script is largely based on the script provided by del Alamo et al.

  1. Trim off terminal stretches with low pLDDT scores to reduce noise in the PCA.
  2. Compute the average position of each of the remaining 451 alpha carbons across the 80 protein models and subtract it from each model.
  3. Compute the covariance matrix of the 451 mean-centered positions (a 1,353 × 1,353 matrix over their x, y, z coordinates).
  4. Compute the eigenvalues and eigenvectors of the covariance matrix and sort them by decreasing eigenvalue, so that the first eigenvector (PC1) captures the most variance.
  5. Compute the projections of the 80 models along the first three principal components (PCs).
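Step 1 can be sketched with awk. This assumes ColabFold's convention of storing per-residue pLDDT in the B-factor column of the output PDB (columns 61-66) and a single-chain model; the function name `plddt_core_range` and the default cutoff of 70 are hypothetical choices, not values taken from the protocol's CPPTRAJ script.

```shell
#!/usr/bin/env bash
# Sketch of step 1: find the residue range left after trimming low-pLDDT
# terminal stretches. Assumes pLDDT is stored in the B-factor column
# (PDB columns 61-66) and a single chain. Cutoff 70 is hypothetical.
set -euo pipefail

plddt_core_range() {
  local pdb=$1 cutoff=${2:-70}
  awk -v cutoff="$cutoff" '
    /^ATOM/ && substr($0, 13, 4) == " CA " {
      res = substr($0, 23, 4) + 0      # residue number, columns 23-26
      plddt = substr($0, 61, 6) + 0    # B-factor column, holds pLDDT
      if (plddt >= cutoff) {
        if (first == "") first = res   # first confident residue from the N-terminus
        last = res                     # last confident residue seen so far
      }
    }
    END {
      if (first == "") exit 1          # no residue reaches the cutoff
      print first, last
    }
  ' "$pdb"
}
```

`plddt_core_range model.pdb` prints the first and last residue numbers whose pLDDT reaches the cutoff. Because the PCA needs the same atoms in every model, such a range would have to be shared by all 80 models (for instance, the intersection of the per-model ranges) before being written as a CPPTRAJ atom mask of the form `:<first>-<last>@CA`.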

The results of the PCA can be found in batch/result/conformation/pca. Some key outputs are:

  • project.dat: the projections of each model onto the first three PCs (the last column, PC3, can be ignored here)
  • %eigenvec.dat: the amount of variance captured by each PC in descending order (2nd column)
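Selecting the two models furthest apart along a PC amounts to taking the minimum and maximum of one column of project.dat. The sketch assumes the usual CPPTRAJ projection layout: `#`-prefixed header lines, then one row per model with the frame index followed by the PC1, PC2 and PC3 values, where frame n is the n-th model loaded. `pc_extremes` is a hypothetical helper, not part of the provided scripts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the frame indices of the models with the lowest and highest
# projection along one principal component.
#   $1: projection file (e.g. project.dat)
#   $2: column to use; 2 = PC1, 3 = PC2 under the assumed layout
pc_extremes() {
  local file=$1 col=${2:-2}
  awk -v c="$col" '
    /^#/ || NF < c { next }            # skip header lines and short rows
    {
      v = $c + 0                       # projection value on the chosen PC
      if (n == 0 || v < min) { min = v; minf = $1 }
      if (n == 0 || v > max) { max = v; maxf = $1 }
      n++
    }
    END {
      if (n == 0) exit 1               # no data rows found
      print minf, maxf
    }
  ' "$file"
}
```

For example, `pc_extremes batch/result/conformation/pca/project.dat` prints the frame indices of the two extreme models along PC1; passing `3` as the second argument does the same along PC2, which is worth checking when %eigenvec.dat shows PC2 capturing a comparable share of the variance.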

How do I reference this work?

  • Kim G, Lee S, Levy Karin E, Kim H, Moriwaki Y, Ovchinnikov S, Steinegger M and Mirdita M. Easy and accurate protein structure prediction using ColabFold.
    Protocol Exchange (2023) doi: 10.21203/rs.3.pex-2490/v1

  • Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold: Making protein folding accessible to all.
    Nature Methods (2022) doi: 10.1038/s41592-022-01488-1

  • If you’re using AlphaFold, please also cite:
    Jumper et al. "Highly accurate protein structure prediction with AlphaFold."
    Nature (2021) doi: 10.1038/s41586-021-03819-2

  • If you’re using AlphaFold-multimer, please also cite:
    Evans et al. "Protein complex prediction with AlphaFold-Multimer."
    bioRxiv (2021) doi: 10.1101/2021.10.04.463034

  • If you are using RoseTTAFold, please also cite:
    Baek et al. "Accurate prediction of protein structures and interactions using a three-track neural network."
    Science (2021) doi: 10.1126/science.abj8754
