This project forked from instadeepai/manyfold

🧬 ManyFold: An efficient and flexible library for training and validating protein folding models

License: Other

ManyFold

An efficient and flexible library in Jax for distributed training and batched validation of protein folding models.

Description

ManyFold supports the AlphaFold2 and OpenFold models. It also implements the pLMFold model, an MSA-free folding model that takes ESM-1b protein language model (pLM) embeddings and attention heads as inputs. A schematic of the pLMFold model is shown below:

Fig. 1: The pLMFold model: a pre-trained pLM first converts the input amino acid sequence into single and pair representations, which are processed in the pLMformer module. The output is then fed into the structure module to generate the predicted protein structure.
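
To make the inputs in Fig. 1 concrete, the following minimal sketch computes the shapes of the single and pair features that a pLM could provide for a sequence of a given length. The layer, head, and embedding sizes are those of ESM-1b (33 layers, 20 attention heads, 1280-dimensional embeddings). Stacking the attention maps from every layer and head into the pair features is an assumption made for illustration, and `PlmConfig` and `plm_feature_shapes` are hypothetical names, not part of the ManyFold API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlmConfig:
    """Sizes of the protein language model (defaults: ESM-1b)."""
    num_layers: int = 33
    num_heads: int = 20
    embed_dim: int = 1280


def plm_feature_shapes(seq_len: int, cfg: PlmConfig = PlmConfig()) -> dict:
    """Return the shapes of the per-residue and per-residue-pair features.

    Assumption: one attention map per (layer, head) is used as a pair channel.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    single = (seq_len, cfg.embed_dim)                          # one embedding per residue
    pair = (seq_len, seq_len, cfg.num_layers * cfg.num_heads)  # one map per layer and head
    return {"single": single, "pair": pair}


shapes = plm_feature_shapes(128)
print(shapes)
```

The pair features grow quadratically with sequence length, which is why the number of attention channels matters for memory use on long sequences.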

ManyFold supports training full AlphaFold/pLMFold models from (i) randomly initialized parameters and optimizer state, (ii) a previously stored checkpoint, or (iii) pre-trained model parameters (for model fine-tuning). The library was used to train a pLMFold model from scratch, obtaining plausible protein structures (Fig. 2) while significantly reducing forward and backward pass times relative to AlphaFold.

Fig. 2: AlphaFold/OpenFold (model_1_ptm) and pLMFold predictions aligned to the experimental structure of the target with chain ID 7EJG_C.
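
The three starting points for training listed above can be sketched as a small selection routine. This is an illustration only: `InitMode`, `InitPlan`, and `plan_initialization` are hypothetical names rather than ManyFold's API, and two behaviours are assumptions: that a checkpoint takes precedence when both options are given, and that fine-tuning starts with a fresh optimizer state.

```python
from enum import Enum
from typing import NamedTuple, Optional


class InitMode(Enum):
    RANDOM = "random"          # (i) fresh parameters and optimizer state
    CHECKPOINT = "checkpoint"  # (ii) resume from a stored checkpoint
    PRETRAINED = "pretrained"  # (iii) fine-tune from pre-trained parameters


class InitPlan(NamedTuple):
    mode: InitMode
    load_params_from: Optional[str]
    restore_optimizer_state: bool


def plan_initialization(checkpoint_dir: Optional[str] = None,
                        pretrained_params: Optional[str] = None) -> InitPlan:
    """Decide how a training run starts (hypothetical helper)."""
    if checkpoint_dir is not None:
        # Assumption: resuming restores both parameters and optimizer state.
        return InitPlan(InitMode.CHECKPOINT, checkpoint_dir, True)
    if pretrained_params is not None:
        # Assumption: fine-tuning loads parameters but resets the optimizer.
        return InitPlan(InitMode.PRETRAINED, pretrained_params, False)
    return InitPlan(InitMode.RANDOM, None, False)


plan = plan_initialization(pretrained_params="params/openfold/params_model_1_ptm.npz")
print(plan.mode.value, plan.restore_optimizer_state)
```

For how these options are actually configured in ManyFold, experiments/README.md is the authoritative reference.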

First-time setup

  1. Clone this repository locally and cd into it:
git clone https://github.com/instadeepai/manyfold.git
cd manyfold
  2. Download data for training and inference (see datasets/README.md for a description of the datasets):
curl -fsSL https://storage.googleapis.com/manyfold-data/datasets.tar | tar x -C datasets/
  3. Download pLMFold/AlphaFold/OpenFold pretrained parameters. These are meant for validation inference or model fine-tuning:
# pLMFold
mkdir -p params
curl -fsSL https://storage.googleapis.com/manyfold-data/params.tar | tar x -C params/

# AlphaFold
mkdir -p params/alphafold
curl -fsSL https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar | tar x -C params/alphafold

# OpenFold
mkdir -p params/openfold
for i in 1 2; do
    wget -qnc https://files.ipd.uw.edu/krypton/openfold/openfold_model_ptm_${i}.npz -O params/openfold/params_model_${i}_ptm.npz
done
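
After running the OpenFold loop above, it can be useful to confirm that both parameter files landed where expected. The sketch below rebuilds the same URL and destination pairs as the shell loop and reports which files are still missing. The helper names `openfold_targets` and `missing_targets` are hypothetical; only the URLs and file names come from the commands above.

```python
from pathlib import Path

# URL pattern taken from the wget command above.
OPENFOLD_URL = "https://files.ipd.uw.edu/krypton/openfold/openfold_model_ptm_{i}.npz"


def openfold_targets(params_dir="params"):
    """Return (url, destination) pairs mirroring the shell loop over i in 1 2."""
    root = Path(params_dir) / "openfold"
    return [
        (OPENFOLD_URL.format(i=i), root / f"params_model_{i}_ptm.npz")
        for i in (1, 2)
    ]


def missing_targets(params_dir="params"):
    """Return the pairs whose destination file does not exist yet."""
    return [(url, dest) for url, dest in openfold_targets(params_dir)
            if not dest.is_file()]


if __name__ == "__main__":
    for url, dest in missing_targets():
        print(f"missing: {dest} (download from {url})")
```

Run from the repository root, the script prints one line per missing file and nothing when both files are present.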

The easiest way to run ManyFold is with the Docker files provided in the docker folder. You will need a machine running Linux.

For a detailed explanation of how to run experiments, please refer to experiments/README.md. This involves two main steps:

  • Build the docker image and run the docker container.
  • Launch training runs or validation inference.

Acknowledgements

This research has been supported with TPUs from Google's TPU Research Cloud (TRC).

Citing ManyFold

If you find this repository useful in your work, please cite our associated paper in Bioinformatics:

@software{manyfold2022github,
  author = {Villegas-Morcillo, Amelia and Robinson, Louis and Flajolet, Arthur and Barrett, Thomas D},
  title = {{ManyFold}: An efficient and flexible library for training and validating protein folding models},
  journal = {Bioinformatics},
  year = {2022},
  url = {https://doi.org/10.1093/bioinformatics/btac773},
}
