A joint embedding of protein sequence and structure enables robust variant effect predictions

Introduction

This repository contains scripts and data to repeat the analyses in Blaabjerg et al.:
"A joint embedding of protein sequence and structure enables robust variant effect predictions".

Execution

Run the pipeline with src/run_pipeline.py.
This main script calls the other scripts in the src directory to train, validate, and test the SSEmb model as described in the paper.
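
As a minimal sketch, the pipeline could be launched from Python as shown below. The helper names `build_command` and `run_pipeline` are hypothetical and not part of the repository, and the sketch assumes that src/run_pipeline.py needs no command-line arguments, which the text above does not state.

```python
import subprocess
import sys
from pathlib import Path


def build_command(repo_root="."):
    """Return the command that runs src/run_pipeline.py with the current interpreter."""
    # Resolve to an absolute path so the command works from any working directory.
    script = (Path(repo_root) / "src" / "run_pipeline.py").resolve()
    return [sys.executable, str(script)]


def run_pipeline(repo_root="."):
    """Run the pipeline from the repository root and return its exit code."""
    command = build_command(repo_root)
    if not Path(command[1]).is_file():
        raise FileNotFoundError(f"Pipeline script not found: {command[1]}")
    # Assumption: the script is started without arguments.
    completed = subprocess.run(command, cwd=repo_root)
    return completed.returncode
```

Calling `run_pipeline(".")` from the repository root would then start the pipeline and return its exit code.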

Requirements

The code has been developed and tested in a Unix environment using the following packages:

  • python==3.7.16
  • pytorch==1.13.1
  • pyg==2.2.0
  • pytorch-scatter==2.1.0
  • pytorch-cluster==1.6.0
  • fair-esm==2.0.0
  • numpy==1.21.6
  • pandas==1.3.5
  • biopython==1.79
  • openmm==7.6.0
  • pdbfixer==1.8.1
  • scipy==1.7.3
  • scikit-learn==1.0.2
  • tqdm==4.64.1
  • pytz==2022.7
  • matplotlib==3.2.2
  • mpl-scatter-density==0.7
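
To compare an existing environment against these pins, a small helper along the following lines may be useful. It is only a sketch: `parse_pins` and `find_mismatches` are hypothetical names, and the `installed` mapping must be supplied by the caller (for example from the output of the package manager in use). Note that package names can differ between package managers; PyTorch, for instance, is published on PyPI as `torch`.

```python
import platform

# The pinned versions listed above, one "name==version" entry per line.
PINNED = """
python==3.7.16
pytorch==1.13.1
pyg==2.2.0
pytorch-scatter==2.1.0
pytorch-cluster==1.6.0
fair-esm==2.0.0
numpy==1.21.6
pandas==1.3.5
biopython==1.79
openmm==7.6.0
pdbfixer==1.8.1
scipy==1.7.3
scikit-learn==1.0.2
tqdm==4.64.1
pytz==2022.7
matplotlib==3.2.2
mpl-scatter-density==0.7
"""


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dictionary."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, version = line.partition("==")
        pins[name] = version
    return pins


def find_mismatches(pins, installed):
    """Return {name: (expected, found)} for missing or differing versions."""
    return {
        name: (expected, installed.get(name))
        for name, expected in pins.items()
        if installed.get(name) != expected
    }
```

For example, `find_mismatches(parse_pins(PINNED), {"python": platform.python_version()})` reports every pin except a matching Python interpreter, since only the interpreter version was supplied.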

Downloads

Data related to the paper can be downloaded here: https://zenodo.org/records/10362251.
The directory contains the following subdirectories:

  • train:
    • model_weights: Final weights for the SSEmb-MSATransformer and SSEmb-GVPGNN modules.
    • optimizer_weights: Parameters for the optimizer at time of early-stopping.
    • msa: MSAs for the proteins in the training set.
  • mave_val:
    • msa: MSAs for the proteins in the MAVE validation set.
  • rocklin:
    • msa: MSAs for the proteins in the mega-scale stability change test set.
  • proteingym:
    • structure: AlphaFold-2 generated structures used for the ProteinGym test.
    • msa: MSAs for the proteins in the ProteinGym test set.
  • scannet:
    • model_weights: Final weights for the SSEmb downstream model trained on the ScanNet data set.
    • optimizer_weights: Parameters for the optimizer at time of early-stopping.
    • msa: MSAs for the proteins in the ScanNet data set.
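
After downloading, the layout listed above can be checked with a short script such as the sketch below. It assumes that the unpacked data uses exactly the directory names from the list; the function name `missing_subdirectories` and the location of the data root are hypothetical.

```python
from pathlib import Path

# Expected layout, taken from the directory listing above.
EXPECTED_LAYOUT = {
    "train": ["model_weights", "optimizer_weights", "msa"],
    "mave_val": ["msa"],
    "rocklin": ["msa"],
    "proteingym": ["structure", "msa"],
    "scannet": ["model_weights", "optimizer_weights", "msa"],
}


def missing_subdirectories(data_root):
    """Return the expected subdirectories that are absent under data_root."""
    root = Path(data_root)
    missing = []
    for top, children in EXPECTED_LAYOUT.items():
        for child in children:
            if not (root / top / child).is_dir():
                # Report paths relative to the data root, with forward slashes.
                missing.append((Path(top) / child).as_posix())
    return missing
```

An empty list from `missing_subdirectories(path_to_data)` indicates that every listed subdirectory is present; it does not verify the files inside them.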

License

Source code and model weights are licensed under the MIT License.

Acknowledgements

Code for the original MSA Transformer was developed by the ESM team at Meta Research:
https://github.com/facebookresearch/esm.

Code for the original GVP-GNN was developed by Jing et al.:
https://github.com/drorlab/gvp-pytorch.

Citation

Please cite:

Lasse M. Blaabjerg, Nicolas Jonsson, Wouter Boomsma, Amelie Stein, Kresten Lindorff-Larsen (2023). A joint embedding of protein sequence and structure enables robust variant effect predictions. bioRxiv, 2023.12.

@article {Blaabjerg2023.12.14.571755,
	author = {Lasse M. Blaabjerg and Nicolas Jonsson and Wouter Boomsma and Amelie Stein and Kresten Lindorff-Larsen},
	title = {A joint embedding of protein sequence and structure enables robust variant effect predictions},
	elocation-id = {2023.12.14.571755},
	year = {2023},
	doi = {10.1101/2023.12.14.571755},
	URL = {https://www.biorxiv.org/content/early/2023/12/16/2023.12.14.571755},
	eprint = {https://www.biorxiv.org/content/early/2023/12/16/2023.12.14.571755.full.pdf},
	journal = {bioRxiv}
}
