SMURF

Source code to accompany "End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman" [1]. See: https://www.biorxiv.org/content/10.1101/2021.10.23.465204v1

[1] Petti S, Bhattacharya N, Rao R, Dauparas J, Thomas N, Zhou J, Rush AM, Koo PK, Ovchinnikov S. End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman. bioRxiv. 2021 Oct 24.

Source code for SSW and SMURF:

  • laxy.py: Wrapper around JAX for basic neural networks, see https://github.com/sokrypton/laxy.

  • sw_functions.py: Differentiable JAX implementations of smooth Smith-Waterman and Needleman-Wunsch, with affine gap and temperature parameters.

  • network_functions.py: SMURF pipeline including the BasicAlign and TrainMRF modules.
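
To illustrate what "smooth" and "temperature" mean for Smith-Waterman, here is a minimal sketch in plain Python. It is not the code in sw_functions.py: it assumes a linear gap penalty instead of affine gaps, uses naive loops instead of vectorized JAX, and the names smooth_max and smooth_sw are hypothetical. The idea is to replace each max in the recursion with a temperature-scaled log-sum-exp, which approaches the hard maximum as the temperature goes to zero.

```python
import math


def smooth_max(values, temperature):
    """Temperature-scaled log-sum-exp; approaches max(values) as temperature -> 0."""
    m = max(values)  # subtract the max for numerical stability
    total = sum(math.exp((v - m) / temperature) for v in values)
    return m + temperature * math.log(total)


def smooth_sw(scores, gap=1.0, temperature=1.0):
    """Smoothed local-alignment score for a pairwise score matrix.

    Assumption for illustration: linear gap penalty `gap` (not affine).
    scores[i][j] is the score for aligning position i of one sequence
    with position j of the other.
    """
    n, m = len(scores), len(scores[0])
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = smooth_max(
                [
                    0.0,                                     # start a new local alignment
                    h[i - 1][j - 1] + scores[i - 1][j - 1],  # match / mismatch
                    h[i - 1][j] - gap,                       # gap in one sequence
                    h[i][j - 1] - gap,                       # gap in the other
                ],
                temperature,
            )
    # Smooth maximum over all end positions of the local alignment.
    cells = [h[i][j] for i in range(1, n + 1) for j in range(1, m + 1)]
    return smooth_max(cells, temperature)


example_scores = [[1.0, -1.0], [-1.0, 1.0]]
print(smooth_sw(example_scores, gap=1.0, temperature=1e-3))  # close to the hard score
print(smooth_sw(example_scores, gap=1.0, temperature=1.0))   # larger, smoother score
```

At low temperature the result is close to the ordinary Smith-Waterman score; at higher temperature every alignment path contributes, which is what makes the score differentiable with respect to the score matrix.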

examples/SSW_examples: example usage and speed tests for SSW (Smooth Smith-Waterman):

  • ssw_examples.ipynb: Tutorial on how to use our smooth Smith-Waterman implementation.

  • nw_speedtest.ipynb: Runtime comparison of our vectorized code to a naive implementation and to the "deepBLAST" implementation given in [Morton et al. 2020], for both local (sw) and global (nw) alignment algorithms.

  • sw_in_tensorflow_pytorch.ipynb: Implementations in TensorFlow and PyTorch.

examples/SMURF_examples: example usage of SMURF on RNA and protein and creation of associated figures:

  • run_smurf.py: Code that executes SMURF and MLM-GREMLIN. Selected hyperparameters are described in the comments. Outputs a single file containing the contact prediction AUCs for the families.

  • run_smurf_w_contacts_aln.py: Code that executes SMURF and MLM-GREMLIN. Selected hyperparameters are described in the comments. Outputs one file per family that contains the predicted contacts, contact prediction AUC, and learned alignment (for SMURF only).

  • ablation_test.py: Code that executes ablations described in [1].

  • data_description.txt: Description of data used in [1].

  • make_SMURF_figures_protein.ipynb and make_SMURF_figures_RNA.ipynb: Code to generate figures in [1].

examples/LAM_AF_examples: example usage of LAM with AF and creation of associated figures:

  • CASP_examples: folder containing MMSeqs2 generated alignments and true structures for the examples analyzed in [1].

  • learned_alns: folder containing alignments learned by LAM for each family; generated via save_and_view_msas.ipynb.

  • af_msa_backprop.ipynb: Backprop through AlphaFold to "learn" an MSA from a collection of unaligned sequences that maximizes the confidence metric (and hopefully returns a more accurate structure). Illustrates trajectories.

  • af_opt_and_save_v2.py: Same pipeline as above, run with the choices of random seed, learning rate, cooling, and E-value restrictions used in [1].

  • make_AF_figures.ipynb: Plots figures that show the pLDDT and RMSD of the best points in each trajectory for each family.

  • make_pairwise_aln_figures.ipynb: Code to generate all figures in [1] relating to the pairwise alignments of learned MSAs (for both SMURF and AF).

  • save_and_view_msas.ipynb: Code to construct MSAs from the saved parameters of an LAM. Results stored in learned_alns folder. Also displays the alignment shown in [1].

  • sensitivity_of_AF_preds.ipynb: Code that evaluates the sensitivity of AF predictions to (a) the random mask and (b) the removal of sets of sequences. Generates related figures in [1].
