Generative strategies for modeling, design and analysis of silk protein sequences for enhanced mechanical properties

SilkomeGPT: Generative strategies for modeling, design and analysis of spider silk protein sequences for enhanced mechanical properties

Wei Lu, David L. Kaplan, Markus J. Buehler

Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA

Contact email: [email protected]

Abstract: Spider silks are remarkable materials characterized by superb mechanical properties such as strength, extensibility and lightweightedness. Yet, to date, limited models are available to fully explore sequence-property relationships for analysis and design. Here a custom generative large-language model is proposed to enable design of novel spider silk protein sequences to meet complex combinations of target mechanical properties. The model, pretrained on a large set of protein sequences, is fine-tuned on ~1,000 major ampullate spidroin (MaSp) sequences for which associated fiber-level mechanical properties exist, to yield an end-to-end forward and inverse generative approach that is applied in a multi-agent strategy. Performance is assessed through: (1) a novelty analysis and protein type classification for generated spidroin sequences through Basic Local Alignment Search Tool (BLAST) searches, (2) property evaluation and comparison with similar sequences, (3) comparison of molecular structures, and (4) a detailed sequence motif analysis. This work generates silk sequences with property combinations that do not exist in nature, and develops a deep understanding of the mechanistic roles of sequence patterns in achieving overarching key mechanical properties (elastic modulus, strength, toughness, failure strain). The model provides an efficient approach to expand the silkome dataset, facilitating further sequence-structure analyses of silks, and establishes a foundation for synthetic silk design and optimization. This work not only shows the capacity of generative transformer models to design complex materials, but also illustrates an effective use of agentic modeling for self-improving design solutions.

Keywords: biomaterials; deep learning; generative autoregressive transformer; hierarchical; multiscale modeling; spider silk; spidroin

We report a generative modeling, design, and analysis technique applied to create novel spider silk protein sequences for enhanced mechanical properties. The model allows us to create property combinations that do not exist in nature and develop a deep understanding of the mechanistic roles of sequence patterns in achieving overarching key mechanical properties (elastic modulus, strength, toughness, failure strain).

Installation

conda create -n SilkomeGPT python=3.8
conda activate SilkomeGPT
git clone https://github.com/lamm-mit/SilkomeGPT/
cd SilkomeGPT

Then, install SilkomeGPT:

pip install -e .

Start Jupyter Lab:

jupyter-lab --no-browser

Or, alternatively, start Jupyter Notebook:

jupyter notebook

Schematic of the model implemented

(a) Summary of the autoregressive decoder-only transformer model architecture, with rotary positional embedding.
(b) Overview of the modeling approach. A query including the task and relevant context is used to create the responses, with interactions among all elements considered via graph-forming attention mechanisms. Tasks included in this work are the pretraining “sequence” task, as well as a “calculate” and a “generate” task. The “calculate” task predicts a set of mechanical properties based on a given sequence, and the “generate” task yields a silk sequence with associated mechanical properties (for details, see the paper).
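
To make the rotary positional embedding mentioned in (a) more concrete, here is a minimal, generic sketch in plain Python. It is not taken from the SilkomeGPT code: the function names, the interleaved pairing of vector components, and the base of 10000 are assumptions chosen for illustration. The point it shows is that after rotating a query and a key by position-dependent angles, their dot product depends only on the relative offset between the two positions.

```python
import math

def rotary_embed(vec, position, base=10000.0):
    """Rotate consecutive component pairs of `vec` by position-dependent angles.

    Illustrative only: pairing scheme and `base` are assumptions,
    not the settings used in SilkomeGPT.
    """
    if len(vec) % 2 != 0:
        raise ValueError("vector length must be even")
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each pair gets its own frequency; lower pairs rotate faster.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query/key vectors, made up for this example.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Both pairs of positions are 3 apart, so the attention scores should agree.
score_a = dot(rotary_embed(q, 5), rotary_embed(k, 2))
score_b = dot(rotary_embed(q, 13), rotary_embed(k, 10))
print(abs(score_a - score_b) < 1e-9)
```

Because each rotation preserves vector length, the embedding changes only the relative orientation of queries and keys, which is how position information enters the attention scores.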

Sample Notebooks

A sample notebook (SilkomeGPT_inference.ipynb) is provided for spidroin sequence prediction and generation.

The trained model can be downloaded from Hugging Face 🤗: https://huggingface.co/lamm-mit/SilkomeGPT
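
As a rough sketch of how the downloaded checkpoint might be loaded outside the notebook, the helper below uses the Hugging Face transformers library. Only the repository name comes from the link above. That the checkpoint loads through the generic `AutoTokenizer` and `AutoModelForCausalLM` classes, and that it needs `trust_remote_code=True`, are assumptions; the function name `load_silkomegpt` is hypothetical. SilkomeGPT_inference.ipynb and the model card remain the authoritative reference.

```python
MODEL_ID = "lamm-mit/SilkomeGPT"  # repository name from the Hugging Face link

def load_silkomegpt(model_id=MODEL_ID, device="cpu"):
    """Download (or read from cache) the tokenizer and model.

    Assumption: the checkpoint is compatible with the Auto* classes and may
    ship custom code, hence trust_remote_code=True.
    """
    # Imported here so that merely defining the helper does not require
    # transformers to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    model.to(device)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `tokenizer, model = load_silkomegpt()` triggers the download on first use. The prompt format for the “calculate” and “generate” tasks is not reproduced here; see the sample notebook for it.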

Sample results (for details, see the paper)

Self-consistency Analysis

Protein sequence property comparison

Motif analysis
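
For readers who want a quick first look at motif content in a generated sequence, a simple counting sketch follows. It is not the analysis pipeline used in the paper. The motif families (poly-alanine runs, GGX, GPGXX) are commonly discussed for spidroins, but the exact regular expressions, the threshold of four alanines, the function name `count_motifs`, and the toy sequence are all assumptions made for illustration.

```python
import re

# Illustrative motif definitions (assumptions, not the paper's definitions).
MOTIF_PATTERNS = {
    "poly-A": r"A{4,}",  # runs of four or more alanines
    "GGX": r"GG[^G]",    # two glycines followed by any non-glycine residue
    "GPGXX": r"GPG..",   # GPG followed by any two residues
}

def count_motifs(sequence, patterns=MOTIF_PATTERNS):
    """Count non-overlapping occurrences of each motif in a sequence."""
    seq = sequence.upper()
    return {name: len(re.findall(pattern, seq)) for name, pattern in patterns.items()}

# Toy sequence, made up for this example (not from the silkome dataset).
toy_sequence = "GGYGPGQQGPGAAAAAAGGAGQGGY"
print(count_motifs(toy_sequence))
# → {'poly-A': 1, 'GGX': 3, 'GPGXX': 2}
```

Each motif is counted independently, so a residue can contribute to more than one motif family. Counts like these can be compared between generated sequences and their closest natural matches.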

Molecular structure comparison

Citation

To cite this work:

@article{WeiKaplanBuehler_2023,
    title   = {Generative Modeling, Design, and Analysis of Spider Silk Protein Sequences for Enhanced Mechanical Properties},
    author  = {W. Lu and D. L. Kaplan and M. J. Buehler},
    journal = {Adv. Funct. Mater.},
    year    = {2023},
    volume  = {},
    pages   = {},
    url     = {https://doi.org/10.1002/adfm.202311324}
}
