shunsunsun / generative-ai

This project is forked from dd1github/generative-ai

Public repository for Generative AI Design and Exploration of Nucleoside Analogs

License: MIT License

Python 100.00%

Generative-AI

Public repository for the paper "Generative AI Design and Exploration of Nucleoside Analogs".

Abstract:

Nucleosides are fundamental building blocks of DNA and RNA in all life forms and viruses. In addition, natural nucleosides and their analogs are critical in prebiotic chemistry, innate immunity, signaling, antiviral drug discovery and artificial synthesis of DNA / RNA sequences. Combined with the fact that quantitative structure activity relationships (QSAR) have been widely performed to understand their antiviral activity, nucleoside analogs could be used to benchmark generative molecular design. Here, we undertake the first generative design of nucleoside analogs using an approach that we refer to as the Conditional Randomized Transformer (CRT). We also benchmark our model against five previously published molecular generative models. We demonstrate that AI-generated molecules include nucleoside analogs that are of significance in a wide range of areas including prebiotic chemistry, antiviral drug discovery and synthesis of oligonucleotides. Our results show that CRT explores distinct molecular spaces and chemical transformations, some of which are similar to those undertaken by nature and medicinal chemists. Finally, we demonstrate the potential application of the CRT model in the generative design of molecules conditioned on Remdesivir and Molnupiravir as well as other nucleoside analogs with in vitro activity against SARS-CoV-2.

Key modules used to build CRT:

CRT was built and run with the following versions:

  • Python 3.7
  • PyTorch 1.9.0
  • RDKit 2020.09.10
  • tqdm 4.62.0
  • pandas 1.2.5
  • NumPy 1.20.3
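
As a quick sanity check before training, the versions listed above can be compared with what is installed. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_versions` are hypothetical, and it assumes each package exposes a `__version__` attribute under the import names shown.

```python
import importlib

# Versions as listed above; keys are the import names of each package.
EXPECTED = {
    "torch": "1.9.0",
    "rdkit": "2020.09.10",
    "tqdm": "4.62.0",
    "pandas": "1.2.5",
    "numpy": "1.20.3",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_versions(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Accept an exact match or a local build tag such as "1.9.0+cu111".
        matches = found is not None and (
            found == wanted or found.startswith(wanted + "+")
        )
        if not matches:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `check_versions(EXPECTED)` returns an empty dictionary when every package matches; otherwise each entry shows the expected version next to the one found (or `None` if the package is not installed).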

Code description:

The code was largely developed and run on a single NVIDIA GeForce RTX 3050 GPU with CUDA Toolkit version 11.1.1.

The code consists of three main files for model training, fine-tuning and molecule generation:

  • train_main.py
  • fine_tune.py
  • generation.py

Generating molecules consists of three steps:

  1. Training a model to learn the rules of chemical grammar by running train_main.py. A pre-trained model is provided. Training a model from scratch takes approximately 10-12 hours on a single GPU.
  2. Fine-tuning a model on a specific dataset of interest by running fine_tune.py. A pre-trained model is provided.
  3. Generating molecules with generation.py.

To run train_main.py, fine_tune.py, and generation.py, the data must be downloaded, unzipped, and placed in the appropriate folders, and the folder paths in the respective code files must be updated.
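
The three steps can be chained from a small driver script. The sketch below is illustrative only: it assumes the three scripts sit in the project root, take no command-line arguments (their paths are edited inside the files, as described above), and that the `data` and `models` folders live directly under that root. The function names and the `root` argument are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Script names are taken from the steps above; the order matters.
STEPS = ["train_main.py", "fine_tune.py", "generation.py"]


def missing_folders(root):
    """Return the required sub-folders (data, models) that are absent under root."""
    root = Path(root)
    return [name for name in ("data", "models") if not (root / name).is_dir()]


def build_commands(root, steps=STEPS, python=sys.executable):
    """Build one command line per step, suitable for subprocess.run."""
    root = Path(root)
    return [[python, str(root / script)] for script in steps]


def run_pipeline(root):
    """Run training, fine-tuning and generation in order, stopping on the first failure."""
    missing = missing_folders(root)
    if missing:
        raise FileNotFoundError(f"Missing folders under {root}: {missing}")
    for command in build_commands(root):
        # check=True raises CalledProcessError if a step exits with a non-zero status.
        subprocess.run(command, check=True, cwd=str(root))
```

Because pre-trained models are provided, the first one or two entries of `STEPS` can be dropped by passing a shorter list to `build_commands` when only generation is needed.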

Data and Pre-trained models:

The data and pre-trained models are available at the following links: https://drive.google.com/file/d/15DtJP4WFBpeYDuGV6SzE2yuJpGv-oH_E/view?usp=sharing and https://drive.google.com/file/d/16zbDPK2Wg1Pol4S6iKNYoOYX7Z_Do6FV/view?usp=sharing

The data should be downloaded, unzipped and placed in the …/data folder. The underlying training and validation SMILES data was taken from: https://github.com/ETHmodlab/virtual_libraries. The training and validation Morgan fingerprint property data was computed from these SMILES using RDKit.
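
For readers unfamiliar with Morgan fingerprints, the sketch below shows one common way to compute them from a SMILES string with RDKit. It is not the authors' preprocessing code: the function name `morgan_on_bits` is hypothetical, and the radius (2) and bit-vector length (2048) are assumed defaults, since the settings used for the provided data are not stated here.

```python
def morgan_on_bits(smiles, radius=2, n_bits=2048):
    """Return the sorted indices of set bits in a Morgan fingerprint.

    Returns None when RDKit cannot parse the SMILES string.
    radius and n_bits are illustrative defaults, not values from the paper.
    """
    # Imported here so this module can be loaded even where RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for SMILES it cannot parse
        return None
    fingerprint = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return sorted(fingerprint.GetOnBits())
```

For example, `morgan_on_bits("CCO")` returns the bit positions set for ethanol, which can then be expanded into a fixed-length 0/1 vector to serve as a conditioning property.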

The pre-trained models should be downloaded, unzipped and placed in the …/models folder.

Baseline models:

The baseline models can be found at the following links:

Cite:

If you find our research to be useful, please cite our publication:

Dablain, Damien, Geoffrey Siwo, and Nitesh Chawla. "Generative AI Design and Exploration of Nucleoside Analogs." (2021).
