
daoyuanli2816 / molecule-generator


Variational Autoencoder (VAE)-based molecular SMILES string generator

License: MIT License

Python 100.00%
ai4science chemistry generative-model molecular-simulation smiles-strings tokenizer vae-pytorch


VAE-Based SMILES String Generator

This project uses a Variational Autoencoder (VAE) to generate molecular SMILES strings. The molecules are built from two kinds of repeat units: CHOH/CH2OH (referred to as A) and CH/CH2/CH3 (referred to as B). The generated molecules are saturated and contain no rings.
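
To make the A/B notation concrete, here is a minimal sketch that turns a linear sequence of repeat units into a SMILES string. The mapping `UNIT_TO_SMILES` and the function `units_to_smiles` are hypothetical names introduced for illustration. The sketch also assumes unbranched chains, which is a simplification, since CH units allow branching.

```python
# Hypothetical mapping from repeat-unit labels to SMILES fragments.
# A = hydroxyl-bearing carbon (CHOH / CH2OH), B = plain carbon (CH / CH2 / CH3).
# Hydrogens are implicit in SMILES, so they are not written out.
UNIT_TO_SMILES = {"A": "C(O)", "B": "C"}


def units_to_smiles(units: str) -> str:
    """Convert a linear unit sequence such as 'ABBA' into a SMILES string."""
    if not units:
        raise ValueError("sequence must contain at least one unit")
    unknown = set(units) - set(UNIT_TO_SMILES)
    if unknown:
        raise ValueError(f"unknown repeat units: {sorted(unknown)}")
    smiles = "".join(UNIT_TO_SMILES[unit] for unit in units)
    # Write a trailing "(O)" branch as a plain "O" for readability.
    if smiles.endswith("(O)"):
        smiles = smiles[:-3] + "O"
    return smiles


print(units_to_smiles("ABBA"))  # butane-1,4-diol written as C(O)CCCO
```

Because every bond is a single bond and no ring-closure digits are emitted, strings built this way are saturated and acyclic by construction.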


Project Structure

The project consists of the following Python scripts:

  • VAE.py: Defines the VAE model and includes functions for training and testing the model.
  • generate.py: Generates new SMILES strings by perturbing the latent space of the trained VAE.
  • interpolate.py: Generates interpolated SMILES strings between two given SMILES strings using the latent space of the trained VAE.
  • synthetic_dataset.py: Generates a synthetic dataset of SMILES strings based on specified constraints.

Features

  • synthetic_dataset.py generates over 100,000 synthetic SMILES strings.
  • Only A and B repeat units are included.
  • No molecule contains more than six consecutive A repeat units.
  • All molecules in the dataset are saturated and contain no rings.
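
The constraints listed above can be checked with a few lines of plain Python. This is a sketch under stated assumptions, not the logic used in synthetic_dataset.py: it treats a molecule as a linear string of `A`/`B` labels, and it tests saturation and ring-freeness only at the string level, assuming SMILES made of `C`, `O`, and parentheses with implicit hydrogens. All function names are hypothetical.

```python
import re

MAX_CONSECUTIVE_A = 6  # limit stated in the feature list


def longest_run(units: str, label: str = "A") -> int:
    """Return the length of the longest unbroken run of `label` in `units`."""
    longest = current = 0
    for unit in units:
        current = current + 1 if unit == label else 0
        longest = max(longest, current)
    return longest


def satisfies_constraints(units: str) -> bool:
    """True if the sequence uses only A/B and has at most six consecutive A units."""
    only_ab = bool(units) and set(units) <= {"A", "B"}
    return only_ab and longest_run(units) <= MAX_CONSECUTIVE_A


def looks_saturated_and_acyclic(smiles: str) -> bool:
    """String-level heuristic: only C, O, and branch parentheses are allowed,
    so bond symbols (=, #), ring-closure digits, and aromatic atoms are rejected."""
    return re.fullmatch(r"[CO()]+", smiles) is not None


print(satisfies_constraints("AAAAAAB"), satisfies_constraints("AAAAAAA"))
print(looks_saturated_and_acyclic("C(O)CCCO"), looks_saturated_and_acyclic("C1CCCCC1"))
```

For a chemistry-aware check, RDKit can parse each string and report ring and bond information; the heuristic here only avoids that dependency for a quick filter.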

Installation

  1. Clone the repository:

    git clone https://github.com/DaoyuanLi2816/Molecule-Generator.git
    cd Molecule-Generator
  2. Install the required dependencies:

    pip install -r requirements.txt
  3. Ensure you have RDKit installed, since it is required for molecular operations. Installation instructions are available in the RDKit documentation.

Usage

Generating Synthetic Dataset

To generate a synthetic dataset of SMILES strings, run synthetic_dataset.py:

python synthetic_dataset.py

This will create a CSV file named molecules.csv containing the generated SMILES strings.
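
Once the file exists, the strings can be loaded back with the standard library. The sketch below assumes one SMILES string in the first column of each row and an optional header cell named `smiles`; the actual layout of molecules.csv is not documented here, so adjust accordingly. `read_smiles` is a hypothetical helper name.

```python
import csv
import io


def read_smiles(lines):
    """Read SMILES strings from an iterable of CSV lines (for example an open file)."""
    rows = [row for row in csv.reader(lines) if row and row[0].strip()]
    # Assumption: an optional header row whose first cell is "smiles".
    if rows and rows[0][0].strip().lower() == "smiles":
        rows = rows[1:]
    return [row[0].strip() for row in rows]


# Demonstration on in-memory text standing in for molecules.csv.
sample = io.StringIO("smiles\nC(O)CCCO\nCCO\n")
print(read_smiles(sample))
```

To read the real file, open it with `open("molecules.csv", newline="")` and pass the file object to `read_smiles`.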

Training the VAE Model

To train the VAE model, run VAE.py:

python VAE.py

This will train the VAE model on the synthetic dataset generated in the previous step and save the trained model as beta_tc_vae_model.pth.
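
Before a VAE can be trained on SMILES strings, each string has to be converted into a fixed-length sequence of integer token IDs. The sketch below shows one common approach, a character-level tokenizer with padding. It is an assumption about the preprocessing, not the tokenizer in VAE.py, and the special tokens and function names are hypothetical. Character-level splitting is adequate here because every atom symbol in these molecules (C, O) is a single character.

```python
PAD, START, END = "<pad>", "<s>", "</s>"


def build_vocab(smiles_list):
    """Map the special tokens and every character seen in the data to an integer ID."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {token: index for index, token in enumerate([PAD, START, END] + chars)}


def encode(smiles, vocab, max_len):
    """Wrap the string in start/end tokens and right-pad it to `max_len` IDs."""
    tokens = [START] + list(smiles) + [END]
    if len(tokens) > max_len:
        raise ValueError("SMILES string is longer than max_len allows")
    ids = [vocab[token] for token in tokens]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def decode(ids, vocab):
    """Invert `encode`: stop at the end token and drop start/pad tokens."""
    inverse = {index: token for token, index in vocab.items()}
    chars = []
    for index in ids:
        token = inverse[index]
        if token == END:
            break
        if token not in (PAD, START):
            chars.append(token)
    return "".join(chars)


vocab = build_vocab(["CCO", "C(O)C"])
ids = encode("CCO", vocab, max_len=8)
print(ids, decode(ids, vocab))
```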

Generating New SMILES Strings

To generate new SMILES strings using the trained VAE model, run generate.py:

python generate.py

This will output new SMILES strings generated by perturbing the latent space of the trained VAE.
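
Perturbing the latent space usually means adding small random noise to a latent vector before decoding it. The following sketch illustrates that idea with isotropic Gaussian noise on a plain list of floats. The function name `perturb`, the `sigma` parameter, and the noise distribution are assumptions, and generate.py may do this differently; decoding the perturbed vector back to SMILES requires the trained model and is not shown.

```python
import random


def perturb(z, sigma=0.1, rng=None):
    """Return a copy of latent vector `z` with Gaussian noise of scale `sigma` added."""
    if rng is None:
        rng = random.Random()
    return [value + sigma * rng.gauss(0.0, 1.0) for value in z]


# A seeded generator makes the perturbation reproducible.
z = [0.5, -1.0, 0.25]
print(perturb(z, sigma=0.1, rng=random.Random(0)))
```

Smaller values of `sigma` keep samples close to the original molecule, while larger values explore more of the latent space at the cost of more invalid decodings.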

Interpolating Between Two SMILES Strings

To generate interpolated SMILES strings between two given SMILES strings, run interpolate.py:

python interpolate.py

This will output SMILES strings that are interpolations between the two input SMILES strings in the latent space of the trained VAE.
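
Latent-space interpolation can be sketched as follows: encode both molecules, walk along the straight line between their latent vectors, and decode each intermediate point. The snippet covers only the middle step, using linear interpolation on plain lists. The function name `interpolate` and the choice of linear (rather than spherical) interpolation are assumptions, not a description of interpolate.py.

```python
def interpolate(z_start, z_end, steps=5):
    """Return `steps` latent vectors evenly spaced from `z_start` to `z_end`, inclusive."""
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


for point in interpolate([0.0, 2.0], [1.0, 4.0], steps=3):
    print(point)
```

Each vector in the returned path would then be passed through the trained decoder to obtain the corresponding SMILES string.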

Contributing

If you would like to contribute to this project, please open an issue or submit a pull request. We welcome contributions from the community.

License

This project is licensed under the MIT License. See the LICENSE file for details.
