laixindev / selfies

This project forked from aspuru-guzik-group/selfies

Robust representation of semantically constrained graphs, in particular for molecules in chemistry

License: Apache License 2.0

Python 100.00%

SELFIES

SELFIES (SELF-referencIng Embedded Strings) is a general-purpose, sequence-based, robust representation of semantically constrained graphs. It is based on a Chomsky type-2 grammar, augmented with two self-referencing functions. A main objective is to use SELFIES as direct input to machine learning models, in particular generative models, so that the generated graphs have high semantic and syntactic validity.

See the paper at arXiv: https://arxiv.org/abs/1905.13741

The code presented here is a concrete application of SELFIES in chemistry, for the robust representation of molecules.

SELFIES strings decode to valid molecules in >99.99% of cases, even when the strings are entirely random.
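
To give some intuition for why a state-dependent derivation keeps random strings valid, here is a minimal toy sketch. It is not the SELFIES grammar or the library's implementation: the valence table, the toy alphabet and the functions `toy_decode`, `to_smiles` and `is_valid` are all hypothetical, and the sketch handles only linear chains (no branches or rings). The idea it illustrates is that each symbol is interpreted relative to a state, here the number of free bonds left on the previous atom, so a requested bond is downgraded or the derivation stops rather than producing an over-bonded atom.

```python
import random

# Toy valence table (maximum bonds per atom); an assumption for illustration only.
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}
BOND_ORDER = {"": 1, "=": 2, "#": 3}
BOND_CHAR = {0: "", 1: "", 2: "=", 3: "#"}

# Hypothetical toy alphabet: every bond prefix combined with every atom, e.g. "[=C]".
TOY_ALPHABET = ["[%s%s]" % (b, a) for b in ("", "=", "#") for a in VALENCE]


def toy_decode(symbols):
    """Derive a linear chain as a list of (bond_order_to_previous_atom, atom)."""
    chain = []
    free = None  # free bonds on the previous atom; None means no atom placed yet
    for sym in symbols:
        body = sym[1:-1]              # strip the surrounding brackets
        atom = body[-1]               # single-letter atoms only in this toy
        requested = BOND_ORDER[body[:-1]]
        if free is None:              # first atom: any bond prefix is ignored
            chain.append((0, atom))
            free = VALENCE[atom]
            continue
        if free == 0:                 # nothing left to bond to: stop the derivation
            break
        # Downgrade the bond so that neither atom exceeds its valence.
        order = min(requested, free, VALENCE[atom])
        chain.append((order, atom))
        free = VALENCE[atom] - order
    return chain


def to_smiles(chain):
    """Write the chain as a linear SMILES-like string."""
    return "".join(BOND_CHAR[order] + atom for order, atom in chain)


def is_valid(chain):
    """Check that no atom uses more bonds than its valence allows."""
    for i, (order, atom) in enumerate(chain):
        next_order = chain[i + 1][0] if i + 1 < len(chain) else 0
        if order + next_order > VALENCE[atom]:
            return False
    return True


print(to_smiles(toy_decode(["[C]", "[#O]", "[F]"])))  # prints C=O

rng = random.Random(0)
random_symbols = [rng.choice(TOY_ALPHABET) for _ in range(10)]
print(is_valid(toy_decode(random_symbols)))  # prints True
```

In the first call, the triple bond requested by `[#O]` is reduced to a double bond because oxygen has only two bonds available, and `[F]` is dropped because no free bond remains.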

Installation

You can install SELFIES via

pip install selfies

Examples

Several examples can be seen in examples/selfies_example.py. Here is a simple encoding and decoding:

from selfies import encoder, decoder, selfies_alphabet

# non-fullerene acceptor for organic solar cells
test_molecule1 = 'CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1'
selfies1 = encoder(test_molecule1)  # SMILES -> SELFIES
smiles1 = decoder(selfies1)         # SELFIES -> SMILES

print('test_molecule1: ' + test_molecule1 + '\n')
print('selfies1: ' + selfies1 + '\n')
print('smiles1: ' + smiles1 + '\n')
# plain string comparison of the input and the round-trip SMILES
print('equal: ' + str(test_molecule1 == smiles1) + '\n\n\n')

my_alphabet = selfies_alphabet()  # contains all semantically valid SELFIES symbols

  • An example of SELFIES in a generative model can be seen in the directory 'VariationalAutoEncoder_with_SELFIES'. There, SMILES datasets are automatically translated into SELFIES and used to train a variational autoencoder (VAE).
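
The symbol list returned by selfies_alphabet() can also be used to build random SELFIES strings, for instance to try out the validity claim yourself. The following is a minimal sketch, not part of the library: the helper `random_selfies` is hypothetical, and it assumes that a SELFIES string is simply the concatenation of bracketed symbols without separators. The library calls are left as comments because they assume selfies 0.2.4, the version that introduced selfies_alphabet().

```python
import random


def random_selfies(alphabet, n_symbols, seed=None):
    """Concatenate n_symbols symbols drawn uniformly at random from alphabet.

    Hypothetical helper for illustration; `alphabet` is any list of
    bracketed symbol strings such as the one from selfies_alphabet().
    """
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(n_symbols))


# Usage with the library (assumes selfies 0.2.4):
#
#   from selfies import decoder, selfies_alphabet
#   candidate = random_selfies(selfies_alphabet(), n_symbols=20, seed=0)
#   print(decoder(candidate))
```

Passing a seed makes the sampled string reproducible, which helps when reporting a string that fails to decode.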

Python version

fully tested with Python 3.7.1 on

supported:

  • Python 3.7.2, 3.7.1, 3.6.8, 3.6.7, 2.7.15

Versions

0.2.4 (01.10.2019):

   - added:
       -> function selfies_alphabet(), which returns a list of 29 SELFIES symbols whose arbitrary combinations produce >99.99% valid molecules
   - bug fixes:
       -> fixed a bug that occurred when three rings start at one node and two of them form a double ring
       -> enabled rings with sizes of up to 8000 SELFIES symbols
       -> bug fix for the conversion of tiny rings to RDKit syntax when spanning multiple branches
   - we thank Kevin Ryan (LeanAndMean@github), Theophile Gaudin and Andrew Brereton for suggestions and bug reports

0.2.2 (19.09.2019):

   - added:
       -> enabled [C@],[C@H],[C@@],[C@@H],[H] for use in a semantically constrained way
   - we thank Andrew Brereton for suggestions and bug reports

0.2.1 (02.09.2019):

   - added:
       -> Decoder: added optional argument to restrict nitrogen to 3 bonds; use decoder(...,N_restrict=False) to allow more bonds;
                   default: N_restrict=True
       -> Decoder: added optional argument to make the ring function bi-local (i.e. it confirms the bond number at the target);
                   use decoder(...,bilocal_ring_function=False) to disable the bi-local ring function; default:
                   bilocal_ring_function=True. The bi-local ring function allows a validity of >99.99% for random molecules
       -> Decoder: made double-bond ring syntax conform to RDKit
       -> Decoder: added states X5 and X6 for having five and six bonds free
   - bug fixes:
       -> Decoder+Encoder: allow explicit brackets for organic atoms, for instance [I]
       -> Encoder: fixed an issue with explicit single/double bonds in non-canonical SMILES input
       -> Decoder: bug fix for [Branch*] in state X1
   - we thank Benjamin Sanchez-Lengeling, Theophile Gaudin and Zhenpeng Yao for suggestions and bug reports

0.1.1 (04.06.2019):

   - initial release 

Contributors

mariokrenn6240, florianhase
