chao1224 / geom3d

Geom3D: Geometric Modeling on 3D Structures, NeurIPS 2023

Home Page: https://openreview.net/forum?id=ygXSNrIU1p

License: MIT License

Python 96.34% Cython 0.44% Shell 1.21% Jupyter Notebook 2.00%
3d 3d-structures biology chemistry crystals drugs equivariance geometry group invariance

geom3d's Introduction

Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials

Authors: Shengchao Liu, Weitao Du, Yanjing Li, Zhuoxinran Li, Zhiling Zheng, Chenru Duan, Zhiming Ma, Omar Yaghi, Anima Anandkumar, Christian Borgs, Jennifer Chayes, Hongyu Guo, Jian Tang

[ArXiv]

This is Geom3D, a platform for geometric modeling on 3D structures.

Environment

Conda

Set up Anaconda

wget https://repo.continuum.io/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh -b
export PATH=$PWD/anaconda3/bin:$PATH

Packages

Start with some basic packages.

conda create -n Geom3D python=3.7
conda activate Geom3D
conda install -y -c rdkit rdkit
conda install -y numpy networkx scikit-learn
conda install -y -c conda-forge -c pytorch pytorch=1.9.1
conda install -y -c pyg -c conda-forge pyg=2.0.2
pip install ogb==1.2.1

pip install sympy

pip install ase

pip install lie_learn # for TFN and SE3-Trans

pip install packaging # for SEGNN
pip3 install e3nn # for SEGNN

pip install transformers # for smiles
pip install selfies # for selfies

pip install atom3d # for Atom3D
pip install cffi # for Atom3D
pip install biopython # for Atom3D

pip install cython # for pyximport 

conda install -y -c conda-forge py-xgboost-cpu # for XGB
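
After installation, it can help to confirm that the packages import and that the pinned ones match the versions above. The following is a minimal sketch, not part of Geom3D: the function name check_environment and the report format are hypothetical, and the import names (for example torch_geometric for pyg and Bio for biopython) are the usual module names for these packages rather than something stated in this README.

```python
import importlib

# Versions pinned by the install commands above (keys are import names).
PINNED = {"torch": "1.9.1", "torch_geometric": "2.0.2", "ogb": "1.2.1"}

# Packages installed without a version pin (import names, assumed).
UNPINNED = [
    "rdkit", "numpy", "networkx", "sklearn", "sympy", "ase", "lie_learn",
    "packaging", "e3nn", "transformers", "selfies", "atom3d", "cffi",
    "Bio", "Cython", "xgboost",
]


def check_environment(pinned, unpinned=()):
    """Hypothetical helper: map each import name to a short status string."""
    report = {}
    for name in list(pinned) + list(unpinned):
        try:
            module = importlib.import_module(name)
        except Exception as err:  # missing package or broken native library
            report[name] = "MISSING ({})".format(type(err).__name__)
            continue
        found = str(getattr(module, "__version__", "unknown"))
        wanted = pinned.get(name)
        if wanted is not None and not found.startswith(wanted):
            # Installed, but not the version the README pins.
            report[name] = "{} (README pins {})".format(found, wanted)
        else:
            report[name] = found
    return report


if __name__ == "__main__":
    for name, status in check_environment(PINNED, UNPINNED).items():
        print("{:<16} {}".format(name, status))
```

A line such as "torch 1.13.0 (README pins 1.9.1)" flags a version mismatch without failing, which is useful when deliberately trying a newer PyTorch build.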

Datasets

We cover four types of datasets:

  • Small Molecules
    • QM9
    • MD17
    • rMD17
    • COLL
  • Proteins
    • EC
    • FOLD
  • Small Molecules and Proteins
    • LBA
    • LEP
  • Materials
    • MatBench
    • QMOF

For dataset acquisition:

  • We provide a set of raw and processed datasets on HuggingFace. You can download the data by running python download_data.py under ./data.
  • Please refer to the data folder for more details.

Overview of Models

Representation Models

Geom3D includes the following representation models:

We also include the following 7 1D models and 11 2D models (specifically for small molecules):

Note that no pretraining is considered at this stage. For geometric pretraining models, please check the following section.

Geometric Pretraining

We include the following 14 geometric pretraining methods:

Scripts

The python scripts can be found in examples_3D. We list the bash scripts (and hyperparameters) in scripts. For example, the bash script for SchNet on QM9 is:

cd examples_3D

export model_3d=SchNet
export dataset=QM9
export task_list=(mu alpha homo lumo gap r2 zpve u0 u298 h298 g298 cv)

export lr_list=(5e-4)
export lr_scheduler_list=(CosineAnnealingLR)
export split=customized_01
export seed=42
export emb_dim_list=(128 300)
export batch_size_list=(128)

export epochs=1000

for task in "${task_list[@]}"; do
for lr in "${lr_list[@]}"; do
for lr_scheduler in "${lr_scheduler_list[@]}"; do
for emb_dim in "${emb_dim_list[@]}"; do
for batch_size in "${batch_size_list[@]}"; do

    export output_model_dir=output/random/"$model_3d"/"$dataset"/"$task"_"$split"_"$seed"/"$lr"_"$lr_scheduler"_"$emb_dim"_"$batch_size"_"$epochs"
    export output_file="$output_model_dir"/result.out
    mkdir -p "$output_model_dir"

    python finetune_QM9.py \
    --model_3d="$model_3d" --dataset="$dataset" --epochs="$epochs" \
    --task="$task" \
    --split="$split" --seed="$seed" \
    --batch_size="$batch_size" \
    --emb_dim="$emb_dim" \
    --lr="$lr" --lr_scheduler="$lr_scheduler" --no_eval_train --print_every_epoch=1 --num_workers=8 \
    --output_model_dir="$output_model_dir" \
    > "$output_file"
    
done
done
done
done
done
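
The nested loops above amount to a Cartesian product over the hyperparameter lists: 12 tasks, 1 learning rate, 1 scheduler, 2 embedding sizes, and 1 batch size, giving 24 runs. As a rough sketch of the same sweep in Python (the helper iter_runs and the GRID dictionary are hypothetical names, not part of the repository), the following builds each output directory and command line without executing anything. The flags are copied from the bash script; the shell redirection to result.out is omitted.

```python
import itertools

MODEL_3D = "SchNet"
DATASET = "QM9"
SPLIT = "customized_01"
SEED = 42
EPOCHS = 1000

# Same lists as in the bash script; key order matches the loop nesting.
GRID = {
    "task": ["mu", "alpha", "homo", "lumo", "gap", "r2",
             "zpve", "u0", "u298", "h298", "g298", "cv"],
    "lr": ["5e-4"],
    "lr_scheduler": ["CosineAnnealingLR"],
    "emb_dim": [128, 300],
    "batch_size": [128],
}


def iter_runs(grid=GRID):
    """Hypothetical helper: yield (output_dir, command) for every combination."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        out_dir = "/".join([
            "output", "random", MODEL_3D, DATASET,
            "{}_{}_{}".format(cfg["task"], SPLIT, SEED),
            "{lr}_{lr_scheduler}_{emb_dim}_{batch_size}_{epochs}".format(
                epochs=EPOCHS, **cfg),
        ])
        command = [
            "python", "finetune_QM9.py",
            "--model_3d=" + MODEL_3D, "--dataset=" + DATASET,
            "--epochs={}".format(EPOCHS),
            "--task=" + cfg["task"],
            "--split=" + SPLIT, "--seed={}".format(SEED),
            "--batch_size={}".format(cfg["batch_size"]),
            "--emb_dim={}".format(cfg["emb_dim"]),
            "--lr=" + cfg["lr"], "--lr_scheduler=" + cfg["lr_scheduler"],
            "--no_eval_train", "--print_every_epoch=1", "--num_workers=8",
            "--output_model_dir=" + out_dir,
        ]
        yield out_dir, command


if __name__ == "__main__":
    # Dry run: print each command instead of launching it.
    for out_dir, command in iter_runs():
        print(" ".join(command))
```

Printing the commands first is a cheap way to check the size of the sweep and the directory layout before committing GPU time.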

Currently, only the bash scripts for QM9 are available. We will release the complete version soon, together with a notebook demo. Please stay tuned.

Checkpoints

Checkpoints for all the pretraining and downstream tasks will be released soon.

Cite us

Feel free to cite this work if you find it useful to you!

@article{liu2023symmetry,
    title={Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials},
    author={Liu, Shengchao and Du, Weitao and Li, Yanjing and Li, Zhuoxinran and Zheng, Zhiling and Duan, Chenru and Ma, Zhiming and Yaghi, Omar and Anandkumar, Anima and Borgs, Christian and others},
    journal={arXiv preprint arXiv:2306.09375},
    year={2023}
}

geom3d's Issues

Reproducing MatBench results

Hello, and thank you for making this work available! If I want to reproduce your results on the MatBench dataset (e.g., the log10 G result for DimeNet++), can you point me to what script I should run?

Compared to something like the QM9 script, I assume I'd want to use the dataloader functionality here and dataset here. Are there any other important changes from the QM9 script to keep in mind?

torch version conflict with A100/3090 architectures?

Hi, thanks for your impressive work.

I ran into some issues when I attempted to deploy this repo on my cluster (with 8 A100 GPUs).
First, I requested more than 100 GB of memory but still got a segmentation fault, which looks like a memory leak. Does this mean there is a memory-sensitive operation in the code?

Also, when I checked the package versions, I found that torch is set to 1.9, but my torch is 1.13 with CUDA 11.6.
Does this code only work with torch 1.9? I ask because current-architecture GPUs, such as the A100/A6000/3090, only support torch>1.11.

Looking forward to your response!
Best Regards

About the code of MoleculeJAE

Hi Authors,

Congrats on the great work!

While reading your team's paper on MoleculeJAE, I saw that the MoleculeJAE code is said to be available on this website. I would like to reproduce this great work, but I could not find the code. Could you tell me where I can find it?

Thanks!
