License: MIT License

Equiformer - Pytorch (wip)

Implementation of the Equiformer, an SE(3)/E(3)-equivariant attention network that reaches new SOTA results and has been adopted by EquiFold (Prescient Design) for protein folding

The design seems to build on SE3 Transformers, with the dot product attention replaced by MLP attention and non-linear message passing from GATv2. It also uses a depthwise tensor product for a bit more efficiency. If you think I am mistaken, please feel free to email me.
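
To make the distinction concrete, below is a minimal, self-contained sketch (not code from this repository) contrasting dot-product attention scores with GATv2-style MLP attention scores on invariant (type-0) features. All names and dimensions are made up for illustration.

import torch
from torch import nn

# toy invariant (type-0) features for a single head
dim = 32
x_i = torch.randn(1, 128, dim)            # "query" node features
x_j = torch.randn(1, 128, dim)            # "key" / neighbor node features

w_q = nn.Linear(dim, dim, bias = False)
w_k = nn.Linear(dim, dim, bias = False)

q, k = w_q(x_i), w_k(x_j)

# dot product attention (SE3 Transformer style): score_ij = <q_i, k_j> / sqrt(d)
dot_scores = torch.einsum('b i d, b j d -> b i j', q, k) / dim ** 0.5

# GATv2-style MLP attention: score_ij = a^T LeakyReLU(q_i + k_j)
# the non-linearity sits between the pairwise sum and the final projection,
# so the score is no longer bilinear in the pair of features
to_score = nn.Sequential(nn.LeakyReLU(), nn.Linear(dim, 1, bias = False))
mlp_scores = to_score(q.unsqueeze(2) + k.unsqueeze(1)).squeeze(-1)    # (1, 128, 128)

attn = mlp_scores.softmax(dim = -1)       # either set of scores would be normalized like this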

Update: There has been a new development that makes scaling the number of degrees for SE(3) equivariant networks dramatically better! The Spherical Channels paper (Zitnick et al.) first noted that by aligning the representations along the z-axis (or the y-axis, by another convention), the spherical harmonics become sparse, which removes the m_f dimension from the computation. A follow-up paper from Passaro et al. noted that the Clebsch-Gordan matrix also becomes sparse, allowing m_i and l_f to be removed as well. They also made the connection that, once the representations are aligned to one axis, the problem reduces from SO(3) to SO(2). Equiformer v2 (official repository) leverages this in a transformer-like framework to reach a new SOTA.
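
As a rough illustration of the alignment trick (and of the rotation-matrix todo item further below), here is a minimal sketch, independent of this repository's internals, of deriving a rotation that carries an edge vector r_ij onto the y-axis. In the aligned frame only one spherical harmonic component per degree is non-zero (m = 0 in the usual z-axis convention), which is what makes the basis sparse.

import torch

def rotation_to_y_axis(r_ij: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # returns a 3 x 3 rotation matrix R such that R @ r_ij points along (0, 1, 0)
    # illustrative only - does not handle the degenerate case where r_ij
    # already points along (0, -1, 0), where the rotation axis is ill-defined
    v = r_ij / (r_ij.norm() + eps)        # unit direction of the edge
    y = torch.tensor([0., 1., 0.])

    a = torch.linalg.cross(v, y)          # rotation axis (unnormalized)
    s = a.norm()                          # sine of the rotation angle
    c = torch.dot(v, y)                   # cosine of the rotation angle

    K = torch.zeros(3, 3)                 # skew-symmetric cross-product matrix of a
    K[0, 1], K[0, 2] = -a[2],  a[1]
    K[1, 0], K[1, 2] =  a[2], -a[0]
    K[2, 0], K[2, 1] = -a[1],  a[0]

    # Rodrigues' formula for the rotation aligning v with y
    return torch.eye(3) + K + K @ K * (1. - c) / (s ** 2 + eps)

r_ij = torch.randn(3)
R = rotation_to_y_axis(r_ij)
print(R @ r_ij)                           # approximately (0, |r_ij|, 0)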

Will definitely be putting more work / exploration into this. For now, I've incorporated the tricks from the first two papers for Equiformer v1, save for the complete conversion into SO(2).

Install

$ pip install equiformer-pytorch

Usage

import torch
from equiformer_pytorch import Equiformer

model = Equiformer(
    num_tokens = 24,
    dim = (4, 4, 2),               # dimensions per type, ascending, length must match number of degrees (num_degrees)
    dim_head = (4, 4, 4),          # dimension per attention head
    heads = (2, 2, 2),             # number of attention heads
    num_linear_attn_heads = 0,     # number of global linear attention heads, can see all the neighbors
    num_degrees = 3,               # number of degrees
    depth = 4,                     # depth of equivariant transformer
    attend_self = True,            # attending to self or not
    reduce_dim_out = True,         # whether to reduce out to dimension of 1, say for predicting new coordinates for type 1 features
    l2_dist_attention = False      # set to False to try out MLP attention
).cuda()

feats = torch.randint(0, 24, (1, 128)).cuda()
coors = torch.randn(1, 128, 3).cuda()
mask  = torch.ones(1, 128).bool().cuda()

out = model(feats, coors, mask) # (1, 128)

out.type0 # invariant type 0    - (1, 128)
out.type1 # equivariant type 1  - (1, 128, 3)

This repository also includes a way to decouple memory usage from depth using reversible networks. In other words, if you increase depth, the memory cost stays roughly constant at that of a single Equiformer block (attention and feedforward).

import torch
from equiformer_pytorch import Equiformer

model = Equiformer(
    num_tokens = 24,
    dim = (4, 4, 2),
    dim_head = (4, 4, 4),
    heads = (2, 2, 2),
    num_degrees = 3,
    depth = 48,          # depth of 48 - just to show that it runs - in reality, it seems quite unstable at higher depths, so the architecture still needs more work
    reversible = True,   # just set this to True to use https://arxiv.org/abs/1707.04585
).cuda()

feats = torch.randint(0, 24, (1, 128)).cuda()
coors = torch.randn(1, 128, 3).cuda()
mask  = torch.ones(1, 128).bool().cuda()

out = model(feats, coors, mask)

out.type0.sum().backward()
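
To sanity check that claim, one can compare peak CUDA memory for a full forward and backward pass at two different depths. The harness below is only illustrative: parameter memory still grows with depth, but activation memory should stay roughly flat, and exact numbers depend on your GPU.

import torch
from equiformer_pytorch import Equiformer

def peak_memory_mb(depth):
    # build a small reversible model at the given depth and measure peak CUDA memory
    torch.cuda.reset_peak_memory_stats()

    model = Equiformer(
        num_tokens = 24,
        dim = (4, 4, 2),
        dim_head = (4, 4, 4),
        heads = (2, 2, 2),
        num_degrees = 3,
        depth = depth,
        reversible = True
    ).cuda()

    feats = torch.randint(0, 24, (1, 128)).cuda()
    coors = torch.randn(1, 128, 3).cuda()
    mask  = torch.ones(1, 128).bool().cuda()

    model(feats, coors, mask).type0.sum().backward()
    return torch.cuda.max_memory_allocated() / 1024 ** 2

print(peak_memory_mb(4), peak_memory_mb(48))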

Edges

with edges, e.g. atomic bonds

import torch
from equiformer_pytorch import Equiformer

model = Equiformer(
    num_tokens = 28,
    dim = 64,
    num_edge_tokens = 4,       # number of edge types, say 4 bond types
    edge_dim = 16,             # dimension of edge embedding
    depth = 2,
    input_degrees = 1,
    num_degrees = 3,
    reduce_dim_out = True
)

atoms = torch.randint(0, 28, (2, 32))
bonds = torch.randint(0, 4, (2, 32, 32))
coors = torch.randn(2, 32, 3)
mask  = torch.ones(2, 32).bool()

out = model(atoms, coors, mask, edges = bonds)

out.type0 # (2, 32)
out.type1 # (2, 32, 3)

with adjacency matrix

import torch
from equiformer_pytorch import Equiformer

model = Equiformer(
    dim = 32,
    heads = 8,
    depth = 1,
    dim_head = 64,
    num_degrees = 2,
    valid_radius = 10,
    reduce_dim_out = True,
    attend_sparse_neighbors = True,  # this must be set to True, in which case it will assert that you pass in the adjacency matrix
    num_neighbors = 0,               # if set to 0, it will only consider the connected neighbors as defined by the adjacency matrix; if greater than 0, it will additionally fetch the closest points up to this many, excluding the ones already specified by the adjacency matrix
    num_adj_degrees_embed = 2,       # this will derive the second-degree connections and embed them accordingly
    max_sparse_neighbors = 8         # caps the number of neighbors sampled from within your sparse set of neighbors, as defined by the adjacency matrix
)

feats = torch.randn(1, 128, 32)
coors = torch.randn(1, 128, 3)
mask  = torch.ones(1, 128).bool()

# placeholder adjacency matrix
# naively assuming the sequence is one long chain (128, 128)

i = torch.arange(128)
adj_mat = (i[:, None] <= (i[None, :] + 1)) & (i[:, None] >= (i[None, :] - 1))

out = model(feats, coors, mask, adj_mat = adj_mat)

out.type0 # (1, 128)
out.type1 # (1, 128, 3)
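
If your edges live in the COO edge-index format used by libraries such as PyTorch Geometric, one option is to scatter them into the dense boolean adjacency matrix this model expects. A minimal sketch, with a made-up edge list:

import torch

num_nodes = 128
edge_index = torch.randint(0, num_nodes, (2, 256))     # placeholder COO edges (row 0: source, row 1: target)

adj_mat = torch.zeros(num_nodes, num_nodes, dtype = torch.bool)
adj_mat[edge_index[0], edge_index[1]] = True
adj_mat = adj_mat | adj_mat.t()                        # symmetrize if your graph is undirected

# adj_mat can then be passed as model(feats, coors, mask, adj_mat = adj_mat)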

Appreciation

  • StabilityAI for the generous sponsorship, as well as my other sponsors out there

Testing

Tests for equivariance, etc.

$ python setup.py test

Example

First install sidechainnet

$ pip install sidechainnet

Then run the protein backbone denoising task

$ python denoise.py

Todo

  • move xi and xj separate project and sum logic into Conv class

  • move self interacting key / value production into Conv, fix no pooling in conv with self interaction

  • go with a naive way to split up contribution from input degrees for DTP

  • for dot product attention in higher types, try euclidean distance

  • consider an all-neighbors attention layer just for type0, using linear attention

  • integrate the new finding from spherical channels paper, followed up by so(3) -> so(2) paper, which reduces the computation from O(L^6) -> O(L^3)!

    • add rotation matrix -> ZYZ euler angles (see the sketch after this list)
    • function for deriving rotation matrix for r_ij -> (0, 1, 0)
    • prepare get_basis to return D for rotating representations to (0, 1, 0) to greatly simplify spherical harmonics
    • add tests for batch rotating vectors to align with another - handle edge cases (0, 0, 0)?
    • redo get_basis to only calculate spherical harmonics Y for (0, 1, 0) and cache
    • do the further optimization to remove clebsch gordan (since m_i only depends on m_o), as noted in eSCN paper
    • validate one can train at higher degrees
    • figure out the whole linear bijection argument in appendix of eSCN and why parameterized lf can be removed
    • figure out why training NaNs with float32
    • refactor into full so3 -> so2 linear layer, as proposed in eSCN paper
    • add equiformer v2, and start looking into equivariant protein backbone diffusion again
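
For the rotation matrix -> ZYZ euler angles item above, here is a minimal sketch of the standard extraction for R = Rz(alpha) @ Ry(beta) @ Rz(gamma). It is not taken from this repository and ignores the gimbal-lock case sin(beta) ≈ 0.

import torch

def rot_to_zyz(R: torch.Tensor):
    # R is a (..., 3, 3) rotation matrix
    # returns euler angles (alpha, beta, gamma) with R = Rz(alpha) @ Ry(beta) @ Rz(gamma)
    beta  = torch.acos(R[..., 2, 2].clamp(-1., 1.))
    alpha = torch.atan2(R[..., 1, 2],  R[..., 0, 2])
    gamma = torch.atan2(R[..., 2, 1], -R[..., 2, 0])
    return alpha, beta, gamma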

Citations

@article{Liao2022EquiformerEG,
    title   = {Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs},
    author  = {Yi Liao and Tess E. Smidt},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2206.11990}
}
@article {Lee2022.10.07.511322,
    author  = {Lee, Jae Hyeon and Yadollahpour, Payman and Watkins, Andrew and Frey, Nathan C. and Leaver-Fay, Andrew and Ra, Stephen and Cho, Kyunghyun and Gligorijevic, Vladimir and Regev, Aviv and Bonneau, Richard},
    title   = {EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation},
    elocation-id = {2022.10.07.511322},
    year    = {2022},
    doi     = {10.1101/2022.10.07.511322},
    publisher = {Cold Spring Harbor Laboratory},
    URL     = {https://www.biorxiv.org/content/early/2022/10/08/2022.10.07.511322},
    eprint  = {https://www.biorxiv.org/content/early/2022/10/08/2022.10.07.511322.full.pdf},
    journal = {bioRxiv}
}
@article{Shazeer2019FastTD,
    title   = {Fast Transformer Decoding: One Write-Head is All You Need},
    author  = {Noam M. Shazeer},
    journal = {ArXiv},
    year    = {2019},
    volume  = {abs/1911.02150}
}
@misc{ding2021cogview,
    title   = {CogView: Mastering Text-to-Image Generation via Transformers},
    author  = {Ming Ding and Zhuoyi Yang and Wenyi Hong and Wendi Zheng and Chang Zhou and Da Yin and Junyang Lin and Xu Zou and Zhou Shao and Hongxia Yang and Jie Tang},
    year    = {2021},
    eprint  = {2105.13290},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
@inproceedings{Kim2020TheLC,
    title   = {The Lipschitz Constant of Self-Attention},
    author  = {Hyunjik Kim and George Papamakarios and Andriy Mnih},
    booktitle = {International Conference on Machine Learning},
    year    = {2020}
}
@article{Zitnick2022SphericalCF,
    title   = {Spherical Channels for Modeling Atomic Interactions},
    author  = {C. Lawrence Zitnick and Abhishek Das and Adeesh Kolluru and Janice Lan and Muhammed Shuaibi and Anuroop Sriram and Zachary W. Ulissi and Brandon C. Wood},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2206.14331}
}
@article{Passaro2023ReducingSC,
    title   = {Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs},
    author  = {Saro Passaro and C. Lawrence Zitnick},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2302.03655}
}
@inproceedings{Gomez2017TheRR,
    title   = {The Reversible Residual Network: Backpropagation Without Storing Activations},
    author  = {Aidan N. Gomez and Mengye Ren and Raquel Urtasun and Roger Baker Grosse},
    booktitle = {NIPS},
    year    = {2017}
}
@article{Bondarenko2023QuantizableTR,
    title   = {Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing},
    author  = {Yelysei Bondarenko and Markus Nagel and Tijmen Blankevoort},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2306.12929},
    url     = {https://api.semanticscholar.org/CorpusID:259224568}
}
@inproceedings{Arora2023ZoologyMA,
    title   = {Zoology: Measuring and Improving Recall in Efficient Language Models},
    author  = {Simran Arora and Sabri Eyuboglu and Aman Timalsina and Isys Johnson and Michael Poli and James Zou and Atri Rudra and Christopher Ré},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:266149332}
}

Contributors

anton-bushuiev, hypnopump, javierbq, lucidrains

Issues

Why does this implementation take up much more memory than EquiFold?

Hi, great work! I find this library very memory-intensive. How can I reduce the memory usage? Do you have any plans to reduce GPU memory consumption?
When I run the following code, I get a CUDA out of memory error on an RTX 4090, which has 24 GB of GPU memory.

import torch

from equiformer_pytorch.equiformer_pytorch import Equiformer

model = Equiformer(
    dim=128,
    depth=2,
    l2_dist_attention=True,
    reduce_dim_out=True
).to('cuda')

feats = torch.randn(2, 64, 128, device='cuda')
coors = torch.randn(2, 64, 3, device='cuda')
mask = torch.ones(2, 64, dtype=torch.bool, device='cuda')

out = model(feats, coors, mask)

and error is

...
File "/xxx/equiformer-pytorch/equiformer_pytorch/equiformer_pytorch.py", line 426, in forward
    out = out + R[..., i] * B[..., i]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.66 GiB (GPU 0; 23.99 GiB total capacity; 21.60 GiB already allocated; 0 bytes free; 22.44 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Looking forward to your reply, thanks!

Specifying edge index / adjacency

Hi, @lucidrains! Thank you for your impressive work!

Equiformer, as seen in the architecture figure, takes a graph as input. I am, however, not sure how to specify custom edges (an adjacency matrix) to Equiformer.forward. I see that it is already implemented, but I am still confused about how to properly use edges and neighbor_mask. I would be very grateful if you could add a simple example with a minimalistic graph.

Question About Graph Sparsity/Edges

Hey there,
So I'm currently trying to use the Equiformer for a protein/ligand prediction task. I've inherited the dataset from an earlier model I've made, and it is in the PyG batching format of one large graph made of subgraphs. I've got the adjacency matrices of shape [1, N, N] as shown in the example and am passing them to the model, but the loss is directly related to the size of the batch being fed, which means something is up with graphs talking to one another. I'm using the settings num_neighbors=0 and max_sparse_neighbors=32. My understanding from the documentation is that this means I'll only be selecting 32 neighbors for each node, and those neighbors must come from the adjacency matrix. Is that understanding correct? Or, if there are some small graphs with more than 32 nodes, am I going to start cross-contaminating? Additionally, if I wanted to convert the dataset to the suggested batching system (with masks), would I simply set num_neighbors to 32 and call it a day?

Adapting Equiformer for Efficient Handling of Graphs with Sparse Matrix in COO format?

Thank you for implementing Equiformer. Decoupling the model from the original OC20 tasks significantly broadens its applicability.

In my project, I'm utilizing PyTorch Geometric, where batches are merged into a large graph, and edges are represented in COO format of a sparse matrix. I noticed that you have implemented support for sparse matrices. However, the dimension remains NxN, which becomes impractically large with an increased number of nodes.

Could you consider adapting Equiformer to not always operate on an NxN basis, but instead focus on a subset of nodes at a time, with edges defined in COO format?

Thanks.

Dependency Conflict

Hey there,
Was attempting to do a pip install and run of Equiformer in Colab and ran into a dependency issue: tensorflow-probability 0.22.0 was the native install, but it requires typing-extensions 4.6.0 or lower, so I had to update tensorflow-probability prior to the pip install of equiformer-pytorch. Not a big deal, but I wanted to post it.
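
For reference, the workaround described above amounts to something like the following (exact versions depend on your environment):

$ pip install -U tensorflow-probability
$ pip install equiformer-pytorch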

Error when using equiformer-pytorch

Hi, I installed equiformer-pytorch, but there is an error when I use it in my project:

File "/root/autodl-tmp/project/DeepPROTACs/model.py", line 7, in
from equiformer_pytorch import Equiformer
File "/root/miniconda3/envs/DeepPROTACs/lib/python3.7/site-packages/equiformer_pytorch/init.py", line 1, in
from equiformer_pytorch.equiformer_pytorch import Equiformer
File "/root/miniconda3/envs/DeepPROTACs/lib/python3.7/site-packages/equiformer_pytorch/equiformer_pytorch.py", line 121, in
class Linear(nn.Module):
File "/root/miniconda3/envs/DeepPROTACs/lib/python3.7/site-packages/beartype/_decor/main.py", line 193, in beartype
return beartype_args_mandatory(obj, conf)
File "/root/miniconda3/envs/DeepPROTACs/lib/python3.7/site-packages/beartype/_decor/_core.py", line 123, in beartype_args_mandatory
return _beartype_type(cls=obj, conf=conf) # type: ignore[return-value]
File "/root/miniconda3/envs/DeepPROTACs/lib/python3.7/site-packages/beartype/_decor/_core.py", line 313, in _beartype_type
f'{repr(cls)} not decoratable by @beartype, as '
beartype.roar.BeartypeDecorWrappeeException: <class 'equiformer_pytorch.equiformer_pytorch.Linear'> not decoratable by @beartype, as non-dataclasses (i.e., types not decorated by @dataclasses.dataclass) currently unsupported by @beartype.

I thought it was a problem with the version of the beartype package, but when I tried various versions the problem still persisted. I don't know how to fix it.
Thanks!
