astrazeneca / chemicalx

A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)

Home Page: https://chemicalx.readthedocs.io

License: Apache License 2.0

Python 99.95% Shell 0.05%

deep-learning pytorch deep-chemistry graph-neural-network drug drug-pair polypharmacy drug-discovery pharma drug-interaction

chemicalx's People

cthoyt sailfish009 bgyori kkaris hzcheney sebastiandro quocdat32461997 ameerhamza111 andrejlamov laplacekorea andriy-nikolov yuwvandy gavedwards terragord7 deepsystemspharmacology bbaillif shiyx409 ngo010 manikant92 vivek1240 fathimhiri amgfernandes stjordanis dimmu bbyun28 kevkle manu87ds minh-caolecong chsreenivas-india vishalbelsare masinde70 bukenyalukman abhishekdutt-blr ardeat sukritipaul05 markussagen antiverso takshan nvmoyar gachet bkbonde gcostaneto vinayasathyanarayana loesterfranco dualword eaglepython mew233 limberc whitehat32 tianyuzelin abdulk084 joskid terrisgo desmondteoch abchotujnn1 minoh0201 keangzhu gaoshan2006 aliushn rnaimehaom garrymorris cpusummer-wdn nds-vu yingzi0 cshukai amanzadi xiangyan93 shouhengtuo sameensanobarsubiya gg-big-org mengfei25 ajunlonglive avi-pomicell nickdst sarwanpasha brennadeadyw jjaniel apcial1 bishnukuet kuankuan0222 cgeof mughetto qiaoyu-tan vhwebdesigner healthwhale avamos2 lucag2 fcas

chemicalx's Issues

Setup the basic repo documentation.

Add the relevant basic parts of a readthedocs documentation.
Setup hooks.

Typo within documentation

Typo within documentation of data_processing.rst.

Add the DrugComb and DrugCombDB data cleaning

Add the MatchMaker model

Dear @andrejlamov,

Please read the paper first. It is here.
There is also code-release with the paper here.
After that read the contributing guidelines.
If there is an existing open-source version of the model please take a look.
ChemicalX is built on top of PyTorch 1.10 and torchdrug.
A similar model is DeepSynergy, which uses a FeedForward neural network to generate drug representations. Take a look at the layer definition here.
The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
There is already a model class under ./chemicalx/models/.
Context features, drug level features, and labels are all FloatTensors.
Look at the examples and tests under ./examples/ and ./tests/.
Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
Add typing to the initialization and forward pass.
Non-data-dependent hyperparameters should have default values.
Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Incorporate various dataset splits

I also have one suggestion for future updates of this library perhaps. The current dataloaders, if I'm not mistaken, are not considering the different dataset split strategies. Recent works have highlighted the importance of evaluations on different dataset splits, e.g. split pairs, split drugs, split cell lines (for synergy), etc. It would be great to see this library also having such features.
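As a rough illustration of the "split drugs" strategy mentioned above, the sketch below holds out a random subset of drugs and keeps for training only the triples in which neither drug is held out. The function name `split_by_drug` and the `(drug_1, drug_2, context, label)` tuple layout are assumptions made for this example; they are not part of the ChemicalX API.

```python
import random
from typing import List, Set, Tuple

# Hypothetical triple layout: (drug_1, drug_2, context, label).
Triple = Tuple[str, str, str, float]


def split_by_drug(
    triples: List[Triple], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Triple], List[Triple], Set[str]]:
    """Split triples so that held-out drugs never appear in the training set."""
    # Collect every drug identifier that occurs on either side of a pair.
    drugs = sorted({drug for triple in triples for drug in triple[:2]})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    n_test = max(1, int(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    # Training triples contain no held-out drug at all.
    train = [t for t in triples if t[0] not in test_drugs and t[1] not in test_drugs]
    # Test triples contain at least one held-out drug.
    test = [t for t in triples if t[0] in test_drugs or t[1] in test_drugs]
    return train, test, test_drugs


triples = [
    ("A", "B", "ctx_1", 1.0),
    ("A", "C", "ctx_1", 0.0),
    ("B", "D", "ctx_2", 1.0),
    ("C", "E", "ctx_2", 0.0),
    ("D", "E", "ctx_1", 1.0),
]
train, test, test_drugs = split_by_drug(triples)
print(len(train), len(test), sorted(test_drugs))
```

In this sketch the test set also contains pairs where only one drug is held out; a stricter variant would require both drugs to be unseen. Splitting by pair or by cell line would follow the same pattern with a different grouping key.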

Add the DeepDDI model

Dear @hzcheney,

Please read the paper first. It is here.
After that read the contributing guidelines.
If there is an existing open source version of the model please take a look.
ChemicalX is built on top of PyTorch 1.10 and torchdrug.
A similar model is which uses to generate drug representations. Take a look at the layer definition here.
The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
There is already a model class under ./chemicalx/models/.
Context features, drug level features and labels are all FloatTensors.
Look at the examples and tests under ./examples/ and ./tests/.
Add auxiliary layers as you see fit - please document these, add tests and add these layers to the main readme.md if needed.
Add typing to the initialisation and forward pass.
Non-data-dependent hyperparameters should have default values.
Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Add the SSI-DDI Model

Please read the paper first. It is here.
There is also code-release with the paper here.
After that read the contributing guidelines.
If there is an existing open-source version of the model please take a look.
ChemicalX is built on top of PyTorch 1.10 and torchdrug.
A similar model is EPGCNDS, which uses GraphConvolutions to generate drug representations. Take a look at the layer definition here. You should use the layers from torchdrug, not the models.
The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
There is already a model class under ./chemicalx/models/.
Context features, drug level features, and labels are all FloatTensors.
Look at the examples and tests under ./examples/ and ./tests/.
Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
Add typing to the initialization and forward pass.
Non-data-dependent hyperparameters should have default values.
Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

BatchGenerator move back to name space

BatchGenerator was removed from the central data namespace. It should be restored there.

Add the MHCADDI model

Dear @sebastiandro,

Please read the paper first. It is here.
After that read the contributing guidelines.
If there is an existing open source version of the model please take a look.
ChemicalX is built on top of PyTorch 1.10 and torchdrug.
A similar model is which uses to generate drug representations. Take a look at the layer definition here.
The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
There is already a model class under ./chemicalx/models/.
Context features, drug level features and labels are all FloatTensors.
Look at the examples and tests under ./examples/ and ./tests/.
Add auxiliary layers as you see fit - please document these, add tests and add these layers to the main readme.md if needed.
Add typing to the initialisation and forward pass.
Non-data-dependent hyperparameters should have default values.
Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

DeepDDS Softmax bug

The softmax layer in DeepDDS squashes the outputs so that the predicted probabilities converge to one. It should be replaced with a sigmoid.
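One way such behaviour can arise, assuming the model ends in a single output logit (an assumption; the issue does not state the layer shape), is that a softmax normalises over that lone value and therefore always returns 1, whereas a sigmoid maps the logit into the interval (0, 1). The plain-Python sketch below shows the difference without depending on PyTorch; `softmax` and `sigmoid` here are illustrative helpers, not the library's layers.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logit: float) -> float:
    """Logistic function for a single logit."""
    return 1.0 / (1.0 + math.exp(-logit))


# With one logit per sample, softmax has nothing to normalise against.
for logit in (-3.0, 0.0, 3.0):
    print(logit, softmax([logit]), sigmoid(logit))
```

Whatever the logit, the single-element softmax carries no information, which is why a sigmoid is the usual choice for a one-logit binary classifier.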

Data resolver tests

Add tests for the data resolvers.

ChemicalX Tutorials are not working

The tutorial code here is not working. Can you fix it? Also, do you have any Jupyter notebooks available or coming soon?

Tutorials with code that doesn't work
https://chemicalx.readthedocs.io/en/latest/notes/tutorial.html

How are the methods implemented outside of the domain they are designed for?

For example, DeepSynergy and MatchMaker require cell line information, yet both are evaluated on the DrugBankDDI and TWOSIDES benchmarks, where no cell line information is available at all (and TWOSIDES is even at the patient level), with DS reaching the highest performance among all methods. What, then, was the "cell line gene expression" component of both methods replaced with in those tasks? Also, does this ensure a fair comparison?

Code quality assurance

Add black, flake8, isort, etc. code quality assurance. This will be very important if/when other people start contributing.

Add the GCNBMP model

Please read the paper first. It is here.
There is also code-release with the paper here.
After that read the contributing guidelines.
If there is an existing open-source version of the model please take a look.
ChemicalX is built on top of PyTorch 1.10 and torchdrug.
The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
There is already a model class under ./chemicalx/models/.
Context features, drug level features, and labels are all FloatTensors.
Look at the examples and tests under ./examples/ and ./tests/.
Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
Add typing to the initialization and forward pass.
Non-data-dependent hyperparameters should have default values.
Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Fix readme

Inconsistent labels in DrugCombDB

Hi, it looks like a few drug pairs within the same context are labelled inconsistently. For example, drug 59691338 and drug 11960529 in EFM192B.
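A check of this kind can be sketched by grouping triples on the unordered drug pair plus context and reporting every group that carries more than one distinct label. The tuple layout and the function name `find_inconsistent_labels` are assumptions for this example, and the identifiers below are toy values rather than entries from DrugCombDB.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical triple layout: (drug_1, drug_2, context, label).
Triple = Tuple[str, str, str, float]
Key = Tuple[Tuple[str, str], str]


def find_inconsistent_labels(triples: Iterable[Triple]) -> Dict[Key, Set[float]]:
    """Return (pair, context) keys that are associated with more than one label."""
    labels_by_key: Dict[Key, Set[float]] = defaultdict(set)
    for drug_1, drug_2, context, label in triples:
        # Sort the pair so that (a, b) and (b, a) land in the same group.
        first, second = sorted((drug_1, drug_2))
        labels_by_key[((first, second), context)].add(label)
    return {key: labels for key, labels in labels_by_key.items() if len(labels) > 1}


toy_triples = [
    ("drug_a", "drug_b", "ctx_1", 1.0),
    ("drug_b", "drug_a", "ctx_1", 0.0),  # same pair and context, different label
    ("drug_a", "drug_c", "ctx_1", 1.0),
    ("drug_a", "drug_b", "ctx_2", 1.0),
]
conflicts = find_inconsistent_labels(toy_triples)
print(conflicts)
```

How to resolve a conflict (drop the pair, majority vote, or average the underlying synergy scores) is a separate design decision that this sketch does not make.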

Contribution guide - forking and ssh needed

In the contribution guide, we have

2. Clone this repo
git clone https://github.com/AstraZeneca/chemicalx

However, people won't generally have permission to push branches to AstraZeneca/chemicalx directly. Instead, I think we should describe that users should first fork the repository, then push to a new branch on their fork, and then open a pull request against AstraZeneca/chemicalx. Another issue is that for the push to work, I think SSH instead of HTTPS should be used when cloning. Shall I update the guidelines according to the above?

KeyError: 'node_feature' from running deepdds_example.py

Please see the following log.

  File "deepdds.py", line 27, in <module>
    main()
  File "deepdds.py", line 14, in main
    results = pipeline(
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/chemicalx/pipeline.py", line 155, in pipeline
    prediction = model(*model.unpack(batch))
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/chemicalx/models/deepdds.py", line 176, in forward
    features_left = self._forward_molecules(molecules_left)
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/chemicalx/models/deepdds.py", line 158, in _forward_molecules
    features = self.drug_conv(molecules, molecules.data_dict["node_feature"])["node_feature"]
KeyError: 'node_feature'

Add the DDI Dataset from DeepDDI

The triples are here Dataset.

Fix Citation

Citation in the body and the CIF file.

Add the DeepDDS model

Please read the paper first. It is here.
There is also code-release with the paper here.
After that read the contributing guidelines.
If there is an existing open-source version of the model please take a look.
ChemicalX is built on top of PyTorch 1.10 and torchdrug.
The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
There is already a model class under ./chemicalx/models/.
Context features, drug level features, and labels are all FloatTensors.
Look at the examples and tests under ./examples/ and ./tests/.
Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
Add typing to the initialization and forward pass.
Non-data-dependent hyperparameters should have default values.
Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Create data loader

Load drug side features.
Load triples.
Document.
Test generation.

Add the TWOSIDES dataset with molecular features

The dataset has to be mapped to some common identifier system.
We need the molecular structures.
We also need negative samples.

Add the MR-GNN model

Dear @cthoyt,

Please read the paper first. It is here.
There is also code-release with the paper here.
After that read the contributing guidelines.
If there is an existing open-source version of the model please take a look.
ChemicalX is built on top of PyTorch 1.10 and torchdrug.
The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
There is already a model class under ./chemicalx/models/.
Context features, drug level features, and labels are all FloatTensors.
Look at the examples and tests under ./examples/ and ./tests/.
Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
Add typing to the initialization and forward pass.
Non-data-dependent hyperparameters should have default values.
Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Pipeline - Unexpected Labels

I tried the pipeline code on your GitHub page, and it resulted in the following error:

from chemicalx import pipeline
from chemicalx.models import DeepSynergy
from chemicalx.data import DrugCombDB

model = DeepSynergy(context_channels=112, drug_channels=256)
dataset = DrugCombDB()

results = pipeline(
    dataset=dataset,
    model=model,
    # Data arguments
    batch_size=5120,
    context_features=True,
    drug_features=True,
    drug_molecules=False,
    labels=True,
    # Training arguments
    epochs=100,
)

# Outputs information about the AUC-ROC, etc. to the console.
results.summarize()

# Save the model, losses, evaluation, and other metadata.
results.save("~/test_results/")



TypeError                                 Traceback (most recent call last)

<ipython-input-9-b0c19c387d65> in <module>()
     16     labels=True,
     17     # Training arguments
---> 18     epochs=100,
     19 )
     20

TypeError: pipeline() got an unexpected keyword argument 'labels'
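An error like this generally means the installed function's signature differs from the one the example was written for. A generic way to check, sketched below with a stand-in function, is to compare the keyword arguments you intend to pass against `inspect.signature`. Note that `pipeline_stub` and `unsupported_kwargs` are hypothetical names for this illustration, and the stub's parameters do not reproduce the real `chemicalx.pipeline` signature.

```python
import inspect
from typing import Any, Callable, Dict, List


def unsupported_kwargs(func: Callable[..., Any], kwargs: Dict[str, Any]) -> List[str]:
    """List the keyword names in `kwargs` that `func` would reject."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means any keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    accepted = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return [name for name in kwargs if name not in accepted]


def pipeline_stub(*, dataset, model, batch_size=512, epochs=10):
    """Hypothetical stand-in; NOT the real chemicalx.pipeline signature."""
    return None


requested = {"dataset": None, "model": None, "batch_size": 5120, "labels": True, "epochs": 100}
print(unsupported_kwargs(pipeline_stub, requested))
```

Against an installed copy of the library, the same helper could be pointed at the imported `pipeline` function to see which of the documented arguments the installed version still accepts.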

Make dataset loaders on-the-fly

I think it would be better to have the dataset download and processing happen client-side, then use pystow to store the results in a reliable place. This would also allow the TWOSIDES and DrugBank datasets, which require random negative sampling, to be used with multiple random seeds, e.g. to investigate the robustness of results. Further, it would allow for a more idiomatic dataset loader that's extensible to new datasets.
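The seeded negative sampling mentioned above could look roughly like the following sketch, which draws unobserved drug pairs with a dedicated `random.Random(seed)` so that each seed yields a reproducible but different negative set. The function `sample_negatives` and the toy identifiers are assumptions for illustration; this is not how ChemicalX or the original datasets generate negatives.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def sample_negatives(positives: Set[Pair], drugs: List[str], n: int, seed: int) -> List[Pair]:
    """Sample `n` drug pairs that do not occur among the positives."""
    rng = random.Random(seed)
    # Normalise pair order so (a, b) and (b, a) count as the same pair.
    known = {tuple(sorted(pair)) for pair in positives}
    ordered = sorted(set(drugs))
    # Enumerating all pairs is quadratic; acceptable for a sketch, not for huge drug sets.
    candidates = [
        (left, right)
        for i, left in enumerate(ordered)
        for right in ordered[i + 1:]
        if (left, right) not in known
    ]
    if n > len(candidates):
        raise ValueError("not enough unobserved pairs to sample from")
    return rng.sample(candidates, n)


drugs = ["A", "B", "C", "D", "E"]
positives = {("A", "B"), ("C", "B"), ("D", "E")}
for seed in (0, 1, 2):
    print(seed, sample_negatives(positives, drugs, n=3, seed=seed))
```

Running the downstream training once per seed and comparing the resulting metrics is one way to gauge how sensitive the results are to the particular negatives drawn.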

Depends on:

Setup documentation.

Setup the readthedocs base.

GPU Transfer

Accelerating training by transfer to GPU.

Add the DeepDrug Model

Dear @kajocina,

Please read the paper first. It is here.
There is also code-release with the paper here.
After that read the contributing guidelines.
If there is an existing open-source version of the model please take a look.
ChemicalX is built on top of PyTorch 1.10 and torchdrug.
A similar model is EPGCNDS, which uses GraphConvolutions to generate drug representations. Take a look at the layer definition here. You should use the layers from torchdrug, not the models.
The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
There is already a model class under ./chemicalx/models/.
Context features, drug level features, and labels are all FloatTensors.
Look at the examples and tests under ./examples/ and ./tests/.
Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
Add typing to the initialization and forward pass.
Non-data-dependent hyperparameters should have default values.
Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Add the DPDDI model

Readme example

Add example in the readme.

Add the DeepCCI model.

The original paper: https://arxiv.org/abs/1704.08432

Benedek Rozemberczki.
Write auxiliary layers.
Write a model architecture layer.
Write documentation.
Write smoke tests with the test data.
Write an example for the ./examples/ folder.

Add the CASTER model

Please read the paper first. It is here.
There is also code-release with the paper here.
After that read the contributing guidelines.
If there is an existing open-source version of the model please take a look.
ChemicalX is built on top of PyTorch 1.10 and torchdrug.
The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
There is already a model class under ./chemicalx/models/.
Context features, drug level features, and labels are all FloatTensors.
Look at the examples and tests under ./examples/ and ./tests/.
Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
Add typing to the initialization and forward pass.
Non-data-dependent hyperparameters should have default values.
Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Performance bug of deepsynergy, deepdds and matchmaker on DrugBankDDI

It seems that DeepSynergy, DeepDDS and MatchMaker use the ground truth (which is called context in the dataset) as model input, so the models' performance is extremely high.

Is this repo dead? Improve communication of its status

I'm under the impression that AstraZeneca isn't allocating resources to maintaining this repository or answering questions. Is this correct?

I don't feel comfortable answering questions or maintaining this as long as it lives in the AstraZeneca namespace and I'm not being paid for consulting.

I don't recall anyone else being active in the repository besides minor model-specific contributions pre-publication. If it's the case that AZ doesn't have any plans for this, then I think we should minimally put a notice on the README saying so and also potentially archive this repository.

'LabeledTriples' object is not iterable

I got an error when trying to reproduce the snippet in the paper:

line 14 prediction = model(batch.context_features, batch.drug_features_left, batch.drug_features_right)

TypeError: 'LabeledTriples' object is not iterable

Add the DeepSynergy model

Benedek

Tensor's device mismatch

Hi! I found a bug while training the CASTER model. It is caused by a torch.eye call that does not specify the device. When CUDA is available, torch.eye creates the tensor on the CPU while the rest of the model is on the GPU.

Add the CASTER model

benedek
