
astrazeneca / chemicalx


A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)

Home Page: https://chemicalx.readthedocs.io

License: Apache License 2.0

Python 99.95% Shell 0.05%
deep-learning pytorch deep-chemistry graph-neural-network drug drug-pair polypharmacy drug-discovery pharma drug-interaction

chemicalx's People

Contributors

andrejlamov, andriy-nikolov, avi-pomicell, benedekrozemberczki, bgyori, bliutech, cthoyt, hzcheney, kajocina, kkaris, mughetto, sebastiandro, yuwvandy

chemicalx's Issues

Add the MatchMaker model

Dear @andrejlamov,

  • Please read the paper first. It is here.
  • There is also a code release with the paper here.
  • After that read the contributing guidelines.
  • If there is an existing open-source version of the model please take a look.
  • ChemicalX is built on top of PyTorch 1.10 and torchdrug.
  • A similar model is DeepSynergy, which uses a feed-forward neural network to generate drug representations. Take a look at the layer definition here.
  • The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
  • There is already a model class under ./chemicalx/models/
  • Context features, drug level features, and labels are all FloatTensors.
  • Look at the examples and tests under ./examples/ and ./tests/.
  • Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
  • Add typing to the initialization and forward pass.
  • Non-data-dependent hyperparameters should have default values.
  • Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
  • Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?
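The checklist items on typing and default values can be illustrated with a minimal sketch. `ToyPairScorer` is a hypothetical class written only for this illustration; it is not part of ChemicalX and does not use the library's model base class. It assumes, as the checklist states, that context features and drug-level features arrive as float tensors.

```python
import torch
from torch import nn


class ToyPairScorer(nn.Module):
    """Hypothetical drug pair scorer (illustration only, not a ChemicalX model)."""

    def __init__(
        self,
        *,
        context_channels: int,  # data dependent, so no default
        drug_channels: int,  # data dependent, so no default
        hidden_channels: int = 32,  # hyperparameter with a default
        dropout_rate: float = 0.5,  # hyperparameter with a default
    ) -> None:
        super().__init__()
        in_channels = context_channels + 2 * drug_channels
        self.layers = nn.Sequential(
            nn.Linear(in_channels, hidden_channels),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(hidden_channels, 1),
            nn.Sigmoid(),
        )

    def forward(
        self,
        context_features: torch.FloatTensor,
        drug_features_left: torch.FloatTensor,
        drug_features_right: torch.FloatTensor,
    ) -> torch.FloatTensor:
        # Concatenate the context vector with both drug vectors per pair.
        hidden = torch.cat(
            [context_features, drug_features_left, drug_features_right], dim=1
        )
        return self.layers(hidden)


model = ToyPairScorer(context_channels=4, drug_channels=8)
model.eval()  # disable dropout for a deterministic forward pass
scores = model(torch.rand(3, 4), torch.rand(3, 8), torch.rand(3, 8))
print(scores.shape)
```

Only the data-dependent sizes must be passed by the caller; everything else falls back to a default.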

Incorporate various dataset splits

I also have a suggestion for future updates of this library. The current dataloaders, if I'm not mistaken, do not consider different dataset split strategies. Recent works have highlighted the importance of evaluating on different dataset splits, e.g. split pairs, split drugs, split cell lines (for synergy), etc. It would be great to see this library support such features.
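To make the three strategies concrete, here is a minimal sketch in plain Python. It assumes each sample is a `(drug_left, drug_right, context, label)` tuple; the function names and the toy data are made up for illustration and are not part of the ChemicalX API.

```python
import random


def _hold_out(values, fraction, rng):
    """Pick a random subset (at least one element) of the given values."""
    values = sorted(values)
    rng.shuffle(values)
    n_held = max(1, int(len(values) * fraction))
    return set(values[:n_held])


def split_pairs(triples, fraction=0.2, seed=42):
    """Random split: drugs and contexts may appear in both train and test."""
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * fraction)
    return shuffled[n_test:], shuffled[:n_test]


def split_drugs(triples, fraction=0.2, seed=42):
    """Held-out drugs: no training sample contains a held-out drug."""
    rng = random.Random(seed)
    held = _hold_out({d for t in triples for d in t[:2]}, fraction, rng)
    train = [t for t in triples if t[0] not in held and t[1] not in held]
    test = [t for t in triples if t[0] in held or t[1] in held]
    return train, test


def split_contexts(triples, fraction=0.5, seed=42):
    """Held-out contexts (e.g. cell lines): test contexts are unseen in training."""
    rng = random.Random(seed)
    held = _hold_out({t[2] for t in triples}, fraction, rng)
    train = [t for t in triples if t[2] not in held]
    test = [t for t in triples if t[2] in held]
    return train, test


# Toy data: every unordered pair of five drugs in two contexts.
triples = [
    (left, right, context, float((i + j) % 2))
    for i, left in enumerate("ABCDE")
    for j, right in enumerate("ABCDE")
    if i < j
    for context in ("cell_1", "cell_2")
]

for splitter in (split_pairs, split_drugs, split_contexts):
    train, test = splitter(triples)
    print(splitter.__name__, len(train), len(test))
```

The drug and context splits are stricter than the pair split because the model is evaluated on entities it never saw during training.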

Add the DeepDDI model

Dear @hzcheney,

  • Please read the paper first. It is here.
  • After that read the contributing guidelines.
  • If there is an existing open source version of the model please take a look.
  • ChemicalX is built on top of PyTorch 1.10 and torchdrug.
  • A similar model is which uses to generate drug representations. Take a look at the layer definition here.
  • The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
  • There is already a model class under ./chemicalx/models/
  • Context features, drug level features and labels are all FloatTensors.
  • Look at the examples and tests under ./examples/ and ./tests/.
  • Add auxiliary layers as you see fit - please document these, add tests and add these layers to the main readme.md if needed.
  • Add typing to the initialisation and forward pass.
  • Non-data-dependent hyperparameters should have default values.
  • Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
  • Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?
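For the last checklist item, a quick way to sanity-check a reported AUC is to recompute it by hand on a few predictions. The sketch below is a plain-Python pairwise formulation of ROC AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as half); the labels and scores are invented, and this is not the metric implementation used by the library.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores))  # 3 of 4 positive/negative pairs are ordered correctly
```

A value near 0.5 means the model ranks pairs no better than chance, which is a useful reference point when judging whether a test-set AUC is reasonable.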

Add the SSI-DDI Model

  • Please read the paper first. It is here.
  • There is also a code release with the paper here.
  • After that read the contributing guidelines.
  • If there is an existing open-source version of the model please take a look.
  • ChemicalX is built on top of PyTorch 1.10 and torchdrug.
  • A similar model is EPGCNDS, which uses graph convolutions to generate drug representations. Take a look at the layer definition here. You should use the layers from torchdrug, not the models.
  • The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
  • There is already a model class under ./chemicalx/models/
  • Context features, drug level features, and labels are all FloatTensors.
  • Look at the examples and tests under ./examples/ and ./tests/.
  • Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
  • Add typing to the initialization and forward pass.
  • Non-data-dependent hyperparameters should have default values.
  • Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
  • Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Add the MHCADDI model

Dear @sebastiandro,

  • Please read the paper first. It is here.
  • After that read the contributing guidelines.
  • If there is an existing open source version of the model please take a look.
  • ChemicalX is built on top of PyTorch 1.10 and torchdrug.
  • A similar model is which uses to generate drug representations. Take a look at the layer definition here.
  • The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
  • There is already a model class under ./chemicalx/models/
  • Context features, drug level features and labels are all FloatTensors.
  • Look at the examples and tests under ./examples/ and ./tests/.
  • Add auxiliary layers as you see fit - please document these, add tests and add these layers to the main readme.md if needed.
  • Add typing to the initialisation and forward pass.
  • Non-data-dependent hyperparameters should have default values.
  • Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
  • Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

DeepDDS Softmax bug

The softmax layer in DeepDDS squashes the outputs so that the predicted probabilities converge to one. It should be replaced with a sigmoid.
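A small sketch of why this can happen, under the assumption that the model emits a single logit per drug pair and the softmax normalizes over that one-element dimension (the actual DeepDDS layer shapes are not shown in this issue). Softmax over one element is always exactly 1, whereas a sigmoid maps the logit to a usable probability.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logit):
    """Logistic function mapping a single logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


for logit in (-3.0, 0.0, 3.0):
    # Softmax over a single logit ignores its value entirely.
    print(logit, softmax([logit])[0], round(sigmoid(logit), 4))
```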

How are the methods implemented outside of the domain they are designed for?

For example, DeepSynergy and MatchMaker both require cell line information, yet both are evaluated on the DrugBankDDI and TWOSIDES benchmarks, where no cell line information is available at all (and TWOSIDES is even at the patient level), with DS reaching the highest performance among all methods. What was the "cell line gene expression" component of both methods replaced with in those tasks? Also, does this ensure a fair comparison?

Code quality assurance

Add code quality assurance tooling such as black, flake8, and isort. This will be very important if/when other people start contributing.

Add the GCNBMP model

  • Please read the paper first. It is here.
  • There is also a code release with the paper here.
  • After that read the contributing guidelines.
  • If there is an existing open-source version of the model please take a look.
  • ChemicalX is built on top of PyTorch 1.10 and torchdrug.
  • The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
  • There is already a model class under ./chemicalx/models/
  • Context features, drug level features, and labels are all FloatTensors.
  • Look at the examples and tests under ./examples/ and ./tests/.
  • Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
  • Add typing to the initialization and forward pass.
  • Non-data-dependent hyperparameters should have default values.
  • Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
  • Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Inconsistent labels in DrugCombDB

Hi, it looks like a few drug pairs within the same context are labelled inconsistently. For example, drug 59691338 and drug 11960529 in EFM192B.
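One way to list such cases is to group the labelled triples by (drug pair, context) and report every group with more than one distinct label. The sketch below assumes rows of the form `(drug_left, drug_right, context, label)`; the identifiers and labels in the toy rows are placeholders, not values taken from DrugCombDB.

```python
from collections import defaultdict


def find_inconsistent(triples):
    """Return {(drug_a, drug_b, context): labels} for conflicting entries."""
    labels = defaultdict(set)
    for left, right, context, label in triples:
        # Sort the pair so (a, b) and (b, a) land in the same group.
        key = (*sorted((left, right)), context)
        labels[key].add(label)
    return {key: sorted(found) for key, found in labels.items() if len(found) > 1}


# Placeholder rows for illustration only.
rows = [
    ("drug_a", "drug_b", "ctx_1", 1.0),
    ("drug_b", "drug_a", "ctx_1", 0.0),  # same pair and context, other label
    ("drug_a", "drug_b", "ctx_2", 1.0),
    ("drug_a", "drug_c", "ctx_1", 0.0),
]
print(find_inconsistent(rows))
```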

Contribution guide - forking and ssh needed

In the contribution guide, we have

2. Clone this repo
git clone https://github.com/AstraZeneca/chemicalx

However, people generally won't have permission to push branches to AstraZeneca/chemicalx directly. Instead, I think we should describe that users should first fork the repository, then push to a new branch on their fork, and then open a pull request against AstraZeneca/chemicalx. Another issue is that, for the push to work, I think SSH instead of HTTPS should be used when cloning. Shall I update the guidelines according to the above?

KeyError: 'node_feature' from running deepdds_example.py

Please see the following log.

  File "deepdds.py", line 27, in <module>
    main()
  File "deepdds.py", line 14, in main
    results = pipeline(
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/chemicalx/pipeline.py", line 155, in pipeline
    prediction = model(*model.unpack(batch))
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/chemicalx/models/deepdds.py", line 176, in forward
    features_left = self._forward_molecules(molecules_left)
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/chemicalx/models/deepdds.py", line 158, in _forward_molecules
    features = self.drug_conv(molecules, molecules.data_dict["node_feature"])["node_feature"]
KeyError: 'node_feature'

Add the DeepDDS model

  • Please read the paper first. It is here.
  • There is also a code release with the paper here.
  • After that read the contributing guidelines.
  • If there is an existing open-source version of the model please take a look.
  • ChemicalX is built on top of PyTorch 1.10 and torchdrug.
  • The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
  • There is already a model class under ./chemicalx/models/
  • Context features, drug level features, and labels are all FloatTensors.
  • Look at the examples and tests under ./examples/ and ./tests/.
  • Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
  • Add typing to the initialization and forward pass.
  • Non-data-dependent hyperparameters should have default values.
  • Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
  • Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Add the MR-GNN model

Dear @cthoyt,

  • Please read the paper first. It is here.
  • There is also a code release with the paper here.
  • After that read the contributing guidelines.
  • If there is an existing open-source version of the model please take a look.
  • ChemicalX is built on top of PyTorch 1.10 and torchdrug.
  • The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
  • There is already a model class under ./chemicalx/models/
  • Context features, drug level features, and labels are all FloatTensors.
  • Look at the examples and tests under ./examples/ and ./tests/.
  • Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
  • Add typing to the initialization and forward pass.
  • Non-data-dependent hyperparameters should have default values.
  • Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
  • Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Pipeline - Unexpected Labels

I tried the pipeline code on your GitHub page, and it resulted in the following error:

from chemicalx import pipeline
from chemicalx.models import DeepSynergy
from chemicalx.data import DrugCombDB

model = DeepSynergy(context_channels=112, drug_channels=256)
dataset = DrugCombDB()

results = pipeline(
    dataset=dataset,
    model=model,
    # Data arguments
    batch_size=5120,
    context_features=True,
    drug_features=True,
    drug_molecules=False,
    labels=True,
    # Training arguments
    epochs=100,
)

# Outputs information about the AUC-ROC, etc. to the console.
results.summarize()

# Save the model, losses, evaluation, and other metadata.
results.save("~/test_results/")


TypeError                                 Traceback (most recent call last)

<ipython-input-9-b0c19c387d65> in <module>()
     16     labels=True,
     17     # Training arguments
---> 18     epochs=100,
     19 )
     20 

TypeError: pipeline() got an unexpected keyword argument 'labels'
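The error indicates that the installed `pipeline` function does not accept a `labels` keyword. A generic way to check which keywords a callable accepts before calling it is `inspect.signature`. In the sketch below, `toy_pipeline` is a stand-in with a made-up signature, not the real `chemicalx.pipeline`, and `unexpected_kwargs` is a hypothetical helper.

```python
import inspect


def toy_pipeline(
    *,
    dataset,
    model,
    batch_size=512,
    context_features=True,
    drug_features=True,
    drug_molecules=False,
    epochs=10,
):
    """Stand-in function for illustration; NOT the chemicalx pipeline."""
    return {"epochs": epochs, "batch_size": batch_size}


def unexpected_kwargs(func, kwargs):
    """List the keyword names that func would reject."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword name.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    accepted = {
        name
        for name, p in params.items()
        if p.kind
        in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(name for name in kwargs if name not in accepted)


call_kwargs = dict(dataset=None, model=None, batch_size=5120, labels=True, epochs=100)
print(unexpected_kwargs(toy_pipeline, call_kwargs))
```

Running the same helper against the installed function shows whether the snippet and the installed version have drifted apart.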

Make dataset loaders on-the-fly

I think it would be better to have the dataset download and processing happen client-side, then use pystow to store the results in a reliable place. This would also allow the TWOSIDES and DrugBank datasets, which require random negative sampling, to be used with multiple random seeds, e.g. to investigate the robustness of results. Further, it would allow for a more idiomatic dataset loader that's extensible to new datasets.
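As a rough sketch of seeded negative sampling, assuming positives are unordered drug pairs and ignoring contexts for brevity: `sample_negatives` is a hypothetical helper, the drug names are placeholders, and enumerating all candidate pairs is only practical for small drug sets.

```python
import random
from itertools import combinations


def sample_negatives(positives, drugs, seed):
    """Sample as many unobserved pairs as there are positives, reproducibly."""
    rng = random.Random(seed)
    known = {frozenset(pair) for pair in positives}
    candidates = [
        pair
        for pair in combinations(sorted(drugs), 2)
        if frozenset(pair) not in known
    ]
    if len(candidates) < len(positives):
        raise ValueError("not enough unobserved pairs to sample from")
    return sorted(rng.sample(candidates, len(positives)))


positives = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E")]
drugs = list("ABCDEF")

for seed in (0, 1, 2):
    # Each seed gives a different, but reproducible, set of negatives.
    print(seed, sample_negatives(positives, drugs, seed))
```

Training once per seed and comparing the metrics gives a simple view of how sensitive the results are to the sampled negatives.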

Depends on:

Add the DeepDrug Model

Dear @kajocina,

  • Please read the paper first. It is here.
  • There is also a code release with the paper here.
  • After that read the contributing guidelines.
  • If there is an existing open-source version of the model please take a look.
  • ChemicalX is built on top of PyTorch 1.10 and torchdrug.
  • A similar model is EPGCNDS, which uses graph convolutions to generate drug representations. Take a look at the layer definition here. You should use the layers from torchdrug, not the models.
  • The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
  • There is already a model class under ./chemicalx/models/
  • Context features, drug level features, and labels are all FloatTensors.
  • Look at the examples and tests under ./examples/ and ./tests/.
  • Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
  • Add typing to the initialization and forward pass.
  • Non-data-dependent hyperparameters should have default values.
  • Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
  • Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Add the CASTER model

  • Please read the paper first. It is here.
  • There is also a code release with the paper here.
  • After that read the contributing guidelines.
  • If there is an existing open-source version of the model please take a look.
  • ChemicalX is built on top of PyTorch 1.10 and torchdrug.
  • The library builds heavily on top of torchdrug, and molecules in batches are PackedGraphs.
  • There is already a model class under ./chemicalx/models/
  • Context features, drug level features, and labels are all FloatTensors.
  • Look at the examples and tests under ./examples/ and ./tests/.
  • Add auxiliary layers as you see fit - please document these, add tests, and add these layers to the main readme.md if needed.
  • Add typing to the initialization and forward pass.
  • Non-data-dependent hyperparameters should have default values.
  • Please add tests under ./tests/ and make sure that your model/layer is tested with real data.
  • Write an example under ./examples/. What is the AUC on the test set? Is it reasonable?

Is this repo dead? Improve communication of its status

I'm under the impression that AstraZeneca isn't allocating resources to maintaining this repository or answering questions. Is this correct?

I don't feel comfortable answering questions or maintaining this as long as it lives in the AstraZeneca namespace and I'm not being paid for consulting.

I don't recall anyone else being active in the repository besides minor model-specific contributions pre-publication. If it's the case that AZ doesn't have any plans for this, then I think we should minimally put a notice on the README saying so and also potentially archive this repository.

'LabeledTriples' object is not iterable

I got an error when trying to reproduce the snippet in the paper. Line 14 is:
prediction = model(batch.context_features, batch.drug_features_left, batch.drug_features_right)

TypeError: 'LabeledTriples' object is not iterable

Tensor's device mismatch

Hi! I found a bug while training the CASTER model. It was caused by a torch.eye call that did not specify the device. When CUDA is available, torch.eye creates the tensor on the CPU while the rest of the model is on the GPU.
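A minimal sketch of the usual fix is to create the identity matrix on the same device (and with the same dtype) as the tensor it is combined with. `add_identity` is a hypothetical function written for this illustration; it is not the CASTER code.

```python
import torch


def add_identity(matrix: torch.Tensor) -> torch.Tensor:
    """Add an identity matrix created on the input's own device."""
    size = matrix.shape[0]
    # Without device=..., torch.eye would default to the CPU.
    identity = torch.eye(size, device=matrix.device, dtype=matrix.dtype)
    return matrix + identity


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
matrix = torch.zeros(3, 3, device=device)
result = add_identity(matrix)
print(result.device)
```

Deriving the device from an existing tensor keeps the code working unchanged on both CPU-only and GPU machines.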
