astrazeneca / chemicalx
A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)
Home Page: https://chemicalx.readthedocs.io
License: Apache License 2.0
Typo within documentation of data_processing.rst.
Dear @andrejlamov,
DeepSynergy
which uses FeedForward
neural network to generate drug representations. Take a look at the layer definition here.
./chemicalx/models/
./examples/
and ./tests/
../tests/
and make sure that your model/layer is tested with real data.
./examples/
What is the AUC on the test set? Is it reasonable?
I also have one suggestion for future updates of this library. The current dataloaders, if I'm not mistaken, do not consider different dataset split strategies. Recent works have highlighted the importance of evaluating on different dataset splits, e.g. split pairs, split drugs, split cell lines (for synergy), etc. It would be great to see this library support such features.
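To make the suggestion concrete, here is a minimal sketch of the three split strategies in plain Python. It does not use the chemicalx API: the triple layout (drug_1, drug_2, context, label) and all function names are assumptions made for illustration only.

```python
import random
from typing import List, Set, Tuple

# Assumed layout of one labeled triple: (drug_1, drug_2, context, label).
Triple = Tuple[str, str, str, float]


def _hold_out(items: Set[str], test_fraction: float, seed: int) -> Set[str]:
    """Pick a reproducible subset of entities to hold out for testing."""
    ordered = sorted(items)
    random.Random(seed).shuffle(ordered)
    n_test = max(1, int(len(ordered) * test_fraction))
    return set(ordered[:n_test])


def split_pairs(triples: List[Triple], test_fraction: float, seed: int):
    """Random split over rows: drugs and contexts may appear on both sides."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def split_drugs(triples: List[Triple], test_fraction: float, seed: int):
    """Leave-drug-out: no held-out drug ever appears in the training set."""
    drugs = {d for t in triples for d in t[:2]}
    held_out = _hold_out(drugs, test_fraction, seed)
    train = [t for t in triples if t[0] not in held_out and t[1] not in held_out]
    test = [t for t in triples if t[0] in held_out or t[1] in held_out]
    return train, test


def split_contexts(triples: List[Triple], test_fraction: float, seed: int):
    """Leave-context-out (e.g. cell lines): test contexts are unseen in training."""
    held_out = _hold_out({t[2] for t in triples}, test_fraction, seed)
    train = [t for t in triples if t[2] not in held_out]
    test = [t for t in triples if t[2] in held_out]
    return train, test
```

The entity-level splits are the stricter ones: a model evaluated with split_drugs or split_contexts has to generalize to drugs or cell lines it never saw during training.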
Dear @hzcheney,
which uses
to generate drug representations. Take a look at the layer definition here.
./chemicalx/models/
./examples/
and ./tests/
../tests/
and make sure that your model/layer is tested with real data.
./examples/
What is the AUC on the test set? Is it reasonable?
EPGCNDS
which uses GraphConvolutions
to generate drug representations. Take a look at the layer definition here. You should use the layers from torchdrug
not the models.
./chemicalx/models/
./examples/
and ./tests/
../tests/
and make sure that your model/layer is tested with real data.
./examples/
What is the AUC on the test set? Is it reasonable?
BatchGenerator was removed from the central data namespace. It should not have been.
Dear @sebastiandro,
which uses
to generate drug representations. Take a look at the layer definition here.
./chemicalx/models/
./examples/
and ./tests/
../tests/
and make sure that your model/layer is tested with real data.
./examples/
What is the AUC on the test set? Is it reasonable?
The softmax layer in DeepDDS squashes the output so that the probabilities converge to one. It should be replaced with a sigmoid.
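A quick way to see why this matters, assuming the final layer emits a single logit per drug pair (the issue implies this, but it is an assumption here): softmax normalizes across the output dimension, so with one output it always returns 1, whatever the logit. The sketch below uses only the standard library and is not the DeepDDS code.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Logistic function: maps a single logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# With a single output unit, softmax is constant while sigmoid tracks the logit.
for logit in (-3.0, 0.0, 3.0):
    print(logit, softmax([logit])[0], sigmoid(logit))
```

Every row prints 1.0 for the softmax column, so the predicted probability carries no information about the logit, whereas the sigmoid column increases with the logit.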
The tutorial code here is not working, can you fix it? Also do you have any Jupyter notebooks coming up or available soon?
Tutorials with code that doesn't work
https://chemicalx.readthedocs.io/en/latest/notes/tutorial.html
For example, DeepSynergy and MatchMaker require cell line information, yet both are implemented in the DrugBankDDI and TWOSIDES benchmarks, where no cell line information is available at all (and TWOSIDES is even at the patient level), with DS reaching the highest performance among all methods. What, then, replaced the "cell line gene expression" component of both methods in those tasks? Also, does this ensure a fair comparison?
Add black, flake8, isort, etc. code quality assurance. This will be very important if/when other people start contributing.
./chemicalx/models/
./examples/
and ./tests/
../tests/
and make sure that your model/layer is tested with real data.
./examples/
What is the AUC on the test set? Is it reasonable?
Hi, it looks like a few drug pairs within the same context are labelled inconsistently. For example, drug 59691338 and drug 11960529 in EFM192B.
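A check along these lines could surface such cases. This is a sketch over plain tuples rather than the chemicalx dataset classes; the (drug_1, drug_2, context, label) layout and the function name are assumptions, and the rows in the usage example are toy data, not values from the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[Tuple[str, str], str]  # ((drug_a, drug_b), context)


def find_inconsistent_labels(
    triples: Iterable[Tuple[str, str, str, float]]
) -> Dict[Key, Set[float]]:
    """Return every (drug pair, context) that carries more than one distinct label."""
    labels_by_key: Dict[Key, Set[float]] = defaultdict(set)
    for drug_1, drug_2, context, label in triples:
        # Treat (A, B) and (B, A) as the same pair.
        pair = tuple(sorted((drug_1, drug_2)))
        labels_by_key[(pair, context)].add(label)
    return {key: labels for key, labels in labels_by_key.items() if len(labels) > 1}


# Toy rows for illustration only.
rows = [
    ("A", "B", "ctx1", 1.0),
    ("B", "A", "ctx1", 0.0),  # same pair and context, different label
    ("A", "C", "ctx1", 1.0),
    ("A", "B", "ctx2", 1.0),
]
conflicts = find_inconsistent_labels(rows)
print(conflicts)
```

Whether conflicting rows should be dropped, averaged, or resolved by majority vote is a separate decision that depends on how the labels were derived.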
In the contribution guide, we have
2. Clone this repo
git clone https://github.com/AstraZeneca/chemicalx
however, people won't generally have permission to push branches to AstraZeneca/chemicalx directly. Instead, I think we should describe that users should first fork the repository, then push to a new branch on their fork, and then pull request to AstraZeneca/chemicalx. Another issue is that for the push to work, I think ssh instead of https should be used when cloning. Shall I update the guidelines according to the above?
Please see the following log.
  File "deepdds.py", line 27, in <module>
    main()
  File "deepdds.py", line 14, in main
    results = pipeline(
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/chemicalx/pipeline.py", line 155, in pipeline
    prediction = model(*model.unpack(batch))
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/chemicalx/models/deepdds.py", line 176, in forward
    features_left = self._forward_molecules(molecules_left)
  File "/storage/htc/nih-tcga/sc724/conda/synergy/lib/python3.8/site-packages/chemicalx/models/deepdds.py", line 158, in _forward_molecules
    features = self.drug_conv(molecules, molecules.data_dict["node_feature"])["node_feature"]
KeyError: 'node_feature'
The triples are here Dataset.
Citation in the body and the CIF file.
./chemicalx/models/
./examples/
and ./tests/
../tests/
and make sure that your model/layer is tested with real data.
./examples/
What is the AUC on the test set? Is it reasonable?
Dear @cthoyt,
./chemicalx/models/
./examples/
and ./tests/
../tests/
and make sure that your model/layer is tested with real data.
./examples/
What is the AUC on the test set? Is it reasonable?
I tried the pipeline code on your GitHub page and it resulted in the following error:
from chemicalx import pipeline
from chemicalx.models import DeepSynergy
from chemicalx.data import DrugCombDB

model = DeepSynergy(context_channels=112, drug_channels=256)
dataset = DrugCombDB()

results = pipeline(
    dataset=dataset,
    model=model,
    # Data arguments
    batch_size=5120,
    context_features=True,
    drug_features=True,
    drug_molecules=False,
    labels=True,
    # Training arguments
    epochs=100,
)

# Outputs information about the AUC-ROC, etc. to the console.
results.summarize()

# Save the model, losses, evaluation, and other metadata.
results.save("~/test_results/")
TypeError                                 Traceback (most recent call last)
<ipython-input-9-b0c19c387d65> in <module>()
     16     labels=True,
     17     # Training arguments
---> 18     epochs=100,
     19 )
     20
TypeError: pipeline() got an unexpected keyword argument 'labels'
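When a README snippet and the installed version disagree like this, one way to find the offending arguments is to compare the call against the function's actual signature. The sketch below uses only the standard library; pipeline_stub is a hypothetical stand-in with a made-up signature, not the real chemicalx.pipeline, and the helper name is likewise an assumption.

```python
import inspect
from typing import Any, Callable, Dict, List


def unexpected_kwargs(func: Callable[..., Any], kwargs: Dict[str, Any]) -> List[str]:
    """List the keyword arguments that func would reject with a TypeError."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    accepted = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(name for name in kwargs if name not in accepted)


# Hypothetical stand-in: the real signature may differ between versions.
def pipeline_stub(*, dataset, model, batch_size=512, epochs=10):
    return None


call_kwargs = {"dataset": None, "model": None, "labels": True, "epochs": 100}
print(unexpected_kwargs(pipeline_stub, call_kwargs))
```

Running the same helper against the installed pipeline function would show which keywords from the documentation the installed version no longer accepts.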
I think it would be better to have the dataset download and processing happen client-side, then use pystow to store the results in a reliable place. This would also allow the TWOSIDES and DrugBank datasets, which require random negative sampling, to be used with multiple random seeds, e.g. to investigate the robustness of results. Further, it would allow for a more idiomatic dataset loader that's extensible to new datasets.
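As an illustration of the seeded negative sampling idea, here is a minimal standard-library sketch. It is not how chemicalx or the original datasets generate negatives; the function name and the representation of pairs as tuples of drug identifiers are assumptions.

```python
import random
from itertools import combinations
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def sample_negatives(
    positives: Iterable[Pair], drugs: Iterable[str], n_negatives: int, seed: int
) -> List[Pair]:
    """Sample drug pairs that are not known positives, reproducibly per seed."""
    known = {frozenset(pair) for pair in positives}
    # Enumerating all pairs is fine for a sketch but quadratic in the number of drugs.
    pool = [pair for pair in combinations(sorted(drugs), 2) if frozenset(pair) not in known]
    if n_negatives > len(pool):
        raise ValueError("not enough unobserved pairs to sample from")
    return random.Random(seed).sample(pool, n_negatives)


positives = [("a", "b"), ("c", "d")]
drugs = ["a", "b", "c", "d", "e"]
# One negative set per seed, so results can be compared across seeds.
negatives_by_seed = {seed: sample_negatives(positives, drugs, 3, seed) for seed in (0, 1, 2)}
```

Training once per seed and reporting the spread of the evaluation metric would then give a rough measure of how sensitive the results are to the sampled negatives.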
Depends on:
Dear @kajocina,
EPGCNDS
which uses GraphConvolutions
to generate drug representations. Take a look at the layer definition here. You should use the layers from torchdrug
not the models.
./chemicalx/models/
./examples/
and ./tests/
../tests/
and make sure that your model/layer is tested with real data.
./examples/
What is the AUC on the test set? Is it reasonable?
The original paper: https://arxiv.org/abs/1704.08432
./chemicalx/models/
./examples/
and ./tests/
../tests/
and make sure that your model/layer is tested with real data.
./examples/
What is the AUC on the test set? Is it reasonable?
It seems DeepSynergy, DeepDDS and MatchMaker use the ground truth (which is called context in the dataset) as model input, so the models' performance is extremely high.
I'm under the impression that AstraZeneca isn't allocating resources to maintaining this repository or answering questions. Is this correct?
I don't feel comfortable answering questions or maintaining this as long as it lives in the AstraZeneca namespace and I'm not being paid for consulting.
I don't recall anyone else being active in the repository besides minor model-specific contributions pre-publication. If it's the case that AZ doesn't have any plans for this, then I think we should minimally put a notice on the README saying so and also potentially archive this repository.
I got an error when trying to reproduce the snippet in the paper:
line 14 prediction = model(batch.context_features,batch.drug_features_left,batch.drug_features_right)
TypeError: 'LabeledTriples' object is not iterable
Hi! I have found a bug in the training of the CASTER model. It is caused by a torch.eye call that does not specify the device. When CUDA is available, torch.eye creates the tensor on the CPU while the rest of the model is on the GPU.
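The usual remedy is to create the identity matrix on the device (and with the dtype) of a tensor the model already holds. The sketch below shows the pattern in isolation; identity_like is a hypothetical helper, not a function from the CASTER implementation, and the tensor shapes are arbitrary.

```python
import torch


def identity_like(x: torch.Tensor) -> torch.Tensor:
    """Build an identity matrix matching the last dimension, device and dtype of x."""
    n = x.shape[-1]
    # Without device=..., torch.eye allocates on the default device (normally the CPU).
    return torch.eye(n, device=x.device, dtype=x.dtype)


# Falls back to the CPU when no GPU is present, so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
weights = torch.randn(4, 4, device=device)

# Both operands now live on the same device, so the matmul does not raise.
result = weights @ identity_like(weights)
```

Deriving the device from an existing tensor, rather than hard-coding "cuda", keeps the layer working when the model is moved between devices.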