bunnech / cellot
Learning Single-Cell Perturbation Responses using Neural Optimal Transport
License: BSD 3-Clause "New" or "Revised" License
Hi,
In the sciplex dataset's OOD mode, if I want to predict the outcome of multiple drugs, how should I set the `target` parameter in the config file?
(1) Should I list all of them, i.e. `target: drug A, drug B, ..., drug N`? For example, to predict the outcome of 3 drugs separately, `target: drug A, drug B, drug C`?
(2) Or should I train a new model for every drug I want to predict?
Best,
Hope to receive your reply.
Hi!
I wasn't able to find a way to generate predicted expression values using a trained CellOT model. Do you have a script to do so?
Thanks,
Yan
With the following run command:
python3 ./scripts/train.py --outdir ./results/scrna-sciplex3/drug-ruxolitinib/model-cellot --config ./configs/tasks/sciplex3.yaml --config ./configs/models/cellot.yaml --config.data.target ruxolitinib
I get the following error:
Traceback (most recent call last):
  File "/home/centos/anaconda3/envs/cellot/lib/python3.9/site-packages/anndata/_io/utils.py", line 177, in func_wrapper
    return func(elem, *args, **kwargs)
  File "/home/centos/anaconda3/envs/cellot/lib/python3.9/site-packages/anndata/_io/h5ad.py", line 527, in read_group
    EncodingVersions[encoding_type].check(
  File "/home/centos/anaconda3/envs/cellot/lib/python3.9/enum.py", line 408, in __getitem__
    return cls._member_map_[name]
KeyError: 'dict'
During handling of the above exception, another exception occurred:
I believe this happens because the dataset was created with a newer version of scanpy, while cellot tries to read it with an older one (scanpy==1.8.1). Upgrading scanpy to a newer version fixes this particular issue, but I believe it causes new errors.
I would appreciate more help in this regard.
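When reporting a version mismatch like this, it can help to list the exact versions installed in the environment. The following is a minimal sketch using only the standard library; the package list (`anndata`, `scanpy`, `h5py`) is an assumption about which packages are relevant here, not something taken from the repository.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of packages involved in reading .h5ad files.
    for name, version in installed_versions(["anndata", "scanpy", "h5py"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running it inside the `cellot` conda environment and pasting the output into the issue makes it easier to compare against the versions used to write the dataset.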
Hi !
In ./scripts/evaluate.py, the r2 between the observed and predicted gene expression is calculated using "pd.Series.corr(mut, mui)". However, this function only returns the Pearson correlation coefficient (PCC). Is there anything wrong?
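To illustrate the distinction raised here: `pd.Series.corr` uses the Pearson method by default, so it returns r, which can be negative, whereas r² is its square. The sketch below uses plain Python and made-up numbers (the `observed` and `predicted` lists are hypothetical); whether the repository squares the value elsewhere is not shown on this page.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical data: the prediction is perfectly anti-correlated with the observation.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [-1.0, -2.0, -3.0, -4.0]

r = pearson_r(observed, predicted)  # close to -1
r_squared = r ** 2                  # close to +1
print(r, r_squared)
```

The two quantities only coincide when r is 0 or 1, so reporting r under the name r2 changes the numbers unless the value is squared somewhere.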
I was hoping to use CellOT on full scRNA-seq data and was wondering what the training times for that should look like and if there is any way to accelerate training. I'm currently running scGen to get the autoencoder embeddings and I'm getting predicted runtimes of 594hrs on 1 GPU for 20k genes in 3k cells and 8hrs for 1k genes in 3k cells.
Thank you!
Hi, I would like to ask about this:
In https://github.com/bunnech/cellot/blob/main/configs/tasks/4i.yaml
we have source: control, but we do not have any key for target. My question is: how are the target/predicted samples labeled?
Thank you
Hi,
I'm trying to run the simple 4i tutorial, but evaluate.py crashes halfway. Unfortunately, without more information on what each script is doing, it's difficult to troubleshoot this by oneself.
I'm running python /pathto/cellot/scripts/evaluate.py --outdir /pathto/scripts/cellot_run/ --setting iid --where data_space
Traceback (most recent call last):
File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/ml_collections/config_dict/config_dict.py", line 883, in __getitem__
field = self._fields[key]
KeyError: 'data'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/ml_collections/config_dict/config_dict.py", line 807, in __getattr__
return self[attribute]
File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/ml_collections/config_dict/config_dict.py", line 889, in __getitem__
raise KeyError(self._generate_did_you_mean_message(key, str(e)))
KeyError: "'data'"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/pathto/cellot/scripts/evaluate.py", line 183, in <module>
app.run(main)
File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/pathto/cellot/scripts/evaluate.py", line 173, in main
evals = pd.DataFrame(
File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/pandas/core/frame.py", line 563, in __init__
data = list(data)
File "/pathto/cellot/scripts/evaluate.py", line 62, in compute_evaluations
for ncells, nfeatures, treated, imputed in iterator:
File "/pathto/cellot/scripts/evaluate.py", line 118, in iterate_feature_slices
_, treateddf, imputed = load_conditions(
File "/pathto/cellot/cellot/utils/evaluate.py", line 271, in load_conditions
embedding = read_embedding_context(
File "/pathto/cellot/cellot/utils/evaluate.py", line 159, in read_embedding_context
if "ae_emb" in config.data:
File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/ml_collections/config_dict/config_dict.py", line 809, in __getattr__
raise AttributeError(e)
AttributeError: "'data'"
Training finished without errors, and the output directory /pathto/scripts/cellot_run/ looks like this:
config.yaml
cache:
last.pt model.pt scalars status
Any insights would be appreciated,
Best,
M
Hi guys,
`long_description=open('READme.md').read()` in `setup.py` throws an error. Changing `READme.md` to `README.md` fixes it.
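One way to make this kind of lookup tolerant of the file name's casing is to try several candidates and fall back to an empty string. This is only a sketch; `read_long_description` is a hypothetical helper, not part of the repository's `setup.py`.

```python
from pathlib import Path


def read_long_description(root=".", candidates=("README.md", "READme.md")):
    """Return the text of the first candidate file found under root, else ''."""
    for name in candidates:
        path = Path(root) / name
        if path.is_file():
            return path.read_text(encoding="utf-8")
    return ""
```

In `setup.py` this could be used as `long_description=read_long_description()`, so that installation does not fail on case-sensitive file systems.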
Hi, thanks for this great work.
I can run the sciplex3 dataset with one target in OOD mode, but this is time-consuming, as sciplex3 has 188 drugs in total. Can we train the model with multiple targets at once?
Best
Hope to receive your reply.
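Whether a single model can take several targets is not answered on this page. If one model per drug turns out to be necessary, the per-drug commands can at least be generated in a loop. The sketch below mirrors the training command quoted in an earlier issue on this page; the drug names other than `ruxolitinib` are placeholders, and `build_train_command` is a hypothetical helper.

```python
import shlex


def build_train_command(drug, python="python3"):
    """Build the train.py argument list for one target drug."""
    return [
        python, "./scripts/train.py",
        "--outdir", f"./results/scrna-sciplex3/drug-{drug}/model-cellot",
        "--config", "./configs/tasks/sciplex3.yaml",
        "--config", "./configs/models/cellot.yaml",
        "--config.data.target", drug,
    ]


# Placeholder drug names; replace with the actual condition labels in the dataset.
drugs = ["ruxolitinib", "drug_b", "drug_c"]
commands = [build_train_command(drug) for drug in drugs]

for command in commands:
    print(shlex.join(command))
```

Each argument list can be passed to `subprocess.run(command, check=True)` or submitted as a separate cluster job, so that the drugs train in parallel rather than one after another.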
Hi,
Thanks so much for the great tool! I was wondering if there's a way to use GPUs using the current training framework or if you were planning on adding GPU support in the near future?
Thanks!
Yan
Hi,
In the lupuspatients dataset, you set the `groupby` parameter to `condition`. In the sciplex3 dataset, you set the `groupby` parameter to `[cell_type, condition]`.
(1) In the lupuspatients dataset, why not set the `groupby` parameter to `[cell_type, condition]`, given that 8 cell types exist?
(2) Under what circumstances should the `groupby` parameter be set to `condition`, and when to `[cell_type, condition]`?
Best.
Hi!
I am using CellOT in the OOD setting and I was wondering why you were using a test_size > 0 for split_cell_data_train_test in the ood setting? My thinking is that in OOD you would want to use all the data you can that's outside of the holdout group. Is it for evaluating the performance of the non-ood tasks or maybe something else I am missing?
Thank you so much!
In the code, I see that the forward function of ICNN is defined like this:
```python
def forward(self, x):
    z = self.sigma(0.2)(self.A[0](x))
    z = z * z
    for W, A in zip(self.W[:-1], self.A[1:-1]):
        z = self.sigma(0.2)(W(z) + A(x))
    y = self.W[-1](z) + self.A[-1](x)
    return y
```
I think there are two places that are inconsistent with the formula in the article.
(i) Why should we set z = z * z in the first layer?
(ii) Why is no non-negative activation function added to the last layer?
Thank you!
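This is not the authors' answer, but the convexity of a network with this structure can be checked numerically. Below is a scalar toy version of the forward pass quoted above, assuming `sigma(0.2)` is a leaky ReLU with slope 0.2 and that the `W` weights are non-negative (both are assumptions). The squared leaky ReLU of an affine map is convex, and the final layer is affine in `z` with a non-negative weight, which preserves convexity without a further activation.

```python
import random


def leaky_relu(a, slope=0.2):
    return a if a >= 0 else slope * a


def tiny_icnn(x, A, W):
    """Scalar toy of the quoted forward pass.

    A: list of (weight, bias) pairs applied to the input x.
    W: list of weights applied to the hidden value z (assumed non-negative).
    """
    z = leaky_relu(A[0][0] * x + A[0][1])
    z = z * z
    for w, (a, b) in zip(W[:-1], A[1:-1]):
        z = leaky_relu(w * z + a * x + b)
    return W[-1] * z + A[-1][0] * x + A[-1][1]


def is_midpoint_convex(f, trials=2000, low=-3.0, high=3.0, tol=1e-9, seed=0):
    """Check f((x + y) / 2) <= (f(x) + f(y)) / 2 on random pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(low, high), rng.uniform(low, high)
        if f((x + y) / 2) > (f(x) + f(y)) / 2 + tol:
            return False
    return True


# Arbitrary illustrative parameters: A weights may have any sign, W weights are >= 0.
A = [(1.0, -0.5), (-0.7, 0.2), (0.3, 0.0), (-1.2, 0.4)]
W = [0.8, 0.5, 1.5]
print(is_midpoint_convex(lambda x: tiny_icnn(x, A, W)))
```

Replacing an entry of `W` with a negative number (for example `W = [-0.8, 0.5, 1.5]`) makes the same check fail, which suggests that the non-negativity of `W` rather than a final activation is what carries the convexity in this toy setting.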
Hi!
Thank you for this interesting tool!
Rather than predicting unseen perturbation outcomes, I am interested in just finding a mapping between wild-type and perturbed cells in my training dataset. Is there a way to get this explicitly from the model? Or do I have to `transport()` my wild-type cells and find the closest perturbed cell for each? Thank you!
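The second option described here, transporting the wild-type cells and then looking up the closest perturbed cell, can be sketched as a brute-force nearest-neighbour search. The sketch assumes the transported cells are already available as coordinate tuples (how to obtain them from a trained model is not shown on this page); `match_to_nearest` and the example coordinates are hypothetical.

```python
import math


def match_to_nearest(transported, perturbed):
    """For each transported point, return (index of nearest perturbed cell, distance)."""
    matches = []
    for point in transported:
        distances = [math.dist(point, cell) for cell in perturbed]
        best = min(range(len(perturbed)), key=distances.__getitem__)
        matches.append((best, distances[best]))
    return matches


# Hypothetical 2-D coordinates standing in for expression profiles.
transported = [(0.9, 1.1), (3.2, 2.9)]
perturbed = [(1.0, 1.0), (3.0, 3.0), (5.0, 5.0)]

for i, (j, d) in enumerate(match_to_nearest(transported, perturbed)):
    print(f"wild-type cell {i} -> perturbed cell {j} (distance {d:.3f})")
```

Note that this matching is not one-to-one: several wild-type cells can map to the same perturbed cell. For large datasets, a tree-based or approximate nearest-neighbour index would be faster than this quadratic loop.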
After training the cellot, cae-4i, random, identity and scgen-4i models on the 4i data, I tried running the plot.py script with this command: python scripts/plot.py. I encountered a file-not-found error when plotting UMAP and KNN_MMD.
This indicates that umap.csv and knn_enrichment.csv are missing from evals_iid_data_space in the results folder. Could you please tell me whether anything is wrong? Is my command correct for running this plot script?
Thanks.
When I try to use anndata to read the file "hvg-train-only.h5ad" in scrna-crossspecies, it fails.
However, I can successfully read the file 4i-melanoma_cell_lines-8h.h5ad in cl-8h.
The error message is as follows:
AnnDataReadError: Above error raised while reading key '/layers' of type <class 'h5py._hl.group.Group'> from /.
I don't know why. Looking forward to your reply, thanks.
Hello: I'm trying to reproduce results for cross species for CellOT (using rats as starting point), but ran into an issue.
The steps I tried are as follows:
model-scgen
as follows: --outdir ./results/scrna-crossspecies/mode-iid/model-scgen --config ./configs/tasks/crossspecies.yaml --config ./configs/models/scgen.yaml
--outdir ./results/scrna-crossspecies/mode-iid/model-cellot --config ./configs/tasks/crossspecies.yaml --config ./configs/models/cellot.yaml --config.data.ae_emb.path ./results/scrna-crossspecies/mode-iid/model-scgen
Once the result was stored, I evaluated it with the following: --outdir results/scrna-crossspecies/mode-iid/model-cellot --n_markers 50 --setting iid --where data_space
The results I get are:
'mmd': 0.4460518822912073, 'l2': 15.850724, 'r2': 0.4929862534733026
What have I done wrong? For reference, I got the following for identity, which seems to make more sense:
'mmd': 0.20872688110292562, 'l2': 11.169688, 'r2': 0.7255046739934895
Thank you!
Hi, as mentioned in README.md
All scripts to reproduce the experiments in the i.i.d. (independent-and-identically-distributed), o.o.s. (out-of-sample), and o.o.d. (out-of-distribution) setting can be found in scripts/submit
I tried to find `./configs/tasks/oos-lupuspatients.yaml`, as used in `scripts/submit/oos-lupuspatients.sh`, but only found `lupuspatients.yaml` and `lupuspatients-ood.yaml`.
So is there any example for o.o.s.?
Thanks.
After reading the provided preprocessed version of the 4i data, I have not figured out the source/target distributions there.
I notice that the data are indexed by the drug and cell original, but nothing is labeled as source/target.
Please see attached screenshots.
Screenshot1: my code snippet
Screenshot2 Screenshot3: is the data obs and var, as you can see it is indexed by drug as row and cell original as column.
Screenshot4: is UMAP filtering the data by Trametinib but could not filter (source vs target)
I also found lines 71 to 93 of https://github.com/bunnech/cellot/blob/main/cellot/data/cell.py in the repository, where you label the data as source and target. I am not sure how you do that; I thought the data were already labeled.
I really appreciate any explanation.
Thank you
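As a rough illustration of how source/target labels could be derived from a per-cell condition column rather than stored in the file, here is a sketch. It is an assumption about the general idea, not a copy of the repository's `cell.py`: cells whose condition equals the configured source (for example `control`) are labeled `source`, cells whose condition equals the chosen target drug are labeled `target`, and all others are left unlabeled. The target could be supplied separately, as with the `--config.data.target ruxolitinib` flag quoted in another issue on this page.

```python
def label_transport(conditions, source, target):
    """Map each cell's condition to 'source', 'target', or None (not used)."""
    mapper = {source: "source", target: "target"}
    return [mapper.get(condition) for condition in conditions]


# Hypothetical per-cell drug annotations.
conditions = ["control", "trametinib", "cisplatin", "control", "trametinib"]
labels = label_transport(conditions, source="control", target="trametinib")
print(labels)
```

Under this reading, the file only needs the drug annotation per cell; the source/target split is decided at load time from the config.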
Dear Author,
I have taken an interest in the CellOT package and find it interesting. After trying it for a while, I can't find a function to generate predictions based on the trained model.
For example, I want to use a different split for testing and make predictions on that split instead of a random split.
Is there such a function?
Best regards,
Rom Uddamvathanak
The script in scripts/submit/iid.sh doesn't seem to have any command to run the crossspecies dataset. When I tried running
python ./scripts/train.py --outdir ./results/scrna-crossspecies/model-cellot --config ./configs/tasks/crossspecies.yaml --config ./configs/models/cellot.yaml
it returns the following result:
Traceback (most recent call last):
File "[repo_name]/./scripts/train.py", line 80, in <module>
main(sys.argv)
File "[repo_name]/./scripts/train.py", line 64, in main
train(outdir, config)
File "[repo_name]/cellot/train/train.py", line 129, in train_cellot
gl = compute_loss_g(f, g, source).mean()
File "[repo_name]/cellot/models/cellot.py", line 102, in compute_loss_g
transport = g.transport(source)
File "[repo_name]/cellot/networks/icnns.py", line 97, in transport
(output,) = autograd.grad(
File "/state/partition1/llgrid/pkg/anaconda/python-LLM-2023b/lib/python3.10/site-packages/torch/autograd/__init__.py", line 288, in grad
grad_outputs_ = _make_grads(t_outputs, grad_outputs_, is_grads_batched=is_grads_batched)
File "/state/partition1/llgrid/pkg/anaconda/python-LLM-2023b/lib/python3.10/site-packages/torch/autograd/__init__.py", line 71, in _make_grads
raise RuntimeError("Mismatch in shape: grad_output["
RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([256, 1]) and output[0] has a shape of torch.Size([256, 1, 1]).
A similar outcome is produced when I try running the following for the GBM dataset.
python ./scripts/train.py --outdir ./results/scrna-gbm/model-cellot --config ./configs/tasks/gbm.yaml --config ./configs/models/cellot.yaml
Hi, can you point me to the 'Online Methods' referenced in the publication? Thanks!
Hello,
When I try to train on Sciplex3, it requires hvg-top1k-train-only.h5ad, but only hvg.h5ad exists in the downloaded dataset. Is there any script to preprocess hvg.h5ad? Or could I just rename the file?
thanks
Here is what I did after running all models (including the evaluation part) on 4i/cisplatin:
python3 ./scripts/plot.py --evaldir results/4i/drug-cisplatin/
The first task (plotting marginals) went fine, but the next thing (plotting umaps) gave the following error:
Plotting UMAPS.
Traceback (most recent call last):
File "[repo_name]/./scripts/plot.py", line 359, in <module>
app.run(main)
File "/state/partition1/llgrid/pkg/anaconda/python-LLM-2023b/lib/python3.10/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/state/partition1/llgrid/pkg/anaconda/python-LLM-2023b/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "[repo_name]/./scripts/plot.py", line 336, in main
plot_umaps(config_plotting, evaldir, outdir, setting, where)
File "[repo_name]/./scripts/plot.py", line 110, in plot_umaps
umaps[model] = load_single_umap(evaldir / f"model-{model}", setting, where)
File "[repo_name]/./scripts/plot.py", line 60, in load_single_umap
umaps = pd.read_csv(expdir / f"evals_{setting}_{where}" / "umap.csv", index_col=0)
File "/state/partition1/llgrid/pkg/anaconda/python-LLM-2023b/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
return _read(filepath_or_buffer, kwds)
File "/state/partition1/llgrid/pkg/anaconda/python-LLM-2023b/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/state/partition1/llgrid/pkg/anaconda/python-LLM-2023b/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
self._engine = self._make_engine(f, self.engine)
File "/state/partition1/llgrid/pkg/anaconda/python-LLM-2023b/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
self.handles = get_handle(
File "/state/partition1/llgrid/pkg/anaconda/python-LLM-2023b/lib/python3.10/site-packages/pandas/io/common.py", line 863, in get_handle
handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'results/4i/drug-cisplatin/model-cellot/evals_iid_data_space/umap.csv'
Did I miss something during the evaluation part that is supposed to produce the UMAP? The only things produced during evaluation are the imputed file and the evals file with all the metrics. Thanks!
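Before running the plotting script, it can be useful to list which of the files it expects are actually present. The sketch below follows the path pattern visible in the traceback (`model-{model}/evals_{setting}_{where}/umap.csv`) and the `knn_enrichment.csv` name mentioned in another issue on this page; `missing_eval_files` is a hypothetical helper, and the full list of files that plot.py reads is an assumption.

```python
from pathlib import Path


def missing_eval_files(evaldir, models, setting="iid", where="data_space",
                       filenames=("umap.csv", "knn_enrichment.csv")):
    """Return the expected evaluation files that do not exist yet."""
    missing = []
    for model in models:
        folder = Path(evaldir) / f"model-{model}" / f"evals_{setting}_{where}"
        for name in filenames:
            if not (folder / name).is_file():
                missing.append(folder / name)
    return missing


if __name__ == "__main__":
    for path in missing_eval_files("results/4i/drug-cisplatin", ["cellot"]):
        print("missing:", path)
```

If files are reported missing, the step that should have written them did not run or wrote elsewhere, which narrows down where to look.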