jackiehanlab / tosica

Transformer for One-Stop Interpretable Cell-type Annotation

License: MIT License

Python 1.38% Jupyter Notebook 98.62%

tosica's Issues

Quick question about prediction result

Hi, in the output UMAPs of the prediction results, more than 50% of the acinar cells are predicted as ductal cells, and a new cell type appears in the delta cluster. Is this a problem with the model design, or is it caused by the similarity of these cell types? Thanks a lot.

Data preprocessing

Hello,

Although I'm not an expert in this field, I have always been deeply fascinated by your work, especially your research on TOSICA.

I understand that your TOSICA research relied on data preprocessing code and raw data. As a non-expert in this field, I would greatly appreciate access to some of that preprocessing code and the raw data. This would help me understand your work better and start learning.

If possible, could you kindly share some data preprocessing code and your raw data with me? This would be immensely helpful, and I promise to use these resources with care, solely for the purposes of learning and research.

If you require any additional information or have any other requests, please feel free to let me know. I truly appreciate your time and assistance.

Once again, thank you for your outstanding work, and I look forward to your response.

Best regards,

Mask matrix issues

May I ask what the specific differences are between the several GMT files provided in the code? Which data is each of them applicable to? If I want to use a new dataset with this model, how should I create a GMT file?
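As a starting point for the last question, here is a minimal sketch of producing and reading GMT text in Python. It relies only on the general GMT layout (one gene set per line: set name, a description field, then the member genes, all tab-separated). The function names `format_gmt` and `parse_gmt`, the set names, and the gene lists are hypothetical and not part of TOSICA; which gene sets suit a given dataset is a question only the authors' answer can settle.

```python
from typing import Dict, List


def format_gmt(gene_sets: Dict[str, List[str]], description: str = "NA") -> str:
    """Render gene sets as GMT text: name, description, genes (tab-separated)."""
    lines = []
    for name, genes in gene_sets.items():
        unique_genes = list(dict.fromkeys(genes))  # drop duplicates, keep order
        lines.append("\t".join([name, description, *unique_genes]))
    return "\n".join(lines) + "\n"


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Parse GMT text back into {set name: [genes]}; skips malformed lines."""
    gene_sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:  # need a name, a description and at least one gene
            continue
        gene_sets[fields[0]] = fields[2:]
    return gene_sets


# Hypothetical gene sets for illustration only.
example_sets = {
    "HORMONE_DEMO": ["INS", "GCG", "INS"],
    "ACINAR_DEMO": ["PRSS1", "CPA1"],
}
gmt_text = format_gmt(example_sets)
print(gmt_text)
# To save it for use as gmt_path:
# with open("custom_sets.gmt", "w") as handle:
#     handle.write(gmt_text)
```

The gene symbols in the file should follow the same naming convention as `var_names` in the dataset (for example human versus mouse symbols), otherwise no gene will match a set.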

The test data runs differently than the example

Hello,

Thank you for providing such a good piece of software. I'm having a small problem with it.

I ran TOSICA with the test data, but in new_adata after prediction, 2874 cells were assigned a cell type different from their original Celltype label.

import scanpy as sc
import TOSICA

# Reference data
ref_adata = sc.read('./demo_train.h5ad')
ref_adata = ref_adata[:, ref_adata.var_names]  # selects all of its own genes, so nothing changes
print(ref_adata)
print(ref_adata.obs.Celltype.value_counts())

# Query data, subset and reordered to the reference genes
query_adata = sc.read('./demo_test.h5ad')
query_adata = query_adata[:, ref_adata.var_names]
print(query_adata)
print(query_adata.obs.Celltype.value_counts())

# Training
TOSICA.train(ref_adata, gmt_path='./GO_bp.gmt', label_name='Celltype', epochs=3, project='hGOBP_demo')

# Prediction
model_weight_path = './hGOBP_demo/model-0.pth'
new_adata = TOSICA.pre(query_adata, model_weight_path=model_weight_path, project='hGOBP_demo')
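To see which cell types account for the 2874 mismatches, it can help to tabulate original labels against predicted ones. The sketch below uses only the standard library and toy labels. `compare_labels` is a hypothetical helper, and reading the predictions from a column such as `new_adata.obs['Prediction']` is an assumption about the output layout, not something stated in this thread. Note also that the post trains for 3 epochs but loads `model-0.pth`; if checkpoints are written once per epoch (again an assumption), a later checkpoint may give different predictions.

```python
from collections import Counter
from typing import Sequence, Tuple


def compare_labels(truth: Sequence[str], predicted: Sequence[str]) -> Tuple[float, Counter]:
    """Return (accuracy, counts of (true label, predicted label) pairs)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    confusion = Counter(zip(truth, predicted))
    n_match = sum(n for (t, p), n in confusion.items() if t == p)
    accuracy = n_match / len(truth) if len(truth) else 0.0
    return accuracy, confusion


# Toy labels for illustration; with real data these might come from
# list(new_adata.obs['Celltype']) and list(new_adata.obs['Prediction'])
# (the 'Prediction' column name is an assumption).
truth = ["alpha", "alpha", "beta", "acinar", "acinar"]
predicted = ["alpha", "beta", "beta", "ductal", "acinar"]

accuracy, confusion = compare_labels(truth, predicted)
mismatches = {pair: n for pair, n in confusion.items() if pair[0] != pair[1]}
print(accuracy)
print(mismatches)
```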

[bug] There is a bug when I install the software

I installed Python 3.8, but I get an error when running pip install .:
ERROR: Package 'TOSICA' requires a different Python: 3.6.15 not in '>=3.8'
Is this a bug?
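The error message reports Python 3.6.15, which suggests that the `pip` being run belongs to a different interpreter than the Python 3.8 that was installed. A small check like the one below, run with the interpreter you intend to use, shows which executable and version are active. `satisfies_minimum` is a hypothetical helper written for this example.

```python
import sys
from typing import Sequence


def satisfies_minimum(version: Sequence[int], minimum: Sequence[int] = (3, 8)) -> bool:
    """True if the (major, minor) part of version is at least minimum."""
    return tuple(version[:2]) >= tuple(minimum)


print(sys.executable)            # path of the interpreter actually running
print(sys.version_info[:3])      # its (major, minor, micro) version
print(satisfies_minimum(sys.version_info))
```

If this prints a 3.8+ interpreter, installing with `python -m pip install .` from that same interpreter ensures pip and Python agree.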

TOSICA pre function gives very different accuracy than evaluate in train function

Dear TOSICA makers,

Thanks for this great tool; I was very impressed by the paper.

I tried classifying my cells and got a very high evaluation accuracy during training. However, when I use the trained model for predictions, the accuracy suddenly drops to a very low level! Why does the evaluate function in the fit_train function give different results than the prediect function in the pre.py file?

best
philipp

What will happen if the number or order of var_names is different?

If this requirement is mandatory, does it mean that the ref and query datasets must have their var_names aligned before each prediction on a new dataset?
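For context, the demo code elsewhere in these issues aligns the query with `query_adata[:, ref_adata.var_names]`, which subsets and reorders the query genes to the reference order. The sketch below shows the same idea on plain Python lists, with one addition: genes absent from the query are filled with a constant. That zero-filling is an assumption made for illustration, not a behavior confirmed for TOSICA, and `align_to_reference` is a hypothetical helper.

```python
from typing import List, Sequence


def align_to_reference(
    matrix: Sequence[Sequence[float]],
    query_genes: Sequence[str],
    ref_genes: Sequence[str],
    fill: float = 0.0,
) -> List[List[float]]:
    """Reorder the columns of a cells x genes matrix to follow ref_genes.

    Genes missing from the query get the fill value; extra query genes are dropped.
    """
    column_of = {gene: i for i, gene in enumerate(query_genes)}
    return [
        [row[column_of[gene]] if gene in column_of else fill for gene in ref_genes]
        for row in matrix
    ]


ref_genes = ["INS", "GCG", "SST"]
query_genes = ["SST", "INS", "KRT19"]      # different order, one gene missing, one extra
query_matrix = [[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]]

aligned = align_to_reference(query_matrix, query_genes, ref_genes)
print(aligned)  # [[2.0, 0.0, 1.0], [5.0, 0.0, 4.0]]
```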

Model depth=2 causes more than 8 times the GPU memory usage of depth=1 when training on the hPancreas dataset

@JackieHanLab
Problem: with depth=2, training on the hPancreas dataset uses more than 8 times the GPU memory of depth=1, and a 16 GB GPU is not enough.
Environment: python=3.9, pytorch=1.12.1, Tesla T4 16 GB GPU.

I train on the hPancreas dataset, i.e. demo_train.h5ad, strictly following the tutorial config:
max_g=300, max_gs=300, mask_ratio=0.015, n_unannotated=1, batch_size=8, embed_dim=48, depth=1, num_heads=4, lr=0.001, epochs= 10, lrf=0.01
The GPU memory usage suddenly exceeds 16 GB about 8% of the way through the first epoch. Even when I set embed_dim=2 and num_heads=2, I still get an OOM on the 16 GB GPU.
Error:
File "/home/user1/codes/TOSICA/TOSICA/customized_linear.py", line 52, in backward
grad_weight = grad_weight * mask
RuntimeError: CUDA out of memory. Tried to allocate 166.00 MiB (GPU 0; 14.76 GiB total capacity; 13.31 GiB already allocated; 157.75 MiB free; 13.94 GiB reserved in total by PyTorch)

Then I changed to depth=1 and kept embed_dim=48 and num_heads=4. The GPU memory usage stays below 2 GB, and training with epochs=30 works without problems.
As I understand it, depth is the number of transformer blocks, which consist mostly of the self-attention parts.
Could you check why depth=2 causes more than 8 times the GPU memory usage of depth=1? Did you use a 16 GB GPU to train with depth=2?
Thanks a lot in advance!!

Code Consulting

Hi, what does the mask parameter in the transformer represent? Is it a single-cell expression matrix?
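One reading, offered here as an assumption rather than a confirmed answer from the authors: the "Mask matrix issues" thread ties the mask to the GMT gene-set files, and the traceback in the GPU memory issue shows `grad_weight * mask` in customized_linear.py, so the mask looks like a binary gene-by-gene-set membership matrix applied to layer weights, not the expression matrix itself. The sketch below builds such a matrix from toy data; `build_mask` and the example sets are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_mask(
    genes: Sequence[str], gene_sets: Dict[str, List[str]]
) -> Tuple[List[List[int]], List[str]]:
    """Return a genes x gene-sets 0/1 matrix and the column (gene set) names.

    mask[i][j] is 1 when genes[i] belongs to the j-th gene set, else 0.
    """
    set_names = list(gene_sets)
    members = [set(gene_sets[name]) for name in set_names]
    mask = [[1 if gene in member else 0 for member in members] for gene in genes]
    return mask, set_names


# Toy inputs for illustration only.
genes = ["INS", "GCG", "PRSS1"]
gene_sets = {
    "HORMONE_DEMO": ["INS", "GCG"],
    "ACINAR_DEMO": ["PRSS1", "CPA1"],   # CPA1 is not among the genes, so it is ignored
}

mask, set_names = build_mask(genes, gene_sets)
print(set_names)  # ['HORMONE_DEMO', 'ACINAR_DEMO']
print(mask)       # [[1, 0], [1, 0], [0, 1]]
```

Multiplying a weight matrix elementwise by such a mask zeroes every gene-to-set connection that the gene sets do not support, which is what makes each output unit interpretable as one gene set.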

Questions about reading certain datasets

Hi, I have been looking for some of the datasets you used in this paper. For one particular dataset, GSE1159677, I get errors when I try to read the h5 file with either scanpy or pandas.

By using sc.read_10x_h5

Did you use Seurat to read this file? I think the problem is caused by the sparse format or the way the data is stored: the count information here is a 1-D vector. Thanks a lot.

Could you share the detailed preprocessing code for the other 5 datasets?

Dear JackieHanLab,

Could you share the full preprocessing code for the other 5 datasets, or detailed instructions on how to preprocess each one?

I hope to reproduce your paper on all 6 datasets. At the moment, only the preprocessed "hPancreas" data is available at https://figshare.com/projects/TOSICA_demo/158489

I have already downloaded GSE152805_RAW.tar for hBone and GSE132042 for mAtlas, but my attempts to preprocess these data failed.
Thanks very much in advance!!

Sharing code to reproduce manuscript figures

Hello! This is a super interesting method. I was wondering whether you are planning to release the code used to generate the figures in the manuscript, or to update the tutorial to show how to pull out population-specific pathways (e.g. Fig. 4)? Thank you so much for your help!

Memory leakage

Hi, I am sorry to bother you. When I tried to use TOSICA, I ran into trouble setting up the environment. Here are the details:

With the packages pytorch=1.7.1 torchvision=0.8.2 torchaudio=0.7.2 cudatoolkit=10.1, the training process gets stuck while moving the model to the GPU.
With the packages pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6, GPU memory consumption keeps increasing during training. I suspect a memory leak.

Could you offer some suggestions for these situations? Thanks for your reply.

TOSICA install problem

Hi, I have tried to install TOSICA many times on different servers and never succeeded : (
Could you provide the complete environment file? I noticed that TOSICA.ymal doesn't include scanpy.
I hope you can provide a complete environment file for a one-stop install : )

Problem with TOSICA metrics comparing

Hello! TOSICA is an interesting and great tool for annotation and integration.

I have some problems comparing TOSICA results.
I want to compare TOSICA results with other methods (using scIB),
but when I try to use scIB:

1. I can't fully build the scIB environment.
2. The scIB pipeline code isn't clear.
I noticed you plotted a nice figure in Fig. 5 of the TOSICA paper,
so if you could share your scIB requirement.ymal or requirement.txt file and the code you ran successfully, I would greatly appreciate it!

Could you share what hardware (GPU) was used to train this model?

@JackieHanLab

Thanks very much for sharing the code. Could you share the hardware configuration (mostly the GPU) used to train the models mentioned in the paper?
How long did training take?

Thanks again!!
