jackiehanlab / tosica
Transformer for One-Stop Interpretable Cell-type Annotation
License: MIT License
Hello,
Although I'm not an expert in this field, I have long been fascinated by your work, especially your research on TOSICA.
I understand that your TOSICA research relied on data preprocessing code and raw data. Since I am new to this field, access to them would help me understand your work and get started with my own learning.
If possible, could you share the data preprocessing code and the raw data with me? I promise to use these resources with care, solely for learning and research.
If you need any additional information from me, please let me know. I truly appreciate your time and assistance.
Once again, thank you for your outstanding work. I look forward to your response.
Best regards,
May I ask what the specific differences are between the GMT files provided in the code, and which data each one applies to? If I want to use a new dataset with this model, how should I create a GMT file?
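For readers with the same question, here is a minimal sketch of writing a custom GMT file. It relies only on the general GMT convention (one gene set per line: set name, a description field, then gene symbols, all tab-separated). The helper names `write_gmt` and `read_gmt` and the example gene sets are hypothetical and are not part of TOSICA; the gene symbols in a real file would need to match the `var_names` of the dataset.

```python
def write_gmt(gene_sets, path, description="NA"):
    """Write {set_name: [gene, ...]} as a tab-separated GMT file."""
    with open(path, "w", encoding="utf-8") as fh:
        for name, genes in gene_sets.items():
            # Drop duplicate genes while keeping their original order.
            unique_genes = list(dict.fromkeys(genes))
            fh.write("\t".join([name, description, *unique_genes]) + "\n")


def read_gmt(path):
    """Read a GMT file back into {set_name: [gene, ...]}."""
    gene_sets = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.rstrip("\n").split("\t")
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            gene_sets[parts[0]] = parts[2:]
    return gene_sets


if __name__ == "__main__":
    # Hypothetical gene sets, for illustration only.
    example_sets = {
        "EXAMPLE_T_CELL_SET": ["CD3D", "CD3E", "CD2"],
        "EXAMPLE_BETA_CELL_SET": ["INS", "IAPP"],
    }
    write_gmt(example_sets, "custom_sets.gmt")
    print(read_gmt("custom_sets.gmt"))
```

The resulting path could then be passed as `gmt_path`, in the same way `./GO_bp.gmt` is passed to `TOSICA.train` elsewhere on this page.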
Hello,
Thank you for providing such a good piece of software. I'm having a small problem with it.
I ran TOSICA on the test data, but in the new_adata returned by the prediction, 2874 cells were assigned a cell type different from their original Celltype label.
import scanpy as sc
import TOSICA

# Reference data
ref_adata = sc.read('./demo_train.h5ad')
ref_adata = ref_adata[:, ref_adata.var_names]
print(ref_adata)
print(ref_adata.obs.Celltype.value_counts())
# Query data, restricted to the genes of the reference
query_adata = sc.read('./demo_test.h5ad')
query_adata = query_adata[:, ref_adata.var_names]
print(query_adata)
print(query_adata.obs.Celltype.value_counts())
# Training
TOSICA.train(ref_adata, gmt_path='./GO_bp.gmt', label_name='Celltype', epochs=3, project='hGOBP_demo')
# Prediction
model_weight_path = './hGOBP_demo/model-0.pth'
new_adata = TOSICA.pre(query_adata, model_weight_path=model_weight_path, project='hGOBP_demo')
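To see which cell types account for the mismatches, the original and predicted labels can be compared pairwise. The sketch below uses only the standard library; `compare_labels` is a hypothetical helper, and the `Prediction` column named in the usage comment is an assumption about where the predicted labels are stored, not something confirmed on this page.

```python
from collections import Counter


def compare_labels(true_labels, predicted_labels):
    """Return (accuracy, Counter of (true, predicted) pairs that disagree)."""
    true_labels = list(true_labels)
    predicted_labels = list(predicted_labels)
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("no cells to compare")
    # Count every (true, predicted) pair once.
    confusion = Counter(zip(true_labels, predicted_labels))
    n_match = sum(n for (t, p), n in confusion.items() if t == p)
    mismatches = Counter(
        {pair: n for pair, n in confusion.items() if pair[0] != pair[1]}
    )
    return n_match / len(true_labels), mismatches


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["alpha", "alpha", "beta", "delta"]
    predicted = ["alpha", "beta", "beta", "Unknown"]
    accuracy, mismatches = compare_labels(truth, predicted)
    print(accuracy)
    print(mismatches.most_common())
    # With real data (column name 'Prediction' is an assumption):
    # compare_labels(new_adata.obs['Celltype'], new_adata.obs['Prediction'])
```

Sorting the mismatches by count shows whether the 2874 differing cells are concentrated in a few label pairs or spread across all types.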
I installed Python 3.8, but running pip install . fails with:
ERROR: Package 'TOSICA' requires a different Python: 3.6.15 not in '>=3.8'
Is this a bug?
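The error message reports Python 3.6.15, which suggests that the `pip` on the PATH belongs to a different interpreter than the newly installed 3.8. A minimal sketch of a check, assuming the goal is to run pip through the interpreter that is actually active; `pip_install_command` is a hypothetical helper, not part of TOSICA or pip.

```python
import sys


def pip_install_command(version_info=None, min_version=(3, 8)):
    """Build a pip command bound to the running interpreter.

    Raises RuntimeError if that interpreter is older than min_version.
    """
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if current < tuple(min_version):
        raise RuntimeError(
            "Python %d.%d is active, but >=%d.%d is required"
            % (current + tuple(min_version))
        )
    # 'python -m pip' guarantees pip runs under this exact interpreter.
    return [sys.executable, "-m", "pip", "install", "."]


if __name__ == "__main__":
    print(sys.executable)
    print(" ".join(pip_install_command()))
```

Running the printed command from the repository root installs the package with the same interpreter that passed the version check.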
Dear TOSICA makers,
Thanks for this great tool. I was very impressed by the paper.
I tried classifying my cells and got a very high evaluation accuracy during training. However, when I use the trained model for predictions, the accuracy drops to a very low level. Why does the evaluate function called in fit_train give different results than the prediect function in pre.py?
Best,
Philipp
@JackieHanLab
Problem: with depth=2, training on the hPancreas dataset uses more than 8 times the GPU memory of depth=1, and a 16 GB GPU is not enough.
Environment: python=3.9, pytorch=1.12.1, Tesla T4 16 GB GPU.
I trained on the hPancreas dataset (demo_train.h5ad), strictly following the tutorial config:
max_g=300, max_gs=300, mask_ratio=0.015, n_unannotated=1, batch_size=8, embed_dim=48, depth=1, num_heads=4, lr=0.001, epochs=10, lrf=0.01
GPU memory usage suddenly exceeds 16 GB about 8% of the way through the first epoch. Even with embed_dim=2 and num_heads=2, training still runs out of memory on the 16 GB GPU.
Error:
File "/home/user1/codes/TOSICA/TOSICA/customized_linear.py", line 52, in backward
grad_weight = grad_weight * mask
RuntimeError: CUDA out of memory. Tried to allocate 166.00 MiB (GPU 0; 14.76 GiB total capacity; 13.31 GiB already allocated; 157.75 MiB free; 13.94 GiB reserved in total by PyTorch)
Then I set depth=1, keeping embed_dim=48 and num_heads=4. GPU memory usage stays below 2 GB, and training with epochs=30 runs without problems.
As I understand it, depth is the number of transformer blocks, which consist mostly of self-attention.
Could you check why depth=2 uses more than 8 times the GPU memory of depth=1? Did you train with depth=2 on a 16 GB GPU?
Thanks a lot in advance!
Hi, what does the mask parameter in the transformer represent? Is it a single-cell matrix?
Hi, I am looking for some of the datasets you used in this paper. For one particular dataset, GSE1159677, I get errors when I try to read it with read_h5f, using either scanpy or pandas.
By using sc.read_10x_h5
Did you use Seurat to read this file? I think the problem is caused by the sparse storage format: the count information here is a one-dimensional vector. Thanks a lot.
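A one-dimensional count vector is what compressed sparse storage looks like. In the 10x Genomics HDF5 layout, the matrix is stored as compressed sparse column arrays (`data`, `indices`, `indptr`) plus a `shape`, with features as rows and barcodes as columns. Whether this particular file follows that layout is an assumption. The sketch below uses toy arrays and a hypothetical helper, `csc_to_dense`, to show how the three vectors map back to a matrix.

```python
def csc_to_dense(data, indices, indptr, shape):
    """Expand compressed sparse column (CSC) arrays into a dense list of rows.

    data    -- non-zero values, column by column
    indices -- row index of each value in data
    indptr  -- indptr[j]:indptr[j + 1] is the slice of data for column j
    shape   -- (n_rows, n_cols)
    """
    n_rows, n_cols = shape
    if len(indptr) != n_cols + 1:
        raise ValueError("indptr must have n_cols + 1 entries")
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        for k in range(indptr[col], indptr[col + 1]):
            dense[indices[k]][col] = data[k]
    return dense


if __name__ == "__main__":
    # Toy example: 3 genes (rows) x 2 cells (columns).
    dense = csc_to_dense(
        data=[5, 1, 2], indices=[0, 2, 1], indptr=[0, 2, 3], shape=(3, 2)
    )
    for row in dense:
        print(row)
```

For real data, a sparse matrix class such as `scipy.sparse.csc_matrix((data, indices, indptr), shape=shape)` avoids materializing the dense matrix.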
Dear JackieHanLab,
Could you share the full preprocessing code for the other 5 datasets, or detailed preprocessing instructions for each of them?
I hope to reproduce your paper on all 6 datasets, but so far only the preprocessed "hPancreas" data is available, at https://figshare.com/projects/TOSICA_demo/158489
I have already downloaded GSE152805_RAW.tar for hBone and GSE132042 for mAtlas, but my attempts to preprocess these data failed.
Thanks very much in advance!
Hello! This is a super interesting method. I was wondering if you are planning to release the code used to generate the figures in the manuscript, or to update the tutorial to show how to pull out population-specific pathways (e.g., Fig. 4)? Thank you so much for your help!
Hi, I am sorry to bother you. When I tried to use TOSICA, I ran into problems installing the environment. Here are the details:
Could you offer some suggestions for this situation? Thanks for your reply.
Hi, I have tried to install TOSICA many times on different servers and never succeeded : (
Could you provide a complete environment YAML file? I noticed that TOSICA.ymal doesn't include scanpy.
I hope you can provide a complete YAML file for a one-stop install : )
Hello! TOSICA is an interesting and great tool for annotation and integration.
I have run into some problems when comparing TOSICA results.
I want to compare the TOSICA results with other methods (in scIB),
but when I try to use scIB:
Thanks very much for sharing the code. Could you share the hardware configuration (mainly the GPU) used to train the models mentioned in the paper?
How long did training take?
Thanks again!
Hi, can you share the demo_train.h5ad data used in test.ipynb?
I would like to use TOSICA for cell classification. Can you provide specific examples, especially of how to construct a training set?
When I import TOSICA, I get AttributeError: module 'numpy' has no attribute 'long'. If I downgrade my numpy version, I get other compatibility issues.
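The `np.long` alias was deprecated in NumPy 1.20 and removed in 1.24, which matches this error. One possible workaround that avoids downgrading is to restore the alias before TOSICA is imported. This is a sketch under that assumption, not a fix confirmed by the maintainers; `add_missing_alias` is a hypothetical helper, and replacing `np.long` with `np.int64` (or `int`) in the package source would be the cleaner long-term change.

```python
def add_missing_alias(module, name, replacement):
    """Set module.<name> = replacement only when the attribute is missing.

    Returns True if the alias was added, False if it already existed.
    """
    if hasattr(module, name):
        return False
    setattr(module, name, replacement)
    return True


# Intended usage, run before 'import TOSICA' (assumes numpy is installed):
#
#     import numpy as np
#     add_missing_alias(np, "long", np.int64)
#     import TOSICA
```

The `hasattr` guard leaves NumPy untouched on versions that still provide `np.long`, so the shim only takes effect where the attribute is absent.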