tosica's People

Contributors

jackiehanlab, jiaweichengo

tosica's Issues

Quick question about prediction result
Hi, in the output UMAPs of the prediction results, roughly 50% or more of the acinar cells are predicted as ductal cells, and a new cell type appears within the delta cell cluster. Is this a problem with the model design, or is it caused by the similarity of these cell types? Thanks a lot.

Data preprocessing

Hello,

Although I'm not an expert in this field, I have always been deeply fascinated by your work, especially your research on TOSICA.

I understand that you used some data preprocessing code and raw data in your TOSICA research. Since I am not an expert in this field, I would greatly appreciate access to the data preprocessing code and the raw data. This would help me understand your work better and get started with learning.

If possible, could you kindly share some data preprocessing code and your raw data with me? This would be immensely helpful, and I promise to use these resources with care, solely for the purposes of learning and research.

If you require any additional information or have any other requests, please feel free to let me know. I truly appreciate your time and assistance.

Once again, thank you for your outstanding work, and I look forward to your response.

Best regards,

Mask matrix issues

May I ask what the specific differences are between the GMT files provided in the code, and which kind of data each one applies to? If I want to use a new dataset with this model, how should I create a GMT file?
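
On the last question, here is a minimal sketch of writing a custom GMT file. It assumes the standard tab-separated GMT layout (set name, description, then the member genes on one line). The helpers `write_gmt` and `read_gmt`, the pathway names, and the gene lists are hypothetical and not part of TOSICA. Whether TOSICA accepts such a file unchanged should be checked against the GMT files shipped with the code.

```python
import os
import tempfile


def write_gmt(gene_sets, path, description="NA"):
    """Write {set_name: [genes]} as one tab-separated GMT line per set."""
    with open(path, "w") as handle:
        for name, genes in gene_sets.items():
            # GMT columns: set name, description, then the member genes.
            handle.write("\t".join([name, description, *genes]) + "\n")


def read_gmt(path):
    """Read a GMT file back into {set_name: [genes]}, dropping the description."""
    gene_sets = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                gene_sets[fields[0]] = fields[2:]
    return gene_sets


# Hypothetical gene sets, for illustration only.
example_sets = {
    "PATHWAY_A": ["INS", "GCG", "SST"],
    "PATHWAY_B": ["KRT19", "SOX9"],
}
gmt_path = os.path.join(tempfile.mkdtemp(), "custom.gmt")
write_gmt(example_sets, gmt_path)
round_trip = read_gmt(gmt_path)
```

Gene identifiers in the file would presumably need to match the `var_names` of the dataset, since genes that appear in no set cannot be linked to any pathway.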

The test data runs differently than the example

Hello,

Thank you for providing such a good piece of software. I'm having a small problem with it.

I ran TOSICA on the test data, but in new_adata after prediction, 2874 cells were assigned a predicted label that differs from their original Celltype.

import scanpy as sc
import TOSICA

# Reference data
ref_adata = sc.read('./demo_train.h5ad')
ref_adata = ref_adata[:, ref_adata.var_names]
print(ref_adata)
print(ref_adata.obs.Celltype.value_counts())

# Query data, restricted to the genes of the reference
query_adata = sc.read('./demo_test.h5ad')
query_adata = query_adata[:, ref_adata.var_names]
print(query_adata)
print(query_adata.obs.Celltype.value_counts())

# Training
TOSICA.train(ref_adata, gmt_path='./GO_bp.gmt', label_name='Celltype', epochs=3, project='hGOBP_demo')

# Prediction
model_weight_path = './hGOBP_demo/model-0.pth'
new_adata = TOSICA.pre(query_adata, model_weight_path=model_weight_path, project='hGOBP_demo')
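
To see which cell types account for the disagreements, the original and predicted labels can be tallied pair by pair. The following is a minimal sketch using only the standard library. The helper `summarize_mismatches` and the toy labels are hypothetical and not taken from the demo data.

```python
from collections import Counter


def summarize_mismatches(true_labels, predicted_labels):
    """Return (accuracy, Counter of (true, predicted) pairs that disagree)."""
    true_labels = list(true_labels)
    predicted_labels = list(predicted_labels)
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")
    # Count only the pairs where the prediction differs from the original label.
    mismatches = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    accuracy = 1 - sum(mismatches.values()) / len(true_labels)
    return accuracy, mismatches


# Toy labels for illustration; not taken from the demo data.
true = ["acinar", "acinar", "ductal", "delta"]
pred = ["ductal", "acinar", "ductal", "delta"]
accuracy, mismatches = summarize_mismatches(true, pred)
```

With the objects above, this might be called as `summarize_mismatches(new_adata.obs['Celltype'], new_adata.obs['Prediction'])`, assuming the predicted labels are stored in a column named `Prediction`.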

TOSICA pre function gives very different accuracy than evaluate in train function

Dear TOSICA makers,

Thanks for this great tool; I was amazed by the paper.

I tried classifying my cells and got a very high evaluation accuracy during training. However, when I use the trained model for predictions, the accuracy suddenly drops to a very low level. Why does the evaluate function in the fit_train function give different results than the predict function in the pre.py file?

best
philipp

Model depth=2 uses more than 8 times the GPU memory of depth=1 when training on the hPancreas dataset

@JackieHanLab
Problem: with model depth=2, training on the hPancreas dataset uses more than 8 times the GPU memory of depth=1, and 16 GB of GPU memory is not enough.
Environment: python=3.9, pytorch=1.12.1, Tesla T4 GPU with 16 GB.

I trained on the hPancreas dataset (demo_train.h5ad), strictly following the tutorial config:
max_g=300, max_gs=300, mask_ratio=0.015, n_unannotated=1, batch_size=8, embed_dim=48, depth=1, num_heads=4, lr=0.001, epochs= 10, lrf=0.01
With depth=2, GPU memory usage suddenly exceeds 16 GB about 8% of the way through the first epoch. Even when I set embed_dim=2 and num_heads=2, I still get an out-of-memory error on the 16 GB GPU.
error:
File "/home/user1/codes/TOSICA/TOSICA/customized_linear.py", line 52, in backward
grad_weight = grad_weight * mask
RuntimeError: CUDA out of memory. Tried to allocate 166.00 MiB (GPU 0; 14.76 GiB total capacity; 13.31 GiB already allocated; 157.75 MiB free; 13.94 GiB reserved in total by PyTorch)

When I then set depth=1 and kept embed_dim=48 and num_heads=4, GPU memory usage stayed below 2 GB, and training with epochs=30 ran without problems.
As I understand it, depth is the number of transformer blocks, which consist mostly of self-attention.
Could you check why depth=2 uses more than 8 times the GPU memory of depth=1? Did you use a 16 GB GPU to train with depth=2?
Thanks a lot in advance!

Code Consulting

Hi, what does the mask parameter in the transformer represent? Is it a single-cell matrix?
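
One possible reading, judging from the `grad_weight * mask` line in customized_linear.py quoted in another issue on this page, is that the mask is not a single-cell matrix but a binary gene-by-gene-set matrix that zeroes the weights between a gene and the gene sets it does not belong to. The sketch below illustrates that assumption only. The helper `build_mask`, the gene names, the gene sets, and the orientation (genes as rows) are all hypothetical.

```python
def build_mask(genes, gene_sets):
    """Return (set_names, mask) where mask[i][j] is 1 if genes[i] is in set j."""
    set_names = list(gene_sets)
    members = [set(gene_sets[name]) for name in set_names]
    # One row per gene, one column per gene set.
    mask = [[1 if gene in m else 0 for m in members] for gene in genes]
    return set_names, mask


# Hypothetical genes and gene sets, for illustration only.
genes = ["INS", "GCG", "KRT19", "SOX9"]
gene_sets = {
    "PATHWAY_A": ["INS", "GCG"],
    "PATHWAY_B": ["KRT19", "SOX9", "INS"],
}
set_names, mask = build_mask(genes, gene_sets)
```

Multiplying a weight matrix elementwise by such a mask keeps only the gene-to-set connections listed in the GMT file.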

Questions about reading certain datasets

Hi, I was looking for some of the datasets you used in this paper. For one dataset in particular, GSE1159677, I get errors when I try to use read_h5f with either scanpy or pandas, and also when using sc.read_10x_h5.

Did you use Seurat to read this file? I think the problem is caused by the sparse storage format: the count information here is a one-dimensional vector. Thanks a lot.
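
A one-dimensional count vector is what a compressed sparse layout looks like on disk: the non-zero values sit in one flat array, and two index arrays locate them. Below is a minimal sketch of expanding such a layout, assuming compressed sparse column (CSC) storage with arrays named `data`, `indices`, and `indptr`. Whether the file in question actually uses this layout is an assumption, and `csc_to_dense` and the toy values are hypothetical.

```python
def csc_to_dense(data, indices, indptr, shape):
    """Expand CSC arrays into a dense row-major list of lists."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        # Entries indptr[col] .. indptr[col + 1] - 1 belong to this column.
        for k in range(indptr[col], indptr[col + 1]):
            dense[indices[k]][col] = data[k]
    return dense


# Toy 3 x 2 matrix with three non-zero counts.
data = [5, 2, 7]        # non-zero values, stored as one flat vector
indices = [0, 2, 1]     # row index of each value
indptr = [0, 2, 3]      # start offset of each column in data
shape = (3, 2)
dense = csc_to_dense(data, indices, indptr, shape)
```

In practice a sparse-matrix library would do this without materializing the dense form. The loop is only meant to show how the flat vector maps back to a matrix.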

Could you share the detailed preprocessing code for the other 5 datasets?

Dear JackieHanLab,

Could you share the full preprocessing code for the other 5 datasets, or detailed instructions on how to preprocess each one?

I hope to reproduce your paper on all 6 datasets, but at the moment only the already preprocessed "hPancreas" data is available at https://figshare.com/projects/TOSICA_demo/158489

I have already downloaded GSE152805_RAW.tar for hBone and GSE132042 for mAtlas, but my attempts to preprocess these data failed.
Thanks very much in advance!

Sharing code to reproduce manuscript figures

Hello! This is a super interesting method. I was wondering if you were planning on releasing the code used to generate the figures in the manuscript, or on updating the tutorial to show how to pull out population-specific pathways (e.g. Fig4)? Thank you so much for your help!

Memory leakage

Hi, sorry to bother you. While trying to use TOSICA, I ran into problems setting up the environment. Here are the details:

  1. With the packages pytorch=1.7.1, torchvision=0.8.2, torchaudio=0.7.2, and cudatoolkit=10.1, training gets stuck while moving the model to the GPU.
  2. With the packages pytorch==1.12.1, torchvision==0.13.1, torchaudio==0.12.1, and cudatoolkit=11.6, GPU memory consumption keeps increasing during training. I suspect a memory leak.

Could you offer some suggestions for these situations? Thanks for your reply.

TOSICA install problem

Hi, I have tried to install TOSICA many times on different servers and never succeeded. : (
I noticed that the provided TOSICA.ymal does not include scanpy.
Could you provide a complete environment YAML file, so that everything installs in one step? : )

Problem with TOSICA metrics comparing

Hello! TOSICA is an interesting and great tool for annotation and integration.

I have some problems comparing TOSICA results with other methods.
I want to compare the TOSICA results with the other methods in scIB, but when I try to use scIB:

  1. I can't completely build the scIB environment.
  2. The scIB pipeline code is not clear to me.

I noticed that you plotted a nice figure in the TOSICA paper (Fig. 5).
If you could share your scIB requirement.ymal or requirement.txt file and the code you ran successfully, I would greatly appreciate it!
