I would like to use TOSICA for cell classification. Can you provide specific examples,

use TOSICA for cell classification (tosica issue, 4 comments, open)

suhuanhou commented on September 3, 2024

use TOSICA for cell classification

Comments (4)

JiaweiChenGo commented on September 3, 2024

Thanks for your interest.
A running demo is available in the repository's tutorial notebook (test/tutorial.ipynb). To build a training set, choose a well-annotated dataset that suits your research needs, preprocess it with sc.pp.normalize_total, sc.pp.log1p and sc.pp.highly_variable_genes, and save it as an AnnData object.

zclecle2 commented on September 3, 2024

Thanks for developing the tool for automatic cell type annotation!

I also want to ask how to prepare the training set. Is the following code enough for preparation, supposing that train_adata originally contains 35699 cells with 18010 genes?
sc.pp.normalize_total(train_adata, target_sum=1e4)
sc.pp.log1p(train_adata)
sc.pp.highly_variable_genes(train_adata)

Or do I need to filter train_adata so that it contains only the highly variable genes?
Is it OK if my train_adata is already normalized (for example, exported from the data layer of a Seurat object) and I still run it through the three lines above?
Do you have any suggestions on how to choose epochs and gmt_path to get better training and prediction results? Which value should I watch to assess whether training went well if I don't know the true cell types of the query data? Should I stop increasing the number of epochs once the accu value nearly flattens?
When I trained on my own reference dataset, the initial accu value was quite low (shown in the attached image). Is this normal? (train_adata originally contained 35699 cells with 13295 genes, and I ran the three lines above.)

I would appreciate your reply!

SteGruener commented on September 3, 2024

I would also be interested in answers to the questions raised above.

JiaweiChenGo commented on September 3, 2024

I would use HVGs to train the model. If all genes are used, there are more parameters to train and training takes longer.
The three lines normalize the data. Data that have already been normalized can be used as input for the model.
Yes, you can stop increasing the number of epochs once the accu value nearly flattens.
As described in Supplementary Figure 8 of the paper, you can choose any knowledge mask depending on the biological context or your research interests.
For unknown query data, you can use UMAP to visualize the query and reference data in the TOSICA attention latent space and check whether the result is reasonable. You can also find the marker genes of each predicted cell type group to check the annotation.
When you use all genes, the model is much larger and the accu value increases slowly.
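The advice to stop once the accu value flattens can be turned into a simple rule of thumb. The sketch below is a generic plateau check, not part of TOSICA: the function name plateau_epoch, the patience and min_delta thresholds, and the accuracy values are all hypothetical, and it assumes you have recorded the per-epoch accu values yourself.

```python
def plateau_epoch(acc_history, patience=3, min_delta=0.001):
    """Return the 0-based epoch at which accuracy has stopped improving.

    An epoch counts as an improvement only if it beats the best value so far
    by more than min_delta. After `patience` consecutive epochs without
    improvement, that epoch's index is returned. Returns None if the
    accuracy never plateaus within the history.
    """
    best = float("-inf")
    stale = 0  # consecutive epochs without a meaningful improvement
    for epoch, acc in enumerate(acc_history):
        if acc > best + min_delta:
            best = acc
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Hypothetical per-epoch accu values copied from a training log.
history = [0.42, 0.61, 0.74, 0.81, 0.85, 0.8504, 0.8503, 0.8505, 0.86]
stop_at = plateau_epoch(history)
print(stop_at)  # 7: epochs 5, 6 and 7 did not improve on 0.85 by more than 0.001
```

A larger patience tolerates longer flat stretches, which may suit the slow early increase reported when all genes are used.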
