Comments (4)

JiaweiChenGo commented on September 3, 2024

Thanks for your interest.
A running demo is here. To construct a training set, choose a well-annotated dataset that suits your research needs, preprocess it with sc.pp.normalize_total, sc.pp.log1p and sc.pp.highly_variable_genes, and save it as an AnnData object.

from tosica.

zclecle2 commented on September 3, 2024

Thanks for developing the tool for automatic cell type annotation!

I also want to ask how to prepare the training set. Is the following code enough for preparation, supposing that train_adata originally contains 35699 cells with 18010 genes?
sc.pp.normalize_total(train_adata, target_sum=1e4)
sc.pp.log1p(train_adata)
sc.pp.highly_variable_genes(train_adata)

Or do I need to filter train_adata so that it contains only the highly variable genes?
Is it OK if my train_adata is already normalized, for example exported from the data layer of a Seurat object, and I still run it through the three lines of code above?
Do you have any suggestions on how to choose epochs and gmt_path to get better training and prediction results? Which value should I watch to assess whether training went well if I don't know the true cell types of the query data? Should I stop increasing the number of epochs once the accu value nearly flattens?
When I trained on my own reference dataset, I found that the initial accu value was quite low (shown in the following image). Is this normal? (train_adata originally contained 35699 cells with 13295 genes, and I ran the three lines above.)
image
I would appreciate your reply!

SteGruener commented on September 3, 2024

I would also be interested in answers to the questions raised above.

JiaweiChenGo commented on September 3, 2024
  1. I would use HVGs to train the model. If all genes are used, there are more parameters to train and training takes longer.
  2. The three lines are used to normalize the data. Normalized data can be used as input for the model.
  3. Yes, you can stop increasing the number of epochs when the accu value nearly flattens.
  4. As described in Supplementary Figure 8 of the paper, you can choose any knowledge mask depending on the biological context or your research interests.
  5. For unknown query data, you can use UMAP to visualize the query and reference data in the TOSICA attention latent space and see whether the result is reasonable. You can also get the marker genes of each predicted cell type group to check the annotation.
  6. When you use all genes, the model is much larger and the accu value increases slowly.
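
One possible way to make "the accu value nearly flattens" (point 3) concrete is a simple plateau check over the per-epoch accuracies. This is a sketch under assumptions: the function has_plateaued, its patience and min_delta thresholds, and the accuracy values are all made up for illustration and are not part of TOSICA.

```python
def has_plateaued(accuracies, patience=3, min_delta=0.001):
    """Return True when the best accuracy of the last `patience` epochs
    beats the best earlier accuracy by less than `min_delta`."""
    if len(accuracies) <= patience:
        return False  # not enough history to compare against
    best_before = max(accuracies[:-patience])
    best_recent = max(accuracies[-patience:])
    return best_recent - best_before < min_delta


# Made-up per-epoch accuracies, copied by hand from a training log.
history = [0.42, 0.61, 0.74, 0.80, 0.83, 0.8305, 0.8302, 0.8307]

print(has_plateaued(history))      # accuracy has stopped improving
print(has_plateaued(history[:5]))  # still improving after five epochs
```

The thresholds are a judgment call: a larger patience tolerates noisy epochs, while a smaller min_delta waits for a flatter curve before stopping.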
