
yixf-self / sccatch


This project forked from zjufanlab/sccatch


Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data

Home Page: https://www.sciencedirect.com/science/article/pii/S2589004220300663

License: GNU General Public License v3.0



Updated scCATCH 2.0

Requires R > 3.5; can be installed with devtools.

Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data

Recent advances in single-cell RNA sequencing (scRNA-seq) have enabled large-scale transcriptional characterization of thousands of cells across multiple complex tissues, making accurate cell type identification a prerequisite and vital step in scRNA-seq studies. Currently, the common practice in cell type annotation is to manually match the highly expressed marker genes of each identified cluster against known cell markers. This requires a priori knowledge and tends to be subjective in the choice of marker genes. In addition, such manual annotation is usually time-consuming.

To address these problems, we introduce scCATCH, a single-cell Cluster-based Annotation Toolkit for Cellular Heterogeneity. scCATCH covers the workflow from identifying cluster marker genes to annotating clusters, using an evidence-based score obtained by matching the identified potential marker genes with known cell markers in a tissue-specific cell taxonomy reference database (CellMatch).

download CellMatch

CellMatch includes 353 cell types and 686 related subtypes associated with 184 tissue types, together with 20,792 cell-specific marker genes and 2,097 references, covering human and mouse.

scCATCH provides two main functions, findmarkergenes and scCATCH, which together perform the automatic annotation of each identified cluster. Usage and examples are detailed below.

Cite


Shao et al., scCATCH: Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data, iScience, Volume 23, Issue 3, 27 March 2020. doi: 10.1016/j.isci.2020.100882. PMID: 32062421

News

1. scCATCH can handle large single-cell transcriptomic datasets containing more than 10,000 cells and more than 15 clusters.

2. scCATCH can also be used to annotate scRNA-seq data from cancer tissues.

Install

Install from the source package scCATCH_2.0.tar.gz:

# download the source package scCATCH_2.0.tar.gz, then install it from the local file
install.packages(pkgs = 'scCATCH_2.0.tar.gz', repos = NULL, type = 'source')

or

# install devtools and install scCATCH
install.packages(pkgs = 'devtools')
devtools::install_github('ZJUFanLab/scCATCH')

Usage

library(scCATCH)

1. Cluster marker gene identification

clu_markers <- findmarkergenes(object,
                               species = NULL,
                               cluster = 'All',
                               match_CellMatch = FALSE,
                               cancer = NULL,
                               tissue = NULL,
                               cell_min_pct = 0.25,
                               logfc = 0.25,
                               pvalue = 0.05)

Identifies potential marker genes for each cluster from a Seurat object (>= 3.0.0) after the default log1p normalization and cluster analysis. For each cluster, the expression level of every gene is compared with its expression in each of the other clusters; a gene is selected as a potential marker gene only if it is significantly more highly expressed in that cluster in every pairwise comparison. Gene names are revised to official NCBI Gene symbols (as of Jan. 10, 2020, https://www.ncbi.nlm.nih.gov/gene), and unmatched and duplicated genes are removed.
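The pairwise selection rule above can be illustrated with a minimal base-R sketch. It is not scCATCH's source code: the helper name find_pairwise_markers, the one-sided Wilcoxon rank-sum test, and the Seurat-3-style log fold change (difference of log-transformed mean expm1 values) are assumptions made for illustration.

```r
# Hypothetical sketch of pairwise marker selection; not scCATCH's own code.
# expr: genes x cells matrix of log1p-normalized values
# clusters: cluster label of each cell (same order as the columns of expr)
find_pairwise_markers <- function(expr, clusters, target,
                                  cell_min_pct = 0.25, logfc = 0.25,
                                  pvalue = 0.05) {
  clusters <- as.character(clusters)
  in_target <- clusters == target
  others <- setdiff(unique(clusters), target)
  keep <- vapply(rownames(expr), function(g) {
    x <- expr[g, in_target]
    # the gene must be detected in enough cells of the target cluster
    if (mean(x > 0) < cell_min_pct) return(FALSE)
    # ...and pass both cutoffs against every other cluster
    for (o in others) {
      y <- expr[g, clusters == o]
      # assumed Seurat-3-style log fold change of average expression
      fc <- log(mean(expm1(x)) + 1) - log(mean(expm1(y)) + 1)
      if (fc < logfc) return(FALSE)
      p <- suppressWarnings(
        wilcox.test(x, y, alternative = "greater")$p.value)
      if (!isTRUE(p < pvalue)) return(FALSE)
    }
    TRUE
  }, logical(1))
  rownames(expr)[keep]
}

# Toy data: 2 genes x 12 cells in 3 clusters of 4 cells each
expr <- rbind(
  geneA = c(3, 3.2, 2.9, 3.1, 0, 0, 0, 0, 0.1, 0, 0, 0),
  geneB = rep(1, 12)
)
clusters <- rep(c("1", "2", "3"), each = 4)
find_pairwise_markers(expr, clusters, target = "1")  # returns "geneA"
```

geneB is expressed evenly everywhere, so its fold change is 0 and it fails the logfc cutoff; geneA passes both cutoffs against clusters 2 and 3.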

object A Seurat object (>= 3.0.0) after the default log1p normalization and cluster analysis. Please make sure the data are log1p normalized and have been clustered before running the scCATCH pipeline.

species The species of the cells, either 'Human' or 'Mouse'. This argument must be defined.

cluster The clusters in which to identify potential marker genes, e.g. '1', '2', etc. The default 'All' finds potential marker genes for every cluster.

match_CellMatch For large datasets containing > 10,000 cells or > 15 clusters, it is strongly recommended to set match_CellMatch to TRUE, so that potential marker genes are first restricted to genes in the CellMatch database; otherwise the analysis may require a large amount of system memory.

cancer If match_CellMatch is set to TRUE and the sample is from cancer tissue, the cancer type may be defined. Select one or more related cancer types in 3.1.2 of Details for human and 3.2.2 of Details for mouse. The default is NULL for tissues without cancer.

tissue If match_CellMatch is set TRUE, then the tissue origin of cells must be defined. Select one or more related tissue types in Details. For tissues without cancer, please refer to 3.1.1 of Details for human tissue types and 3.2.1 of Details for mouse tissue types. For tissues with cancer, please refer to 3.1.2 of Details for human tissue types and 3.2.2 of Details for mouse tissue types.

cell_min_pct Include genes detected in at least this fraction of cells in the cluster. Default is 0.25.

logfc Include genes whose average expression shows at least this log fold change compared with every other cluster. Default is 0.25.

pvalue Include significantly highly expressed genes whose p value from the Wilcoxon rank-sum test against every other cluster is below this cutoff. Default is 0.05.
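The argument rules above (species must be given, tissue is required once match_CellMatch = TRUE, and the guideline for datasets with > 10,000 cells or > 15 clusters) can be checked before calling findmarkergenes. The sketch below is a hypothetical helper written for illustration; check_findmarker_args and recommend_match_cellmatch are not part of the scCATCH package.

```r
# Hypothetical pre-flight checks; these helpers are NOT exported by scCATCH.
check_findmarker_args <- function(species, match_CellMatch = FALSE,
                                  tissue = NULL) {
  if (length(species) != 1 || !species %in% c("Human", "Mouse"))
    stop("species must be 'Human' or 'Mouse'")
  if (isTRUE(match_CellMatch) && length(tissue) == 0)
    stop("tissue must be defined when match_CellMatch = TRUE")
  invisible(TRUE)
}

# Guideline from the match_CellMatch description: large datasets
# (> 10,000 cells or > 15 clusters) should set match_CellMatch = TRUE.
recommend_match_cellmatch <- function(n_cells, n_clusters) {
  n_cells > 10000 || n_clusters > 15
}

check_findmarker_args("Mouse", match_CellMatch = TRUE, tissue = "Kidney")
recommend_match_cellmatch(n_cells = 12000, n_clusters = 8)  # TRUE
```

With a Seurat object, the two counts could be taken from ncol(object) and length(levels(Seurat::Idents(object))).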

Output

clu_markers A list containing (1) a new data matrix in which genes are revised to official NCBI Gene symbols (as of Jan. 10, 2020, https://www.ncbi.nlm.nih.gov/gene), with unmatched and duplicated genes removed, and (2) a data.frame of the potential marker genes of each selected cluster, with the corresponding percentage of expressing cells and average fold change for each cluster.
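The gene revision step described above (map each gene to an official symbol, then drop unmatched and duplicated genes) can be sketched in base R. The lookup symbol_map is a toy stand-in for the NCBI Gene table, the gene names are placeholders, and keeping the first occurrence of a duplicated symbol is an assumption; scCATCH's actual rule may differ.

```r
# Hypothetical sketch: revise row names to official symbols and drop
# genes that are unmatched or duplicated after revision.
revise_gene_symbols <- function(mat, symbol_map) {
  official <- unname(symbol_map[rownames(mat)])     # NA when no match
  keep <- !is.na(official) & !duplicated(official)  # keep first occurrence
  out <- mat[keep, , drop = FALSE]
  rownames(out) <- official[keep]
  out
}

# Toy example: "geneA_alias" is an alias of "geneA"; "unknown1" has no match
mat <- matrix(1:8, nrow = 4,
              dimnames = list(c("geneA_alias", "geneA", "unknown1", "geneB"),
                              c("cell1", "cell2")))
symbol_map <- c(geneA_alias = "geneA", geneA = "geneA", geneB = "geneB")
revised <- revise_gene_symbols(mat, symbol_map)
revised
```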

2. Cluster annotation

clu_ann <- scCATCH(object,
                   species = NULL,
                   cancer = NULL,
                   tissue = NULL)

Evidence-based score and annotation for each cluster by matching the potential marker genes generated from findmarkergenes with known cell marker genes in tissue-specific cell taxonomy reference database (CellMatch).

object The data.frame containing marker genes and the corresponding expressed cells percentage and average fold change for each cluster from the output of findmarkergenes.

species The species of cells. Select 'Human' or 'Mouse'.

cancer If the sample is from cancer tissue and you want to match cell marker genes of cancer tissues in CellMatch, the cancer type may be defined. Select one or more related cancer types in 3.1.2 of Details for human and 3.2.2 of Details for mouse. The default is NULL for tissues without cancer.

tissue The tissue origin of cells. Select one or more related tissue types in Details. For tissues without cancer, please refer to 3.1.1 of Details for human tissue types and 3.2.1 of Details for mouse tissue types. For tissues with cancer, please refer to 3.1.2 of Details for human tissue types and 3.2.2 of Details for mouse tissue types.

Output

clu_ann A data.frame containing matched cell type for each cluster, related marker genes, evidence-based score and PMID.
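To make the matching step concrete, here is a deliberately simplified stand-in: it matches a cluster's potential marker genes against a toy reference table and ranks candidate cell types by the number of matched marker genes, breaking ties by the number of distinct supporting references. This is not the evidence-based score defined in the scCATCH paper; the table, its columns (cell_type, gene, pmid) and the helper rank_cell_types are hypothetical.

```r
# Toy reference table standing in for CellMatch (all rows are made up).
reference <- data.frame(
  cell_type = c("TypeX", "TypeX", "TypeY", "TypeY", "TypeY"),
  gene      = c("geneA", "geneB", "geneA", "geneC", "geneD"),
  pmid      = c("ref1", "ref2", "ref3", "ref3", "ref4"),
  stringsAsFactors = FALSE
)

# Simplified ranking, NOT scCATCH's evidence-based score:
# more matched markers first, then more distinct supporting references.
rank_cell_types <- function(markers, reference) {
  hits <- reference[reference$gene %in% markers, ]
  types <- unique(hits$cell_type)
  count_unique <- function(col) vapply(types, function(t)
    length(unique(hits[[col]][hits$cell_type == t])), integer(1),
    USE.NAMES = FALSE)
  res <- data.frame(cell_type = types,
                    n_markers = count_unique("gene"),
                    n_refs = count_unique("pmid"),
                    stringsAsFactors = FALSE)
  res <- res[order(-res$n_markers, -res$n_refs), , drop = FALSE]
  rownames(res) <- NULL
  res
}

ranked <- rank_cell_types(c("geneA", "geneC", "geneD"), reference)
ranked
```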

3. Details

3.1.1 For Human tissue, tissue types are listed as follows:

Adipose tissue-related: Abdominal adipose tissue; Adipose tissue; Brown adipose tissue; Fat pad; Subcutaneous adipose tissue; Visceral adipose tissue; White adipose tissue.

Bladder-related: Bladder; Urine.

Blood-related: Blood; Peripheral blood; Plasma; Serum; Umbilical cord blood; Venous blood.

Bone-related: Anterior cruciate ligament; Bone; Bone marrow; Cartilage; Intervertebral disc; Meniscus; Nucleus pulposus; Osteoarthritic cartilage; Periosteum; Skeletal muscle; Spinal cord; Synovial fluid; Synovium.

Brain-related: Brain; Dorsolateral prefrontal cortex; Embryonic brain; Embryonic prefrontal cortex; Fetal brain; Hippocampus; Inferior colliculus; Midbrain; Sympathetic ganglion.

Breast-related: Breast; Mammary epithelium.

Embryo-related: Embryo; Embryoid body; Embryonic brain; Embryonic prefrontal cortex; Embryonic stem cell; Germ; Primitive streak.

Esophagus-related: Esophagus.

Eye-related: Cornea; Corneal endothelium; Corneal epithelium; Eye; Lacrimal gland; Limbal epithelium; Optic nerve; Retina; Retinal pigment epithelium; Sclerocorneal tissue.

Fetus-related: Amniotic fluid; Amniotic membrane; Fetal brain; Fetal gonad; Fetal kidney; Fetal liver; Fetal pancreas; Placenta; Umbilical cord; Umbilical cord blood; Umbilical vein.

Gonad-related: Corpus luteum; Fetal gonad; Foreskin; Gonad; Ovarian cortex; Ovarian follicle; Ovary; Seminal plasma; Testis.

Hair-related: Chorionic villus; Hair follicle; Scalp.

Heart-related: Heart; Myocardium.

Intestine-related: Colon; Colorectum; Gastrointestinal tract; Gut; Intestinal crypt; Intestine; Jejunum; Large intestine; Small intestinal crypt; Small intestine.

Kidney-related: Adrenal gland; Fetal kidney; Kidney; Renal glomerulus.

Liver-related: Fetal liver; Liver.

Lung-related: Airway epithelium; Alveolus; Bronchoalveolar system; Lung.

Lymph-related: Lymph; Lymph node; Lymphoid tissue.

Muscle-related: Muscle; Skeletal muscle.

Nose-related: Nasal concha; Nasal epithelium; Sinonasal mucosa.

Oral cavity-related: Laryngeal squamous epithelium; Oral mucosa; Salivary gland; Sputum; Submandibular gland; Thyroid; Tonsil; Vocal fold.

Ovary-related: Corpus luteum; Ovarian cortex; Ovarian follicle; Ovary; Oviduct.

Pancreas-related: Fetal pancreas; Pancreas; Pancreatic acinar tissue; Pancreatic islet.

Prostate-related: Prostate.

Skin-related: Dermis; Skin.

Spleen-related: Spleen; Splenic red pulp.

Stomach-related: Gastric corpus; Gastric epithelium; Gastric gland; Gastrointestinal tract; Pyloric gland; Stomach.

Testis-related: Testis.

Tooth-related: Deciduous tooth; Dental pulp; Gingiva; Molar; Periodontal ligament; Premolar; Tooth.

Uterus-related: Endometrium; Endometrium stroma; Myometrium; Uterus; Vagina.

Vessel-related: Adventitia; Antecubital vein; Artery; Blood vessel; Umbilical vein.

Others: Ascites; Epithelium; Ligament; Pluripotent stem cell; Thymus; Whartons jelly.

3.1.2 For human cancer tissues, cancer types and the corresponding tissue types are listed as follows:

Acute Myelogenous Leukemia: Blood.

Acute Myeloid Leukemia: Bone marrow.

Adenoid Cystic Carcinoma: Salivary gland.

Alveolar Cell Carcinoma: Serum.

Astrocytoma: Brain.

B-Cell Lymphoma: Lymphoid tissue.

Bladder Cancer: Bladder.

Brain Cancer: Blood vessel; Brain.

Breast Cancer: Blood; Breast; Mammary gland.

Cholangiocarcinoma: Liver; Serum.

Chronic Lymphocytic Leukemia: Blood.

Chronic Myeloid Leukemia: Blood.

CNS Primitive Neuroectodermal Tumor: Brain.

Colon Cancer: Blood; Colon; Serum.

Colorectal Cancer: Blood; Colon; Colorectum; Gastrointestinal tract; Intestine; Liver; Lung; Venous blood.

Cutaneous Squamous Cell Carcinoma: Skin.

Endometrial Cancer: Endometrium.

Ependymoma: Brain.

Esophageal Adenocarcinoma: Esophagus.

Fibroid: Myometrium.

Follicular Lymphoma: Lymph node.

Gallbladder Cancer: Gall bladder; Gastrointestinal tract.

Gastric Cancer: Blood; Peripheral blood; Serum; Stomach.

Glioblastoma: Blood; Brain.

Glioma: Blood vessel; Brain.

Gonadoblastoma: Embryo.

Head and Neck Cancer: Blood; Brain; Oral cavity.

Hepatoblastoma: Liver.

Hepatocellular Cancer: Blood; Bone marrow; Embryo; Liver.

High-grade glioma: Brain.

Infantile Hemangiomas: Placenta.

Intestinal Cancer: Gastrointestinal tract.

Intracranial Aneurysm: Brain.

Kaposi's Sarcoma: Lymph node.

Larynx Cancer: Larynx.

Leukemia: Bone marrow; Peripheral blood.

Lipoma: Adipose tissue.

Liver Cancer: Blood; Liver.

Lung Adenocarcinoma: Lung.

Lung Cancer: Blood; Lung.

Lung Squamous Cell Carcinoma: Lung.

Lymphoma: Blood; Brain; Kidney; Liver; Lymph; Lymph node.

Malignant Insulinoma: Pancreas.

Malignant Mesothelioma: Lung; Pleura.

Malignant Peripheral Nerve Sheath Tumor: Brain.

Medulloblastoma: Brain.

Melanoma: Blood; Peripheral blood; Skin.

Mucoepidermoid Carcinoma: Salivary gland.

Multiple Myeloma: Bone marrow; Peripheral blood.

Myeloma: Bone marrow.

Natural Killer Cell Lymphoma: Lymph node.

Nephroblastoma: Kidney.

Non-Small Cell Lung Cancer: Blood; Lung; Peripheral blood.

Oesophageal Cancer: Blood.

Oligodendroglioma: Brain.

Oral Cancer: Oral cavity.

Oral Squamous Cell Carcinoma: Oral cavity; Peripheral blood.

Osteosarcoma: Bone.

Ovarian Cancer: Ascites; Ovarian cortex; Ovary; Peripheral blood.

Pancreatic Cancer: Blood vessel; Pancreas.

Pancreatic Ductal Adenocarcinomas: Pancreas.

Papillary Thyroid Carcinoma: Thyroid.

Prostate Cancer: Blood; Peripheral blood; Prostate.

Renal Cell Carcinoma: Kidney; Serum.

Renal Clear Cell Carcinoma: Lymph node.

Retinoblastoma: Eye.

Salivary Gland Tumor: Parotid gland; Salivary gland.

Sarcoma: Muscle.

Small Cell Lung Cancer: Lung.

Testicular Germ Cell Tumor: Peripheral blood; Testis.

Thyroid Cancer: Thyroid.

Tongue Cancer: Tongue.

Uterine Leiomyoma: Uterus.

Vascular Tumour: Lymph node.
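Because each cancer type in CellMatch is paired with specific tissue types, a quick check can catch mismatched cancer and tissue arguments before running the pipeline. The sketch below encodes only a few rows of the human table above; the helper valid_cancer_tissue and the rule that every requested tissue must be listed for at least one of the requested cancers are assumptions for illustration.

```r
# Excerpt of the human cancer-to-tissue table above (not the full list).
human_cancer_tissue <- list(
  "Bladder Cancer"       = c("Bladder"),
  "Colon Cancer"         = c("Blood", "Colon", "Serum"),
  "Glioma"               = c("Blood vessel", "Brain"),
  "Lung Cancer"          = c("Blood", "Lung"),
  "Renal Cell Carcinoma" = c("Kidney", "Serum")
)

# TRUE if every cancer is known and every tissue is listed for at least
# one of the requested cancers (an assumed rule, for illustration only).
valid_cancer_tissue <- function(cancer, tissue, lookup = human_cancer_tissue) {
  if (!all(cancer %in% names(lookup))) return(FALSE)
  all(tissue %in% unlist(lookup[cancer], use.names = FALSE))
}

valid_cancer_tissue("Glioma", "Brain")  # TRUE
valid_cancer_tissue("Glioma", "Lung")   # FALSE
```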

3.2.1 For Mouse tissue, tissue types are listed as follows:

Adipose tissue-related: Adipose tissue; White adipose tissue.

Bladder-related: Bladder.

Blood-related: Blood; Peripheral blood; Serum; Umbilical cord blood.

Bone-related: Bone; Bone marrow; Meniscus; Skeletal muscle; Spinal cord.

Brain-related: Brain; Cerebellum; Fetal brain; Hippocampus; Neural tube.

Breast-related: Mammary epithelium; Mammary gland.

Calvaria-related: Neonatal calvaria.

Ear-related: Cochlea; Inner Ear.

Embryo-related: Embryo; Embryoid body; Embryonic heart; Embryonic stem cell.

Esophagus-related: Esophagus.

Eye-related: Corneal epithelium; Eye; Ganglion cell layer of retina; Inner nuclear layer of retina; Lacrimal gland; Retina.

Fetus-related: Fetal brain; Fetal intestine; Fetal liver; Fetal lung; Fetal stomach; Placenta; Umbilical cord; Umbilical cord blood.

Gonad-related: Gonad; Ovary; Testis; Yolk sac.

Hair-related: Hair follicle.

Heart-related: Embryonic heart; Heart; Heart muscle; Neonatal heart.

Intestine-related: Colon; Colon epithelium; Fetal intestine; Gastrointestinal tract; Ileum; Intestinal crypt; Intestine; Mesenteric lymph node; Small intestine.

Kidney-related: Kidney; Mesonephros.

Liver-related: Fetal liver; Liver.

Lung-related: Bronchiole; Fetal lung; Lung; Trachea.

Lymph-related: Lymph node; Lymphoid tissue; Mesenteric lymph node; Peyer patch.

Muscle-related: Heart muscle; Muscle; Neonatal muscle; Skeletal muscle.

Neonate-related: Neonatal calvaria; Neonatal heart; Neonatal muscle; Neonatal pancreas; Neonatal rib; Neonatal skin.

Oral cavity-related: Submandibular gland; Taste bud.

Ovary-related: Ovary; Yolk sac.

Pancreas-related: Neonatal pancreas; Pancreas; Pancreatic islet.

Prostate-related: Prostate.

Skin-related: Dermis; Epidermis; Neonatal skin; Skin.

Spleen-related: Spleen.

Stomach-related: Fetal stomach; Gastrointestinal tract; Stomach.

Testis-related: Testis.

Uterus-related: Uterus.

Vessel-related: Aorta; Artery; Blood vessel; Carotid artery.

Others: Basilar membrane; Epithelium; Peritoneal cavity; Thymus.
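The grouped lists above can be turned into the character vector that the tissue argument expects. This hypothetical helper (tissues_for) covers only three mouse groups copied from the list; for example, "Kidney-related" expands to the c('Kidney','Mesonephros') vector used in the Examples.

```r
# Excerpt of the mouse tissue groups listed above (not the full list).
mouse_tissue_groups <- list(
  "Kidney-related" = c("Kidney", "Mesonephros"),
  "Liver-related"  = c("Fetal liver", "Liver"),
  "Lung-related"   = c("Bronchiole", "Fetal lung", "Lung", "Trachea")
)

# Hypothetical helper: expand one or more group names into tissue types.
tissues_for <- function(groups, lookup = mouse_tissue_groups) {
  unknown <- setdiff(groups, names(lookup))
  if (length(unknown) > 0)
    stop("unknown tissue group(s): ", paste(unknown, collapse = ", "))
  unique(unlist(lookup[groups], use.names = FALSE))
}

tissues_for("Kidney-related")  # "Kidney" "Mesonephros"
```

The result could then be passed as tissue = tissues_for('Kidney-related') to findmarkergenes or scCATCH.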

3.2.2 For mouse cancer tissues, cancer types and the corresponding tissue types are listed as follows:

Breast Cancer: Lymph node; Breast; Lung.

Chronic Myeloid Leukemia: Blood.

Colon Cancer: Colon.

Colorectal Cancer: Lymph node; Colon; Colorectum.

Cutaneous Squamous Cell Carcinoma: Skin.

Hepatocellular Cancer: Blood; Liver.

Liver Cancer: Liver.

Lung Cancer: Lung.

Melanoma: Lung.

Pancreatic Cancer: Blood.

Papillary Thyroid Carcinoma: Thyroid.

Prostate Cancer: Prostate.

Renal Cell Carcinoma: Kidney.

Supratentorial Primitive Neuroectodermal Tumor: Brain.

Examples

# Step 1: prepare a Seurat object containing the log1p-normalized single-cell transcriptomic data matrix and the cell cluster information.
# Note: please define the species ('Human' or 'Mouse') for revising gene symbols. By default, potential marker genes are identified for all clusters, requiring expression in >= 25% of cells, a Wilcoxon rank-sum (WRS) test P < 0.05 and a log1p fold change >= 0.25. These parameters can be adjusted by users.

clu_markers <- findmarkergenes(object = mouse_kidney_203_Seurat,
                               species = 'Mouse',
                               cluster = 'All',
                               match_CellMatch = FALSE,
                               cancer = NULL,
                               tissue = NULL,
                               cell_min_pct = 0.25,
                               logfc = 0.25,
                               pvalue = 0.05)

# Note: for large datasets, please set match_CellMatch to TRUE and provide the tissue types. For tissues with cancer, users may provide the cancer types and the corresponding tissue types. See Details.
# Step 2: evidence-based scoring and annotation of the potential marker genes identified for each cluster by findmarkergenes.

clu_ann <- scCATCH(object = clu_markers$clu_markers,
                   species = 'Mouse',
                   cancer = NULL,
                   tissue = 'Kidney')

# Users can also run scCATCH on selected clusters, cancer types and tissue types, for example:
clu_markers <- findmarkergenes(object = mouse_kidney_203_Seurat,
                               species = 'Mouse',
                               cluster = '1',
                               match_CellMatch = TRUE,
                               cancer = NULL,
                               tissue = 'Kidney',
                               cell_min_pct = 0.1,
                               logfc = 0.1,
                               pvalue = 0.01)

clu_markers <- findmarkergenes(object = mouse_kidney_203_Seurat,
                               species = 'Mouse',
                               cluster = c('1','2'),
                               match_CellMatch = TRUE,
                               cancer = NULL,
                               tissue = c('Kidney','Mesonephros'))
Note: please select the right cancer type and the corresponding tissue type (See Details).
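Step 1 assumes the Seurat object already holds log1p-normalized data. For readers unfamiliar with that transform, this base-R sketch reproduces the formula of Seurat's default "LogNormalize" method: counts are divided by each cell's total, multiplied by a scale factor of 10,000, then log1p-transformed. The helper log_normalize is illustrative only; in practice the normalization is done with Seurat's NormalizeData before clustering.

```r
# Illustrative LogNormalize-style transform; use Seurat::NormalizeData in practice.
# counts: genes x cells matrix of raw counts (each cell needs a nonzero total)
log_normalize <- function(counts, scale_factor = 1e4) {
  lib_size <- colSums(counts)  # total counts per cell
  log1p(sweep(counts, 2, lib_size, "/") * scale_factor)
}

counts <- matrix(c(10, 0, 90,   5, 5, 0), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("cell1", "cell2")))
norm <- log_normalize(counts)
round(norm, 2)
```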

Issues

Solutions for possible bugs and errors can be found in the closed issues Issues1 and Issues2.

Contributors

Xin Shao Xiaohui Fan

scCATCH was developed by Xin Shao. Should you have any questions, please contact Xin Shao at [email protected]
