This project forked from yycunc/safeclustering

SAFEclustering

SAFE (Single-cell Aggregated clustering From Ensemble): Cluster ensemble for single-cell RNA-seq data

Although several methods have recently been developed for clustering cell types using single-cell RNA-seq (scRNA-Seq) data, they exploit different characteristics of the data and yield varying results in terms of both the number of clusters and the actual cluster assignments. Here, we present SAFE-clustering, Single-cell Aggregated (From Ensemble) clustering, a flexible, accurate and robust method for clustering scRNA-Seq data. SAFE-clustering takes the results of multiple clustering methods as input and builds one consensus solution. It currently embeds four state-of-the-art methods (SC3, CIDR, Seurat and t-SNE + k-means) and combines their solutions using three hypergraph-based partitioning algorithms.

SAFEclustering is maintained by Yuchen Yang [[email protected]]

News and Updates

July 24, 2018

  • Version 0.99.0 released
    • First official release
    • Currently works only on Mac and Linux platforms

Installation

You can install SAFEclustering from GitHub with:

install.packages("devtools")

devtools::install_github("yycunc/SAFEclustering")

Note that the hypergraph partitioning algorithm (HGPA) is performed using the shmetis program (from the hMETIS package v. 1.5 (Karypis et al., IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1999)), while the meta-clustering algorithm (MCLA) and the cluster-based similarity partitioning algorithm (CSPA) are performed using the gpmetis program (from METIS v. 5.1.0 (Karypis and Kumar, SIAM Journal on Scientific Computing, 1998)). Please download the versions of these two programs that match your operating system, and either put them in the working directory or provide the path to the directory that contains them.
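Before running the ensemble step, it can help to confirm that both executables are where you expect them. The following is a minimal sketch in base R. `check_partition_programs` is a hypothetical helper and not part of SAFEclustering, and it assumes the binaries are named exactly `shmetis` and `gpmetis`.

```r
# Hypothetical helper (not part of SAFEclustering): report which of the
# external partitioning programs are present in a given directory.
check_partition_programs <- function(program.dir = getwd(),
                                     programs = c("shmetis", "gpmetis")) {
  paths <- file.path(path.expand(program.dir), programs)
  found <- file.exists(paths)
  names(found) <- programs
  if (!all(found)) {
    # Warn rather than stop, so the caller decides how to react.
    warning("Missing in ", program.dir, ": ",
            paste(programs[!found], collapse = ", "))
  }
  found
}
```

Calling `check_partition_programs()` with no arguments checks the current working directory; passing a path checks that directory instead.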

SAFEclustering Examples

Here we provide examples using two datasets: one from Zheng et al. (Nature Communications, 2016) and the other from Biase et al. (Genome Research, 2014). The Zheng dataset contains 500 human peripheral blood mononuclear cells (PBMCs) sequenced using the GemCode platform and consists of three cell types: CD56+ natural killer cells, CD19+ B cells and CD4+/CD25+ regulatory T cells. The original data can be downloaded from the 10X GENOMICS website. The Biase dataset has 49 mouse embryo cells, which were sequenced by SMART-Seq and can be found at NCBI GEO:GSE57249.

Load the data

library("SAFEclustering")
data("data_SAFE")

Zheng dataset

Setup the input expression matrix

dim(data_SAFE$Zheng.expr)

data_SAFE$Zheng.expr[1:5, 1:5]

Perform individual clustering

Here we perform single-cell clustering using four popular methods, SC3, CIDR, Seurat and t-SNE + k-means, without filtering any genes or cells.

cluster.results <- individual_clustering(inputTags = data_SAFE$Zheng.expr, datatype = "count", 
mt_filter = FALSE, nGene_filter = FALSE, SC3 = TRUE, gene_filter = FALSE, CIDR = TRUE, 
nPC.cidr = NULL, Seurat = TRUE, nPC.seurat = NULL, resolution = 0.9, tSNE = TRUE, dimensions = 3, 
perplexity = 30, SEED = 123)

The function individual_clustering outputs a matrix in which each row holds the clustering results of one method and each column represents a cell. Users can also extend SAFE-clustering to other scRNA-seq clustering methods by putting all clustering results into an M * N matrix with M clustering methods and N cells.

cluster.results[1:4, 1:10]
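To bring in results from other clustering methods, the M * N matrix can be assembled by hand. The sketch below uses toy labels for six cells and three made-up methods (`method_a`, `method_b`, `method_c` are placeholders). It assumes that integer cluster labels are acceptable, which is not confirmed here.

```r
# Toy labels for N = 6 cells from M = 3 hypothetical clustering methods.
# In practice each vector would come from a real clustering run.
method_a <- c(1, 1, 2, 2, 3, 3)
method_b <- c(1, 1, 1, 2, 2, 2)
method_c <- c(2, 2, 1, 1, 3, 3)

# One row per method, one column per cell.
custom.results <- rbind(method_a, method_b, method_c)
colnames(custom.results) <- paste0("cell", seq_len(ncol(custom.results)))

dim(custom.results)  # M rows (methods) by N columns (cells)
```

A matrix built this way has the same layout as the one described above, with methods in rows and cells in columns.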

Cluster ensemble

Using the clustering results generated in the last step, we perform cluster ensemble using three partitioning algorithms: the meta-clustering algorithm (MCLA), the hypergraph partitioning algorithm (HGPA) and the cluster-based similarity partitioning algorithm (CSPA) (Strehl and Ghosh, Proceedings of AAAI 2002, Edmonton, Canada, 2002).
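In the Strehl and Ghosh formulation, every cluster from every method is treated as a hyperedge over the cells, and CSPA works from a cell-by-cell similarity that counts how often two cells are clustered together. The base-R sketch below illustrates that idea on toy labels. `build_hypergraph` is a hypothetical name, and this is not SAFEclustering's internal implementation.

```r
# Illustrative only: not SAFEclustering's internal implementation.
# Rows of `results` are methods, columns are cells.
build_hypergraph <- function(results) {
  # For each method, build an N x k indicator matrix (one column per cluster),
  # then bind the blocks from all methods side by side.
  blocks <- lapply(seq_len(nrow(results)), function(i) {
    labels <- factor(results[i, ])
    ind <- outer(as.integer(labels), seq_len(nlevels(labels)), "==") * 1
    colnames(ind) <- paste0("m", i, "_c", levels(labels))
    ind
  })
  do.call(cbind, blocks)
}

results <- rbind(c(1, 1, 2, 2, 3, 3),
                 c(1, 1, 1, 2, 2, 2))
H <- build_hypergraph(results)

# CSPA-style similarity: fraction of methods that co-cluster each cell pair.
S <- (H %*% t(H)) / nrow(results)
S[1, 2]  # cells 1 and 2 share a cluster in both methods
S[3, 4]  # cells 3 and 4 share a cluster in only one method
```

The actual partitioning of the hypergraph or the similarity graph is what the external programs handle; this sketch stops at constructing the inputs.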

Here, the programs required, shmetis and gpmetis, are in the local working directory "~/Documents/single_cell_clustering".

cluster.ensemble <- SAFE(cluster_results = cluster.results, program.dir = "~/Documents/single_cell_clustering", 
MCLA = TRUE, CSPA = TRUE, HGPA = TRUE, SEED = 123)

Here is the list of ANMI results for the ensemble solution of each K and each partitioning algorithm.

## [1] "HGPA partitioning at K = 2: 2 clusters at ANMI = 0.00329903476904425"
## [1] "HGPA partitioning at K = 3: 3 clusters at ANMI = 0.278691668779803"
## [1] "HGPA partitioning at K = 4: 4 clusters at ANMI = 0.00392992505505839"
## [1] "HGPA partitioning at K = 5: 5 clusters at ANMI = 0.552234460801785"
## [1] "MCLA partitioning at K = 2: 2 clusters at ANMI = 0.568294023177534"
## [1] "MCLA partitioning at K = 3: 3 clusters at ANMI = 0.929094923585274"
## [1] "MCLA partitioning at K = 4: 4 clusters at ANMI = 0.872601957447147"
## [1] "MCLA partitioning at K = 5: 4 clusters at ANMI = 0.923346490477427"
## [1] "CSPA partitioning at K = 2: 2 clusters at ANMI = 0.53144399728197"
## [1] "CSPA partitioning at K = 3: 3 clusters at ANMI = 0.850151780486274"
## [1] "CSPA partitioning at K = 4: 4 clusters at ANMI = 0.665510270422344"
## [1] "CSPA partitioning at K = 5: 5 clusters at ANMI = 0.666022118059772"
## [1] "Optimal number of clusters is 3 with ANMI = 0.929094923585274"

The function SAFE outputs a list that includes the Average Normalized Mutual Information (ANMI) metric (Strehl and Ghosh, Proceedings of AAAI 2002, Edmonton, Canada, 2002) between each ensemble solution and the individual solutions. The ensemble solution with the highest ANMI value is selected as the optimal clustering ensemble.

cluster.ensemble$Summary

cluster.ensemble$MCLA[1:10]

cluster.ensemble$MCLA_optimal_k
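To make the ANMI criterion concrete, here is a minimal base-R sketch that scores candidate ensemble solutions against a set of individual solutions and picks the best one. The functions `nmi` and `anmi` and the toy labels are illustrative assumptions, not SAFEclustering's own code. NMI is computed as mutual information divided by the geometric mean of the two entropies, following Strehl and Ghosh.

```r
# Illustrative only: not SAFEclustering's internal implementation.
# NMI(a, b) = I(a; b) / sqrt(H(a) * H(b)).
nmi <- function(a, b) {
  joint <- table(a, b) / length(a)        # joint label distribution
  pa <- rowSums(joint)
  pb <- colSums(joint)
  entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))
  expected <- outer(pa, pb)               # joint distribution under independence
  nz <- joint > 0
  mi <- sum(joint[nz] * log(joint[nz] / expected[nz]))
  denom <- sqrt(entropy(pa) * entropy(pb))
  if (denom == 0) return(0)               # a one-cluster solution carries no information
  mi / denom
}

# ANMI: mean NMI between one ensemble solution and every individual solution.
anmi <- function(ensemble, individual) {
  mean(apply(individual, 1, function(labels) nmi(ensemble, labels)))
}

# Toy data: three individual solutions (rows) and two candidate ensembles.
individual <- rbind(c(1, 1, 2, 2, 3, 3),
                    c(2, 2, 1, 1, 3, 3),
                    c(1, 1, 1, 2, 2, 2))
candidates <- list(k2 = c(1, 1, 1, 2, 2, 2),
                   k3 = c(1, 1, 2, 2, 3, 3))
scores <- sapply(candidates, anmi, individual = individual)
names(which.max(scores))  # candidate with the highest ANMI
```

Because NMI ignores how clusters are numbered, the first two individual solutions count as identical even though their labels differ.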

We can compare the clustering results to the true labels using the Adjusted Rand Index (ARI):

library(cidr)

# Cell labels of ground truth
head(data_SAFE$Zheng.celltype)

# Calculating ARI for cluster ensemble
adjustedRandIndex(cluster.ensemble$optimal_clustering, data_SAFE$Zheng.celltype)
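For readers who want to see what the ARI measures, the following base-R sketch computes it directly from the contingency table of two labelings, using the standard pair-counting formula. The function name `ari` and the toy labels are illustrative assumptions. For real analyses, the packaged `adjustedRandIndex` used above is the safer choice.

```r
# Illustrative only: a base-R ARI computed from pair counts.
ari <- function(x, y) {
  tab <- table(x, y)
  n_pairs <- function(n) n * (n - 1) / 2        # same as choose(n, 2)
  sum_cells <- sum(n_pairs(tab))                # pairs together in both labelings
  sum_rows <- sum(n_pairs(rowSums(tab)))        # pairs together in x
  sum_cols <- sum(n_pairs(colSums(tab)))        # pairs together in y
  expected <- sum_rows * sum_cols / n_pairs(length(x))
  max_index <- (sum_rows + sum_cols) / 2
  if (max_index == expected) return(1)          # degenerate case, e.g. one cluster in both
  (sum_cells - expected) / (max_index - expected)
}

# Toy ground truth for six cells.
truth <- c("B", "B", "NK", "NK", "Treg", "Treg")
ari(c(1, 1, 2, 2, 3, 3), truth)  # same partition up to relabelling
ari(c(1, 1, 1, 2, 2, 2), truth)  # one cell type split across two clusters
```

The index equals 1 when the two partitions agree exactly and is close to 0 when agreement is no better than chance.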

Biase dataset

Setup the input expression matrix

dim(data_SAFE$Biase.expr)

data_SAFE$Biase.expr[1:5, 1:5]

Perform individual clustering

Here we perform single-cell clustering using four popular methods, SC3, CIDR, Seurat and t-SNE + k-means, without filtering any genes or cells. Since there are only 49 cells in the Biase dataset, the resolution parameter is set to 1.2 according to our benchmarking results.

cluster.results <- individual_clustering(inputTags = data_SAFE$Biase.expr, datatype = "FPKM",  
mt_filter = FALSE, nGene_filter = FALSE, SC3 = TRUE, gene_filter = FALSE, CIDR = TRUE, 
nPC.cidr = NULL, Seurat = TRUE, nPC.seurat = NULL, seurat_min_cell = 200, resolution_min = 1.2, 
tSNE = TRUE, dimensions = 3, tsne_min_cells = 200, tsne_min_perplexity = 10, SEED = 123)

Cluster ensemble

Using the clustering results, we perform cluster ensemble using all three partitioning algorithms: MCLA, HGPA and CSPA.

cluster.ensemble <- SAFE(cluster_results = cluster.results, program.dir = "~/Documents/single_cell_clustering", 
MCLA = TRUE, CSPA = TRUE, HGPA = TRUE, SEED = 123)

Here is the list of ANMI results for the ensemble solution of each K and each partitioning algorithm.

## [1] "HGPA partitioning at K = 2: 2 clusters at ANMI = 0.156896209024547"
## [1] "HGPA partitioning at K = 3: 3 clusters at ANMI = 0.59768416631598"
## [1] "HGPA partitioning at K = 4: 4 clusters at ANMI = 0.614176459706577"
## [1] "MCLA partitioning at K = 2: 2 clusters at ANMI = 0.784102309002763"
## [1] "MCLA partitioning at K = 3: 3 clusters at ANMI = 0.970539568368452"
## [1] "MCLA partitioning at K = 4: 4 clusters at ANMI = 0.971666531448806"
## [1] "CSPA partitioning at K = 2: 2 clusters at ANMI = 0.601004834004939"
## [1] "CSPA partitioning at K = 3: 3 clusters at ANMI = 0.622097187347639"
## [1] "CSPA partitioning at K = 4: 4 clusters at ANMI = 0.590251500201678"
## [1] "Optimal number of clusters is 4 with ANMI = 0.971666531448806"

cluster.ensemble$Summary

cluster.ensemble$MCLA[1:10]

cluster.ensemble$MCLA_optimal_k
