SIMLR (Single-cell Interpretation via Multi-kernel LeaRning)

OVERVIEW

Single-cell RNA-seq technologies enable high throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical to identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. We develop a novel similarity-learning framework, SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization. SIMLR is capable of separating known subpopulations more accurately in single-cell data sets than do existing dimension reduction methods. Additionally, SIMLR demonstrates high sensitivity and accuracy on high-throughput peripheral blood mononuclear cells (PBMC) data sets generated by the GemCode single-cell technology from 10x Genomics.

SIMLR

SIMLR offers three main advantages over previous methods: (1) it learns a distance metric that best fits the structure of the data by combining multiple kernels. This is important because the statistical characteristics of current single-cell data, which are shaped by high noise and dropout effects, do not easily fit the specific statistical assumptions made by standard dimension reduction algorithms. The adoption of multiple kernel representations provides a better fit to the true underlying statistical distribution of the specific input scRNA-seq data set; (2) SIMLR addresses the challenge of high levels of dropout events, which can significantly weaken cell-to-cell similarities even under an appropriate distance metric, by employing graph diffusion, which improves weak similarity measures that are likely to result from noise or dropout events; (3) in contrast to some previous analyses that pre-select gene subsets of known function, SIMLR is unsupervised, thus allowing de novo discovery from the data. We empirically demonstrate that SIMLR produces more reliable clusters than commonly used linear methods, such as principal component analysis (PCA), and nonlinear methods, such as t-distributed stochastic neighbor embedding (t-SNE), and we use SIMLR to provide 2-D and 3-D visualizations that assist with the interpretation of single-cell data derived from several diverse technologies and biological samples.
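
To make the idea of combining multiple kernels more concrete, the following is a minimal Python sketch, not the SIMLR implementation. It builds Gaussian kernels at several bandwidths from pairwise cell distances and averages them. The toy expression values, the bandwidths in sigmas and the uniform weights are all assumptions made for illustration; SIMLR learns the kernel weights from the data rather than fixing them.

```python
import math

def squared_distances(cells):
    """Pairwise squared Euclidean distances between expression vectors."""
    n = len(cells)
    return [[sum((a - b) ** 2 for a, b in zip(cells[i], cells[j]))
             for j in range(n)] for i in range(n)]

def gaussian_kernel(sq_dist, sigma):
    """Gaussian kernel matrix exp(-d^2 / (2 * sigma^2)) for one bandwidth."""
    return [[math.exp(-d / (2.0 * sigma ** 2)) for d in row] for row in sq_dist]

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; weights are assumed to sum to 1."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

# Toy data: 4 cells x 3 genes (made-up values forming two obvious groups).
cells = [[1.0, 0.0, 5.0], [1.2, 0.1, 4.8], [9.0, 7.0, 0.0], [8.8, 7.2, 0.3]]
sigmas = [0.5, 1.0, 2.0]                      # hypothetical bandwidths
weights = [1.0 / len(sigmas)] * len(sigmas)   # uniform weights, not learned

sq = squared_distances(cells)
similarity = combine_kernels([gaussian_kernel(sq, s) for s in sigmas], weights)

# Print the combined cell-to-cell similarity matrix.
for row in similarity:
    print(" ".join(f"{value:.3f}" for value in row))
```

Cells from the same group end up with a similarity close to 1 and cells from different groups with a similarity close to 0, regardless of which single bandwidth would have been the best choice on its own.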

REFERENCE

The latest draft of the manuscript related to SIMLR can be found as a preprint at http://biorxiv.org/content/early/2016/06/09/052225.

DOWNLOAD

We provide both the R and MATLAB implementations of SIMLR in the SIMLR branch. The master (stable version) and development (development version) branches provide the version of SIMLR available on Bioconductor.

RUNNING SIMLR R IMPLEMENTATION

We provide the R code to run SIMLR on 4 examples in the script main_examples.R. We now present a set of requirements to run the examples.

  1. Required R libraries. SIMLR requires 2 R packages to run, namely the Matrix package (see https://cran.r-project.org/web/packages/Matrix/index.html) to handle sparse matrices and the parallel package (see https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf) for a parallel implementation of the kernel estimation.

Furthermore, to run the examples, we require the igraph package (see http://igraph.org/r/) to compute the normalized mutual information metric and the grDevices package (see https://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/00Index.html) to color the plots.

All these packages can be installed with the R built-in install.packages function.

  2. External C code. We make use of an external C program during the computations of SIMLR. The code is located in the R directory in the file projsplx_R.c. In order to compile the program, one needs to run in the shell the command R CMD SHLIB -c projsplx_R.c.

An OS X pre-compiled file is also provided. Note: if there are issues in compiling the .c file, try to remove the pre-compiled files (i.e., projsplx_R.o and projsplx_R.so).

  3. Example datasets. The 4 example datasets are provided in the directory data.

Specifically, the dataset of Test_1_mECS.RData refers to http://www.ncbi.nlm.nih.gov/pubmed/25599176, Test_2_Kolod.RData refers to http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4595712/, Test_3_Pollen.RData refers to http://www.ncbi.nlm.nih.gov/pubmed/25086649 and Test_4_Usoskin.RData refers to http://www.ncbi.nlm.nih.gov/pubmed/25420068.
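
The examples score a clustering against known labels with the normalized mutual information (NMI) metric, computed through igraph. As a rough, self-contained illustration of what that metric measures, here is a Python sketch. The normalization used here, 2 * I(A;B) / (H(A) + H(B)), is one common choice and is an assumption; the function names and toy labels are hypothetical and are not taken from main_examples.R.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a cluster labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two labelings of the same cells."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())

def nmi(a, b):
    """NMI with the (assumed) normalization 2 * I(A;B) / (H(A) + H(B))."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same cells")
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0.0:
        return 1.0  # both labelings put every cell in a single cluster
    return 2.0 * mutual_information(a, b) / (h_a + h_b)

# Toy labels: NMI ignores how clusters are named, only the grouping matters.
truth = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]    # same grouping, different cluster names
unrelated = [0, 1, 0, 1]     # grouping independent of the truth

print(nmi(truth, relabelled))
print(nmi(truth, unrelated))
```

A value near 1 means the clustering recovers the reference grouping, and a value near 0 means the two groupings are unrelated.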

RUNNING SIMLR MATLAB IMPLEMENTATION

We also provide the MATLAB code to run SIMLR on 4 examples in the script main_demo.m.

We make use of external C programs during the computations of SIMLR. The code is located in the MATLAB directory in the files Kbeta.cpp and projsplx_c.c. In order to compile the programs, one needs to run in the MATLAB console the commands mex Kbeta.cpp and mex projsplx_c.c.

OS X pre-compiled files are also provided.
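
This README does not describe what the external routines compute. The file names projsplx_R.c and projsplx_c.c suggest a projection onto the probability simplex, so, under that assumption only, the sketch below shows the standard sort-based Euclidean projection in Python. It is an illustration of the general technique and is not a translation of the C code; the function name project_to_simplex is hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}.

    Standard sort-based method: find the shift theta such that
    clipping (v - theta) at zero yields entries that sum to 1.
    """
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        # The last k for which this holds gives the correct shift.
        if value - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]

# A vector already on the simplex is (numerically) unchanged.
print(project_to_simplex([0.2, 0.3, 0.5]))
# Mass is pushed onto the dominant coordinate.
print(project_to_simplex([2.0, 0.0]))
# Equal entries are projected to the uniform distribution.
print(project_to_simplex([0.5, 0.5, 0.5]))
```

A projection of this kind keeps a vector of weights or similarities non-negative and summing to one after an unconstrained update.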
