scibetr's Introduction

scibetR

scibetR is a pure R implementation of scibet, a portable and fast single-cell type identifier. It runs more slowly than the original functions in scibet.

Installation Guide

Installing scibetR
To install scibetR, run:

if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("zwj-tina/scibetR")

Tutorial

Library

library(ggplot2)
library(tidyverse)
library(scibetR)
library(viridis)
library(ggsci)

Load the data

The expression matrix should hold TPM values, with cells in rows and genes in columns. The last column must be named "label" and give the cell type of each cell.

path_da <- "~/test.rds.gz"
expr <- readr::read_rds(path_da)
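
The layout described above can be sketched with a small made-up table. This is only an illustration using base R: the gene names, cell types, and values are invented and do not come from the package's test data.

```r
# Toy data only: the gene names, cell-type names, and values are made up.
set.seed(1)
n_cells <- 6
genes <- c("GeneA", "GeneB", "GeneC")

# Cells in rows, genes in columns (TPM-like non-negative values).
expr_toy <- as.data.frame(
  matrix(round(runif(n_cells * length(genes), 0, 100), 1),
         nrow = n_cells,
         dimnames = list(paste0("cell", 1:n_cells), genes))
)

# The cell-type annotation goes in the last column, named "label".
expr_toy$label <- rep(c("T cell", "B cell"), each = 3)

str(expr_toy)
```

A data frame shaped like `expr_toy` is what the examples in this tutorial assume for `expr`.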

E(ntropy)-test for supervised gene selection

etest_gene <- SelectGene_R(expr, k = 50)
etest_gene
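
To give a feel for what the score measures, here is a minimal base-R sketch on invented data. It follows the formula visible in the `SelectGene_R` code quoted in the issues section of this page (log of the mean minus mean of the log across cell types); it is not the package function itself, and the toy genes and values are assumptions.

```r
# Toy reference: 3 genes and 2 cell types; all values are invented.
expr_toy <- data.frame(
  GeneA = c(100, 120, 0, 2),    # high in type1 only
  GeneB = c(50, 55, 52, 48),    # similar in both types
  GeneC = c(0, 1, 80, 90),      # high in type2 only
  label = c("type1", "type1", "type2", "type2")
)

gene_cols <- setdiff(names(expr_toy), "label")

# Mean expression of each gene within each cell type (genes x types).
class_means <- sapply(split(expr_toy[gene_cols], expr_toy$label), colMeans)

# Score = log2 of the mean minus the mean of the log2, taken across cell types.
log_E <- log2(rowMeans(class_means + 1))
E_log <- rowMeans(log2(class_means + 1))
t_scores <- log_E - E_log

# Genes ranked from most to least informative.
sort(t_scores, decreasing = TRUE)
```

Genes whose mean expression differs strongly between cell types get a large score, while a gene expressed at a similar level everywhere (GeneB here) scores close to zero.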

scibetR: Single Cell Identifier Based on Entropy Test

  1. For the reference set, rows should be cells, columns should be genes, and the last column should be "label" (TPM).
  2. For the query set, rows should be cells and columns should be genes (TPM).

Example:
tibble(
  ID = 1:nrow(expr),
  label = expr$label
) %>%
  dplyr::sample_frac(0.7) %>%
  dplyr::pull(ID) -> ID

train_set <- expr[ID,]      #construct reference set
test_set <- expr[-ID,]      #construct query set

prd <- SciBet_R(train_set, test_set[,-ncol(test_set)])
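
Because the query cells here come from a labelled dataset, the predictions can be compared with the held-out labels. The sketch below uses two hypothetical character vectors in place of `test_set$label` and `prd`, and assumes that `SciBet_R` returns one predicted label per query cell.

```r
# Hypothetical vectors standing in for test_set$label and the output of SciBet_R.
true_label <- c("T cell", "T cell", "B cell", "NK cell", "B cell")
prd_toy    <- c("T cell", "B cell", "B cell", "NK cell", "B cell")

# Overall accuracy: fraction of cells whose predicted label matches the truth.
accuracy <- mean(prd_toy == true_label)

# Confusion table: rows are true labels, columns are predicted labels.
confusion <- table(truth = true_label, predicted = prd_toy)

accuracy
confusion
```

With real data, replace `true_label` with `test_set$label` and `prd_toy` with `prd`.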

False positive control

Because reference scRNA-seq collections are incomplete, cells of a type missing from the reference dataset may be falsely predicted as a known cell type. By applying a null dataset as background, SciBet controls these potential false positives while maintaining high prediction accuracy for cells whose types are covered by the reference dataset (positive cells). This example uses the following three datasets.

null <- readr::read_rds('~/null.rds.gz')
reference <- readr::read_rds('~/reference.rds.gz')
query <- readr::read_rds('~/query.rds.gz')

In the query set, "negative cells" account for more than 60% of the cells.

ori_label <- query$label
table(ori_label)

The confidence score of each query cell is calculated with the function conf_score_R.

query <- query[,-ncol(query)]
c_score <- conf_score_R(ref = reference, query = query, null_expr = null, gene_num = 500)
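
One way to use the scores is to mark low-confidence cells as unassigned. The sketch below is an assumption-laden illustration: the vectors are invented stand-ins for `c_score` and the predicted labels, it assumes that a higher score means higher confidence, and the cutoff of 0.4 is a hypothetical value rather than a package default.

```r
# Hypothetical values standing in for the outputs of conf_score_R and SciBet_R.
c_score_toy <- c(0.91, 0.12, 0.55, 0.30, 0.78)
prd_toy     <- c("T cell", "B cell", "T cell", "NK cell", "B cell")

# Assumed cutoff for illustration; tune it on your own data.
cutoff <- 0.4

# Keep the predicted label for confident cells, mark the rest as unassigned.
final_label <- ifelse(c_score_toy >= cutoff, prd_toy, "unassigned")

table(final_label)
```

Comparing `final_label` with `ori_label` then shows how many negative cells were filtered out at the chosen cutoff.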

Entropy calculation

Compute expression entropy.

  • expr: The expression data frame. Rows should be cells and columns should be genes.
  • window: The window size for expression value discretization.
  • low: The lower limit for normalizing expression entropy.

ent_res <- Entropy_R(expr, window = 120, low = 2000)

Returns:

# A tibble: 11,516 x 5
   gene  mean.expr entropy    fit norm_ent
   <chr>     <dbl>   <dbl>  <dbl>    <dbl>
 1 A2M       63.2    1.26  0.191    0.248 
 2 AAAS      73.9    1.03  0.210    0.204 
 3 AACS       8.73   0.412 0.0419   0.0813
 4 AAED1     37.8    0.631 0.136    0.124 
 5 AAGAB     65.7    1.18  0.196    0.233 
 6 AAK1      13.8    0.683 0.0622   0.135 
 7 AAMDC     13.1    0.414 0.0596   0.0817
 8 AAMP     159.     1.63  0.316    0.322 
 9 AAR2      49.3    0.952 0.163    0.188 
10 AARS      39.6    0.922 0.141    0.182 
# … with 11,506 more rows

LoadModel

x: A SciBet model in the form of a matrix, used as the trained reference. To facilitate matrix multiplication, its rows are genes and its columns are labels.

y <- LoadModel_R(x, genes = NULL, labels = NULL)

Returns a function of the form:

Bet_R <- function(expr, result="list"){
}
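
The genes-by-labels orientation matters because scoring a query then reduces to a single matrix product. The following base-R sketch illustrates that idea on an invented model. It does not call `LoadModel_R` or the package's internal scoring routine, and the probabilities, gene names, and argmax step are assumptions made for illustration.

```r
# Toy model: rows are genes, columns are labels; each column holds
# log2 probabilities of the genes for that cell type (values invented).
prob <- log2(cbind(
  type1 = c(GeneA = 0.7, GeneB = 0.2, GeneC = 0.1),
  type2 = c(GeneA = 0.1, GeneB = 0.2, GeneC = 0.7)
))

# Toy query: rows are cells, columns are genes (TPM-like values).
query_toy <- rbind(
  cell1 = c(GeneA = 90, GeneB = 10, GeneC = 0),
  cell2 = c(GeneA = 2,  GeneB = 15, GeneC = 120)
)

# Use only genes present in both the model and the query.
have_genes <- intersect(rownames(prob), colnames(query_toy))

# log2(x + 1) transform, then one matrix product gives a cells x labels score table.
query_log <- log2(query_toy[, have_genes, drop = FALSE] + 1)
scores <- query_log %*% prob[have_genes, , drop = FALSE]

# Predicted label = column with the highest score for each cell.
pred <- colnames(scores)[max.col(scores, ties.method = "first")]
pred
```

If the model were stored with labels in rows instead, it would have to be transposed before the product.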

correction

Add %>% at line 312 of the Marker_heatmap() function.

scibetr's People

Contributors

zwj-tina


scibetr's Issues

Error in out[1:k, ] : incorrect number of dimensions

Hi, I ran into an error when using the package.

expr <- as.data.frame(t(counts(ref)))  # reference dataset: rows are cells, columns are genes
expr <- expr %>%
  mutate(group_ref)                    # add the label column
que <- as.data.frame(t(counts(sce)))   # query dataset: rows are cells, columns are genes
prd <- SciBet_R(expr, que, k = 100)

Error in out[1:k, ] : incorrect number of dimensions

I don't know what's wrong with my code. Could you please help me solve this issue?
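
The thread records no resolution. One thing worth checking, offered as an untested guess rather than a confirmed diagnosis: `mutate(group_ref)` names the new column `group_ref`, whereas the README asks for a last column named "label". The sketch below shows, on invented stand-in objects, how the column could be named explicitly and checked before calling `SciBet_R`.

```r
# Toy stand-ins for the poster's objects; names and values are invented.
expr_toy <- data.frame(GeneA = c(5, 0, 3), GeneB = c(1, 7, 2))
group_ref <- c("T cell", "B cell", "T cell")

# Name the annotation column "label" explicitly and keep it last.
expr_toy$label <- group_ref

# Quick checks before calling SciBet_R().
stopifnot(identical(names(expr_toy)[ncol(expr_toy)], "label"))
stopifnot(!anyNA(expr_toy$label))
```

It may also be worth confirming that `k` does not exceed the number of genes in the reference.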

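Question about the LoadModel_R function

According to the scibet documentation, the model input is a matrix with samples as rows, gene names as columns, and label as the last column.
However, according to the description of LoadModel_R, it seems a transposition is needed to obtain a matrix with rownames = samplename and colnames = genename.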

function(x, genes = NULL, labels = NULL) {
  prob <- x
  if (is.null(genes))
    genes <- rownames(x)
  if (is.null(labels))
    labels <- colnames(x)
  function(expr, result = "list") {
    have_genes <- intersect(genes, colnames(expr))
    expra <- log1p(as.matrix(expr[, have_genes])) / log(2)
    switch(result,
      list = Gambler_R(expra, prob[have_genes, ], FALSE),
      table = {
        out <- Gambler_R(expra, prob[have_genes, ], TRUE)
        rownames(out) <- have_genes
        return(out)
      }
    )
  }
}

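Performance is clearly much higher than the author stated

I ran several datasets, none of them very large, and the accuracy was above 98 on all of them. Does anyone know why? There are four such datasets, each 3000-4000 in size. The measured accuracy is shown in the attached image, and this looks like an excellent method. I saw the same when I ran BERT. Do these methods have a natural advantage on small datasets like this?
[image: accuracy statistics]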

Add sparse matrix support

test_j <- SciBet_R(mat, iD_m)
Error in expr$label : $ operator not defined for this S4 class
I ran into this problem, and I think the following code changes could solve it.

Here train and test are both sparse matrices.

train_label <- factor(CellType, levels = unique(CellType))

SciBet_R <- function(train, train_label, test, k = 1000, result = "list") {
  Learn_R(train, train_label, NULL, k)(test, result)
}

Learn_R <- function(expr, train_label, geneset = NULL, k = 1000) {
  labels <- train_label
  if (is.null(geneset)) {
    geneset <- SelectGene_R(expr, k, train_label)
  }
  labell <- levels(labels)
  expr_select <- expr[, geneset]
  label_total <- matrix(0, length(geneset), 1)
  for (i in labell) {
    label_TPM <- expr_select[labels == i, ]
    label_mean <- colSums(log2(label_TPM + 1))
    a <- matrix(label_mean, , 1)
    colnames(a) <- i
    label_total <- cbind(label_total, a)
  }
  label_total <- label_total[, -1]
  rownames(label_total) <- geneset
  label_t <- t(label_total)
  prob <- log2(label_t + 1) - log2(rowSums(label_t) + length(geneset))
  prob <- t(prob)
  genes <- rownames(prob)
  function(test, result = "list") {
    have_genes <- intersect(genes, colnames(test))
    testa <- log1p(as.matrix(test[, have_genes])) / log(2)
    switch(result,
      list = Gambler_R(testa, prob[have_genes, ], FALSE),
      table = {
        out <- Gambler_R(testa, prob[have_genes, ], TRUE)
        rownames(out) <- have_genes
        return(out)
      }
    )
  }
}

SelectGene_R <- function(expr, k = 1000, train_label) {
  labels <- train_label
  labels_set <- levels(labels)
  # expr holds genes only (labels are passed separately), so keep every column.
  label_total <- matrix(0, ncol(expr), 1)
  for (i in labels_set) {
    label_TPM <- expr[train_label == i, , drop = FALSE]
    label_mean <- colMeans(label_TPM)
    a <- matrix(label_mean, , 1)
    colnames(a) <- i
    label_total <- cbind(label_total, a)
  }
  label_total <- label_total[, -1, drop = FALSE]
  log_E <- log2(rowMeans(label_total + 1))
  E_log <- rowMeans(log2(label_total + 1))
  t_scores <- log_E - E_log
  out <- cbind(label_total, t_scores)
  rownames(out) <- colnames(expr)
  out <- out[order(-out[, "t_scores"]), , drop = FALSE]
  # Take at most k genes so the subset never runs past the available rows.
  select <- out[seq_len(min(k, nrow(out))), , drop = FALSE]
  out <- rownames(select)
  return(out)
}
