sunduanchen / scissor
Scissor package
License: GNU General Public License v3.0
Hello Duanchen,
Thank you for developing this tool to identify phenotype-relevant cell subsets. I have a question regarding the definition of the "Net" penalty. In the paper, the regularization term is given as \lambda (\alpha |\beta|_1 + 1/2*\beta^T L \beta). In APML0, however, the "Net" regularization is defined as \lambda * \alpha * |\beta|_1 + 1/2 *\beta^T L \beta (\lambda is not applied to the Laplacian term). Did you customize the solver to reflect this change? Thank you so much.
For reference, I paste the manual of APML0 here (https://cran.r-project.org/web/packages/APML0/APML0.pdf).
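For readers comparing the two definitions, the penalties quoted in the question can be written side by side. This only restates the formulas from the question (the subscript labels are added here for illustration) and makes no claim about what the Scissor solver actually implements.

```latex
\begin{align*}
P_{\text{paper}}(\beta) &= \lambda\left(\alpha\,\lVert\beta\rVert_1 + \tfrac{1}{2}\,\beta^{\top} L\,\beta\right),\\
P_{\text{APML0}}(\beta) &= \lambda\,\alpha\,\lVert\beta\rVert_1 + \tfrac{1}{2}\,\beta^{\top} L\,\beta,\\
P_{\text{paper}}(\beta) - P_{\text{APML0}}(\beta) &= \tfrac{\lambda - 1}{2}\,\beta^{\top} L\,\beta .
\end{align*}
```

The two forms differ only in whether \lambda scales the Laplacian term, so they coincide when \lambda = 1.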
Thanks for developing this tool! I've been running scissor on some lung cancer data and the results are quite promising!
I have a question regarding using the survival data:
The survival is obviously highly confounded by the tumor stage. I get highly similar results when running scissor with survival data and with binary early stage/late stage data. Is there a way to include the tumor stage, or other covariates such as age, into the Cox-regression model, such that I can find cell-types that are associated with better survival independent of the tumor stage?
I have not been able to successfully implement a cut-off threshold while calling Scissor.
For example, if alpha = 0.05 leads to 40% of cells being selected by Scissor and I want to decrease the proportion of Scissor-selected cells, I run:
cutoff_threshold = 0.2
alpha_threshold = 0.05
info <- Scissor(bulk_dataset = duod_counts,
                sc_dataset = seur_duod,
                phenotype = phenotype,
                alpha = alpha_threshold,
                cutoff = cutoff_threshold,
                tag = tag,
                family = "gaussian",
                Save_file = paste0('sbatch_outputs/grid_search/', date, alpha_threshold, "_", cutoff_threshold, protein, '.RData'))
The number and identity of cells selected is exactly the same as that chosen without the inclusion of the cutoff parameter.
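The behavior described would be consistent with cutoff acting only as a stopping rule while several alpha values are tried in turn; with a single alpha there is nothing left to stop. The sketch below illustrates that logic only. It is an assumption about how the search works and should be checked against the package source; fraction_selected() and search_alpha() are hypothetical names, not part of Scissor.

```r
# Hypothetical stand-in for "fit the model at this alpha and return the
# fraction of cells with non-zero coefficients". Not part of Scissor.
fraction_selected <- function(alpha) {
  # Toy relationship: larger alpha selects fewer cells.
  min(1, 0.02 / alpha)
}

# Try each alpha in order and stop at the first one whose selected
# fraction falls below `cutoff`.
search_alpha <- function(alphas, cutoff) {
  for (a in alphas) {
    frac <- fraction_selected(a)
    if (frac < cutoff) break
  }
  list(alpha = a, fraction = frac)
}

single <- search_alpha(0.05, cutoff = 0.2)                     # one alpha only
grid   <- search_alpha(c(0.05, 0.08, 0.2, 0.3), cutoff = 0.2)  # several alphas
```

In this toy setup the single-alpha call returns the same 40% selection with or without the cutoff, while the grid call moves on to larger alpha values until the fraction drops below 20%.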
Hi Dr. Sun,
I ran a simple test to find out why I always get 0 Scissor cells even when a very small alpha is chosen. The test revealed what is described in the title.
I think this is not mentioned in your tutorial and may affect new researchers like myself who use Scissor to explore their own data, so I am opening this issue for further discussion.
Best regards,
JHovelly
Hi,
I have downloaded the dataset and want to try the Scissor tool with the test data. However, when I follow the Scissor_Tutorial and use a command like this:
infos1 <- Scissor(bulk_dataset, sc_dataset, phenotype, alpha = 0.05,
                  family = "cox", Save_file = 'Scissor_LUAD_survival.RData')
I encounter the error:
Error in normalize.quantiles(dataset0) : ERROR; return code from pthread_create() is 22
I appreciate your time.
Best regards,
Qing
Hi,
Thanks for developing this great tool!
I am wondering what kind of microarray data should be used as input for Scissor. Would RMA-normalized data without log transformation be acceptable? Thanks in advance!
Best,
Leo Lam
When I use the raw counts of RNA-seq data, the results are poor. Should I use log10(count+1), log2(count+1), TPM, or FPKM? Which one is better? Thank you!
When I run the Scissor function
res <- Scissor(as.matrix(expr), tumor, phenotype, tag = tag, alpha = 0.5,
               family = "binomial", Save_file = "Scissor.RData")
the following error occurs:
[1] "|**************************************************|"
[1] "Performing quality-check for the correlations"
[1] "The five-number summary of correlations:"
0% 25% 50% 75% 100%
0.3108529 0.5483579 0.5681650 0.5868047 0.6772004
[1] "|**************************************************|"
[1] "Current phenotype contains 7 No and 7 Yes samples."
[1] "Perform logistic regression on the given phenotypes:"
Error in matrix(NA, nrow = nfolds, ncol = numi2) :
invalid 'ncol' value (too large or NA)
In addition: Warning messages:
1: In matrix(sapply(outi, function(x) { :
data length [10] is not a sub-multiple or multiple of the number of rows [18023]
2: In matrix(sapply(outi, function(x) { :
data length [10] is not a sub-multiple or multiple of the number of rows [18023]
What is going wrong here, and how can I fix it?
Hi Duanchen,
Thanks for developing this interesting tool for discovering important cell populations from bulk and scRNA-seq data! I ran into some issues and cannot run my own datasets. When I run Scissor on the object I created, the "Scissor" step fails with "Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of a vector type, was 'NULL'". If I use the "Seurat_preprocessing" function first, RStudio aborts when running the "Scissor" function. Could you please help me debug these issues? Thanks so much!
Hello developers,
I got the following error when running this code:
infos1 <- Scissor(bulkdata, scRNA, phenotype, alpha = 0.05,
                  family = "binomial", Save_file = 'Scissor_IPAH.RData')
Error in asMethod(object) :
  Cholmod error 'out of memory' at file ../Core/cholmod_memory.c, line 146
How can I solve this problem?
Looking forward to your reply.
Thank you very much!
Hi,
Thank you so much for developing Scissor!
I started exploring Scissor after reading your paper.
I encountered an error in this step:
infos1 <- Scissor(bulk_dataset, sc_dataset, phenotype, alpha = 0.05, family = "cox", Save_file = 'Scissor_LUAD_survival.RData')
Error in normalize.quantiles(dataset0) : ERROR; return code from pthread_create() is 22
Any suggestions would be greatly appreciated! Thank you for your time!
Best regards,
skyfall !
Hi,
Thank you so much for developing Scissor!
I just read the paper and wanted to try the tool before testing in my own data.
I couldn't download the dataset for the test. I got this error:
Warning message in load(url(paste0(location, "scRNA-seq.RData"))):
“URL 'https://xialab.s3-us-west-2.amazonaws.com/Duanchen/Scissor_data/scRNA-seq.RData': Timeout of 60 seconds was reached”
Error in load(url(paste0(location, "scRNA-seq.RData"))): cannot read from connection
Traceback:
1. load(url(paste0(location, "scRNA-seq.RData")))
I appreciate your time.
Best regards,
Eijy
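The "Timeout of 60 seconds" in the message matches R's default `timeout` option, so one thing to try is raising that option and downloading the file to disk before loading it. This is a minimal sketch under that assumption; fetch_rdata() is a hypothetical helper name, and the URL in the usage comment is the one from the error message.

```r
# Raise R's download timeout (default 60 seconds) for this session.
options(timeout = max(600, getOption("timeout")))

# Hypothetical helper: download an .RData file to disk (if it is not
# already there), then load it into the given environment.
# Returns the names of the loaded objects.
fetch_rdata <- function(url, dest = file.path(tempdir(), basename(url)),
                        envir = globalenv()) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the binary file intact on Windows.
    download.file(url, destfile = dest, mode = "wb")
  }
  load(dest, envir = envir)
}

# Usage (URL taken from the error message above):
# location <- "https://xialab.s3-us-west-2.amazonaws.com/Duanchen/Scissor_data/"
# fetch_rdata(paste0(location, "scRNA-seq.RData"))
```

Downloading to a file first also means a slow connection only has to succeed once, since later calls reuse the local copy.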
Thank you for developing such a powerful tool.
I ran the Scissor function after loading the required files. Several times the function stopped in the state shown in the snapshot, and it only returned the result used for the next regression step. I also found that the RStudio server crashed when the function reached the stage shown in the snapshot. I guess there might be a mismatch somewhere.
Hello Scissor,
Thanks for developing this amazing package.
In the 'Detecting a hypoxic subpopulation related to worse survival' section of the Results in the Nature Biotechnology paper, you input the clinical survival information of 471 TCGA-LUAD samples and found that 201 Scissor+ cells were associated with worse survival.
We're wondering how you define worse survival.
Where is the boundary between worse survival and good survival? Is survival longer than 5 years defined as good survival? Or do you select the 50% of patients with the longest survival, use their expression matrix as the feature matrix for good survival, and then perform the Pearson test and Cox regression?
If I manually build a matrix that only includes TCGA samples with very poor survival (for example, shorter than 3 months), will Scissor still find both good-survival and worse-survival cells in the single-cell data?
Thanks!
Best,
YJ
Hi,
I am trying to use Scissor with family "binomial" on a bulk RNA-seq dataset with 43 samples (11 controls and 32 cases) and a publicly available scRNA-seq dataset with 5092 cells. I receive the following error message:
[1] "|**************************************************|"
[1] "Performing quality-check for the correlations"
[1] "The five-number summary of correlations:"
0% 25% 50% 75% 100%
0.3543059 0.4483379 0.4650750 0.4824858 0.6122901
[1] "|**************************************************|"
[1] "Current phenotype contains 11 healthy and 32 asthmatics samples."
[1] "Perform logistic regression on the given phenotypes:"
Error in matrix(NA, nrow = nfolds, ncol = numi2) :
invalid 'ncol' value (too large or NA)
In addition: Warning messages:
1: In Scissor(mat, sc_dataset, as.matrix(pheno), tag = tag, alpha = 0.2, :
NAs introduced by coercion
2: In matrix(sapply(outi, function(x) { :
data length [10] is not a sub-multiple or multiple of the number of rows [5093]
3: In matrix(sapply(outi, function(x) { :
data length [10] is not a sub-multiple or multiple of the number of rows [5093]
Could you please help?
Thank you
This looks like a recurrent problem where sparse matrices cannot be handled.
Any suggestions?
Hi!
Thank you for the nice tools.
In Scissor, normalize.quantiles is used to normalize the bulk and single-cell data.
I am wondering whether we can normalize the original bulk and single-cell count data. Thanks a lot!
I am having an issue using Scissor to annotate bulk RNA sequencing (of paired tumor & normal samples) with single-cell tumor sequencing.
Here is my code:
## import bulk RNA seq data
bulk <- as.matrix(read.csv("gene_count_matrix.csv"))
# convert the bulk dataset to matrix
bulk_dataset <- as.matrix(bulk_dataset)
## import bulk phenotype data
pheno <- read.table("Scissor_pheno.txt")
## import single-cell multiome seq data
sc <- Read10X_h5('C:/path/to/filtered_feature_bc_matrix.h5')
sc <- sc$`Gene Expression`
# perform seurat pre-processing
sc_dataset <- Seurat_preprocessing(sc)
## run Scissor function
phenotype <- pheno
tag <- c('Normal', 'Tumor')
info <- Scissor(bulk_dataset, sc_dataset, phenotype, tag=tag, alpha=0.5, family="binomial")
This results in the following error:
Error in normalize.quantiles(dataset0) :
vector types do not match in copyVector
I have tried several different things, including manual installation of preprocessCore using BiocManager::install("preprocessCore", configure.args="--disable-threading", force = TRUE), but nothing has worked. I also had issues with normalize.quantiles() before I coerced the bulk dataset into a matrix.
Can someone advise on whether I've made an error, and/or how to mitigate this issue?
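One thing worth checking, offered as an assumption rather than a confirmed diagnosis: as.matrix(read.csv(...)) produces a character matrix when the gene IDs sit in a regular column, and count data is otherwise read as integers, while quantile normalization is typically run on a matrix of doubles. The sketch below uses a toy CSV (the contents are made up for illustration) to show one way to end up with a numeric matrix that has genes as row names.

```r
# Toy CSV standing in for gene_count_matrix.csv: first column holds gene IDs.
csv_text <- "gene_id,S1,S2,S3
GeneA,10,0,3
GeneB,5,7,1
GeneC,0,2,8"

# Read gene IDs as row names so they do not end up inside the matrix.
counts <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)
bulk_dataset <- as.matrix(counts)

# Counts are read as integers; store them as double before normalization.
storage.mode(bulk_dataset) <- "double"

str(bulk_dataset)
```

If str() reports `chr` or `int` instead of `num` for your own matrix, that mismatch is a plausible place to start looking.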
I used the colorectal cancer sc_dataset and bulk data, with DFI information and Cox regression. When I run reliability.test(), the p-value is NA and the statistic is sometimes NaN. I checked the function and found that, because of the sample distribution, some sampled folds contain only samples with status 0. How can I solve this problem? Should I choose family = "binomial", or modify the function test_cox as annotated below, commenting out index0 and using index1 and index2 instead? Scissor is a very important tool for me, thank you very much. I also sent you an email, sorry to bother you.
test_cox <- function(X, Y, network, alpha, cell_num, n = 100, nfold = 10){
    set.seed(1)
    m1 <- sum(Y[,2] == 1)
    m2 <- sum(Y[,2] == 0)
    #index0 <- sample(cut(seq(m1+m2), breaks = 10, labels = F))
    index1 <- sample(cut(seq(m1), breaks = nfold, labels = F))
    index2 <- sample(cut(seq(m2), breaks = nfold, labels = F))
    print("|**************************************************|")
    print("Perform cross-validation on X with true label")
    c_index_test_real <- NULL
    pb1 <- progress_bar$new(total = nfold)
    for (j in 1:nfold){
        #c_index <- which(index0 == j)
        c_index <- c(which(Y[,2] == 1)[which(index1 == j)], which(Y[,2] == 0)[which(index2 == j)])######I changed here
        X_train <- X[-c_index,]
        Y_train <- Y[-c_index,]
        fit <- NULL
        while (is.null(fit$fit)){
            set.seed(123)
            fit <- APML1(X_train, Y_train, family = "cox", penalty = "Net", alpha = alpha, Omega = network, nlambda = 100)
        }
        index <- which.min(abs(fit$fit$nzero - cell_num))
        Coefs <- as.numeric(fit$Beta[,index])
        Cell1 <- Coefs[which(Coefs > 0)]
        Cell2 <- Coefs[which(Coefs < 0)]
        X_test <- X[c_index,]
        Y_test <- Y[c_index,]
        test_data <- data.frame(cbind(Y_test, X_test%*%Coefs))
        colnames(test_data) <- c("OS_time", "Status", "Prediction")
        res.cox <- coxph(Surv(OS_time, Status) ~ Prediction, data = test_data)
        c_index_test_real[j] <- concordance(res.cox)$concordance
        pb1$tick()
        Sys.sleep(1 / 100)
        if (j == nfold) cat("Finished!\n")
    }
    print("|**************************************************|")
    print("Perform cross-validation on X with permutated label")
    c_index_test_back <- list()
    pb2 <- progress_bar$new(total = n)
    for (i in 1:n){
        set.seed(i+100)
        c_index_test_back[[i]] <- matrix(0, nfold, 1, dimnames = list(paste0("Testing_", 1:nfold), "Concordance"))
        Y2 <- Y[sample(nrow(Y)),]
        for (j in 1:nfold){
            #c_index <- which(index0 == j)
            c_index <- c(which(Y2[,2] == 1)[which(index1 == j)], which(Y2[,2] == 0)[which(index2 == j)])######I changed here
            X_train <- X[-c_index,]
            Y_train <- Y2[-c_index,]
            fit <- NULL
            while (is.null(fit$fit)){
                set.seed(123)
                fit <- APML1(X_train, Y_train, family = "cox", penalty = "Net", alpha = alpha, Omega = network, nlambda = 100)
            }
            index <- which.min(abs(fit$fit$nzero - cell_num))
            Coefs <- as.numeric(fit$Beta[,index])
            Cell1 <- Coefs[which(Coefs > 0)]
            Cell2 <- Coefs[which(Coefs < 0)]
            X_test <- X[c_index,]
            Y_test <- Y2[c_index,]
            test_data <- data.frame(cbind(Y_test, X_test%*%Coefs))
            colnames(test_data) <- c("OS_time", "Status", "Prediction")
            res.cox <- coxph(Surv(OS_time, Status) ~ Prediction, data = test_data)
            c_index_test_back[[i]][j] <- concordance(res.cox)$concordance
        }
        pb2$tick()
        Sys.sleep(1 / 100)
        if (i == n) cat("Finished!\n")
    }
    statistic <- mean(c_index_test_real)
    background <- NULL
    for (i in 1:n){
        background[i] <- mean(c_index_test_back[[i]][,1])
    }
    p <- sum(background > statistic)/n
    print(sprintf("Test statistic = %s", formatC(statistic, format = "f", digits = 3)))
    print(sprintf("Reliability significance test p = %s", formatC(p, format = "f", digits = 3)))
    return(list(statistic = statistic,
                p = p,
                c_index_test_real = c_index_test_real,
                c_index_test_back = c_index_test_back))
}
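The idea behind the modification above, keeping both events and censored samples in every fold, can be sketched as a small standalone helper. This is an illustration only: stratified_folds() is a hypothetical name, not part of Scissor, and the status vector is toy data.

```r
# Assign each sample to one of `nfold` folds separately within each status
# group, so every fold receives both events (1) and censored samples (0)
# whenever each group has at least `nfold` members.
stratified_folds <- function(status, nfold = 10) {
  folds <- integer(length(status))
  for (s in unique(status)) {
    idx <- which(status == s)
    # rep_len cycles 1..nfold over the group; sample.int shuffles positions.
    labels <- rep_len(seq_len(nfold), length(idx))
    folds[idx] <- labels[sample.int(length(labels))]
  }
  folds
}

set.seed(1)
status <- c(rep(1, 12), rep(0, 30))   # toy data: 12 events, 30 censored
folds  <- stratified_folds(status, nfold = 10)
table(folds, status)
```

If one status group has fewer members than nfold, some folds cannot contain it, so a smaller nfold would be needed in that case.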
Hi,
Thank you for developing this awesome tool. Though the demo data went through, I encountered an error while running my data. Here is the code I used and the error message:
infos1 <- Scissor(bulk_dataset, sce, phenotype, alpha = 0.05,
                  family = "cox", Save_file = 'survival.RData')
Error in asMethod(object) :
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102
Might too many cells in the sce object have caused this error? I have 22432 features and 47284 cells in the sce object.
Could you help me solve this, please?
Thank you again!
Best,
Jianxiang
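A quick arithmetic check may help here, offered as a possible explanation rather than a confirmed diagnosis. Another report in this thread quotes Scissor converting the cell-cell network with as.matrix(sc_dataset@graphs$RNA_snn); a dense cell-by-cell matrix for 47284 cells has more entries than a 32-bit signed integer can count, and the assumption is that the dense conversion hits that limit.

```r
n_cells <- 47284                      # number of cells reported above
n_elements <- n_cells^2               # entries in a dense cell-by-cell matrix
int_max <- .Machine$integer.max       # 2147483647

exceeds_limit <- n_elements > int_max
max_cells <- floor(sqrt(int_max))     # largest n with n^2 <= int_max

cat("elements:", format(n_elements, big.mark = ","), "\n")
cat("exceeds 32-bit limit:", exceeds_limit, "\n")
cat("largest n below the limit:", max_cells, "\n")
```

Under this assumption, the dense conversion would only fit below roughly 46,000 cells, which is just under the 47284 cells in this object.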
Hi,
Thank you for developing Scissor.
I ran about 130,000 cells and received a memory error message.
'Error: cannot allocate vector of size 132.0 Gb'
Then, I am thinking about running the cells in subsets and then combining the results instead of running all 130,000 cells at once.
Is it okay to combine the results after performing them by subset?
I'm worried that the results may differ from when I perform all the cells at once.
Thank you for your help.
Sincerely,
Ji-Hye Choi
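The size in the error message is close to what a single dense cell-by-cell matrix of doubles would need, which suggests (as an assumption, not a confirmed diagnosis) that a dense network of about 130,000 x 130,000 entries is being allocated. The helper below is only a back-of-the-envelope estimate; dense_gib() is a hypothetical name.

```r
# Memory needed for a dense n-by-n matrix of doubles (8 bytes per entry),
# in units of 2^30 bytes, which R labels "Gb" in allocation errors.
dense_gib <- function(n) n^2 * 8 / 2^30

dense_gib(130000)   # roughly 126 for 130,000 cells
dense_gib(50000)    # roughly 18.6 for a 50,000-cell subset
```

Because the requirement grows with the square of the cell count, halving the number of cells cuts it to about a quarter. Whether results from separate subsets can be combined is a different question that this estimate does not answer.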
Thank you for your algorithm,
I always get 0 Scissor cells, no matter how I change the parameters. I found that in this part of the code
fit0 <- APML1(X, Y, family = family, penalty = "Net",
              alpha = alpha[i], Omega = network, nlambda = 100,
              nfolds = min(10, nrow(X)))
fit1 <- APML1(X, Y, family = family, penalty = "Net",
              alpha = alpha[i], Omega = network, lambda = fit0$lambda.min)
fit0$lambda.min is NULL, so I changed it to min(fit0$fit$lambda), which solved the problem.
All the best,
J Hovelly
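The workaround described above can be written as a small guard that only falls back when the cross-validated value is missing. This is a sketch, not a patch to Scissor: pick_lambda() is a hypothetical helper, and the two mock lists merely imitate the fields named in the report rather than real APML1 output.

```r
# Hypothetical helper: prefer the cross-validated lambda when the fit
# provides one, otherwise fall back to the smallest lambda on the path
# (the workaround described above).
pick_lambda <- function(fit) {
  if (!is.null(fit$lambda.min)) {
    return(fit$lambda.min)
  }
  warning("lambda.min is NULL; falling back to min(fit$fit$lambda)")
  min(fit$fit$lambda)
}

# Mock objects shaped like the fields used in the report; these are not
# real APML1 results.
path <- data.frame(lambda = c(0.1, 0.03, 0.01))
fit_with_cv    <- list(lambda.min = 0.03, fit = path)
fit_without_cv <- list(lambda.min = NULL, fit = path)

pick_lambda(fit_with_cv)
suppressWarnings(pick_lambda(fit_without_cv))
```

Note that the smallest lambda applies the weakest penalty on the path, so the fallback is not a cross-validated choice and the number of selected cells should be interpreted with that in mind.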
Hi there,
When I ran the reliability.test in my own data, I encountered an error as below:
Error in roc.default(Y_test, score_test, direction = "<", quiet = T): 'response' must have two levels
Traceback:
1. reliability.test(X, Y, network, alpha = 0.05, family = "binomial",
. cell_num = numbers, n = 10, nfold = 10)
2. test_logit(X, Y, network, alpha, cell_num, n, nfold)
3. roc(Y_test, score_test, direction = "<", quiet = T)
4. roc.default(Y_test, score_test, direction = "<", quiet = T)
5. stop("'response' must have two levels")
May I ask your help for resolving the issue?
Thanks,
Logan
My scRNA-seq data is not tumor data, and I have no bulk data. Your demo requires scRNA-seq data, bulk data, and a phenotype, so what should I do if I want to cite Scissor?
Thank you for your excellent algorithm,
Can I use metacell data as the input, where each metacell is the sum of dozens of single cells? In my opinion, metacell data can improve the correlation between single-cell data and bulk data because the dropout effect is reduced. I would appreciate it if you could answer my question!
Best wishes!
J Hovelly.
Hello Scissor,
My scRNA-seq workflow is based on Scanpy. After processing my data with Scanpy, I saved it to an h5ad file. I then used R Markdown to convert the h5ad file into a Seurat object and ran Scissor on it. However, I got: Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of a vector type, was 'NULL'.
Could you please help me with this issue? We'll cite your paper and make the acknowledgment when we publish our data. I've shared the h5ad file and its converted rds file with your gmail.
Thanks!
Best,
YJ
R Markdown code attached:
# python code in R Markdown
import numpy as np
import pandas as pd
import scanpy as sc
ACT_sub2 = sc.read('C:/Users/Park_Lab/Documents/ACT_sub2.h5ad')
sc.pl.umap(ACT_sub2, color='leiden', legend_loc='right margin', title='', frameon=False, use_raw=False)
# all below is R code in R Markdown
library(reticulate)
Sys.setenv(LANG = "en_US")
library(Seurat)
library(ggplot2)
library(sctransform)
library(dplyr)
library(patchwork)
library(DT)
memory.limit(512000)
options(future.globals.maxSize = 256000*1024^2)
# Get the expression matrix
exprs <- t(py$ACT_sub2$X)
colnames(exprs) <- py$ACT_sub2$obs_names$to_list()
rownames(exprs) <- py$ACT_sub2$var_names$to_list()
# Create the Seurat object
seurat <- CreateSeuratObject(exprs)
# Set the expression assay
seurat <- SetAssayData(seurat, "data", exprs)
# Add observation metadata
seurat <- AddMetaData(seurat, py$ACT_sub2$obs)
# Add feature metadata
seurat[["RNA"]][["n_cells"]] <- py$ACT_sub2$var["n_cells"]
# Add embedding
embedding <- py$ACT_sub2$obsm["X_umap"]
rownames(embedding) <- py$ACT_sub2$obs_names$to_list()
colnames(embedding) <- c("umap_1", "umap_2")
seurat[["umap"]] <- CreateDimReducObject(embedding, key = "umap_")
seurat
ACT_sub2 <- SetIdent(seurat, value = "leiden")
DimPlot(ACT_sub2, reduction = "umap")
# run Scissor
phenotype <- TCGA_phenotype_input
tag <- c('0', '1') # name the phenotype 0 (uninterested) first, then phenotype 1 (interested)
GeneMatrix_MGI_input <- as.matrix(GeneMatrix_MGI) # convert gene matrix (data.frame) to 'matrix'
infos <- Scissor(bulk_dataset=GeneMatrix_MGI_input, sc_dataset=ACT_sub2, phenotype=phenotype, tag = tag, alpha = 0.3, family = "binomial", Save_file = "ACT_sub2_Scissor.RData") # alpha should be tuned so that Scissor selects <20% of cells
Hello developers,
I have a problem when using Scissor. I am trying to use Scissor to select cancer cells related to EGFR mutation in LUAD samples. I combined several randomly chosen samples into a single RDS file as input and got one result. I also ran the same samples as separate RDS files and got other results. Confusingly, the two sets of results are quite different, and Scissor selects more than 50 positive and negative cells in each separate RDS.
Which type of RDS should I use as the input file for Scissor?
Looking forward to your reply.
Thank you very much!
I got an error when running the Scissor function:
Error: cannot allocate vector of size 24.6 Gb
But the memory size of my computer is 64 Gb.
I am using R 4.2.3
Can anyone help to figure out what happened there and any solutions to fix it?
Hello, I'm testing Scissor with several continuous dependent variables.
info <- Scissor(bulk_dataset = bulk_counts,
                sc_dataset = seur_dataset,
                phenotype = phenotype,
                alpha = NULL,
                cutoff = 0.2,
                tag = phenotype,
                family = "gaussian",
                Save_file = 'info.RData')
My phenotypes are named numeric vectors as follows, a (successful) example:
A_07_L01_01 A_07_L01_03 A_07_L01_04 B_01_L01_04 B_01_L01_05 B_02_L01_08 B_02_L01_09 B_02_L01_10 C_06_L01_01
13.00 12.00 17.20 12.40 6.57 12.80 8.02 23.50 13.10
C_06_L01_02 C_07_L01_04 C_07_L01_06 B_01_L01_01 B_01_L01_02 B_01_L01_03 B_02_L01_06 B_02_L01_07 C_06_L01_03
13.60 8.29 6.29 14.40 16.00 19.30 8.82 11.20 15.90
A_03_L01_01 A_03_L01_02 A_03_L01_03 A_03_L01_04 A_08_L01_05 A_08_L01_06 A_08_L01_07
15.60 12.50 17.50 26.40 25.40 17.60 30.70
Most of the time, the above Scissor command runs successfully. Occasionally, it does not and I receive the following error:
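[1] "|**************************************************|"
[1] "Performing quality-check for the correlations"
[1] "The five-number summary of correlations:"
0% 25% 50% 75% 100%
0.01839713 0.26478574 0.32455696 0.37402355 0.52673473
[1] "|**************************************************|"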
Error in Scissor(bulk_dataset = bulk_counts, sc_dataset = seur_dataset, :
The length differs between tags and phenotypes. Please check Scissor inputs and selected regression type.
For context, the below phenotype led to an error:
A_07_L01_01 A_07_L01_03 A_07_L01_04 B_01_L01_04 B_01_L01_05 B_02_L01_08 B_02_L01_09 B_02_L01_10 C_06_L01_01
10.20 11.40 10.10 7.02 3.42 6.34 8.67 13.30 4.98
C_06_L01_02 C_07_L01_04 C_07_L01_06 B_01_L01_01 B_01_L01_02 B_01_L01_03 B_02_L01_06 B_02_L01_07 C_06_L01_03
6.05 4.61 5.52 8.41 15.60 13.30 6.44 8.73 7.41
A_03_L01_01 A_03_L01_02 A_03_L01_03 A_03_L01_04 A_08_L01_05 A_08_L01_06 A_08_L01_07
9.02 8.87 14.10 9.90 9.79 10.30 11.90
Thanks for your advice.
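One difference between the two phenotype vectors is easy to verify: the failing one contains a repeated value (13.30 appears twice), while the successful one has 25 distinct values. A source excerpt quoted in another report in this thread shows the gaussian branch comparing length(table(Y)) with length(tag), so with tag = phenotype a duplicate would make the lengths differ. The sketch below reproduces just that comparison; it does not call Scissor.

```r
# Phenotype values copied from the failing example above (names omitted).
phenotype <- c(10.20, 11.40, 10.10, 7.02, 3.42, 6.34, 8.67, 13.30, 4.98,
               6.05, 4.61, 5.52, 8.41, 15.60, 13.30, 6.44, 8.73, 7.41,
               9.02, 8.87, 14.10, 9.90, 9.79, 10.30, 11.90)

length(phenotype)          # number of samples
length(table(phenotype))   # number of distinct values

# With tag = phenotype the two lengths differ whenever a value repeats.
# A tag built from the distinct values always matches the table length:
tag <- names(table(phenotype))
length(tag) == length(table(phenotype))
```

Building the tag from names(table(phenotype)) is the same approach another user in this thread reports using for a continuous phenotype.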
Hello,
I've been trying to run scissor with a TCGA dataset and while the survival analyses work fine, when I try to use the binomial family to assess a phenotype it selects 0 cells. I tried it with a couple different phenotypes of interest and still kept getting 0%.
I'm wondering if there's another way to analyze the data (the tutorial didn't really go over using Gaussian analyses so I'm unsure how to proceed) or if the issue is the phenotype data itself that is just not statistically powerful enough (maybe I need more samples).
my phenotype of interest has 126 + and 27 - (153 TCGA samples total) for a sc data set of 7500 cells. scissor analysis with survival using family = cox gets me about 21% cells selected (alpha = 0.05). But when I input the phenotype data using family = binomial I always get 0% no matter the alpha.
Let me know if you need any more info/code from me (I'm fairly new to bioinformatics)
Thanks!
Hello Scissor,
I'm wondering whether the alpha value should be the same in the two functions below. According to your tutorial, we need to optimize the best alpha in Scissor(). Do we need to change it accordingly in reliability.test()?
Scissor(bulk_dataset, sc_dataset, phenotype, tag = tag, alpha = 0.3, family = "binomial")
reliability.test(X, Y, network, alpha = 0.05, family = "binomial", cell_num = numbers, n = 100, nfold = 10)
Thanks!
Best,
YJ
Hello @sunduanchen
We performed several rounds of Scissor analysis, but we still don't know the best rule for determining the percentage of Scissor-selected cells.
For example, we set alpha=0.2 and got "The percentage of selected cell is: 4.299%". Then we set alpha=0.15 and got "The percentage of selected cell is: 8.929%". Both alpha=0.2 and alpha=0.15 passed the reliability significance test (we didn't do cell-level evaluations). So which alpha value should we use? We cannot decide because both seem valid.
Any suggestions will be appreciated!
Thanks!
Best,
YJ
Hi!
Thanks for the great tools!
When I read the Scissor Tutorial of section "Reliability significance test", I was confused about "result1 <- reliability.test(X, Y, network, alpha = 0.05, family = "cox", cell_num = numbers, n = 10, nfold = 10)".
Where can I get the value of "X","Y", and "network"? I only know "numbers" are from "infos1" variable.
Best,
Faming
Hi, Scissor is a really useful and important tool for single-cell data analysis, and I believe our team can find something new using your R package.
I can run your tutorial data successfully; however, I got an error when using my own data:
infos4 <- Scissor(bulkdata, ScData, phenotype, tag = tag, alpha = 0.5,
                  family = "binomial", Save_file = "Scissor_disease.RData")
Error in Scissor(bulkdata, ScData, phenotype, tag = tag, alpha = 0.5, :
  There is no common genes between the given single-cell and bulk samples.
Can you help to figure out what happened there? Thanks a lot!
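The message suggests that the row names of the bulk matrix and of the single-cell data share no entries, for example when one uses Ensembl IDs and the other gene symbols, or when gene names are missing from the row names altogether. The sketch below shows the kind of check that can confirm this, assuming genes are matched by row name; the gene lists and the id_map lookup are toy values for illustration.

```r
# Toy identifiers: bulk rows use Ensembl IDs, single-cell rows use symbols.
bulk_genes <- c("ENSG00000141510", "ENSG00000146648", "ENSG00000133703")
sc_genes   <- c("TP53", "EGFR", "KRAS", "MKI67")

common <- intersect(bulk_genes, sc_genes)
length(common)             # no overlap between the two naming schemes

# Hypothetical mapping table; in practice this would come from an
# annotation resource of your choice.
id_map <- c(ENSG00000141510 = "TP53",
            ENSG00000146648 = "EGFR",
            ENSG00000133703 = "KRAS")

bulk_symbols <- unname(id_map[bulk_genes])
common_after <- intersect(bulk_symbols, sc_genes)
length(common_after)       # genes shared after harmonizing identifiers
```

On real data, comparing head(rownames(bulkdata)) with the first few gene names of the single-cell object is usually enough to see which naming scheme each side uses.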
Thanks for your wonderful package. However, it seems it cannot be used with an integrated Seurat object. After reading previously submitted issues, I learned that the key inputs are the 'data' slot and 'RNA_snn'.
So, could I rename "CCA_snn" (obtained from an integrated Seurat object) to 'RNA_snn' and use the RNA assay to obtain the 'data' slot?
Looking forward to your reply. Thanks!
Hi!
Thank you for the nice tools.
I was wondering what the tag should look like if I use Scissor for a continuous phenotype, like tumor size in the following image?
I use:
Z = table(pheno$TumorSizeMax)
tag = names(Z)
The code works, but I get neither Scissor positive nor negative cells.
Is the tag correct for linear regression?
Thanks very much!
Hello, I have the following problems when processing my data:
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102
infos4 <- Scissor(bulk_dataset, sc_dataset, phenotype, tag = tag, alpha = 0.5,
                  family = "binomial",
                  Save_file = "Scissor_LUAD_TP53_mutation.RData")
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
  'data' must be of a vector type, was 'NULL'
I don't know if this is because my data is integrated. I tried running
sc_dataset <- Seurat_preprocessing(sc_dataset, verbose = F)
but I still get an error:
Error in as.vector(data) :
  no method for coercing this S4 class to a vector
I was trying to use Scissor with the gaussian family. My input phenotype is a numeric vector (length 176). As far as I understand, the tag should probably be a single name (length 1), but this gives the length error mentioned in the title.
I turned to the source code and found that
if (family == "gaussian") {
    Y <- as.numeric(phenotype)
    z <- table(Y)
    if (length(z) != length(tag)) {
        stop("The length differs between tags and phenotypes. Please check Scissor inputs and selected regression type.")
    }
    else {
        tmp <- paste(z, tag)
        print(paste0("Current phenotype contains ",
                     paste(tmp[1:(length(z) - 1)], collapse = ", "),
                     ", and ", tmp[length(z)], " samples."))
        print("Perform linear regression on the given phenotypes:")
    }
}
I am confused by the use of the table function on the phenotype. It makes sense for logistic regression, but I am not sure why it is used for the gaussian family.
I would really appreciate any explanation.
Hi,
Thank you for developing this great tool.
Do you recommend log2TPM or TPM for bulk RNA seq data? I think this results in different phenotype association.
Thanks!
Hello Scissor,
I'm running Scissor, but no matter what alpha I used, it always came out 0 selected cells.
Could you please help me with this issue?
Thanks!
Best,
Yuanjian
infos <- Scissor_change(bulk_dataset=GeneMatrix_HGNC_input, sc_dataset=Merge2.combined, phenotype=phenotype, tag = tag, alpha = seq(1,5,1)/10000000, cutoff = 0.01, family = "binomial", Save_file = "Merge2_Scissor.RData")
[1] "|**************************************************|"
[1] "Performing quality-check for the correlations"
[1] "The five-number summary of correlations:"
0% 25% 50% 75% 100%
-0.04986036 0.14333164 0.23372394 0.35075314 0.79686469
[1] "|**************************************************|"
[1] "Current phenotype contains 48 Wild type and 17 Mutated samples."
[1] "Perform logistic regression on the given phenotypes:"
[1] "alpha = 1e-07"
[1] "Scissor identified 0 Scissor+ cells and 0 Scissor- cells."
[1] "The percentage of selected cell is: 0.000%"
I got an error when running your tutorial using your data:
Error in normalize.quantiles(dataset0) : ERROR; return code from pthread_create() is 22
I am using R 4.1.1
Can you help to figure out what happened there?
Hi, I've been trying to run Seurat_preprocessing on my own sc dataset but it keeps giving me this error:
sc_dataset <- Seurat_preprocessing(sc_dataset, verbose = F)
Warning in storage.mode(from) <- "double" : NAs introduced by coercion
restarting interrupted promise evaluation
Error in `$<-.data.frame`(`*tmp*`, "variance.expected", value = 0) :
  replacement has 1 row, data has 0
I'm fairly new to bioinformatics so any help would be greatly appreciated!
Hi
I followed the instructions to install the Scissor package, and it reported an error:
Warning message:
In i.p(...) :
installation of package ‘C:/Users/18435/AppData/Local/Temp/RtmpULhMMS/file3a305c6965d3/Scissor_2.0.0.tar.gz’ had non-zero exit status
can you help me resolve the problem?
Thank you very much for your elegant tool!
I have multiple subgroups and want to use Scissor to identify the cells related to each group. How should I use this tool?
Thank you again!
Hi,
Thanks for developing this interesting tool.
It seems that the error:
Error in normalize.quantiles(dataset0) : vector types do not match in copyVector
is quite common.
I have tried:
BiocManager::install("preprocessCore", configure.args="--disable-threading", force = TRUE)
But it is not working...
Could you please give some help on how to debug it?
Many thanks.
Jaime.
Hi, I tried to install Scissor on windows and linux R platforms, but ran into the following installation problems:
#Linux
install.packages("sunduanchen-Scissor-311560a.tar.gz",repos=NULL,type="source",dependencies=T)
Installing package into '/root/R/x86_64-pc-linux-gnu-library/4.1'
(as 'lib' is unspecified)
Warning in untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers
#Windows
devtools::install_github('sunduanchen/Scissor')
Downloading GitHub repo sunduanchen/Scissor@HEAD
Error in utils::download.file(url, path, method = method, quiet = quiet, :
download from 'https://api.github.com/repos/sunduanchen/Scissor/tarball/HEAD' failed
install.packages("sunduanchen-Scissor-311560a.tar.gz",repos=NULL,type="source",dependencies=T)
Warning in untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers
*** arch - i386
C:\rtools40\mingw32\bin\nm.exe: RcppExports.o: file format not recognized
C:\rtools40\mingw32\bin\nm.exe: Scissor.o: file format not recognized
C:/rtools40/mingw32/bin/g++ -std=gnu++11 -shared -s -static-libgcc -o Scissor.dll tmp.def RcppExports.o Scissor.o -LC:/PROGRA~1/R/R-41~1.3/bin/i386 -lR
RcppExports.o: file not recognized: file format not recognized
collect2.exe: error: ld returned 1 exit status
no DLL was created
ERROR: compilation failed for package 'Scissor'
Can you help solve this problem? Thanks!
Hi!
Thank you for the nice tools.
I was using Scissor on a big dataset. It seems that the "as.matrix" in "network <- as.matrix(sc_dataset@graphs$RNA_snn)" does not work for a large sparse matrix, so I replaced 'as.matrix' with a conversion suited to big datasets. The matrix I get is the same as the one from "as.matrix" when the data is not too large.
The Omega seems to be right. However, after 'W=OmegaC(Omega, sgn1); W$loc=W$loc+1', W seems to be all zeros. I am not familiar with Rcpp, so I was wondering whether there is something like 'as.matrix' in 'OmegaC' that is not suitable for big datasets?
Can you help figure out what happened there? Thanks a lot!
When I run Scissor on our own Seurat object, an error occurs:
infos1 <- Scissor(bulk_dataset, tumor, phenotype, alpha = 0.05,
                  family = "cox", Save_file = './processed/Scissor_survival.RData')
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
  'data' must be of a vector type, was 'NULL'
But when we run Seurat_preprocessing first, it runs successfully:
tumor <- Seurat_preprocessing(seurat@assays$SCT@counts, verbose = F)
infos1 <- Scissor(bulk_dataset, tumor, phenotype, alpha = 0.05,
                  family = "cox", Save_file = './processed/Scissor_survival.RData')
Is there a bug in applying the Scissor function to our own Seurat object?
Hi, how do I use CAMERA (function name: cameraPR) from the limma R package (v3.42.2) for the pathway enrichment analysis?