sunduanchen / scissor

175.0 3.0 31.0 29.22 MB

Scissor package

License: GNU General Public License v3.0

R 30.67% C 0.10% C++ 69.22%

single-cell-analysis bioinformatics-tool data-mining

scissor's Issues

Inconsistent definition of "Net" penalty between Scissor and APML0

Hello Duanchen,

Thank you for developing this tool to identify phenotype-relevant cell subsets. I have a question regarding the definition of the "Net" penalty. In the paper, the regularization term is written as $\lambda\left(\alpha\|\beta\|_1 + \tfrac{1}{2}\beta^{\top}L\beta\right)$. But in APML0, the "Net" regularization is defined as $\lambda\alpha\|\beta\|_1 + \tfrac{1}{2}\beta^{\top}L\beta$ ($\lambda$ is not applied to the Laplacian term). So, did you customize the solver to reflect this change? Thank you so much.

For reference, here is the APML0 manual (https://cran.r-project.org/web/packages/APML0/APML0.pdf).
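To make the difference between the two forms quoted above concrete, here is a small R sketch that evaluates both expressions on a toy three-cell graph. The helpers `penalty_paper` and `penalty_issue` are hypothetical: they encode the two formulas exactly as written in this issue, not the APML0 source. L is the graph Laplacian D - A.

```r
# Hypothetical helpers (not part of Scissor or APML0): evaluate the two penalty
# forms exactly as quoted in this issue.
penalty_paper <- function(beta, L, lambda, alpha) {
  # lambda * (alpha * ||beta||_1 + 1/2 * beta' L beta)
  lambda * (alpha * sum(abs(beta)) + 0.5 * drop(t(beta) %*% L %*% beta))
}
penalty_issue <- function(beta, L, lambda, alpha) {
  # lambda * alpha * ||beta||_1 + 1/2 * beta' L beta  (no lambda on the Laplacian term)
  lambda * alpha * sum(abs(beta)) + 0.5 * drop(t(beta) %*% L %*% beta)
}

# Toy 3-cell path graph 1-2-3: adjacency A, degree matrix D, Laplacian L = D - A
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
L <- diag(rowSums(A)) - A
beta <- c(1, -2, 0.5)

penalty_paper(beta, L, lambda = 0.1, alpha = 0.5)  # 0.9375
penalty_issue(beta, L, lambda = 0.1, alpha = 0.5)  # 7.8
```

The two forms coincide only when lambda = 1; for smaller lambda the paper's form shrinks the Laplacian term together with the lasso term, while the other form leaves it at full weight.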

Including covariates in coxph regression

Thanks for developing this tool! I've been running Scissor on some lung cancer data and the results are quite promising!

I have a question regarding using the survival data:
Survival is obviously highly confounded by tumor stage: I get very similar results when running Scissor with survival data and with binary early-stage/late-stage data. Is there a way to include tumor stage, or other covariates such as age, in the Cox regression model, so that I can find cell types associated with better survival independently of tumor stage?
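The Scissor() calls shown in these issues take no covariate argument. One common post-hoc check (an assumption on our part, not a Scissor feature) is to summarize the selected cells as a per-sample score in the bulk cohort and fit a Cox model that also includes stage and age, using coxph() from the survival package. The sketch below uses simulated data; `score` is a hypothetical per-sample score.

```r
library(survival)

set.seed(1)
n <- 200
# Simulated bulk cohort; 'score' stands in for a hypothetical per-sample summary
# of Scissor-selected cells (e.g. mean expression of their marker genes)
stage <- factor(sample(c("early", "late"), n, replace = TRUE), levels = c("early", "late"))
age <- rnorm(n, mean = 65, sd = 8)
score <- rnorm(n)
# True hazard depends on both stage and score
time <- rexp(n, rate = 0.1 * exp(0.8 * (stage == "late") + 0.5 * score))
status <- rbinom(n, 1, 0.8)   # roughly 20% of samples flagged as censored
cohort <- data.frame(time, status, stage, age, score)

# Adjusted model: the 'score' coefficient is estimated with stage and age in the model
fit <- coxph(Surv(time, status) ~ score + stage + age, data = cohort)
summary(fit)$coefficients
```

If the proportional-hazards assumption does not hold across stages, `strata(stage)` in the formula is an alternative to entering stage as a covariate.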

cutoff parameter does not limit the number of cells chosen

I have not been able to successfully implement a cut-off threshold while calling Scissor.

For example, alpha = 0.05 leads to 40% of cells being selected by Scissor. To decrease the proportion of Scissor-selected cells, I set:

cutoff_threshold = 0.2
alpha_threshold = 0.05

info <- Scissor(bulk_dataset = duod_counts,
sc_dataset = seur_duod,
phenotype = phenotype,
alpha = alpha_threshold,
cutoff = cutoff_threshold,
tag = tag,
family = "gaussian",
Save_file = paste0('sbatch_outputs/grid_search/', date, alpha_threshold, "_", cutoff_threshold, protein, '.RData'))

The number and identity of cells selected is exactly the same as that chosen without the inclusion of the cutoff parameter.
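One possible explanation, which is an assumption rather than documented behaviour: cutoff may act only as a stopping rule while Scissor loops over a vector of alpha values (the default alpha = NULL and the per-alpha "alpha = ..." log lines elsewhere in these issues hint at such a loop). With a single alpha there would be nothing to stop. The sketch below mimics that assumed loop; `select_fraction()` and `search_alpha()` are hypothetical stand-ins, not Scissor functions.

```r
# Hypothetical stand-in for one Scissor fit: fraction of cells with non-zero
# coefficients at a given alpha (a toy decreasing curve, not real output).
select_fraction <- function(alpha) 0.4 * exp(-10 * alpha)

# Assumed behaviour of the alpha search: try alphas in order and stop at the
# first one whose selected fraction falls below the cutoff.
search_alpha <- function(alphas, cutoff) {
  for (a in alphas) {
    pct <- select_fraction(a)
    if (pct < cutoff) break
  }
  list(alpha = a, percentage = pct)
}

search_alpha(0.05, cutoff = 0.2)$alpha               # 0.05: single alpha, cutoff never enforced
search_alpha(c(0.05, 0.1, 0.2), cutoff = 0.2)$alpha  # 0.1: first alpha under the cutoff
```

Under this assumption, passing a grid such as alpha = c(0.05, 0.1, 0.2, 0.3) together with cutoff = 0.2 would let the cutoff take effect; if no alpha reaches the cutoff, the last alpha tried is kept.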

No strictly monotonic anticorrelation between alpha and the percentage of selected cells

Hi Dr. Sun,
I ran a simple test to find out why I always get 0 Scissor cells even when a very small alpha is chosen. The test revealed what is described in the title.

I don't think this is mentioned in your tutorial, and it may affect new researchers like me who use Scissor to explore their own data. So I am opening this issue for further discussion.

Best regards,
JHovelly

Error in normalize.quantiles(dataset0) : ERROR; return code from pthread_create() is 22

Hi,
I have downloaded the dataset and want to try the Scissor tool with the test data. However, when I follow the Scissor_Tutorial using this command:
infos1 <- Scissor(bulk_dataset, sc_dataset, phenotype, alpha = 0.05,
family = "cox", Save_file = 'Scissor_LUAD_survival.RData')

I encountered the error:
Error in normalize.quantiles(dataset0) : ERROR; return code from pthread_create() is 22

I appreciate your time.
Best regards,
Qing

Input data for microarray

Hi,

Thanks for developing this great tool!
I am wondering what kind of microarray data should be used as input for Scissor. Is RMA-normalized data without log transformation acceptable? Thanks in advance!

Best,
Leo Lam

How to process the RNA-seq data: log10(count+1), log2(count+1), TPM, or FPKM?

When I use the raw counts of RNA-seq data, the results are poor. Should I use log10(count+1), log2(count+1), TPM, or FPKM? Which one is better? Thank you!
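Which transformation suits Scissor is for the authors to answer. For reference, the sketch below shows how TPM and log2(TPM + 1) are commonly computed from raw counts. The counts and gene lengths are made-up toy values, and `counts_to_tpm` is a hypothetical helper.

```r
# TPM: divide counts by gene length (in kb), then scale each sample so the
# length-normalized values sum to one million.
counts_to_tpm <- function(counts, gene_length_bp) {
  rate <- counts / (gene_length_bp / 1e3)   # reads per kilobase, gene lengths recycled down each column
  sweep(rate, 2, colSums(rate), "/") * 1e6
}

counts <- matrix(c(100, 200, 50,
                   300,  10, 90), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("S1", "S2")))
gene_length_bp <- c(1000, 2000, 500)

tpm <- counts_to_tpm(counts, gene_length_bp)
log_tpm <- log2(tpm + 1)
colSums(tpm)   # each sample sums to 1e6
```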

Error in Scissor

When I run the Scissor function:

res <- Scissor(as.matrix(expr), tumor, phenotype, tag = tag, alpha = 0.5, 
                  family = "binomial", Save_file = "Scissor.RData")

the following error occurs:

[1] "|**************************************************|"
[1] "Performing quality-check for the correlations"
[1] "The five-number summary of correlations:"
       0%       25%       50%       75%      100% 
0.3108529 0.5483579 0.5681650 0.5868047 0.6772004 
[1] "|**************************************************|"
[1] "Current phenotype contains 7 No and 7 Yes samples."
[1] "Perform logistic regression on the given phenotypes:"
Error in matrix(NA, nrow = nfolds, ncol = numi2) : 
  invalid 'ncol' value (too large or NA)
In addition: Warning messages:
1: In matrix(sapply(outi, function(x) { :
  data length [10] is not a sub-multiple or multiple of the number of rows [18023]
2: In matrix(sapply(outi, function(x) { :
  data length [10] is not a sub-multiple or multiple of the number of rows [18023]

What is happening here, and how can I fix it?

R Session Aborted when running Scissor

Hi Duanchen,
Thanks for developing this interesting tool for discovering important cell populations from bulk and scRNA-seq data! I ran into some issues and cannot run my own datasets. When I run Scissor on the object I created, the Scissor step fails with "Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of a vector type, was 'NULL'". If I use the "Seurat_preprocessing" function first, RStudio aborts when running the "Scissor" function. Could you help me debug these issues? Thanks so much!

Error in asMethod(object) : Cholmod error 'out of memory' at file ../Core/cholmod_memory.c, line 146

Hello developers,
I got the following error when running this code:
infos1 <- Scissor(bulkdata, scRNA, phenotype, alpha = 0.05,
family = "binomial", Save_file = 'Scissor_IPAH.RData')

Error in asMethod(object) :
Cholmod error 'out of memory' at file ../Core/cholmod_memory.c, line 146

How can I solve this problem?
Looking forward to your reply.

Thank you very much!

Error in normalize.quantiles(dataset0) :ERROR; return code from pthread_create() is 22

Hi,
Thank you so much for developing Scissor!
After reading your paper, I wanted to explore Scissor.
I encountered an error at this step:
infos1 <- Scissor(bulk_dataset, sc_dataset, phenotype, alpha = 0.05, family = "cox", Save_file = 'Scissor_LUAD_survival.RData')
Error in normalize.quantiles(dataset0) : ERROR; return code from pthread_create() is 22
Any suggestions would be greatly appreciated! Thank you for your time!
Best regards,
skyfall !

Test data link does not work

Hi,

Thank you so much for developing Scissor!
I just read the paper and wanted to try the tool before testing in my own data.

I couldn't download the dataset for the test. I got this error:

Warning message in load(url(paste0(location, "scRNA-seq.RData"))):
“URL 'https://xialab.s3-us-west-2.amazonaws.com/Duanchen/Scissor_data/scRNA-seq.RData': Timeout of 60 seconds was reached”
Error in load(url(paste0(location, "scRNA-seq.RData"))): cannot read from connection
Traceback:

1. load(url(paste0(location, "scRNA-seq.RData")))

I appreciate your time.

Best regards,

Eijy
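The message shows R's default 60-second timeout for URL connections being reached. A common workaround, not specific to Scissor, is to raise the `timeout` option and download the file to disk once before calling load(). `fetch_rdata` is a hypothetical helper; the base URL is the one from the error message.

```r
fetch_rdata <- function(file,
                        base_url = "https://xialab.s3-us-west-2.amazonaws.com/Duanchen/Scissor_data/",
                        dest_dir = tempdir(), timeout = 600) {
  old <- options(timeout = timeout)   # raise the default 60 s limit
  on.exit(options(old))               # restore the previous setting afterwards
  dest <- file.path(dest_dir, file)
  if (!file.exists(dest)) {           # skip the download if the file is already there
    download.file(paste0(base_url, file), dest, mode = "wb")
  }
  dest
}

# load(fetch_rdata("scRNA-seq.RData"))
```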

Crash happens when performing Scissor on Rstudio Server

Thank you for developing such a powerful tool.
I ran the Scissor function after loading the required files, but several times it stopped at the stage shown in the snapshot and returned only the intermediate result used for the next regression step. I also found that RStudio Server crashed when the function reached that stage. I suspect something is mismatched.

How to interpret what the worse survival is and what good survival is?

Hello Scissor,
Thanks for developing this amazing package.
In the 'Detecting a hypoxic subpopulation related to worse survival' section of the Results in the Nature Biotechnology paper, you input the clinical survival information of 471 TCGA-LUAD samples and found that 201 Scissor+ cells were associated with worse survival.

We're wondering how you define worse survival.
What is the boundary between worse survival and good survival? Is living longer than 5 years defined as good survival? Or do you take the 50% of people with the longest survival, use their expression matrix as the feature matrix for good survival, and then perform Pearson tests and Cox regression?
If I manually build a matrix that only includes TCGA samples with very poor survival (for example, all living shorter than 3 months), will Scissor still find 'good survival' cells and 'poor survival' cells in the single-cell data?

Thanks!
Best,
YJ
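For context on this question (this is the textbook Cox model, not a statement about Scissor's internals): a Cox regression uses event times and censoring indicators directly through the partial likelihood, so it needs no good/poor survival threshold. With $x_i$ the feature vector of sample $i$, $\delta_i = 1$ if its event was observed, and $R(t_i)$ the set of samples still at risk at time $t_i$, the log partial likelihood is

```latex
\ell(\beta) = \sum_{i:\,\delta_i = 1} \left[ x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right) \right]
```

A positive coefficient raises the hazard, i.e. it is associated with shorter survival, which is presumably how "worse survival" is read for Scissor+ cells in a Cox fit; the authors would need to confirm. Because each term compares an observed event only with the samples still at risk at that time, the fit depends on the relative ordering of survival times within the cohort rather than on an absolute cut-off.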

invalid 'ncol' value (too large or NA)

Hi,

I am trying to use Scissor with family "binomial" on a bulk RNA-seq dataset of 43 samples (11 controls and 32 cases) and a publicly available scRNA-seq dataset of 5092 cells. I receive the following error message:

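[1] "|**************************************************|"
[1] "Performing quality-check for the correlations"
[1] "The five-number summary of correlations:"
       0%       25%       50%       75%      100% 
0.3543059 0.4483379 0.4650750 0.4824858 0.6122901 
[1] "|**************************************************|"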
[1] "Current phenotype contains 11 healthy and 32 asthmatics samples."
[1] "Perform logistic regression on the given phenotypes:"
Error in matrix(NA, nrow = nfolds, ncol = numi2) :
invalid 'ncol' value (too large or NA)
In addition: Warning messages:
1: In Scissor(mat, sc_dataset, as.matrix(pheno), tag = tag, alpha = 0.2, :
NAs introduced by coercion
2: In matrix(sapply(outi, function(x) { :
data length [10] is not a sub-multiple or multiple of the number of rows [5093]
3: In matrix(sapply(outi, function(x) { :
data length [10] is not a sub-multiple or multiple of the number of rows [5093]

Could you please help?

Thank you
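The first warning ("NAs introduced by coercion") suggests, though the log alone cannot confirm it, that the phenotype became NA when converted to numbers, for example if as.matrix(pheno) turned a data frame with a text column into a character matrix. The sketch below reproduces that effect on a hypothetical phenotype table and instead builds a named 0/1 vector, with 0 for the first tag and 1 for the second, matching the tag order described in the Scanpy issue below ("name the phenotype 0 (uninterested) first").

```r
# Hypothetical phenotype table: one row per bulk sample
pheno <- data.frame(sample = c("S1", "S2", "S3", "S4"),
                    status = c("healthy", "asthmatic", "asthmatic", "healthy"))

# Coercing the whole table gives a character matrix, and as.numeric() on it
# yields only NAs (warning suppressed here)
bad <- suppressWarnings(as.numeric(as.matrix(pheno)))

# Named 0/1 vector: 0 = control (first tag), 1 = case (second tag)
phenotype <- setNames(as.integer(pheno$status == "asthmatic"), pheno$sample)
tag <- c("healthy", "asthmatics")
table(phenotype)   # two levels, one per tag
```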

Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

This looks like a recurrent problem where sparse matrices cannot be handled.
Any suggestions?

About normalize bulk and sc data

Hi!
Thank you for the nice tools.
In Scissor, normalize.quantiles is used to normalize the bulk and sc data.
However, I am wondering whether we can apply it to the original bulk and sc count data. Thanks a lot!
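As the normalize.quantiles(dataset0) errors in other issues here show, Scissor passes a single matrix to preprocessCore's normalize.quantiles(); we assume it holds the bulk and single-cell columns over their common genes. To show what quantile normalization does to such a matrix, here is a minimal base-R version. It is an illustration only: it breaks ties by order of appearance, whereas preprocessCore treats ties differently, so values will not match exactly.

```r
# Minimal quantile normalization: every column is mapped onto the same
# reference distribution (the mean of the sorted columns).
quantile_normalize <- function(m) {
  stopifnot(is.matrix(m), is.numeric(m))
  ranks <- apply(m, 2, rank, ties.method = "first")   # rank within each column
  reference <- rowMeans(apply(m, 2, sort))             # mean of the k-th smallest values
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(m)
  out
}

# Toy genes x samples matrix mixing "bulk" and "cell" columns
m <- matrix(c(5, 2, 3, 4,
              1, 4, 3, 4,
              6, 8, 9, 7), nrow = 4,
            dimnames = list(paste0("gene", 1:4), c("bulk1", "bulk2", "cell1")))
qn <- quantile_normalize(m)
apply(qn, 2, sort)   # identical columns after normalization
```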

Error: in normalize.quantiles(dataset0): vector types do not match in copyVector

I am having an issue using Scissor to annotate bulk RNA sequencing (of paired tumor & normal samples) with single-cell tumor sequencing.

Here is my code:

## import bulk RNA seq data
bulk <- as.matrix(read.csv("gene_count_matrix.csv"))
# convert the bulk dataset to matrix
bulk_dataset <- as.matrix(bulk)

## import bulk phenotype data
pheno <- read.table("Scissor_pheno.txt")

## import single-cell multiome seq data
sc <- Read10X_h5('C:/path/to/filtered_feature_bc_matrix.h5')
sc <- sc$`Gene Expression`
# perform seurat pre-processing 
sc_dataset <- Seurat_preprocessing(sc)

## run Scissor function 
phenotype <- pheno
tag <- c('Normal', 'Tumor')
info <- Scissor(bulk_dataset, sc_dataset, phenotype, tag=tag, alpha=0.5, family="binomial")

This results in the following error:

Error in normalize.quantiles(dataset0) : 
  vector types do not match in copyVector

I have tried several things, including manually reinstalling preprocessCore with BiocManager::install("preprocessCore", configure.args="--disable-threading", force = TRUE), but nothing has worked. I also had issues with normalize.quantiles() before I coerced the bulk dataset into a matrix.

Can someone advise whether I've made an error and/or how to resolve this issue?
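One plausible cause, an assumption not confirmed in this thread, is that the bulk matrix is not of type double. read.csv() keeps the gene-ID column as data, so as.matrix() returns a character matrix, and raw counts may also be stored as integers. The sketch uses a small inline CSV as a stand-in for gene_count_matrix.csv.

```r
# Toy stand-in for gene_count_matrix.csv (first column = gene IDs)
csv <- "gene_id,S1,S2\nGENE_A,10,0\nGENE_B,3,7\n"

# As in the code above: the gene-ID column makes as.matrix() return characters
bad <- as.matrix(read.csv(text = csv))
typeof(bad)            # "character"

# Use the first column as row names, then force double storage
bulk_dataset <- as.matrix(read.csv(text = csv, row.names = 1))
storage.mode(bulk_dataset) <- "double"
typeof(bulk_dataset)   # "double"
# With the real file: read.csv("gene_count_matrix.csv", row.names = 1)
```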

some questions about the function reliability.test()

I used the colorectal cancer sc_dataset and bulk data with DFI information and Cox regression. When I ran reliability.test(), the p-value was NA and sometimes the statistic was NaN. I checked the function and found that, because of how the samples are split, a test fold can contain only samples whose status is 0. How should I solve this problem? My options are to choose family = "binomial", or to modify test_cox following its comments: comment out index0 and use index1 and index2 instead, as shown below. Scissor is a very important tool for me; thank you very much. I also sent you an email, sorry to bother you.
test_cox <- function(X, Y, network, alpha, cell_num, n = 100, nfold = 10){
set.seed(1)
m1 <- sum(Y[,2] == 1)
m2 <- sum(Y[,2] == 0)
#index0 <- sample(cut(seq(m1+m2), breaks = 10, labels = F))
index1 <- sample(cut(seq(m1), breaks = nfold, labels = F))
index2 <- sample(cut(seq(m2), breaks = nfold, labels = F))

print("|**************************************************|")
print("Perform cross-validation on X with true label")
c_index_test_real <- NULL
pb1 <- progress_bar$new(total = nfold)
for (j in 1:nfold){
    #c_index <- which(index0 == j)
    c_index <- c(which(Y[,2] == 1)[which(index1 == j)], which(Y[,2] == 0)[which(index2 == j)])######I changed here
    X_train <- X[-c_index,]
    Y_train <- Y[-c_index,]
    fit <- NULL
    while (is.null(fit$fit)){
        set.seed(123)
        fit <- APML1(X_train, Y_train, family = "cox", penalty = "Net", alpha = alpha, Omega = network, nlambda = 100)
    }
    index <- which.min(abs(fit$fit$nzero - cell_num))
    Coefs <- as.numeric(fit$Beta[,index])
    Cell1 <- Coefs[which(Coefs > 0)]
    Cell2 <- Coefs[which(Coefs < 0)]

    X_test <- X[c_index,]
    Y_test <- Y[c_index,]
    test_data <- data.frame(cbind(Y_test, X_test%*%Coefs))
    colnames(test_data) <- c("OS_time", "Status", "Prediction")
    res.cox <- coxph(Surv(OS_time, Status) ~ Prediction, data = test_data)
    c_index_test_real[j] <- concordance(res.cox)$concordance

    pb1$tick()
    Sys.sleep(1 / 100)
    if (j == nfold) cat("Finished!\n")
}

print("|**************************************************|")
print("Perform cross-validation on X with permutated label")
c_index_test_back <- list()
pb2 <- progress_bar$new(total = n)
for (i in 1:n){
    set.seed(i+100)
    c_index_test_back[[i]] <- matrix(0, nfold, 1, dimnames = list(paste0("Testing_", 1:nfold),  "Concordance"))
    Y2 <- Y[sample(nrow(Y)),]
    for (j in 1:nfold){
        #c_index <- which(index0 == j)
        c_index <- c(which(Y2[,2] == 1)[which(index1 == j)], which(Y2[,2] == 0)[which(index2 == j)])######I changed here
        X_train <- X[-c_index,]
        Y_train <- Y2[-c_index,]
        fit <- NULL
        while (is.null(fit$fit)){
            set.seed(123)
            fit <- APML1(X_train, Y_train, family = "cox", penalty = "Net", alpha = alpha, Omega = network, nlambda = 100)
        }
        index <- which.min(abs(fit$fit$nzero - cell_num))
        Coefs <- as.numeric(fit$Beta[,index])
        Cell1 <- Coefs[which(Coefs > 0)]
        Cell2 <- Coefs[which(Coefs < 0)]

        X_test <- X[c_index,]
        Y_test <- Y2[c_index,]
        test_data <- data.frame(cbind(Y_test, X_test%*%Coefs))
        colnames(test_data) <- c("OS_time", "Status", "Prediction")
        res.cox <- coxph(Surv(OS_time, Status) ~ Prediction, data = test_data)
        c_index_test_back[[i]][j] <- concordance(res.cox)$concordance
    }
    pb2$tick()
    Sys.sleep(1 / 100)
    if (i == n) cat("Finished!\n")
}
statistic  <- mean(c_index_test_real)
background <- NULL
for (i in 1:n){
    background[i] <- mean(c_index_test_back[[i]][,1])
}
p <- sum(background > statistic)/n

print(sprintf("Test statistic = %s", formatC(statistic, format = "f", digits = 3)))
print(sprintf("Reliability significance test p = %s", formatC(p, format = "f", digits = 3)))

return(list(statistic = statistic,
            p = p,
            c_index_test_real = c_index_test_real,
            c_index_test_back = c_index_test_back))

}

Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

Hi,

Thank you for developing this awesome tool. Though the demo data went through, I encountered an error while running my data. Here is the code I used and the error message:

infos1 <- Scissor(bulk_dataset, sce, phenotype, alpha = 0.05,
                  family = "cox", Save_file = 'survival.RData')
Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

Could too many cells in the sce object have caused this error? I have 22432 features and 47284 cells in the sce object.
Could you please help me solve this?

Thank you again!
Best,
Jianxiang
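A rough size check may help with this and the other memory reports. Assuming the error arises when a sparse cells-by-cells network is converted to a dense matrix (our reading of the asMethod/cholmod_dense context, not confirmed by the authors), that matrix has n^2 double-precision entries. We also believe CHOLMOD rejects dense matrices whose entry count exceeds a 32-bit signed integer (2^31 - 1), which would put the limit near 46,340 cells; 47,284 cells would exceed it. The helper names are hypothetical.

```r
# GiB needed for a dense n x n matrix of doubles (8 bytes per entry)
dense_gib <- function(n_cells) 8 * as.numeric(n_cells)^2 / 1024^3

# Does an n x n dense matrix have more entries than a 32-bit signed int can index?
exceeds_int_index <- function(n_cells) as.numeric(n_cells)^2 > .Machine$integer.max

cells <- c(46340, 47284, 130000)
data.frame(cells, gib = round(dense_gib(cells), 1), too_large = exceeds_int_index(cells))
```

Under these assumptions, downsampling cells below that size is one workaround, at the cost of analysing fewer cells.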

Can I merge Scissor results after performing them by subset?

Hi,
Thank you for developing Scissor.

I ran Scissor on about 130,000 cells and received a memory error message:
'Error: cannot allocate vector of size 132.0 Gb'

So I am thinking about running the cells in subsets and then combining the results, instead of running all 130,000 cells at once.

Is it valid to combine the results after running Scissor on each subset?
I'm worried that the results may differ from running all the cells at once.

Thank you for your help.
Sincerely,
Ji-Hye Choi

fit0$lambda.min is NULL, causing 0 Scissor cells to be detected

Thank you for your algorithm,
I always get 0 Scissor cells no matter how I change the parameters. I traced it to this part of the code:

fit0 <- APML1(X, Y, family = family, penalty = "Net", 
              alpha = alpha[i], Omega = network, nlambda = 100, 
              nfolds = min(10, nrow(X)))
fit1 <- APML1(X, Y, family = family, penalty = "Net", 
              alpha = alpha[i], Omega = network, lambda = fit0$lambda.min)

fit0$lambda.min is NULL, so I changed it to min(fit0$fit$lambda), which solved the problem.

All the best,
J Hovelly
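The workaround above can be written as a small guard. `pick_lambda` is a hypothetical helper, and the mock objects only mimic the two fields used in the snippet (lambda.min and fit$lambda). Note that the smallest lambda on the path is the weakest penalty, so it is not equivalent to a cross-validated choice and will tend to select more cells.

```r
# Prefer the cross-validated lambda; fall back to the smallest lambda on the
# fitted path when lambda.min is missing (the workaround described above).
pick_lambda <- function(fit0) {
  if (!is.null(fit0$lambda.min)) return(fit0$lambda.min)
  warning("lambda.min is NULL; falling back to min(fit0$fit$lambda)")
  min(fit0$fit$lambda)
}

# Mock objects with the same field names as in the snippet above
path <- data.frame(lambda = c(0.5, 0.1, 0.02, 0.004))
with_cv    <- list(lambda.min = 0.02, fit = path)
without_cv <- list(fit = path)

pick_lambda(with_cv)                       # 0.02
suppressWarnings(pick_lambda(without_cv))  # 0.004
```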

Error in reliability.test

Hi there,

When I ran the reliability.test in my own data, I encountered an error as below:

Error in roc.default(Y_test, score_test, direction = "<", quiet = T): 'response' must have two levels
Traceback:

1. reliability.test(X, Y, network, alpha = 0.05, family = "binomial", 
 .     cell_num = numbers, n = 10, nfold = 10)
2. test_logit(X, Y, network, alpha, cell_num, n, nfold)
3. roc(Y_test, score_test, direction = "<", quiet = T)
4. roc.default(Y_test, score_test, direction = "<", quiet = T)
5. stop("'response' must have two levels")

Could you help me resolve this issue?

Thanks,
Logan
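pROC's roc() raises this error when the response in a test fold has only one class, which random fold assignment can produce when one class is small. Below is a sketch of stratified fold assignment, in the same spirit as the index1/index2 change proposed for test_cox in an earlier issue; `stratified_folds` is a hypothetical helper, not part of Scissor.

```r
# Assign folds separately within each class so that every fold gets both classes.
stratified_folds <- function(y, nfold = 10, seed = 1) {
  set.seed(seed)
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # spread this class's samples as evenly as possible over the folds
    folds[idx] <- sample(cut(seq_along(idx), breaks = nfold, labels = FALSE))
  }
  folds
}

y <- rep(c(0, 1), times = c(80, 20))   # imbalanced toy labels
folds <- stratified_folds(y, nfold = 10)
table(fold = folds, class = y)         # every fold holds both classes
```

Every fold contains both classes as long as each class has at least nfold samples; with fewer, some folds will still lack a class.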

How to use Scissor with scRNA-seq data that has no bulk data

My scRNA-seq data is not tumor data, and there is no bulk data. Your demo needs scRNA-seq data, bulk data, and a phenotype, so what should I do if I want to use (and cite) Scissor?

Can I use metacells data to estimate Scissor cells?

Thank you for your excellent algorithm,
Can I use metacell data as the input, where each metacell is the sum of dozens of single cells? In my opinion, metacell data can improve the correlation between single-cell data and bulk data because the dropout effect is reduced. I would appreciate it if you could answer my question!
Best wishes!
J Hovelly.
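Whether Scissor's model suits metacells is for the authors to answer. Mechanically, summing cells into metacells is a column-wise aggregation of the count matrix. In this sketch the metacell assignment is hypothetical (in practice it would come from a grouping method), and base R's rowsum() is applied to the transposed matrix so that cells, not genes, are summed.

```r
# Toy counts: 5 genes x 6 cells
counts <- matrix(1:30, nrow = 5,
                 dimnames = list(paste0("g", 1:5), paste0("c", 1:6)))

# Hypothetical metacell assignment, one label per cell
metacell <- c("m1", "m1", "m2", "m2", "m2", "m3")

# rowsum() sums rows by group, so transpose to sum cells within each metacell
meta_counts <- t(rowsum(t(counts), group = metacell))
meta_counts   # 5 genes x 3 metacells
```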

Scanpy-processed data cannot be run on Scissor

Hello Scissor,
My scRNA-seq workflow is based on Scanpy. After processing my data with Scanpy, I saved it as an h5ad file. I then used R Markdown to convert the h5ad file into a Seurat object and ran Scissor on it. However, I got "Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of a vector type, was 'NULL'".
Could you please help me with this issue? We'll cite your paper and make the acknowledgment when we publish our data. I've shared the h5ad file and its converted rds file with your gmail.
Thanks!
Best,
YJ

Attached R Markdown code:

# python code in R Markdown
import numpy as np
import pandas as pd
import scanpy as sc
ACT_sub2 = sc.read('C:/Users/Park_Lab/Documents/ACT_sub2.h5ad')
sc.pl.umap(ACT_sub2, color='leiden', legend_loc='right margin', title='', frameon=False, use_raw=False)

# all below is R code in R Markdown
library(reticulate)
Sys.setenv(LANG = "en_US")
library(Seurat)
library(ggplot2)
library(sctransform)
library(dplyr)
library(patchwork)
library(DT)
memory.limit(512000)
options(future.globals.maxSize = 256000*1024^2)

# Get the expression matrix
exprs <- t(py$ACT_sub2$X)
colnames(exprs) <- py$ACT_sub2$obs_names$to_list()
rownames(exprs) <- py$ACT_sub2$var_names$to_list()
# Create the Seurat object
seurat <- CreateSeuratObject(exprs)
# Set the expression assay
seurat <- SetAssayData(seurat, "data", exprs)
# Add observation metadata
seurat <- AddMetaData(seurat, py$ACT_sub2$obs)
# Add feature metadata
seurat[["RNA"]][["n_cells"]] <- py$ACT_sub2$var["n_cells"]
# Add embedding
embedding <- py$ACT_sub2$obsm["X_umap"]
rownames(embedding) <- py$ACT_sub2$obs_names$to_list()
colnames(embedding) <- c("umap_1", "umap_2")
seurat[["umap"]] <- CreateDimReducObject(embedding, key = "umap_")
seurat
ACT_sub2 <- SetIdent(seurat, value = "leiden")
DimPlot(ACT_sub2, reduction = "umap")

# run Scissor
phenotype <- TCGA_phenotype_input
tag <- c('0', '1')    # name the phenotype 0 (uninterested) first, then phenotype 1 (interested)
GeneMatrix_MGI_input  <- as.matrix(GeneMatrix_MGI)    # convert gene matrix (data.frame) to 'matrix'
infos <- Scissor(bulk_dataset=GeneMatrix_MGI_input, sc_dataset=ACT_sub2, phenotype=phenotype, tag = tag, alpha = 0.3, family = "binomial", Save_file = "ACT_sub2_Scissor.RData")    # alpha should be tested to make Scissor selected cells<20%

A problem with the Scissor input file for mutation information

Hello developers,
I have a problem using Scissor. I am trying to use it to select cancer cells related to EGFR mutation in LUAD samples. I randomly chose several samples, combined them into one rds file as input, and got a result. Meanwhile, I also ran the same samples as separate rds files and got other results. It is confusing that the two results are quite different, and that Scissor selects more than 50 positive and negative cells in each separate rds.

Which type of rds should I use as the input file for Scissor?
Looking forward to your reply.

Thank you very much!

Error: cannot allocate vector of size 24.6 Gb

I got an error when running the Scissor function:
Error: cannot allocate vector of size 24.6 Gb
However, my computer has 64 GB of memory.
I am using R 4.2.3

Can anyone help to figure out what happened there and any solutions to fix it?

Error with some gaussian regression phenotypes

Hello, I'm testing Scissor with several continuous dependent variables.

info <- Scissor(bulk_dataset = bulk_counts,
sc_dataset = seur_dataset,
phenotype = phenotype,
alpha = NULL,
cutoff = 0.2,
tag = phenotype,
family = "gaussian",
Save_file = 'info.RData')

My phenotypes are named numeric vectors; here is an example that runs successfully:
A_07_L01_01 A_07_L01_03 A_07_L01_04 B_01_L01_04 B_01_L01_05 B_02_L01_08 B_02_L01_09 B_02_L01_10 C_06_L01_01
13.00 12.00 17.20 12.40 6.57 12.80 8.02 23.50 13.10
C_06_L01_02 C_07_L01_04 C_07_L01_06 B_01_L01_01 B_01_L01_02 B_01_L01_03 B_02_L01_06 B_02_L01_07 C_06_L01_03
13.60 8.29 6.29 14.40 16.00 19.30 8.82 11.20 15.90
A_03_L01_01 A_03_L01_02 A_03_L01_03 A_03_L01_04 A_08_L01_05 A_08_L01_06 A_08_L01_07
15.60 12.50 17.50 26.40 25.40 17.60 30.70

Most of the time, the above Scissor command runs successfully. Occasionally, it does not and I receive the following error:
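[1] "|**************************************************|"
[1] "Performing quality-check for the correlations"
[1] "The five-number summary of correlations:"
        0%        25%        50%        75%       100% 
0.01839713 0.26478574 0.32455696 0.37402355 0.52673473 
[1] "|**************************************************|"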
Error in Scissor(bulk_dataset = bulk_counts, sc_dataset = seur_dataset, :
The length differs between tags and phenotypes. Please check Scissor inputs and selected regression type.

For context, the below phenotype led to an error:
A_07_L01_01 A_07_L01_03 A_07_L01_04 B_01_L01_04 B_01_L01_05 B_02_L01_08 B_02_L01_09 B_02_L01_10 C_06_L01_01
10.20 11.40 10.10 7.02 3.42 6.34 8.67 13.30 4.98
C_06_L01_02 C_07_L01_04 C_07_L01_06 B_01_L01_01 B_01_L01_02 B_01_L01_03 B_02_L01_06 B_02_L01_07 C_06_L01_03
6.05 4.61 5.52 8.41 15.60 13.30 6.44 8.73 7.41
A_03_L01_01 A_03_L01_02 A_03_L01_03 A_03_L01_04 A_08_L01_05 A_08_L01_06 A_08_L01_07
9.02 8.87 14.10 9.90 9.79 10.30 11.90

Thanks for your advice.
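The gaussian-branch source quoted in the later issue "The length differs between tags and phenotypes" compares length(table(Y)) with length(tag). Since tag = phenotype here, that check passes only when every phenotype value is distinct. Applying it to the two vectors above (values copied from this issue) suggests why the second one fails: 13.30 appears twice (B_02_L01_10 and B_01_L01_03). `tag_check` is a hypothetical helper that mirrors the quoted condition.

```r
# Phenotype values copied from this issue (names omitted)
ok    <- c(13.00, 12.00, 17.20, 12.40, 6.57, 12.80, 8.02, 23.50, 13.10,
           13.60, 8.29, 6.29, 14.40, 16.00, 19.30, 8.82, 11.20, 15.90,
           15.60, 12.50, 17.50, 26.40, 25.40, 17.60, 30.70)
fails <- c(10.20, 11.40, 10.10, 7.02, 3.42, 6.34, 8.67, 13.30, 4.98,
           6.05, 4.61, 5.52, 8.41, 15.60, 13.30, 6.44, 8.73, 7.41,
           9.02, 8.87, 14.10, 9.90, 9.79, 10.30, 11.90)

# Mirrors the quoted condition: number of distinct values vs. number of tags
tag_check <- function(Y, tag) length(table(as.numeric(Y))) == length(tag)

tag_check(ok, ok)          # TRUE: all 25 values distinct
tag_check(fails, fails)    # FALSE: 24 distinct values, 25 tags
fails[duplicated(fails)]   # 13.3
```

Under the same reading, tag = names(table(phenotype)), as used in the "Tag for continuous phenotype" issue, would always satisfy the check.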

0 selected cells

Hello,

I've been trying to run Scissor with a TCGA dataset. While the survival analyses work fine, when I use the binomial family to assess a phenotype, it selects 0 cells. I tried a couple of different phenotypes of interest and still kept getting 0%.

I'm wondering if there's another way to analyze the data (the tutorial doesn't really cover Gaussian analyses, so I'm unsure how to proceed), or whether the issue is that the phenotype data itself is not statistically powerful enough (maybe I need more samples).

My phenotype of interest has 126 positive and 27 negative samples (153 TCGA samples in total), with a single-cell dataset of 7500 cells. Scissor analysis of survival with family = "cox" selects about 21% of cells (alpha = 0.05). But when I input the phenotype data with family = "binomial", I always get 0%, no matter the alpha.

Let me know if you need more info or code from me (I'm fairly new to bioinformatics).

Thanks!

Shall the alpha value be the same between Scissor() and reliability.test() functions?

Hello Scissor,
I'm wondering whether the alpha value should be the same in the two functions below. According to your tutorial, we need to find the best alpha in Scissor(). Do we need to change it accordingly in reliability.test()?
Scissor(bulk_dataset, sc_dataset, phenotype, tag = tag, alpha = 0.3, family = "binomial")
reliability.test(X, Y, network, alpha = 0.05, family = "binomial", cell_num = numbers, n = 100, nfold = 10)
Thanks!
Best,
YJ

Error running Scissor

When I run Scissor on our own data, an error occurs.

How can I solve this error?

How to determine the best percentage of Scissor selected cells?

Hello @sunduanchen
We performed several rounds of Scissor analysis, but we still don't know the best rule for choosing the percentage of Scissor-selected cells.
For example, with alpha=0.2 we got "The percentage of selected cell is: 4.299%", and with alpha=0.15 we got "The percentage of selected cell is: 8.929%". Both alpha=0.2 and alpha=0.15 passed the reliability significance test (we didn't do the cell-level evaluations). So which alpha value should we use? We cannot decide, because both seem valid.
Any suggestions will be appreciated!
Thanks!
Best,
YJ

What do the "X", "Y", and "network" variables in the reliability.test function mean?

Hi!
Thanks for the great tools!
When I read the Scissor Tutorial of section "Reliability significance test", I was confused about "result1 <- reliability.test(X, Y, network, alpha = 0.05, family = "cox", cell_num = numbers, n = 10, nfold = 10)".
Where can I get the value of "X","Y", and "network"? I only know "numbers" are from "infos1" variable.
Best,
Faming
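As we understand the tutorial, X, Y and network are not built by hand: Scissor() writes them to the file named by Save_file (for example 'Scissor_LUAD_survival.RData'), and loading that file makes them available. The helper below is a hypothetical sketch that loads such a file into its own environment and checks that the three objects are present; the mock file exists only to exercise it.

```r
load_scissor_inputs <- function(save_file, needed = c("X", "Y", "network")) {
  env <- new.env()
  load(save_file, envir = env)   # objects land in env, not the global workspace
  missing <- setdiff(needed, ls(env))
  if (length(missing) > 0) {
    stop("Not found in ", save_file, ": ", paste(missing, collapse = ", "))
  }
  mget(needed, envir = env)
}

# Mock Save_file with placeholder objects, only to exercise the helper
X <- matrix(rnorm(6), 2); Y <- c(0, 1); network <- diag(3)
f <- tempfile(fileext = ".RData")
save(X, Y, network, file = f)
inputs <- load_scissor_inputs(f)
names(inputs)   # "X" "Y" "network"
```

With a real file, the tutorial call would then read reliability.test(inputs$X, inputs$Y, inputs$network, alpha = 0.05, family = "cox", cell_num = numbers, n = 10, nfold = 10).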

Error in Scissor: There is no common genes between the given single-cell and bulk samples

Hi, Scissor is a really useful and important tool for single-cell data analysis, and I believe our team can find something new using your gorgeous R package.

I can run your tutorial data successfully; however, I got an error when using my data:

infos4 <- Scissor(bulkdata, ScData, phenotype, tag = tag, alpha = 0.5,
family = "binomial", Save_file = "Scissor_disease.RData")

Error in Scissor(bulkdata, ScData, phenotype, tag = tag, alpha = 0.5, :
  There is no common genes between the given single-cell and bulk samples.

Can you help to figure out what happened there? Thanks a lot!
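This message most likely means the gene identifiers on the two sides do not match, for instance versioned Ensembl IDs in the bulk matrix versus gene symbols in the single-cell object, or symbols in different case. The hypothetical helper below compares the identifiers directly and after two crude normalizations (dropping a trailing ".N" version and upper-casing). It is a diagnostic only: converting between ID types or species needs a proper annotation or ortholog mapping.

```r
gene_overlap_report <- function(bulk_genes, sc_genes, n_show = 3) {
  # crude normalization: drop ".17"-style version suffixes, ignore case
  normalize_id <- function(x) toupper(sub("\\.[0-9]+$", "", x))
  list(
    n_common            = length(intersect(bulk_genes, sc_genes)),
    n_common_normalized = length(intersect(normalize_id(bulk_genes), normalize_id(sc_genes))),
    bulk_examples       = head(bulk_genes, n_show),
    sc_examples         = head(sc_genes, n_show)
  )
}

# e.g. gene_overlap_report(rownames(bulkdata), rownames(ScData))
gene_overlap_report(c("ENSG00000141510.17", "Egfr"),
                    c("ENSG00000141510", "EGFR", "KRAS"))
```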

Some ideas about use our own seurat object

Thanks for your wonderful package. However, it seems it cannot be used with an integrated Seurat object. After reading previously submitted issues, I learned that the key inputs are the 'data' slot and the 'RNA_snn' graph.
So, could I rename the "CCA_snn" graph (obtained from an integrated Seurat object) to 'RNA_snn', and use the RNA assay to obtain the 'data' slot?
Looking forward to your reply. Thanks!
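For what it is worth, the error "'data' must be of a vector type, was 'NULL'" reported in several issues here is exactly what base R's as.matrix() raises when given NULL. That is consistent with, though it does not prove, the required 'RNA_snn' graph being absent from the object. A hypothetical guard that fails earlier with a clearer message is sketched below; for a Seurat object the graphs list would be sc_dataset@graphs. Whether renaming an integrated graph to 'RNA_snn' is statistically sensible is for the authors to confirm.

```r
# as.matrix(NULL) reproduces the reported error message
msg <- tryCatch(as.matrix(NULL), error = function(e) conditionMessage(e))
msg   # "'data' must be of a vector type, was 'NULL'"

# Hypothetical guard: stop early if the named graph is missing from a list of graphs
require_graph <- function(graphs, name = "RNA_snn") {
  if (is.null(graphs[[name]])) {
    stop(sprintf("Graph '%s' not found; available: %s",
                 name, paste(names(graphs), collapse = ", ")))
  }
  invisible(graphs[[name]])
}

# Toy list standing in for sc_dataset@graphs of an integrated object
graphs <- list(integrated_nn = diag(2), integrated_snn = diag(2))
res <- tryCatch(require_graph(graphs), error = function(e) conditionMessage(e))
res   # "Graph 'RNA_snn' not found; available: integrated_nn, integrated_snn"
```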

Tag for continuous phenotype

Hi!
Thank you for the nice tools.
I was wondering what the tag should be if I use Scissor for a continuous phenotype, such as tumor size in the following image?

I use:
Z = table(pheno$TumorSizeMax)
tag = names(Z)
The code works, but I get neither Scissor positive nor negative cells.
Is the tag correct for linear regression?
Thanks very much!

There is a problem in the running data

Hello, I have the following problem when dealing with my data:

First, when I use all the data (150,000 single cells), I get the following error. Does Scissor have a requirement on the number of cells? I have seen your proposed solution, but it doesn't work for me.

Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

When I subset some cells for testing, I encountered the following problem at the Scissor step:

infos4 <- Scissor(bulk_dataset, sc_dataset, phenotype, tag = tag, alpha = 0.5,
family = "binomial",
Save_file = "Scissor_LUAD_TP53_mutation.RData")

Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'

I don't know whether this is because my data is integrated. I tried to run

sc_dataset <- Seurat_preprocessing(sc_dataset, verbose = F)

but I still get an error:

Error in as.vector(data) :
no method for coercing this S4 class to a vector

I have another question: if I only use disease grouping as the phenotype, can I run Scissor without bulk data?

The length differs between tags and phenotypes. Please check Scissor inputs and selected regression type.

I was trying to use Scissor with the gaussian family. My input phenotype is a numeric vector of length 176. As I understand it, the tag should be a single name (length 1), but Scissor gives the length error shown in the title.

I looked at the source code and found this:

    if (family == "gaussian") {
      Y <- as.numeric(phenotype)
      z <- table(Y)
      if (length(z) != length(tag)) {
        stop("The length differs between tags and phenotypes. Please check Scissor inputs and selected regression type.")
      }
      else {
        tmp <- paste(z, tag)
        print(paste0("Current phenotype contains ", 
          paste(tmp[1:(length(z) - 1)], collapse = ", "), 
          ", and ", tmp[length(z)], " samples."))
        print("Perform linear regression on the given phenotypes:")
      }
    }

I am confused about applying the table function to the phenotype. It makes sense for logistic regression, but I am not sure why it is used for the gaussian family.

I would really appreciate any explanation.

log2TPM or TPM?

Hi,

Thank you for developing this great tool.

Do you recommend log2TPM or TPM for bulk RNA-seq data? I think the choice results in different phenotype associations.

Thanks!

The percentage of selected cell is: 0.000%

Hello Scissor,
I'm running Scissor, but no matter what alpha I use, it always selects 0 cells.
Could you please help me with this issue?
Thanks!
Best,
Yuanjian

infos <- Scissor_change(bulk_dataset=GeneMatrix_HGNC_input, sc_dataset=Merge2.combined, phenotype=phenotype, tag = tag, alpha = seq(1,5,1)/10000000, cutoff = 0.01, family = "binomial", Save_file = "Merge2_Scissor.RData")
[1] "|**************************************************|"
[1] "Performing quality-check for the correlations"
[1] "The five-number summary of correlations:"
         0%         25%         50%         75%        100% 
-0.04986036  0.14333164  0.23372394  0.35075314  0.79686469 
[1] "|**************************************************|"
[1] "Current phenotype contains 48 Wild type and 17 Mutated samples."
[1] "Perform logistic regression on the given phenotypes:"
[1] "alpha = 1e-07"
[1] "Scissor identified 0 Scissor+ cells and 0 Scissor- cells."
[1] "The percentage of selected cell is: 0.000%"

Error in normalize.quantiles(dataset0) : ERROR; return code from pthread_create() is 22

I got an error when running your tutorial with your data:
Error in normalize.quantiles(dataset0) : ERROR; return code from pthread_create() is 22
I am using R 4.1.1

Can you help to figure out what happened there?

Seurat_preprocessing error

Hi, I've been trying to run Seurat_preprocessing on my own sc dataset but it keeps giving me this error:

sc_dataset <- Seurat_preprocessing(sc_dataset, verbose = F)
Warning in storage.mode(from) <- "double" : NAs introduced by coercion
restarting interrupted promise evaluation
Error in `$<-.data.frame`(`*tmp*`, "variance.expected", value = 0) :
  replacement has 1 row, data has 0

I'm fairly new to bioinformatics so any help would be greatly appreciated!
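The warning "NAs introduced by coercion" usually means the input matrix contains non-numeric entries. One common way this happens (an assumption here, since the issue does not show how the data were loaded) is a counts table whose first column holds gene names: as.matrix() then produces a character matrix, and coercing it to double turns the gene names into NA. The base-R sketch below, with made-up data, reproduces that and shows the usual fix of moving the gene names into the row names.

```r
# Made-up counts table as it might be read from a file,
# with gene symbols stored in the first column
counts_df <- data.frame(gene  = c("CD3E", "MS4A1"),
                        cell1 = c(3, 0),
                        cell2 = c(1, 5))

# as.matrix() on a data frame with a character column gives a character matrix
bad <- as.matrix(counts_df)
print(typeof(bad))   # "character"

# Coercing that matrix to double turns the gene symbols into NA
suppressWarnings(storage.mode(bad) <- "double")
print(anyNA(bad))    # TRUE

# Fix: move the gene symbols into the row names and keep only numeric columns
good <- as.matrix(counts_df[, -1])
rownames(good) <- counts_df$gene
print(typeof(good))  # "double"
print(anyNA(good))   # FALSE
```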

The scissor package cannot be installed

Hi
I followed the instructions to install the Scissor package and got an error:
Warning message:
In i.p(...) :
installation of package ‘C:/Users/18435/AppData/Local/Temp/RtmpULhMMS/file3a305c6965d3/Scissor_2.0.0.tar.gz’ had non-zero exit status

Can you help me resolve this problem?

Can Scissor be used for multiple subgroups?

Thank you very much for your elegant tool!
I have multiple subgroups and want to use Scissor to identify the cells related to each group. How should I use this tool?

Thank you again!
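One possible workaround, sketched below under the assumption that each subgroup is analysed in its own binomial run (this is not a documented Scissor feature), is a one-vs-rest encoding: for each subgroup, build a 0/1 phenotype that marks samples of that subgroup against all other samples. The subgroup labels are made up.

```r
# Made-up subgroup labels for six bulk samples
subgroup <- c("A", "B", "C", "A", "C", "B")
groups <- sort(unique(subgroup))

# One-vs-rest encoding: for each subgroup, 1 = sample in that subgroup, 0 = any other
one_vs_rest <- setNames(
  lapply(groups, function(g) as.numeric(subgroup == g)),
  groups
)
print(one_vs_rest)
# Each vector could then serve as the phenotype of a separate binomial run
```

Note that each run then answers "this subgroup versus everything else", which is not the same question as a direct comparison between two specific subgroups.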

Error in normalize.quantiles(dataset0) : vector types do not match in copyVector

Hi,
Thanks for developing this interesting tool.
It seems that the error:
Error in normalize.quantiles(dataset0) : vector types do not match in copyVector
is quite common.
I have tried:
BiocManager::install("preprocessCore", configure.args="--disable-threading", force = TRUE)
but it does not work.
Could you please give some advice on how to debug it?
Many thanks.
Jaime.

Scissor installation error

Hi, I tried to install Scissor in R on both Windows and Linux, but ran into the following installation problems:

#Linux

install.packages("sunduanchen-Scissor-311560a.tar.gz",repos=NULL,type="source",dependencies=T)
Installing package into '/root/R/x86_64-pc-linux-gnu-library/4.1'
(as 'lib' is unspecified)
Warning in untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers

installing source package 'Scissor' ...
** using staged installation
** libs
make: Nothing to be done for 'all'.
installing to /root/R/x86_64-pc-linux-gnu-library/4.1/00LOCK-sunduanchen-Scissor-311560a/00new/Scissor/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for 'Scissor' in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/root/R/x86_64-pc-linux-gnu-library/4.1/00LOCK-sunduanchen-Scissor-311560a/00new/Scissor/libs/Scissor.so':
/root/R/x86_64-pc-linux-gnu-library/4.1/00LOCK-sunduanchen-Scissor-311560a/00new/Scissor/libs/Scissor.so: invalid ELF header
Error: loading failed
Execution halted
ERROR: loading failed
removing '/root/R/x86_64-pc-linux-gnu-library/4.1/Scissor'
Warning message:
In install.packages("sunduanchen-Scissor-311560a.tar.gz", repos = NULL, :
installation of package 'sunduanchen-Scissor-311560a.tar.gz' had non-zero exit status

#Windows

devtools::install_github('sunduanchen/Scissor')
Downloading GitHub repo sunduanchen/Scissor@HEAD
Error in utils::download.file(url, path, method = method, quiet = quiet, :
download from 'https://api.github.com/repos/sunduanchen/Scissor/tarball/HEAD' failed
install.packages("sunduanchen-Scissor-311560a.tar.gz",repos=NULL,type="source",dependencies=T)
Warning in untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers

installing source package 'Scissor' ...
** using staged installation
** libs

*** arch - i386
C:\rtools40\mingw32\bin\nm.exe: RcppExports.o: file format not recognized
C:\rtools40\mingw32\bin\nm.exe: Scissor.o: file format not recognized
C:/rtools40/mingw32/bin/g++ -std=gnu++11 -shared -s -static-libgcc -o Scissor.dll tmp.def RcppExports.o Scissor.o -LC:/PROGRA~1/R/R-41~1.3/bin/i386 -lR
RcppExports.o: file not recognized: file format not recognized
collect2.exe: error: ld returned 1 exit status
no DLL was created
ERROR: compilation failed for package 'Scissor'

removing 'C:/Program Files/R/R-4.1.3/library/Scissor'
[0x7FFD9E9BE824] ANOMALY: use of REX.w is meaningless (default operand size is 64)
[0x7FFD9E9BE824] ANOMALY: use of REX.w is meaningless (default operand size is 64)
[0x7FFD9E9BE824] ANOMALY: use of REX.w is meaningless (default operand size is 64)
Warning message:
In install.packages("sunduanchen-Scissor-311560a.tar.gz", repos = NULL, :
installation of package ‘sunduanchen-Scissor-311560a.tar.gz’ had non-zero exit status

Can you help solve this problem? Thanks!

Error for big dataset

Hi!
Thank you for the nice tool.
I was running Scissor on a big dataset, and it seems that the "as.matrix" in "network <- as.matrix(sc_dataset@graphs$RNA_snn)" does not work for a large sparse matrix. So I replaced 'as.matrix' with code that handles large datasets; when the data are not too large, it gives the same matrix as "as.matrix".
Omega seems to be correct. However, after 'W=OmegaC(Omega, sgn1); W$loc=W$loc+1', W appears to be all zeros. I am not familiar with Rcpp, so I wonder whether 'OmegaC' contains something like 'as.matrix' that is not suitable for big datasets.
Can you help me figure out what happened? Thanks a lot!
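Some rough arithmetic shows why densifying fails: a dense double-precision n x n matrix needs 8 * n^2 bytes, so for 100,000 cells that is 8e10 bytes, about 80 GB. The sketch below (a made-up toy graph, using the Matrix package that ships with R) shows how edges can be read from a sparse matrix as (row, column, weight) triplets without ever building the dense matrix. Whether OmegaC can accept such input is not something this issue establishes.

```r
library(Matrix)

# Gigabytes needed for a dense double-precision n x n matrix
dense_gb <- function(n) n^2 * 8 / 1e9
print(dense_gb(1e5))  # 80 GB for 100,000 cells

# Made-up sparse neighbour graph with 5 cells and 4 weighted edges;
# only the non-zero entries are stored
g <- sparseMatrix(i = c(1, 2, 3, 4), j = c(2, 3, 4, 5),
                  x = c(0.5, 0.2, 0.9, 0.4), dims = c(5, 5))

# Read the edges as (i, j, x) triplets without converting to a dense matrix
edges <- summary(g)
print(edges)
```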

Run error in our own Seurat object

When I run Scissor on our own Seurat object, it throws an error:

infos1 <- Scissor(bulk_dataset, tumor, phenotype, alpha = 0.05, 
                  family = "cox", Save_file = './processed/Scissor_survival.RData')
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL'

But when we run Seurat_preprocessing first, it runs successfully:

tumor<- Seurat_preprocessing(seurat@assays$SCT@counts, verbose = F)
infos1 <- Scissor(bulk_dataset, tumor, phenotype, alpha = 0.05, 
                  family = "cox", Save_file = './processed/Scissor_survival.RData')

Is there a bug when applying the Scissor function to our own Seurat object?
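The error message itself is informative. As quoted in the "Error for big dataset" issue, Scissor builds its network with as.matrix(sc_dataset@graphs$RNA_snn), and as.matrix() applied to NULL fails inside array() with this kind of message. One plausible cause (a hypothesis, not confirmed here) is that the object has no graph named RNA_snn, for example because neighbours were computed on the SCT assay, which Seurat typically names SCT_snn; checking names(seurat@graphs) would show which graphs exist. The base-R sketch below mimics this with a plain list standing in for the graphs slot.

```r
# A plain list standing in for a Seurat object's graphs slot
# (hypothetical: the neighbour graph was built on an assay named "SCT")
graphs <- list(SCT_snn = diag(3))

print(names(graphs))            # "SCT_snn": there is no "RNA_snn" entry
print(is.null(graphs$RNA_snn))  # TRUE

# as.matrix(NULL) fails in the same way as reported in the issue
res <- tryCatch(as.matrix(graphs$RNA_snn), error = function(e) e)
print(conditionMessage(res))
```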

limma cameraPR

Hi, how do I use CAMERA (function name: cameraPR) from the limma R package (v3.42.2) for pathway enrichment analysis?
