cozygene / bisque

An R toolkit for estimation of cell composition from bulk expression data

R 100.00%


bisque's Issues

bisque in conda-forge / bioconda

Hi,

Would it be possible for you to deposit bisque in conda-forge / bioconda ?

Best,

Francisco

Input format

Dear authors,

I am new to scRNA-seq analysis. I read the Bisque paper and want to learn this powerful tool. I have bulk RNA-seq data and have collected published scRNA-seq data for my cells of interest. However, I am stuck at the very beginning, on the input format requirements. Would you mind adding some example data for practice to the repository? For example, what does the input matrix of bulk RNA-seq counts look like, and how did you get the character vectors of cell type labels and individual labels?

I understand this could be extra work for you. I hope this can be added and more beginners like me will benefit.

Thanks for your help in advance!

Best,
Yuling
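
As a rough illustration of the inputs asked about above, the sketch below builds a toy bulk count matrix (genes in rows, samples in columns), a toy single-cell count matrix (genes in rows, cells in columns), and one cell type label and one individual label per cell. All values, gene names and label values are made up. The phenoData column names SubjectName and cellType are the ones that appear in other issues on this page. The ExpressionSet step only runs if Biobase is installed.

```r
set.seed(42)
genes <- paste0("gene", 1:6)

# Bulk input: raw counts, genes x samples
bulk.counts <- matrix(rpois(6 * 3, lambda = 100), nrow = 6,
                      dimnames = list(genes, paste0("bulk", 1:3)))

# Single-cell input: raw counts, genes x cells
cell.ids <- paste0("cell", 1:8)
sc.counts <- matrix(rpois(6 * 8, lambda = 5), nrow = 6,
                    dimnames = list(genes, cell.ids))

# One label per cell, in the same order as the columns of sc.counts
cell.type.labels  <- rep(c("Neuron", "Astrocyte"), times = 4)
individual.labels <- rep(c("subj1", "subj2"), each = 4)

sc.pheno <- data.frame(SubjectName = individual.labels,
                       cellType = cell.type.labels,
                       row.names = cell.ids, stringsAsFactors = FALSE)

# Wrap the matrices in ExpressionSet objects when Biobase is available
if (requireNamespace("Biobase", quietly = TRUE)) {
  bulk.eset <- Biobase::ExpressionSet(assayData = bulk.counts)
  sc.eset <- Biobase::ExpressionSet(
    assayData = sc.counts,
    phenoData = Biobase::AnnotatedDataFrame(sc.pheno))
}
```

The important point is that the label vectors are plain character vectors with one entry per single-cell column.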

CountsToCPM after extracting markers

Thanks for the new approach! In ReferenceBasedDecomposition, if markers are provided, only the single-cell and bulk data for those markers are passed to CountsToCPM. Does it make sense to calculate CPM from (say, hundreds of) marker genes only? Or does it make more sense to calculate CPM from all genes and then take the subset of markers?

bisque/R/reference_based.R

Lines 298 to 315 in ef5bae0

  if (base::is.null(markers)) {
    markers <- Biobase::featureNames(sc.eset)
  }
  else {
    markers <- base::unique(base::unlist(markers))
  }
  genes <- GetOverlappingGenes(sc.eset, bulk.eset, markers, verbose)
  sc.eset <-
    Biobase::ExpressionSet(assayData=Biobase::exprs(sc.eset)[genes,],
                           phenoData=sc.eset@phenoData)
  bulk.eset <-
    Biobase::ExpressionSet(assayData=Biobase::exprs(bulk.eset)[genes,],
                           phenoData=bulk.eset@phenoData)
  if (verbose) {
    base::message("Converting single-cell counts to CPM and ",
                  "filtering zero variance genes.")
  }
  sc.eset <- CountsToCPM(sc.eset)
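
To make the question concrete, here is a small base R sketch of the two orderings. The cpm helper, the toy counts and the marker names are all made up for illustration and are not the package's CountsToCPM. It only shows that the two orderings use different library sizes and so give different values.

```r
# Toy counts: 5 genes x 2 samples; the last two genes are the "markers"
counts <- matrix(c(10, 20, 30, 5, 5,
                   40, 10, 10, 20, 20), nrow = 5,
                 dimnames = list(paste0("g", 1:5), c("s1", "s2")))
markers <- c("g4", "g5")

# Counts per million: divide each column by its total, scale to 1e6
cpm <- function(x) t(t(x) / colSums(x)) * 1e6

cpm.then.subset <- cpm(counts)[markers, ]   # library size = all genes
subset.then.cpm <- cpm(counts[markers, ])   # library size = markers only

print(cpm.then.subset)
print(subset.then.cpm)
```

In this toy case the markers differ between the two samples when CPM is computed from all genes, but look identical when CPM is computed from the markers alone, because each sample is rescaled to the total of its marker counts.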

Include covariates for residual CPM

Hi all, thanks for developing this wonderful tool--it is SO much faster than existing methods and very intuitive!

I have a feature request: is it possible to add optional covariate parameters to the primary inference functions? The idea is to use these features to residualize the CPMs in the bulk data, before matching with reference panel data (or within the reference-free approach).

Offhand, one approach would be to estimate log-CPMs, regress out the covariates, then exponentiate the residuals to bring the values back into CPM space.

thanks!
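
A minimal sketch of the residualization idea proposed above, in base R only. The covariates (age, batch), the pseudocount of 1 and the toy counts are assumptions for illustration. This is not an existing Bisque option.

```r
set.seed(1)
n.samples <- 12
counts <- matrix(rpois(4 * n.samples, lambda = 200), nrow = 4,
                 dimnames = list(paste0("g", 1:4), paste0("s", 1:n.samples)))
covars <- data.frame(age = runif(n.samples, 20, 80),
                     batch = factor(rep(c("a", "b"), length.out = n.samples)))

# Counts -> CPM -> log scale (the pseudocount of 1 is an arbitrary choice)
cpm <- t(t(counts) / colSums(counts)) * 1e6
log.cpm <- log(cpm + 1)

# Regress every gene on the covariates at once (samples in rows)
fit <- lm(t(log.cpm) ~ age + batch, data = covars)

# Residuals back to genes x samples, then exponentiate as proposed
resid.cpm <- exp(t(residuals(fit)))
```

Exponentiated residuals are centered around 1 rather than on the original CPM scale. Adding each gene's fitted intercept back before exponentiating is one way to keep the values closer to that scale.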

SeuratToExpressionSet is not up to date with latest Seurat version

Hello,

I just noticed that your function SeuratToExpressionSet() assumes by default that the Seurat object is of version 2, and that version 3 can be specified by the user.

bisque/R/utils.R

Lines 47 to 48 in f8ca469

  SeuratToExpressionSet <- function(seurat.object, delimiter, position,
                                    version = c("v2", "v3")) {

Seurat recently released their version 4, so it might be useful to update this function.

Looking at this part of your code,

bisque/R/utils.R

Lines 55 to 65 in f8ca469

  if (version == "v2") {
    get.cell.names <- function(obj) obj@cell.names
    get.ident <- function(obj) obj@ident
    get.raw.data <- function(obj) obj@raw.data
  }
  else if (version == "v3") {
    get.cell.names <- function(obj) base::colnames(obj)
    get.ident <- function(obj) Seurat::Idents(object=obj)
    get.raw.data <- function(obj) Seurat::GetAssayData(object = obj,
                                                       slot = "counts")
  }

I guess it should be sufficient to modify line 60 by:

else if (version %in% c("v3", "v4")) {

and line 48 by:

                                  version = c("v4", "v3", "v2")) {

with the versions in this order, as we can expect that most recent users will work with v4 or v3 rather than v2.

By the way, the documentation might also need to be updated accordingly here

bisque/R/utils.R

Lines 13 to 14 in f8ca469

  #' @param seurat.object Seurat object with attributes \emph{raw.data},
  #' \emph{ident}, and \emph{cell.names}

as the current Seurat objects don't have those slots anymore.

Hope this can help!
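
The suggested change can be sketched in isolation as follows. pick_accessors is a hypothetical stand-in written for this example, not the package's function. It assumes the version argument is resolved with match.arg, so that the first element of the default vector becomes the default value.

```r
# Hypothetical stand-in for the version switch, not the package's code
pick_accessors <- function(version = c("v4", "v3", "v2")) {
  # match.arg() picks the first element when the caller omits `version`
  version <- match.arg(version)
  if (version == "v2") {
    "slot access (obj@raw.data, obj@ident, obj@cell.names)"
  } else if (version %in% c("v3", "v4")) {
    "accessor functions (GetAssayData, Idents, colnames)"
  }
}

pick_accessors()        # default resolves to "v4"
pick_accessors("v2")
```

With the default vector reordered this way, users who omit the argument get the newer accessor path, and an unsupported value such as "v9" is rejected by match.arg.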

Counts input for bulk RNAseq dataset

Hi,

Thanks for this super useful R package! I just wanted to clarify what format the bulk RNA-seq data needs to be in before running Bisque against a single-cell reference profile. Do we pass the raw count matrix to Bisque, or log/VST-transformed data?

Thanks for the help!

Error in out[[t]]$coefs : $ operator is invalid for atomic vectors

I have a problem when using the software:

Running CIBERSORT ... Error in out[[t]]$coefs : $ operator is invalid for atomic vectors
In addition: Warning message:
In parallel::mclapply(1:svn_itor, res, mc.cores = svn_itor) :
all scheduled cores encountered errors in user code

I tried to figure it out, and found that the model generated in CIBERSORT.R is empty. The function is as follows:

res <- function(i){
  if(i==1){nus <- 0.25}
  if(i==2){nus <- 0.5}
  if(i==3){nus <- 0.75}
  model<-e1071::svm(X,y,type="nu-regression",kernel="linear",nu=nus,scale=F)
  model
}

Error in predict.svm(ret, xhold, decision.values = TRUE) :
Model is empty!

There are two variables, X and y, and I found that y is a vector of length 133 in which all values are 0. Can you give me any advice on how to solve it?

Deconvolution for a specific part of bulk tissue RNAseq using scRNAseq data?

Hello!
I want to use BisqueRNA to deconvolve a specific part of a bulk tissue.
Biologically, the tissue has four parts (A, B, C and D) whose cell type compositions are completely different. Only part A was measured by bulk RNA-seq, whereas the whole tissue (parts A, B, C and D together) was used for scRNA-seq. I am trying to predict the cell type composition of part A.
I wonder whether this is the right approach, and whether using only part of the tissue will affect deconvolution performance in BisqueRNA.
Thanks in advance for your help!

SeuratToExpressionSet error with seurat v5

Hi,

Thank you for the great tool. I noticed that SeuratToExpressionSet doesn't seem to work with Seurat v5 objects. I was wondering if there is a workaround or fix for this? Thank you!!

sc.eset <- BisqueRNA::SeuratToExpressionSet(ref_obj, delimiter="-", position=2, version="v3")

Split sample names by "-" and checked position 2. Found 5 individuals.
Example: "AAACCTGAGGGTCTCC-1" corresponds to individual "1".
Error in GetAssayData():
! GetAssayData doesn't work for multiple layers in v5 assay.
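
One possible workaround, sketched under assumptions: with Seurat v5 installed, the per-sample count layers can be joined before the counts are extracted. seurat5_counts is a hypothetical helper written for this example, and the assay name "RNA" is assumed. It relies on JoinLayers and LayerData from SeuratObject and has not been tested against this specific object.

```r
# Hypothetical helper; needs Seurat >= 5 / SeuratObject >= 5 to actually run
seurat5_counts <- function(obj, assay = "RNA") {
  # Collapse split layers (counts.1, counts.2, ...) into a single layer
  obj[[assay]] <- SeuratObject::JoinLayers(obj[[assay]])
  # Pull the merged raw counts as a genes x cells matrix
  SeuratObject::LayerData(obj, assay = assay, layer = "counts")
}
```

The returned matrix could then be wrapped in an ExpressionSet by hand, together with per-cell individual and cell type labels, instead of going through SeuratToExpressionSet.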

leave-one-out cross validation

Hello,

I wonder if you could provide the code you used for leave-one-out cross-validation in your simulations.

Thanks.

sim.data average proportions taking into account variation between replicates

Hello there!
I am using bisque with pretty great success, our correlation values look awesome!!
However I have noticed something that I can't seem to figure out a workaround for.
Our single cell dataset has 4 replicates included in it, each with naturally varying proportions of each cell type. However, in order to generate the simulation data to obtain an r2 correlation value between actual and predicted proportions, the input requires a single proportion number for each cell type rather than taking into account the already occurring variation in proportion across replicates.
I have not seen a way to incorporate a list of proportions across replicates into this simulation data, but maybe I'm missing something!
Thank you,
Emma

Bulk already in CPM

Hi!

Great package! I would like to run Bisque with some public bulk RNA-seq datasets that are, unfortunately, only available as CPM. Is there any way I can feed it the CPM normalized data and skip the CPM normalization performed by Bisque?

Thanks for your time and help,
Inés
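
A small base R check that may be relevant here: CPM normalization is idempotent when the gene set stays the same, so normalizing data that is already in CPM returns the same values. The cpm helper below is a generic definition written for this example, not the package's internal function.

```r
# Counts per million: divide each column by its total, scale to 1e6
cpm <- function(x) t(t(x) / colSums(x)) * 1e6

counts <- matrix(c(5, 15, 30, 50,
                   8,  2, 40, 50), nrow = 4,
                 dimnames = list(paste0("g", 1:4), c("s1", "s2")))

once  <- cpm(counts)
twice <- cpm(once)   # what happens if CPM input is normalized again

all.equal(once, twice)
```

The caveat is that if genes are removed between the two steps, the second pass rescales each sample to the total of the remaining genes, so the values would no longer match the published CPMs.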

how to prepare single cell input from multiple subjects as an input?

Hi there,
I am interested in using Bisque to deconvolve bulk RNA-seq samples with a single-cell reference.

The tutorial/methods require a single-cell input that contains data from multiple subjects. I have one .Rds file of single-cell data per subject.
How do I prepare the input object from these per-subject single-cell raw counts?
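
A minimal sketch of one way to combine per-subject matrices, assuming each .Rds file holds a raw count matrix with genes in rows and cells in columns. The in-memory matrices below stand in for the readRDS calls, and the subject IDs are hypothetical.

```r
# Stand-ins for readRDS("subjA.Rds") etc.; file names are hypothetical
set.seed(7)
per.subject <- list(
  subjA = matrix(rpois(4 * 3, 5), nrow = 4,
                 dimnames = list(paste0("g", 1:4), paste0("cell", 1:3))),
  subjB = matrix(rpois(3 * 2, 5), nrow = 3,
                 dimnames = list(paste0("g", 2:4), paste0("cell", 1:2)))
)

# Keep only genes measured in every subject
shared.genes <- Reduce(intersect, lapply(per.subject, rownames))

# Prefix cell names with the subject ID so they stay unique after merging
renamed <- lapply(names(per.subject), function(id) {
  m <- per.subject[[id]][shared.genes, , drop = FALSE]
  colnames(m) <- paste(id, colnames(m), sep = "_")
  m
})
sc.counts <- do.call(cbind, renamed)

# One individual label per cell, in column order
individual.labels <- rep(names(per.subject), times = sapply(per.subject, ncol))
```

Cell type labels would need to be concatenated in the same column order before the combined matrix is turned into an ExpressionSet.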

best,

Vignettes and also general questions

Hello and thank you for this wonderful package.

I was attempting to look at the vignette after installing Bisque RNA successfully, and when running browseVignettes("BisqueRNA")
I got No vignettes found by browseVignettes("BisqueRNA")
Would you happen to know what might be causing this?

Additionally, I was wondering if it would be hypothetically possible to extract cell proportions of bulk samples from single cell reference data of the different cell types. Say for example I had a bulk sample of some combination of four types of cells, could I extract the proportions using the expected RNAseq results of a single cell reference taken as an average from bulk studies? In this case, there would only be one sample for the single cell, and one bulk sample to deconvolute? From crudely testing, it seems the issue is at least two subjects are needed, and I'm not sure if there is a minimum number of single cell data required for each type.

Thanks!

Use FPKM or CPM value of bulk RNA seq as input matrix

Thanks for this amazing tool !!
I have one question regarding this tool: is it possible to use FPKM values from bulk RNA-seq as the input matrix, instead of the gene count matrix that the paper suggests?

Thanks.

Bugie

Weird results ?

Hello !

I got very interested in your method after reading your publication. I was before relying on MuSiC to accomplish the same task. From the publication it seems that you can estimate cell type proportion with better accuracy.

So I decided to test it in a very simple way: I have a scRNA-seq dataset with 16 subjects. I used 8 subjects as the single-cell reference and the other 8 to construct pseudo-bulks (simply by summing the counts across cells for each gene). I used MuSiC and Bisque to estimate the known cell type proportions in these 8 pseudo-bulks. I always get higher correlation and lower error with MuSiC. Do you think I might be doing something wrong that prevents me from using the full capacity of BisqueRNA?

I launched the following command:

BisqueRNA::ReferenceBasedDecomposition(pbulk.sub, sc_aml, use.overlap=F)

So the sc_aml is an ExpressionSet with raw counts at the single-cell level and pbulk.sub is for the pseudo-bulks I generated by summing up counts. I prefer to test it for the case where we don't have the paired single-cell / bulk, as this looks more similar to the real cases I have.

For example, do you think that using marker genes might improve the results? Any leads?
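
For reference, the pseudo-bulk construction described in this issue (summing counts over the cells of each subject) and the matching known proportions can be sketched in base R as below. The toy counts, subject IDs and cell type labels are made up.

```r
set.seed(3)
n.cells <- 10
sc.counts <- matrix(rpois(4 * n.cells, 5), nrow = 4,
                    dimnames = list(paste0("g", 1:4), paste0("c", 1:n.cells)))
individual <- rep(c("subj1", "subj2"), each = 5)
cell.type  <- c("T", "T", "B", "B", "B", "T", "B", "B", "B", "B")

# Pseudo-bulk: sum counts over all cells of each individual (genes x individuals)
pseudo.bulk <- t(rowsum(t(sc.counts), group = individual))

# Known proportions: cell types in rows, individuals in columns
true.props <- prop.table(table(cell.type, individual), margin = 2)
```

The true.props table has the same layout as an estimated proportion matrix (cell types by samples), so the two can be compared row by row, for example with cor.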

Bulk deconvolution using public scRNA seq data?

Hi guys,

Kudos for developing this package. I'm really interested to test it on our data. I've had a couple of gos and I'm experiencing some issues though.
I'm looking to determine the cell fractions from PBMCs by supplying a publicly available single-cell RNA-seq dataset as reference (e.g. the pbmc_3k dataset from the Seurat tutorial), where we have clustered the cells and determined their putative cell types. What I've noticed is that I get nearly identical bulk.props for all the samples, which doesn't seem right.
I've tried it on one of our datasets as well as a publicly available one (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115259) and I get nearly exactly the same bulk.prop values across the board.

library(tidyverse)
library(Seurat)
library(Biobase)
library(SingleCellExperiment)
library(BisqueRNA)
matrix_to_expressionSet <- function(mat, ...){
  if(!is.matrix(mat)) warning(deparse(substitute(mat)), " is not a matrix") else {
    featureData <- rownames(mat)
    featureData <- as(as.data.frame(featureData), "AnnotatedDataFrame")
    rownames(featureData) <- rownames(mat)
    phenoData <- colnames(mat)
    phenoData <- as(as.data.frame(phenoData), "AnnotatedDataFrame")
    rownames(phenoData) <- colnames(mat)
    ExpressionSet(assayData = mat, phenoData = phenoData, featureData = featureData) }}
# quick run-through of pbmc scRNA seq dataset
pbmc.data <- Read10X(data.dir = "pbmc3k/filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
pbmc <- JackStraw(pbmc, num.replicate = 100)
pbmc <- ScoreJackStraw(pbmc, dims = 1:20)
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)
new.cluster.ids <- c("Naive CD4 T", "Memory CD4 T", "CD14+ Mono", "B", "CD8 T", "FCGR3A+ Mono", 
                     "NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)
# Export `Seurat` object to `ExpressionSet`
pbmc_sce <- Seurat::as.SingleCellExperiment(pbmc)
counts(pbmc_sce) <- as.matrix(counts(pbmc_sce))
logcounts(pbmc_sce) <- as.matrix(logcounts(pbmc_sce))
single.cell.expression.set <- as(object = as(object = pbmc_sce, Class = "SummarizedExperiment"), Class = "ExpressionSet")
rownames(single.cell.expression.set) <- pbmc@assays$RNA@data %>% rownames
single.cell.expression.set$cellType <- single.cell.expression.set$ident
single.cell.expression.set$SubjectName <- "Boris" #since we're clowning around :)
# Get bulk data: 
GSE115259_raw <- list.files("~/files/GSE115259/", full.names = TRUE) %>% 
  map(.f = ~read_tsv(.x)) %>% 
  set_names(list.files("~/files/GSE115259/") %>% str_remove(pattern = "_gtf_annotated_genes.results.txt.gz"))
GSE115259_eset <- map(.x = seq_along(GSE115259_raw), .f = ~GSE115259_raw[[.x]] %>% dplyr::mutate(!!quo_name(names(GSE115259_raw[.x])) := GSE115259_raw[[.x]]$expected_count) %>% dplyr::select(annotation.gene_id, starts_with("GSM"))) %>% 
  purrr::reduce(full_join) %>% 
  column_to_rownames(var = "annotation.gene_id") %>% 
  as.matrix %>% 
  matrix_to_expressionSet
# Change ensembl id rownames to symbols
tmp_tbl <- rownames(GSE115259_eset) %>% 
  enframe(name = NULL, value = "ensgene") %>%
  dplyr::mutate(in_anno = 1:n()) %>% 
  left_join(annotables::grch38) %>% 
  dplyr::select(ensgene, symbol, in_anno) %>%
  dplyr::mutate(symbol = case_when(is.na(symbol) ~ ensgene,
                                   TRUE ~ symbol)) %>%
  dplyr::distinct(in_anno, .keep_all = TRUE) %>% 
  dplyr::mutate(is_unque = isUnique(symbol))
GSE115259_eset <- tmp_tbl %>% 
  dplyr::filter(is_unque) %>% 
  pull(ensgene) %>% 
  GSE115259_eset[.]
rownames(GSE115259_eset) <- tmp_tbl %>% 
  dplyr::filter(is_unque) %>%
  pull(symbol)
# Run ReferenceBasedDecomposition
GSE115259_res <- ReferenceBasedDecomposition(bulk.eset = GSE115259_eset, sc.eset = single.cell.expression.set, markers = pbmc@[email protected], use.overlap = FALSE, verbose = TRUE)
GSE115259_res$bulk.props

I'm guessing that this could be because the scRNA seq dataset is not from any of the bulk RNA seq samples, however I imagine this would be the major use case for running deconvolution as long as the reference samples are from the same cell types?

Thanks,
Miha

Errors in MarkerBasedDecomposition

Hi Brandon,

I want to use the BisqueRNA package to infer the cell types of my bulk data. I have my own markers and am using MarkerBasedDecomposition function. However, with different cutoffs of min_gene, I got different errors.

res <- MarkerBasedDecomposition(bulk.eset, markers = mymarker)
The error was "Clusters must have a minimum of 5 unique marker genes"

res <- MarkerBasedDecomposition(bulk.eset, markers = mymarker, min_gene = 1)
The error was "Error in base::eigen(varcov):infinite or missing values in 'x'"

res <- MarkerBasedDecomposition(bulk.eset, markers = mymarker, min_gene = 2)
The error was "Error in 1:ng: argument of length 0".

I attached the bulk data and markers I used.
exampleData.rda.zip

Please help me out.

Thanks,
Frank

Lack of "ground truth" markers and estimation

Hi,

I've been using Bisque to deconvolve a bulk RNA-seq dataset using known marker genes obtained from a scRNA-seq experiment. While the workflow is useful and easy to follow, I have two questions:

1. Can I obtain the expression of these marker genes in the bulk RNA-seq data by multiplying the estimates by the normalised counts?
2. How do I determine the accuracy of the estimated proportions in the absence of ground truth? It may sound rhetorical, but I was wondering what approaches exist for this issue.

Thoughts?

Best,
Sandeep

Zero expression in selected genes

Hello,

I am trying to use the bisqueRNA package to estimate cell type proportions from bulk RNAseq data based on a snRNA reference. However, when I try to use "res <- BisqueRNA::ReferenceBasedDecomposition(bulk.eset, sc.eset, markers = NULL, use.overlap = FALSE)" I end up with the following message:

Decomposing into 9 cell types.
Using 15578 genes in both bulk and single-cell expression.
Converting single-cell counts to CPM and filtering zero variance genes.
Error in CountsToCPM(sc.eset) :
Zero expression in selected genes for 161109 cells
Calls: -> CountsToCPM ->

Any ideas of what could be causing this issue and what can I do to solve it?

Thank you so much,
Helena
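
A quick diagnostic that may help locate the cells named in this error, sketched in base R on a toy matrix: compute the total count per cell over the genes that are kept, and list the cells whose total is zero. The matrix and cell names are made up. In practice the counts would come from the single-cell ExpressionSet, restricted to the genes shared with the bulk data.

```r
# Toy single-cell counts where two cells have no reads in the kept genes
sc.counts <- matrix(c(3, 0, 1,
                      0, 0, 0,
                      2, 5, 0,
                      0, 0, 0), nrow = 3,
                    dimnames = list(paste0("g", 1:3), paste0("cell", 1:4)))

# Total counts per cell over the genes that remain after gene filtering
lib.size <- colSums(sc.counts)
empty.cells <- names(lib.size)[lib.size == 0]
message(length(empty.cells), " cells have zero total counts")

# Drop those cells before building the ExpressionSet
sc.counts.filtered <- sc.counts[, lib.size > 0, drop = FALSE]
```

If almost every cell is flagged, as the very large number in the message above might suggest, it may be worth checking that the matrix holds raw counts and that the gene identifiers match the bulk data.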

Adipose dataset

Hi,

Thanks for the package. I read the paper but found no reference for the adipose dataset. Is there an ID to access or request the dataset? I would appreciate your answer.

Best,
Wennan

Error: Zero genes left for decomposition

Hi Brandon -- Any insight into the error below? I'm trying out your reference-based deconvolution using two scRNA Seurat objects with defined cell types.

counts <- readRDS(file.path("Outputs", "all", "raw-counts.rds")) # 3 bulk RNA samples
sc1 <- readRDS("~/SEURAT-OBJ1.rds")
sc2 <- readRDS("~/SEURAT-OBJ2.rds")
sc <- merge(sc1, y = sc2, add.cell.ids = c("Donor1", "Donor2"))
sc$ID <- ifelse(is.na([email protected]$SIMPLIFIED_ID), [email protected]$INTEGRATED_ID_SIMPLIFIED, [email protected]$SIMPLIFIED_ID)

table(sc$ID)
Endothelial  Epithelial  Fibroblast     Myeloid       Tcell 
       1515        6043        4127        1226         101 



bulk_eset <- Biobase::ExpressionSet(assayData = as.matrix(counts))
sc_eset <- BisqueRNA::SeuratToExpressionSet(sc, delimiter="_", position=1, version="v3")
stopifnot(identical(sampleNames(sc_eset), colnames(sc)))
sc_eset$cellType <- as.vector(sc_eset$cellType)
sc_eset$cellType <- factor(sc$ID, levels = c("Endothelial", "Epithelial", "Fibroblast", "Myeloid", "Tcell"))
res <- BisqueRNA::ReferenceBasedDecomposition(bulk_eset, sc_eset, markers=NULL, use.overlap=FALSE)

Decomposing into 5 cell types.
Using 22744 genes in both bulk and single-cell expression.
Converting single-cell counts to CPM and filtering zero variance genes.
Filtered 188 zero variance genes.
Converting bulk counts to CPM and filtering unexpressed genes.
Filtered 0 unexpressed genes.
Generating single-cell based reference from 14707 cells.

Inferring bulk transformation from single-cell alone.
Applying transformation to bulk samples and decomposing.
Dropped an additional 22556 genes for which a transformation could not be learned.
Error in BisqueRNA::ReferenceBasedDecomposition(bulk_eset, sc_eset, markers = NULL,  : 
  Zero genes left for decomposition.

Session info

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8   
 [6] LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C        
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BisqueRNA_1.0.4             DESeq2_1.26.0               SummarizedExperiment_1.16.1 DelayedArray_0.12.3        
 [5] BiocParallel_1.20.1         matrixStats_0.57.0          Biobase_2.46.0              GenomicRanges_1.38.0       
 [9] GenomeInfoDb_1.22.1         IRanges_2.20.2              S4Vectors_0.24.4            BiocGenerics_0.32.0        

loaded via a namespace (and not attached):
  [1] backports_1.1.10       Hmisc_4.4-1            plyr_1.8.6             igraph_1.2.5           lazyeval_0.2.2        
  [6] splines_3.6.3          listenv_0.8.0          ggplot2_3.3.2          digest_0.6.25          htmltools_0.5.0       
 [11] magrittr_1.5           checkmate_2.0.0        memoise_1.1.0          tensor_1.5             cluster_2.1.0         
 [16] ROCR_1.0-11            globals_0.13.0         annotate_1.64.0        jpeg_0.1-8.1           colorspace_1.4-1      
 [21] blob_1.2.1             rappdirs_0.3.1         ggrepel_0.8.2          xfun_0.17              dplyr_1.0.2           
 [26] crayon_1.3.4           RCurl_1.98-1.2         jsonlite_1.7.1         genefilter_1.68.0      spatstat_1.64-1       
 [31] spatstat.data_1.4-3    survival_3.2-3         zoo_1.8-8              glue_1.4.2             polyclip_1.10-0       
 [36] gtable_0.3.0           zlibbioc_1.32.0        XVector_0.26.0         leiden_0.3.3           future.apply_1.6.0    
 [41] abind_1.4-5            scales_1.1.1           DBI_1.1.0              miniUI_0.1.1.1         Rcpp_1.0.5            
 [46] viridisLite_0.3.0      xtable_1.8-4           htmlTable_2.1.0        reticulate_1.16        foreign_0.8-75        
 [51] bit_4.0.4              rsvd_1.0.3             Formula_1.2-3          htmlwidgets_1.5.1      httr_1.4.2            
 [56] RColorBrewer_1.1-2     ellipsis_0.3.1         Seurat_3.2.1           ica_1.0-2              pkgconfig_2.0.3       
 [61] XML_3.99-0.3           nnet_7.3-14            uwot_0.1.8             deldir_0.1-29          locfit_1.5-9.4        
 [66] tidyselect_1.1.0       rlang_0.4.7            reshape2_1.4.4         later_1.1.0.1          AnnotationDbi_1.48.0  
 [71] munsell_0.5.0          tools_3.6.3            generics_0.0.2         RSQLite_2.2.0          pacman_0.5.3          
 [76] ggridges_0.5.2         stringr_1.4.0          fastmap_1.0.1          goftest_1.2-2          knitr_1.30            
 [81] bit64_4.0.5            fitdistrplus_1.1-1     purrr_0.3.4            RANN_2.6.1             pbapply_1.4-3         
 [86] future_1.19.1          nlme_3.1-149           mime_0.9               compiler_3.6.3         rstudioapi_0.11       
 [91] plotly_4.9.2.1         png_0.1-7              spatstat.utils_1.17-0  tibble_3.0.3           geneplotter_1.64.0    
 [96] stringi_1.5.3          lattice_0.20-41        Matrix_1.2-18          vctrs_0.3.4            pillar_1.4.6          
[101] lifecycle_0.2.0        lmtest_0.9-38          RcppAnnoy_0.0.16       data.table_1.13.0      cowplot_1.1.0         
[106] bitops_1.0-6           irlba_2.3.3            httpuv_1.5.4           patchwork_1.0.1.9000   R6_2.4.1              
[111] latticeExtra_0.6-29    promises_1.1.1         KernSmooth_2.23-17     gridExtra_2.3          codetools_0.2-16      
[116] MASS_7.3-53            sctransform_0.3        GenomeInfoDbData_1.2.2 mgcv_1.8-31            grid_3.6.3            
[121] rpart_4.1-15           tidyr_1.1.2            Rtsne_0.15             shiny_1.5.0            base64enc_0.1-3

Typo in SemisupervisedTransformBulk

The shrink.scale variable in the SemisupervisedTransformBulk function needs a minor adjustment. Currently, it divides base::sum((Y.train[gene,,drop=T]-Y.center)^2) by n and then adds 1 before taking the square root, whereas it should divide the sum by (n+1), as described in the paper. So the corrected line should be
shrink.scale <- base::sqrt(base::sum((Y.train[gene,,drop=T]-Y.center)^2)/(n+1))
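
The difference comes down to operator precedence, which a two-line check makes visible. The numbers below are arbitrary stand-ins: ss plays the role of the sum of squared deviations and n the number of training samples.

```r
ss <- 12   # stand-in for sum((Y.train[gene, ] - Y.center)^2)
n  <- 3

current  <- sqrt(ss / n + 1)     # parsed as sqrt((ss / n) + 1)
proposed <- sqrt(ss / (n + 1))   # divides by n + 1 before the square root

c(current = current, proposed = proposed)
```

Without the parentheses, R performs the division first and then adds 1, which is the behaviour the issue describes.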

Cannot download raw data

I cannot download raw data from https://www.synapse.org/ although I have logged in with my own account.

I saw this information:
You do not have download access for this item.

Do I have another way to access this raw data and have a real test of your algorithm?

Min number of markers in eset?

I have run the marker-based decomposition successfully with >1000 markers in the list, with between 3 and 600 per cell type. I am now running the same eset with 5-10 markers per cell type (85 in total) and I get the error:

Error in BisqueRNA::MarkerBasedDecomposition(bulk.eset = alvFC_CTR.norm.eset.ENSG, : No overlapping genes between markers and bulk.eset

This is despite running

length(intersect(ENSG.unique_slim$gene, featureNames(alvFC_CTR.norm.eset.ENSG)))#81

and getting 81 overlapping genes

So I am wondering whether there is a lower limit on the number of overlapping genes?
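
Since another issue on this page reports a minimum of 5 unique marker genes per cluster, a per-cell-type overlap count may be more informative than the overall intersect. The sketch below uses a hypothetical marker table with columns gene and cluster and a hypothetical vector of bulk gene IDs. It also checks whether stray whitespace or Ensembl version suffixes are hiding matches.

```r
# Hypothetical marker table and bulk gene IDs
markers <- data.frame(
  gene    = c("ENSG01", "ENSG02", "ENSG03", "ENSG04 ", "ENSG05.2"),
  cluster = c("A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE)
bulk.genes <- c("ENSG01", "ENSG02", "ENSG03", "ENSG04", "ENSG05")

# Exact-match overlap per cell type
markers$in.bulk <- markers$gene %in% bulk.genes
overlap.by.cluster <- tapply(markers$in.bulk, markers$cluster, sum)

# Overlap after trimming whitespace and dropping Ensembl version suffixes
clean <- sub("\\.[0-9]+$", "", trimws(markers$gene))
overlap.clean <- tapply(clean %in% bulk.genes, markers$cluster, sum)
```

In real use, bulk.genes would be the feature names of the bulk ExpressionSet, and any cluster whose count falls below the minimum would be a candidate explanation for the error.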

How to interpret Bisque results

Hi!
I am using Bisque to perform deconvolution of bulk RNA-seq data. I applied the marker-based decomposition approach using a list of marker genes without fold changes and the function:

BisqueRNA::MarkerBasedDecomposition(bulk.eset, markers, weighted=F)

I obtained the following result:

            sample1  sample2  sample3
cell_type1    -2.29     1.20    -0.14
cell_type2     1.1      0.22    -3.14

I do not know how to interpret the result that I obtained.
For example, what does a negative value mean? If a sample has negative values for both cell types, what does that indicate?

Thank you for your help.

Bye,

Concetta

Microarray data?

Hi,

Thank you for developing such useful software, it is really helpful. I am wondering if we could also use Bisque for deconvolution of published microarray datasets. Do you think it is possible? If so, what preprocessing would you recommend for microarray data?

Best,
Yoshi

Negative proportion values

Hi Justin,

We've been extensively testing Bisque for our deconvolution use case, specifically Marker based Decomposition. From our results and also from your vignette, the bulk.props matrix has negative values. How to interpret these results? Surely the cell-type proportions can't be negative?

Apologies for a naive question and raising an issue for it.

Best,
Sandeep
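
One possible reading, offered as an assumption rather than a statement about the package's internals: if a marker-based estimate is a principal-component style summary of marker expression (the base::eigen(varcov) error quoted in another issue on this page hints at an eigendecomposition), then the scores are centered across samples. Under that assumption a negative value would mean below-average relative abundance, not a negative proportion. The toy example below shows how such scores behave.

```r
set.seed(11)
# Toy log-expression of 5 marker genes for one cell type across 6 samples
abundance <- c(1, 2, 3, 4, 5, 6)
marker.expr <- sapply(1:5, function(i) abundance + rnorm(6, sd = 0.2))
rownames(marker.expr) <- paste0("sample", 1:6)

# First principal component of the centered marker expression
pc1 <- prcomp(marker.expr, center = TRUE, scale. = FALSE)$x[, 1]

round(pc1, 2)   # centered scores: some samples fall below zero
```

The scores sum to zero across samples and track the simulated abundance up to sign, so they are comparable across samples within one cell type but are not fractions of a whole.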

MarkerBasedDecomposition overlapping genes error

Hello,

I'm trying to run the MarkerBasedDecomposition function, but I keep getting the error: Error in MarkerBasedDecomposition(eNorm_bulk, cell_markers, weighted = F) : No overlapping genes between markers and bulk.eset

I've thoroughly checked that all genes within cell_markers are present in the bulk expression matrix and are in the same order.

My cell_markers dataframe

My bulk expression matrix (from an ExpressionSet assay, stored under eNorm_bulk@assayData[["exprs"]] )

I'm not sure how to proceed.

Thank you!

Prepare for upcoming Seurat v5 release

I am opening this issue as a notification because BisqueRNA is listed here as a package that relies (depends/imports/suggests) on Seurat. As you may know, we recently released Seurat v5 as a beta in March of this year, with new updates for spatial, multimodal, and massively scalable analysis. For more information on updates and improvements, check out our website https://satijalab.org/seurat/.
We are now preparing to release Seurat v5 to CRAN, and plan to submit it on October 23rd. While we have tried our best to keep things backward-compatible, it is possible that updates to Seurat and SeuratObject might break your existing functionality or tests. We wanted to reach out before the new version is on CRAN, so that there's time to report issues/incompatibilities and prepare you for any changes in your code base that might be necessary.

We apologize for any disruption or inconvenience, but hope that the improvements to Seurat v5 will benefit your users going forward.
To test the upcoming release, you can install Seurat from the seurat5 branch using the instructions available on this page: https://satijalab.org/seurat/articles/install.

Thank you!
Seurat v5 team

visualization

Hi, thank you for providing such a useful tool. I have a question: once I have the results 'res', how can I produce a visualization like the one shown in Figure 4A?

BisqueRNA Functions not available for usage

Great tool for bulk tissue deconvolution. It's a great package and estimates really well!

May I ask if some of the BisqueRNA functions will be made available for usage? e.g. GenerateSCReference and SemisupervisedTransformBulk functions

Or if these functions cannot be made available for usage, could I suggest adding an additional output from the ReferenceBasedMethod that returns the transformed bulk counts? This can help users to plot distributions of the transformed datasets and run statistical tests

SeuratToExpressionSet

Hi, when I run the code 'sc.eset <- BisqueRNA::SeuratToExpressionSet(seurat, delimiter="-", position=2, version="v3")', I get the error below:
sc.eset <- BisqueRNA::SeuratToExpressionSet(seurat, delimiter="-", position=2, version="v3")
Split sample names by "-" and checked position 2. Found 18 individuals.
Example: "AAACCCAAGAAGCTCG-1" corresponds to individual "1".
Error in asMethod(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

Can I use TCGA's TPM data format for bulk RNA as input to bisque?

Looking forward to the author's answer, thanks!

Error in Generating single-cell based reference

Hi @brandonjew ,

I am trying to use reference-based decomposition to deconvolute bulk samples.
res <- BisqueRNA::ReferenceBasedDecomposition(bulk.eset, sc.eset, markers=NULL, use.overlap=FALSE)

I am getting the following error:

Decomposing into 7 cell types.
Using 8783 genes in both bulk and single-cell expression.
Converting single-cell counts to CPM and filtering zero variance genes.
Filtered 39 zero variance genes.
Converting bulk counts to CPM and filtering unexpressed genes.
Filtered 0 unexpressed genes.
Generating single-cell based reference from 5000 cells.

Error in sc.props[base::colnames(sc.ref), , drop = F] : 
  subscript out of bounds

Is it because of cell type naming issues?
I checked the column names of the single-cell reference input:
"10X_P7_9_CACACAAAGTAGGTGC" "10X_P7_9_CACACAAAGTCCGGTC" "10X_P7_9_CACACAATCCGAGCCA" "10X_P7_9_CACACAATCTTTACGT" "10X_P7_9_CACACCTCAGGATCGA" "10X_P7_9_CACACCTGTCTAGAGG" "10X_P7_9_CACACCTTCCAATGGT" "10X_P7_9_CACACTCAGTCGATAA" "10X_P7_9_CACACTCCACCTCGTT"

Thank you for your help !!
