danko-lab / bayesprism
A Fully Bayesian Inference of Tumor Microenvironment composition and gene expression
Dear Danko,
I sincerely admire your research. I have a question: how to improve the speed of embedding learning? Is it through scRNA data reduction?
Thank you very much for your attention and consideration.
Hi,
I am interested in deconvolution of PBMC and Whole blood samples using BayesPrism.
I have the data with two different levels of annotation (8 cell types and 32 cell subtypes).
However, I got an error with the following:
"one or more cell states belong to multiple cell types".
Could you give me an example of names for "cell type labels" and "cell state labels"?
Or is there a way to circumvent this error?
Thank you in advance.
Kind regards,
Seoyeon
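The error text suggests that each cell state is expected to sit under exactly one cell type. A minimal base R sketch for finding the offending states is shown here; the label vectors are toy data made up for illustration, and the check is not part of BayesPrism itself.

```r
# Toy labels (made up for illustration): the state "s1" appears under two types.
cell.type.labels  <- c("T cell", "T cell", "B cell", "B cell", "Monocyte")
cell.state.labels <- c("s1",     "s2",     "s1",     "s3",     "s4")

# Cross-tabulate states (rows) against types (columns).
state.by.type <- table(cell.state.labels, cell.type.labels)

# Count how many distinct cell types each state occurs in.
n.types.per.state <- rowSums(state.by.type > 0)

# States listed here are shared between cell types and would need renaming.
ambiguous.states <- names(n.types.per.state)[n.types.per.state > 1]
print(ambiguous.states)
```

Running the same three steps on the real label vectors should show which of the 32 subtype names are reused across the 8 cell types.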
Hey guys,
Do you think BayesPrism can be used for deconvolution of spatial transcriptomics datasets, to get cell-type specific expression profiles?
When I use
new.prism(
reference=sc.dat.filtered.pc,
mixture=bk.dat,
input.type="count.matrix",
cell.type.labels = cell.type.labels,
cell.state.labels = cell.state.labels,
key="tumor",
outlier.cut=0.01,
outlier.fraction=0.1,
)
I met an error:
one or more cell states belong to multiple cell types
The cell.state.labels here are the patient names. Did I make a mistake?
Hello,
I was wondering what the software license would be for BayesPrism?
Best,
Chang
Hi guys,
I am using BayesPrism to deconvolve a breast cancer dataset in which there are 5 subtypes of cancer epithelial cells.
I'm more interested in the cancer subtype populations (cell.states) than in the cancer population itself (cell.types).
Is there any way I can get the updated cell.states fractions? Or perhaps an option to specify multiple keys for cancer populations in new.prism()?
Cheers,
Khoa.
Hello!
After I launched the experiment, it reported the error message below. (I have no idea what "Error: Stop [err5]" means.)
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 240 100 120 100 120 1333 1333 --:--:-- --:--:-- --:--:-- 2637
Loading required package: BayesPrism
Loading required package: snowfall
Loading required package: snow
Loading required package: NMF
Loading required package: pkgmaker
Loading required package: registry
Loading required package: rngtools
Loading required package: cluster
NMF - BioConductor layer [OK] | Shared memory capabilities [OK] | Cores 127/128
Warning messages:
1: replacing previous import 'gplots::lowess' by 'stats::lowess' when loading 'BayesPrism'
2: replacing previous import 'BiocParallel::register' by 'NMF::register' when loading 'BayesPrism'
Error: Stop [err5]
Execution halted
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 242 100 120 100 122 2033 2067 --:--:-- --:--:-- --:--:-- 4172
Hi, I really appreciate your work in developing this excellent tool.
Although raw counts are the recommended input data type, I noticed that TPM data was used in the reference article as well:
"For instances where only TPM normalized data were available the scRNA-seq reference for HNSCC (scHNSCC), we summed TPM normalized reads."
Is it OK if I use TPM data as input for both scRNA and bulk data? If feasible, does the TPM data need to be log transformed?
Many thanks,
Todd
Hi
Thanks for developing this.
1. I have a basic doubt about what cell state and cell label mean.
As I understand it, the cell label says what kind of cell it is, such as astrocyte, glia, etc.
What does the cell state mean here? Should I cluster the single-cell data so that the clusters represent states, i.e. cells with a similar transcriptional state share a state, so that astrocytes and astrocyte-like cells belong to state 1, and so on?
To circumvent this I used the cell label as the cell state and ran the program. Is this okay?
2. I also get this warning:
Warning: input seems to be log-transformed. Please double check your input. Log transformation should be avoided
However, I am using the counts slot of the Seurat object, which seems to contain raw counts.
Is there any way I can check whether this is an issue with the dataset, or why exactly this warning comes up?
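One way to sanity-check whether a matrix looks like raw counts is to test for non-negative whole numbers and look at the value range. The sketch below uses base R only; looks.like.counts is a hypothetical helper, the matrices are toy data, and the heuristic BayesPrism itself uses to raise the warning is not reproduced here.

```r
# Toy matrices (made up): one raw-count-like, one log-transformed.
raw.counts <- matrix(c(0, 3, 12, 0, 1, 250), nrow = 2)
log.counts <- log2(raw.counts + 1)

# Hypothetical helper, not part of BayesPrism: summarise how count-like x is.
looks.like.counts <- function(x) {
  x <- as.matrix(x)
  list(non.negative = all(x >= 0),        # counts are never negative
       all.integers = all(x == round(x)), # raw counts are whole numbers
       max.value    = max(x))             # log-scale data rarely exceeds ~20
}

str(looks.like.counts(raw.counts))
str(looks.like.counts(log.counts))
```

For a large sparse matrix it is probably safer to run this on a subset of cells, since as.matrix() makes a dense copy.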
Thanks again!
Hi,
I would like to learn to run BayesPrism on some bulk RNA-seq data as part of a research project. However, whenever I try to open the tutorial_deconvolution.html and tutorial_embedding_learning.html files (by clicking on View Raw), I get a lot of raw text, which I do not think is how the tutorial is intended to look.
I want to deconvolute non-tumor tissue. My cell.state.labels are the same as my cell.type.labels, and the "key" of new.prism is set to NULL. Is this correct?
Hello,
I was wondering if this could potentially be used for scATAC-seq reference peaks to deconvolute bulk ATAC-seq expression?
Specifically, what are the constraints on the input matrix for this to work? I'm assuming I need to convert the peaks into an integer matrix.
Best,
Chang
I have bulk RNA-seq data and scRNA-seq data with Ensembl IDs, and I am using the cell type Fibroblast.
> dim(bk.dat)
[1] 546 19988
> dim(sc.dat)
[1] 1159 19828
> sort(table(cell.type.labels))
Fibroblast
1159
> sort(table(cell.state.labels))
cell.state.labels
fb-4 fb-1 fb-3 fb-2 fb-0
83 148 177 237 514
Using the above data, I constructed the prism object as below:
> myPrism <- new.prism(
+ reference=sc.dat.filtered.pc,
+ mixture=bk.dat,
+ input.type="count.matrix",
+ cell.type.labels = cell.type.labels,
+ cell.state.labels = cell.state.labels,
+ key="Fibroblast",
+ outlier.cut=0.01,
+ outlier.fraction=0.1,
+ )
number of cells in each cell state
cell.state.labels
fb-4 fb-1 fb-3 fb-2 fb-0
83 148 177 237 514
Number of outlier genes filtered from mixture = 9
Aligning reference and mixture...
Nornalizing reference...
Warning message:
In validityMethod(object) : Warning: pseudo.min does not match min(phi)
Then I ran BayesPrism as below:
> bp.res <- run.prism(prism = myPrism, n.cores=50)
Run Gibbs sampling...
Current time: 2022-12-02 21:03:14
Estimated time to complete: 1hrs 2mins
Estimated finishing time: 2022-12-02 22:04:15
Start run...
Explicit sfStop() is missing: stop now.
Stopping cluster
snowfall 1.84-6.2 initialized (using snow 0.4-4): parallel execution on 50 CPUs.
Stopping cluster
Update the reference matrix ...
snowfall 1.84-6.2 initialized (using snow 0.4-4): parallel execution on 50 CPUs.
Error in checkForRemoteErrors(val) :
one node produced an error: subscript out of bounds
So, first I saw "Explicit sfStop() is missing: stop now.",
and then at the end of the run I saw the following error:
Error in checkForRemoteErrors(val) :
one node produced an error: subscript out of bounds
Could you please help me resolve this error? Thank you.
I am wondering whether I can use single nuclei RNA-seq data as input. Compared with scRNA-seq, snRNA-seq tends to capture the mRNA inside the nuclei. I think this might introduce some potential biases.
Hello,
there are some guidelines in the tutorial to annotate the reference scRNA-seq data, for example:
But I could not find any suggested workflow to annotate the reference data.
What would be the ideal way to generate the cell label and cell state annotations? Are there any suggested tools I could use?
Thank you and best regards
Hi, very useful tool!
I wonder if the ratio of cell types in the input data is needed to maintain their true biological fraction, or just a certain number of cells is enough?
Hello,
This is an open thread to share experience. I would like to know what the best inferred cell types are, and what does it depend on. Let's say you have a reference single cell data, and a test bulk data. Do you think that the most abundant cell types in reference or test are better inferred ? In my experience, not necessarily.
Thanks !
Hello @tinyi and Team,
I'm getting similar errors while running the plot.cor.phi and plot.scRNA.outlier functions. I've followed the vignette (Tutorial: bulk RNA-seq deconvolution using BayesPrism by Tinyi Chu) step by step to generate the input files correctly. Please find below the commands, the respective errors, and some cells giving information about the input files that I've prepared for BayesPrism.
Can you please look into this and help me resolve it?
plot.cor.phi(input = sc.dat,
input.labels = cell.state.labels,
title = "cell state correlation",
# specify pdf.prefix if need to output to pdf
# pdf.prefix = "BayesPrism.crc.cor.cs",
cexRow = 0.2, cexCol = 0.2, min.exp = 3,
margins = c(2,2))
Error in h(simpleError(msg, call)): error in evaluating the argument 'j' in selecting a method for function '[': 'x' must be an array of at least two dimensions
Traceback:
- plot.cor.phi(input = sc.dat, input.labels = cell.state.labels,
. title = "cell state correlation", cexRow = 0.2, cexCol = 0.2,
. min.exp = 3, margins = c(2, 2))- input[, colSums(input) >= min.exp]
- colSums(input)
- stop("'x' must be an array of at least two dimensions")
- .handleSimpleError(function (cond)
. .Internal(C_tryCatchHelper(addr, 1L, cond)), "'x' must be an array of at least two dimensions",
. base::quote(colSums(input)))- h(simpleError(msg, call))
sc.stat <- plot.scRNA.outlier(
input = sc.dat, #make sure the colnames are gene symbol or ENSMEBL ID
cell.type.labels = cell.type.labels,
species = "hs", #currently only human(hs) and mouse(mm) annotations are supported
return.raw = TRUE #return the data used for plotting.
# pdf.prefix = "BayesPrism.crc.sc.stat" # specify pdf.prefix if need to output to pdf
)
Error in colSums(ref[labels == label.i, , drop = F]): 'x' must be an array of at least two dimensions
Traceback:
- plot.scRNA.outlier(input = sc.dat, cell.type.labels = cell.type.labels,
. species = "hs", return.raw = TRUE)- collapse(ref = input, labels = cell.type.labels)
- do.call(rbind, lapply(labels.uniq, function(label.i) colSums(ref[labels ==
. label.i, , drop = F])))- lapply(labels.uniq, function(label.i) colSums(ref[labels == label.i,
. , drop = F]))- FUN(X[[i]], ...)
- colSums(ref[labels == label.i, , drop = F])
- stop("'x' must be an array of at least two dimensions")
For reference:
class(bk.dat)
'matrix' 'array'
class(sc.dat)
'dgCMatrix'
class(cell.type.labels)
'character'
class(cell.state.labels)
'character'
dim(bk.dat)
592 18184
dim(sc.dat)
610150 18184
length(cell.type.labels)
610150
length(cell.state.labels)
610150
head(bk.dat)
head(sc.dat)
head(cell.type.labels)
head(cell.state.labels)
                RNU12-2P EFCAB8 TRIM75P GTPBP6 EFCAB12 A1BG A1CF A2M A2ML1 A4GALT ... ZWILCH ZWINT ZXDA ZXDB ZXDC ZYG11A ZYG11B ZYX ZZEF1 ZZZ3
TCGA.3L.AA1B.01 1.9342 2.4178 0.4836 1033.850 0.0000 22.1470 220.987 15911.50 0.4836 118.9560 ... 403.641 629.594 71.0832 461.315 1105.42 3.3849 543.037 6259.19 1358.32 798.356
TCGA.4N.A93T.01 0.4838 2.4190 0.0000 1817.610 1.4514 171.2680 100.629 1494.33 0.4838 22.2545 ... 186.686 442.187 39.6710 366.715 1149.49 0.4838 290.760 4653.12 1220.13 333.817
TCGA.4T.AA8H.01 2.9245 2.9245 0.0000 719.430 0.7311 20.9980 174.008 1333.57 36.5564 16.0848 ... 520.782 1033.080 31.4385 349.479 1083.53 0.0000 669.713 4460.61 3002.01 530.068
TCGA.5M.AAT4.01 2.1515 2.1515 0.8606 879.948 1.7212 6.4587 151.463 2424.26 6.8847 75.7315 ... 468.408 1629.090 54.6472 542.169 1374.35 0.4303 445.353 4190.19 1093.37 574.441
TCGA.5M.AAT5.01 0.9892 8.9030 0.0000 934.819 1.4838 14.8384 255.715 2398.34 0.9892 41.5475 ... 663.533 838.864 29.1822 428.335 1240.98 3.4623 550.504 3878.26 1016.43 413.002
TCGA.5M.AAT6.01 1.3125 4.5937 0.0000 605.049 3.9374 49.8017 0.000 7231.65 2.6249 161.4340 ... 600.771 1338.720 45.9365 335.337 1056.54 13.7810 492.833 6165.99 1390.56 717.266
[[ suppressing 34 column names 'OR4F5', 'OR4F29', 'FAM41C' ... ]]
6 x 18184 sparse Matrix of class "dgCMatrix"
cell1 . . . . . . . . 2 . . . . 1 3 1 . . . . . . . . . 4 1
cell2 . . . . . . . . 2 1 . . . . . . . . . . . 1 . . . 1 .
cell3 . . . . . . . . 1 1 . . . . . . . 1 . . . . . . . . .
cell4 . . . . 1 1 . 2 9 2 . . . 13 5 . . . . . . . . . . 3 1
cell5 . . . . . . . . . . . . . . . . . . . . . . . . . . 1
cell6 . . . . . . . . 2 . . . . 1 2 1 . 2 . . . . . 1 1 5 1
cell7 2 . . . . . . ......
cell8 . . . 4 . . . ......
cell9 . . . 1 . . . ......
cell10 2 . . 8 1 . . ......
cell11 2 . . . . . . ......
cell12 . . . 4 . . . ......
.....suppressing 18150 columns in show(); maybe adjust 'options(max.print= *, width = *)'
..............................
- 'Endothelial'
- 'Endothelial'
- 'Endothelial'
- 'Endothelial'
- 'Endothelial'
- 'Endothelial'
- 'Endothelial'
- 'Endothelial'
- 'Endothelial'
- 'Endothelial'
- 'Endothelial'
- 'Endothelial'
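The message "'x' must be an array of at least two dimensions" is what base R's colSums() raises when it receives something that is not an ordinary matrix, and sc.dat above is a dgCMatrix. One possibility, offered as an assumption rather than a confirmed diagnosis, is that the sparse matrix reaches the base function instead of the Matrix method. The sketch below uses a toy sparse matrix to show the Matrix method and a dense conversion of a small slice; converting all 610150 cells to a dense matrix would likely need far more memory than is practical.

```r
library(Matrix)  # provides colSums/rowSums methods for sparse matrices

# Toy sparse matrix standing in for sc.dat (3 cells x 4 genes, made up).
sc.toy <- Matrix(c(0, 2, 0, 1,
                   0, 0, 3, 0,
                   1, 0, 0, 0),
                 nrow = 3, byrow = TRUE, sparse = TRUE,
                 dimnames = list(paste0("cell", 1:3), paste0("gene", 1:4)))
print(class(sc.toy))

# Column sums computed with the Matrix method, without making a dense copy.
gene.totals <- Matrix::colSums(sc.toy)
print(gene.totals)

# A small slice can be converted to a base matrix for functions that need one.
dense.slice <- as.matrix(sc.toy[1:2, ])
print(is.matrix(dense.slice))
```

If the plotting functions only accept base matrices, subsetting cells before converting may be a workable compromise.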
Hi all,
I've tested the program on my local machine and it works after significantly shrinking my datasets. So I've moved to a compute cluster to try to get things working on a less scaled down version of my data. I got the error below.
I guess my question is: will BayesPrism work if the bigmemory package is not installed? It isn't currently available on our cluster and I'm guessing this is where the error originates.
Run Gibbs sampling...
Current time: 2022-08-31 09:22:58
Estimated time to complete: 29mins
Estimated finishing time: 2022-08-31 09:51:32
Start run...
R Version: R version 4.2.0 (2022-04-22)
snowfall 1.84-6.2 initialized (using snow 0.4-4): parallel execution on 32 CPUs.
Error in checkForRemoteErrors(val) :
8 nodes produced errors; first error: could not find function "sample.Z.theta_n"
> traceback()
12: stop(count, " nodes produced errors; first error: ", firstmsg)
11: checkForRemoteErrors(val)
10: staticClusterApply(cl, fun, length(x), argfun)
9: clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...)
8: lapply(args, enquote)
7: do.call("fun", lapply(args, enquote))
6: docall(c, clusterApply(cl, splitList(x, length(cl)), lapply,
fun, ...))
5: parLapply(sfGetCluster(), x, fun, ...)
4: sfLapply(1:nrow(X), cpu.fun)
3: run.gibbs.refPhi(gibbsSampler.obj = gibbsSampler.obj, final = final,
compute.elbo = compute.elbo)
2: run.gibbs(gibbsSampler.ini.cs, final = FALSE)
1: run.prism(prism = myPrism, n.cores = 32)
I have read the tutorial of the package, and it mentioned that "Correlating Z (after normalization using vst or from bp.res@reference.update@psi_mal) with theta to understand how gene expression of each gene (in malignant cells) correlates with the cell type fraction of non-malignant cells in tumor microenvironment". So should I use the two filters mentioned in the original papers, or use the genes from Z directly?
It is important to use the updated estimates of cell type compositions in the output for better results. However, would the package also provide updated estimates of cell state composition ?
Thanks!
Hi, BayesPrism team
I've been using this package for a while and it has performed very stably. But today I encountered an error like this:
recommend to have sufficient number of cells in each cell state
Number of outlier genes filtered from mixture = 44
Aligning reference and mixture...
Nornalizing reference...
Warning message:
In new.prism(reference = sc.dat.filtered.pc, mixture = bk.dat, input.type = "count.matrix", :
Warning: very few gene from reference and mixture match! Please double check your gene names.
Run Gibbs sampling...
Current time: 2022-07-19 02:02:40
Estimated time to complete: 2mins
Estimated finishing time: 2022-07-19 02:04:01
Start run...
R Version: R version 4.1.2 (2021-11-01)
snowfall 1.84-6.1 initialized (using snow 0.4-4): parallel execution on 50 CPUs.
Stopping cluster
Update the reference matrix ...
Error in if (any(which.row)) { : missing value where TRUE/FALSE needed
Calls: run.prism -> updateReference -> get.MLE.psi_mal -> norm.to.one
In addition: Warning message:
In searchCommandline(parallel, cpus = cpus, type = type, socketHosts = socketHosts, :
Unknown option on commandline: --file
Execution halted
It's kind of weird because the query data (83 samples) is the same as in the past few successful trials. The only difference is that the number of genes in it is reduced from 2000 to 901, like this:
> dim(bk.dat)
[1] 83 901
Besides, if I subset some of the 83 samples and add part or all of them up to make an artificial dataset like this:
data$testA1 = with(data, CR190516 + CR190517 + CR190518 + CR190522 + CR190523) ;
data$testA2 = with(data, CR190516 + CR190517 + CR190518 ) ;
data$testA3 = with(data, CR190518 + CR190522 + CR190523) ;
data$testA4 = with(data, CR190516 );
data$testA5 = with(data, CR190517 );
data$testA6 = with(data, CR190518 );
data$testA7 = with(data, CR190522 );
data$testA8 = with(data, CR190523 );
testSamples=colnames(data)[grepl("test", colnames(data))]
test = data[,testSamples]
rownames(test) = rownames(data) ;
testData=t(test) ;
bk.dat <- testData
No errors showed up.
Is the error coming from some of my samples?
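Given the "very few gene from reference and mixture match" warning in the log, one thing that can be checked independently of BayesPrism is how many genes the two matrices share, and whether any bulk sample has zero total counts over those shared genes. The sketch below uses base R and toy matrices; strip.version is a hypothetical helper, and whether an all-zero sample is what triggers the NA error is an assumption, not something confirmed in this thread.

```r
# Toy inputs (made up): cells/samples in rows, genes in columns.
sc.toy <- matrix(1:6, nrow = 2,
                 dimnames = list(c("cell1", "cell2"),
                                 c("ENSG000001.5", "ENSG000002.1", "ENSG000003.2")))
bk.toy <- matrix(c(5, 0, 7, 0, 9, 0), nrow = 2,
                 dimnames = list(c("sampleA", "sampleB"),
                                 c("ENSG000001", "ENSG000003", "ENSG000009")))

# Hypothetical helper: drop an optional version suffix such as ".5".
strip.version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Genes present in both reference and mixture after harmonising the IDs.
colnames(sc.toy) <- strip.version(colnames(sc.toy))
colnames(bk.toy) <- strip.version(colnames(bk.toy))
shared.genes <- intersect(colnames(sc.toy), colnames(bk.toy))
print(length(shared.genes))

# Total counts per bulk sample over the shared genes; a zero means the
# sample has no signal left once genes are matched.
bulk.totals   <- rowSums(bk.toy[, shared.genes, drop = FALSE])
empty.samples <- names(bulk.totals)[bulk.totals == 0]
print(empty.samples)
```

Since summing samples (as in the artificial dataset above) makes all-zero rows less likely, comparing bulk.totals between the failing and working inputs might narrow things down.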
Dear,
I am using BayesPrism to perform deconvolution by using two different levels of annotation (cell types and cell subtypes).
However, I get this following error, while using only either annotation level works.
Error in validObject(.Object) :
invalid class “prism” object: cell states between map and phi_cellState do not match
The table of cell type labels and cell state labels looks like this.
Thanks in advance.
Kind regards,
Seoyeon
Hello,
I am running a deconvolution on R/4.2.0 using BayesPrism. I have followed your tutorial without error until the run.prism() step:
> bp.res <- run.prism(prism = myPrism, n.cores=1)
Run Gibbs sampling...
Current time: 2022-09-26 08:22:58
Estimated time to complete: 8hrs 4mins
Estimated finishing time: 2022-09-26 16:26:16
Start run...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134
Update the reference matrix ...
Explicit sfStop() is missing: stop now.
Stopping cluster
snowfall 1.84-6.2 initialized (using snow 0.4-4): parallel execution on 1 CPUs.
Error in checkForRemoteErrors(val) :
one node produced an error: could not find function "Rcgminu"
I can see that Rcgminu() is an exported function from the BayesPrism package and is available when the package is loaded. I have also tried running this with the Rcgmin package loaded and received the same error.
Here is my session info:
> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8
[3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Canada.utf8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BayesPrism_2.0 NMF_0.24.0 cluster_2.1.3
[4] rngtools_1.5.2 pkgmaker_0.32.2 registry_0.5-1
[7] snowfall_1.84-6.2 snow_0.4-4 DESeq2_1.36.0
[10] SummarizedExperiment_1.26.1 Biobase_2.56.0 MatrixGenerics_1.8.0
[13] matrixStats_0.62.0 GenomicRanges_1.48.0 GenomeInfoDb_1.32.2
[16] IRanges_2.30.0 S4Vectors_0.34.0 BiocGenerics_0.42.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 bit64_4.0.5 doParallel_1.0.17
[4] RColorBrewer_1.1-3 httr_1.4.4 tools_4.2.0
[7] irlba_2.3.5 utf8_1.2.2 R6_2.5.1
[10] KernSmooth_2.23-20 DBI_1.1.3 colorspace_2.0-3
[13] withr_2.5.0 tidyselect_1.1.2 bit_4.0.4
[16] compiler_4.2.0 cli_3.3.0 BiocNeighbors_1.14.0
[19] DelayedArray_0.22.0 caTools_1.18.2 scales_1.2.0
[22] genefilter_1.78.0 stringr_1.4.0 digest_0.6.29
[25] XVector_0.36.0 pkgconfig_2.0.3 sparseMatrixStats_1.8.0
[28] limma_3.52.1 fastmap_1.1.0 rlang_1.0.4
[31] rstudioapi_0.13 RSQLite_2.2.14 DelayedMatrixStats_1.18.0
[34] generics_0.1.3 BiocParallel_1.30.2 gtools_3.9.2.1
[37] dplyr_1.0.9 RCurl_1.98-1.6 magrittr_2.0.3
[40] BiocSingular_1.12.0 scuttle_1.6.3 GenomeInfoDbData_1.2.8
[43] Matrix_1.4-1 Rcpp_1.0.8.3 munsell_0.5.0
[46] fansi_1.0.3 lifecycle_1.0.1 edgeR_3.38.4
[49] stringi_1.7.6 zlibbioc_1.42.0 gplots_3.1.3
[52] plyr_1.8.7 grid_4.2.0 blob_1.2.3
[55] dqrng_0.3.0 parallel_4.2.0 crayon_1.5.1
[58] lattice_0.20-45 Biostrings_2.64.0 beachmat_2.12.0
[61] splines_4.2.0 annotate_1.74.0 KEGGREST_1.36.3
[64] locfit_1.5-9.5 metapod_1.4.0 pillar_1.8.1
[67] igraph_1.3.1 geneplotter_1.74.0 reshape2_1.4.4
[70] codetools_0.2-18 ScaledMatrix_1.4.0 XML_3.99-0.9
[73] glue_1.6.2 scran_1.24.1 png_0.1-7
[76] vctrs_0.4.1 foreach_1.5.2 gtable_0.3.0
[79] purrr_0.3.4 assertthat_0.2.1 cachem_1.0.6
[82] ggplot2_3.3.6 BinfTools_0.0.0.9000 gridBase_0.4-7
[85] rsvd_1.0.5 xtable_1.8-4 survival_3.3-1
[88] SingleCellExperiment_1.18.0 tibble_3.1.7 iterators_1.0.14
[91] AnnotationDbi_1.58.0 memoise_2.0.1 statmod_1.4.37
[94] bluster_1.6.0 ellipsis_0.3.2
Please let me know if you need any other information. Any help would be greatly appreciated.
Hello, thank you for creating this great package.
I encountered an error while running the get.exp.stat function:
Error: logical subscript contains NAs
I checked the original code, and I think the error might be due to NA output while filtering out comparisons for cell states from the same cell type.
#filter out comparisons for cell states from the same cell type
pairs.celltype.first <- ct.to.cst[match(fit.up$pairs$first, ct.to.cst[,"cell.state"]),"cell.type"]
pairs.celltype.second <- ct.to.cst[match(fit.up$pairs$second, ct.to.cst[,"cell.state"]),"cell.type"]
I made a small change: I changed
ct.to.cst <- unique(cbind(cell.type=cell.type.labels, cell.state=cell.state.labels))
to
ct.to.cst <- unique(cbind.data.frame(cell.type=cell.type.labels, cell.state=cell.state.labels))
Then it worked well.
I'm not sure if this is a problem with my cell naming or a general problem.
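One scenario that would be consistent with this fix, assuming the labels were stored as factors (not stated in the post), is that cbind() on factors keeps only their integer codes, so a later match() against label names returns NA. A toy demonstration in base R:

```r
# Toy labels stored as factors (made up for illustration).
cell.type.labels  <- factor(c("T cell", "T cell", "B cell"))
cell.state.labels <- factor(c("T-naive", "T-memory", "B-naive"))

# cbind() on factors keeps only the integer codes, so the names are lost.
m.factor <- unique(cbind(cell.type  = cell.type.labels,
                         cell.state = cell.state.labels))
lookup.factor <- match("T-naive", m.factor[, "cell.state"])
print(lookup.factor)

# Converting to character first keeps the names in the matrix.
m.char <- unique(cbind(cell.type  = as.character(cell.type.labels),
                       cell.state = as.character(cell.state.labels)))
lookup.char <- m.char[match("T-naive", m.char[, "cell.state"]), "cell.type"]
print(lookup.char)
```

If this is the cause, passing as.character(cell.type.labels) and as.character(cell.state.labels) to the function may avoid the need to edit the package code.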
Thank you very much.
I had a seurat object for B cells which I was using as a reference for my mixture (bulk-seq).
In the step: myPrism <- new.prism(
reference=sc.dat.filtered,
mixture=x,
input.type="count.matrix",
cell.type.labels = cell.type.labels,
cell.state.labels = cell.type.labels_broad,
key="B cell memory",
outlier.cut=0.01,
outlier.fraction=0.1,
)
I wanted to know how to select key in this function as there is no tumour. I tried using NA, although it returned: Error in validObject(.Object) : invalid class “prism” object: invalid key
I then tried using the other cell types present and found that only B cell memory was valid for key. In my reference, B cell memory was not the most abundant cell type, so I am not sure how it would be chosen as the key.
Could you please provide me with more information on how keys are selected?
Many thanks!
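Elsewhere in this collection the key is described as being set to NULL for non-tumor tissue. In R, NULL, NA, and the string "NULL" are three different things, which may explain why some attempts are rejected as an invalid key. The sketch below only illustrates that distinction in base R; describe.key is a hypothetical helper, the labels are toy data, and how new.prism validates the key internally is not reproduced here.

```r
# Toy labels (made up); in the post these come from the Seurat object.
cell.type.labels <- c("B cell memory", "B cell naive", "Plasma cell")

# Hypothetical helper, not part of BayesPrism: classify a candidate key.
describe.key <- function(key, labels) {
  if (is.null(key)) return("NULL: no cell type is singled out")
  if (length(key) != 1 || is.na(key)) return("NA or malformed: not a label")
  if (key %in% labels) return("matches a cell type label")
  "string that matches no cell type label"
}

print(describe.key(NULL, cell.type.labels))
print(describe.key(NA, cell.type.labels))
print(describe.key("NULL", cell.type.labels))   # a string, not the NULL object
print(describe.key("B cell memory", cell.type.labels))
```

A quick key %in% unique(cell.type.labels) on the real labels shows whether a given string can match at all.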
Hi!
I have two questions about the sc reference:
Thank you!
Hi,
When I assign 'mm' to the parameter 'species', the function select.gene.type() throws an error in the step that filters out outlier genes. It might be due to the lack of a gene annotation file for mouse, like the "gencode.v22.broad.category.txt" file in the extdata folder.
By the way, cleanup.genes() only supports "MALAT1". If the species is mouse and MALAT1 is changed to Malat1, you also get an error.
Thanks in advance !
Will there be a Python version in the future? It may speed things up. Right now it takes me at least half an hour to run with 300 patients.
Hello,
I would like to know what the minimum requirement in sample size is for bulk RNAseq in order to trust the inferred cell fraction. Do we need at least 10, 20 samples ?
Thanks !
Thank you for your excellent algorithm,
My question is: if I use metacell data as a reference, where each metacell is the sum of dozens of single cells, will the deconvolution result be accurate?
Best wishes!
J Hovelly.
Hi,
I was wondering if there is a way to estimate the proportion of cells that aren't present in the reference cells? I've deconvoluted bulk RNA-seq data for a tumour with a reference atlas from the same healthy tissue. There is no single cell data on this type of cancer, so I was wondering if I could predict gene expression of cancer cells from BayesPrism output? For example, EPIC gives a proportion of 'otherCells' in its output, so I was wondering if I could get something like this from BayesPrism.
Many thanks,
Alina
Hello,
I was wondering where I might be able to find the code used to generate figures from the BayesPrism publication related to downstream analyses such as GSEA-based interpretation of latent embeddings (e.g. Figure 4).
Thank you,
Daniel
Hello. Thank you for the amazing tool BayesPrism. Suppose I have bulk RNA-seq data under conditions A, B, C, and D, but single-cell sequencing data only under condition A. Can I still feed BayesPrism all the data to infer the cell type composition of the bulk data under conditions B, C, and D?
Hi,
I was running BayesPrism using gene symbols and met this error when running functions plot.scRNA.outlier and plot.bulk.outlier:
" Error in input.genes.short %in% gene.df[gene.df[, 1] == gene.group.i, :
object 'input.genes.short' not found "
I found it's due to the function "assign.category" in "process_input.R" and made a very slight change to make it work:
#detect if EMSEMBLE ID (starts with ENS) or gene symbol is used
if( sum(substr(input.genes,1,3)=="ENS")> length(input.genes)*0.8 ){
  cat("EMSEMBLE IDs detected.\n")
  #strip the version suffix; the dot must be escaped as "\\." in an R regex
  input.genes.short <- unlist(lapply(input.genes, function(gene.id) strsplit(gene.id,split="\\.")[[1]][1]))
  gene.df <- gene.list[,c(1,2)]
  gene.group.matrix <- do.call(cbind.data.frame, lapply(unique(gene.df[,1]),
    function(gene.group.i) input.genes.short %in% gene.df[gene.df[,1]== gene.group.i,2]))
}
else{
  cat("Gene symbols detected. Recommend to use EMSEMBLE IDs for more unique mapping.\n")
  gene.df <- gene.list[,c(1,3)]
  gene.group.matrix <- do.call(cbind.data.frame, lapply(unique(gene.df[,1]),
    function(gene.group.i) input.genes %in% gene.df[gene.df[,1]== gene.group.i,2]))
}
Hope it helps : )
Thanks!
Shuai
Hello!
I am trying to run BayesPrism for my bulk RNAseq data sample size ~200 and Single cell data, no of cells ~70000. I am working with non-malignant samples. Here is the issue I am facing. Please let me know how to solve this.
For my single cell data only cell types information is available. I am not sure how to subtype these cells to come up with the cell states. For now I am using the cell types as both cell.type.labels and cell.state.labels as it is mandatory to provide both. Let me know if it is ok.
The initial data preparation went very well for my data. I just converted the single cell data sparse matrix to dense matrix and I selected only protein coding genes for Prism construction.
My code is here:
myPrism <- new.prism(
reference=sc.dat.filtered.pc,
mixture=bk.dat,
input.type="count.matrix",
cell.type.labels = cell.type.labels,
cell.state.labels = cell.type.labels,
key="NULL",
outlier.cut=0.01,
outlier.fraction=0.1,
)
I am running this job on a cluster with 120 GB of memory. The job runs for a few minutes, prints the cell state information, and terminates with the following error:
#> number of cells in each cell state
#> cell.state.labels
#> PJ017-tumor-6 PJ032-tumor-5 myeloid_8 PJ032-tumor-4
#> 22 41 49
/sw/Containers/singularity/bin/run_singularity: line 28: 42360 Killed singularity $*
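A "Killed" message from the job runner often points to the process exceeding its memory limit, though that is an assumption here. A rough back-of-the-envelope estimate of the dense reference matrix can help judge this. In the sketch below, dense.size.gib is a hypothetical helper, the cell count comes from the post, and the gene count of 20000 is assumed for illustration because it is not stated.

```r
# Hypothetical helper: size in GiB of a dense numeric (double) matrix.
dense.size.gib <- function(n.rows, n.cols, bytes.per.value = 8) {
  n.rows * n.cols * bytes.per.value / 1024^3
}

n.cells <- 70000   # from the post
n.genes <- 20000   # assumed for illustration; not stated in the post

one.copy <- dense.size.gib(n.cells, n.genes)
print(round(one.copy, 2))
```

R frequently makes temporary copies of large objects during subsetting and normalization, so peak usage can be several times the size of a single copy.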
I have tried using get.exp to get a tumor-specific gene expression count matrix Z, but I noticed that the Z only contained the expression profile of 4806 genes, which was much fewer than both the bulk matrix and the scRNA-Seq reference. I wonder if this is normal or if there is something I need to check for troubleshooting.
Dear Sir or Madam,
I am having trouble getting the matrix of cell-type expression from the BayesPrism output. I am trying to use:
get.exp(bp.res, state.or.type="type")
But this unfortunately only gives me back a flat vector, and despite my efforts to build a 3D array from it, I cannot find the right dimension order. How can I get the 3D array we are supposed to get from this (e.g. high-resolution cell-type expression)?
Thanks in advance ! Regards,
Alexandre Coudray
PhD student in Trono/La Manno group, EPFL Switzerland
When I check the result of the function select.marker, sc.dat.filtered.pc.sig, I find that it returns all-NA rows for stromal cells but not for other cells. How can I solve this? Thank you!
Dear developers,
First off, I'd like to express my appreciation for the development of this method.
I'm interested in using TPM-normalized bulk references of sorted cell types. This includes both RNA-seq and microarray-based references. According to your tutorial, it seems this can be accomplished by setting input.type = "GEP".
However, I am not sure what would be the correct way to generate markers for each type.
The get.exp.stat and select.marker functions are designed for scRNA-seq data.
A potential workaround I'm considering is to manually perform DE analysis in a pairwise manner for each cell type and then select the most discriminating genes as markers. However, I would like to verify whether this approach is okay or if there's a better method available.
Any advice or guidance you can offer would be greatly appreciated.
Almog
Hi, do you have an official docker image for this?
If not, are you open to a pull request for one? I am developing one, and I think this might be useful for some folks.
Hello !
I just updated R to v.4.2, and now I downloaded the new BayesPrism (I had used the one in TED with Rv3.6.3 with no issues).
When I try to launch BayesPrism I am getting this error message :
number of cells in each cell state
cell.state.labels
SUDHL4 THP1 HL60 K562
551 562 570 659
Number of outlier genes filtered from mixture = 3
Aligning reference and mixture...
Nornalizing reference...
Error in validObject(.Object): invalid class “prism” object: invalid key
Traceback:
Could you please help me to resolve it ? Thanks in advance !
Regards,
Alexandre Coudray
PhD Student at EPFL
Hello !
I cannot install BayesPrism on R4.2.2 with the following error:
Downloading GitHub repo Danko-Lab/BayesPrism@HEAD
formatR (NA → 1.12 ) [CRAN]
futile.op… (NA → 1.0.1 ) [CRAN]
lambda.r (NA → 1.2.4 ) [CRAN]
snow (NA → 0.4-4 ) [CRAN]
futile.lo… (NA → 1.4.3 ) [CRAN]
RcppHNSW (NA → 0.3.0 ) [CRAN]
BiocNeigh… (NA → 1.16.0 ) [CRAN]
BiocParallel (NA → 1.32.6 ) [CRAN]
beachmat (NA → 2.14.2 ) [CRAN]
rsvd (NA → 1.0.5 ) [CRAN]
ScaledMatrix (NA → 1.6.0 ) [CRAN]
sparseMat… (NA → 1.10.0 ) [CRAN]
locfit (NA → 1.5-9.5 ) [CRAN]
DelayedMa… (NA → 1.20.0 ) [CRAN]
registry (NA → 0.5-1 ) [CRAN]
metapod (NA → 1.6.0 ) [CRAN]
bluster (NA → 1.8.0 ) [CRAN]
BiocSingular (NA → 1.14.0 ) [CRAN]
statmod (NA → 1.4.36 ) [CRAN]
edgeR (NA → 3.40.2 ) [CRAN]
scuttle (NA → 1.8.4 ) [CRAN]
gridBase (NA → 0.4-7 ) [CRAN]
rngtools (NA → 1.5.2 ) [CRAN]
pkgmaker (NA → 0.32.2 ) [CRAN]
scran (NA → 1.26.2 ) [CRAN]
NMF (NA → 0.24.0 ) [CRAN]
snowfall (NA → 1.84-6.1) [CRAN]
Installing 27 packages: formatR, futile.options, lambda.r, snow, futile.logger, RcppHNSW, BiocNeighbors, BiocParallel, beachmat, rsvd, ScaledMatrix, sparseMatrixStats, locfit, DelayedMatrixStats, registry, metapod, bluster, BiocSingular, statmod, edgeR, scuttle, gridBase, rngtools, pkgmaker, scran, NMF, snowfall
Warning message:
“dependency ‘Matrix (>= 1.5.0)’ is not available”
Warning message in i.p(…):
“installation of package ‘DelayedMatrixStats’ had non-zero exit status”
Warning message in i.p(…):
“installation of package ‘scuttle’ had non-zero exit status”
Warning message in i.p(…):
“installation of package ‘scran’ had non-zero exit status”
checking for file ‘/tmp/Rtmp6tQ3CV/remotes16b72b1d6904/Danko-Lab-BayesPrism-1ad3e82/BayesPrism/DESCRIPTION’ … OK
preparing ‘BayesPrism’:
checking DESCRIPTION meta-information … OK
checking for LF line-endings in source and make files and shell scripts
checking for empty or unneeded directories
building ‘BayesPrism_2.0.tar.gz’
Warning message in i.p(…):
“installation of package ‘/tmp/Rtmp6tQ3CV/file16b7302681db/BayesPrism_2.0.tar.gz’ had non-zero exit status”
Any clue what it could be?
Thanks a lot !
A.Coudray
PhD student
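The first warning in the log is that the dependency Matrix (>= 1.5.0) is not available, and the later failures may follow from it. A small base R check of the installed version is sketched below; has.min.version is a hypothetical helper, and what to do about an outdated Matrix depends on the R installation in use.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min.version` or newer.
has.min.version <- function(pkg, min.version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  utils::packageVersion(pkg) >= min.version
}

print(has.min.version("Matrix", "1.5.0"))
print(R.version.string)
```

If this prints FALSE, updating Matrix before retrying the installation is one thing to try.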
Dear Sir or Madam,
Thanks for developing this.
when I run the code:
diff.exp.stat <- get.exp.stat(sc.dat=sc.dat[,colSums(sc.dat>0)>3],
                              cell.type.labels=cell.type.labels,
                              cell.state.labels=cell.state.labels,
                              pseudo.count=0.1,
                              cell.count.cutoff=50,
                              n.cores=10 #number of threads
                              )
I got an error with the following:
Error: logical subscript contains NAs
How can I solve it? Thank you!
Hi, I'm having trouble understanding the cell states vs cell types. In my data I have identified cell types from multiple samples that come from 2 major groups, let's say disease and control. There is not substantial variability in disease vs control in different cell types, so I'm having trouble understanding how to specify the correct cell state in this case: when examining umap, they cluster by cell type and not by group.
If I specify cell states as group+sample, I will end up getting cell states being spread across cell types understandably:
cell_state endothelial macrophages pericyte ....
disease_1 300 70 0
disease_2 199 50 80
healthy_1 500 130 550
This means that I do not get substantial variability per sample (or source) to elicit distinct cell types. It also means I get the error "Error: one or more cell states belong to multiple cell types!".
Likewise, in the example that you show in the tutorial you could easily have them spread across cell types. How would you recommend addressing this?
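As far as the naming constraint in the error goes, one option is to nest the sample label inside the cell type so that no state name is shared between types. The sketch below uses base R and toy metadata; it addresses only the label structure, and whether sample-level states are a sensible choice for this dataset is a separate question.

```r
# Toy metadata (made up): one entry per cell.
cell.type <- c("endothelial", "endothelial", "macrophages", "pericyte")
sample.id <- c("disease_1",   "healthy_1",   "disease_1",   "healthy_1")

# Prefix each sample label with its cell type so no state spans two types.
cell.state <- paste(cell.type, sample.id, sep = ":")
print(cell.state)

# Each state should now map to exactly one cell type.
types.per.state <- tapply(cell.type, cell.state, function(x) length(unique(x)))
print(all(types.per.state == 1))
```

The resulting vectors would be passed as cell.type.labels and cell.state.labels respectively.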
Hi devs,
I noticed there is a pretty glaring typo in the arguments of the get.exp.stat function:
psuedo.count
A numeric value used for log2 transformation. =0.1 for 10x data, =10 for smart-seq. Default=0.1.
It seems like it should be named "pseudo.count" instead, and running the function with the correct spelling raises an unused argument error.
Hi Tinyi,
While running new.prism() function, there is an error : "recommend to have sufficient number of cells in each cell state
Error in new.prism(reference = sc.dat.filtered.pc, mixture = bk.dat, input.type = "count.matrix", :
Error: one or more cell states belong to multiple cell types!"
I have 2 questions:
Many of the datasets I have studied are microarray datasets. Can BayesPrism use for microarray datasets? Thank you very much.
I tried TCGA-GBM and didn't find overlapping samples. Maybe they are from another dataset?