
batchqc's Introduction

BatchQC: Batch Effects Quality Control

Sequencing and microarray samples are often collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment is needed and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect correction approaches to the data, so the user can quickly compare the benefits of each method.

BatchQC is developed as a Shiny App. The output is organized into multiple tabs, and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.

The package includes:

  1. Summary and Sample Diagnostics
  2. Differential Expression Plots and Analysis using LIMMA
  3. Principal Component Analysis and plots to check batch effects
  4. Heatmap plot of gene expressions
  5. Median Correlation Plot
  6. Circular Dendrogram clustered and colored by batch and condition
  7. Shape Analysis for the distribution curve based on HTShape package
  8. Batch Adjustment using ComBat
  9. Surrogate Variable Analysis using sva package
  10. Function to generate simulated RNA-Seq data

batchQC is the pipeline function that generates the BatchQC report and launches the Shiny App in interactive mode. It combines all the functions into one step.

Installation

To begin, install BatchQC from Bioconductor. The following commands install BatchQC and all of its dependencies except pandoc, which you have to install manually as described below.

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("BatchQC")

Install pandoc by following the instructions at the following URL: http://pandoc.org/installing.html

RStudio also provides pandoc binaries for Windows, Linux and Mac at the following location: https://s3.amazonaws.com/rstudio-buildtools/pandoc-1.13.1.zip

If all went well you should now be able to load BatchQC:

require(BatchQC)
browseVignettes("BatchQC")
vignette('BatchQCIntro', package='BatchQC')
vignette('BatchQC_examples', package='BatchQC')
vignette('BatchQC_usage_advanced', package='BatchQC')

If you want to install the latest development version of BatchQC from GitHub, use devtools to install it as follows:

require(devtools)
install_github("mani2012/BatchQC", build_vignettes=TRUE)

If you want to manually install the BatchQC dependencies, run the following:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install(c('MCMCpack', 'limma', 'preprocessCore', 'sva', 'devtools', 'corpcor', 
'matrixStats', 'shiny', 'ggvis', 'd3heatmap', 'reshape2', 'moments', 
'rmarkdown', 'pander', 'gplots'))
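
Before or after running the manual install, it can help to see which of these packages R cannot find. The following is a minimal sketch in base R; `missing_packages()` is a hypothetical helper written for this note and is not part of BatchQC or BiocManager.

```r
# Package names copied from the manual-install command above.
deps <- c("MCMCpack", "limma", "preprocessCore", "sva", "devtools", "corpcor",
          "matrixStats", "shiny", "ggvis", "d3heatmap", "reshape2", "moments",
          "rmarkdown", "pander", "gplots")

# missing_packages() is a hypothetical helper, not part of BatchQC.
# It returns the subset of `pkgs` that cannot be found in any library path.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  unname(pkgs[!found])
}

print(missing_packages(deps))
```

An empty result means every listed package is visible to the current R session; any names printed are the ones still to install.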

Troubleshooting with Installation

Please make sure you have installed pandoc by following the instructions from http://pandoc.org/installing.html. Otherwise, you may get an error such as the following:

* creating vignettes ... ERROR
Warning in engine$weave(file, quiet = quiet, encoding = enc) :
  Pandoc (>= 1.12.3) and/or pandoc-citeproc is not available. Please install 
  both.
Error: processing vignette 'BatchQCIntro.Rmd' failed with diagnostics:
It seems you should call rmarkdown::render() instead of knitr::knit2html() 
because BatchQCIntro.Rmd appears to be an R Markdown v2 document.
Execution halted
Error: Command failed (1)
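
To confirm whether R can actually see a pandoc installation, a quick check along these lines may help. This is a sketch, not part of BatchQC: it uses base R's `Sys.which()` and, only if the rmarkdown package is installed, `rmarkdown::pandoc_available()` with the minimum version named in the error message above.

```r
# Path to a pandoc executable on the PATH, or "" if none is found.
pandoc_path <- unname(Sys.which("pandoc"))
if (nzchar(pandoc_path)) {
  message("pandoc found at: ", pandoc_path)
} else {
  message("pandoc not found on PATH")
}

# If rmarkdown is installed, ask it directly whether a pandoc of at least
# the version named in the error message (1.12.3) is available.
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  pandoc_ok <- rmarkdown::pandoc_available("1.12.3")
  message("rmarkdown sees pandoc >= 1.12.3: ", pandoc_ok)
}
```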

To generate PDF vignettes on Linux, you need to install texlive and lmodern as follows:

sudo apt-get install texlive
sudo apt-get install lmodern

If you do not have permission to install into the default R library location, you may have to set up a local library directory. You may also want to load R version 3.3.0 or higher.

export R_LIBS="/my_own_local_directory/R_libs"
module load R/R-3.3.0

Then install packages into that directory, for example:

install.packages("devtools", repos="http://cran.r-project.org", 
lib="/my_own_local_directory/R_libs")
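
The same setup can also be done from inside an R session instead of through the R_LIBS environment variable. The sketch below uses only base R; `use_local_lib()` is a hypothetical helper, and the temporary directory stands in for your own location.

```r
# use_local_lib() is a hypothetical helper: it creates `path` if needed and
# puts it first on the library search path for the current session.
use_local_lib <- function(path) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(path, .libPaths()))
  invisible(normalizePath(path))
}

# Demonstrated with a temporary directory; substitute your own location,
# e.g. "/my_own_local_directory/R_libs".
lib_dir <- use_local_lib(file.path(tempdir(), "R_libs"))
print(.libPaths()[1])
```

Once the directory is first in `.libPaths()`, `install.packages()` and `BiocManager::install()` install into it by default, so the `lib=` argument is no longer needed for that session.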

If you get the error "X11 font -adobe-helvetica-%s-%s---%d-------*, face 1 at size 6 could not be loaded" in your browser when you view some plots, the cause may be a missing font setup in the OS. This issue has been discussed at the following link: http://goo.gl/ukQXMI

You can list the installed fonts by running the command ‘xlsfonts’ from bash. Installing (or re-installing) xorg-x11-fonts for 75 and 100 dpi, as described in the link above, may resolve the issue.

batchqc's People

Contributors

claireruberman, dtenenba, hcorrada, hpages, link-ny, mani2012, nturaga, selbyh, tfaits, vobencha, wevanjohnson, zhangyuqing


batchqc's Issues

Idle r shiny

I have quite a big matrix, around 200 × 60000. After I run ComBat and SVA, the app gets stuck, as shown in the attached screenshot. Any suggestions or comments?
[Attached screenshot: Screen Shot 2020-04-09 at 5 52 33 PM]

Error running analyses in the shiny app

Hi,

This looks like a really nice package, but I'm having some trouble even running the examples in the interactive mode: https://bioconductor.org/packages/release/bioc/vignettes/BatchQC/inst/doc/BatchQCIntro.html. Running the first example (Simulate data and Apply BatchQC), the app opens, but then if I try to apply ComBat or SVA, it fails to show the output, with the error shown in the shiny window: "Error: argument "p" is missing, with no default".

The RStudio console reads as follows:

Quitting from lines 105-115 (batchqc_report.Rmd)

Warning: Error in graphics::layout: argument "p" is missing, with no default
163: graphics::layout
162: layout.matrix
142: eventReactiveHandler [/usr/local/lib/R/4.0/site-library/BatchQC/shiny/BatchQC/server.R#1013]
98: combatOutText
97: renderText [/usr/local/lib/R/4.0/site-library/BatchQC/shiny/BatchQC/server.R#1021]
96: func
83: origRenderFunc
82: output$combatOutText
2: shiny::runApp
1: batchQC

I realize this may be something about my installation/setup but I have no idea how to fix it and I'd really like to use the app. Any advice is appreciated. Thanks!

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin18.7.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /usr/local/Cellar/openblas/0.3.10_1/lib/libopenblasp-r0.3.10.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] sva_3.36.0 BiocParallel_1.22.0 genefilter_1.70.0 mgcv_1.8-33
[5] nlme_3.1-149 Matrix_1.2-18 limma_3.44.3 reshape2_1.4.4
[9] heatmaply_1.1.1 viridis_0.5.1 viridisLite_0.3.0 plotly_4.9.2.1
[13] ggplot2_3.3.2 ggvis_0.4.6 pander_0.6.3 shiny_1.5.0
[17] BatchQC_1.16.2

loaded via a namespace (and not attached):
[1] BiocFileCache_1.12.1 plyr_1.8.6
[3] lazyeval_0.2.2 splines_4.0.2
[5] GenomeInfoDb_1.24.2 digest_0.6.25
[7] foreach_1.5.0 htmltools_0.5.0
[9] gdata_2.18.0 magrittr_1.5
[11] memoise_1.1.0 Biostrings_2.56.0
[13] annotate_1.66.0 matrixStats_0.56.0
[15] MCMCpack_1.4-9 askpass_1.1
[17] prettyunits_1.1.1 colorspace_1.4-1
[19] blob_1.2.1 rappdirs_0.3.1
[21] xfun_0.17 dplyr_1.0.2
[23] crayon_1.3.4 RCurl_1.98-1.2
[25] jsonlite_1.7.1 survival_3.2-3
[27] iterators_1.0.12 glue_1.4.2
[29] registry_0.5-1 gtable_0.3.0
[31] zlibbioc_1.34.0 XVector_0.28.0
[33] webshot_0.5.2 MatrixModels_0.4-1
[35] DelayedArray_0.14.1 BiocGenerics_0.34.0
[37] SparseM_1.78 scales_1.1.1
[39] DBI_1.1.0 edgeR_3.30.3
[41] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2 Rcpp_1.0.5
[43] xtable_1.8-4 progress_1.2.2
[45] bit_4.0.4 deSolve_1.28
[47] stats4_4.0.2 htmlwidgets_1.5.1
[49] httr_1.4.2 gplots_3.1.0
[51] RColorBrewer_1.1-2 ellipsis_0.3.1
[53] pkgconfig_2.0.3 XML_3.99-0.5
[55] dbplyr_1.4.4 locfit_1.5-9.4
[57] tidyselect_1.1.0 rlang_0.4.7
[59] later_1.1.0.1 AnnotationDbi_1.50.3
[61] munsell_0.5.0 tools_4.0.2
[63] generics_0.0.2 moments_0.14
[65] RSQLite_2.2.0 evaluate_0.14
[67] stringr_1.4.0 fastmap_1.0.1
[69] yaml_2.2.1 mcmc_0.9-7
[71] knitr_1.29 bit64_4.0.2
[73] caTools_1.18.0 plgem_1.60.0
[75] purrr_0.3.4 dendextend_1.14.0
[77] rootSolve_1.8.2.1 mime_0.9
[79] quantreg_5.73 biomaRt_2.44.1
[81] compiler_4.0.2 rstudioapi_0.11
[83] curl_4.3 tibble_3.0.3
[85] geneplotter_1.66.0 stringi_1.4.6
[87] GenomicFeatures_1.40.1 lattice_0.20-41
[89] vctrs_0.3.3 pillar_1.4.6
[91] lifecycle_0.2.0 data.table_1.13.0
[93] bitops_1.0-6 conquer_1.0.2
[95] corpcor_1.6.9 seriation_1.2-9
[97] httpuv_1.5.4 rtracklayer_1.48.0
[99] GenomicRanges_1.40.0 R6_2.4.1
[101] promises_1.1.1 TSP_1.1-10
[103] KernSmooth_2.23-17 gridExtra_2.3
[105] IRanges_2.22.2 codetools_0.2-16
[107] MASS_7.3-52 gtools_3.8.2
[109] assertthat_0.2.1 SummarizedExperiment_1.18.2
[111] openssl_1.4.2 DESeq2_1.28.1
[113] withr_2.2.0 GenomicAlignments_1.24.0
[115] Rsamtools_2.4.0 S4Vectors_0.26.1
[117] GenomeInfoDbData_1.2.3 hms_0.5.3
[119] grid_4.0.2 tidyr_1.1.2
[121] coda_0.19-4 rmarkdown_2.3
[123] pROC_1.16.2 Biobase_2.48.0

data is always log2cpm transformed in correlation.R

From looking at the source code, I am under the impression that the data are always transformed to log2 CPM when the correlation analyses in correlation.R are performed, even when log2cpm_transform = FALSE.

I was using BatchQC on microarray data, and my data are already log transformed. Though subtle, the additional log transformation does affect the median pairwise correlations.

Using batchqc_corscatter (matches the output from batchQC(..., log2cpm_transform = FALSE)):
[image]

Modified batchqc_corscatter that does not apply the log2 CPM transform:
[image]
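
The effect described in this report can be illustrated on made-up data. The sketch below is an assumption-laden toy example, not BatchQC's code: it uses a plain `log2(x + 1)` as a stand-in for a second log transform (BatchQC's own log2 CPM transform differs), simulated intensities, and a hypothetical helper `median_pairwise_cor()`.

```r
set.seed(1)
# Simulated log2-scale intensities: 200 features x 6 samples (made-up data).
feature_mean <- runif(200, min = 4, max = 14)
expr_log2 <- sapply(1:6, function(i) feature_mean + rnorm(200, sd = 1))

# Median of the pairwise sample-sample Pearson correlations.
median_pairwise_cor <- function(mat) {
  cors <- cor(mat)
  median(cors[upper.tri(cors)])
}

# Stand-in for a second log transform; BatchQC's own log2 CPM code differs.
expr_log_twice <- log2(expr_log2 + 1)

print(median_pairwise_cor(expr_log2))
print(median_pairwise_cor(expr_log_twice))
```

Because the logarithm is nonlinear, Pearson correlations computed before and after the extra transform are generally not equal, which is consistent with the subtle shift reported above.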

Error in quantile.default(med_cor, p = 0.25)--missing values and NaNs not allowed...

Hi,

I installed BatchQC from Bioconductor (today) and ran 2 of the examples, which worked very well.

I'm trying with my own data, and got this error (before GUI activated):
Quitting from lines 178-179 (batchqc_report.Rmd)
Error in quantile.default(med_cor, p = 0.25) :
missing values and NaN's not allowed if 'na.rm' is FALSE

No NAs or NaNs in the expression data, batch or condition objects...
condition & batch vectors are type character.
Expression data has 189 samples & ~ 16K features (features z-scored)

table(condition,batch,useNA='always')
         batch
condition 2014 2016 2017 2018 2020 <NA>
  25minus    5    1    7    9    6    0
  26_31      0    4    8    4    1    0
  32_37      6    1   17   54   22    0
  38plus     5    1   10   19    9    0
  <NA>       0    0    0    0    0    0

Any thoughts on getting past this error?

Thanks much!
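
One way an NA can reach `quantile()` even when the input has no missing values is a sample (column) with zero variance, because `cor()` returns NA for any pair that involves it. Whether that is what happened with the data in this report is not known; the sketch below only reproduces the mechanism on made-up data using base R.

```r
set.seed(42)
# Made-up matrix: 100 features x 5 samples; sample 3 is constant.
expr <- matrix(rnorm(500), nrow = 100, ncol = 5)
expr[, 3] <- 0

# cor() returns NA for any pair involving a zero-variance column
# (R also emits a warning, suppressed here).
cors <- suppressWarnings(cor(expr))

# Median correlation of each sample with all other samples.
med_cor <- sapply(seq_len(ncol(cors)), function(j) median(cors[-j, j]))

# quantile() refuses NA input unless na.rm = TRUE; capture the error.
res <- tryCatch(quantile(med_cor, probs = 0.25), error = function(e) e)
print(conditionMessage(res))

# Diagnostic: which samples have zero standard deviation?
zero_var <- apply(expr, 2, sd) == 0
print(which(zero_var))
```

Running the same `apply(..., 2, sd)` check on the real expression matrix, and `apply(..., 1, sd)` across features, shows whether any constant column or row is present.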

limma list in batchqc_report.html defaults to ComBat corrected version

When I analyze the bladderdata sample code (below) the limma data are the same in the uncorrected (batchqc_report.html) and the ComBat corrected (combat_batchqc_report.html) reports, and both match the ComBat corrected limma data displayed in the Shiny app, as can be seen in these screenshots (https://drive.google.com/file/d/1kjTFfem_pXqLneKxldAEsfylEtR_v2_b/view?usp=sharing).

Is there a way to fix this?

Thanks
Josh

R code used:

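library(BatchQC)
library(bladderbatch)  # provides the bladderdata dataset (object bladderEset)
data(bladderdata)

pheno <- pData(bladderEset)   # sample annotations
edata <- exprs(bladderEset)   # expression matrix (features x samples)
batch <- pheno$batch
condition <- pheno$cancer
batchQC(
  edata,
  batch = batch,
  condition = condition,
  report_file = "batchqc_report.html",
  report_dir = ".",
  report_option_binary = "111111111",
  view_report = FALSE,
  interactive = TRUE
)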

Download corrected data

Hi,

Thanks for sharing this wonderful package. I was wondering if I could download the combat corrected expression matrix from the shiny page.

Thanks!
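
Independently of the Shiny page, a batch-adjusted matrix can be computed and saved from the R console with the sva package, which BatchQC lists among its dependencies. The sketch below is an assumption, not BatchQC's own export path: `export_combat()` is a hypothetical helper, and its result may not match exactly what the app computes with the options selected there.

```r
# export_combat() is a hypothetical helper, not a BatchQC function.
# edata: numeric matrix, features x samples.
# batch, condition: vectors with one value per sample.
export_combat <- function(edata, batch, condition, file) {
  if (!requireNamespace("sva", quietly = TRUE)) {
    stop("the 'sva' package is required")
  }
  # Keep the condition effect in the model while adjusting for batch.
  mod <- model.matrix(~ as.factor(condition))
  adjusted <- sva::ComBat(dat = edata, batch = batch, mod = mod)
  write.csv(adjusted, file = file)
  invisible(adjusted)
}

# Example call (file name is illustrative):
# adjusted <- export_combat(edata, batch, condition, "combat_adjusted.csv")
```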

Error in if (spvaltext2 == 0) { : missing value where TRUE/FALSE needed

Hi,

I have been trying to make BatchQC work for the past two days to no avail. I keep getting the below error:

  Quitting from lines 256-274 (batchqc_report.Rmd) 
  Error in if (spvaltext2 == 0) { : missing value where TRUE/FALSE needed

On a closer look, the problem seems to appear in lines 263-264:

pval <- batchQC_shapeVariation(lcounts_adj, batch, plot = TRUE, groupCol = 
     rainbow(nlevels(bf))[bf])

Inside the batchQC_shapeVariation function, I tried to narrow down the problem to see where it occurs. My findings were that in line 34 (batch_ps <- batchEffectPvalue(gnormdata, sortgroups, robust=robustGene)) the function batchEffectPvalue returns the below:

 batch_ps      Named num [1:4] 0 0 NaN NaN

These two NaNs are producing the problem since the NaN in the if (spvaltext2 == 0) { cannot give TRUE or FALSE.

Inside the function batchEffectPvalue everything seems to run smoothly until we reach the

  skewbatch <- unlist(lapply(1:length(batch2), function(x) apply(data[,batch2[[x]]], 1, skewness)))
  kurtbatch <- unlist(lapply(1:length(batch2), function(x) apply(data[,batch2[[x]]], 1, kurtosis)))

By having a look at the skewbatch and kurtbatch objects I saw that there are some NaNs present. I believe that this is causing the problem downstream.

Now, I don't know whether this is a problem of skewness and kurtosis functions or is a problem with my data. I tried both raw counts and quantile normalized read counts (as suggested by you). I also filtered the quantile normalized counts for low standard deviation and made sure that none of my batches contain genes with only zeroes (as suggested in your website). I don't know what else I can do. I even thought of adding a 0 in the report_option_binary option to skip the creation of this graph but I am not sure about that since I might be leaving out an important part of the batchQC analysis.

Can you please help me?

Best regards,
Lefteris

PhD candidate, Newcastle University, UK
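
A moment-based skewness is a ratio whose denominator is zero when all values are identical, so any gene that is constant within one batch (not only all-zero genes) yields NaN. The sketch below shows this in base R; `skewness_base()` writes out the usual moment formula to avoid a dependency, and `has_flat_batch()` is a hypothetical helper for flagging such genes. Whether this is the cause for the data in this report is an assumption.

```r
# Moment-based sample skewness, written out in base R for illustration.
skewness_base <- function(x) {
  d <- x - mean(x)
  (sum(d^3) / length(x)) / (sum(d^2) / length(x))^(3 / 2)
}

print(skewness_base(c(7, 7, 7, 7)))   # constant values give 0/0

# has_flat_batch() is a hypothetical helper: TRUE for each gene (row) that is
# constant within at least one batch.
has_flat_batch <- function(data, batch) {
  flat_by_batch <- sapply(split(seq_along(batch), batch), function(cols) {
    apply(data[, cols, drop = FALSE], 1, function(v) length(unique(v)) == 1)
  })
  apply(matrix(flat_by_batch, nrow = nrow(data)), 1, any)
}

# Made-up counts: 3 genes x 6 samples, two batches of three samples each.
counts <- rbind(g1 = c(5, 8, 2, 9, 4, 7),
                g2 = c(3, 3, 3, 6, 1, 8),   # constant within batch "a"
                g3 = c(2, 6, 4, 1, 9, 5))
batch <- rep(c("a", "b"), each = 3)
print(has_flat_batch(counts, batch))
```

Rows flagged TRUE could be dropped before the shape analysis, for example with `counts[!has_flat_batch(counts, batch), ]`.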

Running errors

Hi,

I tried to run the first example from the "BatchQC Examples" vignette:

library(BatchQC)
nbatch <- 3
ncond <- 2
npercond <- 10
data.matrix <- rnaseq_sim(ngenes=50, nbatch=nbatch, ncond=ncond, npercond=
    npercond, basemean=10000, ggstep=50, bbstep=2000, ccstep=800, 
    basedisp=100, bdispstep=-10, swvar=1000, seed=1234)
batch <- rep(1:nbatch, each=ncond*npercond)
condition <- rep(rep(1:ncond, each=npercond), nbatch)
batchQC(data.matrix, batch=batch, condition=condition, 
        report_file="batchqc_report.html", report_dir=".", 
        report_option_binary="111111111",
        view_report=FALSE, interactive=TRUE, batchqc_output=TRUE)

I got the following error:

Error in file(con, "w") : cannot open the connection
In addition: Warning message:
In file(con, "w") :
  cannot open file 'batchqc_report.knit.md': Permission denied

My running environment is:

R version 3.6.1 (2019-07-05)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 31 (Thirty One)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] BatchQC_1.13.1

loaded via a namespace (and not attached):
 [1] Biobase_2.44.0       splines_3.6.1       
 [3] bit64_0.9-7          gtools_3.8.1        
 [5] shiny_1.4.0          moments_0.14        
 [7] assertthat_0.2.1     stats4_3.6.1        
 [9] pander_0.6.3         blob_1.2.0          
[11] yaml_2.2.0           backports_1.1.5     
[13] pillar_1.4.2         RSQLite_2.1.2       
[15] lattice_0.20-38      quantreg_5.51       
[17] glue_1.3.1           limma_3.40.6        
[19] digest_0.6.22        promises_1.1.0      
[21] htmltools_0.4.0      httpuv_1.5.2        
[23] Matrix_1.2-17        plyr_1.8.4          
[25] XML_3.98-1.20        pkgconfig_2.0.3     
[27] SparseM_1.77         genefilter_1.66.0   
[29] purrr_0.3.3          xtable_1.8-4        
[31] corpcor_1.6.9        gdata_2.18.0        
[33] later_1.0.0          BiocParallel_1.18.1 
[35] MatrixModels_0.4-1   tibble_2.1.3        
[37] annotate_1.62.0      mgcv_1.8-30         
[39] IRanges_2.18.3       ggvis_0.4.5         
[41] BiocGenerics_0.30.0  survival_3.1-6      
[43] magrittr_1.5         crayon_1.3.4        
[45] mime_0.7             memoise_1.1.0       
[47] mcmc_0.9-6           evaluate_0.14       
[49] nlme_3.1-141         MASS_7.3-51.4       
[51] gplots_3.0.1.1       tools_3.6.1         
[53] matrixStats_0.55.0   stringr_1.4.0       
[55] MCMCpack_1.4-4       S4Vectors_0.22.1    
[57] AnnotationDbi_1.46.1 compiler_3.6.1      
[59] caTools_1.17.1.2     rlang_0.4.1         
[61] grid_3.6.1           RCurl_1.95-4.12     
[63] rstudioapi_0.10      htmlwidgets_1.5.1   
[65] bitops_1.0-6         base64enc_0.1-3     
[67] rmarkdown_1.16       d3heatmap_0.6.1.2   
[69] DBI_1.0.0            reshape2_1.4.3      
[71] R6_2.4.0             knitr_1.25          
[73] dplyr_0.8.3          fastmap_1.0.1       
[75] bit_1.1-14           zeallot_0.1.0       
[77] KernSmooth_2.23-16   stringi_1.4.3       
[79] parallel_3.6.1       sva_3.32.1          
[81] Rcpp_1.0.2           vctrs_0.2.0         
[83] png_0.1-7            tidyselect_0.2.5    
[85] xfun_0.10            coda_0.19-3  
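
The error message does not say which directory is not writable. Candidates include the working directory (used as report_dir here) and the installed package directory, which is often read-only for a system-wide R installation. The sketch below checks them with base R; `can_write_report()` is a hypothetical helper, and which directory the intermediate file is actually written to is an assumption to verify.

```r
# can_write_report() is a hypothetical helper: TRUE if `dir` exists and the
# current user may create files in it (file.access mode 2 = write).
can_write_report <- function(dir = ".") {
  dir.exists(dir) && unname(file.access(dir, mode = 2) == 0)
}

# Candidate locations; the package directory is only checked if BatchQC
# is installed.
candidates <- c(working_dir = getwd(),
                package_dir = system.file(package = "BatchQC"))
candidates <- candidates[nzchar(candidates)]
print(vapply(candidates, can_write_report, logical(1)))

# A directory that is known to be writable, usable as report_dir.
out_dir <- file.path(tempdir(), "batchqc_out")
dir.create(out_dir, showWarnings = FALSE)
print(can_write_report(out_dir))
```

If the package directory turns out to be the read-only one, installing BatchQC into a user-writable library is one possible workaround.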

PCA plot tooltips show incorrect values with ComBat and SVA

Hi,

I noticed the tooltips on PCA plot show incorrect values for chosen PCs when ComBat or SVA adjusted data is selected:
(EDIT: I just realized that the tooltip shows the PCs from None adjusted data even when ComBat or SVA is selected.)

image

This was generated with the example data, but I get the same issue with my own data.

library(BatchQC)
data(example_batchqc_data)
batch <- batch_indicator$V1
condition <- batch_indicator$V2
batchQC(signature_data, batch=batch, condition=condition)

I'm using BatchQC_1.8.0 and can send a full session info if you can't replicate it.

(Just a personal preference: a tooltip that shows "condition" instead of "PCs" might be more helpful when there are numerous conditions.)

Also, thanks a lot for this tool.

Unmet Dependencies

I just upgraded R, and I'm reinstalling a bunch of packages. The install is R 3.3.1

I installed BatchQC via devtools, which should have installed all dependencies, but a bunch were missing that I had to install by hand:

  • mcmc
  • coda
  • SparseM
  • MatrixModels

I'm not totally up on R package specification, but it seems like either the configuration is missing an option to install dependencies of dependencies, or else these dependencies should be included explicitly.

@wevanjohnson It might be a good idea to set up a virtual machine with a build server that tries to install BatchQC (and other CBM packages?) every time a commit is pushed to Github. That way we can ensure packages are easy to install for anyone, regardless of environment.

Error in if (var(data.matrix[i, cond2[[j]]]) == 0) { : missing value where TRUE/FALSE needed

Hello, thanks for the great package. I successfully went through the demo but am having some trouble with my data.

batchQC(counts, batch = batch, condition = cond, interactive = TRUE)
Error in if (var(data.matrix[i, cond2[[j]]]) == 0) { :
missing value where TRUE/FALSE needed

One clue is that many of my conditions have only one replicate. Does this only work if there are multiple reps per condition (for all conditions)?
Thanks so much.
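
The error is consistent with the single-replicate clue: in R, the sample variance of a single observation is NA, and `if (NA == 0)` fails with exactly this message. The sketch below shows that behavior and a way to list conditions with fewer than two samples, using made-up labels and base R only.

```r
print(var(5))   # a single observation has no sample variance

# Made-up labels: condition "C" has a single sample.
condition <- c("A", "A", "B", "B", "B", "C")
reps <- table(condition)
singletons <- names(reps)[reps < 2]
print(singletons)
```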
