stephaniehicks / methylCC

R/BioC package to estimate the cell composition of whole blood in DNA methylation samples in microarray or sequencing platforms

methylcc's Introduction

methylCC

Citation

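Hicks SC, Irizarry RA. (2019). methylCC: technology-independent estimation of cell type composition using differentially methylated regions. Genome Biology 20, 261. https://doi.org/10.1186/s13059-019-1827-8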

Why use methylCC?

This is a package to estimate the cell composition of whole blood in DNA methylation measured on any platform technology (e.g. Illumina 450K microarray, whole genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS)).

For help with the methylCC R-package, there is a vignette available in the /vignettes folder.

Installing methylCC

The latest development version of the methylCC R package can be installed from GitHub using the R package devtools:

library(devtools)
install_github("stephaniehicks/methylCC")

It can also be installed using Bioconductor:

# install BiocManager from CRAN (if not already installed)
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

# install methylCC package
BiocManager::install("methylCC")

After installation, the package can be loaded into R.

library(methylCC)
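
A minimal usage sketch, built around the `estimatecc()` call that appears in the issue reports further down this page. The wrapper name `estimate_blood_composition` is hypothetical, and `cell_counts()` is assumed here to be the accessor for the estimated proportions; the vignette remains the authoritative reference.

```r
# Hypothetical convenience wrapper (not part of methylCC itself).
# `object` is assumed to hold whole-blood data in a class that estimatecc()
# accepts, e.g. an RGChannelSet (450K array) or a BSseq object (WGBS/RRBS).
estimate_blood_composition <- function(object, seed = 12345) {
  if (!requireNamespace("methylCC", quietly = TRUE)) {
    stop("methylCC is not installed; see the installation steps above.")
  }
  set.seed(seed)  # seed value taken from an issue report on this page
  est <- methylCC::estimatecc(object = object)
  methylCC::cell_counts(est)  # assumed accessor: samples x cell types
}
```

Calling `estimate_blood_composition(my_object)` would then return one row of estimated proportions per sample, assuming `my_object` is a supported whole-blood dataset.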

Bug reports

Report bugs as issues on the GitHub repository.

Authors

methylcc's Issues

Error in `colnames<-`(`*tmp*`, value = cell_levels) : attempt to set 'colnames' on an object with less than two dimensions

Hi,
I ran into an issue with methylCC::estimatecc() when running it on my BSseq object:

est <- estimatecc(BSseq_obj, include_dmrs = FALSE)
Error in `colnames<-`(`*tmp*`, value = cell_levels) :
attempt to set 'colnames' on an object with less than two dimensions

Any comments or suggestions are truly appreciated!
Lastly, once we have the cell compositions, can methylCC apply a cell-type-specific adjustment to the original WGBS data and return the adjusted WGBS data?
Thanks very much in advance!
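
For context, this error message comes from base R and is raised whenever `colnames<-` is applied to a plain vector. One common way to end up with a vector is subsetting a matrix down to a single row or column without `drop = FALSE`. The sketch below reproduces that mechanism in isolation; whether this is what happens inside `estimatecc()` (for example with a single-sample BSseq object) is an assumption, not something confirmed in this thread.

```r
# Toy 1 x 6 matrix standing in for "one sample, six cell types".
cell_levels <- c("Gran", "CD4T", "CD8T", "Bcell", "Mono", "NK")
m <- matrix(seq_len(6), nrow = 1)

# Default subsetting drops the matrix to a plain vector ...
dropped <- m[1, ]

# ... and assigning colnames to a vector raises the error from the issue.
err <- tryCatch({
  colnames(dropped) <- cell_levels
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# Keeping both dimensions avoids the problem.
kept <- m[1, , drop = FALSE]
colnames(kept) <- cell_levels
print(kept)
```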

methylCC for WGBS data of tissue samples

Hi, thanks for your excellent work :)
This package is designed to estimate the cell composition of whole blood. Can I apply methylCC to estimate immune cell fractions from WGBS data of tumor tissues?
Thank you so much!

s/compote/methylCC/g;

Vignette was confusing for a moment ("where is this compote you speak of?") but then it clicked...

How come objects must be eSet-derived instead of SummarizedExperiment-derived? Much larger datasets can be handled with the latter (e.g. BSseq objects subclass SE, if memory serves, and Peter Hickey is merrily bolting on out-of-core storage for them).

inconsistent results

Hi,
I ran the same analysis on two datasets, using the same seed. One dataset is a superset of the other, yet the estimates for the shared samples differ, with a correlation of only 15%. Does that mean the estimates depend on the sample distribution or on confounders in the dataset?
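
When comparing two runs like this, it helps to align the estimates by sample ID first, since the row order of a subset and its superset need not match. The sketch below shows one way to do that in base R. All numbers and object names (`est_small`, `est_large`) are made up for illustration and are not taken from this issue.

```r
# Made-up estimates: rows are samples, columns are cell types.
est_small <- matrix(c(0.60, 0.25, 0.15,
                      0.50, 0.30, 0.20,
                      0.70, 0.20, 0.10),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("s1", "s2", "s3"),
                                    c("Gran", "CD4T", "Bcell")))
est_large <- matrix(c(0.55, 0.25, 0.20,
                      0.52, 0.29, 0.19,
                      0.61, 0.24, 0.15,
                      0.65, 0.20, 0.15,
                      0.68, 0.21, 0.11),
                    nrow = 5, byrow = TRUE,
                    dimnames = list(c("s4", "s2", "s1", "s5", "s3"),
                                    c("Gran", "CD4T", "Bcell")))

# Align both runs on the shared samples and on the same column order.
shared <- intersect(rownames(est_small), rownames(est_large))
a <- est_small[shared, , drop = FALSE]
b <- est_large[shared, colnames(a), drop = FALSE]

# Agreement per cell type, plus the largest absolute difference overall.
per_type_cor <- vapply(colnames(a), function(ct) cor(a[, ct], b[, ct]),
                       numeric(1))
max_abs_diff <- max(abs(a - b))
print(per_type_cor)
print(max_abs_diff)
```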

Cannot successfully run the script of test-estimatecc.R

Dear Dr. Hicks,
I have installed methylCC version 1.2.0 under R 4.0.0. However, I cannot successfully run test-estimatecc.R at
https://github.com/stephaniehicks/methylCC/blob/master/tests/testthat/test-estimatecc.R
The issue here is the following:

FlowSorted.Blood.450k.sub
class: RGChannelSet
dim: 150000 6
metadata(0):
assays(2): Green Red
rownames(150000): 58715388 43754481 ... 60676484 63741342
rowData names(0):
colnames(6): WB_105 WB_218 ... WB_160 WB_149
colData names(8): Sample_Name Slide ... CellType Sex
Annotation
array: IlluminaHumanMethylation450k
annotation: ilmn12.hg19
set.seed(12345)
est <- estimatecc(object = FlowSorted.Blood.450k.sub)
Loading required package: IlluminaHumanMethylation450kmanifest

*** caught segfault ***
address (nil), cause 'unknown'

Traceback:
1: colMeans2(Green, rows = match(CG.controls, rownames(Green)))
2: colMeans2(Green, rows = match(CG.controls, rownames(Green)))
3: normalize.illumina.control(rgSet, reference = reference)
4: preprocessIllumina(FlowSorted.Blood.450k)
5: .find_dmrs(verbose = verbose, include_cpgs = include_cpgs, include_dmrs = include_dmrs)
6: withCallingHandlers(expr, warning = function(w) if (inherits(w, classes)) tryInvokeRestart("muffleWarning"))
7: suppressWarnings(.find_dmrs(verbose = verbose, include_cpgs = include_cpgs, include_dmrs = include_dmrs))
8: estimatecc(object = FlowSorted.Blood.450k.sub)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

What could be the reason for “segfault”?
I look forward to your reply.

Best regards,
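
When reporting a crash like this, it is usually helpful to attach the R version and the versions of the packages that appear in the traceback. The sketch below collects them without loading any package. The package list is a guess based on the traceback above (in particular, `colMeans2` is assumed to come from matrixStats), so adjust it as needed.

```r
# Packages that appear, directly or indirectly, in the traceback (assumed list).
pkgs <- c("methylCC", "minfi", "matrixStats", "FlowSorted.Blood.450k")

# system.file() returns "" for packages that are not installed.
is_installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                       logical(1))

versions <- vapply(pkgs[is_installed],
                   function(p) as.character(utils::packageVersion(p)),
                   character(1))

report <- list(r_version = R.version.string,
               installed = versions,
               missing   = pkgs[!is_installed])
print(report)
```

Including the output of `sessionInfo()` alongside this report gives maintainers the full picture of the environment.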

Recommendations for low coverage (5x) WGBS data?

I'm encountering the following error with methylCC::estimatecc() for a large number of samples (86) from a low coverage (~5X) WGBS dataset and was wondering if you had any recommendations.

I've tried two strategies with the following call so far:

CC <- bs.filtered.bsseq %>%
   methylCC::estimatecc(include_cpgs = TRUE,
                        include_dmrs = TRUE)

The first strategy I tried is to filter for 1 read in every sample, which gives 3.9 million CpGs but that doesn't seem to be enough sites (I also get the same error when trying with an analysis specific subset with 5.5 million CpGs):

Loading required package: IlluminaHumanMethylation450kanno.ilmn12.hg19
[estimatecc] Searching for Gran cell type-specific
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 13875 bumps.
[estimatecc] Found 100 Gran cell type-specific regions and CpGs.
[estimatecc] Searching for CD4T cell type-specific
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 8882 bumps.
[estimatecc] Found 100 CD4T cell type-specific regions and CpGs.
[estimatecc] Searching for CD8T cell type-specific
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 9497 bumps.
[estimatecc] Found 100 CD8T cell type-specific regions and CpGs.
[estimatecc] Searching for Bcell cell type-specific
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 9142 bumps.
[estimatecc] Found 100 Bcell cell type-specific regions and CpGs.
[estimatecc] Searching for Mono cell type-specific
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 12948 bumps.
[estimatecc] Found 100 Mono cell type-specific regions and CpGs.
[estimatecc] Searching for NK cell type-specific
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 6178 bumps.
[estimatecc] Found 100 NK cell type-specific regions and CpGs.
[estimatecc] Extracting BSSeq data.
[estimatecc] BSseq object contained
                15 out of 600 celltype-specific regions.
[estimatecc] Starting parameter estimation using 15 regions.
[estimatecc] There are not a sufficient number of differentially
  methylated regions (DMRs) for cell composition estimation. Try
  including both differntially methylated CpGs and DMRs by modifying
  the estimatecc(include_cpgs=TRUE, include_dmrs=TRUE) function.
Error in methylCC::estimatecc(., include_cpgs = TRUE, include_dmrs = TRUE) :
  Exiting the estimation now.
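
The "at least one read in every sample" filter can be sketched on a plain coverage matrix, alongside a relaxed variant that only requires coverage in a fraction of samples. The toy matrix below is made up; with a real BSseq object the matrix would typically come from `bsseq::getCoverage(bs, type = "Cov")`. The names `min_reads` and `min_fraction` are illustrative.

```r
# Made-up coverage matrix: rows are CpGs, columns are samples.
cov <- matrix(c(3, 1, 2, 4,
                0, 2, 1, 1,
                1, 0, 0, 5,
                0, 0, 0, 1,
                2, 2, 1, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("cpg", 1:5), paste0("s", 1:4)))

min_reads    <- 1     # minimum reads for a CpG to count as covered in a sample
min_fraction <- 0.75  # fraction of samples required by the relaxed filter

covered <- cov >= min_reads

# Strict filter: covered in every sample.
keep_all <- rowSums(covered) == ncol(cov)

# Relaxed filter: covered in at least 75% of samples.
keep_frac <- rowSums(covered) >= ceiling(min_fraction * ncol(cov))

print(names(which(keep_all)))
print(names(which(keep_frac)))
```

The strict filter avoids missing values but discards many sites at low coverage; the relaxed filter keeps more sites at the cost of missing data in some samples.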

The second strategy I tried is to filter for 1 read in 75% of samples (in this case a subset containing 53 samples), which is what I use in the dmrseq::dmrseq() analysis, and it gives 24.6 million CpGs but produces the following error (which I'm guessing may be due to missing data):

[estimatecc] Searching for Gran cell type-specific 
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 13875 bumps.
[estimatecc] Found 100 Gran cell type-specific regions and CpGs.
[estimatecc] Searching for CD4T cell type-specific 
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 8882 bumps.
[estimatecc] Found 100 CD4T cell type-specific regions and CpGs.
[estimatecc] Searching for CD8T cell type-specific 
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 9497 bumps.
[estimatecc] Found 100 CD8T cell type-specific regions and CpGs.
[estimatecc] Searching for Bcell cell type-specific 
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 9142 bumps.
[estimatecc] Found 100 Bcell cell type-specific regions and CpGs.
[estimatecc] Searching for Mono cell type-specific 
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 12948 bumps.
[estimatecc] Found 100 Mono cell type-specific regions and CpGs.
[estimatecc] Searching for NK cell type-specific 
                regions and CpGs.
[bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.8).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 6178 bumps.
[estimatecc] Found 100 NK cell type-specific regions and CpGs.
[estimatecc] Extracting BSSeq data.
[estimatecc] BSseq object contained
                69 out of 600 celltype-specific regions.
[estimatecc] Starting parameter estimation using 69 regions.
Error in solve.QP(Dmat = (t(x0) %*% x0), dvec = t(x0) %*% t(t(ys)), Amat = cbind(rep(1,  : 
  matrix D in quadratic function is not positive definite!
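
For reference, the call shown in the error passes `t(x0) %*% x0` as `Dmat`, and `solve.QP` requires that matrix to be positive definite. A cross-product matrix loses positive definiteness when the columns of the design matrix are linearly dependent. The base R sketch below shows how to check for this on a toy matrix; whether rank deficiency (rather than, say, missing values) is the cause in this issue is not established here.

```r
# Toy design matrices: rows are regions, columns are cell types (made up).
x_ok  <- cbind(a = c(1, 0, 0, 1),
               b = c(0, 1, 0, 1),
               c = c(0, 0, 1, 1))
# Adding a column that is the sum of two others makes the columns dependent.
x_bad <- cbind(x_ok, d = x_ok[, "a"] + x_ok[, "b"])

# Smallest eigenvalue of t(x) %*% x; it must be clearly > 0 for solve.QP.
min_eig <- function(x) {
  min(eigen(crossprod(x), symmetric = TRUE, only.values = TRUE)$values)
}

print(min_eig(x_ok))    # clearly positive
print(min_eig(x_bad))   # numerically zero
print(qr(x_bad)$rank)   # fewer than ncol(x_bad)
```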

BSseq object contained 130 out of 600 celltype-specific regions.

Dear Stephanie,

Thank you so much for creating this package. I'm applying it to RRBS data from blood samples, with an average coverage of around 10-15x. I load my data into a bsseq object to make it compatible with the package and then follow the steps that Ben Lauger recommended in his previous issue (convert to hg18, use one bsseq object per file to minimise coverage issues across samples). However, I only get 130 (or so, depending on the sample) out of 600 regions, so my estimates don't seem to be accurate. A couple of examples are shown below. Do you have any other ideas or recommendations for improving this? The issue seems to be an underestimate of granulocytes in RRBS, as I get very low counts (similar to what you present in your paper when comparing to the Houseman algorithm). It could also simply be a coverage issue: with RRBS I may not get enough coverage in the relevant regions.

My estimated counts look like:

            Gran      CD4T      CD8T     Bcell         Mono NK
test1 0.06322873 0.3599262 0.4635277 0.1133166 8.151731e-07  0

           Gran      CD4T     CD8T     Bcell        Mono         NK
test1 0.1281388 0.3148228 0.347911 0.1606397 0.007710417 0.04077731

          Gran      CD4T CD8T     Bcell         Mono           NK
test1 0.167842 0.6489066    0 0.1832514 3.066225e-14 1.291246e-14

I have also noticed that if I run the same command on the same file more than once, I get slightly different results, and I'm not sure why. Is that to be expected, or does the fairly low number of detected cell-type-specific regions contribute to it?

Thank you so much in advance for your help,
Best wishes, Leo
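
Run-to-run differences like these are typical of procedures that draw random numbers, for example for a random starting point. Whether `estimatecc()` does so is an assumption here, although the test script quoted in an earlier issue on this page calls `set.seed(12345)` before `estimatecc()`. The sketch below uses a stand-in function, not methylCC code, to show how fixing the seed makes repeated runs identical.

```r
# Stand-in for any procedure with a random starting point (NOT methylCC code).
random_start <- function(n_cell_types = 6) {
  w <- runif(n_cell_types)
  w / sum(w)  # normalise so the proportions sum to 1
}

set.seed(12345)
first <- random_start()

set.seed(12345)           # same seed: the same random draws are repeated
second <- random_start()

third <- random_start()   # no reseed: the random stream has moved on

same_with_seed    <- identical(first, second)
same_without_seed <- identical(second, third)
print(c(same_with_seed = same_with_seed,
        same_without_seed = same_without_seed))
```

Applied to the situation above, this would mean calling `set.seed()` with a fixed value immediately before each `estimatecc()` call and checking whether the estimates then agree.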
