storeylab / biobroom

Tidy up computational biology objects

R 100.00%

biobroom's People

Contributors

lgatto dgrtwo aaronwolen gadenbuie ltobalina llrs bioinfomagician birnbera julia-f-s amcdavid

biobroom's Issues

CRAN Check Failure for Upcoming broom Release

Hi there! The broom dev team just ran reverse dependency checks on the upcoming broom 0.7.0 release and found new errors/test failures for the CRAN version of this package. I've pasted the results below; they seem to result from our decision to no longer export the fix_data_frame() function (for maintainability purposes).

checking tests ...

 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  Backtrace:
   1. generics::tidy(dds)
   2. biobroom::tidy.EList(dds)
   3. biobroom:::tidy_matrix(x$E)
   7. broom::fix_data_frame
   8. base::getExportedValue(pkg, name)
  
  ══ testthat results  ═══════════════════════════════════════════════════════════
  [ OK: 33 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 3 ]
  1. Error: limma tidier works as expected (@test-limma_tidiers.R#5) 
  2. Error: voom tidier adds weight column (@test-limma_tidiers.R#26) 
  3. Error: voomWithQualityWeights tidier adds weight and sample.weight columns (@test-limma_tidiers.R#49) 
  
  Error: testthat unit tests failed
  Execution halted

I've pasted the most recently exported function definition below as a place to start from in making the necessary fixes.🙂

fix_data_frame <- function(x, newnames = NULL, newcol = "term") {
  if (!is.null(newnames) && length(newnames) != ncol(x)) {
    stop("newnames must be NULL or have length equal to number of columns")
  }

  if (all(rownames(x) == seq_len(nrow(x)))) {
    # don't need to move rownames into a new column
    ret <- data.frame(x, stringsAsFactors = FALSE)
    if (!is.null(newnames)) {
      colnames(ret) <- newnames
    }
  }
  else {
    ret <- data.frame(
      ...new.col... = rownames(x),
      unrowname(x),
      stringsAsFactors = FALSE
    )
    colnames(ret)[1] <- newcol
    if (!is.null(newnames)) {
      colnames(ret)[-1] <- newnames
    }
  }
  as_tibble(ret)
}

We hope to submit this new version of the package to CRAN in the coming weeks. If you encounter any problems fixing these issues, please feel free to reach out!
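
One way a downstream package could respond is to carry its own copy of the helper instead of calling broom::fix_data_frame(). The following is a minimal sketch under stated assumptions, not the fix biobroom actually adopted: the name fix_data_frame_local is hypothetical, the internal unrowname() helper is replaced by resetting row names, and the result is a plain data frame (wrap it in tibble::as_tibble() to match the original return type).

```r
# Hypothetical package-local stand-in for broom::fix_data_frame().
# The function name is illustrative; it is not part of broom or biobroom.
fix_data_frame_local <- function(x, newnames = NULL, newcol = "term") {
  if (!is.null(newnames) && length(newnames) != ncol(x)) {
    stop("newnames must be NULL or have length equal to number of columns")
  }

  # check.names = FALSE keeps the original column names untouched
  ret <- data.frame(x, stringsAsFactors = FALSE, check.names = FALSE)
  if (!is.null(newnames)) {
    colnames(ret) <- newnames
  }

  # Move row names into a leading column only when they are informative,
  # i.e. not NULL and not just the default 1..n sequence
  rn <- rownames(x)
  has_names <- !is.null(rn) && !all(rn == as.character(seq_len(nrow(x))))
  if (has_names) {
    first_col <- stats::setNames(
      data.frame(rn, stringsAsFactors = FALSE),
      newcol
    )
    ret <- cbind(first_col, ret)
  }

  rownames(ret) <- NULL
  ret
}

# Usage: a small expression-like matrix with gene IDs as row names
m <- matrix(1:4, nrow = 2, dimnames = list(c("g1", "g2"), c("a", "b")))
fix_data_frame_local(m, newcol = "gene")
```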

edgeR glm/lrt objects

Add option to include rowData similar to addPheno/colData

Write q-value vignette

tbl_df() is deprecated

Running biobroom::tidy on a DESeq2 object, I saw the warning:

Warning message:
`tbl_df()` is deprecated as of dplyr 1.0.0.
Please use `tibble::as_tibble()` instead.

This was with biobroom v1.20.0, and dplyr v1.0.2.

Ideally, biobroom would be updated to avoid this warning.

Thank you!
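
The replacement the warning asks for is a one-line change wherever tbl_df() is called. A minimal sketch, assuming the tibble package is installed; the data frame here is made up for illustration and is not biobroom output:

```r
# Illustrative data frame standing in for a tidier's intermediate result
df <- data.frame(
  gene = c("g1", "g2"),
  estimate = c(1.5, -0.3),
  stringsAsFactors = FALSE
)

# Deprecated as of dplyr 1.0.0:
#   dplyr::tbl_df(df)
# Suggested replacement from the warning message:
tidy_tbl <- tibble::as_tibble(df)
tidy_tbl
```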

Add testthat unit tests

Add sva tidiers

Need to add sva tidiers

Add EDGE tidiers

typo in augment.DGEList leads to error

augment.DGEList stops with the error
stop("No columns to augment in DGEList")
regardless of the input.

Change
if (is.null(names(list())))
to
if (is.null(names(ret)))
in line 13 of the function augment.DGEList
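
The reason the original condition fires for every input can be checked in base R: list() creates a fresh empty list, which has no names, so the test never looks at the object being built. In this sketch, ret is a stand-in for the list assembled inside augment.DGEList; its contents are made up.

```r
# Stand-in for the list of columns collected inside augment.DGEList
ret <- list(weights = matrix(1, nrow = 2, ncol = 2))

# Buggy condition: inspects a brand-new empty list, so it is always TRUE
buggy_check <- is.null(names(list()))

# Proposed condition: inspects the collected columns
fixed_check <- is.null(names(ret))

print(c(buggy = buggy_check, fixed = fixed_check))
```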

Handling column names with special characters

Hi,
Thanks for the really useful package. Sometimes sample names get mangled if they contain special characters, e.g.:

> data(hammer)
> pData(hammer)
                  sample.id num.tech.reps protocol         strain     Time
SRX020102         SRX020102             1  control Sprague Dawley 2 months
SRX020103         SRX020103             2  control Sprague Dawley 2 months
SRX020104         SRX020104             1   L5 SNL Sprague Dawley 2 months
SRX020105         SRX020105             2   L5 SNL Sprague Dawley  2months
SRX020091-3     SRX020091-3             1  control Sprague Dawley  2 weeks
SRX020088-90   SRX020088-90             2  control Sprague Dawley  2 weeks
SRX020094-7     SRX020094-7             1   L5 SNL Sprague Dawley  2 weeks
SRX020098-101 SRX020098-101             2   L5 SNL Sprague Dawley  2 weeks

> tidy(hammer)
# A tibble: 236,128 x 3
                 gene    sample value
                <chr>     <chr> <int>
1  ENSRNOG00000000001 SRX020102     2
2  ENSRNOG00000000007 SRX020102     4
3  ENSRNOG00000000008 SRX020102     0
4  ENSRNOG00000000009 SRX020102     0
5  ENSRNOG00000000010 SRX020102    19
6  ENSRNOG00000000012 SRX020102     7
7  ENSRNOG00000000014 SRX020102     0
8  ENSRNOG00000000017 SRX020102     4
9  ENSRNOG00000000021 SRX020102     7
10 ENSRNOG00000000024 SRX020102    86
# ... with 236,118 more rows
> pData(hammer) %>% dplyr::filter(grepl('SRX020091',sample.id))
    sample.id num.tech.reps protocol         strain    Time
1 SRX020091-3             1  control Sprague Dawley 2 weeks

> tidy(hammer) %>% dplyr::filter(grepl('SRX020091',sample))
# A tibble: 29,516 x 3
                 gene      sample value
                <chr>       <chr> <int>
1  ENSRNOG00000000001 SRX020091.3     7
2  ENSRNOG00000000007 SRX020091.3     5
3  ENSRNOG00000000008 SRX020091.3     0
4  ENSRNOG00000000009 SRX020091.3     0
5  ENSRNOG00000000010 SRX020091.3    50
6  ENSRNOG00000000012 SRX020091.3    31
7  ENSRNOG00000000014 SRX020091.3     0
8  ENSRNOG00000000017 SRX020091.3    21
9  ENSRNOG00000000021 SRX020091.3    30
10 ENSRNOG00000000024 SRX020091.3   257
# ... with 29,506 more rows
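
A plausible cause, offered as an assumption rather than something confirmed in this issue, is that the expression matrix passes through data.frame() with its default check.names = TRUE, which rewrites column names into syntactically valid R names. The base R sketch below reproduces the same substitution of "-" with "." on a toy matrix:

```r
# Toy count matrix using two sample names from the issue
m <- matrix(
  1:4,
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("SRX020102", "SRX020091-3"))
)

# Default behaviour: names are passed through make.names()
mangled <- data.frame(m)
colnames(mangled)

# Names survive when the check is switched off
kept <- data.frame(m, check.names = FALSE)
colnames(kept)
```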

Add GRangesList

untidy() function

Hi,

would you be able to easily add an untidy() function, which reverts the tidy object back to its original format, including any changes made to the tidy version?

Something like:
edgeR_object_tidy <- edgeR_object %>% tidy()
edgeR_object <- edgeR_object_tidy %>% untidy()

I do often get into the situation that I have to jump between formatting, as the package functions need the base formatting.

Cheers
Jakob
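
For the simplest case, a tidy gene/sample/value table, the reverse step amounts to pivoting back into a matrix. The sketch below is only an illustration in base R: untidy_matrix is a hypothetical name, the column names follow the tidy() output shown elsewhere on this page, and rebuilding a full edgeR object (with its sample and gene annotation) would need more than this.

```r
# Hypothetical helper: turn a tidy (gene, sample, value) table into a matrix
untidy_matrix <- function(td, row = "gene", col = "sample", value = "value") {
  genes <- unique(td[[row]])
  samples <- unique(td[[col]])

  m <- matrix(
    NA,
    nrow = length(genes),
    ncol = length(samples),
    dimnames = list(genes, samples)
  )

  # Fill each cell by its (row index, column index) pair
  idx <- cbind(match(td[[row]], genes), match(td[[col]], samples))
  m[idx] <- td[[value]]
  m
}

# Usage on a small made-up tidy table
td <- data.frame(
  gene = c("g1", "g2", "g1", "g2"),
  sample = c("s1", "s1", "s2", "s2"),
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)
m <- untidy_matrix(td)
m
```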

Write introductory vignette

tidy on limma fit does not contain log2FC

Thanks for the great package! Could you include the log2 fold change in the table that tidy() returns for a limma eBayes fit?

Add fasta format tidier?

I've been using a broom-style function to tidy seqinr::read.fasta objects. Would there be any interest in adding this to biobroom if I do a pull request?

library(magrittr)  # provides %>%

read_fasta <- function(fasta_filename, annot = FALSE) {
    fasta <- seqinr::read.fasta(fasta_filename, as.string = TRUE)

    # Convert the list of seqinr SeqFastadna objects to a data.frame.
    # Note: broom::fix_data_frame() is no longer exported as of broom 0.7.0.
    fasta_df <- fasta %>%
                   sapply(function(x){x[1:length(x)]}) %>%
                   as.data.frame() %>%
                   broom::fix_data_frame(newcol = "ID", newnames = "Sequence")

    # Optionally append the FASTA header annotation for each sequence
    if (annot) {
        annot_df <- seqinr::getAnnot(fasta) %>%
                         sapply(function(x){x[1:length(x)]}) %>%
                         as.data.frame() %>%
                         broom::fix_data_frame(newnames = "Annot")

        fasta_df <- cbind(fasta_df, annot_df)
    }
    return(fasta_df)
}
read_fasta('https://www.uniprot.org/uniprot/?query=PGH1&format=fasta&limit=10')

https://gist.github.com/clairemcwhite/a5e889f6192a664be45c0226d0ab5813

`tidy.DESeqTransform` method

Hi,

(firstly, thanks a lot for such a convenient package!)

I was wondering what your view is on having a tidy() method for DESeqTransform objects (coming from the rlog() and varianceStabilizingTransformation() functions)?

Here's a gist with one:
https://gist.github.com/tavareshugo/3973461a7daf8a43e65e3566d5deed14

So, this should work:

# load libraries
library(DESeq2)
library(biobroom)
library(magrittr)

# Source gist
devtools::source_gist("3973461a7daf8a43e65e3566d5deed14", filename = "tidy_DESeqTransform.R")

# Example
dds <- makeExampleDESeqDataSet(betaSD = 1)

# transformations
vst_norm <- varianceStabilizingTransformation(dds)
rlog_norm <- rlog(dds)

# tidying
tidy(vst_norm)
tidy(vst_norm, colData = TRUE)
tidy(rlog_norm)
tidy(rlog_norm, colData = TRUE)

I'm happy to fork and submit a pull request, if you think something along these lines is worth it.

Write differential expression vignette

Tidy a vcf?

Too easy?

Standardized location for dplyr-like methods on SE objects

The scater package implements a number of dplyr verbs for SingleCellExperiment objects, e.g., mutate. I have been trying to get rid of these functions for a while, and I was wondering whether biobroom would be a better home for them (once generalized to work on SummarizedExperiment objects).

This would be a win-win for all of us. For tidyverse/BioC users, who no longer have to put up with masking issues (alanocallaghan/scater#74); for biobroom, by adding and centralizing functionality relating to tidyverse/BioC integration; and for me, who no longer has to maintain these verbs that I never use.

Let me know if this is of interest - I am willing to put in a PR.
