coxpresdbr, from russhyde: issues

rewrite: drop mutate_, filter_, group_by_ etc

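dplyr 0.8.3 throws deprecation warnings for the underscore-suffixed standard-evaluation verbs (mutate_(), filter_(), group_by_() etc.) that earlier versions of dplyr used, so switch to the plain NSE verbs.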
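A minimal sketch of the replacement, on a made-up data frame whose column names (`source_id`, `mutual_rank`) are borrowed from the snippet in a later issue. The usual migration is to the plain verb with bare column names, and to the `.data` pronoun when a column name is only available as a string.

```r
library(dplyr)

# Toy coexpression table; the values are made up for illustration.
coex_df <- data.frame(
  source_id   = c("g1", "g1", "g2"),
  mutual_rank = c(1, 5, 2),
  stringsAsFactors = FALSE
)

# Deprecated standard-evaluation verb:
#   filter_(coex_df, ~ mutual_rank < 3)
# Replacement 1: the NSE verb with a bare column name.
best_hits <- filter(coex_df, mutual_rank < 3)

# Replacement 2: when the column name arrives as a string,
# index the .data pronoun instead of using filter_().
rank_col <- "mutual_rank"
best_hits_by_name <- filter(coex_df, .data[[rank_col]] < 3)
```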

correlation filtering is futile in modern coxpresdb databases => drop it

Write a function that writes a data-frame-based copy of the CoxpresDB dataset to a file
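Such a function could be as small as the sketch below. `write_coex_table()` is a hypothetical name, and the four column names are assumptions chosen to match the collapsed one-row-per-gene-pair layout proposed in a later issue; a tab-separated file keeps the output readable by `read.delim()`.

```r
# Hypothetical helper: write a long-format coexpression table to a
# tab-separated text file and return the path invisibly.
write_coex_table <- function(coex_df, path) {
  required <- c("source_id", "target_id", "mutual_rank", "correlation")
  missing_cols <- setdiff(required, colnames(coex_df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  utils::write.table(coex_df[required], file = path, sep = "\t",
                     quote = FALSE, row.names = FALSE, col.names = TRUE)
  invisible(path)
}

# Made-up example data in the assumed layout.
coex_df <- data.frame(
  source_id   = c("gene_a", "gene_a"),
  target_id   = c("gene_b", "gene_c"),
  mutual_rank = c(3L, 1L),
  correlation = c(0.8, 0.9),
  stringsAsFactors = FALSE
)
out_path <- write_coex_table(coex_df, tempfile(fileext = ".tsv"))
round_trip <- utils::read.delim(out_path, stringsAsFactors = FALSE)
```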

`get_coex_partners` is really slow for 20k genes from a CoxpresDbDataframeAccessor

This should be fast: selecting the best partners for every source gene is a single grouped operation on one data frame.

Could be implemented as:

extract_dataframe(importer) %>%
  group_by(source_id) %>%
  top_n(-how_many_are_required, mutual_rank) %>%
  ungroup()

It currently maps over each source gene, extracts the rows for that gene, filters them to the best hits and then combines the results with bind_rows(). Any rewrite must still respect mr_threshold and the other filtering arguments.
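A runnable version of the grouped approach on toy data. `get_top_partners()`, `n_partners` and the `target_id` column are hypothetical, and `mr_threshold` is assumed to be an upper bound on mutual rank (lower mutual rank meaning a stronger partner). The threshold filter runs before `top_n()` so that it still applies.

```r
library(dplyr)

# Toy stand-in for the table returned by extract_dataframe(importer).
coex_df <- data.frame(
  source_id   = c("g1", "g1", "g1", "g2", "g2"),
  target_id   = c("g2", "g3", "g4", "g1", "g3"),
  mutual_rank = c(3, 1, 20, 3, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every source gene keep the `n_partners` rows
# with the lowest mutual rank, after dropping rows above `mr_threshold`.
# top_n() with a negative n keeps the bottom rows (ties are all kept).
get_top_partners <- function(df, n_partners, mr_threshold = Inf) {
  df %>%
    filter(mutual_rank <= mr_threshold) %>%
    group_by(source_id) %>%
    top_n(-n_partners, mutual_rank) %>%
    ungroup()
}

top_partners <- get_top_partners(coex_df, n_partners = 1, mr_threshold = 10)
```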

replace [overwrite/rewrite]_in_bunzip2 with overwrite and rewrite arguments

Add the following arguments, to be passed through to R.utils::bunzip2:

overwrite # overwrite the extracted archive if it already exists
remove # remove the compressed archive once it's extracted
skip # if already extracted, use the existing extracted archive

Alternatively, pass these arguments along via `...` or as a single `bunzip2_args` list.
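Both pass-through styles are sketched below; the wrapper names are hypothetical and the code needs the R.utils package. `overwrite`, `remove` and `skip` are existing arguments of `R.utils::bunzip2()`, so neither wrapper has to redeclare them.

```r
# Style 1: forward `...` unchanged to R.utils::bunzip2().
unpack_bz2 <- function(bz2_file, ...) {
  R.utils::bunzip2(bz2_file, ...)
}

# Style 2: collect the pass-through options in one list argument and
# splice them into the call with do.call().
unpack_bz2_with_args <- function(bz2_file, bunzip2_args = list()) {
  do.call(R.utils::bunzip2, c(list(bz2_file), bunzip2_args))
}

# Usage (skipped when R.utils is not installed): build a small .bz2 file
# with base R's bzfile() connection, then extract it but keep the archive.
if (requireNamespace("R.utils", quietly = TRUE)) {
  src <- tempfile(fileext = ".txt.bz2")
  con <- bzfile(src, "w")
  writeLines("gene_a\tgene_b\t1\t0.9", con)
  close(con)
  unpack_bz2(src, overwrite = TRUE, remove = FALSE)
}
```

The `...` style keeps the signature short; the list style makes it explicit which arguments belong to the extraction step when the wrapper also has `...` of its own.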

use `validity = function(x) my_validity_fn(x)` rather than `validity = my_validity_fn`

covr does not run the code for S4-object validity test functions when they are added in this format:

my_validity_fn <- function(object) {
  # checks on the object's slots: return TRUE, or a character vector describing the problems
}
methods::setClass("className", slots = ..., validity = my_validity_fn)

I think the discussion of function factories here explains the issue.

To ensure that S4-object validity functions are run by covr when instantiating CoxpresDbPartners and CoxpresDbAccessor objects, use the following syntax:

methods::setClass("className", slots = ..., validity = function(x) my_validity_fn(x))

- rewrite setClass for CoxpresDbPartners
- rewrite setClass for CoxpresDbDataframeAccessor
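A self-contained sketch of the wrapped-validity pattern on a toy class; `ToyPartners` and `check_toy_partners()` are hypothetical stand-ins for the package's classes. My understanding (not confirmed here) is that covr instruments the functions in the package namespace, whereas `setClass()` keeps its own reference to whatever function object was passed as `validity`; the anonymous wrapper looks the checker up by name each time it runs, so it presumably reaches the instrumented copy.

```r
library(methods)

# Hypothetical checker: return TRUE when valid, else a message.
check_toy_partners <- function(object) {
  if (any(object@mutual_rank < 0)) {
    return("mutual_rank must be non-negative")
  }
  TRUE
}

# The class stores the wrapper; check_toy_partners() is called by name
# every time validity is checked.
setClass(
  "ToyPartners",
  slots = c(mutual_rank = "numeric"),
  validity = function(x) check_toy_partners(x)
)

ok <- new("ToyPartners", mutual_rank = c(1, 2))
# new() validates objects created with slot values, so this errors.
bad <- tryCatch(new("ToyPartners", mutual_rank = -1), error = function(e) e)
```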

p-value = 0 bug when summarising z-scores that are very negative

Suspect there's a bug in the .summarise_neighbours internal function in evaluate_coex_partners
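One plausible cause, unconfirmed against `.summarise_neighbours`: converting very negative z-scores to lower-tail p-values with `pnorm()` underflows to exactly 0 once the true value drops below the smallest positive double. Computing on the log scale, or requesting the tail directly rather than using `1 - pnorm()`, keeps the values non-zero.

```r
z <- c(-5, -20, -40)

# pnorm(-40) is about 3.7e-350, below the smallest positive double,
# so the naive lower-tail p-value underflows to exactly 0.
p_naive <- pnorm(z)

# On the log scale the same probabilities stay finite.
log_p <- pnorm(z, log.p = TRUE)

# The upper tail has the same problem via cancellation: pnorm(10)
# rounds to 1, so 1 - pnorm(10) is 0, while lower.tail = FALSE is not.
upper_naive  <- 1 - pnorm(10)
upper_direct <- pnorm(10, lower.tail = FALSE)
```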

Like, dude, where's the README?

- how to import a coxpresdb dataset from the files provided by coxpresdb
- how to annotate an imported dataset
- how to identify genes with behaviour similar to their neighbours
  - z-score based test (node-versus-neighbourNodes)
  - correlation based test (node-versus-outEdges)

remove dependence on purrr

We only use purrr::map twice in R/, so those calls could easily be rewritten without importing purrr. We would probably still use purrr in the tests, so move it from Imports to Suggests.
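`base::lapply()` also returns a list, so a plain `purrr::map(.x, .f)` call with a named function usually translates directly; the gene IDs and `describe_gene()` below are toy examples. purrr's formula shorthand (`~ .x`) has no `lapply()` equivalent, so such calls need `function(x)` instead.

```r
gene_ids <- c("g1", "g2", "g3")
describe_gene <- function(id) paste0(id, "_partners")  # toy function

# Equivalent of purrr::map(gene_ids, describe_gene) in base R.
partners <- lapply(gene_ids, describe_gene)
```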

Warning in metap::sumz(p_vals): Some studies omitted

Work out where this warning comes from. Can the input statistics be modified so that the warning is not thrown?
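As far as I can tell, metap::sumz() implements Stouffer's method and omits (with this warning) p-values of exactly 0 or 1, which would map to infinite z-scores. The base-R sketch below is not metap's code. It shows how a p-value that underflowed to 0 becomes an infinite z, and that combining the z-scores directly, when they are available, avoids the p-value round trip.

```r
# Stouffer's method on z-scores: Z = sum(z_i) / sqrt(k).
stouffer_from_z <- function(z) sum(z) / sqrt(length(z))

# Fed with one-sided p-values instead, the method needs qnorm(),
# which maps p = 0 to Inf and p = 1 to -Inf.
stouffer_from_p <- function(p) stouffer_from_z(qnorm(p, lower.tail = FALSE))

z <- c(2, 1, 40)
p <- pnorm(z, lower.tail = FALSE)  # the last p-value underflows to 0

combined_from_p <- stouffer_from_p(p)  # Inf
combined_from_z <- stouffer_from_z(z)  # 43 / sqrt(3)
combined_log_p  <- pnorm(combined_from_z, lower.tail = FALSE, log.p = TRUE)
```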

rewrite to use .zip rather than .tar.bz2

... since the coxpresdb.jp database is now released using .zip files.
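Base R can read a member of a .zip archive without extracting it, via the `unz()` connection, and can list members with `utils::unzip(list = TRUE)`. The helper name and the assumption that members are headerless tab-separated tables are illustrative only.

```r
# Hypothetical helper: read one tab-separated member of a .zip release
# without extracting the archive to disk.
read_zip_member <- function(zip_path, member) {
  members <- utils::unzip(zip_path, list = TRUE)$Name
  if (!member %in% members) stop("no member named ", member)
  # read.table() opens and closes the unz() connection itself.
  utils::read.table(unz(zip_path, member), sep = "\t", header = FALSE,
                    stringsAsFactors = FALSE)
}
```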

rewrite to work with a collapsed all-gene database

In the current version there is one coxpresdb file per gene, and each file looks like:

# file for target gene_a
gene_b    MR_ab    COR_ab
gene_c    MR_ac    COR_ac
gene_d    MR_ad    COR_ad
...

This makes sampling inefficient: for each gene sampled, its coexpression data has to be re-read from its own file.

It would be more efficient to read in the data for all genes at one time from a single file

The file should look like

gene_a    gene_b    MR_ab    COR_ab
gene_a    gene_c    MR_ac    COR_ac
gene_a    gene_d    MR_ad    COR_ad
...
gene_b    ...
...
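A sketch of the conversion from the per-gene layout to the collapsed one. It assumes each file is named after its gene ID and holds headerless, tab-separated rows of target, mutual rank and correlation; `collapse_coex_files()` and the column names are hypothetical.

```r
# Hypothetical converter: stack per-gene files into one long table with
# one row per (source, target) pair.
collapse_coex_files <- function(paths) {
  tables <- lapply(paths, function(path) {
    df <- utils::read.table(
      path, sep = "\t", header = FALSE, stringsAsFactors = FALSE,
      col.names = c("target_id", "mutual_rank", "correlation")
    )
    # The source gene ID is taken from the file name (an assumption).
    data.frame(source_id = rep(basename(path), nrow(df)), df,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, tables)
}
```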

allow the user to pass in a gene-ID-indexed matrix and have coxpresdbr compute the correlations between sources and targets,
or to pass in correlation coefficients instead of p-values (coxpresdbr should then convert the correlations to z-scores)
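A sketch of both steps on a random toy matrix (rows are gene IDs, columns are samples). The conversion uses Fisher's transformation, a standard choice but an assumption here: atanh(r) is approximately normal with standard error 1 / sqrt(n - 3), so atanh(r) * sqrt(n - 3) serves as a z-score.

```r
# Toy expression matrix: 4 genes x 10 samples, random values.
set.seed(1)
expr <- matrix(rnorm(4 * 10), nrow = 4,
               dimnames = list(c("g1", "g2", "g3", "g4"), paste0("s", 1:10)))

source_ids <- c("g1", "g2")
target_ids <- c("g3", "g4")

# cor() correlates columns, so the gene-by-sample slices are transposed;
# the result is a source-by-target matrix of Pearson correlations.
r <- cor(t(expr[source_ids, , drop = FALSE]),
         t(expr[target_ids, , drop = FALSE]))

# Fisher transformation of each correlation into a z-score.
n_samples <- ncol(expr)
z <- atanh(r) * sqrt(n_samples - 3)
```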

? Find out alternatives for meta-analysis of correlation scores
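One standard alternative, not something coxpresdbr currently does, is a fixed-effect meta-analysis on the Fisher scale: average atanh(r_i) with inverse-variance weights n_i - 3, then back-transform with tanh(). `pool_correlations()` is a hypothetical name.

```r
# Inverse-variance weighted pooling of correlations r with sample sizes n.
pool_correlations <- function(r, n) {
  z <- atanh(r)             # Fisher-transformed correlations
  w <- n - 3                # inverse of the approximate variance 1 / (n - 3)
  z_bar <- sum(w * z) / sum(w)
  se <- 1 / sqrt(sum(w))
  list(r = tanh(z_bar),     # pooled correlation
       z = z_bar / se)      # z-score for the pooled estimate
}

pooled <- pool_correlations(r = c(0.5, 0.5), n = c(13, 13))
```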
