rmflight / categorycompare Goto Github PK

Github repo of the categoryCompare meta-analysis package

Home Page: http://bioconductor.org/packages/devel/bioc/html/categoryCompare.html

Shell 0.54% R 99.46%

categorycompare's Introduction

Hello! 👋

I'm a bioinformatics analyst with Hunter Moseley at University of Kentucky's Markey Cancer Center. I spend most of my time both analyzing various -omics datasets (transcriptomics, metabolomics, etc) and developing new tools and methods for analyzing these kinds of data. This means I'm heavily dependent on others sharing their data, and proudly consider myself a research parasite.

I really, really enjoy programming in R, although I also dabble occasionally in Python. I occasionally write about my research, personal projects, and my thoughts on science on my blog.

💬 Ask me about quality control and quality assurance of high-feature -omics datasets, writing R packages, and using {targets} for data analyses.
🛠️ I maintain a few packages related to -omics data analysis.

categorycompare's Issues

use new RCy3 functionality

RCy3 has deprecated many of its earlier functions, so we need to update the package to use the new ones.

proper breaking edges

Currently, when breaking the edges in a Cytoscape graph, the edges in the underlying graph object attached to the cytoscapeWindow object do not get updated. This should be changed.

saving copy of annotation from enrichment

We should keep a saved copy of the annotation that is generated when doing the enrichment.

Related: for custom annotation we should have just one copy lying around, instead of one associated with each gene list.

creation of hypergeometric features should cull annotation

During the creation of a new hypergeom_features object for enrichment, the annotation object should be filtered down properly, as should the universe, etc., so that we can keep reusing the original annotation object for multiple hypergeom_features objects or other things.
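
A minimal sketch of that culling step, assuming the annotation is a plain named list mapping terms to gene IDs. The helper name cull_annotation and the toy data are assumptions for illustration, not part of the package:

```r
# Hypothetical helper: restrict an annotation (named list of term -> genes)
# to a gene universe, dropping terms that are left with no genes.
cull_annotation <- function(annotation, universe) {
  culled <- lapply(annotation, intersect, universe)
  culled[lengths(culled) > 0]
}

full_annotation <- list(
  termA = c("g1", "g2", "g9"),
  termB = c("g8", "g9"),
  termC = c("g3")
)
universe <- c("g1", "g2", "g3", "g4")

culled <- cull_annotation(full_annotation, universe)

# full_annotation itself is untouched, so it can be culled again
# against a different universe.
```

Because the function returns a new list rather than modifying its input, the same full annotation could be passed to several enrichment setups.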

easily save genes annotated to Terms

Given a set of GO terms, easily spit out the genes annotated to those terms
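
One way this could look, assuming the annotation is available as a named list of term to gene IDs. The function name genes_for_terms and the term and gene IDs are made up for the example:

```r
# Hypothetical annotation: a named list mapping GO terms to gene IDs.
annotation <- list(
  "GO:0000001" = c("g1", "g2", "g3"),
  "GO:0000002" = c("g2", "g4"),
  "GO:0000003" = c("g5")
)

# Return the genes annotated to each requested term; terms that are not
# in the annotation are silently dropped.
genes_for_terms <- function(annotation, terms) {
  annotation[intersect(terms, names(annotation))]
}

term_genes <- genes_for_terms(
  annotation,
  c("GO:0000002", "GO:0000003", "GO:9999999")
)

# Long-format table (one row per term-gene pair) that is easy to save
# with write.table().
term_table <- data.frame(
  term = rep(names(term_genes), lengths(term_genes)),
  gene = unlist(term_genes, use.names = FALSE),
  stringsAsFactors = FALSE
)
```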

Warning: undocumented methods

WARNING
Undocumented S4 methods:
generic '[' and siglist 'GENccEnrichResult,ANY,ANY,ANY'
generic '[' and siglist 'ccEnrichResult,ANY,ANY,ANY'

AsIs Warning in check

Warning in .simpleDuplicateClass(def, prev) :
the specification for S3 class “AsIs” in package ‘XMLRPC’ seems equivalent to one from package ‘BiocGenerics’ and is not turning on duplicate class definitions for this class

This warning appears in the RCytoscape package as well, so I think this is where it is coming from, because I don't have an explicit import of XMLRPC in my NAMESPACE file.

Note about use of Biobase

NOTE
Namespace in Imports field not imported from: ‘Biobase’
All declared Imports should be used.
Package in Depends field not imported from: ‘Biobase’
These packages needs to imported from for the case when
this namespace is loaded but not attached.

reverse match possible lists against real lists

We should be doing reverse matches of the possible lists we want to see against the actual lists we have, as this should be much, much faster. Of course, do we really need to do this at all? If we use a named list of colors, could we do this a lot faster?
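
A sketch of the named-color idea, under the assumption that each node already carries a string describing which lists it is significant in. All names and colors below are illustrative; whether this is actually faster in the package would need to be measured:

```r
# Hypothetical named vector: one color per list-membership combination.
combination_colors <- c(
  "list1"       = "#E41A1C",
  "list2"       = "#377EB8",
  "list1,list2" = "#984EA3"
)

# Membership string for each node (hypothetical data).
node_membership <- c(
  termA = "list1",
  termB = "list1,list2",
  termC = "list2",
  termD = "list1"
)

# A single vectorized lookup by name, instead of matching every possible
# combination against every node.
node_colors <- combination_colors[node_membership]
names(node_colors) <- names(node_membership)
```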

unique enrichment combinations

For both the colors and pie charts, we should be working with only the unique combinations of enriched annotations. This should be possible from the significance matrix.
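
As a rough illustration, assuming the significance matrix is a logical matrix with annotations as rows and gene lists as columns (the data here is invented):

```r
# Hypothetical significance matrix: rows = annotations, columns = gene lists.
sig_matrix <- matrix(
  c(TRUE,  FALSE,
    TRUE,  TRUE,
    TRUE,  FALSE,
    FALSE, TRUE,
    TRUE,  TRUE),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("term", 1:5), c("list1", "list2"))
)

# Only the distinct significance patterns need a color or a pie chart.
unique_combinations <- unique(sig_matrix)

# Map every annotation back to the row of its pattern in unique_combinations.
pattern_key <- apply(sig_matrix, 1, paste, collapse = "")
unique_key <- apply(unique_combinations, 1, paste, collapse = "")
combination_index <- match(pattern_key, unique_key)
```

Here five annotations collapse to three patterns, so only three colors or pie images would have to be generated.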

create test data for v2

We need an appropriate data-set for testing all aspects of the workflow for v2.

To do this we should have two gene lists with 100 genes as the universe, 20 genes as diff, with various annotations with sizes of 2 - 100 so we can apply some different cutoffs to the annotation graph and try all of the different options.
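
A possible starting point for such a data set, using only base R. The gene names, the seed, and the particular annotation sizes are arbitrary choices for the sketch:

```r
set.seed(1234)

# Universe of 100 genes shared by both gene lists.
universe <- paste0("gene", 1:100)

# Two differential gene lists of 20 genes each, drawn from the universe.
diff_list1 <- sample(universe, 20)
diff_list2 <- sample(universe, 20)

# Annotations with sizes ranging from 2 to 100 genes.
annotation_sizes <- c(2, 5, 10, 20, 50, 100)
annotation <- lapply(annotation_sizes, function(n) sample(universe, n))
names(annotation) <- paste0("annot", seq_along(annotation))
```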

set individual cutoffs

Make it possible for the user to set different statistical cutoffs for each experiment if desired, probably by having them supply a named argument that identifies the experiment.
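
A sketch of how named per-experiment cutoffs might be applied, assuming p-values are held in a matrix with one column per experiment. The function name apply_cutoffs and the values are hypothetical:

```r
# Hypothetical p-values: rows = annotations, columns = experiments.
p_values <- matrix(
  c(0.001, 0.04,
    0.03,  0.20,
    0.20,  0.008),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("term1", "term2", "term3"), c("exp1", "exp2"))
)

# One cutoff per experiment, matched to the columns by name.
cutoffs <- c(exp1 = 0.05, exp2 = 0.01)

apply_cutoffs <- function(p_values, cutoffs) {
  stopifnot(all(colnames(p_values) %in% names(cutoffs)))
  # sweep() compares each column against its own cutoff.
  sweep(p_values, 2, cutoffs[colnames(p_values)], FUN = "<=")
}

significant <- apply_cutoffs(p_values, cutoffs)
```

Matching by name rather than position means the order in which the user supplies the cutoffs does not matter.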

color schemes

Need several ways to do color schemes:

Pie - default, create a pie graph where faded color is not-significant. Need a pie chart image for each combination of significant / not-significant
Solid - a solid color is detailed for each of the significant patterns
Bar - Like the pie, but use a bar graph instead

build combined_enrichments incrementally according to user prefs

Instead of trying to have a single function that takes all the arguments, or having derived objects from the combined_enrichments class, why not instead just incrementally build?

This means doing the initial combining with just the enriched_results and the combined_statistics, and then building the graph, getting the significant_statistics, etc. This means we can go back at almost any step and redo it if we want, or easily modify things and see how they will change. So each function that takes the combined_enrichment object, will return that same object with new slots filled in.

We have already done this with the generate_annotation_graph function.

Have to do it with the significant_annotation function

Will also need an add_info_2_graph, and other functions that do this. I think this is probably the cleanest way to go about it while having something that looks reasonably right.
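
The incremental pattern could be sketched like this, with a plain list standing in for the combined_enrichment object. The functions add_significance and add_graph are hypothetical stand-ins, not the package's functions:

```r
# Start with only the combined statistics (hypothetical structure);
# the other fields are filled in by later steps.
combined <- list(
  statistics = matrix(
    c(0.01, 0.2, 0.03, 0.004), ncol = 2,
    dimnames = list(c("term1", "term2"), c("list1", "list2"))
  ),
  significant = NULL,
  graph = NULL
)

# Each step takes the object and returns it with one more field filled in.
add_significance <- function(combined, cutoff = 0.05) {
  combined$significant <- combined$statistics <= cutoff
  combined
}

add_graph <- function(combined) {
  stopifnot(!is.null(combined$significant))
  keep <- rownames(combined$significant)[rowSums(combined$significant) > 0]
  combined$graph <- list(nodes = keep)
  combined
}

combined <- add_graph(add_significance(combined, cutoff = 0.05))

# Going back a step: redo the significance call with a stricter cutoff
# and rebuild the graph from it.
stricter <- add_graph(add_significance(combined, cutoff = 0.005))
```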

convert to travis-r

With the availability of the new R support in travis-ci, the .travis.yml script needs to be converted over to the new format.

https://github.com/craigcitro/r-travis/wiki/Porting-to-native-R-support-in-Travis

always use matrix rep of significant lists

When working with which lists a term was found to be significant in, we should always work with the matrix representation (rows = terms, columns = lists). This would easily allow generation of the piegraph visualization, or solid, because correspondence to solid would be for each list combination just generate the vector representing the list, or we could generate a character vector to match against rather trivially.
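
For illustration, the matrix representation and a matching character vector might be built as follows, assuming the significant terms per list are available as a named list (the term IDs are placeholders):

```r
# Hypothetical input: the significant terms found for each gene list.
sig_terms <- list(
  list1 = c("GO:A", "GO:B"),
  list2 = c("GO:B", "GO:C")
)

all_terms <- sort(unique(unlist(sig_terms, use.names = FALSE)))

# Logical matrix: rows = terms, columns = lists.
sig_matrix <- vapply(
  sig_terms,
  function(terms) all_terms %in% terms,
  logical(length(all_terms))
)
rownames(sig_matrix) <- all_terms

# Character key per term ("10" = list1 only, "11" = both, "01" = list2 only)
# that a solid-color scheme could match against.
combination_key <- apply(sig_matrix * 1L, 1, paste, collapse = "")
```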

Note about namespace

NOTE
Packages listed in more than one of Depends, Imports, Suggests, Enhances:
‘BiocGenerics’ ‘Biobase’ ‘AnnotationDbi’ ‘Category’ ‘GSEABase’ ‘hwriter’ ‘colorspace’ ‘graph’

pie charts

Generate the pie charts that will be used for visualization in Cytoscape.

check v2 graph with original

Some informal testing has highlighted that the graphs generated by v2 and the original are not the same size. This means we need to test thoroughly to find out why, and figure out which one is correct.

add cytoutnodes to vignette

Should have a section in the vignette on saving data for further examination.

summarize comparisons

As an output of setting significance cutoffs, one summary should be the number of significant annotations in each group of significant annotations. This should be a direct output of either setting the significance levels or assigning nodes, probably of setting the significance levels.
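
A sketch of such a summary, assuming a logical significance matrix with annotations as rows and gene lists as columns (data and labels invented for the example):

```r
# Hypothetical significance matrix: rows = annotations, columns = gene lists.
sig_matrix <- matrix(
  c(TRUE,  FALSE,
    TRUE,  TRUE,
    TRUE,  FALSE,
    FALSE, TRUE,
    FALSE, FALSE),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("term", 1:5), c("list1", "list2"))
)

# Label each annotation with the lists it is significant in.
group <- apply(
  sig_matrix, 1,
  function(x) paste(colnames(sig_matrix)[x], collapse = ",")
)
group[group == ""] <- "none"

# Number of annotations in each significance group.
group_counts <- table(group)
```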

profiling graph calculations

A lot of work went into the v2 graph similarity calculations and making them able to take advantage of parallel processing, however, it appears that they are not really that much faster than v1. These need to be profiled to see if there are ways to speed it up substantially.

annotation_combinations should take a logical matrix

We should be able to pass annotation_combinations a logical matrix that defines the allowed combinations, instead of having them all defined automatically. This is important because some combinations may not make sense. We may also want two parameters, valid and not_valid, because it may be easier for a user to define the combinations that are not valid than to list all the valid ones.
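
A rough sketch of the not_valid variant, assuming combinations are rows of a logical matrix with one column per list. The helper row_key and the example combination are assumptions, not the package's interface:

```r
list_names <- c("list1", "list2", "list3")

# All possible combinations, one row per combination (the automatic default).
all_combinations <- as.matrix(
  expand.grid(rep(list(c(FALSE, TRUE)), length(list_names)))
)
colnames(all_combinations) <- list_names

# User-supplied combinations that do not make sense (hypothetical):
# significant in list1 and list3 but not in list2.
not_valid <- matrix(
  c(TRUE, FALSE, TRUE), nrow = 1,
  dimnames = list(NULL, list_names)
)

# Collapse each row into a string so rows can be compared with %in%.
row_key <- function(m) apply(m, 1, paste, collapse = "")

allowed <- all_combinations[
  !(row_key(all_combinations) %in% row_key(not_valid)), , drop = FALSE
]
```

A valid argument could be handled the same way, keeping only the rows whose key appears in the supplied matrix.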

easily get list associations

The listMemberships that are stored currently as an annotation of the nodes of the graph in the ccCompareResult object, should be available as a separate object.

enriched_result constructor

Need a constructor that returns an instance of enriched_result that does some basic sanity checks on the structures within
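
A sketch of what such a constructor could check, written against a plain list rather than the package's class. The field names, the class name enriched_result_sketch, and the specific checks are all assumptions:

```r
# Hypothetical constructor with basic sanity checks on its inputs.
new_enriched_result <- function(features, universe, annotation, statistics) {
  stopifnot(
    is.character(features),
    is.character(universe),
    is.list(annotation),
    !is.null(names(annotation)),
    is.data.frame(statistics)
  )
  if (anyDuplicated(features) || anyDuplicated(universe)) {
    stop("features and universe must not contain duplicates")
  }
  if (!all(features %in% universe)) {
    stop("all features must be part of the universe")
  }
  if (!all(rownames(statistics) %in% names(annotation))) {
    stop("statistics contains annotations missing from the annotation list")
  }
  structure(
    list(features = features, universe = universe,
         annotation = annotation, statistics = statistics),
    class = "enriched_result_sketch"
  )
}

res <- new_enriched_result(
  features = c("g1", "g2"),
  universe = c("g1", "g2", "g3"),
  annotation = list(termA = c("g1", "g3")),
  statistics = data.frame(p = 0.01, row.names = "termA")
)

# A feature outside the universe is rejected with an informative message.
bad <- tryCatch(
  new_enriched_result(
    features = c("g1", "g9"),
    universe = c("g1", "g2", "g3"),
    annotation = list(termA = c("g1", "g3")),
    statistics = data.frame(p = 0.01, row.names = "termA")
  ),
  error = function(e) conditionMessage(e)
)
```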

Redesign: Comparisons first, enrichment after

To be really useful, this package needs a complete redesign that focuses on the core functionality of enabling comparisons between annotation enrichment type calculations.

add github repo to url, website, and vignette

Should have a GitHub and a Bioconductor URL listed in the DESCRIPTION file, and also in the vignette.

ANY: create easy annotation generation methods for params objects

Instead of depending on the Category package to get the proper annotations, we need to have a general example of getting the annotations. For example, using an EntrezGene database (e.g. org.Hs.eg.db), we could generate the GO annotation this way:

library(org.Hs.eg.db)
allGO <- as.list(org.Hs.egGO2ALLEGS)

# filter, and get description

Supplying this as an annotation object to the params creator would then allow it to filter the annotations down to just those that have genes from the gene universe. This would also be rather generic: it should work for EG, probes, yeast, and other weird things like drosophila and arabidopsis. All that is needed is the database file and the "GO2ALL" method. Or the select method, as shown here

bug in current code getting annotation id

In .generate_annotation_graph, at comb_enrichment@annotation@annotation_features[keep_features], there are some keep_features from the annotation_id that are not present in the annotation_features from the statistics data. From my current understanding of the code, this should be impossible, as one is derived from the other, and the statistics derived one should not have anything that is not present in the combined one.

So we need to figure out what exactly is going on here.

As an example, in the current test code, "GO:0022402" is in the annotation_id slot for statistics, but not in the named list of stuff for annotation_features.
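
A small diagnostic along these lines might help narrow it down. The two objects below are toy stand-ins for the annotation_id and annotation_features slots; only "GO:0022402" comes from the report above, the other IDs are placeholders:

```r
# Toy stand-ins for the two slots being compared.
annotation_features <- list(
  "GO:0000001" = c("g1", "g2"),
  "GO:0000002" = c("g2", "g3")
)
annotation_id <- c("GO:0000001", "GO:0022402")

# IDs present in the statistics but absent from annotation_features.
missing_ids <- setdiff(annotation_id, names(annotation_features))

# Indexing a list by a missing name silently yields a NULL element,
# so restrict to the IDs that are actually present before subsetting.
keep_features <- intersect(annotation_id, names(annotation_features))
kept <- annotation_features[keep_features]
```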

Redesign: Enrichment

After having a decent model for comparisons, we should support basic enrichment using hypergeometric and GSEA type calculations, as well as conversion from other tools.

We need a simple container with the

objects
universe
annotation

Check for duplicates in all the lists, and the annotation lists.

Do hypergeometric or GSEA (using romer) enrichment. Generate a container appropriate for use in comparison
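
The hypergeometric part could be sketched in base R with phyper(), as below. The function name hypergeom_enrich, the output columns, and the toy data are assumptions; GSEA via romer is not covered here:

```r
# Hypothetical over-representation test for one gene list.
hypergeom_enrich <- function(features, universe, annotation) {
  # Remove duplicates and restrict everything to the universe.
  universe <- unique(universe)
  features <- intersect(features, universe)
  annotation <- lapply(annotation, intersect, universe)

  n_features <- length(features)
  n_universe <- length(universe)

  stats <- lapply(annotation, function(genes) {
    n_annot <- length(genes)
    n_hit <- length(intersect(genes, features))
    # P(X >= n_hit) when drawing n_features genes from the universe.
    p <- phyper(n_hit - 1, n_annot, n_universe - n_annot, n_features,
                lower.tail = FALSE)
    data.frame(counts = n_hit, size = n_annot, p = p)
  })
  do.call(rbind, stats)
}

universe <- paste0("g", 1:10)
features <- c("g1", "g1", "g2", "g3")  # contains a duplicate on purpose
annotation <- list(termA = c("g1", "g2"), termB = c("g9", "g10"))

enrichment <- hypergeom_enrich(features, universe, annotation)
```

The resulting data frame has one row per annotation, which is one possible shape for a container that later comparison steps consume.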

Switch to RCy3 from RCytoscape

RCytoscape is being deprecated, so we need to switch to RCy3.

RCytoscape will not build as of 2017-08-07.

summary methods for various classes

Should have decent summary methods that display the relevant information for each of our new classes

graph creation crashes with one or fewer annotations

When creating a graph of the enriched annotations, there is currently no check of how many entries are in the annotation list. As it uses a simple double for loop, it will die with a very uninformative message if there are 1 or zero total enriched annotations. This needs to be fixed, especially for cases of examining the results from only a single gene list.
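
A sketch of a guarded version, using Jaccard similarity as a stand-in for whatever similarity the package computes. The function name and data are hypothetical. One common cause of this kind of failure in R is a loop written as 1:(n - 1), which counts downward when n is 1 or 0, though that may not be the cause here:

```r
# Hypothetical pairwise similarity between annotations (named list of
# term -> genes), returning an edge table. Returns an empty table when
# there are fewer than two annotations instead of failing.
annotation_similarity <- function(annotation) {
  edges <- data.frame(
    from = character(0), to = character(0), similarity = numeric(0),
    stringsAsFactors = FALSE
  )
  if (length(annotation) < 2) {
    return(edges)
  }
  # combn() enumerates each pair once, replacing the double for loop.
  pairs <- combn(names(annotation), 2)
  sim <- apply(pairs, 2, function(p) {
    a <- annotation[[p[1]]]
    b <- annotation[[p[2]]]
    length(intersect(a, b)) / length(union(a, b))
  })
  data.frame(
    from = pairs[1, ], to = pairs[2, ], similarity = sim,
    stringsAsFactors = FALSE
  )
}

no_edges <- annotation_similarity(list(termA = c("g1", "g2")))
three_terms <- annotation_similarity(list(
  termA = c("g1", "g2"),
  termB = c("g2", "g3"),
  termC = c("g4")
))
```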

possible workflows for graph / table generation

There are two potential workflows for generating the graph and tables:

Associate with the combined_enrichment object directly
Have independent objects that use combined_enrichment for generation, and tables or graphs exist independently, allowing user to select which output they want
