rmflight / categorycompare

Github repo of the categoryCompare meta-analysis package

Home Page: http://bioconductor.org/packages/devel/bioc/html/categoryCompare.html

Shell 0.54% R 99.46%
bioconductor r rstats

categorycompare's Introduction

Hello! 👋

I'm a bioinformatics analyst with Hunter Moseley at the University of Kentucky's Markey Cancer Center. I spend most of my time both analyzing various -omics datasets (transcriptomics, metabolomics, etc.) and developing new tools and methods for analyzing these kinds of data. This means I'm heavily dependent on others sharing their data, and proudly consider myself a research parasite.

I really, really enjoy programming in R, although I also dabble occasionally in Python. I occasionally write about my research, personal projects, and my thoughts on science on my blog.

  • 💬 Ask me about quality control and quality assurance of high-feature -omics datasets, writing R packages, and using {targets} for data analyses.
  • 🛠️ I maintain a few packages related to -omics data analysis.

categorycompare's People

Contributors

dtenenba, hpages, rflight79, rmflight, sonali-bioc

categorycompare's Issues

use new RCy3 functionality

RCy3 has deprecated many of its previous functions, so we need to update the package to use their replacements.

proper breaking edges

Currently, when breaking the edges in a Cytoscape graph, the edges in the underlying graph object attached to the cytoscapeWindow object do not get updated. This should be changed.

saving copy of annotation from enrichment

We should have a saved copy of the annotation that is generated from doing the enrichment.

Relatedly, for custom annotation we should have just one copy lying around, instead of one associated with each gene list.

creation of hypergeometric features should cull annotation

During the creation of a new hypergeom_features object for enrichment, the annotation object should be filtered down properly, as should the universe, etc., so that we can keep re-using the annotation object for multiple hypergeom_features objects or other things.
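
A minimal sketch of the culling step in base R, assuming the annotation is a named list of feature ids. The helper `cull_annotation` and its `min_size` argument are hypothetical and not part of the package. It returns a filtered copy, so the original annotation can be reused.

```r
# Hypothetical helper, not part of categoryCompare.
# annotation_features: named list, annotation id -> character vector of feature ids
# universe: character vector of all measured features
cull_annotation <- function(annotation_features, universe, min_size = 1) {
  # keep only the features that are present in the universe
  culled <- lapply(annotation_features, intersect, universe)
  # drop annotations that no longer have enough features
  culled[lengths(culled) >= min_size]
}

annotation <- list(a1 = c("g1", "g2", "g3"), a2 = c("g4", "g9"), a3 = "g10")
universe <- paste0("g", 1:5)
culled <- cull_annotation(annotation, universe)
```

The input `annotation` is left untouched, so the same object can be culled again against a different universe.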

Warning: undocumented methods

WARNING
Undocumented S4 methods:
generic '[' and siglist 'GENccEnrichResult,ANY,ANY,ANY'
generic '[' and siglist 'ccEnrichResult,ANY,ANY,ANY'

AsIs Warning in check

Warning in .simpleDuplicateClass(def, prev) :
the specification for S3 class “AsIs” in package ‘XMLRPC’ seems equivalent to one from package ‘BiocGenerics’ and is not turning on duplicate class definitions for this class

This warning appears in the RCytoscape package as well, so I think this is where it is coming from, because I don't have an explicit import of XMLRPC in my NAMESPACE file.

Note about use of Biobase

NOTE
Namespace in Imports field not imported from: ‘Biobase’
All declared Imports should be used.
Package in Depends field not imported from: ‘Biobase’
These packages needs to imported from for the case when
this namespace is loaded but not attached.

reverse match possible lists against real lists

We should be doing reverse matches of the possible lists we want to see against the actual lists we have, as this should be much, much faster. Of course, do we really need to do this at all? If we use a named list of colors, could we do this a lot faster?
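
A rough sketch of the named-colors idea, assuming each node carries a character key for the lists it is significant in. All names and colors here are made up for illustration.

```r
# Hypothetical data: membership key for each node (which lists it is significant in)
node_membership <- c(n1 = "list1", n2 = "list1,list2", n3 = "list2", n4 = "list1")

# Named vector of colors, one entry per combination we want to display
combination_colors <- c("list1" = "#FF0000",
                        "list2" = "#0000FF",
                        "list1,list2" = "#800080")

# A single vectorized lookup by name replaces matching each combination in turn
node_colors <- combination_colors[node_membership]
names(node_colors) <- names(node_membership)
```

Indexing a named vector by name is one vectorized call, so no explicit loop over combinations is needed.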

unique enrichment combinations

For both the colors and pie charts, we should be working with only the unique combinations of enriched annotations. This should be possible from the significance matrix.
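
A small sketch of getting the unique combinations from a significance matrix, assuming a logical matrix with annotations as rows and gene lists as columns. The data and the `row_key` helper are hypothetical.

```r
# Hypothetical significance matrix: rows = annotations, columns = gene lists
sig_matrix <- matrix(
  c(TRUE,  FALSE,
    TRUE,  TRUE,
    TRUE,  FALSE,
    FALSE, TRUE,
    TRUE,  TRUE),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("term", 1:5), c("list1", "list2"))
)

# unique() on a matrix drops duplicated rows, leaving one row per combination
unique_combinations <- unique(sig_matrix)

# Map every annotation back to its combination, so a color or pie chart
# only has to be generated once per combination
row_key <- function(m) apply(m, 1, function(x) paste(as.integer(x), collapse = ""))
combination_index <- match(row_key(sig_matrix), row_key(unique_combinations))
```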

create test data for v2

We need an appropriate data-set for testing all aspects of the workflow for v2.

To do this we should have two gene lists with 100 genes as the universe and 20 genes as diff, plus various annotations with sizes of 2 - 100, so we can apply different cutoffs to the annotation graph and try all of the different options.
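
One possible way to generate such a data set in base R. The gene ids, the seed, and the particular annotation sizes are arbitrary choices for illustration.

```r
set.seed(1234)

# universe of 100 genes shared by both lists (ids are made up)
universe <- paste0("gene", 1:100)

# two lists of 20 differential genes each
diff_list1 <- sample(universe, 20)
diff_list2 <- sample(universe, 20)

# annotations with sizes spread from 2 to 100 (the sizes are an arbitrary choice)
annotation_sizes <- c(2, 5, 10, 20, 50, 100)
annotations <- lapply(annotation_sizes, function(n) sample(universe, n))
names(annotations) <- paste0("annotation", seq_along(annotation_sizes))
```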

set individual cutoffs

Make it possible for the user to set different statistical cutoffs for each experiment if desired, probably by having them supply a named argument that identifies the experiment.
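
A sketch of how named cutoffs could be resolved, assuming p-value cutoffs and a default of 0.05. The helper `resolve_cutoffs` and the experiment names are hypothetical.

```r
# Hypothetical helper: resolve one p-value cutoff per experiment.
# `cutoffs` is either a single unnamed value used everywhere, or a named
# vector whose names identify the experiments it overrides.
resolve_cutoffs <- function(experiments, cutoffs = 0.05, default = 0.05) {
  if (is.null(names(cutoffs))) {
    if (length(cutoffs) != 1) {
      stop("unnamed cutoffs must be a single value")
    }
    return(setNames(rep(cutoffs, length(experiments)), experiments))
  }
  unknown <- setdiff(names(cutoffs), experiments)
  if (length(unknown) > 0) {
    stop("unknown experiments: ", paste(unknown, collapse = ", "))
  }
  resolved <- setNames(rep(default, length(experiments)), experiments)
  resolved[names(cutoffs)] <- cutoffs
  resolved
}

# only the second experiment gets a stricter cutoff
resolved <- resolve_cutoffs(c("exp1", "exp2"), cutoffs = c(exp2 = 0.01))
```

Rejecting unknown names early means a typo in an experiment name fails loudly instead of silently falling back to the default.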

color schemes

Need three ways to do color schemes:

  • Pie - default; create a pie graph where a faded color means not significant. Need a pie chart image for each combination of significant / not-significant.
  • Solid - a solid color is defined for each of the significant patterns.
  • Bar - like the pie, but use a bar graph instead.

build combined_enrichments incrementally according to user prefs

Instead of trying to have a single function that takes all the arguments, or having derived objects from the combined_enrichments class, why not instead just incrementally build?

This means doing the initial combining with just the enriched_results and the combined_statistics, and then building the graph, getting the significant_statistics, etc. We can then go back at almost any step and redo it if we want, or easily modify things and see how they change. Each function that takes the combined_enrichment object will return that same object with new slots filled in.

We have already done this with the generate_annotation_graph function.

We still have to do it with the significant_annotation function.

We will also need an add_info_2_graph function, and other functions that do this. I think this is probably the cleanest way to go about it while having something that looks reasonably right.
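
The incremental pattern could look roughly like this. This is only a sketch that uses a plain list in place of the combined_enrichment class. Both `_sketch` functions and the p-values are hypothetical.

```r
# Hypothetical sketch: a plain list stands in for the combined_enrichment class
combine_enrichments_sketch <- function(...) {
  list(enriched = list(...), graph = NULL, significant = NULL)
}

# each step takes the object and returns it with one more element filled in
add_significant_sketch <- function(combined, p_cutoff = 0.05) {
  combined$significant <- lapply(combined$enriched,
                                 function(p) names(p)[p <= p_cutoff])
  combined
}

combined <- combine_enrichments_sketch(list1 = c(term1 = 0.01, term2 = 0.2),
                                       list2 = c(term1 = 0.03, term2 = 0.04))
combined <- add_significant_sketch(combined)

# redo a single step with a different cutoff without rebuilding everything
stricter <- add_significant_sketch(combined, p_cutoff = 0.02)
```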

always use matrix rep of significant lists

When working with which lists a term was found to be significant in, we should always work with the matrix representation (rows = terms, columns = lists). This would make it easy to generate the pie graph visualization. The solid visualization would be easy too: for each list combination, either generate the logical vector representing that combination, or generate a character vector to match against.

Note about namespace

NOTE
Packages listed in more than one of Depends, Imports, Suggests, Enhances:
‘BiocGenerics’ ‘Biobase’ ‘AnnotationDbi’ ‘Category’ ‘GSEABase’ ‘hwriter’ ‘colorspace’ ‘graph’

pie charts

Generate the pie charts that will be used for visualization in Cytoscape.

check v2 graph with original

Some informal testing has highlighted that the graphs generated by v2 and the original are not the same size. We need to do some heavy testing to find out why, and figure out which is correct.

summarize comparisons

As an output of setting significant cutoffs, one summary should be how many significant annotations are in each group of significant annotations. This should be output directly, either when setting significance levels or when assigning nodes; probably when setting significance levels.
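
A sketch of such a summary in base R, assuming a logical significance matrix with annotations as rows and lists as columns. The data and the "none" label are made up.

```r
# Hypothetical significance matrix (rows = annotations, columns = gene lists)
sig_matrix <- cbind(
  list1 = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  list2 = c(FALSE, TRUE, TRUE, TRUE, FALSE)
)
rownames(sig_matrix) <- paste0("term", 1:5)

# label each annotation by the lists it is significant in
group_label <- apply(sig_matrix, 1, function(x) {
  if (any(x)) paste(colnames(sig_matrix)[x], collapse = ",") else "none"
})

# number of annotations in each group
group_counts <- table(group_label)
```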

profiling graph calculations

A lot of work went into the v2 graph similarity calculations and making them able to take advantage of parallel processing. However, it appears that they are not really that much faster than v1. These need to be profiled to see if there are ways to speed them up substantially.

annotation_combinations should take a logical matrix

We should be able to pass annotation_combinations a logical matrix that defines the allowed combinations, instead of defining them all automatically. This is important because some combinations may not make sense. We may also want two parameters, valid and not_valid, because it may be easier for a user to define the combinations that are not valid rather than all the valid ones.
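
A sketch of what filtering by valid / not_valid could look like. The function `filter_combinations` is hypothetical, and the real annotation_combinations signature may differ. Both arguments are assumed to be logical matrices with one row per combination and the same columns as the significance matrix.

```r
# Hypothetical sketch, not the package API.
# sig_matrix: logical matrix, rows = annotations, columns = lists
# valid / not_valid: logical matrices, one row per combination
filter_combinations <- function(sig_matrix, valid = NULL, not_valid = NULL) {
  key <- function(m) apply(m, 1, function(x) paste(as.integer(x), collapse = ""))
  combos <- unique(sig_matrix)
  rownames(combos) <- NULL
  if (!is.null(valid)) {
    combos <- combos[key(combos) %in% key(valid), , drop = FALSE]
  }
  if (!is.null(not_valid)) {
    combos <- combos[!(key(combos) %in% key(not_valid)), , drop = FALSE]
  }
  combos
}

sig_matrix <- rbind(c(TRUE, FALSE), c(TRUE, TRUE), c(FALSE, TRUE), c(TRUE, TRUE))
colnames(sig_matrix) <- c("list1", "list2")

# exclude the combination where both lists are significant
allowed <- filter_combinations(sig_matrix,
                               not_valid = rbind(c(list1 = TRUE, list2 = TRUE)))
```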

easily get list associations

The listMemberships that are currently stored as an annotation of the nodes of the graph in the ccCompareResult object should be available as a separate object.

enriched_result constructor

Need a constructor that returns an instance of enriched_result and does some basic sanity checks on the structures within.
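
A sketch of a checking constructor using S4. The class name `enriched_result_sketch`, its slots, and the specific checks are assumptions for illustration. The actual enriched_result class may be structured differently.

```r
library(methods)

# Hypothetical, simplified class: the slots in the package may differ
setClass("enriched_result_sketch",
         representation(features = "character",
                        universe = "character",
                        annotation = "list",
                        statistics = "data.frame"))

# Constructor that runs basic sanity checks before creating the object
enriched_result_sketch <- function(features, universe, annotation, statistics) {
  if (anyDuplicated(features) > 0) stop("features contains duplicates")
  if (!all(features %in% universe)) stop("all features must be in the universe")
  if (is.null(names(annotation))) stop("annotation must be a named list")
  if (!all(rownames(statistics) %in% names(annotation))) {
    stop("statistics has rows for unknown annotations")
  }
  new("enriched_result_sketch", features = features, universe = universe,
      annotation = annotation, statistics = statistics)
}

res <- enriched_result_sketch(
  features = c("g1", "g2"),
  universe = paste0("g", 1:10),
  annotation = list(a1 = c("g1", "g3")),
  statistics = data.frame(p = 0.01, row.names = "a1")
)
```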

ANY: create easy annotation generation methods for params objects

Instead of depending on the Category package to get the proper annotations, we need to have a general example of getting the annotations. For example, using an EntrezGene database (e.g. org.Hs.eg.db), we could generate the GO annotation this way:

library(org.Hs.eg.db)  # provides org.Hs.egGO2ALLEGS
allGO <- as.list(org.Hs.egGO2ALLEGS)

# filter, and get description

Supplying this as an annotation object to the params creator would then allow it to filter the annotations down to just those that have genes from the gene universe. This would also be rather generic: it should work for EG, probes, yeast, and other weird things like Drosophila and Arabidopsis. All that is needed is the database file and the "GO2ALL" method. Or the select method, as shown here

bug in current code getting annotation id

In .generate_annotation_graph, at comb_enrichment@annotation@annotation_features[keep_features], there are some keep_features from the annotation_id that are not present in the annotation_features from the statistics data. From my current understanding of the code, this should be impossible, as one is derived from the other, and the statistics derived one should not have anything that is not present in the combined one.

So we need to figure out what exactly is going on here.

As an example, in the current test code, "GO:0022402" is in the annotation_id slot for statistics, but not in the named list of stuff for annotation_features.

Redesign: Enrichment

After having a decent model for comparisons, we should support basic enrichment using hypergeometric and GSEA type calculations, as well as conversion from other tools.

We need a simple container with the

  • objects
  • universe
  • annotation

Check for duplicates in all the lists, and the annotation lists.

Do hypergeometric or GSEA (using romer) enrichment. Generate a container appropriate for use in comparison.
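
For the hypergeometric case, a minimal sketch using `phyper` from base R's stats package. The helper `hypergeom_enrichment`, its arguments, and the toy data are hypothetical. It performs a one-sided over-representation test and leaves out multiple-testing correction.

```r
# Hypothetical helper: one-sided hypergeometric test for over-representation.
# features: genes of interest; universe: all measured genes
# annotation: named list, annotation id -> genes
hypergeom_enrichment <- function(features, universe, annotation) {
  # intersect() also removes duplicates
  features <- intersect(features, universe)
  stats <- lapply(annotation, function(genes) {
    genes <- intersect(genes, universe)
    n_overlap <- length(intersect(genes, features))
    # P(X >= n_overlap) when drawing length(features) genes from the universe
    p <- phyper(n_overlap - 1, length(genes), length(universe) - length(genes),
                length(features), lower.tail = FALSE)
    data.frame(size = length(genes), overlap = n_overlap, p = p)
  })
  do.call(rbind, stats)
}

universe <- paste0("g", 1:100)
features <- paste0("g", 1:10)
annotation <- list(enriched = paste0("g", 1:8), background = paste0("g", 50:59))
result <- hypergeom_enrichment(features, universe, annotation)
```

Passing `n_overlap - 1` with `lower.tail = FALSE` gives the probability of an overlap at least as large as the one observed, rather than strictly larger.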

graph creation crashes with one or fewer annotations

When creating a graph of the enriched annotations, there is currently no check of how many entries are in the annotation list. As it uses a simple double for loop, it will die with a very uninformative message if there are one or zero total enriched annotations. This needs to be fixed, especially for cases of examining the results from only a single gene list.
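
A sketch of the kind of guard that could go in front of the pairwise calculation. The function `annotation_overlaps` is hypothetical, and the overlap coefficient is used here only as an assumed example of a similarity measure.

```r
# Hypothetical sketch: pairwise overlap between annotations,
# guarded for the case of fewer than two annotations
annotation_overlaps <- function(annotation_features) {
  n <- length(annotation_features)
  if (n < 2) {
    warning("fewer than two annotations, returning an empty edge table")
    return(data.frame(from = character(0), to = character(0),
                      overlap = numeric(0)))
  }
  # all pairs of annotation names, one pair per column
  pairs <- combn(names(annotation_features), 2)
  overlap <- apply(pairs, 2, function(p) {
    a <- annotation_features[[p[1]]]
    b <- annotation_features[[p[2]]]
    length(intersect(a, b)) / min(length(a), length(b))
  })
  data.frame(from = pairs[1, ], to = pairs[2, ], overlap = overlap)
}

edges <- annotation_overlaps(list(a = c("g1", "g2"), b = c("g2", "g3"), c = "g4"))
single <- suppressWarnings(annotation_overlaps(list(a = c("g1", "g2"))))
```

Returning an empty edge table with the usual columns lets downstream code handle the single-list case without special branches.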

possible workflows for graph / table generation

There are two potential workflows for generating the graph and tables:

  1. Associate with the combined_enrichment object directly
  2. Have independent objects that use combined_enrichment for generation, and tables or graphs exist independently, allowing user to select which output they want
