ritan's Introduction

RITAN

RITAN is an R package for Rapid Integration of Term Annotation and Network resources.

RITAN brings together multiple resources, primarily to help answer two questions:

  1. "What functions are achieved by this group of genes?"
  2. "How do these genes work together to achieve that function?"

Specifically, RITAN has been written to:

  1. Identify terms/pathways/genesets represented by (enriched within) the input list of genes
  2. Identify network interactions among the genes within the input list (protein-protein, metabolic, and regulatory)
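
As a rough illustration of what "enriched within" means in item 1, the sketch below runs a one-sided hypergeometric test for a single term in base R. It is a minimal sketch under stated assumptions, not RITAN's own code: the function name `enrichment_p` and all gene symbols are hypothetical, and the background is assumed to be a vector of unique symbols.

```r
# Sketch: one-sided hypergeometric enrichment test for one term.
# enrichment_p() and the gene symbols are illustrative assumptions.
enrichment_p <- function(study_genes, term_genes, background) {
  # Restrict both sets to the background so the counts are comparable
  study_genes <- intersect(unique(study_genes), background)
  term_genes  <- intersect(unique(term_genes), background)
  k <- length(intersect(study_genes, term_genes)) # study genes annotated to the term
  m <- length(term_genes)                         # term genes in the background
  n <- length(background) - m                     # background genes outside the term
  # P(X >= k) when drawing length(study_genes) genes without replacement
  phyper(k - 1, m, n, length(study_genes), lower.tail = FALSE)
}

background  <- paste0("G", 1:100)          # toy background of 100 genes
term_genes  <- paste0("G", 1:10)           # toy term with 10 genes
study_genes <- paste0("G", c(1:5, 50:54))  # 5 of 10 study genes hit the term
p <- enrichment_p(study_genes, term_genes, background)
print(p)
```

A small p-value here says the study list contains more of the term's genes than a random draw of the same size from the background would be expected to contain.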

We have also built RITAN to facilitate research questions such as:

  1. What terms/pathways share similar genes?
  2. How do two resources' definitions of each pathway compare? List their common and unique sets.
  3. From my genes of interest, use network resources to add genes that connect them together or are neighbors.
  4. Annotate a combined list of genes with the multiple functions each gene contributes to.
  5. Generate a heatmap for pathway/geneset -by- condition; for each timepoint, treatment, etc., show how term enrichment or pathway activity changes.
  6. Simplify linking these results to other tools, such as Cytoscape, by generating a focused and annotated subnetwork.
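
Question 2 in the list above (comparing two resources' definitions of a pathway) comes down to set operations. The following base-R sketch shows the idea; `compare_definitions` and the two toy gene lists are hypothetical and do not come from any real resource.

```r
# Sketch: compare two definitions of the same pathway.
# compare_definitions() and both gene lists are illustrative assumptions.
compare_definitions <- function(a, b) {
  list(
    common  = sort(intersect(a, b)),  # genes present in both definitions
    only_a  = sort(setdiff(a, b)),    # genes unique to the first definition
    only_b  = sort(setdiff(b, a)),    # genes unique to the second definition
    jaccard = length(intersect(a, b)) / length(union(a, b))
  )
}

res_a <- c("TP53", "MDM2", "CDKN1A", "ATM")  # toy definition from resource A
res_b <- c("TP53", "ATM", "CHEK2")           # toy definition from resource B
cmp <- compare_definitions(res_a, res_b)
str(cmp)
```

The Jaccard index is included only as one convenient summary of similarity; other overlap measures would serve equally well.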

RITAN is to be applied to an unranked list of gene symbols, and we perform false-discovery adjustment across all resources used. These and other topics are discussed in our publication:

  • Zimmermann, M.T. et al. RITAN: Rapid Integration of Term Annotation and Network Resources. in review.
  • "We encourage users to decide upon an analysis strategy prior to running their analysis, including consideration of which resources are most appropriate for their dataset/experiment, statistical significance thresholds, background geneset, etc. The ease with which RITAN facilitates multi-resource query may lead users to 'try one more test,' leading to an increased number of hypothesis tests made that may not be accounted for by multiple testing correction. To prevent this, we encourage users to 'include one more resource' - to add an additional resource to a single query in RITAN so that false-discovery correction is appropriately maintained."
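
To make the "include one more resource" advice concrete, the base-R sketch below compares adjusting p-values separately per resource against adjusting them once over the pooled set. The resource names, term names, and p-values are made up, and Benjamini-Hochberg via `p.adjust` is used here as an assumption about the correction method.

```r
# Sketch: per-resource vs. pooled false-discovery adjustment.
# All names and p-values below are invented for illustration.
p_by_resource <- list(
  ResourceA = c(term1 = 0.001, term2 = 0.020, term3 = 0.400),
  ResourceB = c(term4 = 0.004, term5 = 0.030)
)

# Adjusting each resource on its own ignores the tests made in the other
per_resource <- unlist(lapply(p_by_resource, p.adjust, method = "BH"))

# Adjusting once across all resources accounts for every test performed
pooled <- p.adjust(unlist(p_by_resource), method = "BH")

print(round(cbind(per_resource, pooled), 4))
```

In this toy example the pooled q-values are never smaller than the per-resource ones, which is the extra stringency that separate queries would silently skip.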

For additional examples of using RITAN, see our vignettes:

  • vignette('enrichment',package='RITAN')
  • vignette('subnetworks',package='RITAN')
  • vignette('choosing_resources',package='RITAN')
  • vignette('resource_relationships',package='RITAN')
  • vignette('multi_tissue_analysis',package='RITAN')

For users interested in using RITAN, but who do not use R

We have made a standalone, self-contained executable available here: https://mcw.box.com/s/bzmt12bgj2uygvcw6bk0hpg4p4w7nm2g

To use the app, download the .zip file, extract it, and double-click the file RITAN-electon.exe.

[Figures: "Format for Group Comparison" and "Expected Result"]

The standalone app uses Electron (Chromium) and a standalone R, packaged together using Node.js.
Thank you to https://github.com/ksasso/Electron_ShinyApp_Deployment for showing how to combine Shiny, RPortable, and Electron.


Bringing More Data into RITAN

I keep all of my setup in a single R script. That way, I can simply source('~/setup_RITAN.R') and I'm ready to go. If you use RITAN frequently, I suggest adding the source() command to your .Rprofile. Below are excerpts from my setup script.
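
One way the .Rprofile suggestion could look is sketched below. The helper `source_if_present` is a hypothetical name, and the `interactive()` guard is an assumption added so that batch jobs do not pay the loading cost; adjust or drop it to taste.

```r
# Sketch of lines that could go in ~/.Rprofile.
# source_if_present() is an illustrative helper, not part of RITAN.
source_if_present <- function(path) {
  path <- path.expand(path)
  if (!file.exists(path)) {
    message("Setup script not found: ", path)
    return(invisible(FALSE))
  }
  source(path)  # evaluated in the global environment by default
  invisible(TRUE)
}

# Only load the (potentially slow) setup in interactive sessions
if (interactive()) {
  source_if_present("~/setup_RITAN.R")
}
```

Checking that the file exists keeps R from failing at startup on a machine where the setup script has not been copied yet.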

To honor data redistribution policies, we provide certain annotation and network data in the RITANdata package. RITAN is most useful when many resources are organized for use. In my lab, we have many genesets, gene panels, pathway resources, gene regulatory networks, etc. organized and loaded into RITAN.

As a middle ground, we provide here recommendations for downloading other resources and excerpts from my resource loading script for adding them to RITAN. Eventually, we plan to add a setup() function to RITAN that would automate this process. For now, we hope the examples below are helpful to users. We welcome feedback on the package and on how to work more efficiently with existing open-source solutions.

require(RITANdata)
require(RITAN)
require(magrittr) # provides the %>% pipe used below

## -------------------------------- -
## Set a couple of utility functions

apath <- '.' # annotation data path - script assumes annotation data are within a common path

csv_split <- function(x){ sort(unique(strsplit( gsub( ' ', '', x ) ,",")[[1]] )) }

read_dlm_as_sif <- function( f = NA, sep = "\t",
                             label_column = 1, gene_column = 2, ...){

  warning(sprintf('Reading delimited file into SIF format.\nAssuming labels are in column %d and genes are in column %d.\n', label_column, gene_column))

  d <- read.table( file=f, sep=sep, ... )
  l <- unique( d[ , label_column ] )
  sif <- list()
  for (n in l){
    i <- ( d[ , label_column ] == n )
    sif[[n]] <- d[ i, gene_column ] %>% unique() %>% sort()
  }

  return(sif)

}

## -------------------------------- -
## Add additional networks to RITAN

network_list$RegNet  <- readSIF( paste(apath, 'RegNetwork/human/human.source.sif', sep="/" ))
network_list$TRRUST  <- readSIF( paste(apath, 'TRRUST/trrust_rawdata.txt', sep="/" ))
network_list$IID     <- readSIF( sprintf('%s/IID/iid.human.2016-03.sif.gz',apath))
network_list$BioPlex <- readSIF( sprintf('%s/BioPlex/BioPlex.sif.gz',apath), et='BioPlex', score=3)
network_list$HPRD    <- readSIF( sprintf('%s/HPRD/HPRD_Release9_062910/BINARY_PROTEIN_PROTEIN_INTERACTIONS.txt', apath), p1=1, p2=4, et=7 )
network_list$BioGRID <- readSIF( sprintf('%s/BioGRID/BIOGRID-ORGANISM-Homo_sapiens-3.4.163.tab2.txt.gz', apath), p1=8, p2=9, et=12, score=19, quote='', skip=1, comment.char='' )
network_list$BioGRID$score <- as.numeric(network_list$BioGRID$score)

## -------------------------------- -
## Add additional genesets to RITAN

CPIC   <- read.csv( 'https://api.pharmgkb.org/v1/download/file/data/cpicPairs.csv', skip=1, header=TRUE, quote='', as.is=TRUE )
COSMIC <- read.table( sprintf("%s/COSMIC/cancer_gene_census_v83_GRCh38.csv",apath), sep=",", header=TRUE, as.is=TRUE)

geneset_list$MSigDB_C6     <- readGMT(sprintf('%s/genesets/c6.all.v6.1.symbols.gmt', apath))
geneset_list$CPIC_LevelABC <- sort(unique(CPIC$Gene[ ! grepl('D', CPIC$CPIC.Level) ])) %>% list(.)
geneset_list$HPO           <- read_dlm_as_sif( sprintf('%s/HPO/ALL_SOURCES_FREQUENT_FEATURES_phenotype_to_genes.2018-05-03.txt.gz', apath), label_column = 1, gene_column = 4, quote = '' )
geneset_list$monarch       <- read_dlm_as_sif( sprintf('%s/monarch/gene_disease.9606.tsv.gz', apath), label_column = 6, gene_column = 2, header = TRUE, quote = '' )
geneset_list$COSMIC        <- sort(unique(as.character( COSMIC$Gene.Symbol ))) %>% list(.)
geneset_list$ACMG56        <- sort(unique(strsplit("BRCA1 BRCA2 TP53 STK11 MLH1 MSH2 MSH6 PMS2 APC MUTYH VHL MEN1 RET PTEN RB1 SDHD SDHAF2 SDHC SDHB TSC1 TSC2 WT1 NF2 COL3A1 FBN1 TGFBR1 TGFBR2 SMAD3 ACTA2 MYLK MYH11 MYBPC3 MYH7 TNNT2 TNNI3 TPM1 MYL3 ACTC1 PRKAG2 GLA MYL2 LMNA RYR2 PKP2 DSP DSC2 TMEM43 DSG2 KCNQ1 KCNH2 SCN5A LDLR APOB PCSK9 RYR1 CACNA1S", " ")[[1]])) %>% list(.)
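
To show the shape of the result that the read_dlm_as_sif() helper above builds (a named list mapping each label to a sorted, unique vector of genes), here is a small base-R sketch on an in-memory table. The labels and gene names are invented, and `split()` is used in place of the original loop purely for illustration.

```r
# Sketch: group a two-column (label, gene) table into a named list.
# The table contents are invented for illustration.
d <- read.table(
  text = "termA\tGENE2\ntermA\tGENE1\ntermB\tGENE3\ntermA\tGENE1",
  sep = "\t", stringsAsFactors = FALSE
)

# Same grouping as the loop in read_dlm_as_sif(), written with split():
# one list element per label, holding its sorted unique genes
gs <- lapply(split(d[[2]], d[[1]]), function(g) sort(unique(g)))
str(gs)
```

Inspecting a new resource with str() in this way before assigning it into geneset_list is a cheap check that the label and gene columns were chosen correctly.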

ritan's People

Contributors

mtzimmermann, nturaga, hpages, lshep, vobencha, link-ny

ritan's Issues

Empty column (sample) for enriched terms

Dear Michael,

Thanks for providing such a nice package for enrichment analysis. I am starting from the vignette tutorial using RITAN data, and I noticed that when using the function

e <- term_enrichment_by_subset( study_set, q_value_threshold = 1e-5, resources = resources, all_symbols = cached_coding_genes)

and then plotting with:
plot( e, show_values = FALSE, label_size_y = 7, label_size_x = 7 )

the first sample (GSE9988_LPS_VS_LOW_LPS_MONOCYTE_UP) gets an empty column. Please look at the figures in https://www.bioconductor.org/packages/release/bioc/vignettes/RITAN/inst/doc/enrichment.html

That also happens with my custom data: one of the samples always gets empty terms, even when setting a very high threshold for q_value. So one of the samples gets all enrichment -log10(p) values as 0.

When finding enriched terms for each gene list separately with term_enrichment() I do get enriched terms. Also, I noticed in Figure 3 of the tutorial, the GSE9988_LPS_VS_LOW_LPS_MONOCYTE_UP sample gets all 0 (zeros).

Could you please provide any guidance on this, or an explanation of why it happens?
Thanks a lot!
Gustavo

min_q parameter

The min_q parameter is available in plot.term_enrichment but not in plot.term_enrichment_by_subset.
