bimsbbioinfo / rcas

R package for the RNA Centric Annotation System (RCAS)

rcas's Introduction

RCAS project

Introduction

RCAS is an R/Bioconductor package designed as a generic reporting tool for the functional analysis of transcriptome-wide regions of interest detected by high-throughput experiments. Such transcriptomic regions could be, for instance, signal peaks detected by CLIP-Seq analysis for protein-RNA interaction sites, RNA modification sites (alias the epitranscriptome), CAGE-tag locations, or any other collection of query regions at the level of the transcriptome. RCAS produces in-depth annotation summaries and coverage profiles based on the distribution of the query regions with respect to transcript features (exons, introns, 5’/3’ UTR regions, exon-intron boundaries, promoter regions). Moreover, RCAS can carry out functional enrichment analyses and discriminative motif discovery. RCAS supports all genome versions that are listed by BSgenome::available.genomes().

installation:

Installing from Bioconductor

if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")

BiocManager::install('RCAS')

Installing the development version from Github

library('devtools')
devtools::install_github('BIMSBbioinfo/RCAS')

Installing via Bioconda channel

conda install bioconductor-rcas -c bioconda

Installing via Guix

guix package -i r r-rcas

usage:

Package vignettes and reference manual

For detailed instructions on how to use RCAS, please see:

Use cases from published RNA-based omics datasets

Multi-sample analysis use case

Single Sample Analysis Use Cases

Citation

In order to cite RCAS, please use:

Bora Uyar, Dilmurat Yusuf, Ricardo Wurmus, Nikolaus Rajewsky, Uwe Ohler, Altuna Akalin; RCAS: an RNA centric annotation system for transcriptome-wide regions of interest. Nucleic Acids Res 2017 gkx120. doi: 10.1093/nar/gkx120

See our publication here.

Acknowledgements

RCAS is developed in the group of Altuna Akalin (head of the Scientific Bioinformatics Platform) by Bora Uyar (Bioinformatics Scientist), Dilmurat Yusuf (Bioinformatics Scientist) and Ricardo Wurmus (System Administrator) at the Berlin Institute of Medical Systems Biology (BIMSB) at the Max-Delbrueck-Center for Molecular Medicine (MDC) in Berlin.

RCAS is developed as a bioinformatics service as part of the RNA Bioinformatics Center, which is one of the eight centers of the German Network for Bioinformatics Infrastructure (de.NBI).

rcas's People

Contributors

borauyar, dyusuf, hpages, jwokaty, link-ny, lshep, nturaga, rekado, vobencha

rcas's Issues

Error with `runReport`

Hi!

I'm trying to get an annotation summary for an m6A peak-calling result. My BED file is named "Mod.bed".
I ran the following code but got an error that I can't explain:

library(RCAS)
hg19_mod_path <- "/path/to/Mod.bed"
gff_path <- "/path/to/GRCh37_RefSeq_24.gff"
runReport( queryFilePath = hg19_mod_path,
           gffFilePath = gff_path,
           motifAnalysis = FALSE,
           goAnalysis = FALSE )

However, I can get enriched motifs without any problem by running the following code:

queryRegions <- importBed(filePath = hg19_mod_path, sampleN = 10000)
gff <- importGtf(filePath = "/path/to/GRCh37_RefSeq_24.gff")
motifResults <- runMotifDiscovery(queryRegions = queryRegions, 
                                  resizeN = 15, sampleN = 10000,
                                  genomeVersion = 'hg19', motifWidth = 5,
                                  motifN = 3, nCores = 5)
ggseqlogo::ggseqlogo(motifResults$matches_query)
summary <- getMotifSummaryTable(motifResults)
knitr::kable(summary)

Did I do anything wrong in how I used the code? Is there any way to avoid the error in runReport and get the annotation summary?

suggestions

I really love this tool and the output it gives, even for users like me.

From an RBP researcher's point of view, I have a few small suggestions:

  • Normalization of overlaps, as is done in HOMER (I think it normalizes to the total length of the feature)

  • Motif and GO term detection based on features, so that, for example, you can search for motifs only in 3' UTRs and introns and annotate only the GO terms related to those features

  • Maybe, as a nice little add-on to motif discovery, you could implement secondary structure prediction tools

However, this is a really nice tool and I hope my suggestions are useful!

Best,

Deniz
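
The first suggestion above (normalizing overlaps by feature length) can be illustrated with a minimal base-R sketch. The counts and lengths below are toy numbers invented for illustration, the feature names are placeholders, and this is not how RCAS or HOMER necessarily computes the value; it only shows the arithmetic of "overlaps per kilobase of feature".

```r
# Toy inputs (hypothetical): number of query regions overlapping each
# feature type, and the total length of that feature type in base pairs.
overlap_counts    <- c(utr5 = 20,  cds = 300,   utr3 = 450,   intron = 900)
feature_length_bp <- c(utr5 = 2e4, cds = 1.5e5, utr3 = 1.5e5, intron = 3e6)

# Overlaps per kilobase of feature: raw count divided by total length.
overlaps_per_kb <- overlap_counts / feature_length_bp * 1000

# Feature with the highest raw count versus the highest normalized density.
top_raw        <- names(which.max(overlap_counts))
top_normalized <- names(which.max(overlaps_per_kb))

print(overlaps_per_kb)
```

In this toy case introns have the most overlaps in absolute terms, but 3' UTRs have the highest density once length is taken into account, which is the kind of difference the suggestion is aiming at.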

Increased variety of genomic annotations and genome versions

We plan to keep adding more and more useful plots and tables that can enrich the biological context of the given input datasets. For instance, plots that show the distribution of mutations and polymorphisms in relation to the input transcript segments could provide useful insights in prioritizing potential targets for follow-up studies. Moreover, we would like to provide support for an expanded collection of species (e.g. other eukaryotic model organisms such as Zebrafish) and additional genome builds for the currently supported species (e.g. hg38 for human, mm10 for mouse).

meta-analysis capability

As of RCAS version 1.1.1, the package is designed to prepare reports for one input dataset at a time. If a user wanted to do comparative analysis of multiple experimental datasets (for instance, to detect differences between case-control conditions), they would need to run the runReport function multiple times (once for each condition and setting the ‘printProcessedTables’ argument to TRUE). Using ‘printProcessedTables’ argument would enable the user to get the processed data for each experiment that can later be used for down-stream case-control comparisons. However, this would require additional scripting, which may not be ideal for non-programmer users. Therefore, in our next major release with Bioconductor 3.5, we plan to integrate functions that will enable meta-analysis of two or more input datasets at the same time to save the users from the need to do additional programming.
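
The per-condition workflow described above could be scripted roughly as follows. This is a sketch under assumptions: the file paths and condition names are placeholders, and `outDir` is assumed to be accepted by `runReport` in the installed RCAS version (check `?runReport`). The call is only attempted when RCAS is installed and the input files exist.

```r
# Hypothetical inputs: replace with real BED and GFF paths.
conditions <- c(case = "/path/to/case.bed", control = "/path/to/control.bed")
gff_path <- "/path/to/annotation.gff"

# One argument list per condition. printProcessedTables = TRUE asks
# runReport to also write out the processed tables behind the report.
report_args <- lapply(names(conditions), function(cond) {
  list(queryFilePath = conditions[[cond]],
       gffFilePath = gff_path,
       outDir = file.path(getwd(), cond),  # assumed argument, see lead-in
       printProcessedTables = TRUE)
})
names(report_args) <- names(conditions)

# Run once per condition, only if RCAS and all input files are available.
if (requireNamespace("RCAS", quietly = TRUE) &&
    all(file.exists(c(conditions, gff_path)))) {
  for (cond in names(report_args)) {
    dir.create(report_args[[cond]]$outDir, showWarnings = FALSE)
    do.call(RCAS::runReport, report_args[[cond]])
  }
}
```

The tables written for each condition can then be read back and compared in a separate script.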

Error in previously working function getFeatureBoundaryCoverage

Dear RCAS team,

First, thanks for creating the RCAS package, I use it frequently to interrogate my CLIP datasets, and it's extremely useful.

Recently, when I try to run the function getFeatureBoundaryCoverage, I get the following error:
invalid class "ScoreMatrix" object: superclass "mMatrix" not defined in the environment of the object's class

This happens in a code chunk that worked previously. I could not work out the cause, and it does not seem to depend on my datasets.
Any idea what could be happening?

Many thanks.

Best regards,
Raul

More fine-tuned control on the generated HTML reports

Another planned revision on the package is to provide users with more fine-tuned control over the reports the runReport function generates. Currently, users can turn on/off certain analysis modules, but they do not have control over which plots/tables are generated in each module. We believe that providing such a flexibility would be a useful feature for non-programmer users, especially considering that the package will keep expanding with more variety of plots and tables in each module.

potential bug in runMotifRG

runMotifRG doesn't report any motifs when nCores is set to '1', while it can reproducibly find the same motifs when nCores is set to 2 or more.

non-unique values when setting 'row.names': ‘Ighv1-13’, ‘Ighv5-8’

Hi,
When I use RCAS to process multiple samples:

WT_rep1_path <-"D:/WT_rep1.bed"
WT_rep2_path <-"D:/WT_rep2.bed"
WT_rep3_path <-"D:/WT_rep3.bed"
KO_rep1_path <-"D:/KO_rep1.bed"
KO_rep2_path <-"D:/KO_rep2.bed"
KO_rep3_path <-"D:/KO_rep3.bed"
projData <- data.frame('sampleName' = c('WT_1', 'WT_2', 'WT_3', 'KO_1', 'KO_2', 'KO_3'),
                       'bedFilePath' = c(WT_rep1_path, WT_rep2_path, WT_rep3_path,
                                         KO_rep1_path, KO_rep2_path, KO_rep3_path),
                       stringsAsFactors = FALSE)
projDataFile <- "D:/myProjDataFile.tsv"
write.table(projData, projDataFile, sep = '\t', quote =FALSE, row.names = FALSE)
gtfFilePath <- "D:/Mus_musculus.GRCm38.102.gtf"
databasePath <-"D:/myProject.sqlite"
createDB(dbPath = databasePath, projDataFile = projDataFile, gtfFilePath = gtfFilePath, genomeVersion = 'mm10',update = TRUE,motifAnalysis = FALSE)

It shows this error:
Importing GTF annotations
importing gtf file from D:/Mus_musculus.GRCm38.102.gtf
Keeping standard chromosomes only
File D:/Mus_musculus.GRCm38.102.gtf.granges.rds already exists.
Use overwriteObjectAsRds = TRUE to overwrite the file
Parsing transcript features
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02m 50s
Saving interval datasets in 'bedData' table
Calculating annotation summaries
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02m 28s
Saving annotation summaries in 'annotationSummaries' table
Running function: getIntervalOverlapMatrix for tablegeneOverlaps
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=14s
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘Ighv1-13’, ‘Ighv5-8’

I think my row names are unique, so why does this happen?
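
The names in the warning (‘Ighv1-13’, ‘Ighv5-8’) are mouse gene symbols rather than sample names, so one possible reading is that the duplicates come from gene names in the annotation and not from the project table. That is an assumption, not a confirmed diagnosis. The base-R sketch below uses a toy data frame (all values invented) to reproduce the same class of error and to show how duplicated names can be located and made unique.

```r
# Toy gene names containing the two duplicated symbols from the warning.
gene_names <- c("Ighv1-13", "Ighv5-8", "Ighv1-13", "Ighv5-8", "Actb")

# Which names occur more than once?
dup_names <- unique(gene_names[duplicated(gene_names)])

# Toy overlap matrix with one row per gene entry (values are invented).
overlaps <- data.frame(WT_1 = c(1, 0, 1, 0, 1), KO_1 = c(0, 1, 1, 0, 1))

# Using duplicated names as row names fails, as in the reported error.
attempt <- tryCatch(
  suppressWarnings(`rownames<-`(overlaps, gene_names)),
  error = function(e) e
)

# make.unique() appends a numeric suffix to the repeated names.
rownames(overlaps) <- make.unique(gene_names)
print(rownames(overlaps))
```

Running the duplicate check on the gene names in the GTF would show whether the annotation is the source of the repeated symbols.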

Issue with the `runReport` function.

Thank you for providing this tool. However, I have come across a small issue.
The following part of the code in the runReport function causes an error due to the changed structure of the BSgenome package.

  db <- checkSeqDb(genomeVersion)
  # get species name 
  # this is needed for gprofiler functional enrichment 
  fields <- unlist(strsplit(db@organism, ' '))

which gives the error

Loading required package: rtracklayer
Error in strsplit(db@organism, " ") : 
  no slot of name "organism" for this object of class "BSgenome"

This can be fixed by updating the line to

  db <- checkSeqDb(genomeVersion)
  # get species name 
  # this is needed for gprofiler functional enrichment 
  fields <- unlist(strsplit(db@metadata$organism, ' '))

I am not familiar with the pull request policy, so I am raising it as an issue here.
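
A lookup that tolerates both object layouts might look like the sketch below. The two classes are mock stand-ins defined only for this example (they are not the real BSgenome class), and `get_organism` is a hypothetical helper. With real BSgenome objects, an exported accessor such as `organism()` may be preferable to direct slot access where it is available.

```r
library(methods)

# Mock classes (hypothetical) imitating the old and new layouts:
# the old one has an 'organism' slot, the new one keeps it in 'metadata'.
OldGenome <- setClass("OldGenome", representation(organism = "character"))
NewGenome <- setClass("NewGenome", representation(metadata = "list"))

# Hypothetical helper: try the old slot first, then the metadata list.
get_organism <- function(db) {
  if (.hasSlot(db, "organism")) {
    return(db@organism)
  }
  if (.hasSlot(db, "metadata") && !is.null(db@metadata$organism)) {
    return(db@metadata$organism)
  }
  stop("could not find an organism entry on this object")
}

old_db <- OldGenome(organism = "Homo sapiens")
new_db <- NewGenome(metadata = list(organism = "Mus musculus"))

# Same splitting step as in the quoted runReport code.
fields_old <- unlist(strsplit(get_organism(old_db), " "))
fields_new <- unlist(strsplit(get_organism(new_db), " "))
```

Checking for the slot before reading it keeps the code working across both layouts instead of tying it to one package version.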
