bimsbbioinfo / rcas

R package for the RNA Centric Annotation System (RCAS)

R 94.35% HTML 5.65%

bioconductor interactive-plots par-clip cage rna modification reporting msigdb go protein-rna-interactions

rcas's Issues

calculateCoverageProfileList argument clarification

calculateCoverageProfileList has an argument that can be of class list or GRangesList, and this has to be documented more clearly. At the moment it looks as if a plain list is expected, but a GRangesList should be used when the elements of the list are GRanges objects.

https://github.com/BIMSBbioinfo/RCAS/blob/rcas_R/rpackage/RCAS/R/report_functions.R#L351
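To illustrate the distinction, here is a minimal sketch that assumes the GenomicRanges package is installed and uses hypothetical object names: a plain list whose elements are GRanges objects is coerced with GRangesList() so that it has the class the argument is meant to receive. It does not call calculateCoverageProfileList itself.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Two hypothetical sets of windows, each stored as a GRanges object
win1 <- GRanges("chr1", IRanges(start = c(100, 500), width = 50))
win2 <- GRanges("chr2", IRanges(start = 1000, width = 100))

# A plain list of GRanges objects ...
windowsList <- list(tx1 = win1, tx2 = win2)

# ... coerced into a GRangesList, the class to use when every
# element is a GRanges object (element names are kept)
windows <- GRangesList(windowsList)

is(windowsList, "GRangesList")  # FALSE: still a plain list
is(windows, "GRangesList")      # TRUE
```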

Error with `runReport`

Hi!

I'm trying to get an annotation summary for an m6A peak-calling result. My BED file is named "Mod.bed".
I tried to run the following, but got an error and I don't know why:

library(RCAS)
hg19_mod_path <- "/path/to/Mod.bed"
gff_path <- "/path/to/GRCh37_RefSeq_24.gff"
runReport( queryFilePath = hg19_mod_path,
           gffFilePath = gff_path,
           motifAnalysis = FALSE,
           goAnalysis = FALSE )

However, I have no problem getting enriched motifs by running the following code:

queryRegions <- importBed(filePath = hg19_mod_path, sampleN = 10000)
gff <- importGtf(filePath = "/path/to/GRCh37_RefSeq_24.gff")
motifResults <- runMotifDiscovery(queryRegions = queryRegions, 
                                  resizeN = 15, sampleN = 10000,
                                  genomeVersion = 'hg19', motifWidth = 5,
                                  motifN = 3, nCores = 5)
ggseqlogo::ggseqlogo(motifResults$matches_query)
summary <- getMotifSummaryTable(motifResults)
knitr::kable(summary)

Am I using the code incorrectly? Is there a way to avoid the error in runReport and get the annotation summary?

line length shouldn't exceed 80 characters

This is important for some Bioconductor reviewers; they will comment on it if lines exceed 80 characters.
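As a rough pre-submission check, a base-R sketch like the one below lists every line wider than 80 characters. The helper name findLongLines is hypothetical; linters such as lintr's line_length_linter() cover the same rule.

```r
# Hypothetical helper: report lines longer than maxWidth characters
# in the given source files, one row per offending line.
findLongLines <- function(files, maxWidth = 80) {
  res <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    width <- nchar(lines, type = "width")
    idx <- which(width > maxWidth)
    data.frame(file = rep(f, length(idx)), line = idx,
               width = width[idx], stringsAsFactors = FALSE)
  })
  do.call(rbind, res)
}

# Usage: scan every .R file in a package's R/ directory
# findLongLines(list.files("R", pattern = "\\.R$", full.names = TRUE))
```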

check if input BED file is correctly formatted

If the BED file doesn't follow the specifications or contains incomplete columns (e.g. missing the "name" field), some functions may fail (e.g. queryGff).
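A minimal validation sketch in base R, assuming a tab-separated BED6 file without track or browser header lines. The helper name checkBed and the specific checks are our own choices, not RCAS functions; it returns a character vector of problems, which is empty when the file passes.

```r
# Hypothetical helper: basic sanity checks on a BED file before
# passing it to RCAS. fill = TRUE lets rows with missing columns
# be read so that they can be reported instead of aborting.
checkBed <- function(filePath, minColumns = 6) {
  bed <- read.table(filePath, sep = "\t", header = FALSE, quote = "",
                    fill = TRUE, stringsAsFactors = FALSE)
  problems <- character(0)
  if (ncol(bed) < minColumns) {
    return(sprintf("expected at least %d columns, found %d",
                   minColumns, ncol(bed)))
  }
  start <- suppressWarnings(as.numeric(bed[[2]]))
  end <- suppressWarnings(as.numeric(bed[[3]]))
  if (anyNA(start) || anyNA(end)) {
    problems <- c(problems, "start/end columns must be numeric")
  } else if (any(start < 0 | end <= start)) {
    problems <- c(problems, "each region needs 0 <= start < end")
  }
  # Column 4 is the "name" field mentioned above
  if (any(is.na(bed[[4]]) | bed[[4]] == "")) {
    problems <- c(problems, "name column (4) has missing values")
  }
  if (!all(bed[[6]] %in% c("+", "-", "."))) {
    problems <- c(problems, "strand column (6) must be '+', '-' or '.'")
  }
  problems
}
```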

Error in previously working function getFeatureBoundaryCoverage

Dear RCAS team,

First, thanks for creating the RCAS package. I use it frequently to interrogate my CLIP datasets, and it's extremely useful.

Recently, when I try to run the function getFeatureBoundaryCoverage, I get the following error:
invalid class "ScoreMatrix" object: superclass "mMatrix" not defined in the environment of the object's class

This happens in a code chunk that previously worked. I could not figure out what was happening, and it did not seem to depend on my datasets.
Any idea of what could be happening?

Many thanks.

Best regards,
Raul

Increased variety of genomic annotations and genome versions

We plan to keep adding useful plots and tables that can enrich the biological context of the input datasets. For instance, plots showing the distribution of mutations and polymorphisms relative to the input transcript segments could provide useful insights for prioritizing potential targets for follow-up studies. Moreover, we would like to support a wider range of species (e.g. other eukaryotic model organisms such as zebrafish) and additional genome builds for the currently supported species (e.g. hg38 for human, mm10 for mouse).

Issue with the `runReport` function.

Thank you for providing this tool; however, I have come across a small issue.
The following part of the code in the runReport function causes an error because the structure of the BSgenome package has changed:

  db <- checkSeqDb(genomeVersion)
  # get species name 
  # this is needed for gprofiler functional enrichment 
  fields <- unlist(strsplit(db@organism, ' '))

which gives the error:

Loading required package: rtracklayer
Error in strsplit(db@organism, " ") : 
  no slot of name "organism" for this object of class "BSgenome"

This can be fixed by changing the line to:

  db <- checkSeqDb(genomeVersion)
  # get species name 
  # this is needed for gprofiler functional enrichment 
  fields <- unlist(strsplit(db@metadata$organism, ' '))

I am not familiar with your pull request policy, so I am raising it as an issue here.

RCAS r package should have its own repository

I think this repository is a bit crowded. Ideally, the RCAS R package should have its own repository, but this is up for debate.

More fine-tuned control on the generated HTML reports

Another planned revision of the package is to give users more fine-tuned control over the reports that the runReport function generates. Currently, users can turn certain analysis modules on or off, but they cannot control which plots and tables are generated within each module. We believe such flexibility would be useful for non-programmer users, especially as the package keeps expanding with a greater variety of plots and tables in each module.

suggestions

I really love this tool and the output it gives, even for users like me.

From an RBP researcher's point of view, I have a few small suggestions:

- A normalization of overlaps, as is done in HOMER (I think it is normalized to the total length of the feature)
- Motif and GO term detection based on features, so that, for example, you can search for motifs only in 3'UTRs and introns and report only the GO terms related to those features
- Maybe, as a nice add-on to motif discovery, you could implement secondary structure prediction tools
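One way to read the first suggestion, sketched under assumptions (we have not confirmed exactly how HOMER normalizes): divide the number of query regions overlapping each feature type by the total length of that feature type, giving overlaps per megabase. The counts and lengths below are toy numbers for illustration, not real results.

```r
# Toy input (made-up numbers): overlaps of query regions with each
# feature type and the total length (bp) covered by that feature type.
features <- data.frame(
  feature  = c("3'UTR", "5'UTR", "CDS", "intron"),
  overlaps = c(400, 50, 300, 900),
  totalBp  = c(40e6, 8e6, 35e6, 1.2e9),
  stringsAsFactors = FALSE
)

# Normalize: overlaps per megabase of feature sequence
features$perMb <- features$overlaps / (features$totalBp / 1e6)

# Rescale so the normalized values sum to 1, making feature types
# comparable as fractions
features$fraction <- features$perMb / sum(features$perMb)

features[order(-features$perMb), ]
```

With raw counts, introns dominate simply because they are long; after length normalization the 3'UTR ranks first in this toy example. In practice, overlapping intervals of the same feature type would need to be merged before summing their lengths.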

However, this is a really nice tool and I hope my suggestions are useful!

Best,

Deniz

non-unique values when setting 'row.names': ‘Ighv1-13’, ‘Ighv5-8’

Hi,
when I use RCAS to process multiple samples:

WT_rep1_path <-"D:/WT_rep1.bed"
WT_rep2_path <-"D:/WT_rep2.bed"
WT_rep3_path <-"D:/WT_rep3.bed"
KO_rep1_path <-"D:/KO_rep1.bed"
KO_rep2_path <-"D:/KO_rep2.bed"
KO_rep3_path <-"D:/KO_rep3.bed"
projData <- data.frame('sampleName' = c('WT_1', 'WT_2', 'WT_3',
                                        'KO_1', 'KO_2', 'KO_3'),
                       'bedFilePath' = c(WT_rep1_path, WT_rep2_path,
                                         WT_rep3_path, KO_rep1_path,
                                         KO_rep2_path, KO_rep3_path),
                       stringsAsFactors = FALSE)
projDataFile <- "D:/myProjDataFile.tsv"
write.table(projData, projDataFile, sep = '\t', quote = FALSE,
            row.names = FALSE)
gtfFilePath <- "D:/Mus_musculus.GRCm38.102.gtf"
databasePath <- "D:/myProject.sqlite"
createDB(dbPath = databasePath, projDataFile = projDataFile,
         gtfFilePath = gtfFilePath, genomeVersion = 'mm10',
         update = TRUE, motifAnalysis = FALSE)

it shows this error:
Importing GTF annotations
importing gtf file from D:/Mus_musculus.GRCm38.102.gtf
Keeping standard chromosomes only
File D:/Mus_musculus.GRCm38.102.gtf.granges.rds already exists.
Use overwriteObjectAsRds = TRUE to overwrite the file
Parsing transcript features
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02m 50s
Saving interval datasets in 'bedData' table
Calculating annotation summaries
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02m 28s
Saving annotation summaries in 'annotationSummaries' table
Running function: getIntervalOverlapMatrix for tablegeneOverlaps
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=14s
Error in .rowNamesDF<-(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘Ighv1-13’, ‘Ighv5-8’

I think my row names are unique, so why does this happen?
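One likely cause, which we have not verified against the RCAS source: the duplicated row names are gene names taken from the GTF, not the sample names. In Ensembl GTFs a single gene_name can belong to more than one gene_id, so using gene names as row names can produce duplicates. The base-R sketch below uses toy data and hypothetical gene IDs to reproduce the error and show two common workarounds.

```r
# Toy overlap counts keyed by gene name; 'Ighv1-13' appears twice,
# as happens when two gene IDs share one gene name (IDs are hypothetical).
counts <- data.frame(
  gene_id   = c("GENE_A", "GENE_B", "GENE_C"),
  gene_name = c("Ighv1-13", "Ighv1-13", "Actb"),
  WT_1      = c(3, 1, 10),
  stringsAsFactors = FALSE
)

# Reproduce the reported error: duplicated values cannot be row names
err <- tryCatch({
  suppressWarnings(rownames(counts) <- counts$gene_name)
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# Workaround 1: disambiguate duplicated names ("Ighv1-13.1", ...)
rownames(counts) <- make.unique(counts$gene_name)

# Workaround 2: sum counts per gene name instead
byName <- aggregate(WT_1 ~ gene_name, data = counts, FUN = sum)
```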

meta-analysis capability

As of RCAS version 1.1.1, the package is designed to prepare reports for one input dataset at a time. To do a comparative analysis of multiple experimental datasets (for instance, to detect differences between case and control conditions), a user would need to run the runReport function multiple times, once for each condition, with the ‘printProcessedTables’ argument set to TRUE. This argument lets the user obtain the processed data for each experiment, which can later be used for downstream case-control comparisons. However, this requires additional scripting, which may not be ideal for non-programmer users. Therefore, in our next major release with Bioconductor 3.5, we plan to integrate functions that enable meta-analysis of two or more input datasets at the same time, sparing users the need for additional programming.
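The kind of additional scripting described above might look like the following base-R sketch. It assumes each per-sample table has been loaded as a data frame; the column names 'feature' and 'count' and all values are hypothetical, not the actual layout of the tables RCAS writes.

```r
# Hypothetical per-sample tables, e.g. loaded from separate runReport
# runs with printProcessedTables = TRUE (column names are assumptions)
ctrl <- data.frame(feature = c("3'UTR", "CDS", "intron"),
                   count = c(120, 80, 300), stringsAsFactors = FALSE)
case <- data.frame(feature = c("3'UTR", "CDS", "intron"),
                   count = c(200, 70, 310), stringsAsFactors = FALSE)

# Convert counts to fractions so samples with different numbers of
# input regions can be compared
addFraction <- function(df) {
  df$fraction <- df$count / sum(df$count)
  df
}
ctrl <- addFraction(ctrl)
case <- addFraction(case)

# Long format: stack the tables with a sample label (handy for plotting)
combined <- rbind(data.frame(sample = "control", ctrl,
                             stringsAsFactors = FALSE),
                  data.frame(sample = "case", case,
                             stringsAsFactors = FALSE))

# Wide format: samples side by side, with a log2 ratio of fractions
wide <- merge(ctrl, case, by = "feature",
              suffixes = c(".control", ".case"))
wide$log2Ratio <- log2(wide$fraction.case / wide$fraction.control)
wide
```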

potential bug in runMotifRG

runMotifRG doesn't report any motifs when nCores is set to '1', while it can reproducibly find the same motifs when nCores is set to 2 or more.
