bimsbbioinfo / rcas

R package for the RNA Centric Annotation System (RCAS)

R 94.35% HTML 5.65%
bioconductor interactive-plots par-clip cage rna modification reporting msigdb go protein-rna-interactions

rcas's Issues

Error with `runReport`

Hi!

I'm trying to get an annotation summary for an m6A peak calling result. My BED file is named "Mod.bed".
I tried to run the following code but got an error that I don't understand:

library(RCAS)
hg19_mod_path <- "/path/to/Mod.bed"
gff_path <- "/path/to/GRCh37_RefSeq_24.gff"
runReport( queryFilePath = hg19_mod_path,
           gffFilePath = gff_path,
           motifAnalysis = FALSE,
           goAnalysis = FALSE )

However, I can get enriched motifs without any problem by running the following code:

queryRegions <- importBed(filePath = hg19_mod_path, sampleN = 10000)
gff <- importGtf(filePath = "/path/to/GRCh37_RefSeq_24.gff")
motifResults <- runMotifDiscovery(queryRegions = queryRegions, 
                                  resizeN = 15, sampleN = 10000,
                                  genomeVersion = 'hg19', motifWidth = 5,
                                  motifN = 3, nCores = 5)
ggseqlogo::ggseqlogo(motifResults$matches_query)
summary <- getMotifSummaryTable(motifResults)
knitr::kable(summary)

Am I using the code incorrectly? Is there any way to avoid the error in runReport and get the annotation summary?

Error in previously working function getFeatureBoundaryCoverage

Dear RCAS team,

First, thanks for creating the RCAS package, I use it frequently to interrogate my CLIP datasets, and it's extremely useful.

Recently, when I try to run the function getFeatureBoundaryCoverage, I get the following error:
invalid class "ScoreMatrix" object: superclass "mMatrix" not defined in the environment of the object's class

This happens in a code chunk that previously worked. I could not find out what is going wrong, and it does not seem to depend on my datasets.
Any idea what could be happening?

Many thanks.

Best regards,
Raul

Increased variety of genomic annotations and genome versions

We plan to keep adding more and more useful plots and tables that can enrich the biological context of the given input datasets. For instance, plots that show the distribution of mutations and polymorphisms in relation to the input transcript segments could provide useful insights in prioritizing potential targets for follow-up studies. Moreover, we would like to provide support for an expanded collection of species (e.g. other eukaryotic model organisms such as Zebrafish) and additional genome builds for the currently supported species (e.g. hg38 for human, mm10 for mouse).

Issue with the `runReport` function.

Thank you for providing this tool. However, I have come across a small issue.
The following part of the runReport function causes an error due to a change in the structure of the BSgenome package.

  db <- checkSeqDb(genomeVersion)
  # get species name 
  # this is needed for gprofiler functional enrichment 
  fields <- unlist(strsplit(db@organism, ' '))

which gives the error

Loading required package: rtracklayer
Error in strsplit(db@organism, " ") : 
  no slot of name "organism" for this object of class "BSgenome"

This can be fixed by updating the line to

  db <- checkSeqDb(genomeVersion)
  # get species name 
  # this is needed for gprofiler functional enrichment 
  fields <- unlist(strsplit(db@metadata$organism, ' '))

I am not familiar with the pull request policy, so I am raising it as an issue here.
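A fix that works across both layouts could check which slot is present before reading it. The sketch below is only an illustration under stated assumptions: `OldGenome` and `NewGenome` are toy S4 classes standing in for the old and new object layouts (they are not the real BSgenome class), and `getOrganismName` is a hypothetical helper, not part of RCAS or BSgenome. If the installed BSgenome version provides an accessor function for the organism, using that would likely be more robust than reading slots directly.

```r
library(methods)

# Toy stand-ins (NOT the real BSgenome class) for the old and new layouts.
setClass("OldGenome", representation(organism = "character"))
setClass("NewGenome", representation(metadata = "list"))

# Hypothetical helper: read the organism from whichever slot exists.
getOrganismName <- function(db) {
  if (.hasSlot(db, "organism")) {
    return(db@organism)
  }
  if (.hasSlot(db, "metadata") && !is.null(db@metadata$organism)) {
    return(db@metadata$organism)
  }
  stop("could not find an organism field on this object")
}

old_db <- new("OldGenome", organism = "Homo sapiens")
new_db <- new("NewGenome", metadata = list(organism = "Homo sapiens"))

# Same downstream split as in runReport, independent of the layout.
fields_old <- unlist(strsplit(getOrganismName(old_db), " "))
fields_new <- unlist(strsplit(getOrganismName(new_db), " "))
```

Both calls yield the same `fields` vector, so the species lookup that follows would not need to know which layout it received.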

More fine-tuned control on the generated HTML reports

Another planned revision of the package is to give users more fine-tuned control over the reports that the runReport function generates. Currently, users can turn certain analysis modules on or off, but they cannot control which plots and tables are generated in each module. We believe that such flexibility would be a useful feature for non-programmer users, especially considering that the package will keep expanding with a greater variety of plots and tables in each module.

suggestions

I really love this tool and the output it gives, even for users like me.

From an RBP researcher's point of view, I have a few small suggestions:

  • Normalization of overlaps, as is done in HOMER (I think it normalizes to the total length of the feature)

  • Motif and GO term detection based on features, so that, for example, you can search for motifs only in 3'UTRs and introns and annotate only the GO terms related to those features

  • Maybe, as a nice little add-on to motif discovery, you could implement secondary structure prediction tools

However, this is a really nice tool and I hope my suggestions are useful!

Best,

Deniz
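The first suggestion above (length-normalized overlaps) can be sketched in a few lines of base R. This is only an illustration of the idea: the counts and feature lengths are made-up numbers, not RCAS output, and the per-megabase scaling is an assumption rather than HOMER's exact method.

```r
# Hypothetical numbers for illustration only (not RCAS output).
# Number of query regions overlapping each feature type:
overlaps <- c(exons = 500, introns = 4000, utr3 = 300)
# Total length in bases covered by each feature type:
feature_bp <- c(exons = 5e6, introns = 8e7, utr3 = 2e6)

# Overlaps per megabase of feature, so long features such as introns
# are not favoured simply because they cover more of the genome.
per_mb <- overlaps / (feature_bp / 1e6)

# Rank feature types by normalized overlap density.
ranked <- names(sort(per_mb, decreasing = TRUE))
```

With these toy numbers, introns have the most raw overlaps but the lowest density once length is taken into account.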

non-unique values when setting 'row.names': ‘Ighv1-13’, ‘Ighv5-8’

Hi,
When I use RCAS to process multiple samples:

WT_rep1_path <-"D:/WT_rep1.bed"
WT_rep2_path <-"D:/WT_rep2.bed"
WT_rep3_path <-"D:/WT_rep3.bed"
KO_rep1_path <-"D:/KO_rep1.bed"
KO_rep2_path <-"D:/KO_rep2.bed"
KO_rep3_path <-"D:/KO_rep3.bed"
projData <- data.frame('sampleName' = c('WT_1', 'WT_2', 'WT_3', 'KO_1', 'KO_2', 'KO_3'),
                       'bedFilePath' = c(WT_rep1_path, WT_rep2_path, WT_rep3_path,
                                         KO_rep1_path, KO_rep2_path, KO_rep3_path),
                       stringsAsFactors = FALSE)
projDataFile <- "D:/myProjDataFile.tsv"
write.table(projData, projDataFile, sep = '\t', quote =FALSE, row.names = FALSE)
gtfFilePath <- "D:/Mus_musculus.GRCm38.102.gtf"
databasePath <-"D:/myProject.sqlite"
createDB(dbPath = databasePath, projDataFile = projDataFile,
         gtfFilePath = gtfFilePath, genomeVersion = 'mm10',
         update = TRUE, motifAnalysis = FALSE)

It shows this error:
Importing GTF annotations
importing gtf file from D:/Mus_musculus.GRCm38.102.gtf
Keeping standard chromosomes only
File D:/Mus_musculus.GRCm38.102.gtf.granges.rds already exists.
Use overwriteObjectAsRds = TRUE to overwrite the file
Parsing transcript features
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02m 50s
Saving interval datasets in 'bedData' table
Calculating annotation summaries
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02m 28s
Saving annotation summaries in 'annotationSummaries' table
Running function: getIntervalOverlapMatrix for tablegeneOverlaps
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=14s
Error in .rowNamesDF<-(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘Ighv1-13’, ‘Ighv5-8’

I think my row names are unique, so why does this happen?
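The values named in the warning are gene symbols, not sample names, so the duplicates probably come from the annotation rather than from projData. One possible cause (an assumption, not confirmed in this thread) is that the GTF contains several gene IDs sharing the same gene name. A minimal base-R sketch with a hypothetical toy table reproduces the same error and shows how the repeated names could be listed or made unique:

```r
# Hypothetical toy table: two distinct gene IDs share one gene name.
genes <- data.frame(
  gene_id   = c("geneA", "geneB", "geneC"),  # made-up IDs
  gene_name = c("Ighv1-13", "Ighv1-13", "Actb"),
  stringsAsFactors = FALSE
)

# Using the names directly as row names fails, as in the error above.
res <- tryCatch(
  suppressWarnings({
    rownames(genes) <- genes$gene_name
    genes
  }),
  error = function(e) e
)

# List the names that occur more than once.
dup_names <- unique(genes$gene_name[duplicated(genes$gene_name)])

# One workaround: append a suffix to repeated names before using them.
unique_names <- make.unique(genes$gene_name)
rownames(genes) <- unique_names
```

Whether renaming or dropping the duplicated entries is appropriate depends on how the downstream tables are used, so this is a diagnostic sketch rather than a recommended fix.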

meta-analysis capability

As of RCAS version 1.1.1, the package is designed to prepare reports for one input dataset at a time. If a user wants to do a comparative analysis of multiple experimental datasets (for instance, to detect differences between case and control conditions), they need to run the runReport function multiple times, once for each condition, with the ‘printProcessedTables’ argument set to TRUE. The ‘printProcessedTables’ argument gives the user the processed data for each experiment, which can later be used for downstream case-control comparisons. However, this requires additional scripting, which may not be ideal for non-programmer users. Therefore, in our next major release with Bioconductor 3.5, we plan to integrate functions that enable meta-analysis of two or more input datasets at the same time, saving users from the need to do additional programming.

potential bug in runMotifRG

runMotifRG doesn't report any motifs when nCores is set to '1', while it can reproducibly find the same motifs when nCores is set to 2 or more.
