bimsbbioinfo / rcas
R package for the RNA Centric Annotation System (RCAS)
calculateCoverageProfileList takes an argument that can be either a list or a GRangesList, and this has to be documented more clearly. At the moment it reads as if a plain list is expected, but a GRangesList should be used if the elements of the list are GRanges objects.
https://github.com/BIMSBbioinfo/RCAS/blob/rcas_R/rpackage/RCAS/R/report_functions.R#L351
Hi!
I'm trying to get an annotation summary for an m6A peak-calling result. My BED file is named "Mod.bed".
I tried to run the following, but got an error and I don't know why:
library(RCAS)
hg19_mod_path <- "/path/to/Mod.bed"
gff_path <- "/path/to/GRCh37_RefSeq_24.gff"
runReport(queryFilePath = hg19_mod_path,
          gffFilePath = gff_path,
          motifAnalysis = FALSE,
          goAnalysis = FALSE)
However, I have no problem getting enriched motifs by running the following code:
queryRegions <- importBed(filePath = hg19_mod_path, sampleN = 10000)
gff <- importGtf(filePath = gff_path)
motifResults <- runMotifDiscovery(queryRegions = queryRegions,
resizeN = 15, sampleN = 10000,
genomeVersion = 'hg19', motifWidth = 5,
motifN = 3, nCores = 5)
ggseqlogo::ggseqlogo(motifResults$matches_query)
summary <- getMotifSummaryTable(motifResults)
knitr::kable(summary)
Did I use the code incorrectly? Is there any way to avoid the error in runReport and get the annotation summary?
This is important for some Bioconductor reviewers: they will comment on it if lines exceed 80 characters.
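One way to catch such lines before submission is a small base-R check. The sketch below is a hypothetical helper, not part of RCAS; it lists every line wider than 80 characters in the given files.

```r
# Hypothetical helper (not part of RCAS): report lines wider than `limit`
# characters in one or more R source files.
findLongLines <- function(paths, limit = 80L) {
  hits <- lapply(paths, function(path) {
    lines <- readLines(path, warn = FALSE)
    # display width of each line
    width <- nchar(lines, type = "width")
    idx <- which(width > limit)
    data.frame(file = rep(path, length(idx)),
               line = idx,
               width = width[idx],
               stringsAsFactors = FALSE)
  })
  # one data frame with a row per offending line
  do.call(rbind, hits)
}

# Usage (directory path assumed): check every .R file in the package
# findLongLines(list.files("rpackage/RCAS/R", pattern = "\\.R$",
#                          full.names = TRUE))
```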
If the BED file does not follow the specification or has missing columns (e.g. no "name" field), some functions (e.g. queryGff) may fail.
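A defensive option is to pad incomplete BED files to the six standard columns (chrom, start, end, name, score, strand) before importing them. This is a minimal base-R sketch under stated assumptions: padBed6() is a hypothetical helper, and the placeholder name, score and strand values are arbitrary choices, not something RCAS is known to require.

```r
# Hypothetical helper (not part of RCAS): read a BED file and pad it to
# the six standard columns. Placeholder values are assumptions.
padBed6 <- function(path) {
  bed <- utils::read.table(path, sep = "\t", header = FALSE,
                           stringsAsFactors = FALSE, quote = "",
                           comment.char = "#")
  if (ncol(bed) < 3) stop("BED needs at least chrom, start and end columns")
  # keep at most the first six columns
  bed <- bed[, seq_len(min(ncol(bed), 6)), drop = FALSE]
  n <- nrow(bed)
  # each check sees the column added by the previous one
  if (ncol(bed) < 4) bed[[4]] <- paste0("region_", seq_len(n))  # name
  if (ncol(bed) < 5) bed[[5]] <- 0                              # score
  if (ncol(bed) < 6) bed[[6]] <- "."                            # strand
  names(bed) <- c("chrom", "start", "end", "name", "score", "strand")
  bed
}
```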
Dear RCAS team,
First, thanks for creating the RCAS package. I use it frequently to interrogate my CLIP datasets, and it's extremely useful.
Recently, when I try to run the function getFeatureBoundaryCoverage, I get the following error:
invalid class "ScoreMatrix" object: superclass "mMatrix" not defined in the environment of the object's class
This happens in a code chunk that previously worked. I could not find out what was happening, and it does not seem to depend on my datasets.
Any idea what could be happening?
Many thanks.
Best regards,
Raul
We plan to keep adding more useful plots and tables that can enrich the biological context of the given input datasets. For instance, plots that show the distribution of mutations and polymorphisms in relation to the input transcript segments could provide useful insights for prioritizing potential targets for follow-up studies. Moreover, we would like to support an expanded collection of species (e.g. other eukaryotic model organisms such as zebrafish) and additional genome builds for the currently supported species (e.g. hg38 for human, mm10 for mouse).
Thank you for providing this tool; however, I have come across a small issue.
The following code from the runReport function causes an error due to the changed structure of the BSgenome package:
db <- checkSeqDb(genomeVersion)
# get species name
# this is needed for gprofiler functional enrichment
fields <- unlist(strsplit(db@organism, ' '))
which gives the error:
Loading required package: rtracklayer
Error in strsplit(db@organism, " ") :
no slot of name "organism" for this object of class "BSgenome"
This can be fixed by updating the last line to:
db <- checkSeqDb(genomeVersion)
# get species name
# this is needed for gprofiler functional enrichment
fields <- unlist(strsplit(db@metadata$organism, ' '))
I am not familiar with the pull request policy, so I am raising it as an issue here.
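If the organism string is only needed to build a g:Profiler species code, the parsing can be isolated from the BSgenome internals. The helper below is hypothetical (it is not RCAS's own code); it maps a "Genus species" name such as "Homo sapiens" to the short code g:Profiler uses, e.g. hsapiens. Reading the name through the organism() accessor that BSgenome objects provide, rather than through a slot, should make the code less sensitive to future class changes.

```r
# Hypothetical helper (not part of RCAS): turn an organism name such as
# "Homo sapiens" into a g:Profiler-style code such as "hsapiens".
organismToGprofilerCode <- function(organismName) {
  fields <- unlist(strsplit(organismName, " "))
  if (length(fields) < 2) stop("expected a 'Genus species' name")
  # first letter of the genus + species epithet, lower case
  tolower(paste0(substr(fields[1], 1, 1), fields[2]))
}

# Usage (assumes BSgenome is attached): use the accessor, not a slot
# db <- checkSeqDb(genomeVersion)
# organismToGprofilerCode(organism(db))
```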
I think this is a bit crowded. Ideally, the RCAS R package should have its own repository, but this is up for debate.
Another planned revision of the package is to give users more fine-grained control over the reports that the runReport function generates. Currently, users can turn certain analysis modules on or off, but they cannot choose which plots and tables are generated within each module. We believe such flexibility would be useful for non-programmer users, especially since the package will keep expanding with a greater variety of plots and tables in each module.
I really love this tool and the output it gives, even for users like me.
From an RBP researcher's point of view, I have a few small suggestions:
Normalization of overlaps as done in HOMER (I think it is normalized to the total length of the feature)
Motif and GO term detection per feature, so that, for example, you can search for motifs only in 3'UTRs and introns and report only the GO terms related to those features
Maybe, as a nice add-on to motif discovery, you could implement secondary structure prediction tools
In any case, this is a really nice tool, and I hope my suggestions are useful!
Best,
Deniz
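The first suggestion, normalizing overlap counts by feature length, could look roughly like this. It is a sketch of the general idea (overlaps per kilobase of total feature length), not a reproduction of HOMER's exact formula; the function name and the toy numbers are hypothetical.

```r
# Hypothetical helper: overlap counts per `perBp` bases of total feature
# length, so long features (e.g. introns) are not favoured by size alone.
normalizeByFeatureLength <- function(counts, lengths, perBp = 1e3) {
  # counts and lengths must describe the same features in the same order
  stopifnot(identical(names(counts), names(lengths)))
  counts / lengths * perBp
}

# Toy numbers for illustration only
normalizeByFeatureLength(c(threeUTR = 50, intron = 200),
                         c(threeUTR = 1e5, intron = 1e7))
```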
Hi,
when I use RCAS to process multiple samples:
library(RCAS)
WT_rep1_path <- "D:/WT_rep1.bed"
WT_rep2_path <- "D:/WT_rep2.bed"
WT_rep3_path <- "D:/WT_rep3.bed"
KO_rep1_path <- "D:/KO_rep1.bed"
KO_rep2_path <- "D:/KO_rep2.bed"
KO_rep3_path <- "D:/KO_rep3.bed"
projData <- data.frame('sampleName' = c('WT_1', 'WT_2', 'WT_3',
                                        'KO_1', 'KO_2', 'KO_3'),
                       'bedFilePath' = c(WT_rep1_path, WT_rep2_path,
                                         WT_rep3_path, KO_rep1_path,
                                         KO_rep2_path, KO_rep3_path),
                       stringsAsFactors = FALSE)
projDataFile <- "D:/myProjDataFile.tsv"
write.table(projData, projDataFile, sep = '\t', quote = FALSE,
            row.names = FALSE)
gtfFilePath <- "D:/Mus_musculus.GRCm38.102.gtf"
databasePath <- "D:/myProject.sqlite"
createDB(dbPath = databasePath, projDataFile = projDataFile,
         gtfFilePath = gtfFilePath, genomeVersion = 'mm10',
         update = TRUE, motifAnalysis = FALSE)
it shows this error:
Importing GTF annotations
importing gtf file from D:/Mus_musculus.GRCm38.102.gtf
Keeping standard chromosomes only
File D:/Mus_musculus.GRCm38.102.gtf.granges.rds already exists.
Use overwriteObjectAsRds = TRUE to overwrite the file
Parsing transcript features
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02m 50s
Saving interval datasets in 'bedData' table
Calculating annotation summaries
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02m 28s
Saving annotation summaries in 'annotationSummaries' table
Running function: getIntervalOverlapMatrix for tablegeneOverlaps
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=14s
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘Ighv1-13’, ‘Ighv5-8’
I think my row names are unique, so why does this happen?
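One plausible explanation: the duplicated values in the warning (Ighv1-13, Ighv5-8) are gene names, not sample names, so the clash likely comes from the GTF, where a single gene name can be attached to more than one gene ID. The toy table below is hypothetical (only the gene names come from the warning) and shows how to detect such names and make them unique with base R's make.unique().

```r
# Toy table: the gene IDs are hypothetical placeholders; two IDs share
# each gene name, as the warning suggests happens in the GTF.
genes <- data.frame(
  gene_id   = c("ID_1", "ID_2", "ID_3", "ID_4"),
  gene_name = c("Ighv1-13", "Ighv1-13", "Ighv5-8", "Ighv5-8"),
  stringsAsFactors = FALSE
)

# gene names that occur more than once
dups <- unique(genes$gene_name[duplicated(genes$gene_name)])
print(dups)

# make.unique() appends ".1", ".2", ... to repeated names, so the result
# can be used as row names without the "duplicate 'row.names'" error
rownames(genes) <- make.unique(genes$gene_name)
print(rownames(genes))
```

Keying such a table by gene ID instead of gene name would avoid the clash without renaming anything.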
As of RCAS version 1.1.1, the package is designed to prepare reports for one input dataset at a time. To do a comparative analysis of multiple experimental datasets (for instance, to detect differences between case and control conditions), users need to run the runReport function multiple times, once for each condition, with the ‘printProcessedTables’ argument set to TRUE. This gives them the processed data for each experiment, which can later be used for downstream case-control comparisons. However, this requires additional scripting, which may not be ideal for non-programmer users. Therefore, in our next major release with Bioconductor 3.5, we plan to integrate functions that enable meta-analysis of two or more input datasets at the same time, saving users from the need for additional programming.
runMotifRG doesn't report any motifs when nCores is set to '1', while it can reproducibly find the same motifs when nCores is set to 2 or more.