snystrom / memes
An R interface to the MEME Suite
Home Page: https://snystrom.github.io/memes/
License: Other
Utils
Give titles to plot_denovo_matches. Do this by abstracting the stack into a new function that allows titling, then use that inside plot_denovo_match.
Better stats than with grob table?
runAme
importAme
plot heatmap results
utilities for resolving redundant entries? (ie get_best_tfid for finding tf w/ lowest p-value)
database as universalmotif list
change environment/option variable names for database to be inclusive to ame
modify get_sequence to provide method to add score after sequence position name (sometimes used by ame), take either a vector or a column name from input regions?
option to import sequences.tsv when method = "fisher"
Perhaps consider also adding as_universalmotif method which runs update_motifs, then returns motif columns only as list.
Originally posted by @snystrom in #31 (comment)
test:
add cols to dreme output with the 'm01', 'dreme-1', 'seq' values in addition to generating the name columns.
# Runs correctly
options(meme_db = system.file("extdata/flyFactorSurvey_cleaned.meme", package = "dremeR"))
tt_out <- runTomTom(dreme_out)
vs.
# Runs forever
options(meme_db = system.file("extdata/flyFactorSurvey_cleaned.meme"))
tt_out <- runTomTom(dreme_out)
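One plausible reading of the hang is that system.file() returns an empty string when it cannot find the file (without package = it searches the base package), so the tool never receives a usable database path. The following is a minimal sketch of a guard that fails fast instead. resolve_meme_db() is a hypothetical helper written for illustration, not a function from the package.

```r
# Hypothetical helper: validate a database path before handing it to a MEME tool.
resolve_meme_db <- function(path = getOption("meme_db")) {
  if (is.null(path) || !is.character(path) || length(path) != 1 || !nzchar(path)) {
    stop("meme_db is unset or empty; system.file() returns \"\" when a file is not found",
         call. = FALSE)
  }
  if (!file.exists(path)) {
    stop("meme_db path does not exist: ", path, call. = FALSE)
  }
  normalizePath(path)
}

# Without `package =`, system.file() looks in the base package and finds nothing.
missing_db <- system.file("extdata/flyFactorSurvey_cleaned.meme")

# The guard turns the silent empty path into an immediate, readable error.
missing_db_error <- tryCatch(
  resolve_meme_db(missing_db),
  error = function(e) conditionMessage(e)
)
```

Calling a check like this at the top of a run* function would surface the misconfigured option before any external process is started.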
add option to rescale heatmap values within a group? Ie use rank but in case regions are filtered, rescale to rank 1:n
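A rough sketch of what the rescaling could look like, using base R only. rescale_rank() and its arguments are hypothetical names chosen for illustration; the idea is simply to re-rank the surviving values inside each group so they run 1:n again after filtering.

```r
# Hypothetical helper: re-rank values within each group so ranks run 1:n.
# `x` holds the original ranks (possibly with gaps after filtering),
# `group` labels which group each value belongs to.
rescale_rank <- function(x, group) {
  ave(x, group, FUN = function(r) rank(r, ties.method = "first"))
}

# Ranks 2, 5, 9 in group "a" and 1, 7 in group "b" survive a filter.
original_rank <- c(2, 5, 9, 1, 7)
region_group  <- c("a", "a", "a", "b", "b")
rescaled <- rescale_rank(original_rank, region_group)
```

Whether ties should be broken by order of appearance (as here) or averaged is a design choice that would need user feedback.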
So end-user can troubleshoot whether their install is detected (and can run? Check with -h or --version?).
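A possible shape for such a check, sketched with base R only. check_tool() is a hypothetical helper, and the --version default is an assumption; the issue itself leaves open whether -h or --version is the right probe, so the flag is a parameter.

```r
# Hypothetical helper: report whether a command-line tool is found and can run.
# `bin_dir` is the directory expected to hold the tool; if NULL, search the PATH.
check_tool <- function(tool, bin_dir = NULL, flag = "--version") {
  path <- if (is.null(bin_dir)) unname(Sys.which(tool)) else file.path(bin_dir, tool)
  found <- nzchar(path) && file.exists(path)
  runs <- FALSE
  if (found) {
    # A zero exit status is taken to mean the tool can run.
    status <- suppressWarnings(
      tryCatch(system2(path, flag, stdout = FALSE, stderr = FALSE),
               error = function(e) 1L)
    )
    runs <- identical(as.integer(status), 0L)
  }
  data.frame(tool = tool, path = path, found = found, runs = runs,
             stringsAsFactors = FALSE)
}

# A directory that does not exist: the tool is reported as not found.
report <- check_tool("dreme", bin_dir = tempfile())
```

Returning a data.frame rather than a bare TRUE/FALSE lets the user see which path was checked, which is the troubleshooting information the issue asks for.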
currently, importAME requires the user to say which method they ran. This isn't very helpful to beginners and since we know the different colname types for different methods, it would be easy enough to look for different ones and guess the type instead.
Try: m01-IUPAC
This way rank and sequence are encoded into the list object. Might make things a bit easier to deal with later.
Need to think about this some more & get feedback. If not using shuffled input it becomes difficult to label the regions by whether they're input or control sequences.
Possible solution: modify get_sequences to add an optional ID label which users must use to convert sequences easily??? Seems too complicated.
# Attempt at writing sequence converter for AME results
ame_analysis_seq <- peaks %>%
  resize(200, "center") %>%
  get_sequence(dm.genome) %>%
  runAme(evalue_report_threshold = 30, sequences = TRUE)
ame_analysis_seq$sequences[[1]] %>%
  tidyr::separate(seq_id, c("pos", "type"), sep = "_") %>%
  # what about partitioning or background/control?
  # when using partitioning or control fasta, there is no ID appended after sequence info,
  # so no easy way to label them.... need to think about this
  dplyr::mutate(type = dplyr::case_when(is.na(type) ~ "input",
                                        type == "shuf" ~ "shuffle")) %>%
  {
    dat <- .
    ranges <- GRanges(.$pos)
    mcols(ranges) <- dat %>%
      dplyr::select(-pos)
    return(ranges)
  }
method to check list is list of universalmotifs
S3 to deploy path vs list
also need to use tmpdir if using list input
So shall it be.
utilities for masking sequences?
Use meme built-in dust?
One solution for cross-platform support is to containerize the MEME layer, instead of using environment variables to point to a local install, which could cause issues with different MEME versions. The interface could be Docker or Singularity (likely prefer Singularity for HPC support).
R Interface to containers:
https://cran.r-project.org/web/packages/babelwhale/index.html
If you do go this route be sure to include the MEME-suite Copyright Notice:
http://meme-suite.org/doc/copyright.html?man_type=web
dplyr-like mutate_motifs for manipulating motif column of data.frame and assigning to value from data.frame.
update_motifs updates motifs to values from data.frame matching universalmotif slot names (id/alt are used for name/altname).
mutate_motifs currently doesn't support NSE, just vector args. Needs more user testing to see what people like.
dreme_input -> sequence_input??
tomtom_input -> motif_input??
update generics
This was a dumb holdover from before, and it will cause confusion, because $motif only holds 1 motif per row.
It's been so long since I looked at this piece of code I need to sort this out.
This might have to do with inheritance of previous named state of this column.
But check when running from .meme path also. Could require an additional modification to the import step.
Shuffle dinucleotides by default.
Write test for shuffle using random seed (expect match with 2 iterations @ same seed).
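The seed test described above could be sketched as follows. shuffle_sequence() is a hypothetical stand-in that shuffles single nucleotides; a real implementation would preserve dinucleotide counts, which this sketch does not attempt. Only the reproducibility check is the point here.

```r
# Stand-in shuffle (mononucleotide only); NOT a dinucleotide-preserving shuffle.
# Setting the seed inside the function makes repeated calls reproducible.
shuffle_sequence <- function(sequence, seed) {
  set.seed(seed)
  bases <- strsplit(sequence, "", fixed = TRUE)[[1]]
  paste(sample(bases), collapse = "")
}

input_sequence <- "ACGTACGTTTGACA"

# Two iterations at the same seed are expected to match exactly.
first_shuffle  <- shuffle_sequence(input_sequence, seed = 123)
second_shuffle <- shuffle_sequence(input_sequence, seed = 123)
```

In a testthat suite the final comparison would likely become an expect_identical() call on the two results.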
Should allow multiple database paths (since tomtom allows this), and using motif list as database.
Also implement for AME
Should expose some import functions to users so they can work with MEME-server data inside R as well.
write a get_sequences function taking GRanges & genome as input
S3 method to deploy runDreme on stringset vs path.
use tmpdir if stringset input
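A minimal sketch of the S3 dispatch idea, in base R. Everything here is hypothetical: sequence_input() is an illustrative generic, and toy_stringset stands in for a Biostrings XStringSet so the example runs without Bioconductor installed.

```r
# Hypothetical generic: turn any supported input into a FASTA path on disk.
sequence_input <- function(x, ...) UseMethod("sequence_input")

# A character input is treated as a path to an existing FASTA file.
sequence_input.character <- function(x, ...) {
  if (!file.exists(x)) stop("FASTA file not found: ", x, call. = FALSE)
  x
}

# Toy stand-in for an XStringSet: a named character vector with a class.
toy_stringset <- function(seqs) structure(seqs, class = "toy_stringset")

# In-memory sequences are written to a temporary file, and that path is returned.
sequence_input.toy_stringset <- function(x, ...) {
  path <- tempfile(fileext = ".fa")
  seqs <- unclass(x)
  writeLines(paste0(">", names(seqs), "\n", seqs), path)
  path
}

fasta_path <- sequence_input(
  toy_stringset(c(chr1_1_4 = "ACGT", chr1_5_8 = "TTGA"))
)
```

With this shape, the function that calls the command-line tool only ever sees a file path, regardless of what the user passed in.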
Some ideas:
Also tool for selecting what appears to be the true "best match".
Good addition would be RNA count data so you can have plot like
Discovered PWM | PWM#1 | PWM#2 | ...
Spacer or stats | Expression barplot of all TFs
dotargs will allow more flexibility to commandline interface.
write_fasta_from_region
runFimoGenome
Core Utilities
Helpers
check_meme_install #14
Experimental features
Unsure how to move forward with these ideas. Patch universalmotif? Force user to destroy object? Need external input & user testing before making a decision.
mutate_motifs #31
Better Error Checking
dotargs::suggest_flag_name loop if program has nonzero exit status to check if arguments are wrong. Undo argsDict or other processing to flags??
Input Types
run* output object (if applicable)
database with multiple entry types + vector input
Data Types
motifs to motif in output columns
Plots
Documentation
Fixes
warn_dreme call in motif_input fails (function not exist)
is_dreme_results check)
run* functions.
ggplot2 check to ame_plot_heatmap functions.
ggseqlogo to suggests?
Testing
Bioconductor Submission
cmdfun accepted to CRAN
--text can cause file to be very large and overrun systems with limited memory if trying to read the whole file into R. Solution is to add a return_type argument with 'data' or 'path'. If path, return the file path which the user can choose to import.
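One way the return_type argument might behave, sketched in base R. import_text_output() is a hypothetical function name, and the tab-delimited format is an assumption made for the example.

```r
# Hypothetical importer illustrating a `return_type` switch for large text output.
import_text_output <- function(path, return_type = c("data", "path")) {
  return_type <- match.arg(return_type)
  if (!file.exists(path)) stop("output file not found: ", path, call. = FALSE)
  # "path": hand back the file location so the user decides when to read it.
  if (return_type == "path") return(path)
  # "data": read the whole file into memory (assumes tab-delimited output).
  utils::read.delim(path, stringsAsFactors = FALSE)
}

# Small example file standing in for real tool output.
example_file <- tempfile(fileext = ".tsv")
writeLines(c("motif_id\tscore", "m01\t12.5", "m02\t8.1"), example_file)

as_path <- import_text_output(example_file, return_type = "path")
as_data <- import_text_output(example_file, return_type = "data")
```

Using match.arg() means an unsupported value such as "file" raises an error immediately instead of silently falling through to one of the two branches.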
To Add:
return_type
Use example from Megan?
If error, return stdError/stdOut to user
Allow modular running of just DREME followed by TOMTOM. Currently the process is too intimately linked and the internal functions need some refactoring.
All other utils use the user-facing import function internally. This should also work with tomtom (and reduce overhead/possibility for bugs in 1 interface vs the other). I forget my original reasoning for this separation, but it's time to merge these if possible.
Issues
Main reason for this feature is to avoid rare scenario where two motifs may have identical names (like if the user joined two data.frames).
Possible Solutions
Change "tomtom_db" to "meme_db" for env & options.
default = /meme/db/
currently importMeme has parse_sequences and combined_sequences flags. They currently assume that the input will be fasta headers of the genomic position. This eliminates using proteins with runMeme.
Refactor so parse_sequences controls whether to convert to GRanges->data.frame, otherwise use data.frame to start for everything.
Additional changes accompanying this fix:
Add flag for if dna / rna = T use parse_sequences, otherwise if protein = T don't parse sequences.
Also, filter expressed TFs from motifDB list, then use as tomtom/ame search, etc.
Use as_universalmotif_df on motifDb query to clean up motif entries before using as database. (Example: flyFactor Survey FBgn).
Pretty sure the no altname issue carries over to the tomtom database entries as well. Need to fix this using any_of and dplyr::rename_all(recode, alt = "altname") where needed. Or initialize altname to NA_character_ as soon as possible.
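The second option (initializing altname early) could look roughly like this in base R. ensure_altname() is a hypothetical helper; it is a sketch of the idea, not the dplyr-based fix mentioned above.

```r
# Hypothetical helper: guarantee an `altname` column exists on a motif data.frame.
ensure_altname <- function(df) {
  # If only `alt` is present, rename it to `altname`.
  if ("alt" %in% names(df) && !"altname" %in% names(df)) {
    names(df)[names(df) == "alt"] <- "altname"
  }
  # If neither is present, initialize `altname` to missing character values.
  if (!"altname" %in% names(df)) {
    df$altname <- rep(NA_character_, nrow(df))
  }
  df
}

no_alt   <- ensure_altname(data.frame(name = c("m01", "m02"), stringsAsFactors = FALSE))
with_alt <- ensure_altname(data.frame(name = "m01", alt = "dreme-1", stringsAsFactors = FALSE))
```

Running this once at import time would let downstream code select altname unconditionally.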
everything works well when using dreme results as input, but things break down when using dreme.txt or universalmotif list
It is very useful to have a column of $motif and $best_match with $tomtom full results nested inside. This currently doesn't happen when not using dreme results.
issues with dreme.txt
input: dreme.txt leaves out extra information such as p-value, pos, and neg counts, etc. so xml will allow consistent behavior. xml vs. txt. Build dreme-results-like data.frame from the query entries in tomtom.xml. If users want to use dreme.txt they can import it as a meme file? (this will error with read_meme currently: submitted PR to universalmotif to fix)
For universalmotif input, build dreme-results-like data.frame from coercion with as.data.frame and append the motif column w/ universalmotif object. Rename 'name' to 'id' to be consistent with meme-suite identifiers. (d60b8bc)
better handling of tomtom_results = NULL
double check columns returned in tomtom_results, ensure nothing is being left out!