memes's People

Contributors

jwokaty, nturaga, snystrom

memes's Issues

Overhaul summary plot functions.

Give titles to plot_denovo_matches. Do this by abstracting the stack into a new function that allows titling, then use that inside plot_denovo_match.

Better stats than with grob table?

AME Support

  • runAme

  • importAme

  • plot heatmap results

  • utilities for resolving redundant entries? (ie get_best_tfid for finding tf w/ lowest p-value)

  • database as universalmotif list

  • change environment/option variable names for database to be inclusive to ame

  • modify get_sequence to provide method to add score after sequence position name (sometimes used by ame), take either a vector or a column name from input regions?

  • option to import sequences.tsv when method = "fisher"
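
The "resolve redundant entries" idea above could be sketched in base R as follows. This is only an illustration: the function name `get_best_by_tf` and the columns `tf_id`, `motif_id`, and `adj.pvalue` are assumptions, not the package's actual API or AME's real output headers.

```r
# Hypothetical AME-like results; the column names are assumptions for illustration.
ame_res <- data.frame(
  tf_id      = c("E93", "E93", "br", "br", "EcR"),
  motif_id   = c("m1", "m2", "m3", "m4", "m5"),
  adj.pvalue = c(1e-5, 1e-8, 0.02, 0.001, 0.5),
  stringsAsFactors = FALSE
)

# Keep, for each TF, the single row with the lowest adjusted p-value.
get_best_by_tf <- function(res, group_col = "tf_id", p_col = "adj.pvalue") {
  # Sort by p-value so the best entry of each group comes first.
  ordered <- res[order(res[[p_col]]), , drop = FALSE]
  # duplicated() flags every row after the first one seen for each group.
  best <- ordered[!duplicated(ordered[[group_col]]), , drop = FALSE]
  rownames(best) <- NULL
  best
}

best <- get_best_by_tf(ame_res)
```

Sorting first and then dropping duplicates keeps the logic to two lines and avoids a grouped operation, at the cost of returning rows in p-value order rather than input order.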

tomtom warnings & tests

  • Loaded/excluded database entries (print to terminal)

test:

  • more than 1 db produces correct output (need to left_join on db + idx cols, or is idx sufficient?)
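
A small base R sketch of why the join key matters, assuming each database numbers its entries independently. The data frames and column names (`query`, `db`, `idx`, `name`) are made up for illustration and are not the package's internal structures.

```r
# Hypothetical query-to-database matches from two databases whose
# entries are both indexed from 1 (column names are assumptions).
matches <- data.frame(
  query = c("q1", "q1", "q2"),
  db    = c("dbA", "dbB", "dbA"),
  idx   = c(1L, 1L, 2L),
  stringsAsFactors = FALSE
)
db_entries <- data.frame(
  db   = c("dbA", "dbA", "dbB"),
  idx  = c(1L, 2L, 1L),
  name = c("A_first", "A_second", "B_first"),
  stringsAsFactors = FALSE
)

# Joining on idx alone pairs every idx == 1 match with entry 1 of BOTH databases.
by_idx <- merge(matches, db_entries[, c("idx", "name")], by = "idx", all.x = TRUE)

# Joining on db + idx keeps exactly one row per match.
by_both <- merge(matches, db_entries, by = c("db", "idx"), all.x = TRUE)
```

Under this assumption `idx` alone is not sufficient: a test could compare the number of rows before and after the join to catch the duplication.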

importAME needs a way to guess 'method'

Currently, importAME requires the user to say which method they ran. This isn't very helpful to beginners. Since we know which column names each method produces, it would be easy enough to look for them and guess the method instead.
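
A minimal sketch of guessing from column names, in base R. The function name `guess_ame_method` and every signature column below are placeholders chosen for illustration; the real headers written by AME for each method would need to be filled in.

```r
guess_ame_method <- function(col_names) {
  # Hypothetical signature columns per method; these are placeholders,
  # not the headers AME actually writes.
  signatures <- list(
    fisher   = c("pos", "neg"),
    ranksum  = c("u_value"),
    pearson  = c("pearson_cc"),
    spearman = c("spearman_cc")
  )
  # A method matches when all of its signature columns are present.
  hits <- vapply(signatures, function(sig) all(sig %in% col_names), logical(1))
  if (sum(hits) != 1) {
    stop("Could not guess the method from column names; please set `method` explicitly.")
  }
  names(signatures)[hits]
}

guessed <- guess_ame_method(c("rank", "motif_id", "pos", "neg"))
```

Failing loudly when zero or several signatures match keeps the explicit `method` argument as a fallback rather than silently picking one.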

helper to convert ame sequences to GRanges

Need to think about this some more & get feedback. If shuffled input is not used, it becomes difficult to label the regions by whether they're input or control sequences.

Possible solution: modify get_sequences to add an optional ID label which users must use to convert sequences easily??? Seems too complicated.

# Attempt at writing a sequence converter for AME results.
# `peaks` (GRanges) and `dm.genome` (genome object) are assumed to exist already.
library(magrittr)
library(GenomicRanges)
library(memes)

ame_analysis_seq <- peaks %>%
  resize(200, "center") %>%
  get_sequence(dm.genome) %>%
  runAme(evalue_report_threshold = 30, sequences = TRUE)

ame_analysis_seq$sequences[[1]] %>%
  # fill = "right" leaves `type` as NA when no "_suffix" follows the position
  tidyr::separate(seq_id, c("pos", "type"), sep = "_", fill = "right") %>%
  # what about partitioning or background/control?
  # when using partitioning or control fasta, there is no ID appended after sequence info,
  # so no easy way to label them.... need to think about this
  dplyr::mutate(type = dplyr::case_when(is.na(type) ~ "input",
                                        type == "shuf" ~ "shuffle")) %>%
  {
    # rebuild ranges from the position strings, keep the other columns as metadata
    dat <- .
    ranges <- GRanges(dat$pos)
    mcols(ranges) <- dplyr::select(dat, -pos)
    ranges
  }
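
The labelling step above can also be tried without Bioconductor. The sketch below is base R only and assumes identifiers look like `chr:start-end` with an optional `_shuf` suffix; the function name `parse_seq_id` is hypothetical, and any other suffix is left as NA because, as noted, there is no reliable way to label it.

```r
parse_seq_id <- function(seq_id) {
  # chr:start-end, optionally followed by "_<suffix>" (format is an assumption)
  pattern <- "^(.+):([0-9]+)-([0-9]+)(_(.+))?$"
  parts <- regmatches(seq_id, regexec(pattern, seq_id))
  ok <- lengths(parts) > 0
  if (!all(ok)) {
    stop("Unrecognised sequence id: ", paste(seq_id[!ok], collapse = ", "))
  }
  m <- do.call(rbind, parts)
  suffix <- m[, 6]  # empty string when no suffix was present
  data.frame(
    seqnames = m[, 2],
    start    = as.integer(m[, 3]),
    end      = as.integer(m[, 4]),
    type     = ifelse(suffix == "", "input",
                      ifelse(suffix == "shuf", "shuffle", NA_character_)),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_seq_id(c("chr2L:100-300", "chr2L:100-300_shuf"))
```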

Consider Containerizing meme versions

One way to get cross-platform support is to containerize the MEME layer, instead of using environment variables to point to a local install, which could cause issues with different MEME versions. The interface could be Docker or Singularity (Singularity is likely preferable for HPC support).

R Interface to containers:
https://cran.r-project.org/web/packages/babelwhale/index.html

If you do go this route be sure to include the MEME-suite Copyright Notice:
http://meme-suite.org/doc/copyright.html?man_type=web

Experimental: mutate_motifs / update_motifs

dplyr-like mutate_motifs for manipulating motif column of data.frame and assigning to value from data.frame.

update_motifs updates motifs to values from data.frame matching universalmotif slot names (id/alt are used for name/altname).

mutate_motifs currently doesn't support NSE, just vector args. Needs more user testing to see what people like.

rename `motifs` column in results to `motif`

This was a dumb holdover from before, and it will cause confusion, because $motif only holds 1 motif per row.

places to change

  • parseDreme
  • universalmotif_to_meme_df
  • xml parser functions
  • plot utilities
  • update README to reference correct column
  • generics for dreme_out

Ensure good unit tests in place before making this change

Fix shuffle algorithm

Shuffle dinucleotides by default.

Write test for shuffle using random seed (expect match with 2 iterations @ same seed).
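
The seed-based test could look roughly like this. `shuffle_seq` is a hypothetical stand-in that permutes single letters, so it does not preserve dinucleotide frequencies; only the shape of the test (two runs at the same seed must match) is the point.

```r
# Hypothetical stand-in shuffler: permutes single letters, so it does NOT
# preserve dinucleotide frequencies.
shuffle_seq <- function(seq, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  chars <- strsplit(seq, "")[[1]]
  paste(sample(chars), collapse = "")
}

input  <- "ACGTACGTTTGACCA"
first  <- shuffle_seq(input, seed = 123)
second <- shuffle_seq(input, seed = 123)

# Two iterations at the same seed must give the same shuffled sequence.
stopifnot(identical(first, second))
# Shuffling must keep the letter composition of the input.
stopifnot(identical(sort(strsplit(first, "")[[1]]),
                    sort(strsplit(input, "")[[1]])))
```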

Take stringset as input to runDreme

write a get_sequences function taking GRanges & genome as input

S3 method to deploy runDreme on stringset vs path.

use tmpdir if stringset input
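
A sketch of the dispatch idea in base R. To stay runnable without Biostrings, a named character vector with the made-up class `mock_stringset` stands in for a real stringset, and the generic `run_tool` only reports which file it would use instead of calling DREME.

```r
run_tool <- function(input, ...) UseMethod("run_tool")

# Path input: use the file as-is.
run_tool.character <- function(input, ...) {
  stopifnot(file.exists(input))
  # A real method would call the command-line tool here; this sketch only
  # reports which FASTA file would be used and how many records it holds.
  list(fasta = input, n_records = sum(grepl("^>", readLines(input))))
}

# In-memory input: write to a temporary FASTA file, then reuse the path method.
run_tool.mock_stringset <- function(input, ...) {
  path <- tempfile(fileext = ".fa")
  writeLines(paste0(">", names(input), "\n", unclass(input)), path)
  run_tool(path, ...)
}

seqs <- structure(c(a = "ACGT", b = "GGCC"), class = "mock_stringset")
res  <- run_tool(seqs)
```

Routing the in-memory method through the path method means the command-line call lives in one place only.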

Function to compare all TOMTOM PWMs to discovered PWM

Also tool for selecting what appears to be the true "best match".

Good addition would be RNA count data so you can have plot like

Discovered PWM | PWM#1 | PWM#2 | ...
Spacer or stats | Expression barplot of all TFs

Release 1.0

Core Utilities

  • runDreme
  • runTomTom
  • runAme
  • runFimo
  • runMeme

Helpers

  • Global environment variables/options
  • importXML (Dreme / TomTom)
  • importAME (for all results types)
  • get_sequence
  • user-facing check_meme_install #14
  • convert ame fasta_id column back to GRanges (hard problem, because AME appends information sometimes, but not always, hard to tell if region is the real region or the background region)

Experimental features
Unsure how to move forward with these ideas. Patch universalmotif? Force user to destroy object? Need external input & user testing before making a decision.

  • update tomtom best_match (drop best_* cols, grab top row from .$tomtom, update best_*)
  • mutate_motifs #31
  • update motifs (take data.frame cols, update .$motif entry to reflect new values) #31

Better Error Checking

  • Add dotargs::suggest_flag_name loop if program has nonzero exit status to check if arguments are wrong. Undo argsDict or other processing to flags??
  • runDreme
  • runTomTom
  • runAme
  • runFimo
  • runTomTom returns empty columns for matches (Partial implementation)

Input Types

  • File path
  • stringSet (and associated get_sequence functions)
  • Discovered motifs (universalmotif list)
  • run* output object (if applicable)
  • database with multiple entry types + vector input

Data Types

  • Change motifs to motif in output columns

Plots

  • Dreme/TomTom match plots
  • TomTom hits comparison
  • AME heatmap

Documentation

  • Simple README with core functionality
  • MEME installation instructions (also link MEME-suite.org) in README/vignette
  • dreme core
  • tomtom core
  • ame core
  • fimo core
  • Vignette repeating E93 paper (complex vignette)

Fixes

  • runTomTom currently requires id & altname, need to support id only. #37
  • warn_dreme call in motif_input fails (function does not exist)
  • revise motif_input to allow meme results data.frame also (relax is_dreme_results check)
  • add OS check to run* functions.
  • add ggplot2 check to ame_plot_heatmap functions.
  • move ggseqlogo to suggests?

Testing

  • Fix unit tests to work on remote systems (copy test data to /tmp to avoid version issues)
  • check on macOS; Windows will always fail (MEME won't install).
  • Fimo tests

Bioconductor Submission

Fimo text output to temp file

--text can make the output file very large, which can overrun systems with limited memory if the whole file is read into R. The solution is to add a return_type argument taking 'data' or 'path'. If 'path', return the file path, which the user can then choose to import.

To Add:

  • fimo return_type
  • importFimo
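
The return_type switch could be sketched like this in base R. The helper name `read_or_return_path` and the three-column file are invented for the example; the real function would run FIMO first and then apply the same switch to its output file.

```r
read_or_return_path <- function(path, return_type = c("data", "path")) {
  return_type <- match.arg(return_type)
  # "path": hand back the file location and let the user decide how to import it.
  if (return_type == "path") return(path)
  # "data": read the whole tab-separated file into memory.
  utils::read.delim(path, stringsAsFactors = FALSE)
}

# Made-up results file standing in for tool output.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("motif_id\tstart\tscore", "m1\t10\t12.5", "m2\t42\t8.1"), tmp)

as_path <- read_or_return_path(tmp, "path")
as_data <- read_or_return_path(tmp, "data")
```

`match.arg` makes 'data' the default and rejects any other value with an error.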

separate TOMTOM from DremeStats?

Allow modular running of just DREME followed by TOMTOM. Currently the process is too intimately linked and the internal functions need some refactoring.

use importTomTomXML backend for both import & internal parsing

All other utils use the user-facing import function internally. This should also work with tomtom (and reduce overhead/possibility for bugs in 1 interface vs the other). I forget my original reasoning for this separation, but it's time to merge these if possible.

tomtom fails without altid

Issues
Main reason for this feature is to avoid rare scenario where two motifs may have identical names (like if the user joined two data.frames).

Possible Solutions

  • It may be that tomtom doesn't allow identical named motifs as input (haven't checked this). If this is the case, then I can drop requirement of altname and instead check & error if names are non-unique?
  • add altname columns to everything initialized with NA_character_ if altname column does not exist.
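
The first option (check and error on non-unique names) might look like the following base R sketch. `check_unique_ids` is a hypothetical helper name, and whether tomtom itself rejects duplicate names is still unverified, as noted above.

```r
check_unique_ids <- function(ids) {
  dups <- unique(ids[duplicated(ids)])
  if (length(dups) > 0) {
    stop("Motif ids must be unique; duplicated: ", paste(dups, collapse = ", "))
  }
  invisible(ids)
}

ok  <- check_unique_ids(c("m1", "m2", "m3"))
err <- try(check_unique_ids(c("m1", "m2", "m1")), silent = TRUE)
```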

import MEME needs rework

Currently importMeme has parse_sequences and combined_sequences flags. Both assume that the input FASTA headers are genomic positions, which rules out using proteins with runMeme.

Refactor so parse_sequences controls whether to convert to GRanges->data.frame, otherwise use data.frame to start for everything.

Additional changes accompanying this fix:
Add a flag: if dna / rna = T, use parse_sequences; otherwise, if protein = T, don't parse sequences.

  • add user-override sequence parse (parse_sequence = "auto") argument to enforce one way or the other (allow use with user-fasta without sequence headers)
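
One possible reading of the "auto" override, sketched in base R: parse only when every header looks like a genomic position. Both helper names and the `chr:start-end` header pattern are assumptions for illustration.

```r
# TRUE for headers shaped like "chr:start-end" (assumed position format).
looks_like_position <- function(headers) {
  grepl("^[^:]+:[0-9]+-[0-9]+$", headers)
}

resolve_parse_sequences <- function(headers, parse_sequences = "auto") {
  # An explicit TRUE/FALSE from the user always wins.
  if (is.logical(parse_sequences)) return(isTRUE(parse_sequences))
  stopifnot(identical(parse_sequences, "auto"))
  # "auto": parse only if every header looks like a genomic position.
  all(looks_like_position(headers))
}

dna_like     <- resolve_parse_sequences(c("chr2L:100-300", "chr3R:5-50"))
protein_like <- resolve_parse_sequences(c("sp|P12345|PROT_HUMAN"))
forced       <- resolve_parse_sequences(c("sp|P12345|PROT_HUMAN"), TRUE)
```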

test tomtom database with no altname

Pretty sure the no altname issue carries over to the tomtom database entries as well. Need to fix this using any_of and dplyr::rename_all(recode, alt = "altname") where needed. Or initialize altname to NA_character_ as soon as possible.

  • add unit test for this
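
A base R sketch of the "initialize altname as soon as possible" option, written without dplyr so it runs on its own. `ensure_altname` is a hypothetical helper: it renames an existing `alt` column and otherwise fills `altname` with NA_character_.

```r
ensure_altname <- function(df) {
  # Rename `alt` to `altname` when only the short name is present.
  if ("alt" %in% names(df) && !"altname" %in% names(df)) {
    names(df)[names(df) == "alt"] <- "altname"
  }
  # Otherwise create the column, filled with missing values.
  if (!"altname" %in% names(df)) {
    df$altname <- rep(NA_character_, nrow(df))
  }
  df
}

no_alt   <- ensure_altname(data.frame(id = c("m1", "m2"), stringsAsFactors = FALSE))
with_alt <- ensure_altname(data.frame(id = "m1", alt = "E93", stringsAsFactors = FALSE))
```

A unit test for the no-altname case could then assert that the column exists and is character for both inputs.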

Clean up unit tests

General

  • Set NOT_CRAN/meme_is_installed flags for remote systems

Tools

  • TomTom
  • Dreme
  • AME
  • MEME
  • FIMO

Tomtom output needs revision

Problem

Everything works well when using dreme results as input, but things break down when using dreme.txt or a universalmotif list.

It is very useful to have a column of $motif and $best_match with $tomtom full results nested inside. This currently doesn't happen when not using dreme results.

Solution

  • Do not allow path input. Users should use motif list from import function.

Issues with dreme.txt input: dreme.txt leaves out extra information such as p-value, pos and neg counts, etc., so using the XML instead of the txt gives consistent behavior. Build a dreme-results-like data.frame from the query entries in tomtom.xml. If users want to use dreme.txt they can import it as a meme file? (This currently errors with read_meme; a PR was submitted to universalmotif to fix it.)

  • For universalmotif input, build dreme-results like dataframe from coercion with as.data.frame and append the motif column w/ universalmotif object. rename 'name' to 'id' to be consistent with meme-suite identifiers. (d60b8bc)

  • better handling of tomtom_results = NULL

  • double check columns returned in tomtom_results, ensure nothing is being left out!

Tests to write

  • check that downstream utilities work with all types of input (shouldn't need multiple dispatch)
