cnluzon / wigglescout

Explore and visualize genomics bigWig data.

License: GNU General Public License v3.0

R 100.00%
bigwig genomics data-visualization

wigglescout's Issues

Allow zero flanking region for stretch mode profiles and heatmaps

Right now, the minimum flanking region for both heatmaps and profiles is one bin size. Setting upstream or downstream to 0 exits with an error.

This should be adapted to accept empty flanking regions as well and return just the inner matrix. Of course, this only makes sense in stretch mode.
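
A minimal sketch of how the flank check could be relaxed. The helper name validate_flanks, its arguments and the "center" mode name used in the usage note are assumptions for illustration, not the package's actual code:

```r
# Hypothetical helper: validate flank sizes for a profile or heatmap.
# Assumption: in "stretch" mode a flank of 0 is allowed; in any other
# mode at least one bin is still required.
validate_flanks <- function(upstream, downstream, bin_size, mode = "stretch") {
  min_flank <- if (identical(mode, "stretch")) 0 else bin_size
  flanks <- c(upstream = upstream, downstream = downstream)
  too_small <- flanks < min_flank
  if (any(too_small)) {
    stop(
      "Flank too small for mode '", mode, "': ",
      paste(names(flanks)[too_small], collapse = ", "),
      " (minimum is ", min_flank, ")",
      call. = FALSE
    )
  }
  # Number of bins on each side; 0 means that side is left out of the matrix.
  floor(flanks / bin_size)
}

flank_bins <- validate_flanks(upstream = 0, downstream = 2500, bin_size = 500)
```

With these inputs, the upstream side contributes no bins and the downstream side contributes five. The same call with mode = "center" would stop with an error.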

Better tests for visual outputs

While adding new features, I realised I had completely messed up a figure by changing something in the aes function call of ggplot2. However, all the tests passed because I am not testing the actual look of the plots. It would be great to test for this as well.

Provide light clustering functionality for heatmaps

This would apply to:

  • Summary heatmaps. Hierarchical clustering would be preferred, as these heatmaps show a small number of rows. However, ggplot does not provide dendrogram functionality, so to do this I would have to use base R functionality or draw the dendrogram somewhere else.
  • Classic heatmaps. K-means or something similar would be good.
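
Both options can be sketched with base R (stats package) alone. The toy matrix and variable names below are made up for illustration. The idea is only that clustering yields a row order that a heatmap function could then apply:

```r
set.seed(42)  # k-means starts from random centers, so fix the seed

# Toy matrix: 6 loci (rows) x 4 bins (columns), with two obvious row groups.
values <- rbind(
  matrix(rnorm(12, mean = 0, sd = 0.1), nrow = 3),
  matrix(rnorm(12, mean = 5, sd = 0.1), nrow = 3)
)
rownames(values) <- paste0("locus_", 1:6)

# Summary heatmaps: hierarchical clustering gives a row order for plotting.
row_order <- hclust(dist(values))$order
ordered_values <- values[row_order, , drop = FALSE]

# Classic heatmaps: k-means assigns each row to one of k groups;
# sorting by cluster label puts rows of the same group next to each other.
clusters <- kmeans(values, centers = 2, nstart = 10)$cluster
sorted_by_cluster <- values[order(clusters), , drop = FALSE]
```

Reordering rows this way sidesteps the dendrogram problem, at the cost of not showing the tree itself.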

How to define the genome

Nice package!
From the help message, I see that the genome argument only accepts mm9 and hg38. How can I define other genomes? Thanks.

bw_loci fails on MACS `narrowPeak` files but processes correctly resulting GRanges

I recently spotted an error: if I run bw_loci directly on a narrowPeak file, I get this:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'an integer', got 'NNNNNN'

Here NNNNNN is a floating-point number that corresponds to the signalValue column of the narrowPeak file. Apparently this comes from the .loci_to_granges function, which contains the call:

bed <- import(loci, format = "BED")

Apparently BiocIO::import does not work with format = "BED" on narrowPeak files, although it parses the file correctly if this parameter is omitted.
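
A likely explanation is that column 7 of a plain BED file (thickStart) is an integer, while column 7 of a narrowPeak file (signalValue) is a float. As a package-independent sketch, the file can be read with explicit column types in base R. The helper name read_narrowpeak and the toy peaks are assumptions for illustration:

```r
# Column names follow the ENCODE narrowPeak layout (BED6 + 4 extra columns).
narrowpeak_cols <- c(
  "chrom", "start", "end", "name", "score", "strand",
  "signalValue", "pValue", "qValue", "peak"
)

# Hypothetical helper: read a narrowPeak file into a plain data frame.
read_narrowpeak <- function(path) {
  read.table(
    path,
    sep = "\t", header = FALSE, col.names = narrowpeak_cols,
    colClasses = c("character", "integer", "integer", "character", "integer",
                   "character", "numeric", "numeric", "numeric", "integer"),
    stringsAsFactors = FALSE
  )
}

# Toy input written to a temporary file so the example is self-contained.
peak_file <- tempfile(fileext = ".narrowPeak")
writeLines(
  c("chr1\t100\t200\tpeak_1\t500\t.\t12.345\t8.1\t6.2\t50",
    "chr2\t300\t450\tpeak_2\t300\t.\t7.89\t5.5\t4.0\t70"),
  peak_file
)
peaks <- read_narrowpeak(peak_file)
```

Note that start coordinates are 0-based here, as in BED, so they would need shifting by one before building a GRanges object from the data frame.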

Rasterize point layer in scatterplots

With the ggrastr library it is possible to rasterize individual layers of a plot. This is useful for scatterplots with many points when you want to save them in a vector format.

bedfile parameters should be named locus

Wherever a BED file is accepted now, one can also provide a GRanges object. This is not reflected in the parameter and function names. I think it should be.

This is a major change in the API so I think changes like this should be gathered together for the formal release.

Accept both BED files and GRanges objects

Accepting GRanges objects wherever a BED file is accepted could be useful when processing BED files before plotting (e.g. expanding or shortening loci, filtering and so on).

Plot scatter body introduces repeated points if GRanges are not deduplicated

This is a very specific and unusual case. If the provided BED file has redundant loci, the values are duplicated at some point during merging, creating spurious points. This does not matter in the plot, but it does in the point count.

This would be solved by deduplicating the BED file beforehand, which can be a bit costly but is perhaps worth it.
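
As an illustration of the deduplication step, here is a base R sketch on a plain data frame of loci. The column names are assumptions. For GRanges objects, unique() from GenomicRanges should serve the same purpose:

```r
# Toy loci with one exact duplicate (rows 1 and 2).
loci <- data.frame(
  seqnames = c("chr1", "chr1", "chr2", "chr1"),
  start = c(100L, 100L, 300L, 500L),
  end = c(200L, 200L, 400L, 600L)
)

# Keep the first occurrence of each (seqnames, start, end) combination.
key_cols <- c("seqnames", "start", "end")
unique_loci <- loci[!duplicated(loci[, key_cols]), , drop = FALSE]
```

After this step each locus contributes exactly one point, so the point count matches the number of distinct loci.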

Remove_top should not remove by mean in scatterplot

In general it seems sensible to use the mean to remove the top percentile for multidimensional plots, but in the case of the scatterplot it creates an odd diagonal effect that is not nice.

So in this case I think we should remove the top percentile from the x and y axes separately.
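
A sketch of per-axis filtering, assuming remove_top is the fraction of top values to drop. The function name remove_top_per_axis is hypothetical:

```r
# Hypothetical helper: drop points above the (1 - remove_top) quantile
# of x or of y, computing each cutoff independently.
remove_top_per_axis <- function(x, y, remove_top = 0.01) {
  x_cut <- quantile(x, probs = 1 - remove_top, na.rm = TRUE)
  y_cut <- quantile(y, probs = 1 - remove_top, na.rm = TRUE)
  keep <- !is.na(x) & !is.na(y) & x <= x_cut & y <= y_cut
  data.frame(x = x[keep], y = y[keep])
}

# Toy data: one extreme value on each axis, at different points.
x <- c(1:99, 1000)
y <- c(1000, 2:100)
filtered <- remove_top_per_axis(x, y, remove_top = 0.01)
```

Because each cutoff is an axis-aligned threshold, the surviving points form a rectangle, not the diagonal cut that a mean-based cutoff produces.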

Not implemented norm_mode error is not user-friendly

If one provides a norm_mode parameter that is not implemented, .process_norm_func returns NULL, and eventually you get an error because a function does not exist.

.process_norm_func should validate its input further and return an appropriate error in that case.
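
One way this validation could look. process_norm_mode is a hypothetical stand-in for .process_norm_func, and the mode names in the lookup table are assumptions:

```r
# Hypothetical stand-in for .process_norm_func: look the mode up in a
# named list and fail early with a readable message when it is unknown.
process_norm_mode <- function(norm_mode) {
  norm_funcs <- list(
    fc = identity,  # assumed mode name
    log2fc = log2   # assumed mode name
  )
  if (!is.character(norm_mode) || length(norm_mode) != 1 ||
      !norm_mode %in% names(norm_funcs)) {
    stop(
      "Unknown norm_mode: '", paste(norm_mode, collapse = ", "), "'. ",
      "Valid values are: ", paste(names(norm_funcs), collapse = ", "),
      call. = FALSE
    )
  }
  norm_funcs[[norm_mode]]
}

log2_func <- process_norm_mode("log2fc")
error_message <- tryCatch(
  process_norm_mode("zscore"),
  error = function(e) conditionMessage(e)
)
```

The error now names the offending value and lists the accepted ones, so the caller no longer has to trace a missing function back to a NULL return.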

Include selection parameter in bwplot functions that affect bins

Calculations run slowly even when a bigWig file is an example one with very few values, because all the bins are still calculated. This can be fixed by passing on the selection parameter, which is already there.

This speeds up the generation of vignettes, which otherwise take some unnecessary extra time.

norm_func is confusing and does not generalize that much

At the beginning I decided to pass norm_func as a parameter to be able to normalize either as signal / background or as log2(signal / background). The log2 function is what gets passed as norm_func. This is confusing because it does not convey in any way that signal is divided by background, so users would often think that only log2 is applied. It also does not generalize much, because there are only so many ways in which we would want to transform the data, and it allows injecting strange functions, probably to no use.

So I propose renaming this to normalization and allowing a set of string values: fc, log2fc and perhaps diff or whatever else we can think of.
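
A sketch of what the proposed interface might look like. The function name normalize_signal is hypothetical, and reading diff as signal minus background is an assumption:

```r
# Hypothetical interface: the string names the whole operation, so it is
# explicit that the signal is combined with the background.
normalize_signal <- function(signal, background,
                             normalization = c("fc", "log2fc", "diff")) {
  normalization <- match.arg(normalization)  # rejects unsupported values
  switch(normalization,
    fc = signal / background,
    log2fc = log2(signal / background),
    diff = signal - background  # assumed meaning of "diff"
  )
}

signal <- c(8, 2)
background <- c(2, 2)
fold_change <- normalize_signal(signal, background, "fc")
log2_fold_change <- normalize_signal(signal, background, "log2fc")
difference <- normalize_signal(signal, background, "diff")
```

Restricting the argument to a fixed set of strings trades flexibility for clarity, which matches the observation that only a few transformations are ever needed.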

Adapt plot sizes defaults to medium size screens

Some of the default values don't look good in RStudio on laptop screens, which I find inconvenient. Things that look good on a laptop display are more likely to look good on a larger screen than vice versa, so I think the defaults should be shifted towards smaller displays.

Sometimes plot_bw_profile crashes with remove_top

This is a very specific example from an RNA-seq bigWig file from another reference.

The error I got:

Error in quantile.default(rowMeans(full), probs = c(1 - remove_top)) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE

I haven't been able to reproduce this easily, but I will look into it.
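
The cause is not established here. One way this error can arise in general is when some row means are NA or NaN, for instance for loci without any coverage. The toy matrix below is an assumption that reproduces the message, together with one possible guard:

```r
# Toy matrix where the last row has no values at all.
full <- rbind(
  c(1, 2, 3),
  c(4, 5, 6),
  c(NaN, NaN, NaN)
)
remove_top <- 0.1
row_means <- rowMeans(full)

# Without na.rm = TRUE, quantile() stops with the error quoted above.
fails_without_na_rm <- inherits(
  try(quantile(row_means, probs = 1 - remove_top), silent = TRUE),
  "try-error"
)

# Possible guard: ignore missing means when computing the cutoff, and
# drop the corresponding rows explicitly when filtering.
cutoff <- quantile(row_means, probs = 1 - remove_top, na.rm = TRUE)
keep <- !is.na(row_means) & row_means <= cutoff
filtered <- full[keep, , drop = FALSE]
```

Whether dropping such rows is the right behaviour for a profile plot is a separate design decision. The sketch only shows where the NaN values would need handling.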

granges_cbind could probably perform faster

The internal function granges_cbind sorts GRanges objects and merges them. This is OK because it is always called with objects that share the same bins and ranges, but it could probably run faster by using the merge function on data frames and converting back to GRanges.
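
The data frame route could look roughly like this. The toy bins and column names are assumptions. The final conversion back to GRanges (for example with GenomicRanges::makeGRangesFromDataFrame and keep.extra.columns = TRUE) is left out so that the sketch runs with base R only:

```r
# Two score tables over the same bins, given in different row orders.
bins_a <- data.frame(
  seqnames = c("chr1", "chr1", "chr2"),
  start = c(1L, 101L, 1L),
  end = c(100L, 200L, 100L),
  sample_a = c(0.5, 1.2, 3.0)
)
bins_b <- data.frame(
  seqnames = c("chr2", "chr1", "chr1"),
  start = c(1L, 1L, 101L),
  end = c(100L, 100L, 200L),
  sample_b = c(2.5, 0.1, 0.9)
)

# merge() matches rows on the coordinate columns, so neither input
# needs to be sorted beforehand.
merged <- merge(bins_a, bins_b, by = c("seqnames", "start", "end"))
```

Whether this is actually faster than sorting and binding GRanges would have to be measured on realistic bin counts.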

bwtools functions subsample parameter

The idea would be to add a parameter like subsample that randomly takes a subset of subsample bins or loci to perform the analysis. This is helpful for two things:

  • Quick run of plots to have an idea of how things look without having to wait.
  • Calculating some bootstrap/Monte Carlo approach for some analyses.

This probably also requires a seed parameter for testing.
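
A minimal sketch of the idea on a data frame of loci. The function name subsample_loci and its arguments are hypothetical:

```r
# Hypothetical helper: keep a random subset of `subsample` rows.
# Passing a seed makes the subset reproducible, which helps in tests.
subsample_loci <- function(loci, subsample = NULL, seed = NULL) {
  if (is.null(subsample) || subsample >= nrow(loci)) {
    return(loci)  # nothing to do
  }
  if (!is.null(seed)) {
    set.seed(seed)  # note: this changes the global RNG state
  }
  idx <- sort(sample.int(nrow(loci), size = subsample))
  loci[idx, , drop = FALSE]
}

loci <- data.frame(
  seqnames = "chr1",
  start = seq(1L, by = 100L, length.out = 50),
  end = seq(100L, by = 100L, length.out = 50)
)
first_draw <- subsample_loci(loci, subsample = 10, seed = 1)
second_draw <- subsample_loci(loci, subsample = 10, seed = 1)
```

Sorting the sampled indices keeps the loci in their original order, so downstream code that assumes sorted input still works.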

Better practices in ggplot functionality

Uses of aes_string should be replaced by the .data pronoun, as aes_string is deprecated and will eventually be removed.
ggplot2 imports should be limited to the specific functions that each function needs.

Switch back to getting genome data from GenomeInfoDb

At some point in the past, genome info data was stored as a sysdata object within the package to make it a bit more lightweight, since we really only needed sequence length info.

At this point, I think this decision does not really solve the dependency problem, and it makes the package less general, as it can only handle the mm9/10 and hg38 genomes, which is unreasonably restrictive.
