cnluzon / wigglescout



Explore and visualize genomics bigWig data.

License: GNU General Public License v3.0

Language: R (100.00%)

Topics: bigwig, genomics, data-visualization

wigglescout's People

Forkers: snardeli

wigglescout's Issues

r-cmd-check test on other OS

This is to be done when the repository is made public.

Allow zero flanking region for stretch mode profiles and heatmaps

Right now, the minimum flanking region allowed in both heatmaps and profiles is the bin size. Setting the upstream or downstream value to 0 makes the function exit with an error.

This should be adapted to also accept empty flanking regions and return only the inner matrix. Of course, this only makes sense in stretch mode.

Colors are incorrectly assigned in plot_bw_profile

Sometimes I get an unexpected label order. I believe this is because the factor levels are sorted alphabetically. The labels are assigned correctly; only the colors are wrong.
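A minimal base R sketch of the suspected cause and one possible fix. By default `factor()` sorts its levels alphabetically, so a palette applied by level position no longer follows the order in which the files were given. The label names and hex colors below are made-up examples, not the package's internals.

```r
# Hypothetical labels, in the order the user supplied the bigWig files
labels <- c("H3K27me3", "Input", "H3K4me3")
# Arbitrary example colors, one per label, in the same order
palette <- c("#1b9e77", "#d95f02", "#7570b3")

# Default factor(): levels come out alphabetically sorted
default_levels <- levels(factor(labels))
# default_levels is c("H3K27me3", "H3K4me3", "Input")

# Keep the levels in the order the labels were supplied
ordered_f <- factor(labels, levels = unique(labels))

# Name the palette by label so each color follows its label, not a position
named_palette <- setNames(palette, labels)
```

In ggplot2, passing the named vector as `scale_color_manual(values = named_palette)` matches colors by name, which makes the mapping robust to whatever level order the factor ends up with.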

Better tests for visual outputs

While adding new features, I realised I had completely broken a figure by changing something in the aes() call of ggplot2. However, all the tests passed, because I am not testing the actual appearance of the plots. It would be great to test for this as well.

Provide light clustering functionality for heatmaps

This would apply to:

Summary heatmaps. Hierarchical clustering would be preferred, as these heatmaps show a small number of rows. However, ggplot does not provide dendrogram functionality, so to do this I would have to use base R functionality or draw the dendrogram separately.
Classic heatmaps. K-means or something similar would be good.
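A rough sketch of both options using only the `stats` package, assuming the heatmap values are available as plain numeric matrices (rows = loci or loci groups). The matrices here are random placeholders, and the number of k-means clusters is an arbitrary choice.

```r
set.seed(1)
# Hypothetical summary matrix: rows = loci groups, columns = bigWig files
summary_mat <- matrix(runif(20), nrow = 5,
                      dimnames = list(paste0("group", 1:5), paste0("bw", 1:4)))

# Summary heatmaps: few rows, so hierarchical clustering is affordable
hc <- hclust(dist(summary_mat))
# Row order implied by the dendrogram, usable as factor levels for the y axis
summary_order <- rownames(summary_mat)[hc$order]

# Classic heatmaps: many rows, so group them with k-means instead
profile_mat <- matrix(rnorm(200 * 10), nrow = 200)
km <- kmeans(profile_mat, centers = 3, nstart = 10)
# Sort rows by cluster so each cluster forms a contiguous block
row_order <- order(km$cluster)
```

The `hc` object can be drawn with base `plot(hc)`, which is one way to show the dendrogram outside ggplot, as the issue suggests.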

plot_bw_profile with background files should divide aggregated values

I believe plot_bw_profile should perform the normalization after aggregating values instead of before. Otherwise, the signal is too noisy.
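The difference between the two orders of operations can be shown with a small base R sketch. It assumes signal and background are loci-by-bins matrices and that the profile is a column-wise mean; the data are random placeholders, with 1 added to the background only to avoid division by zero.

```r
# Hypothetical per-locus matrices: rows = loci, columns = bins along the profile
set.seed(42)
signal <- matrix(rpois(50 * 20, lambda = 5), nrow = 50)
background <- matrix(rpois(50 * 20, lambda = 3) + 1, nrow = 50)

# Normalize each locus first, then aggregate (the order the issue describes)
per_locus_first <- colMeans(signal / background)

# Proposed: aggregate signal and background first, then divide
aggregate_first <- colMeans(signal) / colMeans(background)
```

A ratio of means is less sensitive to individual loci whose background is close to zero, which is one reason the per-locus version tends to look noisier.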

Add order_by to heatmap caption

This is a relevant parameter to show in the caption, since it is not obvious from the plot itself.

How to define genome

Nice package!
From the help message, I see that the genome argument only accepts mm9 and hg38. How can I define other genomes? Thanks.

Factor out validate_ functions from bwtools.R

There are a number of validation functions that could be factored out of bwtools.R. If the refactoring is done properly, they can probably be reused in additional functionality.

Make utils.R functions comply with the dot-function policy

Now that package-internal function names start with a dot (.), this should also apply to the functions in utils.R.

Include more information in verbose summarized plots

Summarized heatmaps and profiles could include more information, such as the number of points included in the plot.

plot_bw_summary_heatmap should not reorder rows

Since the switch to the ggplot implementation, plot_bw_summary_heatmap reorders rows because they are of factor type. The factor levels should be set so that the initial order is kept.
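A minimal sketch of setting the levels explicitly, assuming the row labels reach ggplot2 as a factor mapped to the y axis. The row names are made-up examples. The levels are reversed because ggplot2 draws the first level of a discrete y axis at the bottom, and here the first input row should appear on top.

```r
# Hypothetical row labels, in the order they appear in the input
rows <- c("promoters", "enhancers", "gene_bodies", "intergenic")

# Default factor(): alphabetical levels, which reorders the heatmap rows
default_levels <- levels(factor(rows))

# Keep the initial order, reversed so the first input row is drawn at the top
rows_f <- factor(rows, levels = rev(unique(rows)))
```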

bw_loci fails on MACS `narrowPeak` files but correctly processes the resulting GRanges

I recently spotted an error: if I run bw_loci directly on a narrowPeak file, I get this:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'an integer', got 'NNNNNN'

where NNNNNN is a floating-point value from the signalValue column of the narrowPeak file. This appears to come from the .loci_to_granges function, which contains the call:

bed <- import(loci, format = "BED")

Apparently BiocIO::import does not work with format = "BED" on narrowPeak files, although it parses the file correctly if this parameter is omitted.
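A plausible explanation, not verified here: with format = "BED", the seventh column is read as thickStart, which must be an integer, whereas in narrowPeak the seventh column is signalValue, a float. The base R reader below is a hypothetical illustration of the narrowPeak (BED6+4) column types, not the package's code. It assumes a tab-separated file without "track" header lines.

```r
# Column layout of the narrowPeak format (BED6+4)
np_cols <- c("chrom", "chromStart", "chromEnd", "name", "score", "strand",
             "signalValue", "pValue", "qValue", "peak")
np_classes <- c("character", "integer", "integer", "character", "integer",
                "character", "numeric", "numeric", "numeric", "integer")

# Hypothetical helper: read a narrowPeak file with explicit column types
read_narrowpeak <- function(path) {
  df <- read.table(path, sep = "\t", header = FALSE, quote = "",
                   col.names = np_cols, colClasses = np_classes,
                   comment.char = "#")
  # narrowPeak, like BED, uses 0-based starts; convert to 1-based (GRanges style)
  df$start <- df$chromStart + 1L
  df$end <- df$chromEnd
  df
}
```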

Rasterize point layer in scatterplots

With the ggrastr library it is possible to rasterize individual layers of a plot. This is useful for scatterplots with many points that you want to save in a vector format.

.multi_bw_ranges test with several processors is slow

Running the full test suite is still not very slow, but I noticed that this test is noticeably slower (on the order of seconds). I think the future::plan(multisession, workers=2) line is the problem.

bedfile parameters should be named locus

Wherever a BED file is accepted, one can now provide a GRanges object instead. This is not reflected in the parameter and function names, and I think it should be.

This is a major API change, so I think changes like this should be gathered together for the formal release.

Accept both BED files and GRanges objects

Accepting GRanges objects wherever a BED file is accepted could be useful when processing BED files before plotting (e.g. expanding or shortening loci, filtering and so on).
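One way to accept both inputs is a single entry point that dispatches on the argument type. The sketch below uses a hypothetical helper name, `as_loci`, and assumes rtracklayer is installed for the file branch; `rtracklayer::import()` infers the format from the file extension when none is given.

```r
# Hypothetical helper: accept a path to a BED file or an existing GRanges object
as_loci <- function(loci) {
  if (is.character(loci) && length(loci) == 1L) {
    if (!file.exists(loci)) stop("BED file not found: ", loci, call. = FALSE)
    # Let rtracklayer infer the format from the file extension
    return(rtracklayer::import(loci))
  }
  if (inherits(loci, "GRanges")) {
    # Already processed by the user (filtered, resized, ...); use as is
    return(loci)
  }
  stop("loci must be a path to a BED file or a GRanges object", call. = FALSE)
}
```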

Exclude top percentile loci parameter

Outlier removal: add a parameter to the functions that allows excluding a top percentile of the data.
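A minimal sketch of such a parameter, assuming it is a fraction (so remove_top = 0.01 drops the top 1%) and that values are cut at the corresponding quantile. The helper name is hypothetical. It returns a logical mask rather than the filtered values, so the same selection can be applied to related vectors or matrix rows.

```r
# Hypothetical helper: TRUE for values to keep, FALSE for the top remove_top fraction
top_percentile_mask <- function(x, remove_top = 0) {
  stopifnot(remove_top >= 0, remove_top < 1)
  if (remove_top == 0) return(rep(TRUE, length(x)))  # nothing is removed
  cutoff <- quantile(x, probs = 1 - remove_top, na.rm = TRUE, names = FALSE)
  # Missing values are dropped as well
  !is.na(x) & x <= cutoff
}
```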

Plot scatter body introduces repeated points if GRanges are not deduplicated

This is a very specific and unusual case. If the provided BED file has redundant loci, the values are duplicated at some point during merging, creating spurious points. This does not affect how the plot looks, but it does affect the point count.

This would be solved by deduplicating the BED file beforehand, which can be a bit costly but is perhaps worth it.
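A minimal base R sketch of the deduplication step, assuming the loci are held as a data frame of coordinates (the values are made up). For a GRanges object, `unique()` should have a similar effect, although it also takes strand into account.

```r
# Hypothetical loci table with one redundant entry (rows 1 and 4)
loci <- data.frame(chrom = c("chr1", "chr1", "chr2", "chr1"),
                   start = c(100L, 500L, 100L, 100L),
                   end   = c(200L, 600L, 300L, 200L))

# Keep only the first occurrence of each (chrom, start, end) combination
dedup <- loci[!duplicated(loci[, c("chrom", "start", "end")]), ]
```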

Add an order parameter to plot_bw_heatmap

Heatmaps are now sorted by mean by default. We need an order parameter so that heatmaps can be plotted side by side with each locus on the same row in all of them.
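The core of such a parameter is computing the row order once and reusing it. This sketch assumes each heatmap is a loci-by-bins matrix over the same loci in the same input order; the matrices are random placeholders, and sorting by decreasing row mean mirrors the current default.

```r
set.seed(7)
# Two hypothetical heatmap matrices over the same loci (rows) and bins (columns)
mark_a <- matrix(rexp(30 * 10), nrow = 30)
mark_b <- matrix(rexp(30 * 10), nrow = 30)

# Compute the row order once, from the reference matrix, by decreasing mean
ord <- order(rowMeans(mark_a), decreasing = TRUE)

# Apply the same order to both, so row i is the same locus in each heatmap
mark_a_sorted <- mark_a[ord, ]
mark_b_sorted <- mark_b[ord, ]
```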

Some loci labels are now full paths instead of basenames

This happens in plot_bw_profile and plot_bw_heatmap.

Summary heatmap label lacks one parenthesis

Tiny issue, but it needs to be fixed: make_norm_label returns log2(RPGC/background instead of log2(RPGC/background).

plot_bw_loci_summary_heatmap does not handle single NA values properly

It seems that plot_bw_summary_heatmap aggregates values without removing NAs, so any distribution that contains an NA is summarized as NA.
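The effect can be reproduced with base R aggregation, assuming each group of loci is summarized with a mean. The values are made up. Note that a group consisting only of NAs still yields NaN even with na.rm = TRUE, since that is the mean of zero values.

```r
# Hypothetical values for two loci groups, with one missing value in group A
values <- data.frame(group = c("A", "A", "A", "B", "B"),
                     value = c(1, NA, 3, 2, 4))

# Without na.rm, a single NA turns the whole group summary into NA
with_na <- tapply(values$value, values$group, mean)
# with_na: A = NA, B = 3

# Dropping NAs before aggregating keeps the summary defined
without_na <- tapply(values$value, values$group, mean, na.rm = TRUE)
# without_na: A = 2, B = 3
```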

We had not run into this issue before, as our bigWig files are always complete.

remove_top should not remove by mean in scatterplots

In general it seems sensible to use the mean to remove the top percentile in multidimensional plots, but in a scatterplot it creates an odd diagonal effect: thresholding on the mean of x and y removes every point above a line x + y = c.

So in this case I think we should remove the top percentile from the x and y axes separately.
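A base R sketch contrasting the two filters, assuming remove_top is a fraction and the cutoff is a quantile. The data are random placeholders.

```r
set.seed(3)
x <- rexp(1000)
y <- rexp(1000)
remove_top <- 0.01

# Filtering by the mean of x and y: the cut is the diagonal line x + y = 2 * cutoff
mean_xy <- (x + y) / 2
by_mean <- mean_xy <= quantile(mean_xy, 1 - remove_top)

# Proposed: apply the percentile cutoff to each axis independently
keep <- x <= quantile(x, 1 - remove_top) & y <= quantile(y, 1 - remove_top)
```

With independent cutoffs the removed region is the union of a vertical and a horizontal band, so no diagonal edge appears in the plot.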

Error for a norm_mode that is not implemented is not user-friendly

If a norm_mode value that is not implemented is provided, .process_norm_func returns NULL, and eventually you get an error because a function does not exist.

.process_norm_func should validate its input further and return an appropriate error in that case.
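A minimal sketch of validating early, using a hypothetical function rather than the package's .process_norm_func. It assumes the supported modes are fc and log2fc, the values mentioned in the norm_func issue, and looks them up in a named list so the error message can list the valid options.

```r
# Hypothetical: map a norm_mode string to a normalization function, or fail
# immediately with an informative error instead of returning NULL
process_norm_mode <- function(norm_mode) {
  norm_funcs <- list(
    fc     = function(signal, background) signal / background,
    log2fc = function(signal, background) log2(signal / background)
  )
  if (!is.character(norm_mode) || length(norm_mode) != 1L ||
      !norm_mode %in% names(norm_funcs)) {
    stop("norm_mode must be one of: ", paste(names(norm_funcs), collapse = ", "),
         call. = FALSE)
  }
  norm_funcs[[norm_mode]]
}
```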

Caption in verbose plots sometimes too wide

Caption lines in verbose plots are sometimes wider than the plot, so some of the values are not visible.
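One simple mitigation is to wrap the caption text before passing it to the plot. This sketch uses base `strwrap()`; the caption content and the width of 40 characters are made-up examples, and a suitable width would depend on the plot size.

```r
# Hypothetical verbose caption built from several plot parameters
caption <- paste("norm_mode: log2fc; remove_top: 0.001; loci: 12345;",
                 "bigWig files: 4; background files: 4; order_by: mean")

# Break the caption into lines of at most 40 characters
wrapped <- paste(strwrap(caption, width = 40), collapse = "\n")
```

The wrapped string can then be used as the plot caption, for example with ggplot2's `labs(caption = wrapped)`.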

Make r-cmd-check fail on BiocCheck fail

BiocCheck is now included in the CI workflow, but a BiocCheck failure does not make the check fail.

plot_bw_loci_summary_heatmap log2fc norm is not zero when bw == bg_bw and remove_top > 0

I noticed on a ChromHMM plot that the values of the Input line were close to zero, but not exactly zero, when I used remove_top = 0.001 and norm_mode = "log2fc" with the input bigWig as both signal and background.

I suspect this is because the top elements are removed only from the signal files and not from the input ones.
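A small base R sketch of the suspected effect and a shared-mask fix. It assumes, as a simplification, that the summary value is log2 of the ratio between mean signal and mean background over the kept loci; the package may aggregate differently. The data are random placeholders.

```r
set.seed(11)
# Hypothetical per-locus values; the same bigWig is used as signal and background
signal <- rexp(500) + 0.1
background <- signal
remove_top <- 0.01

cutoff <- quantile(signal, 1 - remove_top)
# Suspected behaviour: the top percentile is removed from the signal only
log2fc_signal_only <- log2(mean(signal[signal <= cutoff]) / mean(background))

# Shared mask: drop the same loci from both signal and background
keep <- signal <= cutoff
log2fc_shared <- log2(mean(signal[keep]) / mean(background[keep]))
```

With the shared mask, identical inputs give a ratio of exactly 1, so the log2 fold change is exactly zero; removing only the signal's top values lowers its mean and produces a small negative value.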

Include selection parameter in bwplot functions that affect bins

Calculations are slow even when a bigWig file is a small example file with very few values, because all the bins are still calculated. This can be fixed by passing on the selection parameter, which already exists.

This would also speed up vignette generation, which otherwise takes some unnecessary extra time.

norm_func is confusing and does not generalize that much

Initially I decided to pass norm_func as a parameter so that the signal could be normalized either as signal / background or as log2(signal / background). In the latter case, the log2 function itself is what gets passed. This is confusing because it does not convey in any way that the signal is divided by the background, so users would often assume that log2 means only taking the logarithm. It also does not generalize much: there are only so many ways in which we would want to transform the data, and it allows injecting arbitrary functions that are probably of no use.

So I propose replacing it with a normalization parameter that accepts a set of string values: fc, log2fc and perhaps diff or whatever else we can think of.

Profile plots are missing the x axis kb annotation

This got lost in a recent update.

Profile bwtools core functions

I think some of the core bwtools functions, such as multi_bw_ranges and bw_ranges, have some room for optimization.

Add coverage reports to unit testing

This can probably be done better after the package is public.

Adapt default plot sizes to medium-size screens

Some of the default values don't look good in RStudio on laptop screens, which I find inconvenient. Plots that look good on a laptop display are more likely to look good on a larger screen than vice versa, so I think the defaults should be shifted towards smaller displays.

Make overlaid legend in profile plots transparent

It is not easy to predict when the legend is going to be on top of the lines, so it's better to make it transparent.

Name colors consistently across the package

The package's usual color palette, which is not one of the default ggplot palettes, should be defined in one place instead of being hard-coded wherever it is needed.

Sometimes plot_bw_profile crashes with remove_top

This is a very specific case, from an RNA-seq bigWig file for another reference.

The error I got:

Error in quantile.default(rowMeans(full), probs = c(1 - remove_top)) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE

I haven't been able to reproduce this easily, but I will look into it.
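One plausible cause, not confirmed: some loci have missing values in this bigWig (for example, regions without coverage), so `rowMeans(full)` contains NA and `quantile()` stops with exactly this error. The base R sketch below reproduces the message with a made-up matrix and shows a guarded version. Note that a row that is entirely NA still gives NaN even with `na.rm = TRUE`.

```r
# Hypothetical loci x bins matrix; locus 2 has no values, locus 3 is partial
full <- rbind(c(1, 2, 3), c(NA, NA, NA), c(4, 5, NA))
remove_top <- 0.1

# Unguarded call: quantile() fails on the NA row means
failed <- tryCatch({
  quantile(rowMeans(full), probs = c(1 - remove_top))
  FALSE
}, error = function(e) TRUE)

# Guarded version: ignore missing values when computing means and the cutoff
row_means <- rowMeans(full, na.rm = TRUE)  # all-NA row gives NaN
cutoff <- quantile(row_means, probs = 1 - remove_top, na.rm = TRUE, names = FALSE)
# Drop loci without data as well as those above the cutoff
keep <- !is.na(row_means) & row_means <= cutoff
```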

granges_cbind could probably perform faster

The internal function granges_cbind sorts GRanges objects and merges them. This works because it is always called with objects that share the same bins and ranges, but it could probably run faster by using merge on data frames and converting the result back to GRanges.
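A minimal sketch of the data frame route, assuming each GRanges has been converted to a data frame with seqnames, start and end columns plus one value column per bigWig file. The values are made up, and whether this is actually faster than the current approach would need benchmarking.

```r
# Hypothetical per-bin values from two bigWig files, with rows in different orders
bins_a <- data.frame(seqnames = c("chr1", "chr1", "chr2"),
                     start = c(1L, 5001L, 1L), end = c(5000L, 10000L, 5000L),
                     bw_a = c(0.5, 1.2, 0.0))
bins_b <- data.frame(seqnames = c("chr2", "chr1", "chr1"),
                     start = c(1L, 5001L, 1L), end = c(5000L, 10000L, 5000L),
                     bw_b = c(2.0, 0.3, 0.8))

# Join on the bin coordinates; merge() matches rows by key, so no prior sort is needed
combined <- merge(bins_a, bins_b, by = c("seqnames", "start", "end"))
```

The result could then be converted back with `GenomicRanges::makeGRangesFromDataFrame(combined, keep.extra.columns = TRUE)`.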

bwtools functions subsample parameter

The idea would be to add a parameter such as subsample that randomly selects a given number (subsample) of bins or loci on which to perform the analysis. This would help with two things:

Quickly running plots to get an idea of how things look without having to wait.
Bootstrap or Monte Carlo approaches for some analyses.

This probably also requires a seed parameter for testing.
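A minimal sketch of the selection step, with a hypothetical helper that returns row indices to keep. It assumes subsample is a count of rows and seed an optional integer; setting the seed inside the function changes the global RNG state, which may or may not be acceptable for the package.

```r
# Hypothetical helper: pick `subsample` rows (bins or loci) at random, reproducibly
subsample_rows <- function(n_rows, subsample = NULL, seed = NULL) {
  # No subsampling requested, or more rows requested than available: keep all
  if (is.null(subsample) || subsample >= n_rows) return(seq_len(n_rows))
  if (!is.null(seed)) set.seed(seed)  # note: this changes the global RNG state
  # Sort so the selected rows keep their original (genomic) order
  sort(sample.int(n_rows, subsample))
}
```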
