simonvh / fluff Goto Github PK

Fluff is a Python package that contains several scripts to produce pretty, publication-quality figures for next-generation sequencing experiments.

License: MIT License

Python 100.00%

fluff's People

cnatures cauyrd readbio dolittle007 al3n70rn zqfang mpg-age-bioinformatics hrk2109 shouldsee rizoic maarten-vd-sande fpfonseca trellixvulnteam vonalphabiszulu

fluff's Issues

Duplicates/repeats in fluff_profile.py

fluff_heatmap.py: track and comment lines

Currently if a track line or a comment line (#) is present in the BED file, fluff_heatmap.py fails to run. It would be a good idea to allow BED files that contain these lines.
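A minimal sketch of how such lines could be skipped before parsing. The function name `iter_bed_records` is hypothetical and not part of fluff; skipping `browser` lines as well is an assumption based on the usual UCSC header conventions.

```python
def iter_bed_records(lines):
    """Yield tab-split BED fields, skipping header, comment and blank lines."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # UCSC-style header lines start with the word "track" or "browser".
        first_token = stripped.split(None, 1)[0]
        if first_token in ("track", "browser"):
            continue
        yield stripped.split("\t")


example = [
    'track name="peaks" description="example"',
    "# a comment",
    "chr1\t100\t200",
    "",
    "chr2\t300\t400\tpeak2",
]
records = list(iter_bed_records(example))
```

Matching on the first whitespace-separated token rather than a plain prefix avoids dropping a sequence whose name merely begins with "track".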

ticks in the genomic scale have disappeared

Is it possible to set a different y-axis for each track shown in a fluff profile image?

Hi,

I would like to use fluff profile to show my tracks. Is it possible to manually set a different vertical viewing range (y-axis) for each track, as in the UCSC Genome Browser?

By the way, can I set a different color for each track with the -c option?

Thanks for your help!

grey lines in heatmap output are misaligned

See here:

It seems they are off by one at the top and bottom (maybe this is not noticeable in bigger heatmaps, but you can easily see it here).

remove Pycluster dependency

For easier install with dependencies (pip, conda), replace Pycluster with clustering from scipy (if possible, otherwise use sklearn).

Get gene/peak name in output

My question is: is it somehow possible to save the gene name in the output BED file of fluff_heatmap.py?

At the moment it only saves chromosome, start, end, cluster number, a score and strand.
It would be very useful to have the gene name as well.

default number of bins in bandplot

100 is too high for most datasets. It probably should be something like 20.

switch order of required versus optional arguments

Have required arguments first, then the optional arguments.

Order of command line arguments

I think it would be good to re-order the command-line arguments, both for consistency between the three tools and to have the most useful/used arguments on top.

missing dependencies

Hi, just a quick thing: Pycluster and pp are not listed as dependencies, but fluff_heatmap.py needs them to work.

Add new method of normalization (according to tag in each region)

Give meaningful error if input BED file does not exist

If the input BED file does not exist, the current error is very cryptic. We should just add a check for the input file, and also let load_heatmap_data produce a meaningful error.

Warning: Running fluff with too many files might make you system use enormous amount of memory!
Pearson distance method
Loading data
An error has occured during the function execution
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/pp-1.6.1-py2.7.egg/ppworker.py", line 90, in run
__result = __f(*__args)
File "", line 6, in load_heatmap_data
IOError: [Errno 2] No such file or directory: 'all_ifferential_peaks_single.bed'
Parallel Python (pp) not installed, can't load data in parallel
Traceback (most recent call last):
  File "/usr/bin/fluff_heatmap.py", line 5, in <module>
    pkg_resources.run_script('fluff==1.3', 'fluff_heatmap.py')
  File "/usr/lib64/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 505, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib64/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 1245, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/lib64/python2.7/site-packages/fluff-1.3-py2.7.egg/EGG-INFO/scripts/fluff_heatmap.py", line 259, in <module>
    clus = hstack([norm_data[t] for i,t in enumerate(tracks) if (not pick or i in pick)])
KeyError: 'B188_iDC_script.bam'
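The check proposed in this issue could look roughly like the sketch below. `check_input_file` is a hypothetical helper, not an existing fluff function; it would be called on the BED file before any data loading starts.

```python
import os


def check_input_file(path, description="input BED file"):
    """Return path unchanged, or raise a readable error if it is not a file."""
    if not os.path.isfile(path):
        raise IOError(
            "{0} does not exist or is not a regular file: {1}".format(description, path)
        )
    return path
```

Failing early this way would replace the `KeyError` at the end of the log above with a message that names the missing file.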

Overlay in bandplot

Overlay the profiles of multiple data tracks in one graph

RuntimeWarning and UnboundLocalError when using kmedoids

K-medoids/PAM(Partitioning Around Medoids) clustering
/usr/lib64/python2.7/site-packages/numpy/core/_methods.py:55: RuntimeWarning: invalid value encountered in true_divide
out=ret, casting='unsafe', subok=False)
Loading data
Parallel Python (pp) not installed, can't load data in parallel
Traceback (most recent call last):
  File "../../../repo/fluff/scripts/fluff_heatmap.py", line 339, in <module>
    data, regions = load_data(featurefile, bins, extend_up, extend_down, rmdup, rpkm, rmrepeats, fragmentsizes)
  File "../../../repo/fluff/scripts/fluff_heatmap.py", line 246, in load_data
    jobs.append(job_server.submit(load_heatmap_data, (featurefile, datafile, amount_bins, extend_up, extend_down, rmdup, rpkm, rmrepeats, fragmentsizes), (), ("tempfile","sys","os","fluff.fluffio","numpy")))
UnboundLocalError: local variable 'jobs' referenced before assignment

User to be able to define the number of CPUs

No parameter for extending each read to a specified length in heatmap and maybe other two

fluff profile: scale is broken

RuntimeWarning when using "merge mirrored clusters"

/home/george/repo/fluff/fluff/util.py:195: RuntimeWarning: invalid value encountered in greater_equal
if (numpy.array(ps) >= cutoff).all():

non-numeric value in 4th column of BED file causes fluff_bandplot.py to crash

Oh, and leaving it empty also doesn't work!

Traceback (most recent call last):
  File "/usr/bin/fluff_bandplot.py", line 5, in <module>
    pkg_resources.run_script('fluff==1.0', 'fluff_bandplot.py')
  File "/usr/lib64/python2.7/site-packages/pkg_resources.py", line 506, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib64/python2.7/site-packages/pkg_resources.py", line 1246, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/lib64/python2.7/site-packages/fluff-1.0-py2.7.egg/EGG-INFO/scripts/fluff_bandplot.py", line 82, in <module>
    cluster_data = load_bed_clusters(clust_file)
  File "/usr/lib64/python2.7/site-packages/fluff-1.0-py2.7.egg/fluff/fluffio.py", line 136, in load_bed_clusters
    cluster_data.setdefault(int(f.name), []).append("%s:%s-%s" % (f.chrom, f.start, f.end))
ValueError: invalid literal for int() with base 10: 'ranger_region_210686_212359_pval_3.06962e-32_fdrPassed_407'
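The traceback shows `int(f.name)` being applied directly to the 4th column. One way to make the failure readable is to wrap that conversion, as in this sketch; `parse_cluster_id` and its `line_number` argument are hypothetical names used for illustration only.

```python
def parse_cluster_id(name, line_number):
    """Convert the 4th BED column to an integer cluster id, with a readable error."""
    try:
        return int(name)
    except (TypeError, ValueError):
        raise ValueError(
            "line {0}: expected an integer cluster number in column 4, got {1!r}".format(
                line_number, name
            )
        )
```

An empty 4th column takes the same path, since `int("")` also raises `ValueError`.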

get original peaks with cluster from fluff_heatmap.py

Now the peaks are extended, however it should be possible to get the original peaks.

replace pp with multiprocessing

For easier install with dependencies (pip, conda), replace pp with multiprocessing.
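As an illustration of the proposed switch, the sketch below distributes a per-file worker over a `multiprocessing.Pool`. The worker `count_records` is only a stand-in for the real loading function, and `load_all` and its `ncpus` parameter are hypothetical names.

```python
import multiprocessing


def count_records(path):
    """Stand-in worker: count the non-empty lines in one file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def load_all(paths, ncpus=1):
    """Apply the worker to every path, using a process pool when ncpus > 1."""
    if ncpus <= 1:
        # Serial fallback, so a single-CPU run has no pool overhead.
        return [count_records(p) for p in paths]
    pool = multiprocessing.Pool(processes=ncpus)
    try:
        # map() returns results in the same order as the input paths.
        return pool.map(count_records, paths)
    finally:
        pool.close()
        pool.join()
```

An `ncpus` argument of this kind would also let the user choose the number of CPUs. On platforms that start worker processes by spawning, the call to `load_all` should sit under an `if __name__ == "__main__":` guard.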

list index out of range error !

Hi,

I am using fluff to plot a heatmap:

fluff heatmap -f mergePeaks.bed -d bed1 bed2 bed3 bed4 bed5 -C k -k 5 -g -M Pearson -p 12 -o kmeans5_dynamics

But it fails with an error:

Traceback (most recent call last):
  File "/software/anaconda/bin/fluff", line 4, in <module>
    __import__('pkg_resources').run_script('biofluff==2.0.2', 'fluff')
  File "/software/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script

  File "/software/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script

  File "/software/anaconda/lib/python2.7/site-packages/biofluff-2.0.2-py2.7.egg-info/scripts/fluff", line 327, in <module>
    args.func(args)
  File "/software/anaconda/lib/python2.7/site-packages/fluff/commands/heatmap.py", line 136, in heatmap
    clus = hstack([norm_data[t] for i, t in enumerate(tracks) if (not pick or i in pick)])
  File "/software/anaconda/lib/python2.7/site-packages/numpy/core/shape_base.py", line 277, in hstack
    if arrs[0].ndim == 1:
IndexError: list index out of range

Could you please give me some guidance on where to start debugging?

Thanks in advance!

Support for paired-end tracks

Optionally: use genomic fragments instead of extending reads in fluff_profile.py.

Cluster output from fluff_heatmap.py does not work in fluff_bandplot.py

Input for clustering with fluff_heatmap.py: 3-column BED file
The output _clusters.bed file should be usable by fluff_bandplot.py, but strand is in 5th column instead of 6th.

heatmap: specify datasets to cluster

Specify the datasets to use for clustering; use the others only for visualization.

readcount output files

Add the region to all the readcount output files, this makes it easier to use them for analysis.

For reference-based analyses only?

Hi, it seems like fluff works on reads after mapping to a reference - am I right? This would be useful information to add up front somewhere ;).

Error when plotting a heatmap using bigWig files

Hi,
I am impressed by the heatmap part of fluff.
I can reproduce the plot from the example, but this error came up when I used my own data:

CMD

$ fluff heatmap -f tsses_1kb.bed -d WT_H2A_Z.bw arp6_H2A.Z.bw WT_H3K4me3.bw arp6_H3K4me3.bw -o test_heatmap

Euclidean distance method
Loading data
[bwGetOverlappingIntervalsCore] Got an error

I can use the bigWig files to plot the profiles of selected genes, but it doesn't work when I use them to plot the heatmap.

Please help me! Thank you very much!

Xiaozhuan Dai

Read count matrix as output file

I was thinking that it would be convenient to save the matrix of tag counts in bins (I assume the one that is used for making heatmaps) to a separate output file. The matrix could then be used in R for customized plots. I understand that many people probably won't use this matrix at all, but if it is not difficult to implement, it could be a nice option.
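A sketch of what such an output could look like: a tab-separated file with one row per region and one column per bin and track. The function `write_count_matrix`, its arguments, and the column naming scheme are assumptions for illustration, not fluff's actual output format.

```python
import csv


def write_count_matrix(path, regions, track_names, counts, nbins):
    """Write one row per region: the region label, then nbins values per track."""
    header = ["region"]
    for track in track_names:
        header.extend("{0}_bin{1}".format(track, i + 1) for i in range(nbins))
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        for region, row in zip(regions, counts):
            # Putting the region first keeps each row self-describing.
            writer.writerow([region] + list(row))
```

A file written this way can be read in R with `read.delim`, and including the region in the first column also covers the request to add the region to the readcount output.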

Option to use specific ColorBrewer colors from a palette

And include info on where to find the palettes.

Data loaded twice?

Isn't data loaded twice here, when the dynam option is set?

https://github.com/simonvh/fluff/blob/DEV_v2Alpha/fluff/commands/heatmap.py#L119-L128

trouble in installing biofluff, pycluster not found

Hello,
Can somebody please help me with installing biofluff?
I ran the command "sudo pip install biofluff".
Here is the output I see:

Requirement already satisfied (use --upgrade to upgrade): biofluff in /usr/lib/python2.7/site-packages/biofluff-2.1.0-py2.7.egg
Requirement already satisfied (use --upgrade to upgrade): pysam in /usr/lib64/python2.7/site-packages (from biofluff)
Collecting HTSeq (from biofluff)
Downloading HTSeq-0.6.1.tar.gz (226kB)
100% |████████████████████████████████| 235kB 3.5MB/s
Requirement already satisfied (use --upgrade to upgrade): numpy in /usr/lib64/python2.7/site-packages (from biofluff)
Requirement already satisfied (use --upgrade to upgrade): scipy in /usr/lib64/python2.7/site-packages (from biofluff)
Requirement already satisfied (use --upgrade to upgrade): matplotlib in /usr/lib64/python2.7/site-packages (from biofluff)
Collecting colorbrewer (from biofluff)
Downloading colorbrewer-0.1.1.tar.gz
Collecting pybedtools (from biofluff)
Downloading pybedtools-0.7.7.tar.gz (12.6MB)
100% |████████████████████████████████| 12.6MB 70kB/s
Collecting Pycluster (from biofluff)
Could not find a version that satisfies the requirement Pycluster (from biofluff) (from versions: )
No matching distribution found for Pycluster (from biofluff)

Thanks
AMoL

repeat filtering

Currently, repeats (multi-mapped reads) are filtered out based on tags. We'd better use mapping quality, which is not bwa-specific and pretty consistently used amongst different aligners. In addition, once that is changed, update the command-line documentation to remove the "bwa-only" message.
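To illustrate filtering on mapping quality: in the SAM format, MAPQ is the fifth tab-separated column of an alignment line. The sketch below works on plain SAM text so that it needs no extra libraries; `passes_mapq` is a hypothetical helper and the default threshold is only a placeholder, since a suitable cutoff depends on the aligner.

```python
def passes_mapq(sam_line, min_mapq=1):
    """Return True if a SAM alignment line has MAPQ >= min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    # SAM columns: QNAME, FLAG, RNAME, POS, MAPQ, ...
    return int(fields[4]) >= min_mapq


low_quality = "read1\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII"
high_quality = "read2\t0\tchr1\t150\t37\t4M\t*\t0\t0\tACGT\tIIII"
kept = [line for line in (low_quality, high_quality) if passes_mapq(line)]
```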

Update fluffio.py to work on cluster files with no strand information

Currently, if a BED file for use with fluff_bandplot.py contains only 4 columns, fluff will crash. If there is no strand (no 6th column), we can assume that the strand is +.
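The fallback described here can be sketched as follows. `parse_cluster_line` is a hypothetical function, not the existing code in fluffio.py; it treats a missing or unrecognized 6th column as the + strand.

```python
def parse_cluster_line(line):
    """Split a cluster BED line into (chrom, start, end, name, strand)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least 4 columns, got {0}".format(len(fields)))
    chrom, start, end, name = fields[:4]
    # Column 6 holds the strand in standard BED; fall back to "+" when absent.
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
    return chrom, int(start), int(end), name, strand
```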

Support for replicates

error - invalid DISPLAY variable

Hi,

when I try to run fluff either on my data or on the example data I get this error:

Traceback (most recent call last):
  File "anaconda/bin/fluff", line 4, in <module>
    __import__('pkg_resources').run_script('biofluff==2.1.0', 'fluff')
  File "anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
  File "anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
  File "anaconda/lib/python2.7/site-packages/biofluff-2.1.0-py2.7.egg-info/scripts/fluff", line 9, in <module>
    args.func(args)
  File "anaconda/lib/python2.7/site-packages/fluff/commands/heatmap.py", line 235, in heatmap
    heatmap_plot(data, ind[::-1], outfile, tracks, titles, colors, bgcolors, scale, tscale, labels, fontsize)
  File "anaconda/lib/python2.7/site-packages/fluff/plot.py", line 85, in heatmap_plot
    fig = plt.figure(figsize=(plot_width, plot_height))
  File "anaconda/lib/python2.7/site-packages/matplotlib/pyplot.py", line 527, in figure
    **kwargs)
  File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 46, in new_figure_manager
    return new_figure_manager_given_figure(num, thisFig)
  File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 53, in new_figure_manager_given_figure
    canvas = FigureCanvasQTAgg(figure)
  File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 76, in __init__
    FigureCanvasQT.__init__(self, figure)
  File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 68, in __init__
    _create_qApp()
  File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 138, in _create_qApp
    raise RuntimeError('Invalid DISPLAY variable')
RuntimeError: Invalid DISPLAY variable

Any suggestion?

Thanks!
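The traceback ends in matplotlib's Qt backend, which needs a display. A common workaround on machines without one is to select the non-interactive Agg backend. The sketch below does this through the `MPLBACKEND` environment variable, which is honoured by matplotlib 1.5 and later; whether it is sufficient for this particular installation is an assumption.

```python
import os

# Ask matplotlib for the non-interactive Agg backend. This has to be set
# before matplotlib is imported for the first time in the process.
os.environ["MPLBACKEND"] = "Agg"
```

The same variable can be exported in the shell before running fluff. Inside Python code, calling `matplotlib.use("Agg")` before importing `matplotlib.pyplot` has the same effect.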

heatmap / bandplot: relative to an input/control?

Give a clear error message if the -p argument is out of range for the number of tracks

See issue #47
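In the "list index out of range" report, `-p 12` was combined with five data files, so the list passed to `hstack` was presumably empty. A range check along the following lines would catch that earlier. `validate_pick` is a hypothetical helper, and treating the values given to `-p` as 1-based is an assumption.

```python
def validate_pick(pick, ntracks):
    """Return 0-based track indices, or raise if any 1-based pick is out of range."""
    bad = [p for p in pick if p < 1 or p > ntracks]
    if bad:
        raise ValueError(
            "-p value(s) {0} out of range: only {1} data file(s) given (valid: 1-{1})".format(
                ", ".join(str(p) for p in bad), ntracks
            )
        )
    return [p - 1 for p in pick]
```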

label clusters in heatmap

Warning in case of too much input data

In case there are a lot of peaks in combination with a lot of BAM files the memory usage can become enormous. Maybe fluff should check for reasonable limits and warn or refuse to run without explicit affirmation?
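A rough version of such a check could estimate the size of the final data matrix from the number of regions, bins and tracks. The function names, the 8 bytes per value, and the 4 GiB default limit below are all assumptions; the estimate ignores memory used while reading the input files.

```python
def estimate_matrix_bytes(n_regions, n_bins, n_tracks, bytes_per_value=8):
    """Estimate the size of a regions x bins x tracks matrix of floats."""
    return n_regions * n_bins * n_tracks * bytes_per_value


def check_memory(n_regions, n_bins, n_tracks, limit_bytes=4 * 1024 ** 3):
    """Return a warning string if the estimate exceeds limit_bytes, else None."""
    needed = estimate_matrix_bytes(n_regions, n_bins, n_tracks)
    if needed > limit_bytes:
        return "Warning: the data matrix alone needs about {0:.1f} GiB".format(
            needed / float(1024 ** 3)
        )
    return None
```

The caller could print the returned warning and continue, or refuse to run unless the user passes an explicit override.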

fluff_bandplot.py

fluff_bandplot.py is not working in 1.43. It keeps running for days without any result or output.

Width of heatmap

Keep the width per experiment constant. Optionally add an argument to control the width.

IndexError when plotting dynamic patterns

Hi,
I am impressed by the dynamic patterns part of fluff.
I can get the dynamic plot from the example, but this error came up when I used my own data:

CMD

$ fluff heatmap -f tsses_2kb_new.bed -d WT_H2A_Z.sort.bam WT_H3K4me3.sort.bam WT_H3K27me3.sort.bam WT_pol.sort.bam -C k -k 5 -g -M Pearson -o kmeans5_dynamics

Pearson distance method
Loading data
K-means clustering
Loading data
Traceback (most recent call last):
  File "/root/anaconda2/bin/fluff", line 4, in <module>
    __import__('pkg_resources').run_script('biofluff==2.0.1', 'fluff')
  File "/root/anaconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
  File "/root/anaconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
  File "/root/anaconda2/lib/python2.7/site-packages/biofluff-2.0.1-py2.7.egg-info/scripts/fluff", line 327, in <module>
    args.func(args)
  File "/root/anaconda2/lib/python2.7/site-packages/fluff/commands/heatmap.py", line 181, in heatmap
    for (chrom, start, end, gene, strand), cluster in zip(array(regions, dtype="object")[ind], array(labels)[ind]):
IndexError: index 27201 is out of bounds for axis 0 with size 27200

Bed file attached here

tsses_2kb_new.bed.txt

Please help me!

Youhuang Bai

'bottom=%s, top=%s') % (bottom, top))
/usr/lib64/python2.7/site-packages/matplotlib-1.3.1-py2.7-linux-x86_64.egg/matplotlib/axis.py:1004: UserWarning: Unable to find pixel distance along axis for interval padding; assuming no interval padding needed.
warnings.warn("Unable to find pixel distance along axis for interval padding; assuming no interval padding needed.")
/usr/lib64/python2.7/site-packages/matplotlib-1.3.1-py2.7-linux-x86_64.egg/matplotlib/axis.py:1011: UserWarning: Unable to find pixel distance along axis for interval padding; assuming no interval padding needed.
warnings.warn("Unable to find pixel distance along axis for interval padding; assuming no interval padding needed.")
