simonvh / fluff
Fluff is a Python package that contains several scripts to produce pretty, publication-quality figures for next-generation sequencing experiments.
License: MIT License
Currently if a track line or a comment line (#) is present in the BED file, fluff_heatmap.py fails to run. It would be a good idea to allow BED files that contain these lines.
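One way to tolerate such lines is to filter them out before the records are parsed. The sketch below is not fluff's actual parser; `iter_bed_records` is a hypothetical helper name, and treating `browser` lines like `track` lines is an assumption based on the usual UCSC header conventions.

```python
def iter_bed_records(lines):
    """Yield the tab-separated fields of each data line in a BED file.

    Blank lines, comment lines (#) and UCSC-style "track"/"browser"
    header lines are skipped instead of causing a failure.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Compare the first whitespace-separated token, so that a
        # chromosome name that merely starts with "track" is still kept.
        if stripped.split(None, 1)[0] in ("track", "browser"):
            continue
        yield stripped.split("\t")


example = [
    'track name="peaks" description="example"',
    "# a comment",
    "chr1\t100\t200",
    "",
    "chr2\t300\t400",
]
records = list(iter_bed_records(example))
print(records)
```

The same generator could wrap an open file handle, since it only iterates over lines.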
It's ugly. Fix that!
Hi,
I would like to use fluff profile to show my tracks. Is it possible to manually set a different vertical viewing range (y axis) for each track, as in the UCSC genome browser?
By the way, can I set a different color for each track with the -c option?
Thanks for your help!
For easier install with dependencies (pip, conda), replace Pycluster with clustering from scipy (if possible, otherwise use sklearn).
My question is: is it somehow possible to save the gene name in the output BED file of fluff_heatmap.py?
At the moment it only saves chromosome, start, end, cluster number, some score and strand.
It would be very useful to have a gene name as well.
100 is too high for most datasets. It probably should be something like 20.
I think it would be good to re-order the command line arguments: required arguments first, then the optional arguments.
Hi, just a quick thing: Pycluster and pp are not listed as dependencies, but fluff_heatmap.py needs them to work.
If the input BED file does not exist, the current error is very cryptic. We should just add a check for the input file, and also let load_heatmap_data produce a meaningful error.
Warning: Running fluff with too many files might make you system use enormous amount of memory!
Pearson distance method
Loading data
An error has occured during the function execution
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/pp-1.6.1-py2.7.egg/ppworker.py", line 90, in run
__result = __f(*__args)
File "", line 6, in load_heatmap_data
IOError: [Errno 2] No such file or directory: 'all_ifferential_peaks_single.bed'
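A check along the following lines, run before any data is loaded, would replace the worker traceback above with a single readable message. This is only a sketch: `check_input_file` and its `description` parameter are hypothetical names, not part of fluff.

```python
import os


def check_input_file(path, description="input file"):
    """Return path if it is an existing, readable file; raise IOError otherwise."""
    if not os.path.isfile(path):
        raise IOError("{0} not found: '{1}'".format(description, path))
    if not os.access(path, os.R_OK):
        raise IOError("{0} is not readable: '{1}'".format(description, path))
    return path


try:
    check_input_file("does_not_exist.bed", "BED file")
    message = None
except IOError as err:
    message = str(err)
print(message)
```

Calling the check once in the main script, before jobs are submitted to the workers, keeps the error out of the parallel code path.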
Parallel Python (pp) not installed, can't load data in parallel
Traceback (most recent call last):
File "/usr/bin/fluff_heatmap.py", line 5, in <module>
pkg_resources.run_script('fluff==1.3', 'fluff_heatmap.py')
File "/usr/lib64/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 505, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib64/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 1245, in run_script
execfile(script_filename, namespace, namespace)
File "/usr/lib64/python2.7/site-packages/fluff-1.3-py2.7.egg/EGG-INFO/scripts/fluff_heatmap.py", line 259, in <module>
clus = hstack([norm_data[t] for i,t in enumerate(tracks) if (not pick or i in pick)])
KeyError: 'B188_iDC_script.bam'
Overlay the profiles of multiple data tracks in one graph
K-medoids/PAM(Partitioning Around Medoids) clustering
/usr/lib64/python2.7/site-packages/numpy/core/_methods.py:55: RuntimeWarning: invalid value encountered in true_divide
out=ret, casting='unsafe', subok=False)
Loading data
Parallel Python (pp) not installed, can't load data in parallel
Traceback (most recent call last):
File "../../../repo/fluff/scripts/fluff_heatmap.py", line 339, in <module>
data, regions = load_data(featurefile, bins, extend_up, extend_down, rmdup, rpkm, rmrepeats, fragmentsizes)
File "../../../repo/fluff/scripts/fluff_heatmap.py", line 246, in load_data
jobs.append(job_server.submit(load_heatmap_data, (featurefile, datafile, amount_bins, extend_up, extend_down, rmdup, rpkm, rmrepeats, fragmentsizes), (), ("tempfile","sys","os","fluff.fluffio","numpy")))
UnboundLocalError: local variable 'jobs' referenced before assignment
There is no parameter for extending each read to a specified length in heatmap, and maybe not in the other two scripts either.
/home/george/repo/fluff/fluff/util.py:195: RuntimeWarning: invalid value encountered in greater_equal
if (numpy.array(ps) >= cutoff).all():
Oh, and leaving it empty also doesn't work!
Traceback (most recent call last):
File "/usr/bin/fluff_bandplot.py", line 5, in <module>
pkg_resources.run_script('fluff==1.0', 'fluff_bandplot.py')
File "/usr/lib64/python2.7/site-packages/pkg_resources.py", line 506, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib64/python2.7/site-packages/pkg_resources.py", line 1246, in run_script
execfile(script_filename, namespace, namespace)
File "/usr/lib64/python2.7/site-packages/fluff-1.0-py2.7.egg/EGG-INFO/scripts/fluff_bandplot.py", line 82, in <module>
cluster_data = load_bed_clusters(clust_file)
File "/usr/lib64/python2.7/site-packages/fluff-1.0-py2.7.egg/fluff/fluffio.py", line 136, in load_bed_clusters
cluster_data.setdefault(int(f.name), []).append("%s:%s-%s" % (f.chrom, f.start, f.end))
ValueError: invalid literal for int() with base 10: 'ranger_region_210686_212359_pval_3.06962e-32_fdrPassed_407'
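The traceback shows that `load_bed_clusters` converts the BED name field with `int(f.name)`, so any BED file whose 4th column is not a cluster number fails with the message above. A more helpful error could be produced as in this sketch, where `parse_cluster_id` is a hypothetical helper and not existing fluff code.

```python
def parse_cluster_id(name):
    """Convert the BED name column to an integer cluster number."""
    try:
        return int(name)
    except ValueError:
        raise ValueError(
            "expected an integer cluster number in BED column 4, "
            "got '{0}'".format(name)
        )


cluster = parse_cluster_id("3")
try:
    parse_cluster_id("ranger_region_210686_212359")
    error = None
except ValueError as err:
    error = str(err)
print(cluster, error)
```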
Now the peaks are extended, however it should be possible to get the original peaks.
For easier install with dependencies (pip, conda), replace pp with multiprocessing.
Hi,
I am using fluff to plot a heatmap: fluff heatmap -f mergePeaks.bed -d bed1 bed2 bed3 bed4 bed5 -C k -k 5 -g -M Pearson -p 12 -o kmeans5_dynamics
But it fails with an error:
Traceback (most recent call last):
File "/software/anaconda/bin/fluff", line 4, in <module>
__import__('pkg_resources').run_script('biofluff==2.0.2', 'fluff')
File "/software/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
File "/software/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
File "/software/anaconda/lib/python2.7/site-packages/biofluff-2.0.2-py2.7.egg-info/scripts/fluff", line 327, in <module>
args.func(args)
File "/software/anaconda/lib/python2.7/site-packages/fluff/commands/heatmap.py", line 136, in heatmap
clus = hstack([norm_data[t] for i, t in enumerate(tracks) if (not pick or i in pick)])
File "/software/anaconda/lib/python2.7/site-packages/numpy/core/shape_base.py", line 277, in hstack
if arrs[0].ndim == 1:
IndexError: list index out of range
Could you please give me some pointers on where to start debugging?
Thanks in advance !
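The `IndexError` is raised at `arrs[0]` inside `hstack`, which means the list built by the comprehension in `heatmap.py` was empty: either no tracks were loaded or `pick` selected none of them. The sketch below reuses the selection logic from the traceback and adds an explicit check. `select_tracks` is a hypothetical helper, and it assumes `pick` holds the same 0-based positions that the comprehension compares against.

```python
def select_tracks(tracks, pick=None):
    """Return the tracks used for clustering, or fail with a clear message."""
    selected = [t for i, t in enumerate(tracks) if (not pick or i in pick)]
    if not selected:
        raise ValueError(
            "no data tracks selected for clustering "
            "(tracks={0!r}, pick={1!r})".format(tracks, pick)
        )
    return selected


chosen = select_tracks(["a.bam", "b.bam"], pick=[1])
try:
    select_tracks(["a.bam", "b.bam"], pick=[5])
    failed = False
except ValueError:
    failed = True
print(chosen, failed)
```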
Optionally: use genomic fragments instead of extending reads in fluff_profile.py.
Input for clustering with fluff_heatmap.py: 3-column BED file
The output _clusters.bed file should be usable by fluff_bandplot.py, but strand is in 5th column instead of 6th.
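In the standard six-column BED layout, the columns are chrom, start, end, name, score and strand. A writer along these lines would put the cluster number in the name column and the strand in the 6th column. It is a sketch only: `format_cluster_bed_line` is a hypothetical name and the score of 0 is a placeholder.

```python
def format_cluster_bed_line(chrom, start, end, cluster, strand="+", score=0):
    """Format one region as a 6-column BED line: name=cluster, strand in column 6."""
    fields = [chrom, start, end, cluster, score, strand]
    return "\t".join(str(field) for field in fields)


line = format_cluster_bed_line("chr1", 100, 200, 3, "-")
print(line)
```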
Specify specific datasets to use for clustering, others only for visualization.
Add the region to all the readcount output files, this makes it easier to use them for analysis.
Hi, it seems like fluff works on reads after mapping to a reference - am I right? This would be useful information to add up front somewhere ;).
Hi,
I am impressed by the heatmap parts in fluff.
I can get the plot from the example, but this error came up when I used my own data:
$ fluff heatmap -f tsses_1kb.bed -d WT_H2A_Z.bw arp6_H2A.Z.bw WT_H3K4me3.bw arp6_H3K4me3.bw -o test_heatmap
Euclidean distance method
Loading data
[bwGetOverlappingIntervalsCore] Got an error
I can use the bigWig files to plot the profiles of selected genes, but it doesn't work when I use the same files to plot the heatmap.
Please help me! Thank you very much!
Xiaozhuan Dai
I was thinking that it would be convenient to save the matrix of tag counts in bins (I assume it is the one used for making the heatmaps) in a separate output file. The matrix could then be used in R for customized plots. I understand that many people probably won't use this matrix at all, but if it is not difficult to implement, it would be a nice option.
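A tab-separated file with one row per region and one column per bin would be straightforward to read back in R. The following is a minimal sketch written for Python 3. `write_count_matrix`, the `region` header and the `bin1`, `bin2`, ... column names are all hypothetical choices, not an existing fluff output format.

```python
import csv
import io


def write_count_matrix(handle, regions, counts):
    """Write one row per region: the region label followed by its bin counts."""
    n_bins = len(counts[0]) if counts else 0
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["region"] + ["bin{0}".format(i + 1) for i in range(n_bins)])
    for region, row in zip(regions, counts):
        writer.writerow([region] + list(row))


buffer = io.StringIO()
write_count_matrix(
    buffer,
    ["chr1:100-200", "chr1:500-600"],
    [[0, 3, 5], [2, 2, 0]],
)
output = buffer.getvalue()
print(output)
```

Including the region label in the first column also makes the read counts easier to join with other annotation later.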
And include info on where to find the palettes.
Isn't data loaded twice here, when the dynam option is set?
https://github.com/simonvh/fluff/blob/DEV_v2Alpha/fluff/commands/heatmap.py#L119-L128
Hello,
Can somebody please help me with installing biofluff?
I ran the command "sudo pip install biofluff".
Here is the output I see:
Requirement already satisfied (use --upgrade to upgrade): biofluff in /usr/lib/python2.7/site-packages/biofluff-2.1.0-py2.7.egg
Requirement already satisfied (use --upgrade to upgrade): pysam in /usr/lib64/python2.7/site-packages (from biofluff)
Collecting HTSeq (from biofluff)
Downloading HTSeq-0.6.1.tar.gz (226kB)
100% |████████████████████████████████| 235kB 3.5MB/s
Requirement already satisfied (use --upgrade to upgrade): numpy in /usr/lib64/python2.7/site-packages (from biofluff)
Requirement already satisfied (use --upgrade to upgrade): scipy in /usr/lib64/python2.7/site-packages (from biofluff)
Requirement already satisfied (use --upgrade to upgrade): matplotlib in /usr/lib64/python2.7/site-packages (from biofluff)
Collecting colorbrewer (from biofluff)
Downloading colorbrewer-0.1.1.tar.gz
Collecting pybedtools (from biofluff)
Downloading pybedtools-0.7.7.tar.gz (12.6MB)
100% |████████████████████████████████| 12.6MB 70kB/s
Collecting Pycluster (from biofluff)
Could not find a version that satisfies the requirement Pycluster (from biofluff) (from versions: )
No matching distribution found for Pycluster (from biofluff)
Thanks
AMoL
Currently, repeats (multi-mapped reads) are filtered out based on tags. We'd better use mapping quality, which is not bwa-specific and pretty consistently used amongst different aligners. In addition, once that is changed, update the command-line documentation to remove the "bwa-only" message.
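In the SAM format, the mapping quality (MAPQ) is the 5th column of every alignment line, independent of the aligner. The sketch below filters plain SAM text lines on that column. `passes_mapq` is a hypothetical helper, and the threshold of 20 is an arbitrary assumption, since aligners differ in how they assign MAPQ to multi-mapped reads.

```python
def passes_mapq(sam_line, min_mapq=20):
    """Return True if an alignment line has MAPQ (column 5) >= min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    return int(fields[4]) >= min_mapq


sam_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tchr1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
# Header lines start with "@" and carry no MAPQ, so they are skipped first.
kept = [
    line.split("\t")[0]
    for line in sam_lines
    if not line.startswith("@") and passes_mapq(line)
]
print(kept)
```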
Currently, if a BED file for use with fluff_bandplot.py contains only 4 columns, fluff will crash. If there is no strand (no 6th column), we can assume that the strand is +.
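The fallback described here could look like the following sketch. `parse_cluster_bed_line` is a hypothetical function that works on raw text lines, not the parser fluff actually uses.

```python
def parse_cluster_bed_line(line):
    """Parse a BED line with at least 4 columns; default the strand to '+'."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError(
            "expected at least 4 BED columns, got {0}".format(len(fields))
        )
    chrom, start, end, name = fields[:4]
    # The strand lives in the 6th column; assume '+' when it is absent.
    strand = fields[5] if len(fields) >= 6 else "+"
    return chrom, int(start), int(end), name, strand


four_col = parse_cluster_bed_line("chr1\t100\t200\t2")
six_col = parse_cluster_bed_line("chr1\t100\t200\t2\t0\t-")
print(four_col, six_col)
```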
Hi,
when I try to run fluff either on my data or on the example data I get this error:
Traceback (most recent call last):
File "anaconda/bin/fluff", line 4, in <module>
__import__('pkg_resources').run_script('biofluff==2.1.0', 'fluff')
File "anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
File "anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
File "anaconda/lib/python2.7/site-packages/biofluff-2.1.0-py2.7.egg-info/scripts/fluff", line 9, in <module>
args.func(args)
File "anaconda/lib/python2.7/site-packages/fluff/commands/heatmap.py", line 235, in heatmap
heatmap_plot(data, ind[::-1], outfile, tracks, titles, colors, bgcolors, scale, tscale, labels, fontsize)
File "anaconda/lib/python2.7/site-packages/fluff/plot.py", line 85, in heatmap_plot
fig = plt.figure(figsize=(plot_width, plot_height))
File "anaconda/lib/python2.7/site-packages/matplotlib/pyplot.py", line 527, in figure
**kwargs)
File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 46, in new_figure_manager
return new_figure_manager_given_figure(num, thisFig)
File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 53, in new_figure_manager_given_figure
canvas = FigureCanvasQTAgg(figure)
File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 76, in __init__
FigureCanvasQT.__init__(self, figure)
File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 68, in __init__
_create_qApp()
File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 138, in _create_qApp
raise RuntimeError('Invalid DISPLAY variable')
RuntimeError: Invalid DISPLAY variable
Any suggestion?
Thanks!
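The traceback ends in the Qt backend, which needs a display. On a machine without one, a common workaround is to select the non-interactive Agg backend before `matplotlib.pyplot` is imported. The sketch below shows both usual ways of doing that. It is a general matplotlib workaround, not a description of how fluff itself handles backends, and whether the `MPLBACKEND` variable is honoured depends on the installed matplotlib version.

```python
import os

# Request the Agg backend through the environment before matplotlib loads.
os.environ["MPLBACKEND"] = "Agg"

try:
    import matplotlib

    # Explicit selection, for matplotlib versions that ignore MPLBACKEND.
    matplotlib.use("Agg")
    backend = matplotlib.get_backend()
except ImportError:
    # matplotlib is not installed in this environment.
    backend = None

print(backend)
```

From a shell, setting `MPLBACKEND=Agg` in the environment of the fluff command is the equivalent approach.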
See issue #47
When there are many peaks in combination with many BAM files, memory usage can become enormous. Maybe fluff should check for reasonable limits and warn or refuse to run without explicit affirmation?
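A rough pre-flight check could estimate the size of the binned count matrix and refuse to continue above a limit unless the user overrides it. This sketch makes several assumptions: it only counts a dense matrix of 8-byte values (regions x bins x tracks) and ignores everything else held in memory, the 4 GiB default limit is arbitrary, and `estimate_matrix_bytes`, `check_memory` and `force` are hypothetical names.

```python
def estimate_matrix_bytes(n_regions, n_bins, n_tracks, bytes_per_value=8):
    """Estimate the size in bytes of a dense regions x bins x tracks matrix."""
    return n_regions * n_bins * n_tracks * bytes_per_value


def check_memory(n_regions, n_bins, n_tracks, limit_bytes=4 * 1024 ** 3, force=False):
    """Return the estimate, or raise if it exceeds the limit and force is False."""
    needed = estimate_matrix_bytes(n_regions, n_bins, n_tracks)
    if needed > limit_bytes and not force:
        raise RuntimeError(
            "estimated matrix size is {0} bytes, above the limit of {1}; "
            "rerun with force=True to continue anyway".format(needed, limit_bytes)
        )
    return needed


small = check_memory(27200, 100, 4)
try:
    check_memory(5000000, 100, 50)
    refused = False
except RuntimeError:
    refused = True
print(small, refused)
```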
fluff_bandplot.py is not working in 1.43. It keeps running for days without any result or output.
Keep the width per experiment constant. Optionally add an argument to control the width.
Hi,
I am impressed by the dynamic patterns parts in fluff.
I can get the dynamic plot from the example, but this error came up when I used my own data:
$ fluff heatmap -f tsses_2kb_new.bed -d WT_H2A_Z.sort.bam WT_H3K4me3.sort.bam WT_H3K27me3.sort.bam WT_pol.sort.bam -C k -k 5 -g -M Pearson -o kmeans5_dynamics
Pearson distance method
Loading data
K-means clustering
Loading data
Traceback (most recent call last):
File "/root/anaconda2/bin/fluff", line 4, in <module>
__import__('pkg_resources').run_script('biofluff==2.0.1', 'fluff')
File "/root/anaconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
File "/root/anaconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
File "/root/anaconda2/lib/python2.7/site-packages/biofluff-2.0.1-py2.7.egg-info/scripts/fluff", line 327, in <module>
args.func(args)
File "/root/anaconda2/lib/python2.7/site-packages/fluff/commands/heatmap.py", line 181, in heatmap
for (chrom, start, end, gene, strand), cluster in zip(array(regions, dtype="object")[ind], array(labels)[ind]):
IndexError: index 27201 is out of bounds for axis 0 with size 27200
Please help me!
Youhuang Bai
Make sure fluff can be installed via pip (including dependencies) and then, once we have a good release version, add it to PyPI.
use samtools to make one
/usr/lib64/python2.7/site-packages/matplotlib-1.3.1-py2.7-linux-x86_64.egg/matplotlib/axes.py:2760: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=0, top=0.0