
simonvh / fluff

Fluff is a Python package that contains several scripts to produce pretty, publication-quality figures for next-generation sequencing experiments.

License: MIT License

Python 100.00%

fluff's People

Contributors

georgeg9, maarten-vd-sande, simonvh


fluff's Issues

fluff_heatmap.py: track and comment lines

Currently if a track line or a comment line (#) is present in the BED file, fluff_heatmap.py fails to run. It would be a good idea to allow BED files that contain these lines.
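A minimal sketch of how such lines could be skipped while reading a BED file. The helper name `iter_bed_records` is hypothetical and not part of fluff; it assumes tab-separated records with at least three columns.

```python
import io


def iter_bed_records(handle):
    """Yield (chrom, start, end, extra_fields) and skip header lines."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # blank line
        first = line.split(None, 1)[0]
        # Skip comments and UCSC-style "track" / "browser" header lines.
        if first.startswith("#") or first in ("track", "browser"):
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1]), int(fields[2]), fields[3:]


example = io.StringIO(
    'track name="peaks" description="example"\n'
    "# a comment line\n"
    "chr1\t100\t200\tpeak1\n"
    "chr2\t300\t400\tpeak2\n"
)
records = list(iter_bed_records(example))
```

Matching on the first whitespace-delimited token, rather than on a plain prefix, avoids dropping a record whose chromosome name happens to start with "track".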

remove Pycluster dependency

For easier install with dependencies (pip, conda), replace Pycluster with clustering from scipy (if possible, otherwise use sklearn).
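As a rough illustration of what a scipy-based replacement might look like, the sketch below does hierarchical clustering with `scipy.cluster.hierarchy`. The function name `cluster_rows` and the toy matrix are assumptions made for this example; they are not taken from fluff.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_rows(data, n_clusters, method="average", metric="euclidean"):
    """Return one cluster label (1..n_clusters) per row of `data`."""
    # Build the hierarchical clustering tree from the row-wise distances.
    tree = linkage(data, method=method, metric=metric)
    # Cut the tree so that at most `n_clusters` flat clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy matrix: two low-signal regions followed by two high-signal regions.
data = np.array(
    [
        [0.0, 1.0, 0.0],
        [1.0, 0.0, 1.0],
        [9.0, 10.0, 9.0],
        [10.0, 9.0, 10.0],
    ]
)
labels = cluster_rows(data, n_clusters=2)
```

Passing `metric="correlation"` makes scipy use 1 minus the Pearson correlation as the distance, which may be a reasonable stand-in for a Pearson distance option.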

Get gene/peak name in output

My question is: is it somehow possible to save the gene name in the output BED file of fluff_heatmap.py?

At the moment it only saves chromosome, start, end, cluster number, some score and strand.
It would be very useful to have a gene name as well.

Order of command line arguments

I think it's good to re-order the command line arguments:

  • For consistency between the three tools
  • To have the most useful/used arguments on top

missing dependencies

Hi, just a quick thing: Pycluster and pp are not listed as dependencies, but fluff_heatmap.py needs them to work.

Give meaningful error if input BED file does not exist

If the input BED file does not exist, the current error is very cryptic. We should just add a check for the input file, and also let load_heatmap_data produce a meaningful error.

Warning: Running fluff with too many files might make you system use enormous amount of memory!
Pearson distance method
Loading data
An error has occured during the function execution
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/pp-1.6.1-py2.7.egg/ppworker.py", line 90, in run
    __result = __f(*__args)
  File "", line 6, in load_heatmap_data
IOError: [Errno 2] No such file or directory: 'all_ifferential_peaks_single.bed'
Parallel Python (pp) not installed, can't load data in parallel
Traceback (most recent call last):
  File "/usr/bin/fluff_heatmap.py", line 5, in <module>
    pkg_resources.run_script('fluff==1.3', 'fluff_heatmap.py')
  File "/usr/lib64/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 505, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib64/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 1245, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/lib64/python2.7/site-packages/fluff-1.3-py2.7.egg/EGG-INFO/scripts/fluff_heatmap.py", line 259, in <module>
    clus = hstack([norm_data[t] for i,t in enumerate(tracks) if (not pick or i in pick)])
KeyError: 'B188_iDC_script.bam'
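One way the suggested check could look, as a minimal sketch. `check_input_file` and `main` are hypothetical names used only for illustration; the real command-line handling in fluff may differ.

```python
import os
import sys


def check_input_file(path, description="input BED file"):
    """Return `path` if it is an existing file, else raise a clear error."""
    if not os.path.isfile(path):
        raise IOError("{} does not exist: '{}'".format(description, path))
    return path


def main(argv):
    """Validate the first argument before any data loading starts."""
    try:
        check_input_file(argv[0])
    except IOError as err:
        sys.stderr.write("Error: {}\n".format(err))
        return 1
    return 0
```

Checking up front means the user sees a single short message instead of a traceback from inside a worker process.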

RuntimeWarning and UnboundLocalError when using kmedoids

K-medoids/PAM(Partitioning Around Medoids) clustering
/usr/lib64/python2.7/site-packages/numpy/core/_methods.py:55: RuntimeWarning: invalid value encountered in true_divide
out=ret, casting='unsafe', subok=False)
Loading data
Parallel Python (pp) not installed, can't load data in parallel
Traceback (most recent call last):
  File "../../../repo/fluff/scripts/fluff_heatmap.py", line 339, in <module>
    data, regions = load_data(featurefile, bins, extend_up, extend_down, rmdup, rpkm, rmrepeats, fragmentsizes)
  File "../../../repo/fluff/scripts/fluff_heatmap.py", line 246, in load_data
    jobs.append(job_server.submit(load_heatmap_data, (featurefile, datafile, amount_bins, extend_up, extend_down, rmdup, rpkm, rmrepeats, fragmentsizes), (), ("tempfile","sys","os","fluff.fluffio","numpy")))
UnboundLocalError: local variable 'jobs' referenced before assignment
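The traceback suggests that `jobs` is only assigned on a code path that is skipped when pp is unavailable. The sketch below shows the general pattern of initialising the list before branching and falling back to serial loading. `load_all` and `load_one` are hypothetical names, and the assumption that a submitted job is a callable returning its result is modelled on pp rather than copied from fluff.

```python
def load_all(datafiles, load_one, job_server=None):
    """Load each data file, in parallel if a job server is given."""
    # Define the containers before branching so every path can use them.
    jobs = []
    results = {}
    if job_server is not None:
        for datafile in datafiles:
            jobs.append((datafile, job_server.submit(load_one, (datafile,))))
        for datafile, job in jobs:
            results[datafile] = job()  # assumed: calling the job returns its result
    else:
        # Serial fallback when no parallel backend is installed.
        for datafile in datafiles:
            results[datafile] = load_one(datafile)
    return results
```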

non-numeric value in 4th column of BED file causes fluff_bandplot.py to crash

Oh, and leaving it empty also doesn't work!

Traceback (most recent call last):
  File "/usr/bin/fluff_bandplot.py", line 5, in <module>
    pkg_resources.run_script('fluff==1.0', 'fluff_bandplot.py')
  File "/usr/lib64/python2.7/site-packages/pkg_resources.py", line 506, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib64/python2.7/site-packages/pkg_resources.py", line 1246, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/lib64/python2.7/site-packages/fluff-1.0-py2.7.egg/EGG-INFO/scripts/fluff_bandplot.py", line 82, in <module>
    cluster_data = load_bed_clusters(clust_file)
  File "/usr/lib64/python2.7/site-packages/fluff-1.0-py2.7.egg/fluff/fluffio.py", line 136, in load_bed_clusters
    cluster_data.setdefault(int(f.name), []).append("%s:%s-%s" % (f.chrom, f.start, f.end))
ValueError: invalid literal for int() with base 10: 'ranger_region_210686_212359_pval_3.06962e-32_fdrPassed_407'
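The failing line converts the fourth BED column to an integer cluster number. A hedged sketch of how that conversion could report a clearer error is shown below. `parse_cluster_id` and `group_by_cluster` are hypothetical helpers working on plain tuples, not the actual `load_bed_clusters` implementation.

```python
def parse_cluster_id(name, line_number=None):
    """Convert the 4th BED column to an integer cluster number."""
    try:
        return int(name)
    except (TypeError, ValueError):
        where = "" if line_number is None else " on line {}".format(line_number)
        raise ValueError(
            "expected an integer cluster number in column 4 of the BED file"
            "{}, got {!r}".format(where, name)
        )


def group_by_cluster(rows):
    """Group (chrom, start, end, name) rows by their cluster number."""
    clusters = {}
    for line_number, (chrom, start, end, name) in enumerate(rows, start=1):
        cluster = parse_cluster_id(name, line_number)
        clusters.setdefault(cluster, []).append("{}:{}-{}".format(chrom, start, end))
    return clusters
```

An empty fourth column takes the same error path, because `int("")` also raises `ValueError`.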

list index out of range error!

Hi,

I am using fluff to plot a heatmap:

fluff heatmap -f mergePeaks.bed -d bed1 bed2 bed3 bed4 bed5 -C k -k 5 -g -M Pearson -p 12 -o kmeans5_dynamics

But it fails with an error:

Traceback (most recent call last):
  File "/software/anaconda/bin/fluff", line 4, in <module>
    __import__('pkg_resources').run_script('biofluff==2.0.2', 'fluff')
  File "/software/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script

  File "/software/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script

  File "/software/anaconda/lib/python2.7/site-packages/biofluff-2.0.2-py2.7.egg-info/scripts/fluff", line 327, in <module>
    args.func(args)
  File "/software/anaconda/lib/python2.7/site-packages/fluff/commands/heatmap.py", line 136, in heatmap
    clus = hstack([norm_data[t] for i, t in enumerate(tracks) if (not pick or i in pick)])
  File "/software/anaconda/lib/python2.7/site-packages/numpy/core/shape_base.py", line 277, in hstack
    if arrs[0].ndim == 1:
IndexError: list index out of range

Could you please give me some pointers on where to start debugging?

Thanks in advance!

readcount output files

Add the region to all the readcount output files, this makes it easier to use them for analysis.

For reference-based analyses only?

Hi, it seems like fluff works on reads after mapping to a reference - am I right? This would be useful information to add up front somewhere ;).

when plot heatmap using bigwig files

Hi,
I am impressed by the heatmap parts in fluff.
I can get the plot from the example, but this error came up when I used my own data:

CMD

$ fluff heatmap -f tsses_1kb.bed -d WT_H2A_Z.bw arp6_H2A.Z.bw WT_H3K4me3.bw arp6_H3K4me3.bw -o test_heatmap

Euclidean distance method
Loading data
[bwGetOverlappingIntervalsCore] Got an error

I can use the bigWig files to plot the profiles of selected genes, but it doesn't work when I use the bw files to plot the heatmap.

Please help me! Thank you very much!

Xiaozhuan Dai

Read count matrix as output file

I was thinking that it would be convenient to save the matrix with tag counts in bins (I assume this is the one used for making heatmaps) in a separate output file. Later on, the matrix could be used in R for some customized plots. I understand that many people probably won't use this matrix at all, but if it is not difficult to implement, it could be a nice option.
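A small sketch of what such an output could look like, written as a tab-separated file with the region in the first column. The function name, the column naming scheme and the toy counts are all assumptions for illustration.

```python
import csv
import io


def write_count_matrix(handle, regions, column_names, rows):
    """Write one row per region: the region string, then its binned counts."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["region"] + list(column_names))
    for (chrom, start, end), row in zip(regions, rows):
        writer.writerow(["{}:{}-{}".format(chrom, start, end)] + list(row))


out = io.StringIO()
write_count_matrix(
    out,
    regions=[("chr1", 100, 200), ("chr2", 300, 400)],
    column_names=["sample1_bin1", "sample1_bin2"],
    rows=[[3, 5], [0, 2]],
)
matrix_text = out.getvalue()
```

A file in this layout can be loaded in R with `read.delim(path, row.names = 1)`.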

trouble in installing biofluff, pycluster not found

Hello,
Can somebody please help me with installing biofluff?
I ran the "sudo pip install biofluff" command.
Here is the output I see:

Requirement already satisfied (use --upgrade to upgrade): biofluff in /usr/lib/python2.7/site-packages/biofluff-2.1.0-py2.7.egg
Requirement already satisfied (use --upgrade to upgrade): pysam in /usr/lib64/python2.7/site-packages (from biofluff)
Collecting HTSeq (from biofluff)
Downloading HTSeq-0.6.1.tar.gz (226kB)
100% |████████████████████████████████| 235kB 3.5MB/s
Requirement already satisfied (use --upgrade to upgrade): numpy in /usr/lib64/python2.7/site-packages (from biofluff)
Requirement already satisfied (use --upgrade to upgrade): scipy in /usr/lib64/python2.7/site-packages (from biofluff)
Requirement already satisfied (use --upgrade to upgrade): matplotlib in /usr/lib64/python2.7/site-packages (from biofluff)
Collecting colorbrewer (from biofluff)
Downloading colorbrewer-0.1.1.tar.gz
Collecting pybedtools (from biofluff)
Downloading pybedtools-0.7.7.tar.gz (12.6MB)
100% |████████████████████████████████| 12.6MB 70kB/s
Collecting Pycluster (from biofluff)
Could not find a version that satisfies the requirement Pycluster (from biofluff) (from versions: )
No matching distribution found for Pycluster (from biofluff)

Thanks
AMoL

repeat filtering

Currently, repeats (multi-mapped reads) are filtered out based on tags. We'd better use mapping quality, which is not bwa-specific and pretty consistently used amongst different aligners. In addition, once that is changed, update the command-line documentation to remove the "bwa-only" message.
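A minimal sketch of filtering on mapping quality instead of aligner-specific tags. The `Read` tuple is a stand-in so that the example runs without a BAM file; its `mapping_quality` field mirrors the attribute of the same name on pysam's `AlignedSegment`. The default threshold is an assumption, since aligners assign MAPQ values differently.

```python
from collections import namedtuple

# Stand-in for an aligned read; only the fields used here are modelled.
Read = namedtuple("Read", ["query_name", "mapping_quality"])


def filter_by_mapq(reads, min_mapq=1):
    """Yield only the reads whose mapping quality is at least `min_mapq`."""
    for read in reads:
        if read.mapping_quality >= min_mapq:
            yield read


reads = [Read("unique_hit", 60), Read("multi_mapped", 0), Read("borderline", 10)]
kept = [read.query_name for read in filter_by_mapq(reads, min_mapq=10)]
```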

error - invalid DISPLAY variable

Hi,

when I try to run fluff either on my data or on the example data I get this error:

Traceback (most recent call last):
  File "anaconda/bin/fluff", line 4, in <module>
    __import__('pkg_resources').run_script('biofluff==2.1.0', 'fluff')
  File "anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
  File "anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
  File "anaconda/lib/python2.7/site-packages/biofluff-2.1.0-py2.7.egg-info/scripts/fluff", line 9, in <module>
    args.func(args)
  File "anaconda/lib/python2.7/site-packages/fluff/commands/heatmap.py", line 235, in heatmap
    heatmap_plot(data, ind[::-1], outfile, tracks, titles, colors, bgcolors, scale, tscale, labels, fontsize)
  File "anaconda/lib/python2.7/site-packages/fluff/plot.py", line 85, in heatmap_plot
    fig = plt.figure(figsize=(plot_width, plot_height))
  File "anaconda/lib/python2.7/site-packages/matplotlib/pyplot.py", line 527, in figure
    **kwargs)
  File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 46, in new_figure_manager
    return new_figure_manager_given_figure(num, thisFig)
  File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 53, in new_figure_manager_given_figure
    canvas = FigureCanvasQTAgg(figure)
  File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 76, in __init__
    FigureCanvasQT.__init__(self, figure)
  File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 68, in __init__
    _create_qApp()
  File "anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 138, in _create_qApp
    raise RuntimeError('Invalid DISPLAY variable')
RuntimeError: Invalid DISPLAY variable

Any suggestions?

Thanks!
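The traceback shows matplotlib picking a Qt backend, which needs a working display. A common workaround on servers is to request the non-interactive Agg backend before `matplotlib.pyplot` is imported. The sketch below does this through the `MPLBACKEND` environment variable, assuming a matplotlib version that honours it. `force_file_backend` is a hypothetical helper name.

```python
import os


def force_file_backend(environ):
    """Ask matplotlib for the Agg backend unless one was already chosen."""
    environ.setdefault("MPLBACKEND", "Agg")
    return environ["MPLBACKEND"]


# Intended use, before the first import of pyplot:
#   force_file_backend(os.environ)
#   import matplotlib.pyplot as plt
#   fig = plt.figure()
#   fig.savefig("heatmap.png")
```

Calling `matplotlib.use("Agg")` before importing pyplot has the same effect from within Python. Running the tool with `MPLBACKEND=Agg` set in the shell avoids code changes altogether.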

Warning in case of too much input data

In case there are a lot of peaks in combination with a lot of BAM files the memory usage can become enormous. Maybe fluff should check for reasonable limits and warn or refuse to run without explicit affirmation?
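A back-of-the-envelope check along these lines might look like the sketch below. It only estimates the size of the final matrix (regions × bins × files × bytes per value) and ignores any overhead from reading the data files. The function names, the 4 GiB default limit and the `force` flag are assumptions for illustration.

```python
def estimate_matrix_bytes(n_regions, n_bins, n_files, bytes_per_value=8):
    """Estimate the memory needed to hold all binned values as 64-bit floats."""
    return n_regions * n_bins * n_files * bytes_per_value


def check_memory(n_regions, n_bins, n_files, limit_bytes=4 * 1024 ** 3, force=False):
    """Return the estimate, or refuse to continue if it exceeds the limit."""
    needed = estimate_matrix_bytes(n_regions, n_bins, n_files)
    if needed > limit_bytes and not force:
        raise MemoryError(
            "estimated {:.1f} GiB needed for {} regions x {} bins x {} files; "
            "rerun with force=True to continue anyway".format(
                needed / float(1024 ** 3), n_regions, n_bins, n_files
            )
        )
    return needed
```

For example, 27200 regions with 100 bins and 4 data files come to roughly 87 MB, well under the limit, whereas a million regions with 50 files would be refused unless `force` is set.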

fluff_bandplot.py

fluff_bandplot.py is not working in 1.43. It keeps running for days without any result or output.

Width of heatmap

Keep the width per experiment constant. Optionally add an argument to control the width.

IndexError when plotting dynamic patterns

Hi,
I am impressed by the dynamic patterns part of fluff.
I can get the dynamic plot from the example, but this error came up when I used my own data:

CMD

$ fluff heatmap -f tsses_2kb_new.bed -d WT_H2A_Z.sort.bam WT_H3K4me3.sort.bam WT_H3K27me3.sort.bam WT_pol.sort.bam -C k -k 5 -g -M Pearson -o kmeans5_dynamics

Pearson distance method
Loading data
K-means clustering
Loading data
Traceback (most recent call last):
  File "/root/anaconda2/bin/fluff", line 4, in <module>
    __import__('pkg_resources').run_script('biofluff==2.0.1', 'fluff')
  File "/root/anaconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
  File "/root/anaconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
  File "/root/anaconda2/lib/python2.7/site-packages/biofluff-2.0.1-py2.7.egg-info/scripts/fluff", line 327, in <module>
    args.func(args)
  File "/root/anaconda2/lib/python2.7/site-packages/fluff/commands/heatmap.py", line 181, in heatmap
    for (chrom, start, end, gene, strand), cluster in zip(array(regions, dtype="object")[ind], array(labels)[ind]):
IndexError: index 27201 is out of bounds for axis 0 with size 27200

Bed file attached here

tsses_2kb_new.bed.txt

Please help me!

Youhuang Bai

pip and pypi

Make sure fluff can be installed via pip (including dependencies) and then (once we have a good release version) add it to pypi.

profile - matplot

/usr/lib64/python2.7/site-packages/matplotlib-1.3.1-py2.7-linux-x86_64.egg/matplotlib/axes.py:2760: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=0, top=0.0
  + 'bottom=%s, top=%s') % (bottom, top))
/usr/lib64/python2.7/site-packages/matplotlib-1.3.1-py2.7-linux-x86_64.egg/matplotlib/axis.py:1004: UserWarning: Unable to find pixel distance along axis for interval padding; assuming no interval padding needed.
  warnings.warn("Unable to find pixel distance along axis for interval padding; assuming no interval padding needed.")
/usr/lib64/python2.7/site-packages/matplotlib-1.3.1-py2.7-linux-x86_64.egg/matplotlib/axis.py:1011: UserWarning: Unable to find pixel distance along axis for interval padding; assuming no interval padding needed.
  warnings.warn("Unable to find pixel distance along axis for interval padding; assuming no interval padding needed.")
