GithubHelp home page GithubHelp logo

wrong 'stats' output? about pybigwig HOT 11 CLOSED

deeptools avatar deeptools commented on June 17, 2024
wrong 'stats' output?

from pybigwig.

Comments (11)

dpryan79 avatar dpryan79 commented on June 17, 2024

This is one of the weird thing about bigWig files. Statistics often don't function on the actual values, but on values in a zoom level. If you use bigWigSummary from the command line, what sort of value do you get?

from pybigwig.

dpryan79 avatar dpryan79 commented on June 17, 2024

I guess I can just check myself:

$ bigWigSummary http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign75mer.bigWig chr1 89294 91629 1
0.201209

And in python:

>>> import pyBigWig
>>> bw = pyBigWig.open("http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign75mer.bigWig")
>>> bw.stats('chr1', 89294, 91629)
[0.20120902053804418]

So I get the same values for both.

from pybigwig.

michael-imbeault avatar michael-imbeault commented on June 17, 2024

Stumbled upon the same - I think the expected behavior from a naive user is that if I ask for an array of 100 basepairs from a bigwig, the stats command should give me the mean of those 100 basepairs and not anything else dependant on zoom levels. It should at least be made very clear in the documentation.

For now I just ask for the array and wrap it in numpy.mean, no real loss of performance.

Real nice library aside from that, 10 times the performance I was getting with bx-python :)

from pybigwig.

dpryan79 avatar dpryan79 commented on June 17, 2024

I'll definitely update the documentation for that, since I too found this weird when I wrote libBigWig. I've actually been meaning to create an exact_stats() function, which would do what you want. Given how libBigWig is implemented that should be fairly straight forward. I'll leave this open as a reminder to do that!

from pybigwig.

dpryan79 avatar dpryan79 commented on June 17, 2024

BTW, if you have a better name in mind then just let me know. I could alternatively just add an option to the current stats function to do this (whatever others find the most useful is fine by me).

from pybigwig.

dpryan79 avatar dpryan79 commented on June 17, 2024

I have an exact_stats branch that adds the exact option to the stats command. That needs some more testing (and documentation), but should hopefully handle this. The invocation would be:

bw.stats("chr1", 1, 1000000, exact=True)

from pybigwig.

francois-a avatar francois-a commented on June 17, 2024

Thanks for the quick response and clarifications (I indeed expected the naive behavior) and the patch. I ran a few tests and it works fine for me. This runs significantly faster (3-4x) than wrapping with np.mean, a nice improvement when running over many regions.

from pybigwig.

michael-imbeault avatar michael-imbeault commented on June 17, 2024

New branch working for me too - 2-3 times faster than wrapping with np.mean, thanks a lot!

from pybigwig.

dpryan79 avatar dpryan79 commented on June 17, 2024

I've pushed a small change that should increase the precision of this a bit (I'd be surprised if there's any speed change). I've also added a section to the readme. I'll continue mulling this over for a day or two and then merge into the master branch and make a new release if there's nothing new that needs to be added.

from pybigwig.

dpryan79 avatar dpryan79 commented on June 17, 2024

I'm hoping to release version 0.2.6 today (assuming Travis CI starts working again), which will include this.

from pybigwig.

dpryan79 avatar dpryan79 commented on June 17, 2024

This is now release 0.2.6. I'll try to get this release on pypi and in conda tonight. Thanks for the suggestion.

from pybigwig.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.