Comments (13)

endrebak avatar endrebak commented on September 27, 2024 1

How to test dpryan's new PR:

git clone -b numpy [email protected]:dpryan79/pyBigWig.git
cd pyBigWig
python setup.py install # I guess this should overwrite the already installed pyBigWig?

Code you can copy into ipython afterwards:

import pyBigWig

import pandas as pd

header = [("chr1", 500)]


bw = pyBigWig.open("test.bw", "w")

bw.addHeader(header)

c = ["chr1"] * 3

s = pd.Series([1, 5, 7])

e = s + 1

v = pd.Series([5, 0, -5])

bw.addEntries(c, list(s), ends=list(e), values=list(v)) # error happens here

bw.close()

For me, running the above code led to the following error:

# RuntimeError: You must provide a valid set of entries. These can be comprised of any of the following:
# 1. A list of each of chromosomes, start positions, end positions and values.
# 2. A list of each of start positions and values. Also, a chromosome and span must be specified.
# 3. A list values, in which case a single chromosome, start position, span and step must be specified.

Edit: come to think of it, this would have segfaulted in the previous version, AFAICR so good job. Might be an error on my part, somewhere.

from pybigwig.

dpryan79 avatar dpryan79 commented on September 27, 2024 1

Great! I'd like to sit on this a little while and play around with it before I make the next release, since the large increase in code size inevitably created a bug or two.

I don't have any good reference for python/numpy/C interop. This was my first C/Python hybrid and was quite a learning curve to make (in comparison, I started and completed py2bit/lib2bit in 2 days, so feel free to "borrow" the boilerplate initialization/finalization stuff from my code). I personally found the python C API documentation to be...not the most helpful thing in the world (especially in regards to reference counting). If you're interested in python/C/numpy interop then think early on about python 2/3 differences and the variety of different numeric types that can get passed in by numpy. The latter is relatively easy to handle but the former I at least personally still find to be confusing.

dpryan79 avatar dpryan79 commented on September 27, 2024

Good suggestion, I'll add that in.

Regarding parallelism, it depends on how you try to go about things. You can't pickle bigWigFile objects, so can't pass one to a function and have that work (the same is true for AlignmentFile objects from pysam, python's multithreading support just leaves much to be desired). What we do to get around that is to have each thread open the bigWig file(s). That works well for deepTools. Writing is innately single-threaded.
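
A minimal sketch of the "each worker opens the file itself" pattern described above, using multiprocessing rather than threads. The helper names (split_region, chunk_max, region_max) are hypothetical and not part of pyBigWig; only pyBigWig.open(), stats() and close() are library calls, and the sketch assumes end > start.

```python
from multiprocessing import Pool


def split_region(start, end, n_chunks):
    """Split [start, end) into at most n_chunks contiguous, non-empty intervals."""
    size = -(-(end - start) // n_chunks)  # ceiling division
    return [(s, min(s + size, end)) for s in range(start, end, size)]


def chunk_max(task):
    """Worker: receives a path (picklable) and opens the bigWig file itself."""
    import pyBigWig  # imported in the worker; requires pyBigWig to be installed

    path, chrom, start, end = task
    bw = pyBigWig.open(path)
    try:
        # stats() returns a list with one entry per bin (a single bin here);
        # the entry is None when the interval has no data.
        return bw.stats(chrom, start, end, type="max")[0]
    finally:
        bw.close()


def region_max(path, chrom, start, end, n_workers=4):
    """Maximum signal over [start, end), computed chunk by chunk in parallel."""
    tasks = [(path, chrom, s, e) for s, e in split_region(start, end, n_workers)]
    with Pool(n_workers) as pool:
        results = [r for r in pool.map(chunk_max, tasks) if r is not None]
    return max(results) if results else None
```

Only the file path and coordinates cross the process boundary, so nothing unpicklable is ever passed around. On platforms that spawn worker processes, call region_max() from under an `if __name__ == "__main__":` guard.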

davek44 avatar davek44 commented on September 27, 2024

I've now been tripped up by this multiple times, so I'll second the request. However, I'm also curious why you can't work with numpy floats. I'd prefer to use float16 to trade off precision for less memory. Thanks a lot for putting this code online!

dpryan79 avatar dpryan79 commented on September 27, 2024

@davek44: I certainly could work with numpy input, it's just a matter of having another dependency.

dpryan79 avatar dpryan79 commented on September 27, 2024

@endrebak @davek44 If either of you have a chance, give the numpy branch a try. I've started adding support for numpy arrays when creating bigWig files. I'll try to add a method to output a numpy array in the values() method too, since that'd be faster and more memory efficient. Note that this is only possible if you have numpy installed before you install pyBigWig, since pyBigWig is written in C and needs to link against the numpy .so file.
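
A hedged sketch of how one might check whether an installed pyBigWig was built with numpy support. It assumes the module-level `pyBigWig.numpy` flag documented in later releases of the README (1 when compiled against numpy); the numpy branch discussed here may not expose it, so the helper falls back to False. The name has_numpy_support is hypothetical.

```python
def has_numpy_support():
    """Return True only if pyBigWig imports and reports numpy support."""
    try:
        import pyBigWig
    except ImportError:
        # pyBigWig is not installed at all
        return False
    # Assumption: the module exposes a `numpy` attribute set to 1 when it
    # was compiled with numpy available; treat a missing attribute as "no".
    return bool(getattr(pyBigWig, "numpy", 0))


print("numpy-enabled pyBigWig:", has_numpy_support())
```

If this prints False even though numpy is installed, reinstalling pyBigWig after numpy should rebuild the extension with numpy support.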

endrebak avatar endrebak commented on September 27, 2024

Most people use conda, so installation order should very seldom be a problem.

Super btw. Will report back when I get around to it.

dpryan79 avatar dpryan79 commented on September 27, 2024

Yup, I'll modify the bioconda recipe accordingly if this works out OK.

dpryan79 avatar dpryan79 commented on September 27, 2024

The numpy branch now has some automated tests. Further, the values() function can return a numpy array. This still needs a fair bit of testing, since it's likely I've missed some things.

dpryan79 avatar dpryan79 commented on September 27, 2024

You can also just pip install --upgrade git+https://github.com/dpryan79/pyBigWig@numpy to install it.

I've only tested directly inputting a numpy array (note that there's a nosetest that uses this), so I'd have to look and see what pandas is actually giving things.

As an aside, this is the downside to writing python extensions in C. They're much faster, but also much less flexible.

dpryan79 avatar dpryan79 commented on September 27, 2024

@endrebak Values need to be floats in bigWig files. That's why you're getting the error.
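
A minimal sketch of coercing the values to built-in floats before calling addEntries(), for the case where the input is a plain list rather than a numpy array. The helper as_floats is hypothetical and not part of pyBigWig; the commented call mirrors the one from the original report.

```python
def as_floats(values):
    """Return a list of built-in Python floats from any iterable of numbers."""
    return [float(v) for v in values]


starts = [1, 5, 7]
ends = [s + 1 for s in starts]
values = as_floats([5, 0, -5])  # integers become 5.0, 0.0, -5.0

# With pyBigWig installed and a header already added, the call would then be:
# bw.addEntries(["chr1"] * 3, starts, ends=ends, values=values)
print(values)
```

Start and end positions stay integers; only the values argument needs to be floating point.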

dpryan79 avatar dpryan79 commented on September 27, 2024

For pandas, it looks like you can use the values attribute rather than making a list:

import pyBigWig
import pandas as pd
header = [("chr1", 500)]
bw = pyBigWig.open("test.bw", "w")
bw.addHeader(header)
c = ["chr1"] * 3
s = pd.Series([1, 5, 7])
e = s + 1
v = pd.Series([5, 0, -5], dtype=float)
bw.addEntries(c, s.values, ends=e.values, values=v.values)
bw.close()

As an aside, there was a bug preventing this from working with 64bit floats (what pandas uses) that I just fixed.

endrebak avatar endrebak commented on September 27, 2024

With the latest PR (Fix the minimum floating point value check) it worked. Thanks!

Ps. I am trying to recreate some of S4Vectors' functionality in Python. Do you have a good reference for reading about Python/numpy/C interop? If no, then please do not go looking for my sake (obviously :) ).
