dpryan79 / libbigwig
A C library for handling bigWig files

License: MIT License

Languages: Makefile 1.32%, C 95.57%, Python 1.70%, CMake 1.41%
Topics: bigbed, bigwig, bioinformatics

libbigwig's Introduction


A C library for reading/parsing local and remote bigWig and bigBed files. While Kent's source code is free to use for these purposes, it's poorly suited to use as library code, since it has the unfortunate habit of calling exit() whenever there's an error. If that code is then embedded in something like Python, the Python interpreter gets killed. This library aims to resolve these sorts of issues; it also uses more standard tooling such as curl and has a friendlier license to boot.

Documentation is automatically generated by doxygen and can be found under docs/html or online here.

Example

The only functions and structures that end users need to care about are in "bigWig.h". Below is a commented example. You can see the files under test/ for further examples.

#include "bigWig.h"
int main(int argc, char *argv[]) {
    bigWigFile_t *fp = NULL;
    bwOverlappingIntervals_t *intervals = NULL;
    double *stats = NULL;
    if(argc != 2) {
        fprintf(stderr, "Usage: %s {file.bw|URL://path/file.bw}\n", argv[0]);
        return 1;
    }

    //Initialize enough space to hold 128KiB (1<<17) of data at a time
    if(bwInit(1<<17) != 0) {
        fprintf(stderr, "Received an error in bwInit\n");
        return 1;
    }

    //Open the local/remote file
    fp = bwOpen(argv[1], NULL, "r");
    if(!fp) {
        fprintf(stderr, "An error occured while opening %s\n", argv[1]);
        return 1;
    }

    //Get values in a range (0-based, half open) without NAs
    intervals = bwGetValues(fp, "chr1", 10000000, 10000100, 0);
    bwDestroyOverlappingIntervals(intervals); //Free allocated memory

    //Get values in a range (0-based, half open) with NAs
    intervals = bwGetValues(fp, "chr1", 10000000, 10000100, 1);
    bwDestroyOverlappingIntervals(intervals); //Free allocated memory

    //Get the full intervals that overlap
    intervals = bwGetOverlappingIntervals(fp, "chr1", 10000000, 10000100);
    bwDestroyOverlappingIntervals(intervals);

    //Get an example statistic - standard deviation
    //We want ~4 bins in the range
    stats = bwStats(fp, "chr1", 10000000, 10000100, 4, dev);
    if(stats) {
        printf("chr1:10000000-10000100 std. dev.: %f %f %f %f\n", stats[0], stats[1], stats[2], stats[3]);
        free(stats);
    }

    bwClose(fp);
    bwCleanup();
    return 0;
}
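
This can be compiled with something like cc -g -Wall -O3 -o example example.c libBigWig.a -lcurl -lm -lz, assuming the static library has already been built with make (the same linker flags appear in several of the issues below).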

Writing example

N.B., creation of bigBed files is not supported (there are no plans to change this).

Below is an example of how to write bigWig files. You can also find this file under test/exampleWrite.c. Unlike with Kent's tools, you can create bigWig files entry by entry without needing an intermediate wiggle or bedGraph file. Entries in bigWig files are stored in blocks with each entry in a block referring to the same chromosome and having the same type, of which there are three (see the wiggle specification for more information on this).

#include "bigWig.h"

int main(int argc, char *argv[]) {
    bigWigFile_t *fp = NULL;
    char *chroms[] = {"1", "2"};
    char *chromsUse[] = {"1", "1", "1"};
    uint32_t chrLens[] = {1000000, 1500000};
    uint32_t starts[] = {0, 100, 125,
                         200, 220, 230,
                         500, 600, 625,
                         700, 800, 850};
    uint32_t ends[] = {5, 120, 126,
                       205, 226, 231};
    float values[] = {0.0f, 1.0f, 200.0f,
                      -2.0f, 150.0f, 25.0f,
                      0.0f, 1.0f, 200.0f,
                      -2.0f, 150.0f, 25.0f,
                      -5.0f, -20.0f, 25.0f,
                      -5.0f, -20.0f, 25.0f};
    
    if(bwInit(1<<17) != 0) {
        fprintf(stderr, "Received an error in bwInit\n");
        return 1;
    }

    fp = bwOpen("example_output.bw", NULL, "w");
    if(!fp) {
        fprintf(stderr, "An error occurred while opening example_output.bw for writingn\n");
        return 1;
    }

    //Allow up to 10 zoom levels, though fewer will be used in practice
    if(bwCreateHdr(fp, 10)) goto error;

    //Create the chromosome lists
    fp->cl = bwCreateChromList(chroms, chrLens, 2);
    if(!fp->cl) goto error;

    //Write the header
    if(bwWriteHdr(fp)) goto error;

    //Some example bedGraph-like entries
    if(bwAddIntervals(fp, chromsUse, starts, ends, values, 3)) goto error;
    //We can continue appending similarly formatted entries
    //N.B. you can't append a different chromosome (those always go into different blocks)
    if(bwAppendIntervals(fp, starts+3, ends+3, values+3, 3)) goto error;

    //Add a new block of entries with a span. Since bwAdd/AppendIntervals was just used we MUST create a new block
    if(bwAddIntervalSpans(fp, "1", starts+6, 20, values+6, 3)) goto error;
    //We can continue appending similarly formatted entries
    if(bwAppendIntervalSpans(fp, starts+9, values+9, 3)) goto error;

    //Add a new block of fixed-step entries
    if(bwAddIntervalSpanSteps(fp, "1", 900, 20, 30, values+12, 3)) goto error;
    //The start is then 990, since that's where the previous step ended
    if(bwAppendIntervalSpanSteps(fp, values+15, 3)) goto error;

    //Add a new chromosome
    chromsUse[0] = "2";
    chromsUse[1] = "2";
    chromsUse[2] = "2";
    if(bwAddIntervals(fp, chromsUse, starts, ends, values, 3)) goto error;

    //Closing the file causes the zoom levels to be created
    bwClose(fp);
    bwCleanup();

    return 0;

error:
    fprintf(stderr, "Received an error somewhere!\n");
    bwClose(fp);
    bwCleanup();
    return 1;
}
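
Running the program writes example_output.bw to the current directory; as noted in the comments, the zoom levels are computed when the file is closed.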

Testing file types

As of version 0.3.0, this library supports accessing bigBed files, which are related to bigWig files. Applications that need to support both bigWig and bigBed input can use the bwIsBigWig and bbIsBigBed functions to determine if their inputs are bigWig/bigBed files:

...code...
if(bwIsBigWig(input_file_name, NULL)) {
    //do something
} else if(bbIsBigBed(input_file_name, NULL)) {
    //do something else
} else {
    //handle unknown input
}

Note that these two functions rely on the "magic number" at the beginning of each file, which differs between bigWig and bigBed files.
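
As a minimal sketch of such dispatch, assuming bbOpen() takes just a filename and an optional CURL callback (mirroring bwOpen() without the mode argument), with bwInit() and error handling omitted for brevity:

#include "bigWig.h"

//Hypothetical helper (not part of the library): open a file as bigWig or
//bigBed depending on its magic number. Returns NULL for unknown file types.
bigWigFile_t *openEither(char *fname) {
    if(bwIsBigWig(fname, NULL)) {
        return bwOpen(fname, NULL, "r"); //bigWig magic number detected
    } else if(bbIsBigBed(fname, NULL)) {
        return bbOpen(fname, NULL);      //bigBed magic number detected
    }
    return NULL;                         //neither magic number matched
}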

bigBed support

Support for accessing bigBed files was added in version 0.3.0. The function names used for accessing bigBed files are similar to those used for bigWig files.

Function | Use
--- | ---
bbOpen | Opens a bigBed file
bbGetSQL | Returns the SQL string (if it exists) in a bigBed file
bbGetOverlappingEntries | Returns all entries overlapping an interval (either with or without their associated strings)
bbDestroyOverlappingEntries | Frees memory allocated by the above function

Other functions, such as bwClose and bwInit, are shared between bigWig and bigBed files. See test/testBigBed.c for a full example.
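
Below is a minimal reading sketch based on the functions above. The struct member names (l, start, end, str) follow the library's header, and the query coordinates are arbitrary; treat this as an illustration and consult test/testBigBed.c for the authoritative example.

#include "bigWig.h"
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    bigWigFile_t *fp = NULL;
    bbOverlappingEntries_t *entries = NULL;
    char *sql = NULL;
    uint32_t i;

    if(argc != 2) return 1;
    if(bwInit(1<<17) != 0) return 1;

    fp = bbOpen(argv[1], NULL);
    if(!fp) return 1;

    //Print the SQL string, if the file contains one (the returned copy is free()d by the caller)
    sql = bbGetSQL(fp);
    if(sql) {
        printf("%s\n", sql);
        free(sql);
    }

    //Fetch entries with their associated strings (withString=1); "chr1" is an assumption
    entries = bbGetOverlappingEntries(fp, "chr1", 0, 100000, 1);
    if(entries) {
        for(i = 0; i < entries->l; i++) {
            printf("%"PRIu32"-%"PRIu32"\t%s\n", entries->start[i], entries->end[i], entries->str[i]);
        }
        bbDestroyOverlappingEntries(entries);
    }

    bwClose(fp);
    bwCleanup();
    return 0;
}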

A note on bigBed entries

Inside bigBed files, entries are stored as chromosome, start, and end coordinates with an (optional) associated string. For example, a "bedRNAElements" file from ENCODE has name, score, strand, "level", "significance", and "score2" values associated with each entry. These are stored inside the bigBed file as a single tab-separated character vector (char *), which makes parsing difficult. The names of the various fields inside a bigBed file are stored as an SQL string, for example:

table RnaElements 
"BED6 + 3 scores for RNA Elements data "
    (
    string chrom;      "Reference sequence chromosome or scaffold"
    uint   chromStart; "Start position in chromosome"
    uint   chromEnd;   "End position in chromosome"
    string name;       "Name of item"
    uint   score;      "Normalized score from 0-1000"
    char[1] strand;    "+ or - or . for unknown"
    float level;       "Expression level such as RPKM or FPKM. Set to -1 for no data."
    float signif;      "Statistical significance such as IDR. Set to -1 for no data."
    uint score2;       "Additional measurement/count e.g. number of reads. Set to 0 for no data."
    )

Entries will then be of the form (one per line):

59426	115	-	0.021	0.48	218
51	209	+	0.071	0.74	130
52	170	+	0.045	0.61	171
59433	178	-	0.049	0.34	296
53	156	+	0.038	0.19	593
59436	186	-	0.054	0.15	1010
59437	506	-	1.560	0.00	430611

Note that the chromosome and start/end coordinates are stored separately, so there's no need to parse them out of the string. libBigWig can return these entries either with or without the above associated strings. Parsing these strings is left to the application requiring them and is currently outside the scope of this library.
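
As a hypothetical application-side sketch, the tab-separated string can be split with standard C string functions. Note that strtok() collapses consecutive delimiters, so empty fields would be skipped; strsep() is an alternative if empty fields must be preserved.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//Hypothetical helper (application code, not part of libBigWig): print the
//tab-separated fields of a single bigBed entry string. strtok() modifies
//its input, so we work on a copy.
void printFields(const char *entry) {
    char *copy = strdup(entry);
    int col = 0;
    char *field = strtok(copy, "\t");
    while(field) {
        printf("field %d: %s\n", col++, field);
        field = strtok(NULL, "\t");
    }
    free(copy);
}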

Interval/Entry iterators

Sometimes it is desirable to request a large number of intervals from a bigWig file or entries from a bigBed file without holding them all in memory at once (e.g., to limit memory usage). To support this, libBigWig (since version 0.3.0) provides two kinds of iterators. The general process of using an iterator is: (1) iterator creation, (2) traversal, and finally (3) iterator destruction. Only iterator creation differs between bigWig and bigBed files.

Importantly, iterators return results one or more blocks at a time. This is for convenience, since bigWig intervals and bigBed entries are stored together in fixed-size groups, called blocks. The number of blocks of entries returned is therefore an option that can be specified to balance performance and memory usage.

Iterator creation

For bigWig files, iterators are created with bwOverlappingIntervalsIterator(). This function takes chromosomal bounds (chromosome name, start, and end position) as well as a number of blocks. The equivalent function for bigBed files is bbOverlappingEntriesIterator(), which additionally takes a withString argument dictating whether the returned entries include the associated string values.

Each of the aforementioned functions returns a pointer to a bwOverlapIterator_t object. The only parts of this structure that matter for end users are the following members: entries, intervals, and data. entries is a pointer to a bbOverlappingEntries_t object, or NULL if a bigWig file is being used. Likewise, intervals is a pointer to a bwOverlappingIntervals_t object, or NULL if a bigBed file is being used. data is a special pointer used to signify the end of iteration: when data is a NULL pointer, iteration has ended.

Iterator traversal

Regardless of whether a bigWig or bigBed file is being used, the bwIteratorNext() function will free currently used memory and load the appropriate intervals or entries for the next block(s). On error, this will return a NULL pointer (memory is already internally freed in this case).

Iterator destruction

bwOverlapIterator_t objects MUST be destroyed after use. This can be done with the bwIteratorDestroy() function.

Example

A full example is provided in test/testIterator.c, but a small example of iterating over all bigWig intervals in chr1:0-10000000 in chunks of 5 blocks follows:

bwOverlapIterator_t *iter = bwOverlappingIntervalsIterator(fp, "chr1", 0, 10000000, 5);
while(iter->data) {
    //Do stuff with iter->intervals
    iter = bwIteratorNext(iter);
}
bwIteratorDestroy(iter);
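
The bigBed equivalent differs only in iterator creation; here withString is 1 so the associated strings are returned (this sketch assumes fp refers to an open bigBed file):

iter = bbOverlappingEntriesIterator(fp, "chr1", 0, 10000000, 1, 5);
while(iter->data) {
    //Do stuff with iter->entries
    iter = bwIteratorNext(iter);
}
bwIteratorDestroy(iter);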

A note on bigWig statistics

The results of min, max, and mean should be the same as those from bigWigSummary. stdev and coverage, however, may differ due to Kent's tools producing incorrect results (at least for coverage, though the same appears to be the case for stdev). The sum method doesn't exist in Kent's tools; note that when zoom levels are used, it multiplies each block's average by the lesser of the number of bases covered in the block and the number of bases in the block overlapping the desired region.
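
As a sketch of the difference, bwStats() may consult zoom levels while bwStatsFromFull() (which appears in several of the issues below) computes from the full-resolution data; both return a malloc()ed array of one double per bin, which the caller frees:

//Mean over 10 bins, first via zoom levels (fast), then from full data (exact)
double *approx = bwStats(fp, "chr1", 0, 1000000, 10, mean);
double *exact = bwStatsFromFull(fp, "chr1", 0, 1000000, 10, mean);
if(approx) free(approx);
if(exact) free(exact);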

Python interface

There are currently two Python interfaces that make use of libBigWig: pyBigWig by me and bw-python by Brent Pedersen. Those interested are encouraged to give both a try!

Building without remote file access

If you want to compile without remote file access (e.g., you don't have curl installed), then you can append -DNOCURL to the CFLAGS line in the Makefile. You will also need to remove -lcurl from the LIBS line.

If you are building libBigWig using CMake you can instead pass -DWITH_CURL=OFF when calling CMake at configuration time.

libbigwig's People

Contributors

brentp, ddeka2910, dpryan79, dzerbino, eqt, jayhesselberth, landesfeind, ocxtal, robomics, sourabh2k15


libbigwig's Issues

Clarification sought on coordinate schemas and positions

Hi

I've been looking at building a library for accessing Big files from Perl using your library. It's gone pretty well to be honest but I have some questions about your interpretation of coordinates that's not clear from the documentation. I've pasted in an example of the docs from one of your functions below with some bits removed:

/*!
 * @brief Return bigWig entries overlapping an interval.
 * @param start The start position of the interval. This is 0-based half open, so 0 is the first base.
 * @param end The end position of the interval. Again, this is 0-based half open, so 100 will include the 100th base...which is at position 99.
 */
bwOverlappingIntervals_t *bwGetOverlappingIntervals(bigWigFile_t *fp, char *chrom, uint32_t start, uint32_t end);

I think it's your use of "...which is at position 99" that is confusing me. 0-based, half-open to me would suggest that if you use 100 as your end value you will get the 100th base, and its value should always be 100. Unless you're referring to the location of base 100's values in the arrays passed back by the routine in the bwOverlappingIntervals_t struct.

Also, I'm aware that when parsing bigWigs their use of coordinates can differ based on their source data: those derived from bedGraphs retain their 0-based, half-open system, whereas fixed and variable step use 1-start, fully-closed. I had a poke in the code and can see some mention of this, but I'm unsure if you handle this internally so that we only need to work in 0-based, half-open coordinates.
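
To make my reading concrete, I would have expected the 100th base alone to be requested as the half-open interval [99, 100), e.g.:

//My understanding of 0-based half-open: the 100th base sits at position 99,
//so start=99, end=100 selects exactly that one base
intervals = bwGetOverlappingIntervals(fp, "chr1", 99, 100);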

Thanks and sorry for the badgering.

/usr/bin/ld: cannot find -lz when installing libBigWig

I am trying to install libBigWig, as it is a dependency of WiggleTools. libBigWig requires "zlib", "bzip2" and "libcurl", which I have already installed. When I try to install libBigWig with the following commands:

git clone https://github.com/dpryan79/libBigWig.git
cd libBigWig
make install

I get the following:

/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -c -o io.o io.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -c -o bwValues.o bwValues.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -c -o bwRead.o bwRead.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -c -o bwStats.o bwStats.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -c -o bwWrite.o bwWrite.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-ar -rcs libBigWig.a io.o bwValues.o bwRead.o bwStats.o bwWrite.o
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-ranlib libBigWig.a
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -fpic -c -o io.pico io.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -fpic -c -o bwValues.pico bwValues.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -fpic -c -o bwRead.pico bwRead.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -fpic -c -o bwStats.pico bwStats.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -fpic -c -o bwWrite.pico bwWrite.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -shared  -o libBigWig.so io.pico bwValues.pico bwRead.pico bwStats.pico bwWrite.pico  -lm -lz
/home/dp456/miniconda3/envs/py37/bin/../lib/gcc/x86_64-conda-linux-gnu/7.5.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lz
collect2: error: ld returned 1 exit status
make: *** [Makefile:60: libBigWig.so] Error 1

Now, I am still at the beginning of my learning curve in informatics, but as far as I can understand, it seems to be an issue with the ld linker not finding zlib. This thread here suggested installing:

sudo apt-get install zlib1g-dev
sudo apt-get install libz-dev
sudo apt-get install lib32z1-dev
sudo apt-get install zlib*

but the error still persists after trying all variations. What else can I do? I also tried to install WiggleTools with conda, but that doesn't work either, because there are several conflicts with other packages.

This is the OS/version of the server that I'm using:

Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-197-generic x86_64)

Check validity of added entries

There's currently no sanity checking performed when writing a new file. So, someone could specify entries out of order and not get an error message immediately!

Optionally check new entries for consistency/sanity

I do this in pyBigWig, but I'm told that some programs using this do not check to ensure that intervals are entered in a sane order. The various add/append functions should add a checkConsistency parameter. This will result in a new minor version, due to the change in API. I should also start adding .1 or whatever to the .so file.

Question: is it possible to merge bigwig files?

Hey, I'm trying to write a CLI tool to merge two bigWig files. Is this something that is possible with this library? If so, would you be able to provide a basic example?
Otherwise, do you know where I can find a formal definition of the .bigWig format?

Trouble with files on Amazon S3

Hello @dpryan79 ,

It would appear that libBigWig struggles with remote files on S3, e.g., with the following code (compiled to toto):

#include "bigWig.h"

int main(int argc, char **argv) {
    bwOpen(argv[1], NULL, "r");
    return 0;
}

I get:

./toto https://encode-public.s3.amazonaws.com/2017/10/03/ad2c0f17-0824-4647-a749-74276daca7da/ENCFF278CUB.bigWig
Segmentation fault
wget https://encode-public.s3.amazonaws.com/2017/10/03/ad2c0f17-0824-4647-a749-74276daca7da/ENCFF278CUB.bigWig
[...]
./toto ENCFF278CUB.bigWig

Conversely, the Kent library has no issue with these remote bigWig files via CURL.

Would it be possible for libBigWig to handle these remote files on AWS?

Thank you,

@dzerbino

Makefile: install target does not create target dirs

The Makefile's install target assumes that $(prefix)/lib and $(prefix)/include already exist. This is not necessarily the case when installing to a non-standard prefix.

It would be better if the target directories were created before installing files to them, e.g. by applying this patch to the Makefile:

From 43f628598dc3478a7a823c46d1d7e5985611045c Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus <[email protected]>
Date: Thu, 25 Feb 2016 10:49:28 +0100
Subject: [PATCH] Create target directories before installing to them.

---
 Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Makefile b/Makefile
index e1faaf4..731bbf8 100644
--- a/Makefile
+++ b/Makefile
@@ -68,6 +68,7 @@ clean:
    rm -f *.o libBigWig.a libBigWig.so *.pico test/testLocal test/testRemote test/testWrite test/exampleWrite example_output.bw

 install: libBigWig.a libBigWig.so
+   install -d $(prefix)/lib $(prefix)/include
    install libBigWig.a $(prefix)/lib
    install libBigWig.so $(prefix)/lib
    install *.h $(prefix)/include
-- 
2.1.0

What do you think?

Support for extraIndex fields in BigBed files

Hi there

We've started to use BigBed and libBigWig for a lot of our flat file reading and have started to use the extraIndex feature (see the kent source for bigBedNamedItems). Currently we have to shell back out to run bigBedNamedItems to extract rows of the BigBed file we are interested in. Is there any possibility of supporting the extra indexes? It looks from https://github.com/ucscGenomeBrowser/kent/blob/master/src/lib/bigBed.c#L635 they're accessible from the file but also they require some AutoSQL parsing to process.

Anyway a general idea of how plausible support for these are would be really appreciated. Even if it's a case of no way.

Thanks

Infinite loop when creating bw file

Hi there,

Firstly thank you again for this library, it is helping us dramatically speed up our bw file processing.
I have used libBigWig to create a set of tools to manipulate and generate bw files. We have recently run into what seems to be an infinite loop. See cancerit/cgpBigWig#9

Ive done some detective work (using libBigWig master branch) and it seems that the code is getting stuck here https://github.com/dpryan79/libBigWig/blob/master/bwWrite.c#L928-L934 .

Initially I thought this could be due to the fact that we are missing contigs (Y for example) from the input bed that are present in the .fai file. Some print statements in libBigWig showed me that the code was hitting an infinite loop at the input file line 13 19020094 115108598 4 (as seen in the cgpBigWig issue linked above, cancerit/cgpBigWig#9), so the code isn't even reaching the end of the file and therefore finding no issue with missing contigs.

Here is the C code utilising libBigWig and throwing the error: https://github.com/cancerit/cgpBigWig/blob/develop/c/bg2bw.c. I haven't been able to find anything obvious causing this infinite loop in my code, so I'm wondering if I've found an edge case in libBigWig, or perhaps an experienced eye could tell me where I've gone wrong. For the record, I have also tried using bwAppendIntervals where the last contig retrieved from the input bed matches the current contig when looping through the bed file, and this hasn't solved the issue.

@keiranmraine has an interest in this too

high memory usage during indexing

The memory use of libBigWig gets quite high during indexing.
Is there any way to reduce this? For example, by flushing each chromosome to disk after the index for that tid is created in constructZoomLevels?

URL with Temporary Redirect

I am trying to fetch some bigWig files from the ENCODE project; it seems they host their files on Amazon S3, and using testRemote gives me an error. Wondering if there is a solution? Thanks.

$ ./testRemote https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
[bwHdrRead] There was an error while reading in the header!
An error occured while opening https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
$ curl -I https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
HTTP/1.1 307 Temporary Redirect
Server: nginx/1.10.1
Date: Fri, 17 Mar 2017 21:08:37 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 1196
Connection: keep-alive
X-Request-URL: https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
X-Stats: es_count=1&es_time=7226&queue_begin=1489784917370001&queue_time=830&rss_begin=492408832&rss_change=0&rss_end=492408832&wsgi_begin=1489784917370831&wsgi_end=1489784917389025&wsgi_time=18194
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, HEAD
Access-Control-Allow-Headers: Accept, Origin, Range, X-Requested-With
Access-Control-Expose-Headers: Content-Length, Content-Range, Content-Type
Location: https://download.encodeproject.org/http://encode-files.s3.amazonaws.com/2017/03/05/d7be9e16-e742-4554-9e9b-347834665817/ENCFF188HKC.bigWig?Signature=Tb43S%2BXE%2BOT0jVVRYn1E1amZZss%3D&Expires=1489914517&AWSAccessKeyId=ASIAIBGS2LKKZLYIOLVA&response-content-disposition=attachment%3B%20filename%3DENCFF188HKC.bigWig&x-amz-security-token=FQoDYXdzEB4aDKo3b1EtKQK0xzlrqCK3A8jMrctRMooXvbFhPZaBtN46iqYhdsIuZVnmCBYphXlMoRFfa%2B7dyVq1ICoFY7d6wrVj2sKHs4VfVMOYlRJOHonPlRj9BvF5DYR8EHZaItBq4ouDlkzOYcrCNbo36uR1IP%2BsDlX8vwqn7hw6ri/wtQYjReE35P8wyG7D3cN4cHZFm2bAmd4xfS6o7vsgh21LfSHjhKIg%2BoQqPoxZwdNB64qlUBrKYo%2BnhDQdKDceMc/0GB9NJqy1U1n0kaXitFHSwg88LzgXR/CY2Eyk/tQVcScceLERWAupB9nLyVpsVH1uSOumFhwcSf1FXEyqFCKWf4jgUqBHJ7T7kfUHmKcLP8VbJgQs0/TB7q8OY0fn7lzugK4kTXkF3GoGdI8aUwNBo2VuA7Z1S0ldUntTMHeh%2Bl9x9nLETbzmVSOB/qIlP%2BZimKVJTMszPoj57cqhxAUd/%2B6xfdiZbjjoh6NSY5rI%2BfMWfTqZS5my6aCIVXTjkg26Bfm4HGl04bNXhJp7uTiIeorwMex0yfSxFJnybunDaLv9fuFmqsOpY7tOXNzmbina6Tp38euVaLnHD/9rAKNpTu6C7pvyUWQohJ6xxgU%3D
Strict-Transport-Security: max-age=15768000

Feature request: BigBed reader

Hello,

Tied in to ticket #11, would it be possible for libBigWig to read BigBeds?

Hopefully, it should not be too painful: the search tree / compression block structure is identical, only the content of the compressed blocks is different.

Thanks in advance for considering my request,

Daniel

Slow loading a remote BigWig file

Sorry, I seem to be finding a lot of these today. Anyway, same stub program as before except I've switched to using a UCSC example bigWig file hosted at genome.ucsc.edu. The following code took over 3 minutes to return:

#include "bigWig.h"
#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    bigWigFile_t *fp = NULL;
    double *stats = NULL;
    
    char file[] = "http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw";
    char chrom[] = "chr21";
    int start = 33031597;
    int end = 34041570;
    int full = 1;
    int bins = 100;
    
    int buffer = 1<<17;
    
    if(bwInit(buffer) != 0) {
        fprintf(stderr, "Received an error in bwInit\n");
        return 1;
    }
    
    fp = bwOpen(file, NULL, "r");
    if(!fp) {
        fprintf(stderr, "An error occured while opening %s\n", file);
        return 1;
    }
    if(full) 
      stats = bwStatsFromFull(fp, chrom, start, end, bins, mean);
    else 
      stats = bwStats(fp, chrom, start, end, bins, mean);

    if(stats)
      free(stats);

    bwClose(fp);
    bwCleanup();
    return 0;
}

My Perl code had similar issues, and the following Python code had the same problem:

#!/usr/bin/env python

import pyBigWig
bw = pyBigWig.open("http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw")
bw.stats('chr21', 33031597, 34041570, type="mean", nBins=100, exact=True)

Doing a sample of the C process on OSX showed that 100% of the CPU time is spent in __select from the system kernel, and the next level up was Curl_poll, so I think it's spending its time transferring. I did some info dumps on the bigWig and this is what I got back:

$ bigWigInfo http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw
version: 1
isCompressed: no
isSwapped: 0
primaryDataSize: 56,335,300
primaryIndexSize: 227,048
zoomLevels: 7
chromCount: 1
basesCovered: 35,926,161
mean: 40.862151
min: 0.000000
max: 100.000000
std: 22.968515

The only bit I think is odd here is that it's an uncompressed bigWig file. I also tried mucking around with the buffer size and set it to 8MB. That brought the total time down to 15 seconds, so I'm guessing there's a problem/limitation with uncompressed bigWigs and streaming them across the wire.

Problem with compilation due to curl.h

Hi, I'm stuck with this install problem. I would appreciate any advice.

I'm on Debian working in a conda (4.8.1) env.

I did this before make:

conda install -c anaconda curl
conda install -c anaconda libcurl

make install

/home/jean-philippe.villemin/bin/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -c -o io.o io.c
io.c:2:10: fatal error: curl/curl.h: No such file or directory
#include <curl/curl.h>
^~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:33: io.o] Error 1

locate curl.h

/home/jean-philippe.villemin/bin/anaconda3/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3/pkgs/libcurl-7.65.2-h20c2e04_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/majiq_env/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/outrigger-env/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/python2/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/r_env/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/curl-7.52.1-0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/curl-7.55.1-h78862de_4/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/curl-7.55.1-hcb0b314_2/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/libcurl-7.61.0-h1ad7b7a_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/libcurl-7.64.0-h01ee5af_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/libcurl-7.65.3-h20c2e04_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/cmake-3.8.1/Utilities/cm_curl.h
/home/jean-philippe.villemin/bin/cmake-3.8.1/Utilities/cmcurl/include/curl/curl.h
/home/jean-philippe.villemin/bin/packages_R-3.3.1/include/curl/curl.h
/usr/include/curl/curl.h

echo $C_INCLUDE_PATH

/home/jean-philippe.villemin/bin/anaconda3/include/curl:/home/jean-philippe.villemin/bin/gsl-2.3/bin/include:/home/jean-philippe.villemin/bin/libBigWig/bin/include:/home/jean-philippe.villemin/bin/htslib/bin/include

echo $LD_LIBRARY_PATH

/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64:/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64/server:/home/jean-philippe.villemin/bin/anaconda3/lib/libreadline.so.6:/home/jean-philippe.villemin/bin/anaconda3/lib/libpng16.so.16:/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64:/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64/server:/home/jean-philippe.villemin/bin/anaconda3/lib/libreadline.so.6:/home/jean-philippe.villemin/bin/anaconda3/lib/libpng16.so.16::/home/jean-philippe.villemin/bin/libBigWig/bin/lib:/home/jean-philippe.villemin/bin/gsl-2.3/bin/lib:/home/jean-philippe.villemin/bin/htslib/bin/lib:/home/jean-philippe.villemin/bin/libBigWig/bin/lib:/home/jean-philippe.villemin/bin/gsl-2.3/bin/lib:/home/jean-philippe.villemin/bin/htslib/bin/lib

Error whilst querying BigWig file: "got an error in bwStatsFromZoom"

Whilst querying for stats from a BigWig file from the Ensembl FTP site I got an error from libBigWig. I found it first in my Perl bindings but I've been able to replicate this in C below:

#include "bigWig.h"
#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    bigWigFile_t *fp = NULL;
    double *stats = NULL;
    
    char file[] = "http://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/rnaseq/GRCh38.illumina.sk_muscle.1.bam.bw";
    char chrom[] = "21";
    int start = 33031597;
    int end = 34041570;
    int full = 0;
    int bins = 100;
    
    if(bwInit(1<<17) != 0) {
        fprintf(stderr, "Received an error in bwInit\n");
        return 1;
    }
    
    fp = bwOpen(file, NULL, "r");
    if(!fp) {
        fprintf(stderr, "An error occured while opening %s\n", file);
        return 1;
    }
    if(full) 
      stats = bwStatsFromFull(fp, chrom, 33031597, 34041570, bins, mean);
    else 
      stats = bwStats(fp, chrom, 33031597, 34041570, bins, mean);

    if(stats)
      free(stats);

    bwClose(fp);
    bwCleanup();
    return 0;
}

This was compiled using cc -g -Wall -O3 -Wsign-compare -o query query.c libBigWig.a -lcurl -lm -lz. When run, the error I currently get back is:

got an error in bwStatsFromZoom in the range 33031597-33041696: Operation now in progress

When I switch to requesting full stats the message goes away. The BigWig file in question is a coverage plot from a BAM file and was created using the kent utils. I've also checked with the kent binaries and it's able to report statistics for the requested region.

Any help is appreciated.

parameter values

hi Devon, can you give guidance on reasonable values / tradeoffs for a few parameters?

  • maxZooms to bwCreateHdr. I assume 7 is a reasonable default?
  • blocksPerIteration to bwOverlappingIntervalsIterator. Higher == more memory; any other tradeoffs?

Then some other questions:

  • how does bwStatsFromFull differ from bwStats?
  • will it be horribly inefficient to call e.g. bwAppendInterval* with a single interval?

thanks again for the library.

Search improvement ?

Hello @dpryan79, I was wondering if this could potentially speed up the library.

During the walkRTreeNodes process, the library seems to iterate through all children to figure out whether a block overlaps with the current region, either in overlapsLeaf or overlapsNonLeaf.

Instead, could this use a binary search approach to find an overlapping block and then search the neighborhood for all overlaps? I'm trying to think of a scenario where this would fail, but let me know if you have any thoughts on this.

Test failures on other architectures

Hello there,

In Debian we run some of the tests you've included in test/ as autopkgtests for the libbigwig package.
This is how we run the tests in Debian:

echo "1c52065211fdc44eea45751a9cbfffe0 test/Local.bw" >> checksums
echo "8e116bd114ffd2eb625011d451329c03 test/Write.bw" >> checksums
echo "ef104f198c6ce8310acc149d0377fc16 test/example_output.bw" >> checksums

LIB_STATIC=`find /usr/lib -name libBigWig.a`

#Compile using installed libbigwig
echo 'Compiling ...'
gcc -g -Wall   test/testLocal.c ${LIB_STATIC} -lBigWig -lcurl -lm -lz -o testlocal
gcc -g -Wall   test/testWrite.c ${LIB_STATIC} -lBigWig -lm -lz -lcurl -o testWrite
gcc -g -Wall   test/exampleWrite.c ${LIB_STATIC} -lBigWig -lcurl -lm -lz -o examplewrite

echo '------------------------------'

echo -e "Test 1"
./testlocal test/test.bw > test/Local.bw

echo -e "Test 2"
./testWrite test/test.bw test/Write.bw

echo -e "Test 3"
./examplewrite

md5sum --check checksums
echo -e "PASS"

Currently the tests are failing on the i386 and s390 architectures.

The errors that we are getting are as follows:

i386:

Compiling ...
------------------------------
Test 1
Test 2
Test 3
test/Local.bw: FAILED
test/Write.bw: FAILED
test/example_output.bw: OK
md5sum: WARNING: 2 computed checksums did NOT match

The output files generated have different checksums. Is this expected?

s390 (Big Endian):

Compiling ...
------------------------------
Test 1
testlocal: test/testLocal.c:107: main: Assertion `bwIsBigWig(argv[1], NULL) == 1' failed.
/tmp/autopkgtest-lxc.y_h72gqc/downtmp/build.e6I/src/debian/tests/run-unit-test: line 32:  1996 Aborted                 ./testlocal test/test.bw > test/Local.bw

It would be helpful if you could help in debugging these failures :)

Thanks

Rename io.h to BigWig_io.h

Hi,
I have trouble compiling 3rd-party code (hdf5-1.8.18) on a system where I previously installed BigWig. It is unfortunate that BigWig introduces an io.h header. Please rename it. (Placing it in a subdirectory could help, but I assume in some cases users/compilers would still pick this file and not the system one.)

x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform -I../../src  -D_GNU_SOURCE -D_POSIX_C_SOURCE=200112L   -DNDEBUG -UH5_DEBUG_API -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/src -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/test -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/lib  -std=c99 -pedantic -Wall -Wextra -Wundef -Wshadow -Wpointer-arith -Wbad-function-cast -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wnested-externs -Winline -Wfloat-equal -Wmissing-format-attribute -Wmissing-noreturn -Wpacked -Wdisabled-optimization -Wformat=2 -Wunreachable-code -Wendif-labels -Wdeclaration-after-statement -Wold-style-definition -Winvalid-pch -Wvariadic-macros -Winit-self -Wmissing-include-dirs -Wswitch-default -Wswitch-enum -Wunused-macros -Wunsafe-loop-optimizations -Wc++-compat -Wstrict-overflow -Wlogical-op -Wlarger-than=2048 -Wvla -Wsync-nand -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wstrict-overflow=5 -Wjump-misses-init -Wunsuffixed-float-constants -Wdouble-promotion -Wsuggest-attribute=const -Wtrampolines -Wstack-usage=8192 -Wvector-operation-performance -Wsuggest-attribute=pure -Wsuggest-attribute=noreturn -Wsuggest-attribute=format -Wdate-time -Wopenmp-simd -O3 -O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize -c -o zip_perf.o /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/zip_perf.c
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:37:0:
/usr/include/io.h:22:6: error: nested redefinition of 'enum bigWigFile_type_enum'
 enum bigWigFile_type_enum {
      ^
/usr/include/io.h:22:6: error: redeclaration of 'enum bigWigFile_type_enum'
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/src/H5private.h:149:0,
                 from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:26:
/usr/include/io.h:22:6: note: originally defined here
 enum bigWigFile_type_enum {
      ^
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:37:0:
/usr/include/io.h:23:5: error: redeclaration of enumerator 'BWG_FILE'
     BWG_FILE = 0,
     ^
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/src/H5private.h:149:0,
                 from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:26:
/usr/include/io.h:23:5: note: previous definition of 'BWG_FILE' was here
     BWG_FILE = 0,
     ^
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:37:0:
/usr/include/io.h:24:5: error: redeclaration of enumerator 'BWG_HTTP'
     BWG_HTTP = 1,
     ^

There aren't that many tools using BigWig yet, so it is doable. Thank you.

Clarification over large buffer sizes sought

Hi. I encountered an issue yesterday when I accidentally set my buffer size to 8MB. It had the effect of quadrupling the cost of loading a bigWig file over HTTP. Once I noticed the problem and set it to the recommended size (1<<17), performance was more reasonable. However, I'm unclear as to how the buffer is being used internally and the impact it had on performance. When you request a portion of a remote file, do you request a range that's the start + buffer size, or is there something more cunning going on in the background?

Thanks and sorry for the request for clarification. I had thought I had totally messed up my Perl XS bindings for a bit of yesterday because they were worryingly slower than the equivalent kent bindings.

MacOS test broken

Hello,

it seems that differences on Mac break tests that otherwise work fine on Linux, I'm guessing due to the use of explicit MD5 checksums:

disorientation:libBigWig dzerbino$ make test
./test/test.py
Traceback (most recent call last):
File "./test/test.py", line 14, in
assert(md5sum == "a15cbf0021f3e80d9ddfd9dbe78057cf")
AssertionError
make: *** [test] Error 1

Cheers,

Daniel

Segmentation fault with sparse data

Hi there @dpryan79

We have encountered a seg fault when trying to create (very) sparse bed files, in this case coverage 0 for a contig.
We are generating bigwig files on a per-contig level and then merging them later (because in some cases we are trying to create bed files for assemblies with ~35k contigs). In this instance the issue was encountered using Rat Rnor 5.0 on chromosome 2, where there was a value of zero for the whole contig. The seg fault was:

Invalid read of size 8
==24547==    at 0x411DC9: constructZoomLevels (bwWrite.c:1008)
==24547==    by 0x412BBD: bwFinalize (bwWrite.c:1228)
==24547==    by 0x40DE66: bwClose (bwRead.c:287)
==24547==    by 0x40AC16: main (bam2bw.c:364)
==24547==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

I traced this to fp->hdr->nLevels not being set to 0 when the region spanned an entire contig and the contig's length exceeded the overflow test ((uint32_t)-1)>>2.
I believe this patch fixes the issue:

--- ../master/libBigWig/bwWrite.c	2017-05-30 15:43:45.284504000 +0100
+++ test/libBigWig/bwWrite.c	2017-05-30 16:41:04.260661000 +0100
@@ -787,7 +787,10 @@
     //In reality, one level is skipped
     meanBinSize *= 4;
     //N.B., we must ALWAYS check that the zoom doesn't overflow a uint32_t!
-    if(((uint32_t)-1)>>2 < meanBinSize) return 0; //No zoom levels!
+    if(((uint32_t)-1)>>2 < meanBinSize){
+      fp->hdr->nLevels = 0;
+      return 0;
+    }//No zoom levels!
     if(meanBinSize*4 > zoom) zoom = multiplier*meanBinSize;
 
     fp->hdr->zoomHdrs = calloc(1, sizeof(bwZoomHdr_t));

Segmentation fault in url_fread

I am getting a segfault after multiple region requests to a remote file. The same file has no issues when accessed locally:

Stacktrace:

(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:152
#1  0x00007fbf78d8167a in url_fread (obuf=obuf@entry=0x4b802f0, obufSize=obufSize@entry=3445, URL=0x4b48a00) at io.c:60
#2  0x00007fbf78d81788 in urlRead (URL=<optimized out>, buf=buf@entry=0x4b802f0, bufSize=bufSize@entry=3445) at io.c:94
#3  0x00007fbf78d83efb in bwRead (data=data@entry=0x4b802f0, sz=3445, nmemb=nmemb@entry=1, fp=fp@entry=0x4b45640) at bwRead.c:29
#4  0x00007fbf78d82b9f in bwGetOverlappingIntervalsCore (fp=fp@entry=0x4b45640, o=o@entry=0x4b810c0, tid=tid@entry=21,
    ostart=ostart@entry=325665, oend=oend@entry=325742) at bwValues.c:422
#5  0x00007fbf78d8358e in bwGetOverlappingIntervals (fp=fp@entry=0x4b45640, chrom=chrom@entry=0x4b49240 "", start=start@entry=325665,
    end=end@entry=325742) at bwValues.c:568
#6  0x00007fbf78d83a0a in bwGetValues (fp=fp@entry=0x4b45640, chrom=chrom@entry=0x4b49240 "", start=start@entry=325665,
    end=end@entry=325742, includeNA=includeNA@entry=1) at bwValues.c:715

The reproduction steps are a bit tricky: I can't share the data I have, but after querying a list of 3 regions in one particular order it causes a segfault. If I reverse the order, or add some more requests in the middle, no segfault occurs.

My code is:

#include "bigWig.h"

int main(int argc, char *argv[]) {
    bigWigFile_t *fp = NULL;
    bwOverlappingIntervals_t *intervals = NULL;
    double *stats = NULL;
    if(argc != 2) {
        fprintf(stderr, "Usage: %s {file.bw|URL://path/file.bw}\n", argv[0]);
        return 1;
    }

    //Initialize enough space to hold 131072 bytes (taken from Bio::DB::Big default value)
    if(bwInit(131072) != 0) {
        fprintf(stderr, "Received an error in bwInit\n");
        return 1;
    }

    //Open the local/remote file
    fp = bwOpen(argv[1], NULL, "r");
    if(!fp) {
        fprintf(stderr, "An error occured while opening %s\n", argv[1]);
        return 1;
    }

    fprintf(stderr, "Fetching regions\n");

    intervals = bwGetValues(fp, "9", 214972, 215034, 1);
    bwDestroyOverlappingIntervals(intervals);

    intervals = bwGetValues(fp, "9", 317038, 317133, 1);
    bwDestroyOverlappingIntervals(intervals);

    intervals = bwGetValues(fp, "9", 325666, 325742, 1);
    bwDestroyOverlappingIntervals(intervals);

    bwClose(fp);
    bwCleanup();
    return 0;
}

Each interval request on its own does not cause a failure, just the 3 one after the other.

The URL->bufPos ends up larger than URL->bufLen, which leads to a nonsense memcpy request:

  [url_fread] memBuf: 4425691136 bufPos: 9257, bufLen: 2249, remaining: 3445
  [url_fread] memcpy start: 4425700393 memcpy end: 18446744073709544608

This happens in the } else if(URL->bufLen < URL->bufPos + remaining) { block of url_fread.

Any ideas why this might be?

Feature request: iterator

Hello,

I have developed a library, WiggleTools, to compute whole-genome statistics from multiple BigWig files. To do this, it needs to be efficient with memory, and therefore uses iterators intensively.

WiggleTools uses the Kent source tree, but this entails quite a few dependencies that I would like to get rid of. I would be very keen to switch to libBigWig.

Would it be possible to create iterator functions within libBigWig?

Typically, an iterator could be created either as a whole genome iterator, or over a region of interest.

Instead of returning all results in a single bwOverlappingIntervals_t struct, it could return a sequence of bwOverlappingIntervals_t structs, each object covering a number of consecutive compressed blocks on disk. FWIW, my code currently looks like this (using Kent functions):

struct fileOffsetSize *blockList, *block, *beforeGap, *afterGap;

// Search for the linked list of blocks overlapping the region of interest
blockList = bbiOverlappingBlocks(file_handle, search_tree, chrom, start, finish, NULL);

for (block = blockList; block; block = afterGap) {
    /* Read contiguous blocks into mergedBuf. */
    fileOffsetSizeFindGap(block, &beforeGap, &afterGap);

    // Little hack to limit the number of blocks read at any time
    struct fileOffsetSize *blockPtr, *prevBlock;
    int blockCounter = 0;
    prevBlock = block;

    // Count max blocks or until you hit a gap on the disk
    for (blockPtr = block; blockPtr != afterGap && blockCounter < MAX_BLOCKS; blockPtr = blockPtr->next) {
        blockCounter++;
        prevBlock = blockPtr;
    }

    // If you stopped before the gap, pretend you hit a gap
    if (blockCounter == MAX_BLOCKS) {
        beforeGap = prevBlock;
        afterGap = blockPtr;
    }

    bits64 mergedSize = beforeGap->offset + beforeGap->size - block->offset;

    if (downloadBlockRun(data, chrom, block, afterGap, mergedSize)) {
        slFreeList(blockList);
        return true;
    }
}

Thanks in advance for considering my request,

Daniel
