
dpryan79 / libBigWig


A C library for handling bigWig files

License: MIT License

Makefile 1.32% C 95.57% Python 1.70% CMake 1.41%
bigbed bigwig bioinformatics

libBigWig's Issues

Infinite loop when creating bw file

Hi there,

Firstly thank you again for this library, it is helping us dramatically speed up our bw file processing.
I have used libBigWig to create a set of tools to manipulate and generate bw files. We have recently run into what seems to be an infinite loop. See cancerit/cgpBigWig#9

I've done some detective work (using the libBigWig master branch) and it seems that the code is getting stuck here: https://github.com/dpryan79/libBigWig/blob/master/bwWrite.c#L928-L934.

Initially I thought this could be due to the fact that we are missing contigs (Y, for example) from the input bed that are present in the .fai file. Some print statements in libBigWig showed me that the code was hitting an infinite loop at the input file line 13 19020094 115108598 4 (as seen in the cgpBigWig issue linked above, cancerit/cgpBigWig#9), so the code isn't even reaching the end of the file, and the missing contigs therefore aren't the cause.

Here is the C code using libBigWig and throwing the error: https://github.com/cancerit/cgpBigWig/blob/develop/c/bg2bw.c. I haven't been able to find anything obvious causing this infinite loop in my code, so I'm wondering whether I've found an edge case in libBigWig, or whether an experienced eye can tell me where I've gone wrong. For the record, I have also tried calling bwAppendIntervals when the last contig retrieved from the input bed matches the current contig while looping through the bed file, and this hasn't solved the issue.
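
For context, a minimal sketch of the libBigWig write pattern in question (bwAddIntervals for the first interval on a contig, bwAppendIntervals for the rest). The contig length below is a made-up placeholder and the single interval is just the failing input line; error handling is abbreviated:

#include "bigWig.h"

int main(void) {
    //Contig list as read from the .fai; the length here is a made-up placeholder
    char *chroms[] = {"13"};
    uint32_t lens[] = {115200000};

    if(bwInit(1<<17) != 0) return 1;
    bigWigFile_t *fp = bwOpen("out.bw", NULL, "w");
    if(!fp) return 1;
    if(bwCreateHdr(fp, 10)) return 1;                 //up to 10 zoom levels
    fp->cl = bwCreateChromList(chroms, lens, 1);
    if(bwWriteHdr(fp)) return 1;

    //First interval on a contig goes through bwAddIntervals()
    char *c[] = {"13"};
    uint32_t starts[] = {19020094}, ends[] = {115108598};
    float vals[] = {4.0f};
    if(bwAddIntervals(fp, c, starts, ends, vals, 1)) return 1;

    //Subsequent intervals on the same contig can use bwAppendIntervals()
    //bwAppendIntervals(fp, nextStarts, nextEnds, nextVals, n);

    bwClose(fp);    //finalizes the file (indices and zoom levels)
    bwCleanup();
    return 0;
}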

@keiranmraine has an interest in this too

Trouble with files on Amazon S3

Hello @dpryan79 ,

It would appear that libBigWig struggles with remote files on S3, e.g., with the following code (compiled to toto):

#include "bigWig.h"

int main(int argc, char **argv) {
    bwOpen(argv[1], NULL, "r");
    return 0;
}

I get:

./toto https://encode-public.s3.amazonaws.com/2017/10/03/ad2c0f17-0824-4647-a749-74276daca7da/ENCFF278CUB.bigWig
Segmentation fault
wget https://encode-public.s3.amazonaws.com/2017/10/03/ad2c0f17-0824-4647-a749-74276daca7da/ENCFF278CUB.bigWig
[...]
./toto ENCFF278CUB.bigWig

Conversely, the Kent library has no issue with these remote bigWig files via CURL.

Would it be possible for libBigWig to handle these remote files on AWS?

Thank you,

@dzerbino

Clarification over large buffer sizes sought

Hi. I encountered an issue yesterday when I accidentally set my buffer size to 8MB. It quadrupled the cost of loading a bigWig file over HTTP. Once I noticed the problem and set it to the recommended size (1<<17), performance was more reasonable. However, I'm unclear how the buffer is being used internally and why it had that impact on performance. When you request a portion of a remote file, do you request a range that's the start + buffer size, or is there something more cunning going on in the background?
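
For what it's worth, the buffer in question is the one handed to bwInit(); a minimal sketch of the two settings being compared here (how the library uses the buffer internally is exactly the question above):

#include "bigWig.h"

int main(void) {
    //Recommended remote-I/O buffer: 1<<17 bytes (128 KiB).
    //The accidental setting was 8MB, which made remote loads far slower.
    if(bwInit(1<<17) != 0) return 1;
    /* ... open and query remote files here ... */
    bwCleanup();
    return 0;
}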

Thanks, and sorry for the request for clarification. For a while yesterday I thought I had totally messed up my Perl XS bindings, because they were worryingly slower than the equivalent Kent bindings.

Segmentation fault with sparse data

Hi there @dpryan79

We have encountered a seg fault when trying to create (very) sparse bigWig files, in this case coverage of 0 for an entire contig.
We are generating bigWig files at a per-contig level and then merging them later (because in some cases we are creating files for assemblies with ~35k contigs). In this instance the issue was encountered using Rat Rnor 5.0 on chromosome 2, where there was a value of zero for the whole contig. The seg fault was:

Invalid read of size 8
==24547==    at 0x411DC9: constructZoomLevels (bwWrite.c:1008)
==24547==    by 0x412BBD: bwFinalize (bwWrite.c:1228)
==24547==    by 0x40DE66: bwClose (bwRead.c:287)
==24547==    by 0x40AC16: main (bam2bw.c:364)
==24547==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

I traced this to fp->hdr->nLevels not being set to 0 when the region spans an entire contig and the contig length exceeds the overflow test ((uint32_t)-1)>>2.
I believe this patch fixes the issue:

--- ../master/libBigWig/bwWrite.c	2017-05-30 15:43:45.284504000 +0100
+++ test/libBigWig/bwWrite.c	2017-05-30 16:41:04.260661000 +0100
@@ -787,7 +787,10 @@
     //In reality, one level is skipped
     meanBinSize *= 4;
     //N.B., we must ALWAYS check that the zoom doesn't overflow a uint32_t!
-    if(((uint32_t)-1)>>2 < meanBinSize) return 0; //No zoom levels!
+    if(((uint32_t)-1)>>2 < meanBinSize){
+      fp->hdr->nLevels = 0;
+      return 0;
+    }//No zoom levels!
     if(meanBinSize*4 > zoom) zoom = multiplier*meanBinSize;
 
     fp->hdr->zoomHdrs = calloc(1, sizeof(bwZoomHdr_t));

Feature request: iterator

Hello,

I have developed a library, WiggleTools, to compute whole-genome statistics from multiple BigWig files. To do this, it needs to be efficient with memory, and therefore uses iterators intensively.

WiggleTools uses the Kent source tree, but this entails quite a few dependencies that I would like to get rid of. I would be very keen to switch to libBigWig.

Would it be possible to create iterator functions within libBigWig?

Typically, an iterator could be created either as a whole genome iterator, or over a region of interest.

Instead of returning all results in a single bwOverlappingIntervals_t struct, it could return a sequence of bwOverlappingIntervals_t structs, each object covering a number of consecutive compressed blocks on disk. FWIW, my code currently looks like this (using Kent functions):

struct fileOffsetSize *blockList, *block, *beforeGap, *afterGap;

// Search for linked list of blocks overlapping region of interest
blockList = bbiOverlappingBlocks(file_handle, search_tree, chrom, start, finish, NULL);

for (block = blockList; block; block = afterGap) {
    /* Read contiguous blocks into mergedBuf. */
    fileOffsetSizeFindGap(block, &beforeGap, &afterGap);

    // Little hack to limit the number of blocks read at any time
    struct fileOffsetSize *blockPtr, *prevBlock;
    int blockCounter = 0;
    prevBlock = block;

    // Count max blocks or until you hit a gap in the disk
    for (blockPtr = block; blockPtr != afterGap && blockCounter < MAX_BLOCKS; blockPtr = blockPtr->next) {
        blockCounter++;
        prevBlock = blockPtr;
    }

    // If you stopped before the gap, pretend you hit a gap
    if (blockCounter == MAX_BLOCKS) {
        beforeGap = prevBlock;
        afterGap = blockPtr;
    }

    bits64 mergedSize = beforeGap->offset + beforeGap->size - block->offset;

    if (downloadBlockRun(data, chrom, block, afterGap, mergedSize)) {
        slFreeList(blockList);
        return true;
    }
}

Thanks in advance for considering my request,

Daniel
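
For reference, later versions of libBigWig expose an iterator of roughly this shape (bwOverlappingIntervalsIterator, also referenced in the "parameter values" issue below). A minimal usage sketch, assuming that API; field names are as I recall them, so treat this as illustrative:

#include "bigWig.h"
#include <inttypes.h>
#include <stdio.h>

//Iterate over a region in chunks of compressed blocks (10 blocks per iteration here)
int printRegion(bigWigFile_t *fp, char *chrom, uint32_t start, uint32_t end) {
    bwOverlapIterator_t *iter = bwOverlappingIntervalsIterator(fp, chrom, start, end, 10);
    if(!iter) return 1;
    while(iter->data) {
        bwOverlappingIntervals_t *ints = iter->intervals;
        for(uint32_t i = 0; i < ints->l; i++) {
            printf("%s\t%"PRIu32"\t%"PRIu32"\t%f\n", chrom, ints->start[i], ints->end[i], ints->value[i]);
        }
        iter = bwIteratorNext(iter);
    }
    bwIteratorDestroy(iter);
    return 0;
}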

Error whilst querying BigWig file: "got an error in bwStatsFromZoom"

Whilst querying for stats from a BigWig file on the Ensembl FTP site, I got an error from libBigWig. I found it first in my Perl bindings, but I've been able to replicate it in C below:

#include "bigWig.h"
#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    bigWigFile_t *fp = NULL;
    double *stats = NULL;
    
    char file[] = "http://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/rnaseq/GRCh38.illumina.sk_muscle.1.bam.bw";
    char chrom[] = "21";
    int start = 33031597;
    int end = 34041570;
    int full = 0;
    int bins = 100;
    
    if(bwInit(1<<17) != 0) {
        fprintf(stderr, "Received an error in bwInit\n");
        return 1;
    }
    
    fp = bwOpen(file, NULL, "r");
    if(!fp) {
        fprintf(stderr, "An error occured while opening %s\n", file);
        return 1;
    }
    if(full)
      stats = bwStatsFromFull(fp, chrom, start, end, bins, mean);
    else
      stats = bwStats(fp, chrom, start, end, bins, mean);

    if(stats)
      free(stats);

    bwClose(fp);
    bwCleanup();
    return 0;
}

This was compiled using cc -g -Wall -O3 -Wsign-compare -o query query.c libBigWig.a -lcurl -lm -lz. When run, the error I currently get back is

got an error in bwStatsFromZoom in the range 33031597-33041696: Operation now in progress

When I switch to requesting full stats the message goes away. The BigWig file in question is a coverage plot from a BAM file and was created using the Kent utils. I've also checked with the Kent binaries, and they are able to report statistics for the requested region.

Any help is appreciated.

Question: is it possible to merge bigwig files?

Hey, I'm trying to write a CLI tool to merge two bigWig files. Is that something that is possible with this library? If so, would you be able to provide a basic example?
Otherwise, do you know where I can find a formal definition of the bigWig format?

parameter values

Hi Devon, can you give guidance on reasonable values / tradeoffs for a few parameters?

  • maxZooms to bwCreateHdr. I assume 7 is a reasonable default?
  • blocksPerIteration to bwOverlappingIntervalsIterator. Higher == more memory; any other tradeoffs?

Then some other questions:

  • How does bwStatsFromFull differ from bwStats?
  • Will it be horribly inefficient to call e.g. bwAppendInterval* with a single interval?

Thanks again for the library.

Feature request: BigBed reader

Hello,

Tied in to ticket #11, would it be possible for libBigWig to read BigBeds?

Hopefully, it should not be too painful: the search tree / compression block structure is identical; only the content of the compressed blocks is different.

Thanks in advance for considering my request,

Daniel

Search improvement?

Hello @dpryan79, I was wondering if this could potentially speed up the library.

During the walkRTreeNodes process, the library seems to iterate through all children, in either overlapsLeaf or overlapsNonLeaf, to figure out whether a block overlaps with the current region.

Instead, could this use a binary search to find one overlapping block and then search the neighborhood for all overlaps? I'm trying to think of a scenario where this would fail, but let me know if you have any thoughts on this.
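
A rough sketch of the idea, assuming a node's children are sorted by start coordinate and non-overlapping (the real nodes in bwValues.c also carry chromosome indices, so this is only the single-chromosome case, and the names are illustrative):

#include <stdint.h>

//Hypothetical sketch: childStart[]/childEnd[] are a node's children sorted by
//start (and, if non-overlapping, also by end). Binary search for the first
//child that could overlap [qStart, qEnd), then scan forward over the neighborhood.
static void visitOverlaps(const uint32_t *childStart, const uint32_t *childEnd,
                          int64_t nChildren, uint32_t qStart, uint32_t qEnd) {
    int64_t lo = 0, hi = nChildren;
    while(lo < hi) {                      //lower bound: first child with end > qStart
        int64_t mid = lo + (hi - lo) / 2;
        if(childEnd[mid] <= qStart) lo = mid + 1;
        else hi = mid;
    }
    for(int64_t i = lo; i < nChildren && childStart[i] < qEnd; i++) {
        //child i overlaps the query region: descend into it / read its block here
    }
}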

Rename io.h to BigWig_io.h

Hi,
I have trouble compiling third-party code (hdf5-1.8.18) on a system where I previously installed libBigWig. It is unfortunate that libBigWig introduces an io.h header. Please rename it. (Placing it in a subdirectory could help, but I assume that in some cases users/compilers will still pick this file rather than the intended one.)

x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform -I../../src  -D_GNU_SOURCE -D_POSIX_C_SOURCE=200112L   -DNDEBUG -UH5_DEBUG_API -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/src -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/test -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/lib  -std=c99 -pedantic -Wall -Wextra -Wundef -Wshadow -Wpointer-arith -Wbad-function-cast -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wnested-externs -Winline -Wfloat-equal -Wmissing-format-attribute -Wmissing-noreturn -Wpacked -Wdisabled-optimization -Wformat=2 -Wunreachable-code -Wendif-labels -Wdeclaration-after-statement -Wold-style-definition -Winvalid-pch -Wvariadic-macros -Winit-self -Wmissing-include-dirs -Wswitch-default -Wswitch-enum -Wunused-macros -Wunsafe-loop-optimizations -Wc++-compat -Wstrict-overflow -Wlogical-op -Wlarger-than=2048 -Wvla -Wsync-nand -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wstrict-overflow=5 -Wjump-misses-init -Wunsuffixed-float-constants -Wdouble-promotion -Wsuggest-attribute=const -Wtrampolines -Wstack-usage=8192 -Wvector-operation-performance -Wsuggest-attribute=pure -Wsuggest-attribute=noreturn -Wsuggest-attribute=format -Wdate-time -Wopenmp-simd -O3 -O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize -c -o zip_perf.o /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/zip_perf.c
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:37:0:
/usr/include/io.h:22:6: error: nested redefinition of 'enum bigWigFile_type_enum'
 enum bigWigFile_type_enum {
      ^
/usr/include/io.h:22:6: error: redeclaration of 'enum bigWigFile_type_enum'
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/src/H5private.h:149:0,
                 from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:26:
/usr/include/io.h:22:6: note: originally defined here
 enum bigWigFile_type_enum {
      ^
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:37:0:
/usr/include/io.h:23:5: error: redeclaration of enumerator 'BWG_FILE'
     BWG_FILE = 0,
     ^
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/src/H5private.h:149:0,
                 from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:26:
/usr/include/io.h:23:5: note: previous definition of 'BWG_FILE' was here
     BWG_FILE = 0,
     ^
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:37:0:
/usr/include/io.h:24:5: error: redeclaration of enumerator 'BWG_HTTP'
     BWG_HTTP = 1,
     ^

There aren't that many tools using libBigWig yet, so it is doable. Thank you.

MacOS test broken

Hello,

it seems that differences on macOS break tests that otherwise work fine on Linux, I'm guessing due to the use of explicit MD5 checksums:

disorientation:libBigWig dzerbino$ make test
./test/test.py
Traceback (most recent call last):
File "./test/test.py", line 14, in
assert(md5sum == "a15cbf0021f3e80d9ddfd9dbe78057cf")
AssertionError
make: *** [test] Error 1

Cheers,

Daniel

Problem with compilation due to curl.h

Hi, I'm stuck with this install problem. I would appreciate any advice.

I'm on Debian, working in a conda (4.8.1) env.

Before running make, I did:

conda install -c anaconda curl
conda install -c anaconda libcurl

make install

/home/jean-philippe.villemin/bin/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -c -o io.o io.c
io.c:2:10: fatal error: curl/curl.h: No such file or directory
#include <curl/curl.h>
^~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:33: io.o] Error 1

locate curl.h

/home/jean-philippe.villemin/bin/anaconda3/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3/pkgs/libcurl-7.65.2-h20c2e04_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/majiq_env/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/outrigger-env/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/python2/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/r_env/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/curl-7.52.1-0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/curl-7.55.1-h78862de_4/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/curl-7.55.1-hcb0b314_2/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/libcurl-7.61.0-h1ad7b7a_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/libcurl-7.64.0-h01ee5af_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/libcurl-7.65.3-h20c2e04_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/cmake-3.8.1/Utilities/cm_curl.h
/home/jean-philippe.villemin/bin/cmake-3.8.1/Utilities/cmcurl/include/curl/curl.h
/home/jean-philippe.villemin/bin/packages_R-3.3.1/include/curl/curl.h
/usr/include/curl/curl.h

echo $C_INCLUDE_PATH

/home/jean-philippe.villemin/bin/anaconda3/include/curl:/home/jean-philippe.villemin/bin/gsl-2.3/bin/include:/home/jean-philippe.villemin/bin/libBigWig/bin/include:/home/jean-philippe.villemin/bin/htslib/bin/include

echo $LD_LIBRARY_PATH

/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64:/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64/server:/home/jean-philippe.villemin/bin/anaconda3/lib/libreadline.so.6:/home/jean-philippe.villemin/bin/anaconda3/lib/libpng16.so.16:/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64:/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64/server:/home/jean-philippe.villemin/bin/anaconda3/lib/libreadline.so.6:/home/jean-philippe.villemin/bin/anaconda3/lib/libpng16.so.16::/home/jean-philippe.villemin/bin/libBigWig/bin/lib:/home/jean-philippe.villemin/bin/gsl-2.3/bin/lib:/home/jean-philippe.villemin/bin/htslib/bin/lib:/home/jean-philippe.villemin/bin/libBigWig/bin/lib:/home/jean-philippe.villemin/bin/gsl-2.3/bin/lib:/home/jean-philippe.villemin/bin/htslib/bin/lib

Check validity of added entries

There's currently no sanity checking performed when writing a new file. So, someone could specify entries out of order and not get an error message immediately!

Makefile: install target does not create target dirs

The Makefile's install target assumes that $(prefix)/lib and $(prefix)/include already exist. This is not necessarily the case when installing to a non-standard prefix.

It would be better if the target directories were created before installing files to them, e.g. by applying this patch to the Makefile:

From 43f628598dc3478a7a823c46d1d7e5985611045c Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus <[email protected]>
Date: Thu, 25 Feb 2016 10:49:28 +0100
Subject: [PATCH] Create target directories before installing to them.

---
 Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Makefile b/Makefile
index e1faaf4..731bbf8 100644
--- a/Makefile
+++ b/Makefile
@@ -68,6 +68,7 @@ clean:
    rm -f *.o libBigWig.a libBigWig.so *.pico test/testLocal test/testRemote test/testWrite test/exampleWrite example_output.bw

 install: libBigWig.a libBigWig.so
+   install -d $(prefix)/lib $(prefix)/include
    install libBigWig.a $(prefix)/lib
    install libBigWig.so $(prefix)/lib
    install *.h $(prefix)/include
-- 
2.1.0

What do you think?

Clarification sought on coordinate schemas and positions

Hi

I've been looking at building a library for accessing Big files from Perl using your library. It's gone pretty well, to be honest, but I have some questions about your interpretation of coordinates that aren't clear from the documentation. I've pasted in an example of the docs from one of your functions below, with some bits removed:

/*!
 * @brief Return bigWig entries overlapping an interval.
 * @param start The start position of the interval. This is 0-based half open, so 0 is the first base.
 * @param end The end position of the interval. Again, this is 0-based half open, so 100 will include the 100th base...which is at position 99.
 */
bwOverlappingIntervals_t *bwGetOverlappingIntervals(bigWigFile_t *fp, char *chrom, uint32_t start, uint32_t end);

I think it's your use of "...which is at position 99" that is confusing me. 0-based, half open to me would suggest that if you use 100 as your end value you will get the 100th base, and its value should always be 100. Unless you're referring to the location of base 100's values in the arrays passed back by the routine in the bwOverlappingIntervals_t struct.

Also, I'm aware that when parsing BigWigs their use of coordinates can differ based on their source data: those derived from bedGraphs retain their 0-based, half-open system, whereas fixed and variable step use 1-start, fully-closed. I had a poke in the code and can see some mention of this, but I'm unsure whether you handle this internally, so that we only need to work in 0-based, half-open coordinates.
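
To make sure I'm reading it right, here is a small sketch of my understanding of the documented convention (the chromosome name is a placeholder): requesting start=0, end=100 should cover the first 100 bases, i.e. positions 0 through 99.

//Hypothetical query illustrating 0-based, half-open coordinates:
//start=0, end=100 covers positions 0..99 (the first 100 bases)
bwOverlappingIntervals_t *o = bwGetOverlappingIntervals(fp, "chr1", 0, 100);
if(o) {
    //each returned interval i spans [o->start[i], o->end[i]), also half-open
    bwDestroyOverlappingIntervals(o);
}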

Thanks and sorry for the badgering.

URL with Temporary Redirect

I am trying to fetch some bigWig files from the ENCODE project; it seems they host their files on Amazon S3, and using testRemote gives me an error. Wondering if there is a solution? Thanks.

$ ./testRemote https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
[bwHdrRead] There was an error while reading in the header!
An error occured while opening https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
$ curl -I https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
HTTP/1.1 307 Temporary Redirect
Server: nginx/1.10.1
Date: Fri, 17 Mar 2017 21:08:37 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 1196
Connection: keep-alive
X-Request-URL: https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
X-Stats: es_count=1&es_time=7226&queue_begin=1489784917370001&queue_time=830&rss_begin=492408832&rss_change=0&rss_end=492408832&wsgi_begin=1489784917370831&wsgi_end=1489784917389025&wsgi_time=18194
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, HEAD
Access-Control-Allow-Headers: Accept, Origin, Range, X-Requested-With
Access-Control-Expose-Headers: Content-Length, Content-Range, Content-Type
Location: https://download.encodeproject.org/http://encode-files.s3.amazonaws.com/2017/03/05/d7be9e16-e742-4554-9e9b-347834665817/ENCFF188HKC.bigWig?Signature=Tb43S%2BXE%2BOT0jVVRYn1E1amZZss%3D&Expires=1489914517&AWSAccessKeyId=ASIAIBGS2LKKZLYIOLVA&response-content-disposition=attachment%3B%20filename%3DENCFF188HKC.bigWig&x-amz-security-token=FQoDYXdzEB4aDKo3b1EtKQK0xzlrqCK3A8jMrctRMooXvbFhPZaBtN46iqYhdsIuZVnmCBYphXlMoRFfa%2B7dyVq1ICoFY7d6wrVj2sKHs4VfVMOYlRJOHonPlRj9BvF5DYR8EHZaItBq4ouDlkzOYcrCNbo36uR1IP%2BsDlX8vwqn7hw6ri/wtQYjReE35P8wyG7D3cN4cHZFm2bAmd4xfS6o7vsgh21LfSHjhKIg%2BoQqPoxZwdNB64qlUBrKYo%2BnhDQdKDceMc/0GB9NJqy1U1n0kaXitFHSwg88LzgXR/CY2Eyk/tQVcScceLERWAupB9nLyVpsVH1uSOumFhwcSf1FXEyqFCKWf4jgUqBHJ7T7kfUHmKcLP8VbJgQs0/TB7q8OY0fn7lzugK4kTXkF3GoGdI8aUwNBo2VuA7Z1S0ldUntTMHeh%2Bl9x9nLETbzmVSOB/qIlP%2BZimKVJTMszPoj57cqhxAUd/%2B6xfdiZbjjoh6NSY5rI%2BfMWfTqZS5my6aCIVXTjkg26Bfm4HGl04bNXhJp7uTiIeorwMex0yfSxFJnybunDaLv9fuFmqsOpY7tOXNzmbina6Tp38euVaLnHD/9rAKNpTu6C7pvyUWQohJ6xxgU%3D
Strict-Transport-Security: max-age=15768000
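
For reference, a 307 like the one above is only followed if the underlying libcurl handle is told to follow redirects. A minimal sketch of the relevant libcurl options (whether and where libBigWig's io.c sets these is an assumption on my part):

#include <curl/curl.h>

//Sketch only: make a curl easy handle follow HTTP redirects (e.g. the 307 above)
//and cap the number of hops. This shows the libcurl calls involved, not libBigWig's code.
static void enableRedirects(CURL *curl) {
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_MAXREDIRS, 10L);
}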

Segmentation fault in url_fread

I am getting a segfault after multiple region requests to a remote file. The same file has no issues when accessed locally:

Stacktrace:

(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:152
#1  0x00007fbf78d8167a in url_fread (obuf=obuf@entry=0x4b802f0, obufSize=obufSize@entry=3445, URL=0x4b48a00) at io.c:60
#2  0x00007fbf78d81788 in urlRead (URL=<optimized out>, buf=buf@entry=0x4b802f0, bufSize=bufSize@entry=3445) at io.c:94
#3  0x00007fbf78d83efb in bwRead (data=data@entry=0x4b802f0, sz=3445, nmemb=nmemb@entry=1, fp=fp@entry=0x4b45640) at bwRead.c:29
#4  0x00007fbf78d82b9f in bwGetOverlappingIntervalsCore (fp=fp@entry=0x4b45640, o=o@entry=0x4b810c0, tid=tid@entry=21,
    ostart=ostart@entry=325665, oend=oend@entry=325742) at bwValues.c:422
#5  0x00007fbf78d8358e in bwGetOverlappingIntervals (fp=fp@entry=0x4b45640, chrom=chrom@entry=0x4b49240 "", start=start@entry=325665,
    end=end@entry=325742) at bwValues.c:568
#6  0x00007fbf78d83a0a in bwGetValues (fp=fp@entry=0x4b45640, chrom=chrom@entry=0x4b49240 "", start=start@entry=325665,
    end=end@entry=325742, includeNA=includeNA@entry=1) at bwValues.c:715

The reproduction steps are a bit tricky - I can't share the data I have, but querying a list of 3 regions in one particular order causes a segfault. If I reverse the order, or add some more requests in the middle, no segfault occurs.

My code is:

#include "bigWig.h"

int main(int argc, char *argv[]) {
    bigWigFile_t *fp = NULL;
    bwOverlappingIntervals_t *intervals = NULL;
    double *stats = NULL;
    if(argc != 2) {
        fprintf(stderr, "Usage: %s {file.bw|URL://path/file.bw}\n", argv[0]);
        return 1;
    }

    //Initialize enough space to hold 131072 bytes (taken from Bio::DB::Big default value)
    if(bwInit(131072) != 0) {
        fprintf(stderr, "Received an error in bwInit\n");
        return 1;
    }

    //Open the local/remote file
    fp = bwOpen(argv[1], NULL, "r");
    if(!fp) {
        fprintf(stderr, "An error occured while opening %s\n", argv[1]);
        return 1;
    }

    fprintf(stderr, "Fetching regions\n");

    intervals = bwGetValues(fp, "9", 214972, 215034, 1);
    bwDestroyOverlappingIntervals(intervals);

    intervals = bwGetValues(fp, "9", 317038, 317133, 1);
    bwDestroyOverlappingIntervals(intervals);

    intervals = bwGetValues(fp, "9", 325666, 325742, 1);
    bwDestroyOverlappingIntervals(intervals);

    bwClose(fp);
    bwCleanup();
    return 0;
}

Each interval request on its own does not cause a failure, just the three one after the other.

URL->bufPos ends up larger than URL->bufLen, which leads to a nonsense memcpy request:

  [url_fread] memBuf: 4425691136 bufPos: 9257, bufLen: 2249, remaining: 3445
  [url_fread] memcpy start: 4425700393 memcpy end: 18446744073709544608

in the } else if(URL->bufLen < URL->bufPos + remaining) { block of url_fread.

Any ideas why this might be?

usr/bin/ld: cannot find -lz when installing libBigWig

I am trying to install libBigWig, as it is a dependency of WiggleTools. libBigWig requires zlib, bzip2 and libcurl, which I have already installed. When I try to install libBigWig with the following commands:

git clone https://github.com/dpryan79/libBigWig.git
cd libBigWig
make install

I get the following:

/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -c -o io.o io.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -c -o bwValues.o bwValues.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -c -o bwRead.o bwRead.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -c -o bwStats.o bwStats.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -c -o bwWrite.o bwWrite.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-ar -rcs libBigWig.a io.o bwValues.o bwRead.o bwStats.o bwWrite.o
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-ranlib libBigWig.a
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -fpic -c -o io.pico io.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -fpic -c -o bwValues.pico bwValues.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -fpic -c -o bwRead.pico bwRead.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -fpic -c -o bwStats.pico bwStats.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL  -fpic -c -o bwWrite.pico bwWrite.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -shared  -o libBigWig.so io.pico bwValues.pico bwRead.pico bwStats.pico bwWrite.pico  -lm -lz
/home/dp456/miniconda3/envs/py37/bin/../lib/gcc/x86_64-conda-linux-gnu/7.5.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lz
collect2: error: ld returned 1 exit status
make: *** [Makefile:60: libBigWig.so] Error 1

Now, I am still at the beginning of my learning curve in informatics, but as far as I can understand, it seems to be an issue with the ld linker not finding zlib. This thread suggested installing:

sudo apt-get install zlib1g-dev
sudo apt-get install libz-dev
sudo apt-get install lib32z1-dev
sudo apt-get install zlib*

but the error still persists after trying all variations. What else can I do? I also tried to install WiggleTools with conda, but that doesn't work either, because there are several conflicts with other packages.

This is the OS/version of the server that I'm using:

Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-197-generic x86_64)

Support for extraIndex fields in BigBed files

Hi there

We've started to use BigBed and libBigWig for a lot of our flat-file reading, and have started to use the extraIndex feature (see the Kent source for bigBedNamedItems). Currently we have to shell back out to run bigBedNamedItems to extract the rows of the BigBed file we are interested in. Is there any possibility of supporting the extra indexes? From https://github.com/ucscGenomeBrowser/kent/blob/master/src/lib/bigBed.c#L635 it looks like they're accessible from the file, but they also require some AutoSQL parsing to process.

Anyway, a general idea of how plausible support for these would be is really appreciated, even if the answer is "no way".

Thanks

Slow loading a remote BigWig file

Sorry, I seem to be finding a lot of these today. Anyway, this is the same stub program as before, except I've switched to using a UCSC example bigWig file hosted at genome.ucsc.edu. The following code took over 3 minutes to return:

#include "bigWig.h"
#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    bigWigFile_t *fp = NULL;
    double *stats = NULL;
    
    char file[] = "http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw";
    char chrom[] = "chr21";
    int start = 33031597;
    int end = 34041570;
    int full = 1;
    int bins = 100;
    
    int buffer = 1<<17;
    
    if(bwInit(buffer) != 0) {
        fprintf(stderr, "Received an error in bwInit\n");
        return 1;
    }
    
    fp = bwOpen(file, NULL, "r");
    if(!fp) {
        fprintf(stderr, "An error occured while opening %s\n", file);
        return 1;
    }
    if(full) 
      stats = bwStatsFromFull(fp, chrom, start, end, bins, mean);
    else 
      stats = bwStats(fp, chrom, start, end, bins, mean);

    if(stats)
      free(stats);

    bwClose(fp);
    bwCleanup();
    return 0;
}

My Perl code had similar issues, and the following Python code behaved the same way:

#!/usr/bin/env python

import pyBigWig
bw = pyBigWig.open("http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw")
bw.stats('chr21', 33031597, 34041570, type="mean", nBins=100, exact=True)

Sampling the C process on OSX showed that 100% of the CPU time is spent in __select from the system kernel, and the next level up was Curl_poll, so I think it's spending its time transferring. I did some info dumps on the BigWig and this is what I got back:

$ bigWigInfo http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw
version: 1
isCompressed: no
isSwapped: 0
primaryDataSize: 56,335,300
primaryIndexSize: 227,048
zoomLevels: 7
chromCount: 1
basesCovered: 35,926,161
mean: 40.862151
min: 0.000000
max: 100.000000
std: 22.968515

The only bit I think is odd here is that it's an uncompressed bigWig file. I also tried mucking around with the buffer size and set it to 8MB. That brought the total time spent down to 15 seconds, so I'm guessing there's a problem/limitation with uncompressed BigWigs and streaming them across the wire.

high memory usage during indexing

The memory use of libBigWig gets quite high during indexing.
Is there any way to reduce this? For example, by flushing each chromosome to disk after the index for that tid is created in constructZoomLevels?

Optionally check new entries for consistency/sanity

I do this in pyBigWig, but I'm told that some programs using this library do not check to ensure that intervals are entered in a sane order. The various add/append functions should gain a checkConsistency parameter. This will result in a new minor version, due to the change in API. I should also start adding .1 or whatever to the .so file.
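
A minimal sketch of what such a check might do for a batch of intervals before they are added (the helper name and the opt-in checkConsistency flag are illustrative, not the final API):

#include <stdint.h>

//Illustrative only: verify that a batch of intervals is non-empty, sorted and
//non-overlapping before it is handed to the add/append functions.
static int intervalsAreSane(const uint32_t *start, const uint32_t *end, uint32_t n) {
    uint32_t i;
    for(i = 0; i < n; i++) {
        if(end[i] <= start[i]) return 0;          //empty or inverted interval
        if(i && start[i] < end[i - 1]) return 0;  //out of order or overlapping
    }
    return 1;
}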

Test failures on other architectures

Hello there,

In Debian we run some of the tests you've included in test/ as an autopkgtest for the libbigwig package.
This is how we run the tests in Debian:

echo "1c52065211fdc44eea45751a9cbfffe0 test/Local.bw" >> checksums
echo "8e116bd114ffd2eb625011d451329c03 test/Write.bw" >> checksums
echo "ef104f198c6ce8310acc149d0377fc16 test/example_output.bw" >> checksums

LIB_STATIC=`find /usr/lib -name libBigWig.a`

#Compile using installed libbigwig
echo 'Compiling ...'
gcc -g -Wall   test/testLocal.c ${LIB_STATIC} -lBigWig -lcurl -lm -lz -o testlocal
gcc -g -Wall   test/testWrite.c ${LIB_STATIC} -lBigWig -lm -lz -lcurl -o testWrite
gcc -g -Wall   test/exampleWrite.c ${LIB_STATIC} -lBigWig -lcurl -lm -lz -o examplewrite

echo '------------------------------'

echo -e "Test 1"
./testlocal test/test.bw > test/Local.bw

echo -e "Test 2"
./testWrite test/test.bw test/Write.bw

echo -e "Test 3"
./examplewrite

md5sum --check checksums
echo -e "PASS"

Currently the tests are failing on the i386 and s390 architectures.

The errors that we are getting are as follows:

i386:

Compiling ...
------------------------------
Test 1
Test 2
Test 3
test/Local.bw: FAILED
test/Write.bw: FAILED
test/example_output.bw: OK
md5sum: WARNING: 2 computed checksums did NOT match

The output files generated have different checksums. Is this expected?

s390 (big endian):

Compiling ...
------------------------------
Test 1
testlocal: test/testLocal.c:107: main: Assertion `bwIsBigWig(argv[1], NULL) == 1' failed.
/tmp/autopkgtest-lxc.y_h72gqc/downtmp/build.e6I/src/debian/tests/run-unit-test: line 32:  1996 Aborted                 ./testlocal test/test.bw > test/Local.bw

It would be helpful if you could help in debugging these failures :)

Thanks
