dpryan79 / libBigWig
A C library for handling bigWig files
License: MIT License
Hi there,
Firstly thank you again for this library, it is helping us dramatically speed up our bw file processing.
I have used libBigWig to create a set of tools to manipulate and generate bw files. We have recently run into what seems to be an infinite loop. See cancerit/cgpBigWig#9
I've done some detective work (using the libBigWig master branch) and it seems that the code is getting stuck here: https://github.com/dpryan79/libBigWig/blob/master/bwWrite.c#L928-L934 .
Initially I thought this could be due to the fact that we are missing contigs (Y, for example) from the input bed that are present in the .fai file. However, some print statements in libBigWig showed me that the code was hitting an infinite loop at input file line 13 19020094 115108598 4 (as seen in the cgpBigWig issue linked above, cancerit/cgpBigWig#9), so the code isn't even reaching the end of the file and therefore never gets as far as the missing contigs.
Here is the C code utilising libBigWig and throwing the error: https://github.com/cancerit/cgpBigWig/blob/develop/c/bg2bw.c . I haven't been able to find anything obvious causing this infinite loop in my code, so I'm wondering if I've found an edge case in libBigWig, or perhaps an experienced eye could tell me where I've gone wrong. For the record, I have also tried using bwAppendIntervals where the last contig retrieved from the input bed matches the current contig when looping through the bed file, and this hasn't solved the issue.
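For reference, the decision we make per bed line between bwAddIntervals and bwAppendIntervals boils down to a small predicate. Here is a self-contained sketch of the rule as I understand it; the helper name can_append and the exact append precondition are my assumptions, not documented libBigWig behaviour:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the per-bed-line decision: appending (bwAppendIntervals) is
 * only attempted when we stay on the same contig and the new interval
 * starts at or after the previous interval's end; otherwise we fall back
 * to bwAddIntervals, which re-seeds the writer's position. The precise
 * precondition here is my assumption, not documented libBigWig API. */
static int can_append(const char *last_chrom, uint32_t last_end,
                      const char *chrom, uint32_t start) {
    if (last_chrom == NULL) return 0;             /* first interval ever */
    if (strcmp(last_chrom, chrom) != 0) return 0; /* contig changed */
    return start >= last_end;                     /* sorted, non-overlapping */
}
```

If the real precondition is stricter than this (e.g. the appended start must exactly follow the previous end), a mismatch here is exactly the sort of thing that could leave the writer spinning.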
@keiranmraine has an interest in this too
Hello @dpryan79 ,
It would appear that libBigWig struggles with remote files on S3, e.g. with the following code (compiled to toto):
#include "bigWig.h"
int main(int argc, char **argv) {
bwOpen(argv[1], NULL, "r");
return 0;
}
I get:
./toto https://encode-public.s3.amazonaws.com/2017/10/03/ad2c0f17-0824-4647-a749-74276daca7da/ENCFF278CUB.bigWig
Segmentation fault
wget https://encode-public.s3.amazonaws.com/2017/10/03/ad2c0f17-0824-4647-a749-74276daca7da/ENCFF278CUB.bigWig
[...]
./toto ENCFF278CUB.bigWig
Conversely, the Kent library has no issue with these remote bigWig files via CURL.
Would it be possible for libBigWig to handle these remote files on AWS?
Thank you,
Hi. I encountered an issue yesterday when I accidentally set my buffer size to 8MB. It had the effect of quadrupling the cost of loading a bigwig file over HTTP. Once I noticed the problem and set it to the recommended size (1<<17), performance was more reasonable. However, I'm unclear on how the buffer is used internally and the impact it had on performance. When you request a portion of a remote file, do you request a range that's the start+buffer size, or is there something more cunning going on in the background?
Thanks, and sorry for the request for clarification. For a bit of yesterday I thought I had totally messed up my Perl XS bindings, because they were worryingly slower than the equivalent kent bindings.
Hi,
Is it possible to add a cache for remote bigwig and bigbed files, for faster access next time?
thanks.
Hi there @dpryan79
We have encountered a seg fault when trying to create (very) sparse bed files; in this case, coverage 0 for an entire contig.
We are generating bigwig files on a per-contig level and then merging them later (because in some cases we are trying to create bed files for assemblies with ~35k contigs). In this instance the issue was encountered using Rat Rnor 5.0 on chromosome 2, where there was a value of zero for the whole contig. The seg fault was:
Invalid read of size 8
==24547== at 0x411DC9: constructZoomLevels (bwWrite.c:1008)
==24547== by 0x412BBD: bwFinalize (bwWrite.c:1228)
==24547== by 0x40DE66: bwClose (bwRead.c:287)
==24547== by 0x40AC16: main (bam2bw.c:364)
==24547== Address 0x0 is not stack'd, malloc'd or (recently) free'd
I traced this to fp->hdr->nLevels not being set to 0 when the size of the region was an entire contig and the length of the contig exceeded the overflow test (((uint32_t)-1)>>2).
I believe this patch fixes the issue:
--- ../master/libBigWig/bwWrite.c 2017-05-30 15:43:45.284504000 +0100
+++ test/libBigWig/bwWrite.c 2017-05-30 16:41:04.260661000 +0100
@@ -787,7 +787,10 @@
//In reality, one level is skipped
meanBinSize *= 4;
//N.B., we must ALWAYS check that the zoom doesn't overflow a uint32_t!
- if(((uint32_t)-1)>>2 < meanBinSize) return 0; //No zoom levels!
+ if(((uint32_t)-1)>>2 < meanBinSize){
+ fp->hdr->nLevels = 0;
+ return 0;
+ }//No zoom levels!
if(meanBinSize*4 > zoom) zoom = multiplier*meanBinSize;
fp->hdr->zoomHdrs = calloc(1, sizeof(bwZoomHdr_t));
Hello,
I have developed a library, WiggleTools, to compute whole genome statistics from multiple BigWig files. To do this, it needs to be memory efficient, and it therefore uses iterators intensively.
WiggleTools uses the Kent source tree, but this entails quite a few dependencies that I would like to get rid of. I would be very keen to switch to libBigWig.
Would it be possible to create iterator functions within libBigWig?
Typically, an iterator could be created either as a whole genome iterator, or over a region of interest.
Instead of returning all results in a single bwOverlappingIntervals_t struct, it could return a sequence of bwOverlappingIntervals_t structs, each object covering a number of consecutive compressed blocks on disk. FWIW, my code currently looks like this (using Kent functions):
struct fileOffsetSize *blockList, *block, *beforeGap, *afterGap;
// Search for linked list of blocks overlapping region of interest
blockList = bbiOverlappingBlocks(file_handle, search_tree, chrom, start, finish, NULL);
for (block = blockList; block; block=afterGap) {
/* Read contiguous blocks into mergedBuf. */
fileOffsetSizeFindGap(block, &beforeGap, &afterGap);
// Little hack to limit the number of blocks read at any time
struct fileOffsetSize * blockPtr, * prevBlock;
int blockCounter = 0;
prevBlock = block;
// Count max blocks or until you hit a gap in the disk
for (blockPtr = block; blockPtr != afterGap && blockCounter < MAX_BLOCKS; blockPtr = blockPtr->next) {
blockCounter++;
prevBlock = blockPtr;
}
// If you stopped before the gap, pretend you hit a gap
if (blockCounter == MAX_BLOCKS) {
beforeGap = prevBlock;
afterGap = blockPtr;
}
bits64 mergedSize = beforeGap->offset + beforeGap->size - block->offset;
if (downloadBlockRun(data, chrom, block, afterGap, mergedSize)) {
slFreeList(blockList);
return true;
}
}
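As a stopgap on the caller's side, memory can already be bounded by chunking the query region and issuing one bwGetOverlappingIntervals() call per window, freeing each result before requesting the next. The windowing arithmetic is trivial; the helpers below are my own sketch, not proposed API, and note that an interval spanning a window boundary will be returned for both windows and needs deduplication:

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-width windows needed to tile [start, end). Each window
 * would be passed to bwGetOverlappingIntervals() in turn, and the returned
 * bwOverlappingIntervals_t freed before the next request, bounding peak
 * memory by roughly one window's worth of intervals. */
static uint32_t n_windows(uint32_t start, uint32_t end, uint32_t width) {
    return (end - start + width - 1) / width;
}

/* Bounds of the i-th window, clamped to the region end. */
static void window_bounds(uint32_t start, uint32_t end, uint32_t width,
                          uint32_t i, uint32_t *wstart, uint32_t *wend) {
    *wstart = start + i * width;
    *wend = (*wstart + width < end) ? *wstart + width : end;
}
```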
Thanks in advance for considering my request,
Daniel
Whilst querying for stats from a BigWig file on the Ensembl FTP site, I got an error from libBigWig. I found it first in my Perl bindings, but I've been able to replicate it in the C below:
#include "bigWig.h"
#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
bigWigFile_t *fp = NULL;
double *stats = NULL;
char file[] = "http://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/rnaseq/GRCh38.illumina.sk_muscle.1.bam.bw";
char chrom[] = "21";
int start = 33031597;
int end = 34041570;
int full = 0;
int bins = 100;
if(bwInit(1<<17) != 0) {
fprintf(stderr, "Received an error in bwInit\n");
return 1;
}
fp = bwOpen(file, NULL, "r");
if(!fp) {
fprintf(stderr, "An error occured while opening %s\n", file);
return 1;
}
if(full)
stats = bwStatsFromFull(fp, chrom, start, end, bins, mean);
else
stats = bwStats(fp, chrom, start, end, bins, mean);
if(stats)
free(stats);
bwClose(fp);
bwCleanup();
return 0;
}
This was compiled using cc -g -Wall -O3 -Wsign-compare -o query query.c libBigWig.a -lcurl -lm -lz. When run, the error I currently get back is:
got an error in bwStatsFromZoom in the range 33031597-33041696: Operation now in progress
When I switch to requesting full stats the message goes away. The BigWig file in question is a coverage plot from a BAM file and was created using the kent utils. I've also checked with the kent binaries and it's able to report statistics for the requested region.
Any help is appreciated.
Hey, I'm trying to write a CLI tool to merge two bigWig files. Is that something that is possible with this library? If so, would you be able to provide a basic example?
Otherwise, do you know where I can find a formal definition of the .bigWig format?
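There is no merge function in the library itself as far as I can tell, but a merge tool could be built from its read and write halves: read intervals from each input (e.g. with bwGetOverlappingIntervals), combine them, and write the result back out (e.g. with bwAddIntervals). The combining step is plain interval arithmetic; here is a deliberately naive per-base sketch, where the interval type and accumulate helper are mine, not library API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t start, end; float value; } interval;

/* Add one file's intervals into a per-base accumulator covering [0, len).
 * Coordinates are 0-based, half open, matching libBigWig's convention.
 * Calling this once per input file sums the coverage; the accumulator
 * would then be re-compressed into intervals for writing. */
static void accumulate(float *acc, uint32_t len, const interval *iv, int n) {
    for (int i = 0; i < n; i++)
        for (uint32_t p = iv[i].start; p < iv[i].end && p < len; p++)
            acc[p] += iv[i].value;
}
```

A production tool would sweep interval boundaries instead of expanding per base, but the per-base version is easy to check.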
hi Devon, can you give guidance on reasonable values / tradeoffs for a few parameters?
1. maxZooms to bwCreateHdr. I assume 7 is a reasonable default?
2. blocksPerIteration to bwOverlappingIntervalsIterator. Higher == more memory; are there any other tradeoffs?
3. How does bwStatsFromFull differ from bwStats?
4. Is there any cost to calling bwAppendInterval* with a single interval?
thanks again for the library.
Hello,
Tied in to ticket #11, would it be possible for libBigWig to read BigBeds?
Hopefully, it should not be too painful: the search tree / compression block structure is identical, only the content of the compressed blocks is different.
Thanks in advance for considering my request,
Daniel
Hello @dpryan79, I was wondering if this could potentially speed up the library.
During the walkRTreeNodes process, the library seems to iterate through all children to figure out whether a block overlaps with the current region, in either overlapsLeaf or overlapsNonLeaf.
Could this instead use a binary search to find one overlapping block and then search its neighborhood for all overlaps? I'm trying to think of a scenario where this would fail, but let me know if you have any thoughts on this.
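Here is a self-contained sketch of the lookup I have in mind, assuming the child blocks of a node are sorted by start and non-overlapping; if the R-tree doesn't guarantee that, this is precisely the failure scenario I was trying to think of:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t start, end; } block;

/* Given blocks sorted by start with non-overlapping, half-open ranges,
 * binary-search for the first block whose end exceeds qstart; it overlaps
 * the query iff its start is before qend. The caller would then scan
 * rightwards collecting blocks while block->start < qend. Returns the
 * index of the first overlapping block, or n if none overlaps. */
static int first_overlap(const block *b, int n, uint32_t qstart, uint32_t qend) {
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (b[mid].end <= qstart) lo = mid + 1;
        else hi = mid;
    }
    return (lo < n && b[lo].start < qend) ? lo : n;
}
```

This replaces the linear scan over all children with O(log n) to find the first hit, then a scan only over actual overlaps.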
Hi,
I have trouble compiling third-party code (hdf5-1.8.18) on a system where I previously installed libBigWig. It is unfortunate that libBigWig installs a header named io.h. Please rename it. (Placing it in a subdirectory could help, but I assume that in some cases users/compilers would still pick this file rather than the system one.)
x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform -I../../src -D_GNU_SOURCE -D_POSIX_C_SOURCE=200112L -DNDEBUG -UH5_DEBUG_API -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/src -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/test -I/scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/lib -std=c99 -pedantic -Wall -Wextra -Wundef -Wshadow -Wpointer-arith -Wbad-function-cast -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wnested-externs -Winline -Wfloat-equal -Wmissing-format-attribute -Wmissing-noreturn -Wpacked -Wdisabled-optimization -Wformat=2 -Wunreachable-code -Wendif-labels -Wdeclaration-after-statement -Wold-style-definition -Winvalid-pch -Wvariadic-macros -Winit-self -Wmissing-include-dirs -Wswitch-default -Wswitch-enum -Wunused-macros -Wunsafe-loop-optimizations -Wc++-compat -Wstrict-overflow -Wlogical-op -Wlarger-than=2048 -Wvla -Wsync-nand -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wstrict-overflow=5 -Wjump-misses-init -Wunsuffixed-float-constants -Wdouble-promotion -Wsuggest-attribute=const -Wtrampolines -Wstack-usage=8192 -Wvector-operation-performance -Wsuggest-attribute=pure -Wsuggest-attribute=noreturn -Wsuggest-attribute=format -Wdate-time -Wopenmp-simd -O3 -O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize -c -o zip_perf.o /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/zip_perf.c
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:37:0:
/usr/include/io.h:22:6: error: nested redefinition of 'enum bigWigFile_type_enum'
enum bigWigFile_type_enum {
^
/usr/include/io.h:22:6: error: redeclaration of 'enum bigWigFile_type_enum'
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/src/H5private.h:149:0,
from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:26:
/usr/include/io.h:22:6: note: originally defined here
enum bigWigFile_type_enum {
^
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:37:0:
/usr/include/io.h:23:5: error: redeclaration of enumerator 'BWG_FILE'
BWG_FILE = 0,
^
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/src/H5private.h:149:0,
from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:26:
/usr/include/io.h:23:5: note: previous definition of 'BWG_FILE' was here
BWG_FILE = 0,
^
In file included from /scratch/var/tmp/portage/sci-libs/hdf5-1.8.18/work/hdf5-1.8.18/tools/perform/overhead.c:37:0:
/usr/include/io.h:24:5: error: redeclaration of enumerator 'BWG_HTTP'
BWG_HTTP = 1,
^
There aren't that many tools using libBigWig yet, so it is doable. Thank you.
Hello,
it seems that differences on macOS break tests that otherwise work fine on Linux, I'm guessing due to the use of explicit MD5 checksums:
disorientation:libBigWig dzerbino$ make test
./test/test.py
Traceback (most recent call last):
File "./test/test.py", line 14, in
assert(md5sum == "a15cbf0021f3e80d9ddfd9dbe78057cf")
AssertionError
make: *** [test] Error 1
Cheers,
Daniel
Hi, I'm stuck with this install problem and would appreciate any advice.
I'm on Debian, working in a conda (4.8.1) env.
Before running make, I did:
conda install -c anaconda curl
conda install -c anaconda libcurl
make install
/home/jean-philippe.villemin/bin/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -c -o io.o io.c
io.c:2:10: fatal error: curl/curl.h: No such file or directory
#include <curl/curl.h>
^~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:33: io.o] Error 1
locate curl.h
/home/jean-philippe.villemin/bin/anaconda3/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3/pkgs/libcurl-7.65.2-h20c2e04_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/majiq_env/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/outrigger-env/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/python2/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/envs/r_env/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/curl-7.52.1-0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/curl-7.55.1-h78862de_4/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/curl-7.55.1-hcb0b314_2/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/libcurl-7.61.0-h1ad7b7a_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/libcurl-7.64.0-h01ee5af_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/anaconda3_4.5.11/pkgs/libcurl-7.65.3-h20c2e04_0/include/curl/curl.h
/home/jean-philippe.villemin/bin/cmake-3.8.1/Utilities/cm_curl.h
/home/jean-philippe.villemin/bin/cmake-3.8.1/Utilities/cmcurl/include/curl/curl.h
/home/jean-philippe.villemin/bin/packages_R-3.3.1/include/curl/curl.h
/usr/include/curl/curl.h
echo $C_INCLUDE_PATH
/home/jean-philippe.villemin/bin/anaconda3/include/curl:/home/jean-philippe.villemin/bin/gsl-2.3/bin/include:/home/jean-philippe.villemin/bin/libBigWig/bin/include:/home/jean-philippe.villemin/bin/htslib/bin/include
echo $LD_LIBRARY_PATH
/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64:/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64/server:/home/jean-philippe.villemin/bin/anaconda3/lib/libreadline.so.6:/home/jean-philippe.villemin/bin/anaconda3/lib/libpng16.so.16:/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64:/home/jean-philippe.villemin/bin/jdk1.8.0_101/jre/lib/amd64/server:/home/jean-philippe.villemin/bin/anaconda3/lib/libreadline.so.6:/home/jean-philippe.villemin/bin/anaconda3/lib/libpng16.so.16::/home/jean-philippe.villemin/bin/libBigWig/bin/lib:/home/jean-philippe.villemin/bin/gsl-2.3/bin/lib:/home/jean-philippe.villemin/bin/htslib/bin/lib:/home/jean-philippe.villemin/bin/libBigWig/bin/lib:/home/jean-philippe.villemin/bin/gsl-2.3/bin/lib:/home/jean-philippe.villemin/bin/htslib/bin/lib
There's currently no sanity checking performed when writing a new file. So, someone could specify entries out of order and not get an error message immediately!
Opening a file for writing and then closing it without adding a header/chromosome list or entries causes a segfault. This shouldn't happen.
The link to the CDN for the rendered docs has some kind of incorrect headers and serves the file as preformatted HTML in a wrapper, instead of as HTML directly:
https://cdn.jsdelivr.net/gh/dpryan79/libBigWig@master/docs/html/index.html
yields:
The Makefile's install target assumes that $(prefix)/lib and $(prefix)/include already exist. This is not necessarily the case when installing to a non-standard prefix.
It would be better if the target directories were created before installing files to them, e.g. by applying this patch to the Makefile:
From 43f628598dc3478a7a823c46d1d7e5985611045c Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus <[email protected]>
Date: Thu, 25 Feb 2016 10:49:28 +0100
Subject: [PATCH] Create target directories before installing to them.
---
Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/Makefile b/Makefile
index e1faaf4..731bbf8 100644
--- a/Makefile
+++ b/Makefile
@@ -68,6 +68,7 @@ clean:
rm -f *.o libBigWig.a libBigWig.so *.pico test/testLocal test/testRemote test/testWrite test/exampleWrite example_output.bw
install: libBigWig.a libBigWig.so
+ install -d $(prefix)/lib $(prefix)/include
install libBigWig.a $(prefix)/lib
install libBigWig.so $(prefix)/lib
install *.h $(prefix)/include
--
2.1.0
What do you think?
Hi
I've been looking at building a library for accessing Big files from Perl using your library. It's gone pretty well, to be honest, but I have some questions about your interpretation of coordinates that isn't clear from the documentation. I've pasted below an example of the docs from one of your functions, with some bits removed:
/*!
* @brief Return bigWig entries overlapping an interval.
* @param start The start position of the interval. This is 0-based half open, so 0 is the first base.
* @param end The end position of the interval. Again, this is 0-based half open, so 100 will include the 100th base...which is at position 99.
*/
bwOverlappingIntervals_t *bwGetOverlappingIntervals(bigWigFile_t *fp, char *chrom, uint32_t start, uint32_t end);
I think it's your use of "...which is at position 99" that is confusing me. 0-based, half open would suggest to me that if you use 100 as your end value you will get the 100th base, and the end value should always be 100, unless you are referring to the location of base 100's values in the arrays passed back by the routine in the bwOverlappingIntervals_t struct.
Also, I'm aware that when parsing BigWigs, their use of coordinates can differ based on their source data. Those derived from bedGraphs retain their 0-based, half-open system, whereas fixed and variable step use 1-start, fully-closed. I had a poke in the code and can see some mention of this, but I'm unsure if you handle this internally so that we only ever need to work in 0-based, half-open coordinates.
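For what it's worth, in my bindings I convert at the boundary; if libBigWig does normalize fixed/variable step data internally, then callers would only ever see 0-based, half-open coordinates. The conversion itself is just the usual off-by-one, and the helper below is my own sketch, not library API:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 1-start, fully-closed interval (wig fixedStep/variableStep
 * convention) to the 0-based, half-open convention libBigWig documents:
 * base N in 1-based coordinates occupies [N-1, N) in 0-based half open,
 * so a closed end of N becomes an open end of N. */
static void closed1_to_halfopen0(uint32_t s1, uint32_t e1,
                                 uint32_t *s0, uint32_t *e0) {
    *s0 = s1 - 1;
    *e0 = e1;
}
```

So bases 1..100 inclusive become the half-open range [0, 100), matching the "100 will include the 100th base...which is at position 99" wording in the docs.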
Thanks and sorry for the badgering.
I am trying to fetch some bigWig files from the ENCODE project; it seems they host their files on Amazon S3, and using testRemote gives me an error. Is there a solution? Thanks.
$ ./testRemote https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
[bwHdrRead] There was an error while reading in the header!
An error occured while opening https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
$ curl -I https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
HTTP/1.1 307 Temporary Redirect
Server: nginx/1.10.1
Date: Fri, 17 Mar 2017 21:08:37 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 1196
Connection: keep-alive
X-Request-URL: https://www.encodeproject.org/files/ENCFF188HKC/@@download/ENCFF188HKC.bigWig
X-Stats: es_count=1&es_time=7226&queue_begin=1489784917370001&queue_time=830&rss_begin=492408832&rss_change=0&rss_end=492408832&wsgi_begin=1489784917370831&wsgi_end=1489784917389025&wsgi_time=18194
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, HEAD
Access-Control-Allow-Headers: Accept, Origin, Range, X-Requested-With
Access-Control-Expose-Headers: Content-Length, Content-Range, Content-Type
Location: https://download.encodeproject.org/http://encode-files.s3.amazonaws.com/2017/03/05/d7be9e16-e742-4554-9e9b-347834665817/ENCFF188HKC.bigWig?Signature=Tb43S%2BXE%2BOT0jVVRYn1E1amZZss%3D&Expires=1489914517&AWSAccessKeyId=ASIAIBGS2LKKZLYIOLVA&response-content-disposition=attachment%3B%20filename%3DENCFF188HKC.bigWig&x-amz-security-token=FQoDYXdzEB4aDKo3b1EtKQK0xzlrqCK3A8jMrctRMooXvbFhPZaBtN46iqYhdsIuZVnmCBYphXlMoRFfa%2B7dyVq1ICoFY7d6wrVj2sKHs4VfVMOYlRJOHonPlRj9BvF5DYR8EHZaItBq4ouDlkzOYcrCNbo36uR1IP%2BsDlX8vwqn7hw6ri/wtQYjReE35P8wyG7D3cN4cHZFm2bAmd4xfS6o7vsgh21LfSHjhKIg%2BoQqPoxZwdNB64qlUBrKYo%2BnhDQdKDceMc/0GB9NJqy1U1n0kaXitFHSwg88LzgXR/CY2Eyk/tQVcScceLERWAupB9nLyVpsVH1uSOumFhwcSf1FXEyqFCKWf4jgUqBHJ7T7kfUHmKcLP8VbJgQs0/TB7q8OY0fn7lzugK4kTXkF3GoGdI8aUwNBo2VuA7Z1S0ldUntTMHeh%2Bl9x9nLETbzmVSOB/qIlP%2BZimKVJTMszPoj57cqhxAUd/%2B6xfdiZbjjoh6NSY5rI%2BfMWfTqZS5my6aCIVXTjkg26Bfm4HGl04bNXhJp7uTiIeorwMex0yfSxFJnybunDaLv9fuFmqsOpY7tOXNzmbina6Tp38euVaLnHD/9rAKNpTu6C7pvyUWQohJ6xxgU%3D
Strict-Transport-Security: max-age=15768000
I am getting a segfault after multiple region requests to a remote file. The same file has no issues when accessed locally:
Stacktrace:
(gdb) bt
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:152
#1 0x00007fbf78d8167a in url_fread (obuf=obuf@entry=0x4b802f0, obufSize=obufSize@entry=3445, URL=0x4b48a00) at io.c:60
#2 0x00007fbf78d81788 in urlRead (URL=<optimized out>, buf=buf@entry=0x4b802f0, bufSize=bufSize@entry=3445) at io.c:94
#3 0x00007fbf78d83efb in bwRead (data=data@entry=0x4b802f0, sz=3445, nmemb=nmemb@entry=1, fp=fp@entry=0x4b45640) at bwRead.c:29
#4 0x00007fbf78d82b9f in bwGetOverlappingIntervalsCore (fp=fp@entry=0x4b45640, o=o@entry=0x4b810c0, tid=tid@entry=21,
ostart=ostart@entry=325665, oend=oend@entry=325742) at bwValues.c:422
#5 0x00007fbf78d8358e in bwGetOverlappingIntervals (fp=fp@entry=0x4b45640, chrom=chrom@entry=0x4b49240 "", start=start@entry=325665,
end=end@entry=325742) at bwValues.c:568
#6 0x00007fbf78d83a0a in bwGetValues (fp=fp@entry=0x4b45640, chrom=chrom@entry=0x4b49240 "", start=start@entry=325665,
end=end@entry=325742, includeNA=includeNA@entry=1) at bwValues.c:715
The reproduction steps are a bit tricky. I can't share the data I have, but querying a list of 3 regions in one particular order causes a segfault. If I reverse the order, or add some more requests in the middle, no segfault occurs.
My code is:
#include "bigWig.h"
#include <stdio.h>
int main(int argc, char *argv[]) {
bigWigFile_t *fp = NULL;
bwOverlappingIntervals_t *intervals = NULL;
double *stats = NULL;
if(argc != 2) {
fprintf(stderr, "Usage: %s {file.bw|URL://path/file.bw}\n", argv[0]);
return 1;
}
//Initialize enough space to hold 131072 bytes (taken from Bio::DB::Big default value)
if(bwInit(131072) != 0) {
fprintf(stderr, "Received an error in bwInit\n");
return 1;
}
//Open the local/remote file
fp = bwOpen(argv[1], NULL, "r");
if(!fp) {
fprintf(stderr, "An error occured while opening %s\n", argv[1]);
return 1;
}
fprintf(stderr, "Fetching regions\n");
intervals = bwGetValues(fp, "9", 214972, 215034, 1);
bwDestroyOverlappingIntervals(intervals);
intervals = bwGetValues(fp, "9", 317038, 317133, 1);
bwDestroyOverlappingIntervals(intervals);
intervals = bwGetValues(fp, "9", 325666, 325742, 1);
bwDestroyOverlappingIntervals(intervals);
bwClose(fp);
bwCleanup();
return 0;
}
Each interval request on its own does not cause a failure, just the 3 one after the other.
The URL->bufPos ends up larger than URL->bufLen, which leads to a nonsense memcpy request:
[url_fread] memBuf: 4425691136 bufPos: 9257, bufLen: 2249, remaining: 3445
[url_fread] memcpy start: 4425700393 memcpy end: 18446744073709544608
in the } else if(URL->bufLen < URL->bufPos + remaining) { block of url_fread.
Any ideas why this might be?
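The huge "memcpy end" value in the log looks like a classic unsigned underflow: once bufPos overruns bufLen, an unsigned size computed as bufLen - bufPos wraps around 2^64. A standalone illustration with the values from the log; the clamped variant is just a suggestion, not the actual io.c fix:

```c
#include <assert.h>
#include <stdint.h>

/* With bufLen = 2249 and bufPos = 9257 (from the log), the unsigned
 * difference wraps to 2^64 - 7008 = 18446744073709544608, exactly the
 * "memcpy end" value printed above. */
static uint64_t wrapped_size(uint64_t bufLen, uint64_t bufPos) {
    return bufLen - bufPos; /* wraps when bufPos > bufLen */
}

/* A guarded computation clamps to zero instead, turning the would-be
 * crash into a detectable short read. */
static uint64_t clamped_size(uint64_t bufLen, uint64_t bufPos) {
    return bufPos < bufLen ? bufLen - bufPos : 0;
}
```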
Hi, do you mind making a new release with the latest changes? :)
I am trying to install libBigWig, as it is a dependency of WiggleTools. libBigWig requires zlib, bzip2 and libcurl, which I have already installed. When I try to install libBigWig with the following commands:
git clone https://github.com/dpryan79/libBigWig.git
cd libBigWig
make install
I get the following:
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL -c -o io.o io.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL -c -o bwValues.o bwValues.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL -c -o bwRead.o bwRead.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL -c -o bwStats.o bwStats.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL -c -o bwWrite.o bwWrite.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-ar -rcs libBigWig.a io.o bwValues.o bwRead.o bwStats.o bwWrite.o
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-ranlib libBigWig.a
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL -fpic -c -o io.pico io.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL -fpic -c -o bwValues.pico bwValues.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL -fpic -c -o bwRead.pico bwRead.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL -fpic -c -o bwStats.pico bwStats.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -I. -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/dp456/miniconda3/envs/py37/include -DNOCURL -fpic -c -o bwWrite.pico bwWrite.c
/home/dp456/miniconda3/envs/py37/bin/x86_64-conda-linux-gnu-cc -shared -o libBigWig.so io.pico bwValues.pico bwRead.pico bwStats.pico bwWrite.pico -lm -lz
/home/dp456/miniconda3/envs/py37/bin/../lib/gcc/x86_64-conda-linux-gnu/7.5.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lz
collect2: error: ld returned 1 exit status
make: *** [Makefile:60: libBigWig.so] Error 1
Now, I am still at the beginning of my learning curve in informatics, but as far as I can understand, it seems to be an issue with the ld linker not finding zlib. This thread here suggested installing:
sudo apt-get install zlib1g-dev
sudo apt-get install libz-dev
sudo apt-get install lib32z1-dev
sudo apt-get install zlib*
but the error still persists after trying all variations. What else can I do? I also tried to install WiggleTools with conda, but that doesn't work either, because there are several conflicts with other packages.
This is the OS/version of the server that I’m using it:
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-197-generic x86_64)
Configuring with -DBUILD_SHARED_LIBS=ON builds test executables, but it isn't clear how to run the tests.
Hi there
We've started to use BigBed and libBigWig for a lot of our flat file reading and have started to use the extraIndex feature (see the kent source for bigBedNamedItems). Currently we have to shell back out to run bigBedNamedItems to extract the rows of the BigBed file we are interested in. Is there any possibility of supporting the extra indexes? From https://github.com/ucscGenomeBrowser/kent/blob/master/src/lib/bigBed.c#L635 it looks like they're accessible from the file, but they also require some AutoSQL parsing to process.
Anyway, a general idea of how plausible support for these would be is greatly appreciated, even if it's a case of "no way".
Thanks
Sorry, I seem to be finding a lot of these today. Anyway, same stub program as before except I've switched to using a UCSC example bigwig file hosted at genome.ucsc.edu. The following code took over 3 minutes to return:
#include "bigWig.h"
#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
bigWigFile_t *fp = NULL;
double *stats = NULL;
char file[] = "http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw";
char chrom[] = "chr21";
int start = 33031597;
int end = 34041570;
int full = 1;
int bins = 100;
int buffer = 1<<17;
if(bwInit(buffer) != 0) {
fprintf(stderr, "Received an error in bwInit\n");
return 1;
}
fp = bwOpen(file, NULL, "r");
if(!fp) {
fprintf(stderr, "An error occured while opening %s\n", file);
return 1;
}
if(full)
stats = bwStatsFromFull(fp, chrom, start, end, bins, mean);
else
stats = bwStats(fp, chrom, start, end, bins, mean);
if(stats)
free(stats);
bwClose(fp);
bwCleanup();
return 0;
}
My Perl code had similar issues, and the following Python code had the same issues:
#!/usr/bin/env python
import pyBigWig
bw = pyBigWig.open("http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw")
bw.stats('chr21', 33031597, 34041570, type="mean", nBins=100, exact=True)
Sampling the C process on OSX showed that 100% of the CPU time was spent in __select from the system kernel, and the next level up was Curl_poll, so I think it's spending its time transferring. I did some info dumps on the BigWig and this is what I got back:
$ bigWigInfo http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw
version: 1
isCompressed: no
isSwapped: 0
primaryDataSize: 56,335,300
primaryIndexSize: 227,048
zoomLevels: 7
chromCount: 1
basesCovered: 35,926,161
mean: 40.862151
min: 0.000000
max: 100.000000
std: 22.968515
The only bit I think is odd here is that it's an uncompressed bigwig file. I also tried mucking around with the buffer size and set it to 8MB. That brought the total time spent down to 15 seconds so I'm guessing there's a problem/limitation with uncompressed BigWigs and streaming them across the wire.
The memory use of libbigwig gets quite high during indexing.
Is there any way to reduce this? For example, by flushing each chromosome to disk after the index for that tid is created in constructZoomLevels?
I do this in pyBigWig, but I'm told that some programs using this do not check to ensure that intervals are entered in a sane order. The various add/append functions should take a checkConsistency parameter. This will result in a new minor version, due to the change in API. I should also start adding .1 or whatever to the .so file.
Hello there,
In Debian we run some of the tests you've included in test/ as an autopkgtest for the libbigwig package.
This is how we run the tests in Debian:
echo "1c52065211fdc44eea45751a9cbfffe0 test/Local.bw" >> checksums
echo "8e116bd114ffd2eb625011d451329c03 test/Write.bw" >> checksums
echo "ef104f198c6ce8310acc149d0377fc16 test/example_output.bw" >> checksums
LIB_STATIC=`find /usr/lib -name libBigWig.a`
#Compile using installed libbigwig
echo 'Compiling ...'
gcc -g -Wall test/testLocal.c ${LIB_STATIC} -lBigWig -lcurl -lm -lz -o testlocal
gcc -g -Wall test/testWrite.c ${LIB_STATIC} -lBigWig -lm -lz -lcurl -o testWrite
gcc -g -Wall test/exampleWrite.c ${LIB_STATIC} -lBigWig -lcurl -lm -lz -o examplewrite
echo '------------------------------'
echo -e "Test 1"
./testlocal test/test.bw > test/Local.bw
echo -e "Test 2"
./testWrite test/test.bw test/Write.bw
echo -e "Test 3"
./examplewrite
md5sum --check checksums
echo -e "PASS"
Currently the tests are failing on the i386 and s390 architectures.
The errors that we are getting are as follows:
i386:
Compiling ...
------------------------------
Test 1
Test 2
Test 3
test/Local.bw: FAILED
test/Write.bw: FAILED
test/example_output.bw: OK
md5sum: WARNING: 2 computed checksums did NOT match
The output files generated have different hashsums. Is this expected?
s390 (Big Endian) :
Compiling ...
------------------------------
Test 1
testlocal: test/testLocal.c:107: main: Assertion `bwIsBigWig(argv[1], NULL) == 1' failed.
/tmp/autopkgtest-lxc.y_h72gqc/downtmp/build.e6I/src/debian/tests/run-unit-test: line 32: 1996 Aborted ./testlocal test/test.bw > test/Local.bw
It would be helpful if you could help in debugging these failures :)
Thanks