Comments (7)
The FromBuffer function in roaring attempts to read a 'cookie' which is 4 bytes long.
You would get the error you are reporting if you were to call FromBuffer
from an empty byte array, or from a byte array having fewer than 4 bytes.
If the bytes come from a disk or the network, this could indicate an IO failure.
The FromBuffer function is as follows:
func (rb *Bitmap) FromBuffer(buf []byte) (p int64, err error) {
	stream := internal.ByteBufferPool.Get().(*internal.ByteBuffer)
	stream.Reset(buf)
	p, err = rb.highlowcontainer.readFrom(stream)
	internal.ByteBufferPool.Put(stream)
	return
}
It calls readFrom, which starts as follows...
func (ra *roaringArray) readFrom(stream internal.ByteInput, cookieHeader ...byte) (int64, error) {
	var cookie uint32
	var err error
	if len(cookieHeader) > 0 && len(cookieHeader) != 4 {
		return int64(len(cookieHeader)), fmt.Errorf("error in roaringArray.readFrom: could not read initial cookie: incorrect size of cookie header")
	}
	if len(cookieHeader) == 4 {
		cookie = binary.LittleEndian.Uint32(cookieHeader)
	} else {
		cookie, err = stream.ReadUInt32()
		if err != nil {
			return stream.GetReadBytes(), fmt.Errorf("error in roaringArray.readFrom: could not read initial cookie: %s", err)
		}
	}
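To make the failure mode concrete, here is a minimal sketch of the cookie read using only the standard library. It is not the roaring implementation: readCookie and errShortCookie are hypothetical names invented for this illustration, and the sketch assumes only what is stated above, namely that the first 4 bytes are decoded as a little-endian uint32.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errShortCookie is a hypothetical sentinel error used only in this sketch.
var errShortCookie = errors.New("buffer too short to hold a 4-byte cookie")

// readCookie is a hypothetical stand-in for the first step of readFrom:
// it decodes the leading 4 bytes as a little-endian uint32 and fails
// when fewer than 4 bytes are available.
func readCookie(buf []byte) (uint32, error) {
	if len(buf) < 4 {
		return 0, fmt.Errorf("%w: got %d byte(s)", errShortCookie, len(buf))
	}
	return binary.LittleEndian.Uint32(buf[:4]), nil
}

func main() {
	// An empty input, a 1-byte input, and a 4-byte input.
	for _, buf := range [][]byte{nil, {0x3a}, {0x3a, 0x30, 0x00, 0x00}} {
		cookie, err := readCookie(buf)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("cookie = %d\n", cookie)
	}
}
```

Any input shorter than 4 bytes, including an empty or nil slice, fails before anything else is parsed, which matches the error being reported.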
from roaring.
Thanks @lemire, we will have another look at the situation with your info in mind. One more thing: could this error also be caused if the roaringBytes were stored using one version of roaring (v0.4.23), and the readFrom API being used belongs to a later version (v0.9.4)? That is, we are reading an older format using a newer package version.
From the error, it seems that your input had fewer than 4 bytes. It is not possible for a serialized bitmap to use fewer than 4 bytes.
Thanks @lemire, we will have another look at the situation with your info in mind. One more thing: could this error also be caused if the roaringBytes were stored using one version of roaring (v0.4.23), and the readFrom API being used belongs to a later version (v0.9.4)? That is, we are reading an older format using a newer package version.
The data format is not dependent on the version, it is specified here:
https://github.com/RoaringBitmap/RoaringFormatSpec
You can read and write roaring bitmaps from Java, Python, Rust, C, C++, Go... and it is all interoperable.
The format hasn't changed.
The most likely cause for the error you report is that you had an empty byte array (though an array made of 1, 2 or 3 bytes is also possible).
It is possible, but I would say very unlikely, that you have found a bug in roaring. The most likely scenario is some kind of unguarded system failure that results in an empty array being passed to roaring as a serialized bitmap.
The functions you refer to are purely deterministic, which means that given an input, you should always get the same result. Thus you should get a copy of the byte array that is being passed. I am 99% certain that you will find that it is an empty byte array.
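One way to capture the input is to log a short summary of the byte slice whenever deserialization fails. The sketch below is only an illustration: describeInput is a hypothetical helper that is not part of roaring or bleve, and the 16-byte dump limit is an arbitrary choice.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// describeInput is a hypothetical helper: it summarizes a byte slice so that
// the exact input handed to the deserializer can be logged and replayed later.
func describeInput(buf []byte) string {
	const maxDump = 16 // cap the dump so that large bitmaps do not flood the log
	n := len(buf)
	if n > maxDump {
		n = maxDump
	}
	return fmt.Sprintf("len=%d first_bytes=%s", len(buf), hex.EncodeToString(buf[:n]))
}

func main() {
	fmt.Println(describeInput(nil))
	fmt.Println(describeInput([]byte{0x3a, 0x30, 0x00, 0x00}))
}
```

Calling something like this in the error path, just before returning the wrapped error, would show directly whether the slice was empty.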
I have looked at the code you point to...
var postingsLen uint64
postingsLen, read = binary.Uvarint(d.sb.mem[postingsOffset+n : postingsOffset+n+binary.MaxVarintLen64])
n += uint64(read)
roaringBytes := d.sb.mem[postingsOffset+n : postingsOffset+n+postingsLen]
rv.incrementBytesRead(n + postingsLen)
if rv.postings == nil {
	rv.postings = roaring.NewBitmap()
}
_, err := rv.postings.FromBuffer(roaringBytes)
if err != nil {
	return fmt.Errorf("error loading roaring bitmap: %v", err)
}
What happens in this code if postingsLen is zero? You get that roaringBytes is an empty array. And then you get the error you report from roaring.
If I am wrong, please provide evidence to the contrary.
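The zero-length case can be reproduced without roaring at all. The following is a sketch under assumptions: sliceLengthPrefixed is a hypothetical helper that mimics the uvarint-length-then-payload pattern in the snippet above, and the explicit checks are one possible guard on the caller side, not what zapx actually does.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// sliceLengthPrefixed is a hypothetical helper mirroring the quoted pattern:
// decode a uvarint length at offset, then return the payload that follows it.
// Unlike the quoted code, it rejects payloads that cannot hold a 4-byte cookie.
func sliceLengthPrefixed(mem []byte, offset uint64) ([]byte, error) {
	if offset > uint64(len(mem)) {
		return nil, errors.New("offset beyond end of buffer")
	}
	length, read := binary.Uvarint(mem[offset:])
	if read <= 0 {
		return nil, errors.New("could not decode length prefix")
	}
	start := offset + uint64(read)
	if length > uint64(len(mem))-start {
		return nil, errors.New("payload extends beyond end of buffer")
	}
	payload := mem[start : start+length]
	if len(payload) < 4 {
		return nil, fmt.Errorf("payload of %d byte(s) cannot hold a serialized bitmap", len(payload))
	}
	return payload, nil
}

func main() {
	// A zero length prefix yields an empty payload, which is rejected here.
	_, err := sliceLengthPrefixed([]byte{0x00}, 0)
	fmt.Println("zero-length payload:", err)

	// A length prefix of 4 followed by 4 bytes is accepted.
	payload, err := sliceLengthPrefixed([]byte{0x04, 0x3a, 0x30, 0x00, 0x00}, 0)
	fmt.Println("4-byte payload:", len(payload), err)
}
```

A check of this kind turns the opaque cookie error into a message that points at the stored length, which is where the investigation would need to continue.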
Thanks for the info and context @lemire! The thing is that we are still trying to find the root cause of this obscure case where it hits that error. Meanwhile, I encountered another crash in our logs (it appears intermittently), which we are also trying to figure out:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x6bcd34]
goroutine 69314 [running]:
github.com/RoaringBitmap/roaring.(*Bitmap).GetCardinality(...)
...../github.com/!roaring!bitmap/[email protected]/roaring.go:667
github.com/blevesearch/zapx/v15.(*PostingsList).Count(0xc0137072c0?)
...../github.com/blevesearch/zapx/[email protected]/posting.go:239 +0xd4
github.com/blevesearch/bleve/v2/index/scorch.(*IndexSnapshotTermFieldReader).Count(0x7322cd?)
...../github.com/blevesearch/bleve/[email protected]/index/scorch/snapshot_index_tfr.go:177 +0x6d
github.com/blevesearch/bleve/v2/search/searcher.(*TermSearcher).Count(0x5?)
I observe that this crash occurs in the GetCardinality() API. Can you please help me understand the cases where the container can be nil, and which exported API (one that other packages would call) initializes the containers field, so that I can also debug on the caller side (bleve)?
I see that the Add() API initialises the containers field in the bitmap, is that right?
I think Add() is invoked right after we have done a roaring.New() in our codebase, so I am just confirming the above.
I recommend you consider updating to the latest version of roaring. It should not break your code and you might benefit from some bug fixes accumulated over the last couple of years. There have been remarkably few bugs found, but there were some and you should bump the version accordingly.
can you please help me about the cases where the container can be nil
You get a problem here:
// GetCardinality returns the number of integers contained in the bitmap
func (rb *Bitmap) GetCardinality() uint64 {
	size := uint64(0)
	for _, c := range rb.highlowcontainer.containers {
		size += uint64(c.getCardinality()) //<============== c is nil?
	}
	return size
}
In turn, rb.highlowcontainer.containers is just an array of containers.
None of the containers within it are allowed to be nil, ever.
A nil container would quickly break a lot of code. We have never had this reported and I would be interested in having a reproducible test case. So your roaring instance in this case is invalid. This could happen in different ways, but the most likely causes are unsafe multithreaded code (e.g., a data race) or unchecked bad IO.
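To show why a nil entry crashes in this way, here is a small sketch with stand-in types. The container interface and arrayContainer below are hypothetical simplifications, not roaring's actual types, and safeCardinality is a debugging aid invented for this illustration, not a library function.

```go
package main

import "fmt"

// container and arrayContainer are hypothetical stand-ins for this sketch.
type container interface {
	getCardinality() int
}

type arrayContainer struct{ content []uint16 }

func (ac *arrayContainer) getCardinality() int { return len(ac.content) }

// firstNil returns the index of the first nil interface entry, or -1 if none.
// It does not detect a typed nil pointer stored inside a non-nil interface.
func firstNil(containers []container) int {
	for i, c := range containers {
		if c == nil {
			return i
		}
	}
	return -1
}

// safeCardinality sums the cardinalities but reports a nil entry as an error
// instead of letting the method call panic with a nil pointer dereference.
func safeCardinality(containers []container) (uint64, error) {
	if i := firstNil(containers); i >= 0 {
		return 0, fmt.Errorf("container %d is nil: bitmap is in an invalid state", i)
	}
	var size uint64
	for _, c := range containers {
		size += uint64(c.getCardinality())
	}
	return size, nil
}

func main() {
	valid := []container{
		&arrayContainer{content: []uint16{1, 2, 3}},
		&arrayContainer{content: []uint16{7}},
	}
	fmt.Println(safeCardinality(valid))

	broken := []container{valid[0], nil}
	fmt.Println(safeCardinality(broken))
}
```

A check like this only helps to locate the moment the structure becomes invalid. It does not fix the underlying cause, such as concurrent mutation without synchronization.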
I see that the Add() API initialises containers field in the bitmap, is that right?
If you are asking whether...
rb := roaring.New()
rb.Add(x)
... is safe, then the answer is yes, absolutely.
The containers are not allowed to be nil.
I am going to close this issue.
@Thejas-bhat : Please consider updating to the latest version of the library. If you still encounter bugs, please try to isolate them and produce a reproducible test case.