Comments (7)

lemire commented on June 11, 2024

The FromBuffer function in roaring first attempts to read a 'cookie', which is 4 bytes long.

You would get the error you are reporting if you called FromBuffer on an empty byte array, or on a byte array with fewer than 4 bytes.

If the bytes come from a disk or the network, this could indicate an IO failure.

The FromBuffer function is as follows:

func (rb *Bitmap) FromBuffer(buf []byte) (p int64, err error) {
	stream := internal.ByteBufferPool.Get().(*internal.ByteBuffer)
	stream.Reset(buf)
	p, err = rb.highlowcontainer.readFrom(stream)
	internal.ByteBufferPool.Put(stream)
	return
}

It calls readFrom, which starts as follows...

func (ra *roaringArray) readFrom(stream internal.ByteInput, cookieHeader ...byte) (int64, error) {
	var cookie uint32
	var err error
	if len(cookieHeader) > 0 && len(cookieHeader) != 4 {
		return int64(len(cookieHeader)), fmt.Errorf("error in roaringArray.readFrom: could not read initial cookie: incorrect size of cookie header")
	}
	if len(cookieHeader) == 4 {
		cookie = binary.LittleEndian.Uint32(cookieHeader)
	} else {
		cookie, err = stream.ReadUInt32()
		if err != nil {
			return stream.GetReadBytes(), fmt.Errorf("error in roaringArray.readFrom: could not read initial cookie: %s", err)
		}
	}

from roaring.

Thejas-bhat commented on June 11, 2024

Thanks @lemire, we will have another look at the situation with your info in mind. One more thing: could this error also occur if the roaringBytes were stored using one version of roaring (v0.4.23) and then read back with the readFrom API of a later version (v0.9.4)? That is, we would be reading an older format using a newer package version.

lemire commented on June 11, 2024

From the error, it seems that your input had fewer than 4 bytes. It is not possible for a serialized bitmap to use fewer than 4 bytes.

> One more thing: could this error also occur if the roaringBytes were stored using one version of roaring (v0.4.23) and then read back with the readFrom API of a later version (v0.9.4)? That is, we would be reading an older format using a newer package version.

The data format does not depend on the library version; it is specified here:

https://github.com/RoaringBitmap/RoaringFormatSpec

You can read and write roaring bitmaps from Java, Python, Rust, C, C++, Go... and it is all interoperable.

The format hasn't changed.

lemire commented on June 11, 2024

The most likely cause for the error you report is that you had an empty byte array (though an array made of 1, 2 or 3 bytes is also possible).

It is possible, but I would say very unlikely, that you have found a bug in roaring. The most likely scenario is some kind of unguarded system failure that results in an empty array being passed to roaring as a serialized bitmap.

The functions you refer to are purely deterministic, which means that given an input, you should always get the same result. Thus you should get a copy of the byte array that is being passed. I am 99% certain that you will find that it is an empty byte array.
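One way to capture that evidence is to log a short summary of the buffer whenever decoding fails. The sketch below uses only the standard library; `describeBuffer` is a hypothetical helper name, not something provided by roaring or bleve.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// describeBuffer is a hypothetical debugging helper. It reports the length
// of buf and a hex dump of at most limit leading bytes, so that a failing
// input can be recorded in logs without dumping the whole buffer.
func describeBuffer(buf []byte, limit int) string {
	if limit < 0 {
		limit = 0
	}
	n := len(buf)
	if n > limit {
		n = limit
	}
	return fmt.Sprintf("len=%d head=%s", len(buf), hex.EncodeToString(buf[:n]))
}

func main() {
	fmt.Println(describeBuffer(nil, 16))
	fmt.Println(describeBuffer([]byte{0x3a, 0x30, 0x00, 0x00, 0x01}, 4))
}
```

Including such a summary in the error returned next to the FromBuffer call would show directly whether the input was empty.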

I have looked at the code you point to...

	var postingsLen uint64
	postingsLen, read = binary.Uvarint(d.sb.mem[postingsOffset+n : postingsOffset+n+binary.MaxVarintLen64])
	n += uint64(read)

	roaringBytes := d.sb.mem[postingsOffset+n : postingsOffset+n+postingsLen]

	rv.incrementBytesRead(n + postingsLen)

	if rv.postings == nil {
		rv.postings = roaring.NewBitmap()
	}
	_, err := rv.postings.FromBuffer(roaringBytes)
	if err != nil {
		return fmt.Errorf("error loading roaring bitmap: %v", err)
	}

What happens in this code if postingsLen is zero? You get that roaringBytes is an empty array. And then you get the error you report from roaring.
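A caller-side guard for this case could look like the sketch below. It is a simplified stand-in for the snippet above, not the zapx code: `readPostings` and `minBitmapSize` are hypothetical names, and the layout (a uvarint length followed by that many payload bytes) is assumed from the quoted code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// minBitmapSize is the smallest possible serialized bitmap: the 4-byte cookie.
const minBitmapSize = 4

// readPostings is a hypothetical stand-in for the caller. It decodes a
// uvarint length at mem[offset:] and returns the payload that follows,
// rejecting lengths that cannot hold a serialized bitmap.
func readPostings(mem []byte, offset uint64) ([]byte, error) {
	if offset > uint64(len(mem)) {
		return nil, errors.New("offset beyond end of segment")
	}
	postingsLen, read := binary.Uvarint(mem[offset:])
	if read <= 0 {
		return nil, errors.New("could not decode postings length")
	}
	start := offset + uint64(read)
	if postingsLen < minBitmapSize {
		return nil, fmt.Errorf("postings length %d is too short for a serialized bitmap", postingsLen)
	}
	if remaining := uint64(len(mem)) - start; postingsLen > remaining {
		return nil, fmt.Errorf("postings length %d exceeds the remaining %d byte(s)", postingsLen, remaining)
	}
	return mem[start : start+postingsLen], nil
}

func main() {
	// A zero length prefix: the situation described above.
	if _, err := readPostings([]byte{0x00}, 0); err != nil {
		fmt.Println("rejected:", err)
	}
	// A length prefix of 4 followed by four payload bytes.
	payload, err := readPostings([]byte{0x04, 0x3a, 0x30, 0x00, 0x00}, 0)
	fmt.Println(len(payload), err)
}
```

With a guard like this, a zero or truncated length surfaces as an error about the segment data rather than as a cookie error from roaring.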

If I am wrong, please provide evidence to the contrary.

Thejas-bhat commented on June 11, 2024

Thanks for the info and context, @lemire! We are still trying to find the root cause of this obscure case where it hits that error. Meanwhile, I found another crash in our logs, which appears intermittently and which we are also trying to figure out:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x6bcd34]

goroutine 69314 [running]:
github.com/RoaringBitmap/roaring.(*Bitmap).GetCardinality(...)
	...../github.com/!roaring!bitmap/[email protected]/roaring.go:667
github.com/blevesearch/zapx/v15.(*PostingsList).Count(0xc0137072c0?)
	...../github.com/blevesearch/zapx/[email protected]/posting.go:239 +0xd4
github.com/blevesearch/bleve/v2/index/scorch.(*IndexSnapshotTermFieldReader).Count(0x7322cd?)
	...../github.com/blevesearch/bleve/[email protected]/index/scorch/snapshot_index_tfr.go:177 +0x6d
github.com/blevesearch/bleve/v2/search/searcher.(*TermSearcher).Count(0x5?)

I observe that this crash occurs in the GetCardinality() API. Could you please help me understand the cases in which a container can be nil, and which exported API (one that other packages would call) initializes the containers field? That way I can debug on the caller side (bleve) as well.

Thejas-bhat commented on June 11, 2024

I see that the Add() API initialises the containers field in the bitmap, is that right?

I think Add() is invoked right after we have done a roaring.New() in our codebase, so I am just confirming the above.

lemire commented on June 11, 2024

I recommend you consider updating to the latest version of roaring. It should not break your code and you might benefit from some bug fixes accumulated over the last couple of years. There have been remarkably few bugs found, but there were some and you should bump the version accordingly.

> can you please help me about the cases where the container can be nil

You get a problem here:

// GetCardinality returns the number of integers contained in the bitmap
func (rb *Bitmap) GetCardinality() uint64 {
	size := uint64(0)
	for _, c := range rb.highlowcontainer.containers {
		size += uint64(c.getCardinality()) //<============== c is nil?
	}
	return size
}

In turn, rb.highlowcontainer.containers is just an array of containers.

None of the containers within it are allowed to be nil, ever.

A nil container would quickly break a lot of code. We have never had this reported and I would be interested in having a reproducible test case. So your roaring instance in this case is invalid. This could happen in different ways, but the most likely causes are unsafe multithreaded code (e.g., a data race) or unchecked bad IO.

> I see that the Add() API initialises containers field in the bitmap, is that right?

If you are asking whether...

    rb := roaring.New()
    rb.Add(x)

... is safe, then the answer is yes, absolutely.

The containers are not allowed to be nil.

I am going to close this issue.

@Thejas-bhat : Please consider updating to the latest version of the library. If you still encounter bugs, please try to isolate them and produce a reproducible test case.
