I've added a `Block.orlen()`, a `bitsetunioncount()`, a `RoaringBitmap.union_len()` and a `RoaringBitmap.jaccard_dist()`, which work for bitmaps containing "dense" blocks, "positive" blocks, or both. I have not been able to generate a roaring bitmap that uses "inverted" blocks, so I haven't been able to test that case. For the other kinds of blocks:
```python
from roaringbitmap import RoaringBitmap
import random
# dense blocks
A = RoaringBitmap(random.sample(range(100000), 20000))
# positive blocks
B = RoaringBitmap(random.sample(range(10000), 200))
# many dense blocks, positive at end
D = RoaringBitmap(random.sample(range(4000000), 400000))
```
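To illustrate the kind of counting that functions like `bitsetunioncount()` and `union_len()` have to do per block, here is a minimal plain-Python sketch. It assumes a dense block is a bitset of 1024 64-bit words and a positive block is a sorted list of 16-bit values; all helper names are hypothetical and this is not the code from the comment above. The Jaccard distance then only needs lengths, because |A ∩ B| = |A| + |B| - |A ∪ B|.

```python
# Hypothetical sketch, not the roaringbitmap implementation. Assumes:
#   - a "dense" block is a bitset stored as 1024 words of 64 bits (65536 bits);
#   - a "positive" block is a sorted, duplicate-free list of 16-bit values.

NWORDS = 1024  # 1024 * 64 = 65536 bits per block


def popcount(word):
    """Number of set bits in a non-negative int."""
    return bin(word).count("1")


def to_bitset(values):
    """Build a dense-block bitset (list of ints) from 16-bit values."""
    words = [0] * NWORDS
    for v in values:
        words[v >> 6] |= 1 << (v & 63)
    return words


def bitset_union_count(words_a, words_b):
    """|A | B| for two dense blocks: popcount of the word-wise OR."""
    return sum(popcount(a | b) for a, b in zip(words_a, words_b))


def sorted_union_count(xs, ys):
    """|A | B| for two positive blocks: a merge walk counting shared values once."""
    i = j = count = 0
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            i += 1
        elif xs[i] > ys[j]:
            j += 1
        else:  # value present in both blocks
            i += 1
            j += 1
        count += 1
    # whatever remains in either list is not in the other
    return count + (len(xs) - i) + (len(ys) - j)


def mixed_union_count(words, xs):
    """|A | B| for a dense block and a positive block."""
    # values from the positive block whose bit is not already set
    extra = sum(1 for x in xs if not (words[x >> 6] >> (x & 63)) & 1)
    return sum(popcount(w) for w in words) + extra


def jaccard_dist(len_a, len_b, union_len):
    """1 - |A & B| / |A | B|, with |A & B| = |A| + |B| - |A | B|."""
    if union_len == 0:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1.0 - (len_a + len_b - union_len) / union_len
```

For a whole bitmap, these per-block counts would be summed over all block keys, with a block present in only one of the two bitmaps contributing its own cardinality.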
@andreasvc, how can I generate a bitmap that uses inverted blocks? (The code is much like that for positive blocks, so it probably works already.) I've tried to follow the existing coding style, so this should eventually be suitable for a pull request.
That's great.
In the meantime I have been completely reworking the implementation to use structs and manual memory management instead of Python objects. This will hopefully make pickling more efficient and allow releasing the GIL. However, I'm running into hard-to-debug errors, so it still makes sense to merge your code.
An inverted block is created whenever a block contains 61141 or more elements. So in theory `a = RoaringBitmap(range(61141))` creates one.
However... this runs into a memory error. What is needed is a constructor that efficiently adds a range of elements; the unit tests should then use it to test inverted blocks.
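The idea behind such a range constructor can be sketched in plain Python: walk the 16-bit blocks that a range touches and record only each block's bounds, instead of handling every element individually. This is a hypothetical helper, not part of roaringbitmap. `MAX_POSITIVE = 4096` is an assumption borrowed from standard Roaring, and `MIN_INVERTED` is the cutoff stated in the comment above; the exact thresholds in the library may differ.

```python
# Hypothetical sketch: describe the integer range [start, stop) block by block
# without materializing its elements. Assumes 16-bit blocks
# (key = x >> 16, low bits = x & 0xFFFF) and the thresholds below.

BLOCKSIZE = 1 << 16
MAX_POSITIVE = 4096   # assumption: largest cardinality kept as a sorted array
MIN_INVERTED = 61141  # cutoff for inverted blocks as stated in this thread


def range_to_blocks(start, stop):
    """Yield (key, kind, cardinality, stored) for each block touched by [start, stop).

    `stored` is a list of range objects: the block's own low values for
    "positive"/"dense" blocks, or the complement (the missing low values)
    for "inverted" blocks.
    """
    if start >= stop:
        return
    for key in range(start >> 16, ((stop - 1) >> 16) + 1):
        base = key << 16
        lo = max(start, base) - base              # first low value in this block
        hi = min(stop, base + BLOCKSIZE) - base   # one past the last low value
        card = hi - lo
        if card >= MIN_INVERTED:
            kind = "inverted"
            # a contiguous range leaves at most two gaps in the block
            stored = [r for r in (range(0, lo), range(hi, BLOCKSIZE)) if r]
        else:
            kind = "dense" if card > MAX_POSITIVE else "positive"
            stored = [range(lo, hi)]
        yield key, kind, card, stored
```

A constructor built on this idea could allocate each block directly from its bounds, and the same helper would give the unit tests a cheap way to produce inverted blocks, e.g. `list(range_to_blocks(0, 61141))` yields a single inverted block whose stored complement has 65536 - 61141 values.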
Sounds good! I'll update the benchmarks and tests (only manual tests so far...), then submit a pull request (my first 😄). It should be sometime today (tonight).
Related Issues (20)
- arrays with elements <4 bytes cause MultiRoaringBitmap.jaccard_dist() seg fault
- Feature request: multi-threaded MultiRoaringBitmap e.g., jaccard_dist()
- Strange .clamp() behaviour with some ranges
- Intersection of 2 large sets causing aborts
- pickle/unpickle bug
- Bug slicing a RB into ranges
- Compute intersection for pairs within a MultiRoaringBitmap
- Strange behavior using git version of roaring bitmap
- Bug in __getitem__ with slices
- xor, difference are incorrect on large bitmaps
- Bug in difference_update
- Bug in intersection_update
- Run length encoding
- access RoaringBitmap from cython for static typing
- some specific values can reliably segfault clamp()
- MultiRoaringBitmap slicing return type
- len(bitmap) != bitmap.numelem()
- `MultiRoaringBitmap.jaccard_dist` against a query coming from an external `RoaringBitmap`
- Apple M1 does not support -march=native
- Segfault on in-place difference