ajalab / fid

FID (Fully Indexable Dictionary) implementation for Rust

License: Apache License 2.0

Rust 100.00%
succinct-data-structure fully-indexable-dictionary rust

fid's Introduction

FID

This crate provides a succinct data structure for bit vectors that supports two kinds of bit operations in constant time:

  • rank(i) computes the number of 0s (or 1s) in the half-open range [0..i)
  • select(r) returns the position of the (r+1)-th 0 (or 1).

Structures supporting these two operations are called FID (fully indexable dictionary).
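
To make the semantics of the two operations concrete, here is a naive linear-time reference sketch over a plain slice of bools. It is not how this crate works internally (the crate answers both queries in constant time on a compressed representation); the function names and signatures below are hypothetical and chosen only for illustration.

```rust
/// Number of bits equal to `bit` in positions [0, i).
/// Panics if `i` exceeds the length of `bits`.
fn rank(bits: &[bool], bit: bool, i: usize) -> usize {
    bits[..i].iter().filter(|&&b| b == bit).count()
}

/// Position of the (r+1)-th occurrence of `bit`,
/// or `None` if `bits` contains fewer than r+1 such bits.
fn select(bits: &[bool], bit: bool, r: usize) -> Option<usize> {
    bits.iter()
        .enumerate()
        .filter(|&(_, &b)| b == bit)
        .map(|(pos, _)| pos)
        .nth(r)
}
```

For the bit sequence 01101101, `rank(&bits, false, 5)` is 2 and `select(&bits, true, 2)` is `Some(4)`. The two operations are inverse in the sense that `rank(bits, bit, select(bits, bit, r))` equals `r` whenever the select succeeds.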

Usage

In your Cargo.toml

[dependencies]
fid = "0.1"

then

extern crate fid;

use fid::{BitVector, FID};

let mut bv = BitVector::new();
// 01101101
bv.push(false); bv.push(true); bv.push(true); bv.push(false);
bv.push(true); bv.push(true); bv.push(false); bv.push(true);

assert_eq!(bv.rank0(5), 2);
assert_eq!(bv.rank1(5), 3);
assert_eq!(bv.select0(2), 6);
assert_eq!(bv.select1(2), 4);

Credits

The basic compression and computation algorithms for BitVector are originally from [1], and their practical implementation techniques are from [2].

In BitVector, bits are divided into small and large blocks. Each small block is identified by a class (the number of 1s in the block) and an index within the class. Classes are stored in ceil(log(SBLOCK_WIDTH + 1)) bits. Indices are stored in ceil(log(C(SBLOCK_WIDTH, class))) bits using an enumerative code if the compressed size is less than MAX_CODE_SIZE. Otherwise, the bit pattern of the small block is stored explicitly as the index for the sake of efficiency. This idea originally comes from [2]. For each large block, we store the number of 1s up to its beginning and a pointer to the index of its first small block.
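
The class/index encoding of a small block can be sketched as follows. This is a minimal illustration of enumerative coding, not the crate's actual code: the names `binomial`, `encode`, and `decode` are hypothetical, the ordering of patterns within a class is an assumption, and the MAX_CODE_SIZE fallback is omitted. The `binomial` helper is only meant for small widths, since the intermediate product can overflow for widths close to 64.

```rust
/// Binomial coefficient C(n, k), computed iteratively.
/// Intended for small n only; the intermediate product may overflow otherwise.
fn binomial(n: u32, k: u32) -> u64 {
    if k > n {
        return 0;
    }
    let k = k.min(n - k);
    let mut result: u64 = 1;
    for j in 0..k {
        // After this step, result == C(n, j + 1), so the division is exact.
        result = result * u64::from(n - j) / u64::from(j + 1);
    }
    result
}

/// Encodes the low `width` bits of `block` as (class, index).
/// Assumes all bits at positions >= `width` are zero.
fn encode(block: u64, width: u32) -> (u32, u64) {
    let class = block.count_ones();
    let mut remaining = class; // 1s not yet seen
    let mut index = 0u64;
    for i in 0..width {
        if (block >> i) & 1 == 1 {
            // Skip every pattern that has a 0 at position i instead.
            index += binomial(width - i - 1, remaining);
            remaining -= 1;
        }
    }
    (class, index)
}

/// Reconstructs the block from (class, index); the inverse of `encode`.
fn decode(class: u32, mut index: u64, width: u32) -> u64 {
    let mut remaining = class;
    let mut block = 0u64;
    for i in 0..width {
        let zero_here = binomial(width - i - 1, remaining);
        if index >= zero_here {
            block |= 1 << i;
            index -= zero_here;
            remaining -= 1;
        }
    }
    block
}
```

With a width of 8, a block containing five 1s has class 5 and an index below C(8, 5) = 56, so the index fits in 6 bits instead of the 8 bits of the raw pattern. The saving grows for blocks that are very sparse or very dense, where C(width, class) is small.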

[1] Gonzalo Navarro and Eliana Providel. 2012. Fast, small, simple rank/select on bitmaps. In Proceedings of the 11th international conference on Experimental Algorithms (SEA'12), Ralf Klasing (Ed.). Springer-Verlag, Berlin, Heidelberg, 295-306. DOI=http://dx.doi.org/10.1007/978-3-642-30850-5_26

[2] rsdic by Daisuke Okanohara. https://github.com/hillbig/rsdic

Benchmark

Each benchmark runs 10,000 operations on bit vectors of length 1,000,000 or 100,000,000, at three densities of 1s (dense: 99%, normal: 50%, sparse: 1%).

$ rustup run nightly cargo bench
running 12 tests
test rank_100000000_dense    ... bench:     752,410 ns/iter (+/- 39,871)
test rank_100000000_normal   ... bench:     865,107 ns/iter (+/- 34,210)
test rank_100000000_sparse   ... bench:     714,583 ns/iter (+/- 17,977)
test rank_1000000_dense      ... bench:     670,544 ns/iter (+/- 18,139)
test rank_1000000_normal     ... bench:     376,054 ns/iter (+/- 8,969)
test rank_1000000_sparse     ... bench:     635,294 ns/iter (+/- 15,752)
test select_100000000_dense  ... bench:   1,026,957 ns/iter (+/- 740,011)
test select_100000000_normal ... bench:   2,193,391 ns/iter (+/- 63,561)
test select_100000000_sparse ... bench:   1,971,993 ns/iter (+/- 60,703)
test select_1000000_dense    ... bench:     805,135 ns/iter (+/- 20,085)
test select_1000000_normal   ... bench:   1,456,985 ns/iter (+/- 33,205)
test select_1000000_sparse   ... bench:   1,791,824 ns/iter (+/- 44,174)

fid's People

Contributors

ajalab, lucacappelletti94, zommiommy

fid's Issues

Would it be possible to have a `set_bit` operation on BitVector?

The particular use case I have in mind is quick parallel construction of a BitVector. Specifically, let's say we had 10 bitvecs and wanted to OR them all together to produce a single bitvec containing all the set 1s.

The way I'm currently doing that is like this:

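for i in 0..len {
    // OR bit i across all input vectors.
    let mut found = false;
    // Iterate by reference so `bitvecs` is not moved on the first pass.
    for bv in bitvecs.iter() {
        if bv.get(i) {
            found = true;
            break;
        }
    }
    new_bitvec.push(found);
}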

But unfortunately this approach is fairly slow. Ideally I could construct the new bitvector in blocks in parallel, calling set_bit within those blocks. Would something like this be possible in this crate? To summarize I'm basically looking to construct a BitVector in parallel. Thanks in advance for any pointers / ideas.
