bosque's Introduction

Bosque

"The clearest way into the Universe is through a forest wilderness." – John Muir

What is it?

Bosque (bohs-keh, Spanish for "forest") is a fast, parallel, in-place kDTree library available for Rust, C, Python, and Julia. It is intended as an improvement over FNNTW. Like FNNTW, it gets much of its performance from parallel builds and queries, but it goes further by mutating the original data (hence, in-place kDTree). A select algorithm partitions the original dataset into buckets that are contiguous within the originally allocated buffer, which gives bosque very good cache locality. This same algorithmic choice, together with a round-robin split dimension, yields a kDTree that can be stored with no additional metadata: after building a tree, you can always get the left and right sides of a tree or subtree by splitting at the median, with the split dimension determined by how many times you have split the tree in half so far. It also means that pre-built trees can be zero-copy deserialized (on and across systems with the same endianness) in a matter of nanoseconds. If only a few queries are needed, memory-mapped I/O can be used to limit the total number of bytes read from disk, giving a further speedup.
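The metadata-free layout described above can be illustrated with a minimal sketch. This is not bosque's implementation: the names build_in_place, search, and nearest are hypothetical, the sketch is single-threaded, and it recurses down to single points rather than buckets. It only shows how a select step plus a round-robin split dimension lets a query recover every subtree from slice bounds and recursion depth alone.

```rust
/// Squared Euclidean distance between two 3D points.
fn sq_dist(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    (0..3).map(|i| (a[i] - b[i]).powi(2)).sum()
}

/// Hypothetical helper (not bosque's API): reorder `points` in place so the
/// median along dimension `depth % 3` sits at the middle of every sub-slice.
fn build_in_place(points: &mut [[f64; 3]], depth: usize) {
    if points.len() <= 1 {
        return;
    }
    let dim = depth % 3; // round-robin split dimension
    let mid = points.len() / 2;
    // Selection step: afterwards points[..mid] are <= points[mid] along `dim`
    // and points[mid + 1..] are >= points[mid].
    points.select_nth_unstable_by(mid, |a, b| a[dim].total_cmp(&b[dim]));
    let (left, rest) = points.split_at_mut(mid);
    build_in_place(left, depth + 1);
    build_in_place(&mut rest[1..], depth + 1);
}

/// Recursive 1NN search. No child pointers are stored: the subtrees are
/// recovered from the slice bounds and the recursion depth alone.
fn search(
    points: &[[f64; 3]],
    offset: usize,
    depth: usize,
    query: &[f64; 3],
    best: &mut (f64, usize),
) {
    if points.is_empty() {
        return;
    }
    let dim = depth % 3;
    let mid = points.len() / 2;
    let d = sq_dist(&points[mid], query);
    if d < best.0 {
        *best = (d, offset + mid);
    }
    let delta = query[dim] - points[mid][dim];
    let (left, right) = (&points[..mid], &points[mid + 1..]);
    // Visit the side containing the query first.
    let (near, near_off, far, far_off) = if delta < 0.0 {
        (left, offset, right, offset + mid + 1)
    } else {
        (right, offset + mid + 1, left, offset)
    };
    search(near, near_off, depth + 1, query, best);
    // The far side can only help if the splitting plane is closer than the
    // best match found so far.
    if delta * delta < best.0 {
        search(far, far_off, depth + 1, query, best);
    }
}

/// Returns (squared distance, index into the reordered slice).
fn nearest(points: &[[f64; 3]], query: &[f64; 3]) -> Option<(f64, usize)> {
    if points.is_empty() {
        return None;
    }
    let mut best = (f64::INFINITY, 0);
    search(points, 0, 0, query, &mut best);
    Some(best)
}

fn main() {
    // Small deterministic data set: an 8x8x8 lattice in the unit cube.
    let mut pos: Vec<[f64; 3]> = Vec::new();
    for i in 0..8 {
        for j in 0..8 {
            for k in 0..8 {
                pos.push([i as f64 / 8.0, j as f64 / 8.0, k as f64 / 8.0]);
            }
        }
    }
    build_in_place(&mut pos, 0);
    let (d2, id) = nearest(&pos, &[0.51, 0.49, 0.13]).unwrap();
    println!("nearest: {:?} (index {id}, distance {:.3})", pos[id], d2.sqrt());
}
```

Because the built tree is nothing more than the reordered point buffer, writing that buffer to disk and reading it back is enough to restore the tree, which is what makes the zero-copy deserialization mentioned above possible.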

At present, since cosmology is the primary use case, the library only supports 3D data. However, it can easily be generalized. The author intends to make the code generic over all dimensions, but feel free to submit an issue + pull request if you want this ASAP!


Getting started

Because this is a Rust library, we show an example in Rust here. This example can be found at examples/rust/simple.rs. In the examples directory, there are examples for C, Python, and Julia.

// uses the `rand` crate.

fn main() {
    // Initialize some data in [0, 1]^3
    let mut pos: Vec<[f64; 3]> = (0..100_000)
        .map(|_| [(); 3].map(|_| rand::random::<f64>()))
        .collect();

    // Build tree in-place!
    bosque::tree::build_tree(&mut pos);

    // Query the tree
    let query = [0.5, 0.5, 0.5];
    let (dist_metric, id) = bosque::tree::nearest_one(&pos, &query);

    // With the 'sqrt-dist' feature enabled, the Euclidean distance is
    // returned; otherwise the squared Euclidean distance is returned.
    let dist = if cfg!(feature = "sqrt-dist") {
        dist_metric
    } else {
        dist_metric.sqrt()
    };

    println!("closest point is {dist:.2e} units away, corresponding to data point #{id}");
}

Performance

Using the same benchmark as in FNNTW, we extend the list. We exclude the original kiddo benchmark, as there is now a much faster version 2 of kiddo -- which we highly recommend for use cases that cannot be in-place or that require other query functionality e.g. nearest_within which bosque does not offer.

From FNNTW with minor modifications:

We use

  • A mock dataset of 100,000 uniform random points in the unit cube.
  • A query set of 1,000,000 uniform random points in the unit cube.

Over 100 realizations of the datasets and query points, the following results are obtained for the average build and 1NN query times on an AMD EPYC 7502P using 48 threads. The results are sorted by the combined build and query time.

Code                       Build (ms)   Query (ms)   Total (ms)
Bosque                     3.5          10           13.5
FNNTW                      12           22           34
pykdtree (python)          12           35           47
nabo-rs (rust)             25           30           55
Scipy's cKDTree (python)   31           38           69
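
For readers who want to collect numbers in the same shape as the table (mean build time and mean query time over several realizations), a timing harness could look like the sketch below. It is not taken from the FNNTW or bosque benchmarks: mean_build_query_ms is a hypothetical helper, and the workload in main is a stand-in (sorting integers and binary searching), not a kDTree, so its timings say nothing about any of the libraries listed.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical harness: for each realization, generate data with `setup`
/// (untimed), then time `build` and `query` separately.
/// Returns the mean (build, query) wall-clock times in milliseconds.
fn mean_build_query_ms<D, S>(
    realizations: u32,
    mut setup: impl FnMut(u32) -> D,
    mut build: impl FnMut(D) -> S,
    mut query: impl FnMut(&S),
) -> (f64, f64) {
    let mut build_total = Duration::ZERO;
    let mut query_total = Duration::ZERO;
    for seed in 0..realizations {
        let data = setup(seed);

        let start = Instant::now();
        let built = build(data);
        build_total += start.elapsed();

        let start = Instant::now();
        query(&built);
        query_total += start.elapsed();
    }
    let n = f64::from(realizations.max(1));
    (
        build_total.as_secs_f64() * 1e3 / n,
        query_total.as_secs_f64() * 1e3 / n,
    )
}

fn main() {
    // Stand-in workload, NOT a kDTree: "build" sorts integers and "query"
    // runs binary searches. Swap in the library calls you want to compare.
    // 10 realizations keep the demo short; the text above uses 100.
    let (build_ms, query_ms) = mean_build_query_ms(
        10,
        |seed: u32| -> Vec<u64> {
            (0..100_000u64)
                .map(|i| i.wrapping_mul(6364136223846793005).wrapping_add(u64::from(seed)))
                .collect()
        },
        |mut data: Vec<u64>| {
            data.sort_unstable();
            data
        },
        |sorted: &Vec<u64>| {
            for key in 0..1_000_000u64 {
                // black_box keeps the compiler from discarding the lookups.
                black_box(sorted.binary_search(&key).is_ok());
            }
        },
    );
    println!(
        "build {build_ms:.2} ms, query {query_ms:.2} ms, total {:.2} ms",
        build_ms + query_ms
    );
}
```

Keeping data generation outside the timed regions matters here, since generating 100,000 random points can take a comparable amount of time to a fast build.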

bosque's People

Contributors

cavemanloverboy, yipihey

bosque's Issues

Zero-Cost monomorphization

The generic branch contains code to unify the kdtree library under a single module that is generic over CP32, f32, and f64. However, in the presence of generics, the monomorphized versions of the build and query functions appear to be slower than the explicitly typed ones. This is true even for the build function into_tree, which has only a simple CP32 -> T: PartialOrd bound, as that is the only thing required to build the tree.

We must investigate why this is the case.

Release

Would it be possible to get a release of the current master? The tree construction is quite different compared to the last release on crates.io, and I have a dependent crate which I'd like to go up (bosque is one of several optional backends) but can't without a release.

Tree.query fails in Python interface

Hi! I just tried taking bosque for a spin in Python and ran into an issue:

>>> from bosque_py import Tree
>>> import numpy as np
>>> DATA = 100_000
>>> QUERY = 1_000_000

>>> data = np.random.uniform(size=(DATA, 3))
>>> query = np.random.uniform(size=(QUERY, 3))

>>> tree = Tree(data)
>>> tree.query(query, 1)
thread '<unnamed>' panicked at /Users/philipps/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ndarray-0.15.6/src/impl_constructors.rs:516:9:
assertion failed: dimension::can_index_slice(&v, &dim, &strides).is_ok()
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[2], line 10
      7 query = np.random.uniform(size=(QUERY, 3))
      9 tree = Tree(data)
---> 10 tree.query(query, 1)

PanicException: assertion failed: dimension::can_index_slice(&v, &dim, &strides).is_ok()

This is with commit a8f8fca of bosque, rustc 1.73.0, maturin 1.3.0 and numpy 1.26.1.

Any ideas?

Add comparison against kd-tree

The somewhat older kd_tree crate has the mentioned optimisations as well (it uses quickselect and has a parallel build via par_build_by_ordered_float), so it could be interesting to compare its performance to this implementation, but I noticed it's currently missing from the README.

(Originally posted at cavemanloverboy/FNNTW#8)

value incorrect

Great fast kdtree implementation!

However, in the Python API, tree = Tree(X) sorts the data X in place, which results in wrong indices.

Please check.
