Supercluster.KDTree

This is a KD-Tree written entirely in C#. The project started as a fork of the KD-Tree written by CodeandCats; however, its structure and intent have changed drastically since then.

About the Project

This KD-Tree is optimized for machine learning applications, but it can be used for less intensive purposes as well. (In fact, I am writing it for my machine learning library, Supercluster.) In machine learning, data sets are often built, rebuilt, and built again. Machine learning algorithms also need fast lookups, while slower construction is more acceptable. The tree has been designed with this philosophy in mind. General characteristics are:

  • The tree is extremely fast for search.

    • For 10,000 3-nearest-neighbor searches on a 1,000,000-node, 2-dimensional tree of floats, it is about 7.5 times faster than the KD-Tree by CodeandCats (figure based on a 1,000,000-sample independent t-test for comparing means, equal variance assumed).
    • The nearest-neighbor list is a custom data structure (called a BoundedPriorityList) that remains sorted and has O(log n) insert. In practice it is often much faster than O(log n), because an item is ignored if it is larger than the list's current maximum element.
    • Uses .NET's aggressive inlining hint (MethodImplOptions.AggressiveInlining) where appropriate.
    • The KD-Tree is implemented as an array. Index arithmetic is used to traverse nodes. This is faster than traversing node objects (only slightly, though the difference is statistically significant). It also lowers memory usage.
    • The tree is built in the standard way using an exact median-finding algorithm. This is slower than the quickselect or median-of-medians algorithm but is more likely to create a balanced tree, which keeps search time as close to O(log n) as possible.
  • There is no delete method. If you want to change the tree, rebuild it. Many KD-Tree implementations simply rebuild the tree to "balance" it after a deletion, because balancing a KD-Tree is much more complicated than balancing an AVL or Red-Black tree. Adaptive KD-Trees that auto-balance do exist; look one up if you need it.

  • There is no node object used in the KDTree class, but there is a NodeNavigator class that lets you traverse the tree (or any array) using the familiar left, right, and parent properties of a node.

  • The tree is generic. Only IComparable<T> is required.

  • The tree requires a metric (a distance function), supplied as a Func. KD-Trees are spatial data structures, and a metric function is all that is needed to implicitly define the metric space in which the KD-Tree lives.

  • The code is unit tested and well documented. StyleCop, unit tests, wiki tutorials, and MSDN-style docs: it's all here.
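
The bounded nearest-neighbor list described above can be sketched roughly as follows. This is a minimal illustration, not the project's BoundedPriorityList: the class name BoundedSortedList, its members, and the use of double priorities are all assumptions made for the example. It keeps at most `capacity` items sorted by distance and rejects a candidate outright when the list is full and the candidate is no closer than the current maximum.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a bounded, always-sorted neighbor list.
// Names and signatures are illustrative, not the project's API.
public sealed class BoundedSortedList<TItem>
{
    private readonly int capacity;
    private readonly List<double> priorities = new List<double>();
    private readonly List<TItem> items = new List<TItem>();

    public BoundedSortedList(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
    }

    public int Count => items.Count;
    public bool IsFull => items.Count == capacity;

    // Largest priority currently stored (the list is sorted ascending).
    public double MaxPriority => priorities[priorities.Count - 1];

    public TItem this[int index] => items[index];

    // Returns false when the candidate is rejected without touching the list.
    public bool TryAdd(TItem item, double priority)
    {
        // Fast path: a full list ignores anything no better than its current maximum.
        if (IsFull && priority >= MaxPriority) return false;

        // Binary search finds the insertion index in O(log n) comparisons.
        int index = priorities.BinarySearch(priority);
        if (index < 0) index = ~index;

        priorities.Insert(index, priority);
        items.Insert(index, item);

        // Drop the worst element if the insert pushed the list over capacity.
        if (items.Count > capacity)
        {
            priorities.RemoveAt(capacity);
            items.RemoveAt(capacity);
        }

        return true;
    }
}
```

Note that List<T>.Insert shifts elements, so only the position lookup is O(log n) in this sketch. For the small k typical of k-nearest-neighbor queries the shift is cheap, and the early rejection means most candidates never reach it.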
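
The array-backed layout and exact-median construction mentioned above might look something like the sketch below. It assumes a heap-style layout in which the children of index i sit at 2i + 1 and 2i + 2; the text above does not state the project's exact convention, and KdArraySketch, Build, Left, Right, and Parent are hypothetical names. A full sort is used here as one simple way to obtain an exact median.

```csharp
using System;
using System.Linq;
using System.Runtime.CompilerServices;

// Hypothetical sketch: names and layout are illustrative, not the project's API.
public static class KdArraySketch
{
    // Heap-style index arithmetic replaces node objects and child pointers.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Left(int index) => 2 * index + 1;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Right(int index) => 2 * index + 2;

    // Returns -1 for the root, which has no parent.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Parent(int index) => index == 0 ? -1 : (index - 1) / 2;

    // Builds the implicit tree; slots with no point stay null.
    public static double[][] Build(double[][] points, int dimensions)
    {
        // Splitting at the median gives a tree of depth floor(log2(n)) + 1.
        int depth = 0;
        for (int n = points.Length; n > 0; n /= 2) depth++;

        var tree = new double[(1 << depth) - 1][];
        Fill(tree, 0, points, 0, dimensions);
        return tree;
    }

    private static void Fill(double[][] tree, int index, double[][] points, int axis, int dimensions)
    {
        if (points.Length == 0) return;

        // A full sort along the current axis yields the exact median.
        double[][] sorted = points.OrderBy(p => p[axis]).ToArray();
        int median = sorted.Length / 2;
        tree[index] = sorted[median];

        // Cycle through the axes as the tree gets deeper.
        int nextAxis = (axis + 1) % dimensions;
        Fill(tree, Left(index), sorted.Take(median).ToArray(), nextAxis, dimensions);
        Fill(tree, Right(index), sorted.Skip(median + 1).ToArray(), nextAxis, dimensions);
    }
}
```

For the six points (2,3), (5,4), (9,6), (4,7), (8,1), (7,2), calling KdArraySketch.Build(points, 2) produces a seven-slot array whose root holds (7,2) and whose children at indices 1 and 2 hold (5,4) and (9,6). Re-sorting at every level is slow, which matches the trade-off described above: construction cost is accepted in exchange for a balanced tree.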
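
To illustrate the metric requirement, here are two distance functions written as delegates. The exact Func signature the tree expects is not shown above, so Func<float[], float[], double> is an assumption, as are the names MetricSketch, Euclidean, and Manhattan.

```csharp
using System;

// Hypothetical helpers; the delegate signature is assumed for illustration.
public static class MetricSketch
{
    // Euclidean (L2) distance between two points of equal dimension.
    public static readonly Func<float[], float[], double> Euclidean = (a, b) =>
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++)
        {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.Sqrt(sum);
    };

    // Manhattan (L1) distance: swapping the delegate changes the metric space.
    public static readonly Func<float[], float[], double> Manhattan = (a, b) =>
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++)
        {
            sum += Math.Abs(a[i] - b[i]);
        }
        return sum;
    };
}
```

Passing a different delegate is all it takes to change which points count as "nearest"; the tree's structure stays the same.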

Documentation and Tutorial:

Special Thanks

  • Thanks to CodeandCats for the original implementation that this was based on. I had fun tearing apart and reading your code. ;)
  • Thanks to Prof. Hanan Samet for writing an amazing book on spatial and metric data structures. The book provided much insight and knowledge.
  • Thanks to César Souza for your work on machine learning for .NET. It has inspired me to try and do better!
  • Also a small thanks to BlueRaja. While I didn't use any of your code, your high-speed priority queue inspired me to write my own custom data structure for the nearest-neighbor list, which turned out to be way faster than any "off the shelf" solution.
