GithubHelp home page GithubHelp logo

pombredanne / fast_succinct_trie Goto Github PK

View Code? Open in Web Editor NEW

This project forked from kampersanda/fast_succinct_trie

0.0 1.0 0.0 499 KB

String map implementation through Fast Succinct Trie

License: Apache License 2.0

CMake 2.40% C++ 97.29% Shell 0.31%

fast_succinct_trie's Introduction

fast_succinct_trie

experimental

This library provides a string map through Fast Succinct Trie (FST), proposed in SIGMOD 2018. The library is implemented by modifying the original FST implementation efficient/SuRF and applying a compact minimal-prefix trie form to simulate a trie-based string map (c.f. Section 2.2 of KAIS 2017).

What is FST?

FST is a succinct trie data structure proposed in the paper,

Zhang, Lim, Leis, Andersen, Kaminsky, Keeton and Pavlo: SuRF: Practical Range Query Filtering with Fast Succinct Trie, In SIGMOD 2018, pp. 323-336.

Briefly, FST is a practical variant of LOUDS-trie. FST uses two LOUDS implementations: one is fast and the other is space-efficient. FST partitions a trie into two layers at a level and applies the fast one to the top layer and the space-efficient one to the bottom layer.

More specific explanations can be found in the slide of the author.

Since FST was developed for succinct range query filtering, the original implementation efficient/SuRF allows us to include false positives in query solutions. The library kampersanda/fast_succinct_trie modifies it and provides a string map based on FST.

Install

This library consists of only header files. Please through the path to the directory include.

Build instructions

You can download and compile this library as the following commands:

$ git clone --recursive https://github.com/kampersanda/fast_succinct_trie.git
$ cd fast_succinct_trie
$ mkdir build && cd build
$ cmake .. -DFST_ENABLE_BENCH=ON
$ make

Compiling third-party libraries at bench will be heavy. Please set -DFST_ENABLE_BENCH=OFF if you do not need to run benchmarks.

Sample usage

#include <fstream>
#include <iostream>

#include <fst.hpp>

int main() {
    std::vector<std::string> keys = {
        "ACML",  "AISTATS", "DS",    "DSAA",   "ICDM",   "ICML",  //
        "PAKDD", "SDM",     "SIGIR", "SIGKDD", "SIGMOD",
    };

    // a trie-index constructed from string keys sorted
    fst::Trie trie(keys);

    // keys are mapped to unique integers in the range [0,#keys)
    std::cout << "[searching]" << std::endl;
    for (size_t i = 0; i < keys.size(); ++i) {
        fst::position_t key_id = trie.exactSearch(keys[i]);
        std::cout << " - " << keys[i] << ": " << key_id << std::endl;
    }

    std::cout << "[statistics]" << std::endl;
    std::cout << " - number of keys: " << trie.getNumKeys() << std::endl;
    std::cout << " - number of nodes: " << trie.getNumNodes() << std::endl;
    std::cout << " - number of suffix bytes: " << trie.getSuffixBytes() << std::endl;
    std::cout << " - memory usage in bytes: " << trie.getMemoryUsage() << std::endl;
    std::cout << " - output file size in bytes: " << trie.getSizeIO() << std::endl;

    std::cout << "[configure]" << std::endl;
    trie.debugPrint(std::cout);

    // write the trie-index to a file
    {
        std::ofstream ofs("fst.idx");
        trie.save(ofs);
    }

    // read the trie-index from a file
    {
        fst::Trie other;
        std::ifstream ifs("fst.idx");
        other.load(ifs);
    }

    std::remove("fst.idx");
    return 0;
}

The output will be

[searching]
 - ACML: 1
 - AISTATS: 2
 - DS: 4
 - DSAA: 5
 - ICDM: 6
 - ICML: 7
 - PAKDD: 0
 - SDM: 3
 - SIGIR: 8
 - SIGKDD: 9
 - SIGMOD: 10
[statistics]
 - number of keys: 11
 - number of nodes: 19
 - number of suffix bytes: 24
 - memory usage in bytes: 587
 - output file size in bytes: 312
[configure]
-- LoudsDense (heigth=1) --
LABEL: A D I P S | 
CHILD: 1 1 1 0 1 | 
PREFX: 0         |
-- LoudsSparse --
LABEL: C I S C D I ? A D M G I K M ? 
CHILD: 0 0 1 1 0 1 0 0 0 0 1 0 0 0 
LOUDS: 1 0 1 1 1 0 1 0 1 0 1 1 0 0 
-- Suffixes --
POINTERS: 17 11 1 9 0 22 9 12 7 19 14 
SUFFIXES: ? S T A T S ? R ? M ? M L ? O D ? A K D D ? A ?  

Todo

  • Support more operations
  • Implement a normal trie form also

Licensing

This library is free software provided under Apache License 2.0, following the License of efficient/SuRF. The modifications are shown in each source file.

fast_succinct_trie's People

Contributors

kampersanda avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.