GithubHelp home page GithubHelp logo

nicolasvidal / trie-rs Goto Github PK

View Code? Open in Web Editor NEW

This project forked from laysakura/trie-rs

0.0 1.0 0.0 4.55 MB

Memory efficient trie (prefix tree) library based on LOUDS

Home Page: https://crates.io/crates/trie-rs

License: Apache License 2.0

Rust 97.59% Shell 2.41%

trie-rs's Introduction

trie-rs

Memory efficient trie (prefix tree) library based on LOUDS.

Master API Docs | Released API Docs | Benchmark Results | Changelog

Build Status Crates.io Version Crates.io Downloads Minimum rustc version License: MIT License: Apache 2.0

Quickstart

To use trie-rs, add the following to your Cargo.toml file:

[dependencies]
trie-rs = "0.1"  # NOTE: Replace to latest minor version.

Usage Overview

use std::str;
use trie_rs::TrieBuilder;

let mut builder = TrieBuilder::new();  // Inferred `TrieBuilder<u8>` automatically
builder.push("すし");
builder.push("すしや");
builder.push("すしだね");
builder.push("すしづめ");
builder.push("すしめし");
builder.push("すしをにぎる");
builder.push("すし");  // Word `push`ed twice is just ignored.
builder.push("🍣");

let trie = builder.build();

// exact_match(): Find a word exactly match to query.
assert_eq!(trie.exact_match("すし"), true);
assert_eq!(trie.exact_match("🍣"), true);
assert_eq!(trie.exact_match("🍜"), false);

// predictive_search(): Find words which include `query` as their prefix.
let results_in_u8s: Vec<Vec<u8>> = trie.predictive_search("すし");
let results_in_str: Vec<&str> = results_in_u8s
    .iter()
    .map(|u8s| str::from_utf8(u8s).unwrap())
    .collect();
assert_eq!(
    results_in_str,
    vec![
        "すし",
        "すしだね",
        "すしづめ",
        "すしめし",
        "すしや",
        "すしをにぎる"
    ]  // Sorted by `Vec<u8>`'s order
);

// common_prefix_search(): Find words which is included in `query`'s prefix.
let results_in_u8s: Vec<Vec<u8>> = trie.common_prefix_search("すしや");
let results_in_str: Vec<&str> = results_in_u8s
    .iter()
    .map(|u8s| str::from_utf8(u8s).unwrap())
    .collect();
assert_eq!(
    results_in_str,
    vec![
        "すし",
        "すしや",
    ]  // Sorted by `Vec<u8>`'s order
);

Using with Various Data Types

TrieBuilder is implemented using generic type like following:

impl<Label: Ord + Clone> TrieBuilder<Label> {
    ...
    pub fn push<Arr: AsRef<[Label]>>(&mut self, word: Arr) { ... }
    ...
}

In the above Usage Overview example, we used Label=u8, Arr=&str.

Here shows other Label and Arr type examples.

Label=&str, Arr=Vec<&str>

Say Label is English words and Arr is English phrases.

use trie_rs::TrieBuilder;

let mut builder = TrieBuilder::new();
builder.push(vec!["a", "woman"]);
builder.push(vec!["a", "woman", "on", "the", "beach"]);
builder.push(vec!["a", "woman", "on", "the", "run"]);

let trie = builder.build();

assert_eq!(
    trie.exact_match(vec!["a", "woman", "on", "the", "beach"]),
    true
);
assert_eq!(
    trie.predictive_search(vec!["a", "woman", "on"]),
    vec![
        ["a", "woman", "on", "the", "beach"],
        ["a", "woman", "on", "the", "run"],
    ],
);
assert_eq!(
    trie.common_prefix_search(vec!["a", "woman", "on", "the", "beach"]),
    vec![vec!["a", "woman"], vec!["a", "woman", "on", "the", "beach"]],
);

Label=u8, Arr=[u8; n]

Say Label is a digit in Pi (= 3.14...) and Arr is a window to separate pi's digit by 10.

use trie_rs::TrieBuilder;

let mut builder = TrieBuilder::<u8>::new(); // Pi = 3.14...

builder.push([1, 4, 1, 5, 9, 2, 6, 5, 3, 5]);
builder.push([8, 9, 7, 9, 3, 2, 3, 8, 4, 6]);
builder.push([2, 6, 4, 3, 3, 8, 3, 2, 7, 9]);
builder.push([6, 9, 3, 9, 9, 3, 7, 5, 1, 0]);
builder.push([5, 8, 2, 0, 9, 7, 4, 9, 4, 4]);
builder.push([5, 9, 2, 3, 0, 7, 8, 1, 6, 4]);
builder.push([0, 6, 2, 8, 6, 2, 0, 8, 9, 9]);
builder.push([8, 6, 2, 8, 0, 3, 4, 8, 2, 5]);
builder.push([3, 4, 2, 1, 1, 7, 0, 6, 7, 9]);
builder.push([8, 2, 1, 4, 8, 0, 8, 6, 5, 1]);
builder.push([3, 2, 8, 2, 3, 0, 6, 6, 4, 7]);
builder.push([0, 9, 3, 8, 4, 4, 6, 0, 9, 5]);
builder.push([5, 0, 5, 8, 2, 2, 3, 1, 7, 2]);
builder.push([5, 3, 5, 9, 4, 0, 8, 1, 2, 8]);

let trie = builder.build();

assert_eq!(trie.exact_match([5, 3, 5, 9, 4, 0, 8, 1, 2, 8]), true);
assert_eq!(
    trie.predictive_search([3]),
    vec![
        [3, 2, 8, 2, 3, 0, 6, 6, 4, 7],
        [3, 4, 2, 1, 1, 7, 0, 6, 7, 9],
    ],
);
assert_eq!(
    trie.common_prefix_search([1, 4, 1, 5, 9, 2, 6, 5, 3, 5]),
    vec![[1, 4, 1, 5, 9, 2, 6, 5, 3, 5]],
);

Features

  • Generic type support: As the above examples show, trie-rs can be used for searching not only UTF-8 string but also other data types.
  • Based on louds-rs, which is fast, parallelized, and memory efficient.
  • Latest benchmark results are always accessible: trie-rs is continuously benchmarked in Travis CI using Criterion.rs. Graphical benchmark results are published here.

Acknowledgments

edict.furigana is used for benchmark. This file is constructed in the following step:

  1. Download edict.gz from EDICT.
  2. Convert it from original EUC into UTF-8.
  3. Translate it into CSV file with edict-to-csv.
  4. Extract field $1 for Hiragana/Katakana words, and field $3 for other (like Kanji) words.
  5. Translate Katakana into Hiragana with kana2hira.

Many thanks for these dictionaries and tools.

Versions

trie-rs uses semantic versioning.

Since current major version is 0, minor version update might involve breaking public API change (although it is carefully avoided).

Rust Version Supports

trie-rs is continuously tested with these Rust versions in Travis CI:

  • 1.33.0
  • Latest stable version

So it expectedly works with Rust 1.33.0 and any newer versions.

Older versions may also work, but are not tested or guaranteed.

Contributing

Any kind of pull requests are appreciated.

Guidelines

  • README.md is generated from $ cargo readme command. Do not manually update README.md but edit src/lib.rs and then $ cargo readme > README.md.
  • Travis CI automatically does the following commit & push to your pull-requests:
    • $ cargo readme > README.md
    • $ cargo fmt --all

License

MIT OR Apache-2.0

trie-rs's People

Contributors

laysakura avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.