GithubHelp home page GithubHelp logo

bmatheson / whatlang-rs Goto Github PK

View Code? Open in Web Editor NEW

This project forked from greyblake/whatlang-rs

0.0 1.0 0.0 1.11 MB

Natural language detection library for Rust. Try demo online: https://www.greyblake.com/whatlang/

Home Page: https://www.greyblake.com/whatlang/

License: MIT License

Ruby 0.33% Rust 99.24% HTML 0.43%

whatlang-rs's Introduction

Whatlang

Build Status License Documentation Demo

Natural language detection for Rust with focus on simplicity and performance.

Features

  • Supports 87 languages
  • 100% written in Rust
  • Lightweight, fast and simple
  • Recognizes not only a language, but also a script (Latin, Cyrillic, etc)
  • Provides reliability information

Get started

Add to you Cargo.toml:

[dependencies]

whatlang = "0.11.1"

Example:

extern crate whatlang;

use whatlang::{detect, Lang, Script};

fn main() {
    let text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!";

    let info = detect(text).unwrap();
    assert_eq!(info.lang(), Lang::Epo);
    assert_eq!(info.script(), Script::Latin);
    assert_eq!(info.confidence(), 1.0);
    assert!(info.is_reliable());
}

For more details (e.g. how to blacklist some languages) please check the documentation.

Requirements

The latest whatlang library works with rust 1.31.0 or higher.

How does it work?

How does the language recognition work?

The algorithm is based on the trigram language models, which is a particular case of n-grams. To understand the idea, please check the original whitepaper Cavnar and Trenkle '94: N-Gram-Based Text Categorization'.

How is_reliable calculated?

It is based on the following factors:

  • How many unique trigrams are in the given text
  • How big is the difference between the first and the second(not returned) detected languages? This metric is called rate in the code base.

Therefore, it can be presented as 2d space with threshold functions, that splits it into "Reliable" and "Not reliable" areas. This function is a hyperbola and it looks like the following one:

Language recognition whatlang rust

For more details, please check a blog article Introduction to Rust Whatlang Library and Natural Language Identification Algorithms.

Running benchmarks

This is mostly useful to test performance optimizations.

cargo bench

Comparison with alternatives

Whatlang CLD2 CLD3
Implementation language Rust C++ C++
Languages 87 83 107
Algorithm trigrams quadgrams neural network
Supported Encoding UTF-8 UTF-8 ?
HTML support no yes ?

Ports and clones

Derivation

Whatlang is a derivative work from Franc (JavaScript, MIT) by Titus Wormer.

License

MIT © Sergey Potapov

Contributors

whatlang-rs's People

Contributors

greyblake avatar dr-emann avatar elegaanz avatar xrmx avatar batisteo avatar darccio avatar kitaitimakoto avatar ohsayan avatar s3bk avatar turingzhu avatar godofdream avatar kud1ing avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.