This project forked from wooorm/franc

Detect the language of text

Home Page: http://wooorm.github.io/franc/

License: MIT License

JavaScript 96.18% Shell 3.82%

franc

Detect the language of text.

What’s so cool about franc?

  1. franc supports more languages (†) than any other library, or Google;
  2. franc is easily forked to support 335 languages;
  3. franc is just as fast as the competition.

† - If humans write in the language, on the web, and the language has more than one million speakers, franc detects it.

Installation

npm:

$ npm install franc

Component:

$ component install wooorm/franc

Bower:

$ bower install franc

Usage

var franc = require('franc');

franc('Alle menslike wesens word vry'); // "afr"
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট'); // "ben"
franc('Alle mennesker er født frie og'); // "nno"
franc(''); // "und"

franc.all('O Brasil caiu 26 posições em');
/*
 * [
 *   [ 'por', 1 ],
 *   [ 'glg', 0.7362599377808503 ],
 *   [ 'src', 0.7286553750432078 ],
 *   [ 'lav', 0.6944348427238161 ],
 *   [ 'cat', 0.6802627030763913 ],
 *   [ 'spa', 0.6633252678880055 ],
 *   [ 'bos', 0.6536467334946423 ],
 *   [ 'tpi', 0.6477704804701002 ],
 *   [ 'hrv', 0.6456965088143796 ],
 *   [ 'snn', 0.6374006221914967 ],
 *   [ 'bam', 0.5900449360525406 ],
 *   [ 'sco', 0.5893536121673004 ],
 *   ...
 * ]
 */

/* "und" is returned for too-short input: */
franc.all(''); // [ [ 'und', 1 ] ]

/* Provide a whitelist: */
franc.all('O Brasil caiu 26 posições em', {
    'whitelist' : ['por', 'src', 'glg', 'spa']
});
/*
 * [
 *   [ 'por', 1 ],
 *   [ 'glg', 0.7362599377808503 ],
 *   [ 'src', 0.7286553750432078 ],
 *   [ 'spa', 0.6633252678880055 ]
 * ]
 */

/* Provide a blacklist: */
franc.all('O Brasil caiu 26 posições em', {
    'blacklist' : ['src', 'glg', 'lav']
});
/*
 * [
 *   [ 'por', 1 ],
 *   [ 'cat', 0.6802627030763913 ],
 *   [ 'spa', 0.6633252678880055 ],
 *   [ 'bos', 0.6536467334946423 ],
 *   [ 'tpi', 0.6477704804701002 ],
 *   [ 'hrv', 0.6456965088143796 ],
 *   [ 'snn', 0.6374006221914967 ],
 *   [ 'bam', 0.5900449360525406 ],
 *   [ 'sco', 0.5893536121673004 ],
 *   ...
 * ]
 */
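
The `[code, score]` pairs returned by `franc.all` can be post-processed without calling franc again. The sketch below is not part of franc: `pickLanguage` and its `minGap` parameter are hypothetical names, and it assumes the list is sorted by descending score, as in the output above. It falls back to "und" when the two best candidates score closer together than `minGap`.

```javascript
/* Hypothetical helper, not part of franc's API. */
function pickLanguage(results, minGap) {
    var gap;

    if (!results.length) {
        return 'und';
    }

    if (results.length === 1) {
        return results[0][0];
    }

    /* Distance between the best and the second-best candidate. */
    gap = results[0][1] - results[1][1];

    return gap >= minGap ? results[0][0] : 'und';
}

/* First entries of the `franc.all` output shown above. */
var results = [
    ['por', 1],
    ['glg', 0.7362599377808503],
    ['src', 0.7286553750432078]
];

pickLanguage(results, 0.2); // "por"
pickLanguage(results, 0.3); // "und"
```

A larger `minGap` trades coverage for certainty, which may matter for short input where closely related languages score alike.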

CLI

Install:

$ npm install --global franc

Use:

Usage: franc [options] string

Detect the language of text

Options:

  -h, --help                    output usage information
  -v, --version                 output version number
  -w, --whitelist <string>      allow languages
  -b, --blacklist <string>      disallow languages

Usage:

# output language of value
$ franc "Alle menslike wesens word vry"
# afr

# output language from stdin
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | franc
# ben

# blacklist certain languages
$ franc --blacklist por,glg "O Brasil caiu 26 posições em"
# src

# whitelist certain languages and use stdin
$ echo "Alle mennesker er født frie og" | franc --whitelist nob,dan
# nob
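
How a comma-separated `--blacklist` value could narrow a result list is sketched below. This is an assumption about the behaviour, not franc's CLI source: `parseList` and `applyBlacklist` are hypothetical names, and the scores are the first three entries of the `franc.all` output in the Usage section.

```javascript
/* Hypothetical helpers, not franc's CLI source. */
function parseList(value) {
    /* Split "por,glg" into codes and drop empty entries. */
    return value.split(',').map(function (code) {
        return code.trim();
    }).filter(Boolean);
}

function applyBlacklist(results, blacklist) {
    return results.filter(function (pair) {
        return blacklist.indexOf(pair[0]) === -1;
    });
}

var results = [
    ['por', 1],
    ['glg', 0.7362599377808503],
    ['src', 0.7286553750432078]
];

var remaining = applyBlacklist(results, parseList('por,glg'));

remaining[0][0]; // "src"
```

The sketch only looks at language codes; whether franc rescales the scores of the remaining candidates is not covered here.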

Supported languages

franc supports 175 “languages”. For a complete list, check out Supported-Languages.md.

Supporting more or fewer languages

Supporting more or fewer languages is easy: fork the project and run the following:

$ npm install # Install development dependencies.
$ THRESHOLD=100000 npm run build # Run the `build` script with an environment variable.

The above would create a version of franc with support for any language with 100,000 or more speakers. To support all languages, even dead ones like Latin, specify -1.
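
The effect of such a threshold can be sketched as follows. This is not franc's build script: `selectLanguages` is a hypothetical name, the language codes and speaker counts are made-up placeholders, and treating -1 as "keep everything" simply mirrors the description above.

```javascript
/* Hypothetical sketch; franc's real build script may work differently. */
function selectLanguages(speakers, threshold) {
    return Object.keys(speakers).filter(function (code) {
        /* -1 disables the check, so languages without speakers are kept. */
        return threshold === -1 || speakers[code] >= threshold;
    });
}

/* Placeholder codes and counts for illustration only, not real data. */
var speakers = {
    'aaa': 5000000,
    'bbb': 250000,
    'ccc': 0
};

selectLanguages(speakers, 1000000); // [ 'aaa' ]
selectLanguages(speakers, 100000); // [ 'aaa', 'bbb' ]
selectLanguages(speakers, -1); // [ 'aaa', 'bbb', 'ccc' ]
```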

Benchmark

On a MacBook Air, franc processes a set of 175 paragraphs 2 times per second (350 paragraphs per second in total).

         benchmarks * 175 paragraphs in different languages
  2 op/s » franc -- this module
  2 op/s » guesslanguage
  2 op/s » languagedetect
  2 op/s » vac

(I’ll work on a better benchmark soon)

Derivation

Franc is a derivative work of guess-language (Python, LGPL), guesslanguage (C++, LGPL), and Language::Guess (Perl, GPL). Their creators granted me the rights to distribute franc under the MIT license: respectively, Kent S. Johnson, Jacob R. Rideout, and Maciej Ceglowski.

License

MIT © Titus Wormer

Contributors

jeffhuys, kamilbielawski, wooorm
