This project forked from whitepages/fuzzy-string-match


What is fuzzy-string-match

  • fuzzy-string-match is a fuzzy string matching library for Ruby.
  • It is fast (written in C with RubyInline).
  • It supports only the Jaro-Winkler distance algorithm.
  • It was ported by hand from lucene-3.0.2 (Lucene is a Java product).
  • If you want to add another string distance algorithm, please fork the project on GitHub and port it yourself.
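
For readers unfamiliar with the algorithm, here is a minimal pure Ruby sketch of the textbook Jaro-Winkler score. It is an illustration only, not the gem's source: the method name `jaro_winkler_sketch` and its keyword parameters are hypothetical, and the Lucene-derived implementation in the gem may differ in details. The result is a similarity in the range 0.0 to 1.0, where 1.0 means the strings are identical.

```ruby
# Textbook Jaro-Winkler similarity; an illustration, not the gem's source.
# The method name and keyword parameters are hypothetical.
def jaro_winkler_sketch(s1, s2, prefix_scale: 0.1, max_prefix: 4)
  a = s1.chars
  b = s2.chars
  return 1.0 if a == b
  return 0.0 if a.empty? || b.empty?

  # Characters match only if they are equal and at most `window` positions apart.
  window = [[a.length, b.length].max / 2 - 1, 0].max
  a_matched = Array.new(a.length, false)
  b_matched = Array.new(b.length, false)
  matches = 0

  a.each_with_index do |ch, i|
    lo = [i - window, 0].max
    hi = [i + window, b.length - 1].min
    (lo..hi).each do |j|
      next if b_matched[j] || b[j] != ch
      a_matched[i] = true
      b_matched[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches.zero?

  # Count matched characters that line up in a different order, then halve.
  a_seq = a.each_index.select { |i| a_matched[i] }.map { |i| a[i] }
  b_seq = b.each_index.select { |j| b_matched[j] }.map { |j| b[j] }
  transpositions = a_seq.zip(b_seq).count { |x, y| x != y } / 2

  m = matches.to_f
  jaro = (m / a.length + m / b.length + (m - transpositions) / m) / 3.0

  # Winkler bonus: reward a shared prefix of up to `max_prefix` characters.
  prefix = a.zip(b).take(max_prefix).take_while { |x, y| x == y }.length
  jaro + prefix * prefix_scale * (1.0 - jaro)
end

p jaro_winkler_sketch("dixon", "dicksonx")
```

Because the sketch works on `String#chars`, it handles multi-byte characters, at the cost of the speed that a C implementation provides.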

Why I developed fuzzy-string-match

  • I tried amatch-0.2.5, but it had some issues:
    1. It leaked memory.
    2. I found it difficult to maintain.
  • So I decided to create another gem by porting lucene-3.0.x.

Installing

gem install fuzzy-string-match

Installing (pure ruby version)

gem install fuzzy-string-match_pure

Features

  • Calculates the Jaro-Winkler distance between two strings.
    • The pure Ruby version can handle both ASCII and UTF-8 strings (but it is slow).
    • The native version can handle only ASCII strings (but it is fast).
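
Given that trade-off, one possible approach is to choose the backend per comparison. The sketch below assumes a hypothetical helper, `backend_for`, which is not part of the gem. It only returns the symbol that would be passed to `FuzzyStringMatch::JaroWinkler.create`, using Ruby's built-in `String#ascii_only?`.

```ruby
# Hypothetical helper (not part of the gem): pick the backend symbol to pass
# to FuzzyStringMatch::JaroWinkler.create, based on the input strings.
def backend_for(*strings)
  # The native version handles only ASCII, so fall back to :pure otherwise.
  strings.all?(&:ascii_only?) ? :native : :pure
end

p backend_for("jones", "johnson")  # ASCII-only input
p backend_for("ああ", "あい")        # non-ASCII input
```

With the gem installed, the returned symbol could then be used as `FuzzyStringMatch::JaroWinkler.create( backend_for( a, b ) )`.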

Sample code

Native version

require 'fuzzystringmatch'
jarow = FuzzyStringMatch::JaroWinkler.create( :native )
p jarow.getDistance(  "jones",      "johnson" )

Pure ruby version

require 'fuzzystringmatch'
jarow = FuzzyStringMatch::JaroWinkler.create( :pure )
p jarow.getDistance(  "jones",      "johnson" )
p jarow.getDistance(  "ああ",        "あい"        )

Sample on irb

irb(main):001:0> require 'fuzzystringmatch'
=> true

irb(main):002:0> jarow = FuzzyStringMatch::JaroWinkler.create( :native )
=> #<FuzzyStringMatch::JaroWinklerNative:0x000001011b0010>

irb(main):003:0> jarow.getDistance( "al",        "al"        )
=> 1.0

irb(main):004:0> jarow.getDistance( "dixon",     "dicksonx"  )
=> 0.8133333333333332

Benchmarks

$ rake bench
ruby ./benchmark/vs_amatch.rb
 --- 
 --- Each match functions will be called 1Mega times. --- 
 --- 
[Amatch]
      user     system      total        real
  1.160000   0.050000   1.210000 (  1.218259)
[this Module (pure)]
      user     system      total        real
 39.940000   0.160000  40.100000 ( 40.542448)
[this Module (native)]
      user     system      total        real
  0.480000   0.000000   0.480000 (  0.484187)
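
A timing run of this shape can be sketched with Ruby's standard `Benchmark` module. This is not the project's `benchmark/vs_amatch.rb`, whose contents are not shown here. The `stand_in` lambda and the iteration count are assumptions chosen so the example runs without the gem installed.

```ruby
require 'benchmark'

# Stand-in for a match function (an assumption for illustration); swap in a
# real call such as jarow.getDistance("jones", "johnson") when the gem is
# installed.
stand_in = ->(a, b) { a == b ? 1.0 : 0.0 }

# Smaller than the one million calls reported above, to keep the sketch quick.
iterations = 100_000

result = Benchmark.measure do
  iterations.times { stand_in.call("jones", "johnson") }
end

# Print the same four columns as the report above.
puts Benchmark::CAPTION
puts result
```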

Requires

for CRuby

  • RubyInline
  • Ruby 1.9.1 or higher

for JRuby

  • JRuby 1.6.6 or higher

Author

  • Copyright (C) Kiyoka Nishiyama
  • Ported from the Java source code of lucene-3.0.2.

License

  • Apache 2.0 LICENSE
