bevacqua / fuzzysearch

:crystal_ball: Tiny and blazing-fast fuzzy search in JavaScript

Home Page: https://ponyfoo.com

License: MIT License

JavaScript 100.00%

fuzzysearch's Introduction

fuzzysearch

Tiny and blazing-fast fuzzy search in JavaScript

Fuzzy searching allows for flexibly matching a string with partial input, useful for filtering data very quickly based on lightweight user input.

Demo

To see fuzzysearch in action, head over to bevacqua.github.io/horsey, which is a demo of an autocomplete component that uses fuzzysearch to filter out results based on user input.

Install

From npm

npm install --save fuzzysearch

fuzzysearch(needle, haystack)

Returns true if needle matches haystack using a fuzzy-searching algorithm. Note that this program doesn't implement Levenshtein distance, but rather a simplified version where there's no approximation. The method will return true only if each character in the needle can be found in the haystack and occurs after the preceding matches.

fuzzysearch('twl', 'cartwheel') // <- true
fuzzysearch('cart', 'cartwheel') // <- true
fuzzysearch('cw', 'cartwheel') // <- true
fuzzysearch('ee', 'cartwheel') // <- true
fuzzysearch('art', 'cartwheel') // <- true
fuzzysearch('eeel', 'cartwheel') // <- false
fuzzysearch('dog', 'cartwheel') // <- false
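
The rule described above amounts to a subsequence check: walk the haystack once, consuming needle characters in order. Here is a minimal sketch of that rule, assuming a hypothetical helper name `isFuzzyMatch`. It illustrates the behavior and is not necessarily the library's exact source.

```javascript
// Hypothetical helper illustrating the matching rule; not the library's source.
function isFuzzyMatch(needle, haystack) {
  let j = 0; // next haystack position to inspect
  for (let i = 0; i < needle.length; i++) {
    const ch = needle.charCodeAt(i);
    // Advance through the haystack until this needle character is found.
    while (j < haystack.length && haystack.charCodeAt(j) !== ch) {
      j++;
    }
    if (j === haystack.length) {
      return false; // ran out of haystack before matching every character
    }
    j++; // consume the match so later characters must come after it
  }
  return true;
}

console.log(isFuzzyMatch('twl', 'cartwheel'));  // true
console.log(isFuzzyMatch('eeel', 'cartwheel')); // false
```

Because the haystack index never moves backwards, the check is linear in the length of the haystack.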

An exciting application for this kind of algorithm is filtering options in an autocomplete menu. Check out horsey for an example of what that might look like.
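
As a rough sketch of that use case, the snippet below filters a list of options against typed input. `filterOptions` is a hypothetical helper, and the local `fuzzysearch` is a stand-in with the behavior described above so that the example runs on its own. Lowercasing both sides is an assumption about what an autocomplete usually wants, since exact character comparison is case-sensitive.

```javascript
// Stand-in matcher so this example is self-contained. In a real project the
// function would come from the package instead (e.g. require('fuzzysearch')).
function fuzzysearch(needle, haystack) {
  let i = 0;
  for (let j = 0; j < haystack.length && i < needle.length; j++) {
    if (needle[i] === haystack[j]) i++;
  }
  return i === needle.length;
}

// Hypothetical helper: keep only the options that fuzzily match the input.
function filterOptions(input, options) {
  const needle = input.toLowerCase();
  return options.filter((option) => fuzzysearch(needle, option.toLowerCase()));
}

const fruits = ['Banana', 'Blueberry', 'Cranberry', 'Grape'];
console.log(filterOptions('bry', fruits)); // [ 'Blueberry', 'Cranberry' ]
```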

But! RegExps...!

[Chart showing abysmal performance for regexp-based implementation]

The current implementation uses the algorithm suggested by Mr. Aleph, a crazy Russian compiler engineer working at V8.

License

MIT

fuzzysearch's People

Contributors

bevacqua, jayrhynas

fuzzysearch's Issues

Circuit breaker improvement

You can return a little earlier when there is no match by replacing the second breaker with:

if (qlen === tlen) {
   return query === text;
}

Since if the lengths are equal but the query does not equal the text, there is no way the query will match the text.

needle === haystack ?

I don't get this part:

if (nlen === hlen) {
  return needle === haystack;
}

This means:
"apple" === "appel" // false

Shouldn't this example return true?

Why is nlen === hlen important?
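
One way to see why the length check is safe: a match requires every needle character to appear in the haystack in order. If both strings have the same length, no haystack character can be skipped, so the only possible match is an exact one. Order matters, so "apple" does not match "appel". The sketch below checks this with a hypothetical `isSubsequence` helper that has no length shortcut at all.

```javascript
// Hypothetical helper: plain in-order matching, with no length shortcut.
function isSubsequence(needle, haystack) {
  let i = 0; // number of needle characters matched so far
  for (let j = 0; j < haystack.length && i < needle.length; j++) {
    if (needle[i] === haystack[j]) i++;
  }
  return i === needle.length;
}

// 'a', 'p', 'p' match, then 'l' is only found at the very end of "appel",
// leaving nothing in the haystack for the final 'e'.
console.log(isSubsequence('apple', 'appel')); // false
console.log(isSubsequence('apple', 'apple')); // true
```

For equal-length inputs the full scan and the `needle === haystack` comparison always agree, so the shortcut only saves work.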

Support unicode fuzzy search

I just took a look at the source code and saw that str.charCodeAt is used instead of str.codePointAt. So fuzzy searching Unicode characters (multi-byte characters, for example) is probably not supported by this algorithm.
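
To make the concern concrete, here is a sketch comparing code-unit matching with code-point matching. Both function names are hypothetical and neither is the library's source. Characters outside the Basic Multilingual Plane are stored as two UTF-16 code units, so a code-unit matcher can pair the first half of one character with the second half of another.

```javascript
// Hypothetical matcher that compares UTF-16 code units (like charCodeAt does).
function matchCodeUnits(needle, haystack) {
  let i = 0;
  for (let j = 0; j < haystack.length && i < needle.length; j++) {
    if (needle.charCodeAt(i) === haystack.charCodeAt(j)) i++;
  }
  return i === needle.length;
}

// Hypothetical matcher that compares whole code points instead.
function matchCodePoints(needle, haystack) {
  const n = Array.from(needle);   // Array.from splits a string by code point
  const h = Array.from(haystack);
  let i = 0;
  for (let j = 0; j < h.length && i < n.length; j++) {
    if (n[i] === h[j]) i++;
  }
  return i === n.length;
}

// U+1F600 is D83D DE00; the haystack is U+1F601 (D83D DE01) + U+1F200 (D83C DE00).
const needle = '\u{1F600}';
const haystack = '\u{1F601}\u{1F200}';
console.log(matchCodeUnits(needle, haystack));  // true (false positive)
console.log(matchCodePoints(needle, haystack)); // false
```

The code-point variant allocates two arrays per call, which is a trade-off against the raw speed of the code-unit loop.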

Regexes seem to be almost as performant

From my tests on a ~400 KB dataset, mraleph's approach was barely faster than regexps, not several orders of magnitude faster as the graph in the README shows.

As for the jsperf test results, those might be due to toReg and escapeRegExp. That work should be done once, not on every iteration.

Edit: very odd. When the needle is English the regex doesn't perform well. My initial tests were with an Arabic needle.

Edit two: so regexes turned out to be quite close in performance, but it depends on the needle and query.
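
The point about doing the conversion once can be sketched as follows. This assumes one needle is tested against many haystacks. `toFuzzyRegExp` and `filterWithRegExp` are hypothetical names, not the helpers from the jsperf test.

```javascript
// Escape regular expression metacharacters so they match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build the pattern a[\s\S]*b[\s\S]*c for the needle "abc".
// [\s\S]* is used rather than .* so that newlines in the haystack are skipped too.
function toFuzzyRegExp(needle) {
  const parts = Array.from(needle, (ch) => escapeRegExp(ch));
  return new RegExp(parts.join('[\\s\\S]*'));
}

// Compile once per needle, then reuse the same RegExp for every haystack.
function filterWithRegExp(needle, haystacks) {
  const re = toFuzzyRegExp(needle); // hoisted out of the loop
  return haystacks.filter((haystack) => re.test(haystack));
}

console.log(filterWithRegExp('twl', ['cartwheel', 'dog', 'twirl']));
// [ 'cartwheel', 'twirl' ]
```

The RegExp is created without the `g` flag, so `test` carries no state between calls and the same object can safely be reused.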

indexOf is just as fast

I realize the so-called "high resolution" timers in JS are fucked with due to security reasons, but see below:

var timeOrigin = performance.now(); fuzzysearch("wrt", "wrtten"); var timeOrigin2 = performance.now();

timeOrigin2 - timeOrigin
0.0050000089686363935

var timeOrigin = performance.now(); "wrtten".indexOf("wrt"); var timeOrigin2 = performance.now();

timeOrigin2 - timeOrigin
0.0050000089686363935

Different from indexOf?

I might be misunderstanding something here, but how is this different from:
haystack.indexOf(needle) >= 0

Is it faster?
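
The difference is in what counts as a match, going by the README's description. `indexOf` requires the needle to appear as one contiguous substring, while the fuzzy rule only requires the characters to appear in order. A small sketch, using a hypothetical stand-in `fuzzyMatch` in place of the library:

```javascript
// Hypothetical stand-in for the in-order matching rule described in the README.
function fuzzyMatch(needle, haystack) {
  let i = 0;
  for (let j = 0; j < haystack.length && i < needle.length; j++) {
    if (needle[i] === haystack[j]) i++;
  }
  return i === needle.length;
}

const haystack = 'cartwheel';

// Contiguous substring: both approaches agree.
console.log(haystack.indexOf('art') >= 0); // true
console.log(fuzzyMatch('art', haystack));  // true

// Characters in order but with gaps: only the fuzzy rule matches.
console.log(haystack.indexOf('twl') >= 0); // false
console.log(fuzzyMatch('twl', haystack));  // true
```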

Interesting, and I got one too:

Mine is a functional one written in CirruScript (compiles with http://repo.cirru.org/script/):

= generateSearch $ \ (text query info index)
  -- "return if matches"
  if (is text query)
    do $ return $ object (:match false) (:start 0)
  -- "return false if text is used up"
  if (and (is text.length 0) (> query.length 0))
    do $ return $ object (:match false) (:start 0)
  -- "return true if query is used up"
  if (is query.length 0)
    do $ return info

  = nextIndex $ + index 1
  -- "first letter matches, keep going without first letters"
  if (is (. text 0) (. query 0))
    do
      if (< nextIndex info.start)
        do $ = info.start nextIndex
      return $ generateSearch (text.substr 1) (query.substr 1) info nextIndex
  -- "first letter does not match, but keep going without first letter of text"
  generateSearch (text.substr 1) query info nextIndex

= exports.fuzzyStart $ \ (list query)
  = result $ list.map $ \ (text)
    = info $ object (:start 10) (:match true) (:text text)
    generateSearch text query info 0
  -- "filter and sort"
  = result $ result.filter $ \ (item) item.match

  = result $ result.sort $ \ (a b)
    cond
      (< a.start b.start) -1
      (> a.start b.start) 1
      else 0

  return $ result.map $ \ (item) item.text

What about...

function fuzzysearch(query, data) {
    return (new RegExp(query.split('').join('.*'))).test(data);
}
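
One caveat with this approach, assuming the query comes from user input: characters such as `.` or `(` are regular expression metacharacters. Left unescaped, `.` matches any character and a lone `(` makes the RegExp constructor throw. A hedged variant that escapes each character first (`fuzzysearchRegExp` is a hypothetical name):

```javascript
// Escape regular expression metacharacters so they match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of the function above with per-character escaping.
function fuzzysearchRegExp(query, data) {
  const pattern = query.split('').map((ch) => escapeRegExp(ch)).join('.*');
  return new RegExp(pattern).test(data);
}

console.log(fuzzysearchRegExp('a.c', 'abc'));   // false: there is no literal '.'
console.log(fuzzysearchRegExp('a.c', 'a.b.c')); // true
console.log(fuzzysearchRegExp('(', 'f(x)'));    // true, and no SyntaxError
```

Note that `.` does not match line terminators by default, so this variant still differs from the character-walking algorithm for haystacks that contain newlines.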
