laserpants / fuzzyset-haskell

:sheep: A fuzzy string set implementation in Haskell.

Home Page: http://hackage.haskell.org/package/fuzzyset

License: BSD 3-Clause "New" or "Revised" License

Haskell 100.00%
haskell fuzzy-matching search string

fuzzyset-haskell's Issues

Something goes wrong for large sets

Maybe I just didn't understand the idea well enough, but this looks like a bug to me:

let fzr132 = (fromList . take 132 . repeat) (pack "John Smith")
getWithMinScore 0.72 (fzr132 `add` (pack "Joseph Dombrowski")) (pack "Joe Dombrowski")
-- [(0.8235294117647058,"Joseph Dombrowski")]
let fzr133 = (fromList . take 133 . repeat) (pack "John Smith")
getWithMinScore 0.72 (fzr133 `add` (pack "Joseph Dombrowski")) (pack "Joe Dombrowski")
-- []
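
As a sanity check on the expected result, here is a minimal, library-independent sketch. It assumes the reported score is a normalized Levenshtein similarity on lowercased strings, that is, 1 - distance / (length of the longer string); that assumption is consistent with the 0.8235... value above but is not taken from the library's source. The names `levenshtein` and `score` are hypothetical helpers, not part of fuzzyset.

```haskell
module Main where

import Data.Char (toLower)
import Data.List (foldl')

-- Edit distance via dynamic programming, keeping one row at a time.
-- The row holds the distances from the consumed prefix of s to every
-- prefix of t.
levenshtein :: String -> String -> Int
levenshtein s t = last (foldl' nextRow [0 .. length t] s)
  where
    nextRow [] _ = []
    nextRow prev@(p : ps) c = scanl step (p + 1) (zip3 t prev ps)
      where
        -- left: new[j-1], diag: prev[j-1], up: prev[j]
        step left (tc, diag, up) =
          minimum [left + 1, up + 1, diag + fromEnum (tc /= c)]

-- Assumed scoring rule: 1 - distance / length of the longer string,
-- computed on lowercased input.
score :: String -> String -> Double
score a b
  | longest == 0 = 1
  | otherwise = 1 - fromIntegral (levenshtein a' b') / fromIntegral longest
  where
    a' = map toLower a
    b' = map toLower b
    longest = max (length a') (length b')

main :: IO ()
main = print (score "Joseph Dombrowski" "Joe Dombrowski")
```

Under that assumption the pair scores 14/17 (about 0.8235) no matter what else is in the set, so both queries would be expected to return the match with a minimum score of 0.72.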

"Stack overflow" error when GHC option `rtsopts` is low

When testing our project, we set ghc-options: -with-rtsopts=-K1K to guard against space leaks. Everything worked correctly with version 0.2.3, but after upgrading to 0.3.2 and replacing getOne with the new findOne and closestMatch functions, we started to get "Stack overflow" errors (discussion for reference: PostgREST/postgrest#3329).

Here is a reproducible example. In version 0.2.4:

{-# LANGUAGE OverloadedStrings #-}

module Main where

import qualified Data.FuzzySet as Fuzzy

main :: IO ()
main = do
  putStrLn $ show suggestVal
  where
    fuzzySet = Fuzzy.fromList ["authors_books_number","grandchild_entities","label_screen","authors_have_book_in_decade2","second_1","agents","Escap3e;","materialized_projects","books","tournaments","Foo","client","unit_workdays","datarep_todos","job","students_info","car_models_car_dealers","organizations","articleStars","plate","being_part","publishers","car_racers","person","sponsors","family_tree","datarep_next_two_todos","users_tasks","actors","descendant","simple_pk","capital","big_projects","budget_categories","v2","contract","second","profiles","whatev_sites","limited_article_stars","students_view","authors_have_book_in_decade","suppliers_trade_unions","v1","datarep_todos_computed","baz","has_fk","shop_bles","screens","status","touched_files","alpha_projects","b","users_projects","contract_view","designers","sixties_books","suppliers","group_yard","player_view","projects_view_with_delete_trigger","items","projects_view_with_all_triggers_with_pk","end_2","posters","entities","videogames","projects_view_with_all_triggers_without_pk","being","person_detail","filtered_tasks","table_b","i2459_self_v2","odd_years_publications","schauspieler","products_suppliers","projects_auto_updatable_view_with_pk","contact","authors","janedoe","subscriptions","yards","i2459_composite_v2","departments","activities","ghostBusters","first_1","users","addresses","i2459_composite_v1","consumers_extra_view","bar","items2","unit_workdays_fst_shift","projects_view_with_insert_trigger","user_friend","whatev_projects","tasks","clients","forties_books","projects","zeta_projects","series_popularity","adaptation_notifications","message","child_entities","trade_unions","articles","trash_details","files","consumers_view_view","products","projects_view_alt","schedules","part","view_test","projects_view_with_update_trigger","projects_auto_updatable_view_without_pk","first","test_null_pk_competitors_sponsors","web_content","authors_w_entities","filme","i2459_simple_v2","vb","pages","sites","va",
      "i2459_simple_v1","orders","well","jobs","personnages","plate_plan_step","projects_view_without_triggers","consumers_view","projects_view","foos","i2459_self_v1","child_entities_view","table_a","auto_incrementing_pk","main_jobs","trash","students_info_view","fee","referrals","insertable_view_with_join","competitors","end_1","car_model_sales","bars","budget_expenses","bad_subquery","orders_view","projects_count_grouped_by","shops","project_invoices","whatev_jobs","johnsmith","country","car_models","car_brands","series","films","test","students","fifties_books","a","managers","labels","forties_and_fifties_books","space","clientinfo","groups","comments","zone","car_dealers"]
    suggestVal = Fuzzy.getOne fuzzySet "some_non_existent_value"

With the ghc-options: -with-rtsopts=-K1K setting, this prints:

Nothing

But, in version 0.3.0, with the same settings:

{-# LANGUAGE OverloadedStrings #-}

module Main where

import qualified Data.FuzzySet.Simple as Fuzzy

main :: IO ()
main = do
  putStrLn $ show suggestVal
  where
    fuzzySet = Fuzzy.fromList ["authors_books_number","grandchild_entities","label_screen","authors_have_book_in_decade2","second_1","agents","Escap3e;","materialized_projects","books","tournaments","Foo","client","unit_workdays","datarep_todos","job","students_info","car_models_car_dealers","organizations","articleStars","plate","being_part","publishers","car_racers","person","sponsors","family_tree","datarep_next_two_todos","users_tasks","actors","descendant","simple_pk","capital","big_projects","budget_categories","v2","contract","second","profiles","whatev_sites","limited_article_stars","students_view","authors_have_book_in_decade","suppliers_trade_unions","v1","datarep_todos_computed","baz","has_fk","shop_bles","screens","status","touched_files","alpha_projects","b","users_projects","contract_view","designers","sixties_books","suppliers","group_yard","player_view","projects_view_with_delete_trigger","items","projects_view_with_all_triggers_with_pk","end_2","posters","entities","videogames","projects_view_with_all_triggers_without_pk","being","person_detail","filtered_tasks","table_b","i2459_self_v2","odd_years_publications","schauspieler","products_suppliers","projects_auto_updatable_view_with_pk","contact","authors","janedoe","subscriptions","yards","i2459_composite_v2","departments","activities","ghostBusters","first_1","users","addresses","i2459_composite_v1","consumers_extra_view","bar","items2","unit_workdays_fst_shift","projects_view_with_insert_trigger","user_friend","whatev_projects","tasks","clients","forties_books","projects","zeta_projects","series_popularity","adaptation_notifications","message","child_entities","trade_unions","articles","trash_details","files","consumers_view_view","products","projects_view_alt","schedules","part","view_test","projects_view_with_update_trigger","projects_auto_updatable_view_without_pk","first","test_null_pk_competitors_sponsors","web_content","authors_w_entities","filme","i2459_simple_v2","vb","pages","sites","va",
      "i2459_simple_v1","orders","well","jobs","personnages","plate_plan_step","projects_view_without_triggers","consumers_view","projects_view","foos","i2459_self_v1","child_entities_view","table_a","auto_incrementing_pk","main_jobs","trash","students_info_view","fee","referrals","insertable_view_with_join","competitors","end_1","car_model_sales","bars","budget_expenses","bad_subquery","orders_view","projects_count_grouped_by","shops","project_invoices","whatev_jobs","johnsmith","country","car_models","car_brands","series","films","test","students","fifties_books","a","managers","labels","forties_and_fifties_books","space","clientinfo","groups","comments","zone","car_dealers"]
    suggestVal = Fuzzy.closestMatch "some_non_existent_value" fuzzySet

It prints:

Stack space overflow: current size 33624 bytes.
Relink with -rtsopts and use `+RTS -Ksize -RTS' to increase it.

In conclusion, the library may have space leaks in the newer versions (stack use of about 33 KB) compared to the older ones (less than 1 KB).
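
For context on why a tiny stack limit surfaces this kind of problem, here is a minimal sketch that is unrelated to the library's actual code. The names `lazySum` and `strictSum` are hypothetical. When built without optimization (-O0) and run with a small -K limit, the lazy fold is the kind of code that would be expected to fail with "Stack space overflow", because it accumulates a long chain of unevaluated thunks that is only forced at the end; the strict fold evaluates the accumulator at each step and needs only constant stack.

```haskell
module Main where

import Data.List (foldl')

-- Lazy left fold: the accumulator becomes a nested chain of (+) thunks,
-- and forcing that chain at the end needs stack proportional to the
-- length of the list (unless the optimizer makes it strict).
lazySum :: [Int] -> Int
lazySum = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step,
-- so stack use stays constant.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0

main :: IO ()
main = do
  print (strictSum [1 .. 1000000])
  -- With a small stack limit and no optimization, this is the call
  -- that would be expected to overflow.
  print (lazySum [1 .. 1000000])
```

Both functions return the same value; only their stack behavior differs, which is what a low stack limit in the test suite is meant to expose.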
