laserpants / fuzzyset-haskell
:sheep: A fuzzy string set implementation in Haskell.
Home Page: http://hackage.haskell.org/package/fuzzyset
License: BSD 3-Clause "New" or "Revised" License
Maybe I just didn't understand the idea well enough, but this looks like a bug to me:
let fzr132 = (fromList . take 132 . repeat) (pack "John Smith")
getWithMinScore 0.72 (fzr132 `add` (pack "Joseph Dombrowski")) (pack "Joe Dombrowski")
-- [(0.8235294117647058,"Joseph Dombrowski")]
let fzr133 = (fromList . take 133 . repeat) (pack "John Smith")
getWithMinScore 0.72 (fzr133 `add` (pack "Joseph Dombrowski")) (pack "Joe Dombrowski")
-- []
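For reference, the score reported for the 132-copy set, 0.8235294117647058, equals 14/17. That is what a Levenshtein-based similarity gives for this pair: "joseph dombrowski" (17 characters) and "joe dombrowski" (14 characters) differ by 3 deletions, and 1 - 3/17 = 14/17. The sketch below assumes that scoring rule (1 minus the edit distance divided by the length of the longer string, after lower-casing), which is the rule used by the JavaScript fuzzyset library; it is a standalone reimplementation, not this package's code. Under that assumption the pair scores above 0.72 independently of the other set members, which supports reading the empty result for 133 copies as a bug.

```haskell
import Data.Char (toLower)

-- | Classic dynamic-programming edit distance (insertions, deletions and
-- substitutions all cost 1), computed one row at a time.
levenshtein :: String -> String -> Int
levenshtein a b = last (foldl step [0 .. length a] b)
  where
    -- 'prev' holds the distances from every prefix of 'a' to the prefix
    -- of 'b' processed so far; build the next row for character 'c'.
    step prev@(p : ps) c = scanl compute (p + 1) (zip3 a prev ps)
      where
        compute left (ac, diag, up) =
          minimum [up + 1, left + 1, diag + (if ac == c then 0 else 1)]
    step [] _ = error "unreachable: rows are never empty"

-- | ASSUMED scoring rule (as in JavaScript fuzzyset): lower-case both
-- strings, then 1 - distance / length of the longer string.
score :: String -> String -> Double
score x y
  | longest == 0 = 1
  | otherwise = 1 - fromIntegral (levenshtein x' y') / fromIntegral longest
  where
    x' = map toLower x
    y' = map toLower y
    longest = max (length x') (length y')

main :: IO ()
main = do
  let s = score "Joseph Dombrowski" "Joe Dombrowski"
  print s            -- about 0.8235, i.e. 14/17
  print (s >= 0.72)  -- the pair clears the 0.72 threshold
```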
When testing our project, we set ghc-options: -with-rtsopts=-K1K to guard against space leaks. Everything worked correctly with version 0.2.3, but when we upgraded to 0.3.2, we started to get "stack overflow" errors after replacing getOne with the new findOne and closestMatch functions (discussion for reference: PostgREST/postgrest#3329).
Here is a reproducible example. In version 0.2.4:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.FuzzySet as Fuzzy
main :: IO ()
main = do
  print suggestVal
  where
    fuzzySet = Fuzzy.fromList ["authors_books_number","grandchild_entities","label_screen","authors_have_book_in_decade2","second_1","agents","Escap3e;","materialized_projects","books","tournaments","Foo","client","unit_workdays","datarep_todos","job","students_info","car_models_car_dealers","organizations","articleStars","plate","being_part","publishers","car_racers","person","sponsors","family_tree","datarep_next_two_todos","users_tasks","actors","descendant","simple_pk","capital","big_projects","budget_categories","v2","contract","second","profiles","whatev_sites","limited_article_stars","students_view","authors_have_book_in_decade","suppliers_trade_unions","v1","datarep_todos_computed","baz","has_fk","shop_bles","screens","status","touched_files","alpha_projects","b","users_projects","contract_view","designers","sixties_books","suppliers","group_yard","player_view","projects_view_with_delete_trigger","items","projects_view_with_all_triggers_with_pk","end_2","posters","entities","videogames","projects_view_with_all_triggers_without_pk","being","person_detail","filtered_tasks","table_b","i2459_self_v2","odd_years_publications","schauspieler","products_suppliers","projects_auto_updatable_view_with_pk","contact","authors","janedoe","subscriptions","yards","i2459_composite_v2","departments","activities","ghostBusters","first_1","users","addresses","i2459_composite_v1","consumers_extra_view","bar","items2","unit_workdays_fst_shift","projects_view_with_insert_trigger","user_friend","whatev_projects","tasks","clients","forties_books","projects","zeta_projects","series_popularity","adaptation_notifications","message","child_entities","trade_unions","articles","trash_details","files","consumers_view_view","products","projects_view_alt","schedules","part","view_test","projects_view_with_update_trigger","projects_auto_updatable_view_without_pk","first","test_null_pk_competitors_sponsors","web_content","authors_w_entities","filme","i2459_simple_v2","vb","pages","sites","va","i2459_simple_v1","orders","well","jobs","personnages","plate_plan_step","projects_view_without_triggers","consumers_view","projects_view","foos","i2459_self_v1","child_entities_view","table_a","auto_incrementing_pk","main_jobs","trash","students_info_view","fee","referrals","insertable_view_with_join","competitors","end_1","car_model_sales","bars","budget_expenses","bad_subquery","orders_view","projects_count_grouped_by","shops","project_invoices","whatev_jobs","johnsmith","country","car_models","car_brands","series","films","test","students","fifties_books","a","managers","labels","forties_and_fifties_books","space","clientinfo","groups","comments","zone","car_dealers"]
    suggestVal = Fuzzy.getOne fuzzySet "some_non_existent_value"
With the ghc-options: -with-rtsopts=-K1K setting, this prints:
Nothing
But in version 0.3.0, with the same settings:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.FuzzySet.Simple as Fuzzy
main :: IO ()
main = do
  print suggestVal
  where
    fuzzySet = Fuzzy.fromList ["authors_books_number","grandchild_entities","label_screen","authors_have_book_in_decade2","second_1","agents","Escap3e;","materialized_projects","books","tournaments","Foo","client","unit_workdays","datarep_todos","job","students_info","car_models_car_dealers","organizations","articleStars","plate","being_part","publishers","car_racers","person","sponsors","family_tree","datarep_next_two_todos","users_tasks","actors","descendant","simple_pk","capital","big_projects","budget_categories","v2","contract","second","profiles","whatev_sites","limited_article_stars","students_view","authors_have_book_in_decade","suppliers_trade_unions","v1","datarep_todos_computed","baz","has_fk","shop_bles","screens","status","touched_files","alpha_projects","b","users_projects","contract_view","designers","sixties_books","suppliers","group_yard","player_view","projects_view_with_delete_trigger","items","projects_view_with_all_triggers_with_pk","end_2","posters","entities","videogames","projects_view_with_all_triggers_without_pk","being","person_detail","filtered_tasks","table_b","i2459_self_v2","odd_years_publications","schauspieler","products_suppliers","projects_auto_updatable_view_with_pk","contact","authors","janedoe","subscriptions","yards","i2459_composite_v2","departments","activities","ghostBusters","first_1","users","addresses","i2459_composite_v1","consumers_extra_view","bar","items2","unit_workdays_fst_shift","projects_view_with_insert_trigger","user_friend","whatev_projects","tasks","clients","forties_books","projects","zeta_projects","series_popularity","adaptation_notifications","message","child_entities","trade_unions","articles","trash_details","files","consumers_view_view","products","projects_view_alt","schedules","part","view_test","projects_view_with_update_trigger","projects_auto_updatable_view_without_pk","first","test_null_pk_competitors_sponsors","web_content","authors_w_entities","filme","i2459_simple_v2","vb","pages","sites","va","i2459_simple_v1","orders","well","jobs","personnages","plate_plan_step","projects_view_without_triggers","consumers_view","projects_view","foos","i2459_self_v1","child_entities_view","table_a","auto_incrementing_pk","main_jobs","trash","students_info_view","fee","referrals","insertable_view_with_join","competitors","end_1","car_model_sales","bars","budget_expenses","bad_subquery","orders_view","projects_count_grouped_by","shops","project_invoices","whatev_jobs","johnsmith","country","car_models","car_brands","series","films","test","students","fifties_books","a","managers","labels","forties_and_fifties_books","space","clientinfo","groups","comments","zone","car_dealers"]
    suggestVal = Fuzzy.closestMatch "some_non_existent_value" fuzzySet
It prints:
Stack space overflow: current size 33624 bytes.
Relink with -rtsopts and use `+RTS -Ksize -RTS' to increase it.
In conclusion, the newer versions of the library may have a space leak: the same lookup needs about 33 KB of stack, compared to less than 1 KB with the older versions.
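For readers unfamiliar with the -K1K approach: it caps the Haskell stack at 1 KB, so code that builds a long chain of unevaluated thunks and then forces it fails immediately instead of quietly using more memory. The sketch below is a generic illustration of that failure mode, not code from fuzzyset and not a diagnosis of what changed between 0.2.x and 0.3.x; leakySum and strictSum are hypothetical names.

```haskell
import Data.List (foldl')

-- Lazy left fold: without optimisation, foldl builds the nested thunk
-- (((0 + 1) + 2) + ...) and only forces it at the end, which needs
-- stack space proportional to the length of the list.
leakySum :: [Int] -> Int
leakySum = foldl (+) 0

-- Strict left fold: foldl' forces the accumulator at every step, so the
-- stack stays small no matter how long the list is.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0

main :: IO ()
main = do
  let xs = [1 .. 100000]
  print (strictSum xs)
  print (leakySum xs)
```

Compiled with ghc -O0 -with-rtsopts=-K1K, the strictSum line should stay within the limit, while the leakySum line would be expected to abort with a stack overflow similar to the one above. With -O, GHC's strictness analysis can make the lazy version strict as well, so the difference may not show up in optimised builds.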
The "fuzzy" library has a cool feature to enclose the matched strings with custom text: https://www.stackage.org/haddock/lts-15.1/fuzzy-0.1.0.0/Text-Fuzzy.html#v:match
Any chance you could support this as well?
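One way such highlighting could work, sketched here under stated assumptions: match the pattern as a case-insensitive subsequence of the candidate, scanning greedily from left to right, and wrap each matched character in the pre/post markers. highlight is a hypothetical helper over String, not an existing fuzzyset function, and it does not reproduce the scoring of the fuzzy library.

```haskell
import Data.Char (toLower)

-- | Hypothetical helper: if every character of the pattern occurs in the
-- candidate in order (ignoring case), return the candidate with each
-- matched character wrapped in the given (pre, post) markers.
highlight :: String -> (String, String) -> String -> Maybe String
highlight pat (pre, post) = go (map toLower pat)
  where
    -- Whole pattern consumed: copy the rest of the candidate unchanged.
    go [] rest = Just rest
    -- Candidate exhausted while pattern characters remain: no match.
    go _ [] = Nothing
    go ps@(p : ps') (c : cs)
      -- Greedily take the first occurrence of the next pattern character.
      | toLower c == p = ((pre ++ [c] ++ post) ++) <$> go ps' cs
      | otherwise = (c :) <$> go ps cs

main :: IO ()
main = do
  print (highlight "jsm" ("<", ">") "John Smith")
  print (highlight "xyz" ("<", ">") "John Smith")
```

For example, highlighting "jsm" in "John Smith" with ("<", ">") gives Just "<J>ohn <S><m>ith", while a pattern whose characters do not all appear in order gives Nothing.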