fuse's People

Contributors

crocodile2u, dependabot-preview[bot], dependabot[bot], hjardines, loilo, shadowalker89, vinkla

fuse's Issues

Exceeding the max pattern length in search string doesn't throw error

I've found a case whereby the threshold of 0 is not being respected.

I initialise the Fuse object like so
$fuseTest = new Fuse\Fuse([['name' => '8Th Street Clinic']], ['keys' => ['name'], 'threshold' => 0, 'includeScore' => true]);

and search for
$fuseTest->search('Tumaini Medial Clinic (Miriga Mieru West)')

I get back

 [
     [
       "item" => [
         "name" => "8Th Street Clinic",
       ],
       "score" => 0.5,
     ],
] 

Which as you can see has a score of 0.5 but is still returned.
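Until the cause is found, one possible workaround is to post-filter the results by score. This is only a sketch: filter_by_score is a hypothetical helper, not part of the library, and it assumes 'includeScore' => true is set and that lower scores mean better matches.

```php
<?php
// Hypothetical helper (not part of the library): drop every result whose
// score is worse (higher) than the threshold. Assumes each result carries
// a 'score' key, i.e. 'includeScore' => true was set.
function filter_by_score(array $results, float $threshold): array
{
    return array_values(array_filter(
        $results,
        function (array $result) use ($threshold): bool {
            return isset($result['score']) && $result['score'] <= $threshold;
        }
    ));
}

// Result shape copied from the report above.
$results = [
    ['item' => ['name' => '8Th Street Clinic'], 'score' => 0.5],
];

var_dump(count(filter_by_score($results, 0.0))); // int(0)
```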

Pattern exceed max length of 32

The "Pattern exceed max length of 32" error shows up for search queries that are longer than 32 characters and contain Unicode characters in UTF-8 encoding. This most likely happens in BitapSearch::__construct() during the chunking of characters. On closer inspection, PHP's substr does not handle UTF-8 strings correctly, so it might be better to use mb_substr.

$query = 'send more money to my pàypàl account';
$fuzzyMatcher = new Fuse(['Sample collection only, please ignore'], []);

$fuzzyMatcher->search($query);
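The byte/character difference behind this report can be shown without the library. The sketch below assumes the mbstring extension is loaded. chunk_pattern is a hypothetical helper for illustration and is not the code in BitapSearch::__construct().

```php
<?php
// "\u{e0}" is "à" encoded as UTF-8 (two bytes), independent of file encoding.
$word = "p\u{e0}yp\u{e0}l";

var_dump(strlen($word));             // int(8): bytes
var_dump(mb_strlen($word, 'UTF-8')); // int(6): characters

// Cutting by bytes splits the first "à" in half and leaves invalid UTF-8 ...
$byteCut = substr($word, 0, 2);
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)

// ... while cutting by characters keeps every character whole.
$charCut = mb_substr($word, 0, 2, 'UTF-8');
var_dump($charCut === "p\u{e0}"); // bool(true)

// Hypothetical chunker: split a pattern into pieces of at most $size
// characters (not bytes), so no multi-byte character is ever cut apart.
function chunk_pattern(string $pattern, int $size = 32): array
{
    $chunks = [];
    $length = mb_strlen($pattern, 'UTF-8');
    for ($offset = 0; $offset < $length; $offset += $size) {
        $chunks[] = mb_substr($pattern, $offset, $size, 'UTF-8');
    }
    return $chunks;
}

// The query from the report is 36 characters long (38 bytes).
$chunks = chunk_pattern("send more money to my p\u{e0}yp\u{e0}l account");
var_dump(count($chunks)); // int(2)
```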

PHP 8 support

Hi! When will a stable version with PHP 8 support be released?

includeScore

I tried to use includeScore but it behaves differently than expected: that is, it doesn't return 0 or 1, but a float > 1 for the first match and a float between 0 and 1 for any other match.

Search is not working well

Right now I don't know how to set up the arguments for Search method:

Keywords: roofing

Current Results:

  • Undertile External Waterproofing Membranes
  • Roofing Membrane
  • Liquid waterproofing membrane
  • External waterproofing membrane
  • Liquid waterproofing membrane

Expected Results:

  • Roofing Membrane

Can someone help me sort this problem out? Thanks.

Fuse in Drupal

I am using this project on my local Drupal build. The Fuse array does not show a "Results" index in the array after the search is complete. Can you confirm that the results of the search should be in a "Results" index in the Fuse array?

PHP 5.3

Can I use this with php 5.3 if I change all array constructs from [] to array()?
Would there be any other 5.3 limitations?

preg_match fails when string has "/" in it

loilo/fuse/src/Bitap/regex_search.php: 7

Example trace

  Fuse\Bitap\regex_search("R. Aazyol", "Atelier Richard Woodman Burbridge/Harrods", " +")
  vendor/loilo/fuse/src/Bitap/Bitap.php : 63
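A likely cause is an unescaped "/" ending the regular expression early when "/" is also used as the delimiter. The sketch below shows the usual remedy, preg_quote with the delimiter passed as its second argument. build_token_regex is a hypothetical stand-in, not the library's regex_search(), and it assumes the second argument in the trace is the search pattern.

```php
<?php
// Hypothetical helper, not the library's regex_search(): build an
// alternation regex from the tokens of $pattern, escaping each token.
// Assumes $tokenSeparator itself contains no "/" character.
function build_token_regex(string $pattern, string $tokenSeparator = ' +'): string
{
    $tokens = preg_split('/' . $tokenSeparator . '/', $pattern, -1, PREG_SPLIT_NO_EMPTY);

    $escaped = array_map(function (string $token): string {
        // The second argument makes preg_quote escape the "/" delimiter too.
        return preg_quote($token, '/');
    }, $tokens);

    return '/' . implode('|', $escaped) . '/';
}

$regex = build_token_regex('Atelier Richard Woodman Burbridge/Harrods');

var_dump(preg_match($regex, 'R. Aazyol'));   // int(0): valid regex, no match
var_dump(preg_match($regex, 'Richard Roe')); // int(1)
```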

Performance question

Hello,

I'm not quite sure if this is the right place to ask this, but I have a question about the performance of the search.

I currently have an array of ~81k entries with 3 keys each, and I only search by one ("Name"). A search currently takes ~3.8s when nothing matches and ~4.9s when something does, with these settings:
$fuse = new \Fuse\Fuse($locationData, [ "keys" => ["Name"] , "threshold" => 0.2 , "distance" => 10 , "includeScore" => true , "includeMatches" => true ]);

My data (imported as JSON) looks like this:
Array ( [0] => Array ( [Name] => ACCOMACK [ListingCount] => 191 [Type] => county ) )

It looks like all the time simply comes from the size of the data: when logging the times, I consistently got ~4s to reach the 9th match in a search. It also looks like 'Fuse->analyze()' is run twice: with a simple counter, the 9th match was reported as run number 160663 of the function. Bypassing that (is_string was false, and is_list re-ran analyze as a string) didn't seem to improve the speed at all, though.

Here's an example of the timings it took to search (only logging when a match was found): http://prntscr.com/n2v14q

Is there a better format I can use for the data, or is this just a limitation of having 81k+ items to search through?

I'm not sure what other information you need, but let me know if I can provide anything else. I don't have any public page set up with this to test it against at the moment, but if you need, I can probably get that set up.

The data that I'm searching through has 5 distinct parts, so I also tried searching through each part separately. With time logging enabled, this increased the total time by ~0.3s. Without logging, it reduced the total time by ~0.8s (while also returning 40 more results).

PHP 7.2
Running on a DreamHost server (not sure of specs)

Thanks,
Tony.

Performance

Dear,

The library is great and the precision is very good. However, I have performance issues.

I'm running the tool on a list of 15,000 entries (2-3 words per entry, 1 field only) that I want to fuzzy-match against itself.
Here is my configuration:

$thresholdFuzzy = 0.6; // lower this value to be stricter; 0 = exact match
$FuzzyParameters = [
  "keys" => [ "keywordName"],
  "includeScore" => true,			
  "maxPatternLength" => 32,
  "threshold" => $thresholdFuzzy,
  "minMatchCharLength" => 1
];

$fuse = new \Fuse\Fuse($dataToFuzzyNoExactMatch, $FuzzyParameters);
$results = $fuse->search($kw);

I ran the fuzzy search in a loop for the first 100 entries against all 15,000.
It took 500 seconds to compute.

Any idea how I can improve the speed, please?

Can't get a result

I have been trying to get a result from a search of my data set but I have yet to be able to get a result. I plugged my data set and options into the Fuse.js demo page and was able to get the expected results.

My data (a small portion of it):

array(169) {
  [0]=>
  array(4) {
    ["name"]=>
    string(17) "Grace Watson Hall"
    ["description"]=>
    string(7) "GWH/025"
    ["mdo_id"]=>
    string(1) "3"
    ["coordinates"]=>
    array(2) {
      [0]=>
      float(-77.66901)
      [1]=>
      float(43.083652)
    }
  }
  [1]=>
  array(4) {
    ["name"]=>
    string(15) "Wallace Library"
    ["description"]=>
    string(7) "WAL/005"
    ["mdo_id"]=>
    string(1) "5"
    ["coordinates"]=>
    array(2) {
      [0]=>
      float(-77.676315)
      [1]=>
      float(43.083927)
    }
  }

initialization and search

$options = ["keys" => ["name","description","mdo_id"]];
$fuse =  new \Fuse\Fuse($data,$options);
$result = $fuse->search("Grace Watson Hall");
//result returns nothing and I get a 500 error

Modification to "analyze" method in Fuse Class

-- // Guard against "Undefined offset: 0" Error when $scoresLen == 0
-- // Guard against "Division By Zero" Error when $scoresLen == 0

$scoresLen = sizeof($scores);

// Guard against "Undefined offset: 0" Error when $scoresLen == 0:
// start from 0 so $averageScore is always defined
$averageScore = 0;

for ($i = 0; $i < $scoresLen; $i++) {
    $averageScore += $scores[$i];
}

// Guard against "Division By Zero" Error when $scoresLen == 0
if ($scoresLen > 0) {
    $averageScore /= $scoresLen;
}

Thanks for the PHP port. It's very useful for me. Please don't forget to update us when Fuse.js has major changes. Regards

Error that needs to be fixed

Hi
I am using your project and find it very useful, but I noticed a problem that needs to be fixed.
In the Fuse.php file at line 226 your code is
$averageScore = $averageScore / $scoresLen;

I noticed that $scoresLen is sometimes 0 in my project, so the division raises an error.
I suggest adding the following check to avoid this error:
if ($scoresLen != 0) {
    $averageScore = $averageScore / $scoresLen;
}
Best Regards

Getting weighted search

Hi,
I have been playing around with Fuse for a few days now and I can't get weighted search to work. I have a needle array that looks like this:

Array
(
[1] => Array
(
[company] => Array
(
[name] => STARTUPLIFERS IN SWEDEN
[registrationid] => 55XXXX-XXXX
[id] => 1
)
),
[company] => Array
(
[name] => Google
[registrationid] => 55XXXX-XXXX
[id] => 1
),
[tags] => Array
(
[name] => Startup
)
)
......
[7992] => Array
(
[stopwords] => Array
(
[name] => Startup
)
)
...

So essentially an array with companies and I have added "stop words" in the array.

I have created my key array as follows:
'keys' => [
[
'name' => 'stopword.name',
'weight' => 1
],
[
'name' => 'company.name',
'weight' => 0.5
],
[
'name' => 'tags.name',
'weight' => 1
]
]

But when I search for the term "Startup", no matter how I weight my keys, the company STARTUPLIFERS always gets the lowest score, followed by my stop word, which is a 100% match. I'm using these other options for my query:
"tokenize" => true,
"matchAllTokens" => true,
"caseSensitive" => false,
"includeScore" => true,
"shouldSort" => true

It looks like this is a limitation perhaps in Fuse.js (Will try to replicate it in JS) as there is no way to force a "word boundary" search that would have helped solving this specific problem.
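One way to approximate a word-boundary search outside the library is to post-filter or re-rank the results with a regular expression. This is only a sketch: matches_whole_word is a hypothetical helper, and combining it with the Fuse results is left to the caller.

```php
<?php
// Hypothetical post-filter: true only when $term occurs as a whole word.
function matches_whole_word(string $haystack, string $term): bool
{
    // \b is a word boundary; "i" = case-insensitive, "u" = UTF-8 input.
    $regex = '/\b' . preg_quote($term, '/') . '\b/iu';

    return preg_match($regex, $haystack) === 1;
}

var_dump(matches_whole_word('STARTUPLIFERS IN SWEDEN', 'Startup')); // bool(false)
var_dump(matches_whole_word('Startup', 'Startup'));                 // bool(true)
```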

Fuzzy search slow with large data

Hello,
We have a very large array with around 10 thousand records, each having the following fields:
array(
array('name' => 'name1', 'email' => '[email protected]'),
array('name' => 'second name', 'email' => '[email protected]')
)
When searching this users array by the keys 'name' or 'email', Fuse is really slow: it takes around 30 to 40 seconds to find all records. I have the following settings:

// initialize fuse
$settings['keys'] = array("name", "email");
$settings['distance'] = 50;
$settings['maxPatternLength'] = 16;
$settings['threshold'] = 0.6;
$fuse = new Fuse($usersArray, $settings);

// search user
$usersFound = $fuse->search("my name");

Is there any setting that would make Fuse work faster? I tried the same with the JS version of Fuse and it was not nearly as slow; it took a few seconds to search the 10 thousand records.

not really sure about the settings of arguments with this repo

Right now I have a fuzzy search requirement like the one described below. However, I'm really confused about the settings in the list below.

  • threshold: (type: float, default: 0.6)
  • distance (type: int, default: 100)
  • ... and other lots of arguments here

This is our pattern for searching:

To simplify our scope, if we are searching the phrase "A B C":
1.       All permutations of “A B C” allowed:
1.1   3 words in any order, e.g. A C B, B C A
1.2   3 words in any order +any other word(s), e.g. A C B+XY, B C A+Z
1.3   Any of the 2 words, in any order, e.g. A B, C B, C A
1.4   Any of the 2 words, in any order + any other word(s), e.g. A B+XY, CB+XZ, CA+Z
 
2.       Permutations of “A B C” not allowed:
2.1   Single word only, e.g. Just A or B, or C
2.2   One word + any other word(s), e.g. A+XY, B+YZ, C+Z

Thanks for your time.
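The rules above amount to "at least two of the query words must appear in the candidate, in any order". A minimal sketch of that check follows. It uses exact, case-insensitive word matching rather than fuzzy matching, and both function names are hypothetical.

```php
<?php
// Count how many distinct query words also occur in the candidate.
// Exact word comparison only; this does not use Fuse or fuzzy matching.
function count_matching_words(string $query, string $candidate): int
{
    $split = function (string $text): array {
        $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
        return array_unique($words);
    };

    return count(array_intersect($split($query), $split($candidate)));
}

// Rules 1.1 to 1.4 allow two or more matching words; 2.1 and 2.2 reject one.
function is_allowed(string $query, string $candidate, int $minWords = 2): bool
{
    return count_matching_words($query, $candidate) >= $minWords;
}

var_dump(is_allowed('A B C', 'B C A'));  // bool(true)  (rule 1.1)
var_dump(is_allowed('A B C', 'C A Z'));  // bool(true)  (rule 1.4)
var_dump(is_allowed('A B C', 'A XY'));   // bool(false) (rule 2.2)
```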

Mixed scores for matching results.

I have matches that come through with a score of 2.2+ and then other concrete matches that show a score of 0.01 - I have some code that looks for anything over 2 and anything under 0.1, but are you aware of this?

What's Wrong With My Array?

I have an array that outputs like this:

    [1106] => Array
        (
            [id] => 1106
            [url] => 3-brides-for-3-bad-boys-160488
            [description] => 0
            [title] => 3 Brides for 3 Bad Boys
            [author] => 3 Brides for 3 Bad Boys (mf)
            [author_lname] => Boys
            [language] => en
        )

    [1107] => Array
        (
            [id] => 1107
            [url] => 3-brisingr-3-12630502066281
            [description] => 0
            [title] => 3-Brisingr-3
            [author] => Unknown
            [author_lname] => 
            [language] => eng
        )

    [1108] => Array
        (
            [id] => 1108
            [url] => 3-claus-of-death-120101
            [description] => 0
            [title] => 3 Claus of Death
            [author] => Gayle Trent
            [author_lname] => Trent
            [language] => en
        )

But that results in:

PHP Notice: Undefined offset: -1 in /var/www/search/vendor/loilo/fuse/src/Bitap/matched_indices.php on line 24
PHP Notice: Undefined offset: -1 in /var/www/search/vendor/loilo/fuse/src/Bitap/matched_indices.php on line 24
PHP Notice: Undefined offset: -1 in /var/www/search/vendor/loilo/fuse/src/Bitap/matched_indices.php on line 24
PHP Notice: Undefined offset: -1 in /var/www/search/vendor/loilo/fuse/src/Bitap/matched_indices.php on line 24
PHP Notice: Undefined offset: -1 in /var/www/search/vendor/loilo/fuse/src/Bitap/matched_indices.php on line 24
PHP Notice: Undefined offset: -1 in /var/www/search/vendor/loilo/fuse/src/Bitap/matched_indices.php on line 24
PHP Notice: Undefined offset: -1 in /var/www/search/vendor/loilo/fuse/src/Bitap/matched_indices.php on line 24
PHP Notice: Undefined offset: -1 in /var/www/search/vendor/loilo/fuse/src/Bitap/matched_indices.php on line 24
PHP Notice: Undefined offset: -1 in /var/www/search/vendor/loilo/fuse/src/Bitap/matched_indices.php on line 24
PHP Notice: Undefined offset: -1 in /var/www/search/vendor/loilo/fuse/src/Bitap/matched_indices.php on line 24

I was wondering what I need to do. I have tried putting my array into a parent array, but that changes the output to:

/var/www/search# php search.php
PHP Notice: Undefined offset: 0 in /var/www/search/vendor/loilo/fuse/src/Fuse.php on line 97
PHP Notice: Undefined offset: 0 in /var/www/search/vendor/loilo/fuse/src/Fuse.php on line 123

Undefined index 0 when using 'matchAllTokens' option

"type": "Whoops\\Exception\\ErrorException",
"message": "Undefined offset: 0",
"file": "/var/www/app/vendor/loilo/fuse/src/Fuse.php",
"line": 231,

The line is:
$averageScore = $scores[0];

But when the option matchAllTokens is used, $scores will be empty (in my tests)

I just initialized $averageScore = 0 and then check that index 0 is set in the $scores array. Seems ok?
I can make a pull request
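The guard described here can be sketched as a stand-alone function. average_score is a hypothetical helper, not the library's code. The fallback for an empty list is an assumption: 0 mirrors the proposal above, but in Fuse's scoring 0 means a perfect match, so a different fallback may be more appropriate.

```php
<?php
// Hypothetical helper: average a list of scores without touching
// $scores[0] or dividing by zero when the list is empty.
function average_score(array $scores, float $emptyValue = 0.0): float
{
    $count = count($scores);

    if ($count === 0) {
        // Assumption: the caller decides what an empty score list means.
        return $emptyValue;
    }

    return array_sum($scores) / $count;
}

var_dump(average_score([]));           // float(0)
var_dump(average_score([0.25, 0.75])); // float(0.5)
```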

`limit` option does not work

No matter what limit is specified, the results array returns all the items. Basically, the limit option has no effect.

$list = [
    [
        'title' => "Old Man's War",
        'author' => 'John Scalzi',
    ],
    [
        'title' => 'The Lock Artist',
        'author' => 'Steve Hamilton',
    ],
    [
        'title' => 'HTML5',
        'author' => 'Remy Sharp',
    ],
    [
        'title' => 'Right Ho Jeeves',
        'author' => 'P.D Woodhouse',
    ],
];

$options = [
    'keys' => ['author'],
    'limit' => 1,
];

$fuse = new \Fuse\Fuse($list, $options);

echo "<pre>";
print_r($fuse->search('o'));
echo "</pre>";




$list = [
    [
        'title' => "Old Man's War",
        'author' => 'John Scalzi',
    ],
    [
        'title' => 'The Lock Artist',
        'author' => 'Steve Hamilton',
    ],
    [
        'title' => 'HTML5',
        'author' => 'Remy Sharp',
    ],
    [
        'title' => 'Right Ho Jeeves',
        'author' => 'P.D Woodhouse',
    ],
];

$options = [
    'keys' => ['author'],
    'limit' => 2,
];

$fuse = new \Fuse\Fuse($list, $options);

echo "<pre>";
print_r($fuse->search('o'));
echo "</pre>";

Both examples above return the same result:

Array
(
    [0] => Array
        (
            [item] => Array
                (
                    [title] => Old Man's War
                    [author] => John Scalzi
                )

            [refIndex] => 0
        )

    [1] => Array
        (
            [item] => Array
                (
                    [title] => Right Ho Jeeves
                    [author] => P.D Woodhouse
                )

            [refIndex] => 3
        )

    [2] => Array
        (
            [item] => Array
                (
                    [title] => The Lock Artist
                    [author] => Steve Hamilton
                )

            [refIndex] => 1
        )

)

Is this a bug or am I missing something?
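In Fuse.js 6, limit is passed as an option to search() rather than to the constructor. Whether this PHP port mirrors that is an assumption worth checking against its README. A library-independent fallback is to trim the result array afterwards. The sketch below uses a hypothetical limit_results helper on results shaped like the output above.

```php
<?php
// Hypothetical helper: keep at most $limit results; a limit of 0 or less
// is treated as "no limit" (an assumption made for this sketch).
function limit_results(array $results, int $limit): array
{
    return $limit > 0 ? array_slice($results, 0, $limit) : $results;
}

// Results shaped like the output above.
$results = [
    ['item' => ['title' => "Old Man's War", 'author' => 'John Scalzi'], 'refIndex' => 0],
    ['item' => ['title' => 'Right Ho Jeeves', 'author' => 'P.D Woodhouse'], 'refIndex' => 3],
    ['item' => ['title' => 'The Lock Artist', 'author' => 'Steve Hamilton'], 'refIndex' => 1],
];

var_dump(count(limit_results($results, 1))); // int(1)
var_dump(count(limit_results($results, 2))); // int(2)
```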

Error when using an array with missing keys

When you pass an array with a missing key, something like this

$list = [
0 => ['name' => "Jack"],
2 => ['name' => "John"] 
];

As you can see there is no 1 key

This will cause this error

ErrorException : Undefined offset: 1
 F:\projects\quick-conference\src\quick-conference\vendor\loilo\fuse\src\Fuse.php:131
 F:\projects\quick-conference\src\quick-conference\vendor\loilo\fuse\src\Fuse.php:64

For me the solution is to wrap the $list variable in array_values() and pass the return value of that function to the Fuse library.

I think the library should handle missing keys, because it will not always receive arrays with consecutive keys.

In my situation I'm passing a list of names from a form submitted by users.
Users can leave empty rows in between, which causes gaps in the array keys.
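The workaround described above, reduced to plain PHP with no Fuse call:

```php
<?php
// A list with a gap: there is no key 1.
$list = [
    0 => ['name' => 'Jack'],
    2 => ['name' => 'John'],
];

// array_values() discards the original keys and reindexes from 0.
$reindexed = array_values($list);

var_dump(array_keys($list) === [0, 2]);      // bool(true)
var_dump(array_keys($reindexed) === [0, 1]); // bool(true)
```

The reindexed array can then be passed to the Fuse constructor in place of $list.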
