loilo / fuse
🔍 Fuzzy search for PHP, ported from Fuse.js
License: Apache License 2.0
I've found a case whereby the threshold of 0 is not being respected.
I initialise the Fuse object like so
$fuseTest = new Fuse\Fuse([['name' => '8Th Street Clinic']], ['keys' => ['name'], 'threshold' => 0, 'includeScore' => true]);
and search for
$fuseTest->search('Tumaini Medial Clinic (Miriga Mieru West)')
I get back
[
[
"item" => [
"name" => "8Th Street Clinic",
],
"score" => 0.5,
],
]
As you can see, the result has a score of 0.5 but is still returned.
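Until the threshold is respected, one possible client-side workaround is to drop any result whose score exceeds the threshold yourself. This is a minimal sketch, assuming each result carries a "score" entry (as it does with includeScore => true) where 0 means a perfect match; filter_by_score is a hypothetical helper, not part of the library.

```php
<?php
// Hypothetical workaround: keep only results whose score is within the threshold.
// Assumes includeScore => true, so every result has a "score" (0 = perfect match).
function filter_by_score(array $results, float $threshold): array
{
    $kept = array_filter($results, function (array $result) use ($threshold) {
        return isset($result['score']) && $result['score'] <= $threshold;
    });
    // Reindex so the filtered list has sequential keys again.
    return array_values($kept);
}

// The result reported above.
$results = [
    ['item' => ['name' => '8Th Street Clinic'], 'score' => 0.5],
];
$strict = filter_by_score($results, 0.0); // the 0.5 match is dropped
```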
The error about the pattern exceeding the max length of 32 is shown for search queries longer than 32 characters that contain some non-ASCII characters in UTF-8 encoding. This most likely happens in BitapSearch::__construct()
during the chunking of characters. On closer inspection, PHP's substr works on bytes rather than characters, so it miscounts UTF-8 strings, and it might be better to use mb_substr; see topic
$query = 'send more money to my pàypàl account';
$fuzzyMatcher = new Fuse(['Sample collection only, please ignore'], []);
$fuzzyMatcher->search($query);
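To illustrate the byte-versus-character difference, here is a minimal sketch assuming the mbstring extension is available. chunk_pattern is a hypothetical helper for character-based chunking, not the library's BitapSearch code.

```php
<?php
// Hypothetical helper: split a pattern into chunks of at most $size characters
// (not bytes), so multibyte UTF-8 characters are never cut in half.
function chunk_pattern(string $pattern, int $size = 32): array
{
    $chunks = [];
    $length = mb_strlen($pattern, 'UTF-8');
    for ($offset = 0; $offset < $length; $offset += $size) {
        $chunks[] = mb_substr($pattern, $offset, $size, 'UTF-8');
    }
    return $chunks;
}

$query = 'send more money to my pàypàl account';
echo strlen($query), "\n";             // 38 bytes: each "à" takes 2 bytes
echo mb_strlen($query, 'UTF-8'), "\n"; // 36 characters

// A byte-based cut at 32 splits the trailing "à" in half and yields invalid UTF-8.
$edge = str_repeat('a', 31) . 'à';
var_dump(mb_check_encoding(substr($edge, 0, 32), 'UTF-8'));             // bool(false)
var_dump(mb_check_encoding(mb_substr($edge, 0, 32, 'UTF-8'), 'UTF-8')); // bool(true)

print_r(chunk_pattern($query));
```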
Hi! When will a stable version with PHP 8 support be released?
I tried to use includeScore, but it behaves differently than expected: instead of returning a score between 0 and 1, it returns a float > 1 for the first match and a float between 0 and 1 for every other match.
Current Results:
Good job! I would like to use this library with PHP 5.6, but that is impossible because it uses the null coalescing operator (??), which was introduced in PHP 7.
Do you think it could be made compatible, as your composer.json describes?
Thank you.
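For illustration, a PHP 7 null coalescing expression can be rewritten with isset() and the ternary operator, which PHP 5.6 supports. The $options keys and defaults below are sample values, not taken from the library's source.

```php
<?php
$options = ['threshold' => 0.2];

// PHP 7+ only:
// $threshold = $options['threshold'] ?? 0.6;

// PHP 5.x-compatible equivalent: isset() is also false for null values,
// matching how ?? falls back to its right-hand side.
$threshold = isset($options['threshold']) ? $options['threshold'] : 0.6;
$distance  = isset($options['distance'])  ? $options['distance']  : 100;
```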
I am using this project on my local Drupal build. The Fuse array does not show a "Results" index in the array after the search is complete. Can you confirm that the results of the search should be in a "Results" index in the Fuse array?
Can I use this with PHP 5.3 if I change all array literals from [] to array()?
Would there be any other 5.3 limitations?
loilo/fuse/src/Bitap/regex_search.php: 7
Example trace
Fuse\Bitap\regex_search("R. Aazyol", "Atelier Richard Woodman Burbridge/Harrods", " +")
vendor/loilo/fuse/src/Bitap/Bitap.php : 63
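The trace does not include the error message itself. One plausible cause, offered as an assumption rather than a diagnosis: PHP's preg_* functions require delimiters around a pattern, and a query such as "R. Aazyol" contains "." that should be escaped before it is used inside a regex. The hypothetical helper build_token_regex (not the library's regex_search) sketches a delimiter-safe, escaped construction that joins the query's tokens with "|".

```php
<?php
// Hypothetical helper, not the library's code: turn a query into a regex that
// matches any of its tokens literally.
function build_token_regex(string $pattern, string $separator = ' +'): string
{
    // The separator is a bare regex body, so wrap it in delimiters before use.
    $tokens = preg_split('/' . $separator . '/', trim($pattern), -1, PREG_SPLIT_NO_EMPTY);
    // Escape each token so characters such as "." are matched literally.
    $escaped = array_map(function ($token) {
        return preg_quote($token, '/');
    }, $tokens);
    return '/' . implode('|', $escaped) . '/';
}

$regex = build_token_regex('R. Aazyol');
echo $regex, "\n"; // /R\.|Aazyol/
var_dump(preg_match($regex, 'Atelier Richard Woodman Burbridge/Harrods')); // int(0)
```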
Hello,
I'm not quite sure if this is the right place to ask this, but I have a question about the performance of the search.
I currently have an array of ~81k entries with 3 keys each, searching by only one of them ("Name"). It currently takes ~3.8s to return 0 results (when I search for something that doesn't match) and ~4.9s when I search for something that does, with these settings:
$fuse = new \Fuse\Fuse($locationData, [ "keys" => ["Name"] , "threshold" => 0.2 , "distance" => 10 , "includeScore" => true , "includeMatches" => true ]);
My data (imported as JSON) looks like this:
Array ( [0] => Array ( [Name] => ACCOMACK [ListingCount] => 191 [Type] => county ) )
It looks like all the time simply comes from the size of the data: with timing logs, I consistently got ~4s to reach the 9th match in a search. It also looks like 'Fuse->analyze()' is run twice: with a simple counter, the 9th match turned out to be the 160,663rd call of the function. However, bypassing that (is_string was false, and is_list re-ran analyze as a string) didn't seem to improve the speed at all.
Here's an example of the timings it took to search (only logging when a match was found): http://prntscr.com/n2v14q
Is there a better format I can use for the data, or is this just a limitation of having 81k+ items to search through?
I'm not sure what other information you need, but let me know if I can provide anything else. I don't have any public page set up with this to test it against at the moment, but if you need one, I can probably get that set up.
The data I'm searching through has 5 distinct parts, so I also tried searching each part separately. With timing logs enabled, this increased the total time by ~0.3s; without logging, it reduced the total time by ~0.8s (while also returning 40 more results).
PHP 7.2
Running on a DreamHost server (not sure of specs)
Thanks,
Tony.
Dear,
The library is great and the precision is very good. However, I have performance issues.
I'm running the tool on a list of 15,000 entries (2-3 words per entry, 1 field only) that I want to fuzzy-match against itself.
Here is my configuration:
$thresholdFuzzy = 0.6; // lower down that value to be more strict. 0 = exact match
$FuzzyParameters = [
"keys" => [ "keywordName"],
"includeScore" => true,
"maxPatternLength" => 32,
"threshold" => $thresholdFuzzy,
"minMatchCharLength" => 1
];
$fuse = new \Fuse\Fuse($dataToFuzzyNoExactMatch, $FuzzyParameters);
$results = $fuse->search($kw);
I ran a fuzzy-search loop over the first 100 entries, comparing each one against all 15,000.
It took 500 seconds to compute.
Any idea how I can improve the speed, please?
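As a rough extrapolation from the numbers reported above, assuming each query-to-entry comparison costs about the same, the per-comparison time and the time for a full 15,000 × 15,000 self-comparison work out to:

```latex
t_{\text{pair}} \approx \frac{500\ \text{s}}{100 \times 15\,000} = \frac{500}{1\,500\,000}\ \text{s} \approx 0.33\ \text{ms}

T_{\text{all}} \approx 15\,000 \times 15\,000 \times t_{\text{pair}} = 150 \times 500\ \text{s} = 75\,000\ \text{s} \approx 20.8\ \text{h}
```

Because the total grows quadratically with the list size, reducing the number of comparisons (for example by pre-filtering candidates) likely matters more than tuning individual options.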
I have been trying to get a result from a search of my data set, but so far I haven't been able to get one. I plugged my data set and options into the Fuse.js demo page and got the expected results.
My data (a small portion of it):
array(169) {
[0]=>
array(4) {
["name"]=>
string(17) "Grace Watson Hall"
["description"]=>
string(7) "GWH/025"
["mdo_id"]=>
string(1) "3"
["coordinates"]=>
array(2) {
[0]=>
float(-77.66901)
[1]=>
float(43.083652)
}
}
[1]=>
array(4) {
["name"]=>
string(15) "Wallace Library"
["description"]=>
string(7) "WAL/005"
["mdo_id"]=>
string(1) "5"
["coordinates"]=>
array(2) {
[0]=>
float(-77.676315)
[1]=>
float(43.083927)
}
}
Initialization and search:
$options = ["keys" => ["name","description","mdo_id"]];
$fuse = new \Fuse\Fuse($data,$options);
$result = $fuse->search("Grace Watson Hall");
//result returns nothing and I get a 500 error
$scoresLen = sizeof($scores);
$averageScore = 0;
// Guard against "Undefined offset: 0" and "Division By Zero" errors when $scoresLen == 0
if ($scoresLen > 0) {
    $averageScore = $scores[0];
    for ($i = 1; $i < $scoresLen; $i++) {
        $averageScore += $scores[$i];
    }
    $averageScore /= $scoresLen;
}
Thanks for the PHP port. It's very useful to me. Please don't forget to update it when Fuse.js has major changes. Regards
Hi
I am using your project, which is very useful, but I noticed a problem that needs to be fixed.
In Fuse.php, at line 226, your code is:
$averageScore = $averageScore / $scoresLen;
I noticed that $scoresLen is sometimes 0 in my project, so the division triggers an error.
I suggest adding the following to avoid this error:
if($scoresLen !=0){
$averageScore = $averageScore / $scoresLen;
}
Best Regards
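The same guard can be written as a small zero-safe helper. This is a sketch, not the library's code; average_score and its $default parameter are hypothetical. A default of 0 mirrors the workaround suggested in this thread, but since 0 means a perfect match in Fuse, a caller may prefer 1 (no match) for items with no scores.

```php
<?php
// Hypothetical zero-safe average: an empty score list yields $default instead
// of an "Undefined offset: 0" notice or a division-by-zero error.
function average_score(array $scores, float $default = 0.0): float
{
    $count = count($scores);
    return $count > 0 ? array_sum($scores) / $count : $default;
}

$empty = average_score([]);        // 0.0
$strict = average_score([], 1.0);  // 1.0, treats "no scores" as "no match"
$mean = average_score([0.2, 0.4]); // about 0.3
```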
Hi,
I have been playing around with Fuse for a few days now and I can't get weighted search to work. I have a data array that looks like this:
Array
(
[1] => Array
(
[company] => Array
(
[name] => STARTUPLIFERS IN SWEDEN
[registrationid] => 55XXXX-XXXX
[id] => 1
)
),
[company] => Array
(
[name] => Google
[registrationid] => 55XXXX-XXXX
[id] => 1
),
[tags] => Array
(
[name] => Startup
)
)
......
[7992] => Array
(
[stopwords] => Array
(
[name] => Startup
)
)
...
So it is essentially an array of companies, to which I have added "stop words".
I have created my key array as follows:
'keys' => [
[
'name' => 'stopword.name',
'weight' => 1
],
[
'name' => 'company.name',
'weight' => 0.5
],
[
'name' => 'tags.name',
'weight' => 1
]
]
But when I search for the term "Startup", no matter how I weight my keys, the company STARTUPLIFERS always gets the lowest score, followed by my stop word, which is a 100% match. I'm using these other options for my query:
"tokenize" => true,
"matchAllTokens" => true,
"caseSensitive" => false,
"includeScore" => true,
"shouldSort" => true
This is perhaps a limitation of Fuse.js itself (I will try to replicate it in JS), as there is no way to force a "word boundary" search, which would have helped solve this specific problem.
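Since Fuse has no word-boundary option, one possible workaround is post-processing: move results whose field contains the query as a whole word ahead of partial hits, keeping score order within each group. This is a hypothetical sketch, not a Fuse feature; it assumes includeScore => true and, for brevity, a flat "name" field instead of the nested company.name structure above.

```php
<?php
// Hypothetical: true if $word appears in $text as a whole word (case-insensitive).
function has_whole_word(string $text, string $word): bool
{
    return preg_match('/\b' . preg_quote($word, '/') . '\b/i', $text) === 1;
}

// Hypothetical: whole-word hits first, then by ascending score (0 = best).
function prefer_whole_words(array $results, string $field, string $query): array
{
    usort($results, function (array $a, array $b) use ($field, $query) {
        $hitA = has_whole_word((string) ($a['item'][$field] ?? ''), $query);
        $hitB = has_whole_word((string) ($b['item'][$field] ?? ''), $query);
        if ($hitA !== $hitB) {
            return $hitA ? -1 : 1;
        }
        return $a['score'] <=> $b['score'];
    });
    return $results;
}

// Sample results mirroring the report: the partial match scores better.
$results = [
    ['item' => ['name' => 'STARTUPLIFERS IN SWEDEN'], 'score' => 0.1],
    ['item' => ['name' => 'Startup'], 'score' => 0.2],
];
$sorted = prefer_whole_words($results, 'name', 'Startup');
```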
Hello,
We have a very large array with around 10 thousand records, each with the following fields:
array(
array('name' => 'name1', 'email' => '[email protected]'),
array('name' => 'second name', 'email' => '[email protected]')
)
When searching this users array by the keys 'name' or 'email', Fuse is really slow: it takes around 30 to 40 seconds to find all records. I have the following settings:
// initialize fuse
$settings['keys'] = array("name", "email");
$settings['distance'] = 50;
$settings['maxPatternLength'] = 16;
$settings['threshold'] = 0.6;
$fuse = new Fuse($usersArray, $settings);
// search user
$usersFound = $fuse->search("my name");
Is there a way, or any setting, that will help me make Fuse work faster? I tried the same with the JS version of Fuse and it was not nearly as slow: it took only a few seconds to search 10 thousand records.
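One general way to cut the work, sketched here under stated assumptions and not as a feature of loilo/fuse, is a cheap trigram pre-filter: index every record's character trigrams once, then run the fuzzy search only on records sharing at least one trigram with the query. The trade-off is that a heavily misspelled query sharing no trigram with a record will no longer find it. All function names are hypothetical, and the sample users are illustrative data.

```php
<?php
// Hypothetical: the set of lowercase 3-character substrings of $text.
function trigrams(string $text): array
{
    $text = mb_strtolower($text, 'UTF-8');
    $length = mb_strlen($text, 'UTF-8');
    $grams = [];
    for ($i = 0; $i + 3 <= $length; $i++) {
        $grams[mb_substr($text, $i, 3, 'UTF-8')] = true;
    }
    return array_keys($grams);
}

// Build once per data set: trigram => set of record offsets.
function build_trigram_index(array $records, array $keys): array
{
    $index = [];
    foreach ($records as $offset => $record) {
        foreach ($keys as $key) {
            foreach (trigrams((string) ($record[$key] ?? '')) as $gram) {
                $index[$gram][$offset] = true;
            }
        }
    }
    return $index;
}

// Per query: records sharing at least one trigram with the query.
function candidate_records(array $records, array $index, string $query): array
{
    $queryGrams = trigrams($query);
    if ($queryGrams === []) {
        return array_values($records); // query too short to pre-filter
    }
    $hits = [];
    foreach ($queryGrams as $gram) {
        foreach (array_keys($index[$gram] ?? []) as $offset) {
            $hits[$offset] = true;
        }
    }
    // array_values() gives the subset sequential keys 0..n-1.
    return array_values(array_intersect_key($records, $hits));
}

// Sample data for illustration only.
$users = [
    ['name' => 'name1', 'email' => 'a@example.com'],
    ['name' => 'second name', 'email' => 'b@example.com'],
    ['name' => 'Bob', 'email' => 'bob@example.com'],
];
$index = build_trigram_index($users, ['name', 'email']);
$subset = candidate_records($users, $index, 'my name');
// $subset would then be passed to new \Fuse\Fuse($subset, $settings).
```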
So right now I have a fuzzy search requirement like the one below. However, I'm really confused about the following settings:
threshold (type: float, default: 0.6)
distance (type: int, default: 100)
This is our pattern for searching:
To simplify our scope, if we are searching the phrase "A B C":
1. All permutations of “A B C” allowed:
1.1 3 words in any order, e.g. A C B, B C A
1.2 3 words in any order + any other word(s), e.g. A C B+XY, B C A+Z
1.3 Any of the 2 words, in any order, e.g. A B, C B, C A
1.4 Any of the 2 words, in any order + any other word(s), e.g. A B+XY, CB+XZ, CA+Z
2. Permutations of “A B C” not allowed:
2.1 Single word only, e.g. Just A or B, or C
2.2 One word + any other word(s), e.g. A+XY, B+YZ, C+Z
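The rules above boil down to "accept a candidate when at least two of the query's words occur in it, in any order, with extra words allowed". A minimal sketch of that rule, assuming exact, case-insensitive word matching; fuzzy-matching each individual word (which is where Fuse's threshold and distance would come in) is not covered. Both function names are hypothetical.

```php
<?php
// Hypothetical: unique lowercase words of $text.
function words(string $text): array
{
    $parts = preg_split('/\s+/u', mb_strtolower(trim($text), 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($parts));
}

// Hypothetical: rule 1 (two or more query words, any order, extras allowed)
// passes; rule 2 (a single query word, with or without extras) fails.
function matches_phrase_rule(string $query, string $candidate, int $minShared = 2): bool
{
    $queryWords = words($query);
    if ($queryWords === []) {
        return false;
    }
    $shared = array_intersect($queryWords, words($candidate));
    return count($shared) >= min($minShared, count($queryWords));
}

var_dump(matches_phrase_rule('A B C', 'B C A Z')); // bool(true)
var_dump(matches_phrase_rule('A B C', 'A X Y'));   // bool(false)
```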
Thanks for your time, guys.
Some matches come through with a score of 2.2+, while other concrete matches show a score of 0.01. I have some code that checks for anything over 2 and anything under 0.1, but are you aware of this?
I have an array that outputs like this:
[1106] => Array
(
[id] => 1106
[url] => 3-brides-for-3-bad-boys-160488
[description] => 0
[title] => 3 Brides for 3 Bad Boys
[author] => 3 Brides for 3 Bad Boys (mf)
[author_lname] => Boys
[language] => en
)
[1107] => Array
(
[id] => 1107
[url] => 3-brisingr-3-12630502066281
[description] => 0
[title] => 3-Brisingr-3
[author] => Unknown
[author_lname] =>
[language] => eng
)
[1108] => Array
(
[id] => 1108
[url] => 3-claus-of-death-120101
[description] => 0
[title] => 3 Claus of Death
[author] => Gayle Trent
[author_lname] => Trent
[language] => en
)
But that results in:
PHP Notice: Undefined offset: -1 in /var/www/search/vendor/loilo/fuse/src/Bitap/matched_indices.php on line 24
What do I need to do to fix this? I have tried putting my array into a parent array, but that changes the output to:
/var/www/search# php search.php
PHP Notice: Undefined offset: 0 in /var/www/search/vendor/loilo/fuse/src/Fuse.php on line 97
PHP Notice: Undefined offset: 0 in /var/www/search/vendor/loilo/fuse/src/Fuse.php on line 123
"type": "Whoops\\Exception\\ErrorException",
"message": "Undefined offset: 0",
"file": "/var/www/app/vendor/loilo/fuse/src/Fuse.php",
"line": 231,
The line is:
$averageScore = $scores[0];
But when the option matchAllTokens is used, $scores will be empty (in my tests)
I just initialized $averageScore = 0 and then check that index 0 is set in the $scores array. Seems ok?
I can make a pull request
No matter what limit is specified, the results array contains all matching items. Basically, the limit option has no effect.
$list = [
[
'title' => "Old Man's War",
'author' => 'John Scalzi',
],
[
'title' => 'The Lock Artist',
'author' => 'Steve Hamilton',
],
[
'title' => 'HTML5',
'author' => 'Remy Sharp',
],
[
'title' => 'Right Ho Jeeves',
'author' => 'P.D Woodhouse',
],
];
$options = [
'keys' => ['author'],
'limit' => 1,
];
$fuse = new \Fuse\Fuse($list, $options);
echo "<pre>";
print_r($fuse->search('o'));
echo "</pre>";
$list = [
[
'title' => "Old Man's War",
'author' => 'John Scalzi',
],
[
'title' => 'The Lock Artist',
'author' => 'Steve Hamilton',
],
[
'title' => 'HTML5',
'author' => 'Remy Sharp',
],
[
'title' => 'Right Ho Jeeves',
'author' => 'P.D Woodhouse',
],
];
$options = [
'keys' => ['author'],
'limit' => 2,
];
$fuse = new \Fuse\Fuse($list, $options);
echo "<pre>";
print_r($fuse->search('o'));
echo "</pre>";
Both examples above return the same result:
Array
(
[0] => Array
(
[item] => Array
(
[title] => Old Man's War
[author] => John Scalzi
)
[refIndex] => 0
)
[1] => Array
(
[item] => Array
(
[title] => Right Ho Jeeves
[author] => P.D Woodhouse
)
[refIndex] => 3
)
[2] => Array
(
[item] => Array
(
[title] => The Lock Artist
[author] => Steve Hamilton
)
[refIndex] => 1
)
)
Is this a bug or am I missing something?
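While the limit option is ignored, the results can be truncated manually. A minimal sketch; apply_limit is a hypothetical helper, and the sample array is the output shown above.

```php
<?php
// Hypothetical workaround: keep only the first $limit results (non-positive = no limit).
function apply_limit(array $results, int $limit): array
{
    return $limit > 0 ? array_slice($results, 0, $limit) : $results;
}

$results = [
    ['item' => ['title' => "Old Man's War", 'author' => 'John Scalzi'], 'refIndex' => 0],
    ['item' => ['title' => 'Right Ho Jeeves', 'author' => 'P.D Woodhouse'], 'refIndex' => 3],
    ['item' => ['title' => 'The Lock Artist', 'author' => 'Steve Hamilton'], 'refIndex' => 1],
];
$firstTwo = apply_limit($results, 2);
```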
Any chance you might update this awesome project to the latest version of Fuse.js?
ErrorException: Array to string conversion in file /var/www/html/vendor/loilo/fuse/src/Fuse.php on line 124
When you pass an array with a missing key, something like this:
$list = [
0 => ['name' => "Jack"],
2 => ['name' => "John"]
];
As you can see, there is no key 1.
This causes the following error:
ErrorException : Undefined offset: 1
F:\projects\quick-conference\src\quick-conference\vendor\loilo\fuse\src\Fuse.php:131
F:\projects\quick-conference\src\quick-conference\vendor\loilo\fuse\src\Fuse.php:64
For me, the solution is to wrap the $list variable in array_values() and pass the returned array to the Fuse library.
I think the library should handle missing keys, because it won't always receive arrays with sequential keys.
In my case, I'm passing a list of names from a form submitted by users. Users can leave empty rows in between, which causes gaps in the array keys.
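A sketch of the array_values() workaround that also keeps a way back to the caller's original keys. It assumes your version of the library returns a refIndex for each result (an offset into the array passed to Fuse); original_key is a hypothetical helper.

```php
<?php
// Form input with a gap in its keys (no key 1).
$list = [
    0 => ['name' => 'Jack'],
    2 => ['name' => 'John'],
];

// Remember the caller's keys, then hand Fuse a sequentially indexed copy.
$originalKeys = array_keys($list);  // [0, 2]
$reindexed = array_values($list);   // keys 0 and 1

// Hypothetical: map a result's refIndex (offset into $reindexed) back to
// the key used in the original $list.
function original_key(array $originalKeys, int $refIndex)
{
    return $originalKeys[$refIndex];
}

// $fuse = new \Fuse\Fuse($reindexed, ['keys' => ['name']]);
echo original_key($originalKeys, 1), "\n"; // 2
```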