Comments (2)
The documentation is not very clear on this. ratio calculates a similarity based on the Indel distance, which only allows insertions and deletions (it behaves the same as Levenshtein with a substitution weight of 2). distance calculates the normal Levenshtein distance, so I understand why this is surprising. This is done for legacy reasons, since it has always been the behavior in this library. I would recommend using rapidfuzz instead, which is also used here internally:
>>> from rapidfuzz.distance import Levenshtein
>>> Levenshtein.normalized_similarity("abcde", "abcd")
0.8
>>> Levenshtein.normalized_similarity("abcde", "abcde 2")
0.7142857142857143
while indel gives you:
>>> from rapidfuzz.distance import Indel
>>> Indel.normalized_similarity("abcde", "abcd")
0.8888888888888888
>>> Indel.normalized_similarity("abcde", "abcde 2")
0.8333333333333334
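To make the "substitution weight 2" remark concrete, here is a minimal pure-Python sketch of a weighted edit distance. The function name weighted_levenshtein and its substitution_cost parameter are made up for this illustration and are not part of rapidfuzz or this library. Insertions and deletions always cost 1 here.

```python
def weighted_levenshtein(s1, s2, substitution_cost=1):
    """Edit distance with insert/delete cost 1 and a configurable substitution cost.

    Hypothetical helper for illustration only, not a library API.
    """
    # previous[j] holds the distance between the processed prefix of s1 and s2[:j]
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # turning s1[:i] into "" takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else substitution_cost
            current.append(min(
                previous[j] + 1,         # delete c1
                current[j - 1] + 1,      # insert c2
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


# One substituted character: uniform Levenshtein counts 1 edit,
# while substitution weight 2 counts it as a delete plus an insert.
uniform = weighted_levenshtein("abcde", "abXde", substitution_cost=1)
indel_like = weighted_levenshtein("abcde", "abXde", substitution_cost=2)
print(uniform, indel_like)
```

With substitution_cost=2 a substitution is never cheaper than a deletion followed by an insertion, which is why the result matches a distance that only allows insertions and deletions.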
Note that uniform Levenshtein is normalized using max(len1, len2), while the Indel distance is normalized using len1 + len2. The reason is that each metric is normalized by its maximum possible distance. For Indel, that is removing all characters of one string and then inserting all characters of the other, while for Levenshtein it is replacing the overlapping part and removing/inserting the length difference.
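The arithmetic behind the numbers in this thread can be reproduced by hand. The sketch below assumes the similarity is computed as 1 - distance / max_distance; the helper names are hypothetical, and the distances are supplied manually rather than computed (both example pairs differ only by inserted or deleted characters, so the Levenshtein and Indel distances coincide).

```python
def normalized_similarity(distance, max_distance):
    """Map a distance to a similarity in [0, 1] (illustrative helper, not a library API)."""
    if max_distance == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - distance / max_distance


def levenshtein_max(len1, len2):
    # Worst case: substitute the overlapping part, insert/delete the length difference
    return max(len1, len2)


def indel_max(len1, len2):
    # Worst case: delete every character of one string, insert every character of the other
    return len1 + len2


# (s1, s2, distance); distances entered by hand for this example
examples = [("abcde", "abcd", 1), ("abcde", "abcde 2", 2)]

for s1, s2, distance in examples:
    lev = normalized_similarity(distance, levenshtein_max(len(s1), len(s2)))
    indel = normalized_similarity(distance, indel_max(len(s1), len(s2)))
    print(f"{s1!r} vs {s2!r}: Levenshtein {lev:.4f}, Indel {indel:.4f}")
```

The same distance yields a higher Indel similarity because the Indel denominator len1 + len2 is never smaller than max(len1, len2).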
Thank you for your prompt reply, Max. That's very clear. As substitution counts for 2, it all adds up.