Why doesn't the Levenshtein library use the Levenshtein distance for ratio?
from levenshtein.
The original author implemented it this way for unknown reasons, and I kept it for backwards compatibility. I agree it is pretty confusing, so I at least made sure to mention the Indel distance in the documentation.
I think it should be mentioned in the README too, since that is what people read first. I would expect a Levenshtein library to use the Levenshtein distance unless specified otherwise, and "Indel" does not appear in the function name.
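Levenshtein.ratio is based on the Indel distance, not on the Levenshtein distance. The Indel distance is basically the Levenshtein distance without substitutions: only insertions and deletions are allowed, so replacing one character costs 2 (a deletion plus an insertion) instead of 1. You can get the normalized Levenshtein similarity using rapidfuzz: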
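>>> from rapidfuzz.distance import Levenshtein, Indel
>>> # a and b are two strings whose values were not included in this comment
>>> Levenshtein.normalized_similarity(a, b)
0.9375
>>> # weights are (insertion, deletion, substitution); a substitution costs 2 here
>>> Levenshtein.normalized_similarity(a, b, weights=(1,1,2))
0.967741935483871
>>> Indel.normalized_similarity(a, b)
0.967741935483871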
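The two metrics can be compared in plain Python. This is a minimal sketch, not the C++ code used by Levenshtein or rapidfuzz; the function names are hypothetical, and the normalizations assume that `normalized_similarity` divides by the largest possible distance. `normalized_indel_similarity` corresponds to the Indel-based score that `Levenshtein.ratio` is described as returning above.

```python
# Minimal pure-Python sketch (hypothetical helper names); the real libraries
# use optimized C++ implementations.


def levenshtein_distance(a: str, b: str, weights=(1, 1, 1)) -> int:
    """Weighted Levenshtein distance; weights = (insertion, deletion, substitution)."""
    ins, dele, sub = weights
    # prev[j] = cost of turning the previous prefix of a into b[:j]
    prev = [j * ins for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * dele] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub
            cur[j] = min(
                prev[j] + dele,      # delete a[i-1]
                cur[j - 1] + ins,    # insert b[j-1]
                prev[j - 1] + cost,  # keep or substitute
            )
        prev = cur
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Only insertions and deletions: len(a) + len(b) - 2 * LCS(a, b)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return len(a) + len(b) - 2 * prev[-1]


def normalized_levenshtein_similarity(a: str, b: str) -> float:
    # With unit weights the largest possible distance is max(len(a), len(b)).
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein_distance(a, b) / longest


def normalized_indel_similarity(a: str, b: str) -> float:
    # The largest possible Indel distance is len(a) + len(b).
    total = len(a) + len(b)
    return 1.0 if total == 0 else 1 - indel_distance(a, b) / total


if __name__ == "__main__":
    # One substitution: Levenshtein counts 1 edit, Indel counts 2.
    print(levenshtein_distance("kitten", "sitten"), indel_distance("kitten", "sitten"))  # 1 2
    # With substitution weight 2, weighted Levenshtein matches Indel.
    print(levenshtein_distance("kitten", "sitting", weights=(1, 1, 2)),
          indel_distance("kitten", "sitting"))  # 5 5
    print(normalized_levenshtein_similarity("kitten", "sitting"),
          normalized_indel_similarity("kitten", "sitting"))
```

With weights (1, 1, 2) a substitution is never cheaper than a deletion plus an insertion, and the largest possible distance becomes len(a) + len(b). Assuming rapidfuzz normalizes by that maximum, the weighted Levenshtein similarity then coincides with the Indel similarity, which is consistent with the two identical values in the session above.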
When processing large amounts of data, I recommend using one of the processing functions in rapidfuzz.process. They are a lot faster, since they do not need to switch between Python and C++ for each string comparison.
Thanks for the clarification and the lead. I will test this one. Thanks again.
Related Issues
- Mypy complaines with newest release (20.07), code still works
- Extension to word-level
- score_cutoff argument not seeming to work for ratio
- Non-standard function signature for get_requires_for_build_wheel()
- Support Java JNI call
- Module 'Levenshtein' has no attribute 'distance'
- Fails to build debian11/python 3.9: Could NOT find Python (missing: Interpreter Development.Module)
- dependency rapidfuzz 3.0
- Compatibility with rapidfuzz-cpp 2.0.0
- Make it a rust crate
- License
- Levenshtein realisation counts substitution as 2 edits instead of 1
- Compatibility with rapidfuzz-cpp 3
- please provide a source tarball including external dependencies
- `Callable` missing type argument, making some methods partially-Unknown
- Citation
- Damerau–Levenshtein distance
- jaro_winkler gives values larger than 1.
- Distance finds min cost not min number