Comments (6)
This would require at least partially porting rapidfuzz to rust. I was working on this a couple of years ago in https://github.com/maxbachmann/rapidfuzz-rs, but this would basically need to be completely rewritten.
The algorithms themselves should be fairly easy to port. However, I am completely unfamiliar with rust, and the C++ implementation uses a couple of more advanced C++ features:
- conditional compilation using if constexpr to reduce code duplication (worst case, this could be duplicated in rust)
- should work for any numeric iterables, not just strings. In C++ this is defined as:
template <typename InputIt1, typename InputIt2>
double ratio(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2, double score_cutoff = 0);
I am not sure how this would need to be implemented with rust generics so that it works both for strings and for any other iterables of numeric items.
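One possible shape for such a generic signature, as a minimal sketch only: accept anything that implements `IntoIterator` and require that the items of the two inputs can be compared for equality. The function name `levenshtein_distance` and its signature are hypothetical illustrations, not the API of rapidfuzz-rs, and the body is the textbook two-row dynamic programming algorithm rather than the optimized bit-parallel approach used in the C++ code.

```rust
/// Hypothetical sketch: Levenshtein distance over any two iterables whose
/// items can be compared for equality. This is NOT the rapidfuzz-rs API.
pub fn levenshtein_distance<I1, I2>(s1: I1, s2: I2) -> usize
where
    I1: IntoIterator,
    I2: IntoIterator,
    I1::Item: PartialEq<I2::Item>,
{
    // The second sequence is traversed once per item of the first one,
    // so it is collected into a Vec up front.
    let s2: Vec<I2::Item> = s2.into_iter().collect();

    // prev[j] = distance between the consumed prefix of s1 and the first j items of s2.
    let mut prev: Vec<usize> = (0..=s2.len()).collect();

    for (i, a) in s1.into_iter().enumerate() {
        let mut cur = Vec::with_capacity(s2.len() + 1);
        cur.push(i + 1);
        for (j, b) in s2.iter().enumerate() {
            let cost = if a == *b { 0 } else { 1 };
            let substitution = prev[j] + cost;
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        prev = cur;
    }

    prev[s2.len()]
}
```

With this shape, `levenshtein_distance("kitten".chars(), "sitting".chars())` returns 3, and the same function accepts `"abc".bytes()` or a `Vec<u32>`. Since rust has no default arguments, an optional parameter such as `score_cutoff` would likely have to become an `Option` or a builder-style argument; that part is left open here.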
This topic already came up a couple of months ago here: rapidfuzz/RapidFuzz#331, but I never heard back from @bcvanmeurs again. While I am absolutely willing to work on the algorithms themselves, I think it would be good to have someone with more rust knowledge help with the API design.
from levenshtein.
I had a look at this tonight. I will probably work on this, but it will certainly take a while to port.
It’s fantastic news. I am extremely grateful for your willingness to take on this task. As a beginner, I feel limited in my abilities to contribute, but I am truly appreciative of your efforts. If there’s any way I can assist or support the project, please let me know. Thank you once again for your dedication and expertise in this endeavor!
I assume you are working with relatively large strings.
I started working on the rust rewrite in https://github.com/maxbachmann/rapidfuzz-rs/tree/rewrite. So far I have the normal Levenshtein distance pretty much fully implemented, but there are still quite a few features missing. I will probably implement a decent chunk of them, but make a release on cargo before implementing all features.
Performance of the Levenshtein implementation looks to be on par with the C++ implementation (it generates largely the same assembly when compiled with llvm). Here is an initial comparison with strsim, which appears to be a widely used library for string similarities in rust:
Perfect! This is huge progress. Looking forward to the release on cargo!
I published a first version. It mostly includes the basic metrics. Still missing are:
- most of the fuzz module
- simd implementations for Many x Many comparisons of short strings (the python implementation uses this in process.cdist under the hood)
- editops / opcodes to find edit operations needed for a conversion
You can find it here: https://crates.io/crates/rapidfuzz. Let me know if you run into any issues or have suggestions for improvements for the API.
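For readers unfamiliar with the editops feature mentioned in the list above, here is a rough sketch of the idea: fill the full Levenshtein matrix, then walk back from the bottom-right corner to recover one sequence of edit operations. The `EditOp` enum and the `editops` function are hypothetical names chosen for illustration and are not taken from the published crate; a real implementation would presumably avoid the quadratic memory used here.

```rust
/// Hypothetical edit operation type for illustration; not the rapidfuzz API.
/// `src` is a position in the source sequence, `dst` a position in the target.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EditOp {
    Replace { src: usize, dst: usize },
    Insert { src: usize, dst: usize },
    Delete { src: usize, dst: usize },
}

/// Returns one minimal list of operations that turns `src` into `dst`.
pub fn editops<T: PartialEq>(src: &[T], dst: &[T]) -> Vec<EditOp> {
    let (n, m) = (src.len(), dst.len());

    // dist[i][j] = Levenshtein distance between src[..i] and dst[..j].
    let mut dist = vec![vec![0usize; m + 1]; n + 1];
    for i in 0..=n {
        dist[i][0] = i;
    }
    for j in 0..=m {
        dist[0][j] = j;
    }
    for i in 1..=n {
        for j in 1..=m {
            let cost = if src[i - 1] == dst[j - 1] { 0 } else { 1 };
            dist[i][j] = (dist[i - 1][j - 1] + cost)
                .min(dist[i - 1][j] + 1)
                .min(dist[i][j - 1] + 1);
        }
    }

    // Backtrace from the bottom-right corner to the origin.
    let mut ops = Vec::new();
    let (mut i, mut j) = (n, m);
    while i > 0 || j > 0 {
        if i > 0 && j > 0 && src[i - 1] == dst[j - 1] && dist[i][j] == dist[i - 1][j - 1] {
            // Items match: no operation needed.
            i -= 1;
            j -= 1;
        } else if i > 0 && j > 0 && dist[i][j] == dist[i - 1][j - 1] + 1 {
            i -= 1;
            j -= 1;
            ops.push(EditOp::Replace { src: i, dst: j });
        } else if i > 0 && dist[i][j] == dist[i - 1][j] + 1 {
            i -= 1;
            ops.push(EditOp::Delete { src: i, dst: j });
        } else {
            j -= 1;
            ops.push(EditOp::Insert { src: i, dst: j });
        }
    }

    // Operations were collected back to front.
    ops.reverse();
    ops
}
```

The number of returned operations equals the Levenshtein distance, so converting `[1, 2, 3]` into `[1, 3]` yields a single `Delete` at source position 1.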