Make it a rust crate (closed)

rapidfuzz commented on May 27, 2024
Make it a rust crate

from levenshtein.

Comments (6)

maxbachmann commented on May 27, 2024

This would require at least partially porting rapidfuzz to Rust. I was working on this a couple of years ago in https://github.com/maxbachmann/rapidfuzz-rs, but that code would basically need to be completely rewritten.

The algorithms themselves should be fairly easy to port. However, I am completely unfamiliar with Rust, and the C++ implementation uses a couple of more advanced C++ features:

  • conditional compilation using if constexpr to reduce code duplication (worst case, this code could be duplicated in Rust)
  • the API should work for any iterable of numeric items, not just strings. In C++ this is declared as:
template <typename InputIt1, typename InputIt2>
double ratio(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2, double score_cutoff = 0);

I am not sure how this would need to be implemented with Rust generics to work both for strings and for any other iterable of numeric items.
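
One possible shape for such an API, as a minimal sketch only: the function below is generic over anything that implements IntoIterator, so it accepts both "abc".chars() and iterators over numbers. The name levenshtein_distance and its trait bounds are assumptions made for illustration; this is a plain dynamic-programming Levenshtein distance, not the rapidfuzz API and not the bit-parallel algorithm rapidfuzz uses.

```rust
/// Hypothetical generic Levenshtein distance (illustration only, not the
/// rapidfuzz crate's API). Works for any two sequences whose items can be
/// compared with `==`, e.g. `str::chars()` or iterators over integers.
pub fn levenshtein_distance<I1, I2>(s1: I1, s2: I2) -> usize
where
    I1: IntoIterator,
    I2: IntoIterator,
    // The second sequence is walked once per item of the first one,
    // so its iterator has to be cheaply cloneable.
    I2::IntoIter: Clone,
    I1::Item: PartialEq<I2::Item>,
{
    let s2 = s2.into_iter();
    let len2 = s2.clone().count();

    // row[j] holds the distance between the part of s1 consumed so far
    // and the first j items of s2. Initially s1 is empty, so row[j] = j.
    let mut row: Vec<usize> = (0..=len2).collect();

    for (i, a) in s1.into_iter().enumerate() {
        // `diag` is the value from the previous row, one column to the left.
        let mut diag = row[0];
        row[0] = i + 1;
        for (j, b) in s2.clone().enumerate() {
            let cost = if a == b { 0 } else { 1 };
            let next = (diag + cost) // substitution or match
                .min(row[j] + 1) // insertion
                .min(row[j + 1] + 1); // deletion
            diag = row[j + 1];
            row[j + 1] = next;
        }
    }
    row[len2]
}
```

With these bounds, levenshtein_distance("kitten".chars(), "sitting".chars()) and levenshtein_distance([1u32, 2, 3].iter(), [1u32, 3].iter()) both compile against the same function. Rust has no default arguments, so a parameter like score_cutoff = 0 from the C++ signature would need a different mechanism, for example an Option argument or a builder.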

This topic already came up a couple of months ago here: rapidfuzz/RapidFuzz#331, but I never heard back from @bcvanmeurs again. While I am absolutely willing to work on the algorithms themselves, I think it would be good to have someone with more rust knowledge help with the API design.

maxbachmann commented on May 27, 2024

I had a look at this this evening. I will probably work on this, but it will certainly take a while to port.

G2GreenTea commented on May 27, 2024

It’s fantastic news. I am extremely grateful for your willingness to take on this task. As a beginner, I feel limited in my abilities to contribute, but I am truly appreciative of your efforts. If there’s any way I can assist or support the project, please let me know. Thank you once again for your dedication and expertise in this endeavor!

maxbachmann commented on May 27, 2024

I assume you are working with relatively large strings.

I started working on the rust rewrite in https://github.com/maxbachmann/rapidfuzz-rs/tree/rewrite. So far I have the normal Levenshtein distance pretty much fully implemented, but there are still quite a few features missing. I will probably implement a decent chunk of them, but make a release on cargo before implementing all features.

Performance of the Levenshtein implementation looks to be on par with the C++ implementation (it generates largely the same assembly when compiled with LLVM). Here is an initial comparison with strsim, which appears to be a widely used library for string similarities in Rust:

[Figure: benchmark plot comparing the Levenshtein implementation with strsim; image not preserved]

G2GreenTea commented on May 27, 2024

Perfect! This is huge progress. Looking forward to the release on cargo!

maxbachmann commented on May 27, 2024

I published a first version. It mostly includes the basic metrics. Still missing are:

  • most of the fuzz module
  • SIMD implementations for many x many comparisons of short strings (the Python implementation uses this in process.cdist under the hood)
  • editops / opcodes to find edit operations needed for a conversion

You can find it here: https://crates.io/crates/rapidfuzz. Let me know if you run into any issues or have suggestions for improvements for the API.
