Comments (11)
It is not possible to use python-Levenshtein in an MIT-licensed library, which is the reason FuzzyWuzzy has the GPL license.
The MIT-licensed version only uses difflib (slow), but is still a bit questionable, since it is based on a FuzzyWuzzy version that was already GPL licensed (I do not think SeatGeek cares, since they originally released FuzzyWuzzy under the MIT license).
I wrote a faster alternative implementation for C++/Python (https://github.com/maxbachmann/RapidFuzz). This implementation could be ported to Java. However, since I am not very familiar with Java, this would require someone else to maintain the implementation (I am willing to help with questions regarding the algorithms).
from fuzzywuzzy.
@maxbachmann One could also look into calling the C code directly through the Java Native Interface without porting it. I don't have much experience with what that means performance-wise, though.
from fuzzywuzzy.
@Chase22 I do something similar already for Python. When using small strings with a fast similarity metric there is a relevant performance impact. However the main reason for this is that Python calls functions with a list of arguments and a hashmap of named arguments, which has to be parsed on each call.
I could think of the following advantages/disadvantages of the JNI:
+ probably less maintenance since it reuses a big part of the code. Note however that in Python the Wrapper to call the C++ code from Python is actually much bigger than the code (partially because much of the code is generated). The C++ library has around 5k lines of code, while the wrapper has over 50k lines of code.
- I guess it would have to be compiled for each platform, which can be a pain
-/+ performance wise I am unsure as well. The JNI might add relevant overhead (e.g. in case all strings have to be copied). However the algorithms make heavy use of bitwise operations, which might be slower in pure Java. So this might go either way.
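To make the bitwise-operations point concrete, here is a minimal sketch of a bit-parallel Levenshtein distance (the Myers/Hyyrö scheme) written against Java's 64-bit `long`. The class and method names are hypothetical and this is not RapidFuzz's API. It assumes the first string has at most 64 UTF-16 code units; longer inputs would need a multi-word variant. It only shows that the operations map onto Java primitives. Whether it is competitive with the C++ version would have to be measured.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical, not taken from any library. */
final class BitParallelLevenshtein {

    /** Levenshtein distance; pattern must have at most 64 UTF-16 code units. */
    static int distance(String pattern, String text) {
        int m = pattern.length();
        if (m == 0) {
            return text.length();
        }
        if (m > 64) {
            throw new IllegalArgumentException("pattern longer than 64 chars needs a multi-word variant");
        }

        // Bit i of masks.get(c) is set when pattern.charAt(i) == c.
        Map<Character, Long> masks = new HashMap<>();
        for (int i = 0; i < m; i++) {
            masks.merge(pattern.charAt(i), 1L << i, (a, b) -> a | b);
        }

        long vp = ~0L;              // vertical positive deltas (+1)
        long vn = 0L;               // vertical negative deltas (-1)
        long last = 1L << (m - 1);  // bit of the last pattern row
        int dist = m;               // distance against the empty text prefix

        for (int j = 0; j < text.length(); j++) {
            long pm = masks.getOrDefault(text.charAt(j), 0L);
            long d0 = (((pm & vp) + vp) ^ vp) | pm | vn; // diagonal zero deltas
            long hp = vn | ~(d0 | vp);                   // horizontal +1 deltas
            long hn = d0 & vp;                           // horizontal -1 deltas
            if ((hp & last) != 0) {
                dist++;
            }
            if ((hn & last) != 0) {
                dist--;
            }
            hp = (hp << 1) | 1L;
            hn = hn << 1;
            vp = hn | ~(d0 | hp);
            vn = hp & d0;
        }
        return dist;
    }
}
```

Calling `BitParallelLevenshtein.distance("kitten", "sitting")` processes one text character per loop iteration, using a fixed number of 64-bit operations regardless of the pattern length.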
from fuzzywuzzy.
GPL applies to the original Python code ONLY. If you rewrite it in Java, isn't the result a completely different code base rather than a use of the original code? GPL does not apply to the algorithm, only to the use of the original code "as is".
from fuzzywuzzy.
GPL applies to any derivative work. https://github.com/xdrop/fuzzywuzzy/blob/e8376dfdc1c0cb72f7924f3a347bfcd39855dbeb/diffutils/src/me/xdrop/diffutils/DiffUtils.java is pretty much a 1:1 copy of a GPL licensed implementation. It doesn't hide this either:
"This is a port of all the functions needed from python-levenshtein C implementation.
The code was ported line by line but unfortunately it was mostly undocumented,
so it is mostly non readable (eg. var names)"
I am pretty confident this counts as derivative work.
from fuzzywuzzy.
Try this in your favourite AI tool:
"If I rewrite some original Python code, licensed under GPL, into a different language (Java), does the original GPL still apply to the new codebase?"
Here is what Google's Gemini said:
"No, rewriting the code in a different language typically results in a new copyrighted work, and the GPL license wouldn't apply to the new Java codebase in itself. However, the GPL license might still affect how you can distribute your Java code if the original Python code interacts with GPL-licensed libraries or functions.
In essence, the GPL license applies to the specific source code you received under that license. If you rewrite the code from scratch in Java, you're creating a new work. But, if you use any substantial portions of the original Python code or its functionality, the GPL license terms might still influence how you distribute your Java code.
Here's a recommendation: To ensure you're compliant with the GPL license, it's always a good idea to consult with a lawyer specializing in open-source licensing."
from fuzzywuzzy.
- If you use AI, you should at the very least write correct prompts. This is NOT a rewrite, but a line-for-line translation of the library.
- Follow the AI's guidance to consult a lawyer.
I certainly wouldn't change the license of a mechanical translation of source code without consulting a lawyer. I mean, you can do whatever you want at your own risk 🤷‍♂️
from fuzzywuzzy.
You can include "line-by-line rewrite" in the prompt, same result. But I understand your reasons for GPL license, unfortunately. Thank you for the quick answers!
from fuzzywuzzy.
@maxbachmann Btw, the authors of the original GPL Python library, fuzzywuzzy, perhaps realized their mistake with GPL and transitioned the code to a new MIT-licensed repository, thefuzz. This is all under the same company, SeatGeek.
So, could you re-align your code with the new repository (and even pick up some recent fixes, as thefuzz seems frequently maintained) and re-publish your nice Java library under an MIT license too?
from fuzzywuzzy.
I am not the author of this library. I am the author of rapidfuzz, which includes an MIT licensed complete rewrite of fuzzywuzzy. This is the implementation that's now used inside thefuzz as well. This is what I suggested for porting to Java at the start of this issue as well.
There have been multiple people who wanted to work on this already, but they all disappeared before implementing anything 🤷‍♂️
from fuzzywuzzy.
Has @xdrop abandoned this repo?
from fuzzywuzzy.