Comments (10)
The timing of your question couldn't be better.
Today @paulirwin mentioned in a separate issue #460 the following:
Unfortunately, while IKVM has been a reasonable go-to way to quickly support Java-based apps in the past, it has been abandoned by its main contributor in 2017 and has no .NET Core/NET Standard support.
FYI for everyone that ikvm-revived is now IKVM proper, and has .NET 6+ support as of v8.7. https://github.com/ikvmnet/ikvm/releases/tag/8.7.0
(You are probably already aware that the Lucene.NET OpenNLP integration requires IKVM because OpenNLP is a separate Apache project that is only available as Java.)
from lucenenet.
Also in a POC console app, I was able to use IKVM 8.7 and the <IkvmReference> csproj support to load the opennlp 1.9.4 jar directly without OpenNLP.NET (on an arm64 macOS machine, no less). In case that helps anyone.
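For anyone wanting to try the same thing, here is a minimal sketch of what such a csproj might look like. It is not the actual POC project: the jar path `lib/opennlp-tools-1.9.4.jar` is a hypothetical location, and the target framework is an assumption.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <!-- Assumption: any .NET 6+ target should work with IKVM 8.7 -->
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <!-- IKVM brings the build tasks that compile jars to .NET assemblies -->
    <PackageReference Include="IKVM" Version="8.7.0" />
  </ItemGroup>

  <ItemGroup>
    <!-- Hypothetical path: point this at wherever the OpenNLP jar is stored -->
    <IkvmReference Include="lib/opennlp-tools-1.9.4.jar" />
  </ItemGroup>

</Project>
```

With a reference like this, the Java packages in the jar should show up as .NET namespaces at build time, so the OpenNLP classes can be used from C# directly.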
BTW - I was just looking at the code example you posted and noticed a big issue. You won't be able to use StandardAnalyzer because it uses the StandardTokenizer to tokenize the text, which pulls out all of the punctuation. For OpenNLP tokenization you must use the OpenNLPTokenizer to set up your token stream.
I haven't used that specific feature of Lucene.NET, but when I bump into similar issues with other features one thing I find helpful is to look at the unit tests for the feature. Often, that helps me understand the coding pattern for using the feature. It may be worth a look.
The OpenNLP project is not ported from Java; instead, it uses IKVM to convert the bytecode directly into IL. So, there is nothing we add over and above the Java implementation of OpenNLP 1.9.1.
The documentation for OpenNLP 1.9.1 is on the OpenNLP website.
As far as the Lucene TokenStream APIs are concerned, the documentation (which is a bit scant) is in the Getting Started section. I concur with @rclabo that the best place to see examples is in the tests.
For such a task, I would also recommend using the test framework package to verify that your analyzer implementation is compatible with Lucene.NET. And, of course, look at other tests to work out how to use it, because the docs are not great.
Also, if you get it figured out, we'd greatly appreciate you posting some example code here or submitting a few additional unit tests that demonstrate the use more clearly. That will help the next person. Lucene and Lucene.NET are really quite amazing, but it sometimes takes some research to figure out how to use various aspects. Contributing back some of that know-how would be greatly appreciated.
I followed @rclabo's recommendation and tried to use the unit tests. However, I ran into an exception that I assume occurs because I'm not using .NET Framework 4.8:
System.TypeLoadException: Could not load type 'System.Reflection.Emit.MethodToken' from assembly 'mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'
Does the OpenNLP integration not support .NET 6.0?
@rclabo Lucene.Net is indeed an amazing library and I'd love to contribute to it once I figure it out!
Also in a POC console app, I was able to use IKVM 8.7 and the <IkvmReference> csproj support to load the opennlp 1.9.4 jar directly without OpenNLP.NET (on an arm64 macOS machine, no less). In case that helps anyone.
Yeah, unfortunately, all of the latest versions that have ARM64 support (and macOS x64) stopped working on .NET Framework, so there isn't yet a version of IKVM that works everywhere. Support for macOS was added in 8.7.0. It also appears that 8.7.0 and higher will throw a type initialization exception if you don't explicitly create an object before using any of the converted libraries.
<MavenReference> is really how we should be handling this, so that people who depend on Lucene.Net.Analysis.OpenNLP can combine it with other Maven packages and have the IKVM-compiled dependency versions resolved without conflicts.
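As a rough illustration of that approach (not how Lucene.Net.Analysis.OpenNLP is currently packaged), a consuming project could reference OpenNLP by its Maven coordinates through the IKVM.Maven.Sdk package. The floating `1.*` package version and the choice of OpenNLP 1.9.4 are assumptions made for the sketch.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <!-- Assumption: a .NET 6+ target -->
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <!-- Floating version for illustration only; pin a specific release in practice -->
    <PackageReference Include="IKVM.Maven.Sdk" Version="1.*" />
  </ItemGroup>

  <ItemGroup>
    <!-- Maven coordinates in groupId:artifactId form; transitive
         Maven dependencies are resolved and compiled along with it -->
    <MavenReference Include="org.apache.opennlp:opennlp-tools" Version="1.9.4" />
  </ItemGroup>

</Project>
```

The point of the design is that version resolution happens at the Maven level, so two packages that both need OpenNLP should end up sharing one compiled assembly rather than shipping conflicting copies.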
I have updated the OpenNLP docs in master, but the API has moved so far from where we were in beta 16 that the updated docs wouldn't be correct for that release.
Your complaint was warranted, though. There were docs in Lucene 8.2.0 that we were missing and I have added them here:
https://github.com/apache/lucenenet/blob/43e0e894b9b40f5b28064a13ef98874a82c15330/src/Lucene.Net.Analysis.OpenNLP/overview.md
I also created a demo project showing how to build a filter for NER using OpenNLP and Sentiment using Stanford CoreNLP: https://github.com/NightOwl888/lucenenet-opennlp-mavenreference-demo
For integration with OpenNLP on .NET Core until the official release, you can use the feed here: https://dev.azure.com/lucene-net-temp2/Lucene.NET/_artifacts/feed/lucene-net-temp-2.
This is great! Thank you @NightOwl888