rclabo commented on June 2, 2024

The timing of your question couldn't be better.

Today @paulirwin mentioned in a separate issue #460 the following:

Unfortunately, while IKVM has been a reasonable go-to way to quickly support Java-based apps in the past, it was abandoned by its main contributor in 2017 and has no .NET Core/.NET Standard support.

FYI for everyone: ikvm-revived is now IKVM proper, and it has .NET 6+ support as of v8.7. https://github.com/ikvmnet/ikvm/releases/tag/8.7.0

(You are probably already aware that the Lucene.NET OpenNLP integration requires IKVM, because OpenNLP is a separate Apache project that is only available in Java.)

from lucenenet.

paulirwin commented on June 2, 2024

Also in a POC console app, I was able to use IKVM 8.7 and the <IkvmReference> csproj support to load the opennlp 1.9.4 jar directly without OpenNLP.NET (on an arm64 macOS machine, no less). In case that helps anyone.
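For reference, a minimal csproj sketch of that approach might look like the following. It assumes the `IKVM` NuGet package at the 8.7.0 version mentioned above and a locally downloaded `opennlp-tools-1.9.4.jar`; the `lib\` path is a hypothetical placeholder, so adjust it to wherever the jar actually lives.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <!-- IKVM 8.7+ is the line with .NET 6+ support -->
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <!-- Brings in the IKVM runtime and the IkvmReference build support -->
    <PackageReference Include="IKVM" Version="8.7.0" />
  </ItemGroup>

  <ItemGroup>
    <!-- Hypothetical path: the jar is converted to a .NET assembly at build time -->
    <IkvmReference Include="lib\opennlp-tools-1.9.4.jar" />
  </ItemGroup>

</Project>
```

After a build, the Java packages in the jar (for example `opennlp.tools.tokenize`) should be usable from C# as namespaces of the same name.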

NightOwl888 commented on June 2, 2024

BTW, I was just looking at the code example you posted and noticed a big issue. You won't be able to use StandardAnalyzer, because it uses StandardTokenizer to tokenize the text, and StandardTokenizer strips out all of the punctuation. For OpenNLP tokenization, you must use OpenNLPTokenizer to set up your token stream.

rclabo commented on June 2, 2024

I haven't used that specific feature of Lucene.NET, but when I bump into similar issues with other features one thing I find helpful is to look at the unit tests for the feature. Often, that helps me understand the coding pattern for using the feature. It may be worth a look.

NightOwl888 commented on June 2, 2024

The OpenNLP project is not ported from Java; instead, it uses IKVM to convert the Java bytecode directly into IL. So there is nothing we add over and above the Java implementation of OpenNLP 1.9.1.

The documentation for OpenNLP 1.9.1 is on the OpenNLP website.

As far as the Lucene TokenStream APIs are concerned, the documentation (which is a bit scant) is in the Getting Started section. I agree with @rclabo that the best place to find examples is the tests.

For such a task, I would also recommend using the test framework package (Lucene.Net.TestFramework) to verify that your analyzer implementation is compatible with Lucene.NET, and, of course, looking at other tests to work out how to use it, because the docs are not great.

rclabo commented on June 2, 2024

Also, if you get it figured out, we'd greatly appreciate you posting some example code here or submitting a few additional unit tests that demonstrate the usage more clearly. That will help the next person. Lucene and Lucene.NET are really quite amazing, but it sometimes takes some research to figure out how to use various aspects. Contributing back some of that know-how would be greatly appreciated.

GAInTheHouse commented on June 2, 2024

I followed @rclabo's recommendation and tried to use the unit tests. However, I ran into an exception that I assume is because I'm not using .NET Framework 4.8:
System.TypeLoadException: Could not load type 'System.Reflection.Emit.MethodToken' from assembly 'mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'

Does the OpenNLP integration not support .NET 6.0?

@rclabo Lucene.Net is indeed an amazing library and I'd love to contribute to it once I figure it out!

NightOwl888 commented on June 2, 2024

> Also in a POC console app, I was able to use IKVM 8.7 and the <IkvmReference> csproj support to load the opennlp 1.9.4 jar directly without OpenNLP.NET (on an arm64 macOS machine, no less). In case that helps anyone.

Yeah, unfortunately, all of the latest versions that have ARM64 support (and macOS x64 support) stopped working on .NET Framework, so there isn't yet a version of IKVM that works everywhere. Support for macOS was added in 8.7.0. It also appears that 8.7.0 and higher throw a type initialization exception if you don't explicitly create an object before using any of the converted libraries.

<MavenReference> is really how we should be handling this, so that people who depend on Lucene.Net.Analysis.OpenNLP can combine it with other Maven packages and have the IKVM-compiled dependency versions resolved without conflicts.
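As a rough illustration of what that looks like in a consuming project, the `IKVM.Maven.Sdk` package lets a csproj declare Maven coordinates directly. This is a sketch under assumptions, not how Lucene.Net.Analysis.OpenNLP is packaged: the OpenNLP version is the 1.9.4 mentioned earlier in this thread, and the floating `*` version for the SDK package is a stand-in that should be pinned to a real release.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <!-- Adds MavenReference support; "*" floats to the latest stable, pin it in real use -->
    <PackageReference Include="IKVM.Maven.Sdk" Version="*" />
  </ItemGroup>

  <ItemGroup>
    <!-- groupId:artifactId; the Maven dependency graph is resolved and converted with IKVM at build time -->
    <MavenReference Include="org.apache.opennlp:opennlp-tools" Version="1.9.4" />
  </ItemGroup>

</Project>
```

The idea is that all Maven dependencies in the build are resolved together, so two packages that both need `opennlp-tools` share one converted assembly instead of shipping conflicting copies.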

NightOwl888 commented on June 2, 2024

I have updated the OpenNLP docs in master, but the API there has moved so far beyond beta 16 that the updated docs would not be correct if we published them for beta 16.

Your complaint was warranted, though. There were docs in Lucene 8.2.0 that we were missing and I have added them here:
https://github.com/apache/lucenenet/blob/43e0e894b9b40f5b28064a13ef98874a82c15330/src/Lucene.Net.Analysis.OpenNLP/overview.md

I also created a demo project showing how to build a filter for NER using OpenNLP and Sentiment using Stanford CoreNLP: https://github.com/NightOwl888/lucenenet-opennlp-mavenreference-demo

For integration with OpenNLP on .NET Core until the official release, you can use the feed here: https://dev.azure.com/lucene-net-temp2/Lucene.NET/_artifacts/feed/lucene-net-temp-2.

GAInTheHouse commented on June 2, 2024

This is great! Thank you @NightOwl888
