
allegro / elasticsearch-analysis-morfologik


Morfologik Polish Lemmatizer plugin for Elasticsearch

License: Apache License 2.0

Java 60.44% Groovy 39.56%
elasticsearch lemmatizer morfologik morfologik-plugin

elasticsearch-analysis-morfologik's Introduction



Morfologik plugin for Elasticsearch 8.x, 7.x, 6.x, 5.x and 2.x. It is a wrapper around lucene-analyzers-morfologik for Elasticsearch.

The plugin provides the "morfologik" analyzer and the "morfologik_stem" token filter.

Originally created by https://github.com/monterail/elasticsearch-analysis-morfologik

Install

bin/elasticsearch-plugin install pl.allegro.tech.elasticsearch.plugin:elasticsearch-analysis-morfologik:8.13.2

Tip: select the plugin version that matches your Elasticsearch version.

Changelog:

version 7.6.0

  • add support for a custom Morfologik dictionary, e.g. {"type": "morfologik_stem", "dictionary": "polish-wo-brev.dict"}
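
As a rough sketch of how the custom dictionary option might be wired into index settings, the Python snippet below builds a settings body that defines a token filter of type "morfologik_stem" with a "dictionary" parameter, as named in the changelog entry. The filter name `morfologik_custom` and the analyzer name `polish_custom` are hypothetical, and where the dictionary file has to be placed is not covered here.

```python
import json


def custom_dictionary_settings(dictionary: str) -> dict:
    """Build index settings with a morfologik_stem filter that uses a custom dictionary."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name; "type" and "dictionary"
                    # are the keys shown in the changelog entry.
                    "morfologik_custom": {
                        "type": "morfologik_stem",
                        "dictionary": dictionary,
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name that applies the filter.
                    "polish_custom": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["morfologik_custom"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body that could be sent when creating an index.
    print(json.dumps(custom_dictionary_settings("polish-wo-brev.dict"), indent=2))
```

The printed JSON can be used as the body of an index creation request.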

Examples

"morfologik" analyzer

Request:

GET _analyze
{
  "analyzer": "morfologik",
  "text": "jestem"
}

Response:

{
  "tokens": [
    {
      "token": "być",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}

"morfologik_stem" token filter

Request:

GET _analyze
{
  "tokenizer": "standard",
  "filter": ["morfologik_stem"],
  "text": "jestem"
}

Response:

{
  "tokens": [
    {
      "token": "być",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
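
A small sketch of how a client might read lemmas out of an `_analyze` response like the ones above. The helper name `lemmas_by_position` is made up for this example. Tokens are grouped by position on the assumption that an ambiguous word form may produce more than one lemma at the same position.

```python
import json
from typing import Dict, List

# Response body copied from the example above.
RESPONSE = """
{
  "tokens": [
    {
      "token": "być",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
"""


def lemmas_by_position(analyze_response: dict) -> Dict[int, List[str]]:
    """Group the returned tokens (lemmas) by their position in the input text."""
    grouped: Dict[int, List[str]] = {}
    for token in analyze_response.get("tokens", []):
        grouped.setdefault(token["position"], []).append(token["token"])
    return grouped


if __name__ == "__main__":
    print(lemmas_by_position(json.loads(RESPONSE)))
```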

Supported Elasticsearch versions:

All ready-to-install plugins are deployed to Maven Central.

Elasticsearch 8.x

  • 8.13.x (8.13.1, 8.13.2)
  • 8.12.x (8.12.0, 8.12.1, 8.12.2)
  • 8.11.x (8.11.0, 8.11.1, 8.11.2, 8.11.3, 8.11.4)
  • 8.10.x (8.10.4)
  • 8.9.x (8.9.0)
  • 8.8.x (8.8.0, 8.8.1, 8.8.2)
  • 8.7.x (8.7.0, 8.7.1)
  • 8.6.x (8.6.0, 8.6.1, 8.6.2)
  • 8.5.x (8.5.0, 8.5.1, 8.5.2, 8.5.3)
  • 8.4.x (8.4.1, 8.4.2, 8.4.3)
  • 8.3.x (8.3.1, 8.3.2, 8.3.3)
  • 8.2.x (8.2.0, 8.2.1, 8.2.2, 8.2.3)
  • 8.1.x (8.1.0, 8.1.1, 8.1.2, 8.1.3)
  • 8.0.x (8.0.0, 8.0.1)

Elasticsearch 7.x

  • 7.17.x (7.17.0, 7.17.3, 7.17.4, 7.17.5, 7.17.6, 7.17.7, 7.17.8, 7.17.9, 7.17.10, 7.17.11, 7.17.14)
  • 7.16.x (7.16.1, 7.16.2.1, 7.16.3)
  • 7.10.x (7.10.0, 7.10.1, 7.10.2)
  • 7.9.x (7.9.0, 7.9.1, 7.9.2, 7.9.3)
  • 7.8.x (7.8.0, 7.8.1)
  • 7.7.x (7.7.0, 7.7.1)
  • 7.6.x (7.6.0, 7.6.1, 7.6.2)
  • 7.5.x (7.5.0, 7.5.1, 7.5.2)
  • 7.4.x (7.4.0, 7.4.1, 7.4.2)
  • 7.3.x (7.3.2)

Elasticsearch 6.x

  • 6.8.x (6.8.0, 6.8.1, 6.8.2, 6.8.3, 6.8.4, 6.8.6, 6.8.8, 6.8.9, 6.8.10, 6.8.11, 6.8.12, 6.8.23)
  • 6.7.x (6.7.1, 6.7.2)
  • 6.6.x (6.6.0, 6.6.1, 6.6.2)
  • 6.5.x (6.5.0, 6.5.1, 6.5.2, 6.5.3, 6.5.4)
  • 6.4.x (6.4.0, 6.4.1, 6.4.2, 6.4.3)
  • 6.3.x (6.3.0, 6.3.1, 6.3.2)
  • 6.2.x (6.2.1, 6.2.2, 6.2.3, 6.2.4)
  • 6.1.x (6.1.0, 6.1.1, 6.1.2, 6.1.3, 6.1.4)
  • 6.0.x (6.0.0, 6.0.1)

Elasticsearch 5.x

  • 5.6.x (5.6.0, 5.6.1, 5.6.2, 5.6.3, 5.6.4, 5.6.5, 5.6.10, 5.6.16)
  • 5.5.x (5.5.0, 5.5.1, 5.5.2)
  • 5.4.x (5.4.0, 5.4.1, 5.4.2, 5.4.3)
  • 5.3.x (5.3.0, 5.3.1, 5.3.2, 5.3.3)
  • 5.2.x (5.2.0, 5.2.1, 5.2.2)
  • 5.1.x (5.1.1, 5.1.2)
  • 5.0.x (5.0.0, 5.0.1, 5.0.2)

Elasticsearch 2.x

  • 2.4.x (2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.6)

Install in Elasticsearch for version <= 5.4.0

bin/elasticsearch-plugin install \
  https://repo1.maven.org/maven2/pl/allegro/tech/elasticsearch/plugin/elasticsearch-analysis-morfologik/5.4.0/elasticsearch-analysis-morfologik-5.4.0-plugin.zip

Tip: select the plugin version that matches your Elasticsearch version.

Install in Elasticsearch 2.x

bin/plugin install \
  https://repo1.maven.org/maven2/pl/allegro/tech/elasticsearch/plugin/elasticsearch-analysis-morfologik/2.4.2/elasticsearch-analysis-morfologik-2.4.2-plugin.zip

Tip: select the plugin version that matches your Elasticsearch version.
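
The three install variants in this README differ only by Elasticsearch version, so the choice can be sketched as a small helper. The function names below are illustrative, the URL pattern is taken from the commands above, and nothing here checks that the requested version was actually published to Maven Central.

```python
GROUP = "pl.allegro.tech.elasticsearch.plugin"
ARTIFACT = "elasticsearch-analysis-morfologik"
MAVEN_BASE = "https://repo1.maven.org/maven2/pl/allegro/tech/elasticsearch/plugin"


def parse_version(version: str) -> tuple:
    """Turn '5.4.0' into (5, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def install_command(es_version: str) -> str:
    """Return the install command matching the given Elasticsearch version."""
    parsed = parse_version(es_version)
    zip_url = f"{MAVEN_BASE}/{ARTIFACT}/{es_version}/{ARTIFACT}-{es_version}-plugin.zip"
    if parsed[0] == 2:
        # Elasticsearch 2.x uses bin/plugin with a direct URL.
        return f"bin/plugin install {zip_url}"
    if parsed <= (5, 4, 0):
        # Versions up to 5.4.0 use elasticsearch-plugin with a direct URL.
        return f"bin/elasticsearch-plugin install {zip_url}"
    # Later versions install by Maven coordinates.
    return f"bin/elasticsearch-plugin install {GROUP}:{ARTIFACT}:{es_version}"


if __name__ == "__main__":
    for version in ("2.4.2", "5.4.0", "8.13.2"):
        print(install_command(version))
```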

Build

./gradlew clean build

elasticsearch-analysis-morfologik's People

Contributors

adamdubiel, allepbo, cybuch, daredzik, gotofinal, pbobruk


elasticsearch-analysis-morfologik's Issues

Morfologik ver 6.8.0

Hi,

Can you release a plugin version that supports Elasticsearch 6.8.0?
This is the only version available in Debian GNU/Linux 9.8 (stretch).

Best Regards

Upgrade ES to 7.13

Please upgrade Elasticsearch to version 7.13, because earlier versions are not supported on Apple Silicon/M1.

Not found for plugin in maven 7.10.3

Hi
During installation I get an error from the Maven repository saying that your plugin was not found.
Es version: 7.10.3
https://repo1.maven.org/maven2/pl/allegro/tech/elasticsearch/plugin/elasticsearch-analysis-morfologik/7.10.3/elasticsearch-analysis-morfologik-7.10.3.zip

bin/elasticsearch-plugin install pl.allegro.tech.elasticsearch.plugin:elasticsearch-analysis-morfologik:7.10.3
-> Installing pl.allegro.tech.elasticsearch.plugin:elasticsearch-analysis-morfologik:7.10.3
-> Downloading pl.allegro.tech.elasticsearch.plugin:elasticsearch-analysis-morfologik:7.10.3 from maven central
-> Failed installing pl.allegro.tech.elasticsearch.plugin:elasticsearch-analysis-morfologik:7.10.3
-> Rolling back pl.allegro.tech.elasticsearch.plugin:elasticsearch-analysis-morfologik:7.10.3
-> Rolled back pl.allegro.tech.elasticsearch.plugin:elasticsearch-analysis-morfologik:7.10.3
Exception in thread "main" java.io.FileNotFoundException: https://repo1.maven.org/maven2/pl/allegro/tech/elasticsearch/plugin/elasticsearch-analysis-morfologik/7.10.3/elasticsearch-analysis-morfologik-7.10.3.zip
        at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1928)
        at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1528)
        at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
        at org.elasticsearch.plugins.InstallPluginCommand.downloadZip(InstallPluginCommand.java:448)
        at org.elasticsearch.plugins.InstallPluginCommand.downloadAndValidate(InstallPluginCommand.java:525)
        at org.elasticsearch.plugins.InstallPluginCommand.download(InstallPluginCommand.java:315)
        at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:251)
        at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:224)
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)
        at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:91)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)
        at org.elasticsearch.cli.Command.main(Command.java:90)
        at org.elasticsearch.plugins.PluginCli.main(PluginCli.java:47)

Elasticsearch v.7

Hi,
Do you plan to update the plugin for the version 7 of Elasticsearch? (currently 7.0.1)

Thanks for your work!

No 7.10.0 release on maven repo

Hello,
according to the README there should be a 7.10.0 release in the Maven repository, but there isn't one. Can you please update the repository?

Morfologik ver 6.8.2

Hi again,

Can you release a plugin version that supports Elasticsearch 6.8.2?
It is a fresh update in Debian GNU/Linux 9.8 (stretch).

Best Regards

lowercase + morfologik_stem doesn't work for proper names

With the following mapping:

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "morfologik_downcase": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "morfologik_stem"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "message": {
      "properties": {
        "content": { "type": "text", "analyzer": "morfologik_downcase" }
      }
    }
  }
}

The following analysis:

curl -XGET http://localhost:9200/index_name/_analyze -d '{"analyzer": "morfologik_downcase", "text": "Kwaśniewskiego" }' -H "Content-type: application/json" -H "Accept: application/json"

gives the following result:

{"tokens":[{"token":"kwaśniewskiego","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0}]}

Even though "Kwaśniewski" is present in the Morfologik dictionary.

The reason is that the entries for proper names in Morfologik are capitalized, so once the tokens are lowercased they are no longer found in the dictionary. The solution is a modified Morfologik with lowercased entries. I have a patched version of the dictionary in some of my repos, i.e. https://github.com/apohllo/elasticsearch-analysis-morfologik, but the real solution would require changing the original Morfologik JAR. I once contacted the authors of the library, but they were not interested in such a change.

Maybe Allegro could host a patched version of Morfologik, as it does for the ElasticSearch Morfologik Plugin?
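
The lookup problem described above can be illustrated with a toy model. The two-entry dictionary below is a hypothetical stand-in for the real Morfologik dictionary, and the functions only imitate a case-sensitive lookup that leaves unknown tokens unchanged. The second ordering is included for comparison only; it says nothing about how the real plugin behaves.

```python
# Toy stand-in for the Morfologik dictionary: inflected form -> lemma.
# Hypothetical entries for illustration; the real dictionary is far larger.
TOY_DICTIONARY = {
    "Kwaśniewskiego": "Kwaśniewski",
    "jestem": "być",
}


def stem(token: str) -> str:
    """Case-sensitive lookup; tokens missing from the dictionary pass through unchanged."""
    return TOY_DICTIONARY.get(token, token)


def lowercase_then_stem(token: str) -> str:
    """Filter order from the issue: lowercase first, then the dictionary lookup."""
    return stem(token.lower())


def stem_then_lowercase(token: str) -> str:
    """Reversed order, shown for comparison in this toy model only."""
    return stem(token).lower()


if __name__ == "__main__":
    # The lowercased form is not in the toy dictionary, so it comes back unstemmed.
    print(lowercase_then_stem("Kwaśniewskiego"))
    # Looking up the capitalized form first finds the lemma.
    print(stem_then_lowercase("Kwaśniewskiego"))
```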

Morfologik ver 6.8.1

Hi again,

Can you release a plugin version that supports Elasticsearch 6.8.1?
It is a fresh update in Debian GNU/Linux 9.8 (stretch).

Best Regards

Warning during installation

Downloading pl.allegro.tech.elasticsearch.plugin:elasticsearch-analysis-morfologik:6.0.0 from maven central
Warning: sha512 not found, falling back to sha1. This behavior is deprecated and will be removed in a future release. Please update the plugin to use a sha512 checksum

Is it possible to publish a SHA-512 checksum for the plugin, as the warning asks?

IllegalAccessException for ES 2.4.4

I am currently trying to use morfologik for ES 2.4.4. I use the following settings for the index:

{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "analysis": {
        "filter": {
        },
        "analyzer": {
          "fulltext": {
            "type": "morfologik",
            "tokenizer": "standard",
            "filter": [
              "lowercase"
            ]
          }
        }
      }
    }
  }
}

But when I try to create the index, I get
java.lang.IllegalAccessException: Class org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1 can not access a member of class pl.allegro.tech.elasticsearch.index.analysis.pl.MorfologikAnalyzerProvider with modifiers "public"
