GithubHelp home page GithubHelp logo

isabella232 / elasticsearch-concatenate-token-filter Goto Github PK

View Code? Open in Web Editor NEW

This project forked from mobimeo/elasticsearch-concatenate-token-filter

0.0 0.0 0.0 36 KB

Elasticsearch plugin which only provides a TokenFilter that merges tokens in a token stream back into one. Taken from http://elasticsearch-users.115913.n3.nabble.com/Is-there-a-concatenation-filter-td3711094.html

Java 100.00%

elasticsearch-concatenate-token-filter's Introduction

elasticsearch-concatenate-token-filter

Elasticsearch plugin which only provides a TokenFilter that merges tokens in a token stream back into one. Taken from http://elasticsearch-users.115913.n3.nabble.com/Is-there-a-concatenation-filter-td3711094.html

ElasticSearch version support

This plugin is compatible with ES 5.6.5.

Build

To build .zip or .jar for this plugin, run following command and you should see generated files in /target

mvn clean install

Install

To install on your current ES node, use the plugin binary provided in the bin folder (on Ubuntu it should be under /usr/share/elasticsearch/bin)

bin/elasticsearch-plugin  install file:<path to generated zip>/elasticsearch-concatenate-5.6.5.zip

Usage

The plugin provides a token filter of type concatenate which has one parameter token_separator. Use it in your custom analyzers to merge tokenized strings back into one single token (usually after applying stemming or other token filters).

Arrays

When saving arrays of strings to a field, these are handled in elasticsearch as separate tokens, so this filter would collapse all the elements of the array into one, and usually you don't want that to happen. As a workaround you can set position_offset_gap on the field to a high number and pass the same number as the increment_gap parameter to the filter, which then only concatenates all tokens closer than this value.

Example

Given the custom analyzer (see https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-analyzers.html):

{
  "analysis" : {
    "filter" : {
      "concatenate" : {
        "type" : "concatenate",
        "token_separator" : "_"
      },
      "custom_stop" : {
        "type": "stop",
        "stopwords": ["and", "is", "the"]
      }
    },
    "analyzer" : {
      "stop_concatenate" : {
        "filter" : [
          "custom_stop",
          "concatenate"
        ],
        "tokenizer" : "standard"
      }
    }
  }
}

the string:

"the fox jumped over the fence"

would be analyzed as:

"fox_jumped_over_fence"

elasticsearch-concatenate-token-filter's People

Contributors

arad-ta avatar francesconero avatar maoning avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.