opendatasoft / elasticsearch-aggregation-geoclustering

An Elasticsearch plugin to aggregate Geo Points in clusters.

License: Apache License 2.0

Java 99.31% Dockerfile 0.69%

elasticsearch-aggregation-geoclustering's Introduction

Elasticsearch Geo Point clustering aggregation plugin

This plugin extends Elasticsearch with a geo_point_clustering aggregation, which returns geo_point documents as clusters of points. It is very similar to the official geohash_grid aggregation, except that the final clusters are not bound to the geohash grid.

For example, at zoom level 1 with points across France, the geohash_grid aggregation outputs 3 clusters bound to geohash cells u, e and s, whereas geo_point_clustering merges these clusters into one. The merge happens during the reduce phase.
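To make the binding to geohash cells concrete, here is a minimal Python sketch of standard geohash encoding. The function name geohash_encode and the three sample cities are illustrative assumptions and are not part of the plugin; the sketch only shows how points in France can fall into cells u, s and e at precision 1.

```python
# Standard geohash alphabet (base32 without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a geohash of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits yield one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Illustrative points (approximate city coordinates, chosen for this example).
print(geohash_encode(48.85, 2.35, 1))   # Paris     -> "u"
print(geohash_encode(43.30, 5.37, 1))   # Marseille -> "s"
print(geohash_encode(43.48, -1.56, 1))  # Biarritz  -> "e"
```

Three nearby cities end up in three different precision-1 cells, which is the situation geo_point_clustering is meant to smooth over by merging clusters.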

Unlike the geohash_grid aggregation, bucket keys are a tuple (centroid, geohash cells) rather than geohash cells only, because the cluster merge during the reduce phase can link one cluster to several geohash cells. Centroids are built during the shard collect phase.

Please note that geo_shape data type is not supported.

Usage

Install

Install the plugin with:

./bin/elasticsearch-plugin install https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.17.6.0/geopoint-clustering-aggregation-7.17.6.0.zip

Quickstart

Intro

{
  "aggregations": {
    "<aggregation_name>": {
      "geo_point_clustering": {
        "field": "<field_name>",
        "zoom": "<zoom>"
      }
    }
  }
}

Input parameters:

  • field: must be of type geo_point
  • zoom: mandatory integer between 0 and 25. It is the zoom level used to aggregate geo points
  • radius: radius in pixels, used during the reduce phase to merge nearby clusters. Defaults to 40
  • ratio: ratio used for a second merging pass during the reduce phase. If the value is 0, no second pass is made. Defaults to 0
  • extent: extent of the tiles. Defaults to 256
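As a rough illustration of how zoom, radius and extent might interact, the following Python sketch projects centroids to Web Mercator pixel coordinates and greedily merges clusters that are closer than radius pixels. This is an assumption about the general technique, not the plugin's actual reduce-phase code: to_pixels, pixel_distance and merge_clusters are hypothetical names, and the ratio second pass is not modeled.

```python
import math


def to_pixels(lat, lon, zoom, extent=256):
    """Project a point to Web Mercator pixel coordinates (assumed projection)."""
    world = extent * (2 ** zoom)  # width of the whole map in pixels
    x = (lon + 180.0) / 360.0 * world
    s = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)) * world
    return x, y


def pixel_distance(a, b, zoom, extent=256):
    """Distance in pixels between two clusters given as dicts with lat/lon."""
    ax, ay = to_pixels(a["lat"], a["lon"], zoom, extent)
    bx, by = to_pixels(b["lat"], b["lon"], zoom, extent)
    return math.hypot(ax - bx, ay - by)


def merge_clusters(clusters, zoom, radius=40, extent=256):
    """Greedy merge: fold each cluster into the first merged one within radius."""
    merged = []
    for c in clusters:
        for m in merged:
            if pixel_distance(m, c, zoom, extent) <= radius:
                total = m["doc_count"] + c["doc_count"]
                # Weighted average of the two centroids (simple lat/lon mean).
                m["lat"] = (m["lat"] * m["doc_count"] + c["lat"] * c["doc_count"]) / total
                m["lon"] = (m["lon"] * m["doc_count"] + c["lon"] * c["doc_count"]) / total
                m["doc_count"] = total
                break
        else:
            merged.append(dict(c))  # copy so the input is left untouched
    return merged
```

Under these assumptions, the two centroids in the quickstart result are roughly 46 pixels apart at zoom 9, so they stay separate with the default radius of 40 and would merge with a radius of 50.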

Real-life example

Create an index:

PUT test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}

Push some points:

POST test/_bulk?refresh
{"index":{"_id":1}}
{"location":[2.454929, 48.821578]}
{"index":{"_id":2}}
{"location":[2.245858, 48.86914]}
{"index":{"_id":3}}
{"location":[2.240358, 48.863481]}
{"index":{"_id":4}}
{"location":[2.25292, 48.847176]}
{"index":{"_id":5}}
{"location":[2.279111, 48.872383]}
{"index":{"_id":6}}
{"location":[2.336267, 48.822021]}
{"index":{"_id":7}}
{"location":[2.338677, 48.822672]}
{"index":{"_id":8}}
{"location":[2.336643, 48.822493]}
{"index":{"_id":9}}
{"location":[2.438465, 48.84204]}
{"index":{"_id":10}}
{"location":[2.381554, 48.835382]}
{"index":{"_id":11}}
{"location":[2.407744, 48.83733]}
{"index":{"_id":12}}
{"location":[2.34521, 48.849358]}
{"index":{"_id":13}}
{"location":[2.252938, 48.846041]}
{"index":{"_id":14}}
{"location":[2.279715, 48.871775]}
{"index":{"_id":15}}
{"location":[2.380629, 48.879757]}

Perform an aggregation:

POST test/_search?size=0
{
  "aggregations": {
    "clusters": {
      "geo_point_clustering": {
        "field": "location",
        "zoom": 9
      }
    }
  }
}
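The same request body can be assembled programmatically. The sketch below is a hypothetical helper (build_clustering_request is not provided by the plugin) that checks the documented zoom range and rejects option names other than radius, ratio and extent. It assumes the optional parameters sit next to field and zoom inside the geo_point_clustering object.

```python
ALLOWED_OPTIONS = {"radius", "ratio", "extent"}


def build_clustering_request(field, zoom, name="clusters", **options):
    """Build a search body containing one geo_point_clustering aggregation."""
    if not isinstance(zoom, int) or not 0 <= zoom <= 25:
        raise ValueError("zoom must be an integer between 0 and 25")
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(sorted(unknown)))
    aggregation = {"field": field, "zoom": zoom}
    aggregation.update(options)
    return {"aggregations": {name: {"geo_point_clustering": aggregation}}}


# Same body as the request above.
body = build_clustering_request("location", 9)
```

Validating parameter names on the client side surfaces typos before the request reaches Elasticsearch, which would otherwise answer with a parse error.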

Result:

"aggregations" : {
    "clusters" : {
      "buckets" : [
        {
          "geohash_grids" : [
            "u09wn",
            "u09tz",
            "u09ty",
            "u09tx",
            "u09tv",
            "u09tt"
          ],
          "doc_count" : 9,
          "centroid" : {
            "lat" : 48.83695897646248,
            "lon" : 2.380013056099415
          }
        },
        {
          "geohash_grids" : [
            "u09w5",
            "u09tg",
            "u09tf"
          ],
          "doc_count" : 6,
          "centroid" : {
            "lat" : 48.86166598415002,
            "lon" : 2.258483301848173
          }
        }
      ]
    }
}
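To show how such a response might be consumed, here is a small Python sketch that turns the buckets into a GeoJSON FeatureCollection, one Point per cluster centroid. The function name buckets_to_geojson is hypothetical; the field names (buckets, centroid, doc_count, geohash_grids) are taken from the result above, and the response is assumed to be already parsed into a dict.

```python
def buckets_to_geojson(response, name="clusters"):
    """Convert geo_point_clustering buckets to a GeoJSON FeatureCollection."""
    buckets = response["aggregations"][name]["buckets"]
    features = []
    for bucket in buckets:
        centroid = bucket["centroid"]
        features.append({
            "type": "Feature",
            # GeoJSON coordinates are ordered [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [centroid["lon"], centroid["lat"]],
            },
            "properties": {
                "doc_count": bucket["doc_count"],
                "geohash_grids": bucket["geohash_grids"],
            },
        })
    return {"type": "FeatureCollection", "features": features}
```

The doc_count property can then drive marker size or labels on a map.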

Development environment setup

Build

Built with Java 17 and Gradle 7.5.1 (use the packaged gradlew included in this repo if you want to build yourself).

Development Environment Setup

Build the plugin using gradle:

./gradlew build

or

./gradlew assemble  # (to avoid the test suite)

Then the following command will start a dockerized ES and will install the previously built plugin:

docker-compose up

Please be careful during development: you'll need to manually rebuild the .zip using ./gradlew build on each code change before running docker-compose up again.

NOTE: In docker-compose.yml you can uncomment the debug env and attach a REMOTE JVM on *:5005 to debug the plugin.

elasticsearch-aggregation-geoclustering's People

Contributors

5k4nd, clement-tourriere

elasticsearch-aggregation-geoclustering's Issues

Calculation of centroid for locations near the 180th meridian (antimeridian) is incorrect

The centroid calculation, which simply averages latitude and longitude values, is accurate in most cases, but produces incorrect results near the 180th meridian. For example, with longitudes 179.649556 and -176.131694, the calculation is as follows:

(179.649556 - 176.131694) / 2 = 1.758931

As can be seen, the naive average lands near the prime meridian, on the opposite side of the globe from both input points.
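One common way to avoid this, shown here as a minimal Python sketch and not as a fix adopted by the plugin, is to average longitudes as angles on a circle (a circular mean) instead of as plain numbers. The function names naive_mean_lon and circular_mean_lon are illustrative.

```python
import math


def naive_mean_lon(lons):
    """Plain arithmetic mean, as described in the issue above."""
    return sum(lons) / len(lons)


def circular_mean_lon(lons):
    """Mean longitude computed from the summed unit vectors of each angle."""
    x = sum(math.cos(math.radians(lon)) for lon in lons)
    y = sum(math.sin(math.radians(lon)) for lon in lons)
    # atan2 returns an angle in (-180, 180] degrees after conversion.
    return math.degrees(math.atan2(y, x))


lons = [179.649556, -176.131694]
print(naive_mean_lon(lons))     # approximately 1.758931
print(circular_mean_lon(lons))  # approximately -178.241069
```

The circular mean stays next to the antimeridian, between the two input longitudes. It is undefined when the unit vectors cancel out exactly (for example 0 and 180), so a real implementation would need to handle that case.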

Facing the same issue

Hi,

I am facing the same issue. Is there a workaround for it?

Warm Regards
Tanjim

Parser not found

After installing the plugin, I get a new Elasticsearch error:

"reason": "[1:46] unable to parse BaseAggregationBuilder with name [geohash_clustering]: parser not found"

I am posting:
{ "aggregations": { "agg": { "geohash_clustering": { "field": "coordinates", "zoom": "1" } } } }

What is the problem?

My Elasticsearch version:
"version": { "number": "6.4.1", "build_flavor": "oss", "build_type": "tar", "build_hash": "e36acdb", "build_date": "2018-09-13T22:18:07.696808Z", "build_snapshot": false, "lucene_version": "7.4.0", "minimum_wire_compatibility_version": "5.6.0", "minimum_index_compatibility_version": "5.0.0" },

Works as expected on elastic.co clusters?

(this isn't so much an issue as it is a question for confirmation)

We're planning to switch our elasticsearch hosting to elastic.co (or similar) and looking for confirmation that this plugin runs with little or no difficulty on elastic.co-hosted clusters. This plugin's functionality is the reason we're switching away from AWS OpenSearch and, while I'm personally confident that it will run as expected, management would really like some external (even anecdotal) confirmation during our due diligence phase that this works.

While I would happily test this myself on our test elastic.co cluster, plugins are not available with the free tier or even monthly subscriptions, they're only available with annual contracts (which we'll switch to once confirmed, we're avoiding during the evaluation phase for cost reasons).

I have this plugin running locally in docker with Elasticsearch 7.17.6 (and it works great!) so any feedback on operability or gotchas would be greatly appreciated.

Unknown field [distance], parser not found

My Elasticsearch version is 6.6.2.
I installed version 6.6.2 of the plugin and tried to get aggregated results.

POST /index/_search
{
    "aggregations" : {
        "my_cluster_aggregation" : {
            "geo_point_clustering": {
                "field": "location",
                "zoom": 1,
                "distance": 50
            }
        }
    }
}

The location field is of type geo_point.

And I get this error:

{
   "error": {
      "root_cause": [
         {
            "type": "x_content_parse_exception",
            "reason": "[8:17] [geo_point_clustering] unknown field [distance], parser not found"
         }
      ],
      "type": "x_content_parse_exception",
      "reason": "[8:17] [geo_point_clustering] unknown field [distance], parser not found"
   },
   "status": 400
}
