This project forked from crazycompiler/advance-update

Advance Update is an Elasticsearch plugin that gives you control over Elasticsearch's document update functionality.

Advance Update

Control your Elasticsearch document updates with extra speed.

Description

  • Elasticsearch provides update APIs such as _update and _bulk for updating documents. However, Elasticsearch applies an update by merging the given [NEW] document into the [OLD] document, which leads to the following problem:

    • Fields of the old document that are not present in the new document are retained, so to delete a field you have to send an explicit update query with a script. Scripted updates are slow (e.g. deleting a field from 13 million documents takes about 50 minutes on a 5-node cluster).

    e.g. Old Document:

         {
             "a": 23,
             "b": 40
         }
    New Document, sent to the /_update API:
    
         {
             "a": 19,
             "c": 45
         }
    
    If I send this to Elasticsearch, the resulting document will be:
    
         {
             "a": 19,
             "b": 40,
             "c": 45
         }
    
    The field "b" is retained, but my use case requires removing "b", since I am not sending it in the request.
    
    • The Advance Update plugin can do this for you.

    e.g. Old Document:

         {
             "a": 23,
             "b": 40
         }
    New Document, sent to the /_advanceupdate API:
    
         {
             "a": 19,
             "c": 45
         }
    
    If I send this to Elasticsearch, the resulting document will be:
    
         {
             "a": 19,
             "c": 45
         }
    
    • Elasticsearch first updates the document on the primary shard and then updates the replica shards, which increases the total time required for a document update.
    • You can reduce the total time by setting the number of replicas to 0, but if updating a large number of documents takes an hour and your nodes go down in the meantime, you have no replicas to recover from, i.e. you lose your data.
    • Advance Update keeps your old data safe while updating documents much faster than normal updates.
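The difference between the two behaviours described above can be modelled on plain dictionaries. This is only a sketch of the semantics, not the plugin's or Elasticsearch's implementation; `merge_update` and `replace_update` are hypothetical names, and the replace model assumes that the plugin's result is exactly the document sent, as in the example above.

```python
from copy import deepcopy


def merge_update(old, partial):
    """Model of a merge-style update: fields missing from `partial` survive."""
    result = deepcopy(old)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Nested objects are merged field by field.
            result[key] = merge_update(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def replace_update(old, new):
    """Model of a replace-style update: the stored document becomes `new`."""
    return deepcopy(new)


old_doc = {"a": 23, "b": 40}
new_doc = {"a": 19, "c": 45}

merged = merge_update(old_doc, new_doc)      # "b" is retained
replaced = replace_update(old_doc, new_doc)  # "b" is gone
```

With the documents from the example, `merged` still contains "b" while `replaced` holds only "a" and "c".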

API Support

  • POST _advanceupdate

    Usage:

     /_advanceupdate
    
     {
         "a": 19,
         "c": 45
     }
    
  • POST/PUT _advancebulk

     /_advancebulk
    
      { "update" : {"_id" : "2", "_type" : "type1", "_index" : "test"} }
      { "doc" : {"b": 12,"f": 14,"m":15}, "doc_as_upsert" : true}
      { "update" : {"_id" : "2", "_type" : "type1", "_index" : "test"} }
      { "doc" : {"s": 15,"l": 14,"k":12}, "doc_as_upsert" : true}
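A bulk body like the one above is newline-delimited JSON: one action line followed by one payload line per update. The sketch below builds such a body and prepares a request for `/_advancebulk`. It assumes the plugin accepts the same body format as the standard `_bulk` API (including the trailing newline) and that a node listens on `http://localhost:9200`; `build_bulk_body` is a hypothetical helper, not part of the plugin.

```python
import json
import urllib.request


def build_bulk_body(updates):
    """Serialize updates into newline-delimited JSON (action line + doc line)."""
    lines = []
    for u in updates:
        action = {"update": {"_id": u["id"], "_type": u["type"], "_index": u["index"]}}
        payload = {"doc": u["doc"], "doc_as_upsert": True}
        lines.append(json.dumps(action))
        lines.append(json.dumps(payload))
    # The standard _bulk API requires a trailing newline; assumed to apply here too.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    {"id": "2", "type": "type1", "index": "test", "doc": {"b": 12, "f": 14, "m": 15}},
    {"id": "2", "type": "type1", "index": "test", "doc": {"s": 15, "l": 14, "k": 12}},
])

# Host and port are assumptions; adjust them to your cluster.
request = urllib.request.Request(
    "http://localhost:9200/_advancebulk",
    data=body.encode("utf-8"),
    headers={"Content-Type": "application/x-ndjson"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs without a cluster.
```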
    

Prerequisites

  • Elasticsearch 5.6.0
  • In your elasticsearch.yml file, set thread_pool.bulk.queue_size high enough that your documents don't get skipped, or set it to -1 (unbounded).

Installing

Download and install elasticsearch 5.6.0 from

Download the latest Advance Update plugin from

Install the plugin :

    <ES directory>/bin/elasticsearch-plugin install file:///Downloads/advance-update.zip

Test the installation by visiting http://localhost:9200/_cat/plugins (use https only if your cluster has TLS enabled).
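The same check can be scripted by parsing the text that `_cat/plugins` returns. The sketch below assumes the default column layout (node name, plugin name, version) and that the plugin is listed under the name `advance-update`; both are assumptions, and the sample response is made up for illustration.

```python
def installed_plugins(cat_output):
    """Return the set of plugin names found in a `_cat/plugins` text response.

    Assumes whitespace-separated columns: node name, plugin name, version.
    """
    names = set()
    for line in cat_output.splitlines():
        columns = line.split()
        if len(columns) >= 2:
            names.add(columns[1])
    return names


# Hypothetical response text; node names and version are invented.
sample = "node-1 advance-update 5.6.0\nnode-2 advance-update 5.6.0\n"
plugin_names = installed_plugins(sample)
is_installed = "advance-update" in plugin_names
```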

Versioning

Works only with Elasticsearch 5.6.0.

What next?

  • Configurable copying of Elasticsearch documents from the primary shard to replicas.
  • Updating a key in all documents.

Authors

  • crazycompiler
  • paralax
