es-utils's People

Contributors

andrew-grechkin, dostermeier, gugod, ksurent, lharey, maage, manwar, propertone, reyjrar, samitbadle, shatlovsky, takus

es-utils's Issues

No feature for name [_status]

Hi,

The index _status API has been replaced by the index _stats API. I just changed the endpoint in the "es-daily-index-maintenance.pl" script and it works again:

- my $status = es_request('_status',{index=>$index});
+ my $status = es_request('_stats',{index=>$index});

Why Elastijk and not the official API?

Hi,

I have a question about the Elasticsearch API. Why did you choose "Elastijk" rather than "Search::Elasticsearch", the official Elasticsearch API?

Best regards,

Current --tail logic is unreliable for multi node clusters or multiple data sources with possible delays

The current --tail logic uses the simple condition: range => {'@timestamp' => {gte => $last_hit_ts}}

Let's assume that there is a cluster of 3 ES nodes, but only one source of data, and that the data is presented in @timestamp order. Elasticsearch sends the docs to be stored to different nodes. Those nodes will update their segments with the new docs within refresh_interval, but the update is not synchronized across nodes. So the order in which docs 'become searchable' may not be @timestamp order, which means the gte => $last_hit_ts condition is not sufficiently safe. Older docs may be missed because they became 'searchable' after other docs that have a later @timestamp.

A fix for this might be something like:

  • add the concept of a 'time window', e.g. range => {'@timestamp' => {gte => $last_hit_ts - $time_window}}
  • record the document ids that were seen in the final time window of the last query
  • exclude those document ids so they're not shown twice (either by including the ids in an extra NOT condition in the query, or else by checking and discarding duplicate ids in the client)

(The scope of the problem described above is bounded by the value of refresh_interval, but other related situations aren't. Consider the case of multiple sources of data where some might be delayed. For example, we have many machines feeding logs to several logstash servers which feed an ES cluster. Logs are often delayed for at least a few seconds and sometimes for many minutes. Widening the 'time window' described above doesn't scale well to larger time periods or high volumes of log messages. For this case the best approach would be to enable the _timestamp field and use that to drive the tailing logic.)
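
The time-window idea from the list above could look roughly like the sketch below. It is only an illustration under stated assumptions: tail_range and filter_new_hits are hypothetical helpers (not part of es-utils), timestamps are plain epoch seconds, and each hit is simplified to a hash with an _id and a ts key. Duplicates are discarded in the client rather than excluded in the query.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: range clause for the next poll, reaching back
# $time_window seconds before the newest timestamp seen so far.
sub tail_range {
    my ($last_hit_ts, $time_window) = @_;
    return { range => { '@timestamp' => { gte => $last_hit_ts - $time_window } } };
}

# Hypothetical helper: return only hits whose _id has not been shown yet,
# remember them, and forget ids older than the window so %$seen stays bounded.
sub filter_new_hits {
    my ($hits, $seen, $window_start) = @_;
    my @new = grep { !exists $seen->{ $_->{_id} } } @$hits;
    $seen->{ $_->{_id} } = $_->{ts} for @new;
    delete @{$seen}{ grep { $seen->{$_} < $window_start } keys %$seen };
    return \@new;
}

my %seen;
my $window  = 30;     # assumed window size in seconds
my $last_ts = 100;

# First poll: two docs are searchable.
my @poll1 = ( { _id => 'a', ts => 100 }, { _id => 'b', ts => 105 } );
my $new1  = filter_new_hits( \@poll1, \%seen, $last_ts - $window );
$last_ts  = max( $last_ts, map { $_->{ts} } @$new1 );

# Second poll: doc 'c' is older than 'b' but only became searchable now.
my @poll2 = ( { _id => 'a', ts => 100 }, { _id => 'c', ts => 102 }, { _id => 'b', ts => 105 } );
my $new2  = filter_new_hits( \@poll2, \%seen, $last_ts - $window );

print "new in poll 2: ", join( ',', map { $_->{_id} } @$new2 ), "\n";
```

With a plain gte => $last_hit_ts condition the second poll would have skipped doc 'c'. The cost of the window is that every poll re-fetches the docs inside it, which is why it does not scale to long delays.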

Logic around already-closed indexes doesn't log cleanly

This applies to version 2.9, which is the last available on RHEL 6.

The following conditional only checks whether it got a status object back, then tries to evaluate the content of it even when it might not be meaningful.

https://github.com/reyjrar/es-utils/blob/release-2.9/scripts/es-daily-index-maintenance.pl#L174

The effect during a normal run is like so:

[root@production-elasticsearch-1 ~]# /usr/local/bin/es-daily-index-maintenance.pl --all --replicas-min 1 --local --pattern logstash-*
Use of uninitialized value in numeric gt (>) at /usr/local/bin/es-daily-index-maintenance.pl line 174.
Use of uninitialized value in numeric gt (>) at /usr/local/bin/es-daily-index-maintenance.pl line 174.
Use of uninitialized value in numeric gt (>) at /usr/local/bin/es-daily-index-maintenance.pl line 174.

Every closed index emits a warning line. Script runs fine, but stderr is really noisy. What it actually gets back from ES during that _status request is:

{"error":"IndexClosedException[[logstash-2014.11.07] closed]","status":403}

Which of course doesn't have a shards element. Probably another defined() call would clean it up.
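
A guard along these lines might be enough. This is a minimal sketch, not the script's actual code: has_shards is a hypothetical helper, and the key name shards and the sample responses are assumptions based on the error output quoted above.

```perl
use strict;
use warnings;

# Hypothetical guard: true only when the response carries shard information.
sub has_shards {
    my ($status) = @_;
    return 0 unless ref $status eq 'HASH';
    return 0 if exists $status->{error};    # e.g. IndexClosedException
    return defined $status->{shards} ? 1 : 0;
}

# Sample responses (assumed shapes): a closed index and an open one.
my $closed = { error => 'IndexClosedException[[logstash-2014.11.07] closed]', status => 403 };
my $open   = { shards => 5 };

for my $status ( $closed, $open ) {
    if ( has_shards($status) && $status->{shards} > 0 ) {
        print "index has $status->{shards} shards\n";
    }
    else {
        print "skipping: no shard information (index closed?)\n";
    }
}
```

Because the defined() check runs before the numeric comparison, a closed index no longer triggers the "uninitialized value" warning.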

es-daily-index-maintenance.pl should also delete closed indexes

Message::Passing::Output::ElasticSearch closes indexes older than 7 days.
Because index_stats doesn't return closed indexes, they aren't deleted.
GET /_cluster/state also includes closed indexes, so you might want to use the cluster_state method instead.
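
As a rough sketch of that suggestion, the closed indexes can be picked out of a decoded cluster state by their state field. The helper closed_indexes and the sample data are hypothetical. The sketch assumes the response has the usual metadata.indices.<name>.state layout, with the value "close" for closed indexes.

```perl
use strict;
use warnings;

# Hypothetical helper: names of closed indexes in a decoded cluster state.
sub closed_indexes {
    my ($state) = @_;
    my $indices = $state->{metadata}{indices} || {};
    my @closed  = sort grep { ( $indices->{$_}{state} // '' ) eq 'close' } keys %$indices;
    return @closed;
}

# Sample data standing in for a decoded GET /_cluster/state response.
my $state = {
    metadata => {
        indices => {
            'logstash-2014.11.07' => { state => 'close' },
            'logstash-2014.11.20' => { state => 'open' },
        },
    },
};

my @closed = closed_indexes($state);
print "closed: @closed\n";
```

The resulting names could then be fed into the same age check that the script applies to open indexes.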

Apply settings after a few days

Hi,

It would be nice to be able to use the es-apply-settings.pl script to apply settings to indexes that are a few days old, and not just to the most recent days.

Best regards,

No handler found for uri [/] and method [PUT]

Hi,

I'm testing "es-copy-index.pl" and I have an issue with Elasticsearch 5.3.

es_request(//) failed[400]: Bad Request
es_request(//) returned HTTP Status Bad Request
Undefined subroutine &main::is_hashref called at ./es-copy-index.pl line 136.

$res contains "No handler found for uri [/] and method [PUT]"

It seems that the problem comes from:

    $res = es_request('/',
        {
            method => 'PUT',
            index => $INDEX{to},
        },
        {
            settings => $to_settings,
            mappings => $mappings,
        }
    );

Do you see this issue too?

Fails with 500 and not much else...

[root@salttestvm70 ~]# /usr/local/bin/es-copy-index.pl --debug --from other-host --to localhost logstash-2015.10.23
Failed to create index in localhost (http status = 500): [
   "500",
   {
      "error" : "NullPointerException[null]",
      "status" : 500
   }
]
 at /usr/local/bin/es-copy-index.pl line 75.

That's all the output I get, even though I turned on debugging.

Error using es-daily-index-maintenance.pl

I'm getting the following error when trying to use one of the scripts:

Attempt to reload JSON/XS.pm aborted.
Compilation failed in require at /usr/bin/es-daily-index-maintenance.pl line 14.
BEGIN failed--compilation aborted at /usr/bin/es-daily-index-maintenance.pl line 14

JSON::XS is installed (v2.3.4).

perl -v
This is perl, v5.10.0 built for x86_64-linux-thread-multi

download of packages status 404

I wanted to install es-utils following the instructions on the main page. At the step where I "wget" the package, I get a 404 error. Where are the packages? I only see very old releases (0.009 and 0.010).

Regards.

Remove re-usable components into their own distribution

I'd like to separate the following modules from this distribution and make the distribution use those modules:

  • App::ElasticSearch::Utilities::HTTPRequest to "HTTP::Request::es"
  • App::ElasticSearch::Utilities::Query to "es::Query"
  • App::ElasticSearch::Utilities::QueryString to "es::QueryString" (bundled with es::Query)
  • App::ElasticSearch::Utilities::Connection to "LWP::UserAgent::es"

I'm selecting a lowercase 'es' intentionally. These are not the official elastic modules. These are generic, minimal modules designed to make working with Elasticsearch less of a hassle.
