reyjrar / es-utils
ElasticSearch Utilities
For example, look at https://rt.cpan.org/Ticket/Display.html?id=83126 and be dismayed.
File::Slurp::Tiny and Path::Tiny are both excellent alternatives. See also http://shadow.cat/blog/matt-s-trout/mstpan-5/
The POD still refers to the copy script. Please clarify why it has gone missing.
Hi,
The index _status API has been replaced by the index _stats API. I just changed the endpoint name in the "es-daily-index-maintenance.pl" script and it works again:
- my $status = es_request('_status',{index=>$index});
+ my $status = es_request('_stats',{index=>$index});
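Swapping the endpoint name only works where the calling code reads keys that exist in both responses; the _status and _stats bodies are not laid out the same way. A minimal sketch of reading a per-index value from an _stats response, assuming the indices → <name> → primaries → <section> → <field> layout of the Elasticsearch indices stats API. `primary_stat` is a hypothetical helper, not part of es-utils:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper (not es-utils code): walk
# indices -> $index -> primaries -> $section -> $field in a decoded _stats
# response, returning undef instead of warning when any level is missing.
sub primary_stat {
    my ($stats, $index, $section, $field) = @_;
    my $node = ref $stats eq 'HASH' ? $stats->{indices} : undef;
    for my $key ($index, 'primaries', $section, $field) {
        return undef unless ref $node eq 'HASH';
        $node = $node->{$key};
    }
    return $node;
}

# Example: a trimmed-down _stats body for one index (illustrative values).
my $body  = '{"indices":{"logstash-2014.11.07":{"primaries":{"docs":{"count":42},"segments":{"count":3}}}}}';
my $stats = decode_json($body);
printf "segments: %s\n", primary_stat($stats, 'logstash-2014.11.07', 'segments', 'count');
```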
Hi,
I have a question about the Elasticsearch client. Why did you choose "Elastijk" rather than "Search::Elasticsearch", the official Elasticsearch client?
Best regards,
The current --tail logic uses the simple condition: range => {'@timestamp' => {gte => $last_hit_ts}}
Let's assume there is a cluster of 3 ES nodes but only one source of data, and that data is presented in @timestamp order. Elasticsearch sends the docs to different nodes to be stored. Each node makes its new docs searchable within refresh_interval, but these refreshes are not synchronized across nodes. So the order in which docs 'become searchable' may not be @timestamp order, and the gte => $last_hit_ts condition is not sufficiently safe: older docs may be missed because they became 'searchable' after other docs with a later @timestamp.
A fix for this might be something like:
range => {'@timestamp' => {gte => $last_hit_ts - $time_window}}
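Re-querying with an overlapping window means docs from the previous poll come back again, so the tail loop would also need to skip _ids it has already printed. A minimal sketch, assuming @timestamp values are compared as epoch milliseconds and that each hit carries `_id` and `_source`; `tail_range`, `new_hits` and `prune_seen` are hypothetical helpers, not the script's actual code:

```perl
use strict;
use warnings;

# Hypothetical: range filter for the next poll, reaching back $window_ms
# before the newest @timestamp seen so far (timestamps assumed epoch ms).
sub tail_range {
    my ($last_hit_ts, $window_ms) = @_;
    return { range => { '@timestamp' => { gte => $last_hit_ts - $window_ms } } };
}

# Hypothetical: the overlapping window returns some docs twice, so keep a
# map of _id => @timestamp and return only hits not seen before.
sub new_hits {
    my ($seen, $hits) = @_;
    my @fresh;
    for my $hit (@$hits) {
        next if exists $seen->{ $hit->{_id} };
        $seen->{ $hit->{_id} } = $hit->{_source}{'@timestamp'};
        push @fresh, $hit;
    }
    return \@fresh;
}

# Hypothetical: forget ids older than the window so the map stays bounded.
sub prune_seen {
    my ($seen, $cutoff) = @_;
    delete $seen->{$_} for grep { $seen->{$_} < $cutoff } keys %$seen;
    return;
}
```

With a window at least as large as refresh_interval, a doc that became searchable late should still fall inside the next poll's range, and the _id check keeps it from being printed twice.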
(The scope of the problem described above is bounded by the value of refresh_interval, but other related situations aren't. Consider the case of multiple sources of data where some might be delayed. For example, we have many machines feeding logs to several logstash servers which feed an ES cluster. Logs are often delayed by at least a few seconds and sometimes by many minutes. Widening the time window described above doesn't scale well to longer delays or high volumes of log messages. For this case the best approach would be to enable the _timestamp field and use that to drive the tailing logic.)
This applies to version 2.9, which is the last available on RHEL 6.
The following conditional only checks whether it got a status object back, then tries to evaluate its contents even when they might not be meaningful.
https://github.com/reyjrar/es-utils/blob/release-2.9/scripts/es-daily-index-maintenance.pl#L174
The effect during a normal run is like so:
[root@production-elasticsearch-1 ~]# /usr/local/bin/es-daily-index-maintenance.pl --all --replicas-min 1 --local --pattern logstash-*
Use of uninitialized value in numeric gt (>) at /usr/local/bin/es-daily-index-maintenance.pl line 174.
Use of uninitialized value in numeric gt (>) at /usr/local/bin/es-daily-index-maintenance.pl line 174.
Use of uninitialized value in numeric gt (>) at /usr/local/bin/es-daily-index-maintenance.pl line 174.
Every closed index emits a warning line. The script runs fine, but stderr is really noisy. What it actually gets back from ES for that _status request is:
{"error":"IndexClosedException[[logstash-2014.11.07] closed]","status":403}
Which of course doesn't have a shards element. Another defined() check would probably clean it up.
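A minimal sketch of that extra check. Line 174 itself is not quoted here, so `shard_count` is a hypothetical stand-in, and it assumes the status response nests shard data under indices → <index> → shards. A closed index comes back as an error body with no shards, and the helper returns undef instead of letting the numeric comparison warn:

```perl
use strict;
use warnings;

# Hypothetical guard: count shards only when the response really contains
# them. A closed index yields {"error":"IndexClosedException[...]","status":403}.
sub shard_count {
    my ($status, $index) = @_;
    return undef unless ref $status eq 'HASH';
    return undef if exists $status->{error};
    my $indices = $status->{indices};
    return undef unless ref $indices eq 'HASH' && ref $indices->{$index} eq 'HASH';
    my $shards = $indices->{$index}{shards};
    return undef unless ref $shards eq 'HASH';
    return scalar keys %$shards;
}

# In the maintenance loop, skip the index instead of comparing undef:
#   my $n = shard_count($status, $index);
#   next unless defined $n;
```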
Message::Passing::Output::ElasticSearch closes indexes older than 7 days.
Because index_stats doesn't return closed indexes, they are never deleted.
GET /_cluster/state also includes closed indexes, so you might want to use the cluster_state method instead.
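A minimal sketch of that approach, assuming the GET /_cluster/state body lists every index under metadata → indices with a state of "open" or "close". `indices_from_cluster_state` and `closed_indices` are hypothetical helpers, not part of Message::Passing or es-utils:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: map every index in a GET /_cluster/state body to its
# state ("open" or "close"), so closed indices are not silently skipped.
sub indices_from_cluster_state {
    my ($json) = @_;
    my $state   = decode_json($json);
    my $indices = ref $state->{metadata} eq 'HASH' ? $state->{metadata}{indices} : undef;
    return {} unless ref $indices eq 'HASH';
    return +{ map { ($_ => $indices->{$_}{state}) } keys %$indices };
}

# Hypothetical: sorted names of closed indices with a given prefix,
# e.g. candidates for a retention job to delete.
sub closed_indices {
    my ($json, $prefix) = @_;
    my $all    = indices_from_cluster_state($json);
    my @closed = grep { index($_, $prefix) == 0 && $all->{$_} eq 'close' } keys %$all;
    return sort @closed;
}
```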
Hi,
It would be nice to be able to use the es-apply-settings.pl script to apply settings to indices that are older than a given number of days, not just to the most recent ones.
Best regards,
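The index selection such a feature needs can be sketched as follows, assuming daily indices named like logstash-YYYY.MM.DD. `indices_older_than` is hypothetical, and es-apply-settings.pl's real options are not shown here:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical: return the date-stamped indices that are at least
# $min_days old relative to $now (epoch seconds, UTC).
sub indices_older_than {
    my ($now, $min_days, @names) = @_;
    my @old;
    for my $name (@names) {
        # Skip names without a trailing YYYY.MM.DD date.
        my ($y, $m, $d) = $name =~ /(\d{4})\.(\d{2})\.(\d{2})$/ or next;
        my $age_days = ($now - timegm(0, 0, 0, $d, $m - 1, $y)) / 86_400;
        push @old, $name if $age_days >= $min_days;
    }
    return @old;
}
```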
Hi,
I'm testing "es-copy-index.pl" and I have an issue with Elasticsearch 5.3.
es_request(//) failed[400]: Bad Request
es_request(//) returned HTTP Status Bad Request
Undefined subroutine &main::is_hashref called at ./es-copy-index.pl line 136.
$res returns "No handler found for uri [/] and method [PUT]"
It seems that the problem comes from:
$res = es_request('/',
    {
        method => 'PUT',
        index  => $INDEX{to},
    },
    {
        settings => $to_settings,
        mappings => $mappings,
    }
);
Have you run into this issue?
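The "No handler found for uri [/] and method [PUT]" message shows the PUT reached / rather than /<index>, which is where the create-index API lives. A minimal sketch of the request that should be sent, built with core HTTP::Tiny and JSON::PP rather than es-utils' own es_request; `create_index_request` and `create_index` are hypothetical helpers. The separate "Undefined subroutine &main::is_hashref" error suggests is_hashref was called without being imported; assuming the script means Ref::Util's is_hashref, `use Ref::Util qw(is_hashref);` would provide it.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP ();

# Hypothetical: build "PUT /<index>" with settings and mappings in the body,
# returning the argument list HTTP::Tiny->request expects.
sub create_index_request {
    my ($base_url, $index, $settings, $mappings) = @_;
    die "index name required\n" unless defined $index && length $index;
    my $url  = sprintf '%s/%s', $base_url, $index;
    my $body = JSON::PP->new->canonical->encode({ settings => $settings, mappings => $mappings });
    return ('PUT', $url, {
        headers => { 'Content-Type' => 'application/json' },
        content => $body,
    });
}

# Hypothetical: send the request and report whether ES accepted it (2xx).
sub create_index {
    my $res = HTTP::Tiny->new->request(create_index_request(@_));
    return $res->{success};
}
```

For example, `create_index('http://localhost:9200', $INDEX{to}, $to_settings, $mappings)` would send the same settings and mappings as the snippet above, but to PUT /<index>.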
[root@salttestvm70 ~]# /usr/local/bin/es-copy-index.pl --debug --from other-host --to localhost logstash-2015.10.23
Failed to create index in localhost (http status = 500): [
"500",
{
"error" : "NullPointerException[null]",
"status" : 500
}
]
at /usr/local/bin/es-copy-index.pl line 75.
That's all the output I get, even with debugging turned on.
I'm getting the following error when trying to use one of the scripts:
Attempt to reload JSON/XS.pm aborted.
Compilation failed in require at /usr/bin/es-daily-index-maintenance.pl line 14.
BEGIN failed--compilation aborted at /usr/bin/es-daily-index-maintenance.pl line 14
JSON::XS is installed (v2.3.4).
perl -v
This is perl, v5.10.0 built for x86_64-linux-thread-multi
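Perl reports "Attempt to reload X.pm aborted" when a file loaded with use or require already failed to compile once in the same process, so the informative message is the first failure. A minimal diagnostic sketch (`load_error` is a hypothetical helper) that tries the load in a fresh process and returns that original error:

```perl
use strict;
use warnings;

# Hypothetical diagnostic: try to load $module once and return the
# original error text, or undef if it loads cleanly.
sub load_error {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? undef : $@;
}

# Usage: perl load-check.pl JSON::XS
if (@ARGV) {
    my $err = load_error($ARGV[0]);
    print defined $err ? "first load error: $err" : "$ARGV[0] loaded fine\n";
}
```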
What is the reason for requiring Perl 5.14.0 or later in release 3.0?
I wanted to install es-utils following the instructions on the main page. At the "wget" step I get a 404 error. Where are the packages? I only see very old releases (0.009 and 0.010).
Regards.
I'd like to separate the following modules from this distribution and make the distribution use those modules:
I'm choosing a lowercase 'es' intentionally. These are not the official Elastic modules; they are generic, minimal modules designed to make working with Elasticsearch less of a hassle.