catinello / nagios-check-graylog2
Nagios Graylog2 checks via REST API the availability of the service.
License: BSD 2-Clause "Simplified" License
If your server generates a lot of error messages, the current polling can be extremely slow. I changed the API call from /system/indexer/failures to /system/indexer/failures/count?since=date, and the check time went from 2 minutes to 500 ms. Here is my diff:
--- main.go 2018-03-16 06:12:54.000000000 -0500
+++ /home/schmit/main.go 2018-08-06 10:55:30.400785192 -0500
@@ -141,7 +141,7 @@
quit(WARNING, fmt.Sprintf("lb_status: %v", system["lb_status"].(string)), nil)
}
- index := query(c+"/system/indexer/failures", *user, *pass)
+ index := query(c+"/system/indexer/failures/count?since="+time.Now().AddDate(0, 0, -1).Format("2006-01-02"), *user, *pass)
tput := query(c+"/system/throughput", *user, *pass)
inputs := query(c+"/system/inputs", *user, *pass)
total := query(c+"/count/total", *user, *pass)
@@ -149,12 +149,12 @@
elapsed := time.Since(start)
// generate performance data output
- perf(elapsed.Seconds(), total["events"].(float64), inputs["total"].(float64), tput["throughput"].(float64), index["total"].(float64))
+ perf(elapsed.Seconds(), total["events"].(float64), inputs["total"].(float64), tput["throughput"].(float64), index["count"].(float64))
// fix for backwards compatiblity if no index error threshold is set
if len(*indexwarn) == 0 || len(*indexcrit) == 0 {
quit(OK, fmt.Sprintf("Service is running!\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
- total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+ total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
}
// convert indexwarn and indexcrit strings to float64 variables for comparison below
@@ -168,17 +168,17 @@
}
// handle index thresholds
- if index["total"].(float64) < indexwarn2 && index["total"].(float64) < indexcrit2 {
+ if index["count"].(float64) < indexwarn2 && index["count"].(float64) < indexcrit2 {
quit(OK, fmt.Sprintf("Service is running!\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
- total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+ total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
}
- if index["total"].(float64) >= indexwarn2 && index["total"].(float64) < indexcrit2 {
+ if index["count"].(float64) >= indexwarn2 && index["count"].(float64) < indexcrit2 {
quit(WARNING, fmt.Sprintf("Index Failure above Warning Limit!\nService is running\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
- total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+ total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
}
- if index["total"].(float64) >= indexcrit2 {
+ if index["count"].(float64) >= indexcrit2 {
quit(CRITICAL, fmt.Sprintf("Index Failure above Critical Limit!\nService is running\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
- total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+ total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
}
}
Hello,
First of all, thanks a lot for this great plugin.
Here is my issue: I can't authenticate using a non-admin user. I created a limited Graylog user (role: reader), but when I run check_graylog2 I get a 403 error (with an admin user, it works).
[root@localhost cacti]# ./check_graylog2 -l http://192.168.1.2:12900/ -insecure -u limited_user -p big_password
CRITICAL - Graylog2 API replied with HTTP code 403|time=0.000000;;;; total=0;;;; sources=0;;;; throughput=0;;;; index_failures=0;;;;
Is there any way to use a non-admin user? (As the password is used/stored in clear text, I prefer not to use an admin user.)
Also, can you tell me which Graylog metrics you use to check Graylog health? I found some at https://www.graylog.org/blog/86-back-to-basics-monitoring-graylog but I didn't find how to check whether the service is running or how to check the number of sources.
Thanks for your help!
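One way to narrow down a 403 like this is to request each endpoint separately with the limited user and see which one is refused. The sketch below assumes only the four paths that appear in the query() calls in the diff quoted earlier on this page; the helper names joinURL and probe are hypothetical. To stay runnable offline it talks to a local stand-in server that refuses one arbitrarily chosen path; that choice says nothing about which endpoint a real Graylog reader role is denied.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// Paths taken from the query() calls shown in the diff on this page.
var paths = []string{
	"/system/indexer/failures",
	"/system/throughput",
	"/system/inputs",
	"/count/total",
}

// joinURL avoids a double slash when the base URL ends in "/".
func joinURL(base, path string) string {
	return strings.TrimRight(base, "/") + path
}

// probe sends one GET with HTTP basic auth and returns the status code.
func probe(client *http.Client, base, path, user, pass string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, joinURL(base, path), nil)
	if err != nil {
		return 0, err
	}
	req.SetBasicAuth(user, pass)
	req.Header.Set("Accept", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// Stand-in server (NOT Graylog): refuses one arbitrary path with 403
	// so the loop below has something to report.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/system/inputs" {
			w.WriteHeader(http.StatusForbidden)
			return
		}
		w.Write([]byte("{}"))
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second}
	for _, p := range paths {
		// Against a real instance, replace srv.URL with the -l value.
		code, err := probe(client, srv.URL+"/", p, "limited_user", "big_password")
		fmt.Println(p, code, err)
	}
}
```

Whichever paths return 403 for the limited user are the ones whose permissions would need to be granted to that user's role.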
I have successfully pulled down the script and installed its dependencies. I am able to run the script from the command line; however, importing it into LibreNMS causes a service error. Usually this is because of either wrong arguments or firewall issues.
I copied the command-line arguments, and I am able to reach the Graylog instance with the script.
Any other recommendations for tracking down the error?
UNKNOWN - Can not parse JSON from Graylog2 API|time=0.000000;;;; total=0;;;; sources=0;;;; throughput=0;;;; index_failures=0;;;;
I am running the latest Graylog, and the API works fine in a browser (tested with Chrome).
/usr/local/nagios/libexec/check_graylog2 -l http://192.168.1.95:9000/api/ -u admin -p admin
It would be great to have the following counters included in your plugin as performance data, with separate Warning/Critical thresholds, as with index errors:
Of course, all of this is optional, but for larger installations tracking each metric would be very useful.
Hi,
I am running Graylog 2.1.2 on CentOS 7 64-bit and Icinga2 on CentOS 6 64-bit. When I try to check the performance of Graylog through Icinga, I get the "UNKNOWN - Can not parse JSON from Graylog2 API" error:
./check_graylog2 -l https://syslogsrv-ndi.example.com:9000/ -u admin -p server -insecure
UNKNOWN - Can not parse JSON from Graylog2 API|time=0.000000;;;; total=0;;;; sources=0;;;; throughput=0;;;; index_failures=0;;;;
What could be the issue? Is a package missing, for example?
Thanks & Regards
Ankush Grover
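For the "Can not parse JSON" reports above, it can help to look at what the server actually returned before assuming a missing package. The sketch below is a hypothetical diagnostic helper (describeBody is not part of the plugin): it tries to parse a body as a JSON object and, if that fails, reports the Content-Type and the first bytes of the body. An HTML page, for example from a web interface answering on that URL instead of the REST API, would show up immediately; whether that is the cause in these reports is an assumption, not something the posts confirm.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// describeBody reports whether body parses as a JSON object. If not, it
// returns a hint containing the Content-Type and the start of the body.
func describeBody(contentType string, body []byte) string {
	var obj map[string]interface{}
	if err := json.Unmarshal(body, &obj); err == nil {
		return "ok: JSON object"
	}
	snippet := strings.TrimSpace(string(body))
	if len(snippet) > 60 {
		snippet = snippet[:60] // enough to recognize HTML or an error page
	}
	if strings.HasPrefix(snippet, "<") || strings.Contains(contentType, "text/html") {
		return fmt.Sprintf("not JSON: looks like HTML (Content-Type %q), body starts with %q", contentType, snippet)
	}
	return fmt.Sprintf("not JSON (Content-Type %q), body starts with %q", contentType, snippet)
}

func main() {
	// Sample inputs for illustration only, not captured Graylog responses.
	fmt.Println(describeBody("application/json", []byte(`{"throughput": 12}`)))
	fmt.Println(describeBody("text/html", []byte("<!DOCTYPE html><html><body>login</body></html>")))
}
```

Feeding it the body and Content-Type header of the same request the plugin makes (same URL, user, and password) shows whether the endpoint is answering with JSON at all.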
Is there a way of generating graphs of "index failures"?