catinello / nagios-check-graylog2

License: BSD 2-Clause "Simplified" License

Go 100.00%
nagios graylog2 monitoring

nagios-check-graylog2's Introduction

nagios-check-graylog2

Nagios Graylog2 checks via REST API the availability of the service.

  • Is the service processing data?
  • How long does the check take?
  • Monitoring performance
    • through the number of data sources,
    • total processed messages,
    • index failures
    • and the actual throughput.

This plugin is written in standard Go, which means no third-party libraries are used and it is platform independent. It compiles on all architectures and operating systems that Go supports (Linux, *BSD, Mac OS X, Windows, ...).
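
As an illustration of a standard-library-only REST check, here is a minimal sketch of a basic-auth JSON query. The `query` name, the `/system/throughput` path and the `throughput` field follow the diff quoted in the issues further down; the function body, the error handling and the `fakeGraylog` test server are assumptions made for this example, not the plugin's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// query sends a GET request with HTTP basic auth and decodes the JSON
// object in the reply. This is a sketch, not the plugin's real function.
func query(url, user, pass string) (map[string]interface{}, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass)
	req.Header.Set("Accept", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("API replied with HTTP code %d", resp.StatusCode)
	}

	var data map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&data); err != nil {
		return nil, err
	}
	return data, nil
}

// fakeGraylog is a hypothetical stand-in server used only to make this
// example runnable; it is not a real Graylog node.
func fakeGraylog() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, _, ok := r.BasicAuth(); !ok {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"throughput": 297}`)
	}))
}

func main() {
	srv := fakeGraylog()
	defer srv.Close()

	tput, err := query(srv.URL+"/system/throughput", "USERNAME", "PASSWORD")
	if err != nil {
		fmt.Println("UNKNOWN -", err)
		return
	}
	// JSON numbers decode to float64 when the target is interface{}.
	fmt.Printf("%.f throughput\n", tput["throughput"].(float64))
}
```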

Installation:

Download the source and build it yourself using the Go tools.

$ go get github.com/catinello/nagios-check-graylog2
$ mv $GOPATH/bin/nagios-check-graylog2 check_graylog2

Usage:

check_graylog2
  -c string
        Index error critical limit. (optional)
  -insecure
        Accept insecure SSL/TLS certificates. (optional)
  -l string
        Graylog API URL (default "http://localhost:12900")
  -p string
        API password
  -u string
        API username
  -version
        Display version and license information. (info)
  -w string
        Index error warning limit. (optional)
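
For readers unfamiliar with Go's `flag` package, the option list above could be declared roughly as follows. The flag names, help texts and the default URL are taken from the usage output; the `options` struct and `parseArgs` helper are hypothetical names introduced for this sketch.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the parsed command line values (hypothetical type).
type options struct {
	url, user, pass string
	warn, crit      string
	insecure        bool
	version         bool
}

// parseArgs declares the flags shown in the usage output and parses args.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("check_graylog2", flag.ContinueOnError)
	fs.StringVar(&o.crit, "c", "", "Index error critical limit. (optional)")
	fs.BoolVar(&o.insecure, "insecure", false, "Accept insecure SSL/TLS certificates. (optional)")
	fs.StringVar(&o.url, "l", "http://localhost:12900", "Graylog API URL")
	fs.StringVar(&o.pass, "p", "", "API password")
	fs.StringVar(&o.user, "u", "", "API username")
	fs.BoolVar(&o.version, "version", false, "Display version and license information. (info)")
	fs.StringVar(&o.warn, "w", "", "Index error warning limit. (optional)")
	err := fs.Parse(args)
	return o, err
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(3) // UNKNOWN: the command line could not be parsed
	}
	fmt.Printf("url=%s warn=%q crit=%q insecure=%v\n", opts.url, opts.warn, opts.crit, opts.insecure)
}
```

The thresholds are kept as strings here because the usage output lists `-w` and `-c` as string flags.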

Debugging:

Please run your command with the environment variable NCG2=debug set, for example on Linux by prefixing the command like this:

NCG2=debug /usr/local/nagios/libexec/check_graylog2 -l http://localhost:9000/api/ -u USERNAME -p PASSWORD -w 10 -c 20
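
A switch like this can be read with `os.Getenv`. The sketch below only assumes the `NCG2=debug` convention described above; the `debugEnabled` helper and the choice to write debug lines to stderr are illustrative assumptions, not the plugin's confirmed behaviour.

```go
package main

import (
	"fmt"
	"os"
)

// debugEnabled reports whether NCG2 is set to "debug". The lookup function
// is passed in so the check can be exercised without touching the real
// environment (hypothetical helper).
func debugEnabled(getenv func(string) string) bool {
	return getenv("NCG2") == "debug"
}

func main() {
	if debugEnabled(os.Getenv) {
		// Assumption: debug output goes to stderr so that the plugin
		// output read by Nagios on stdout stays unchanged.
		fmt.Fprintln(os.Stderr, "debug: NCG2=debug is set")
	}
	fmt.Println("OK - Service is running!")
}
```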

Examples:

$ ./check_graylog2 -l http://localhost:12900 -u USERNAME -p PASSWORD -w 10 -c 20
OK - Service is running!
768764376 total events processed
0 index failures
297 throughput
1 sources
Check took 94ms
|time=0.0094;;;; total=768764376;;;; sources=1;;;; throughput=297;;;; index_failures=0;;;;

$ ./check_graylog2 -l http://localhost:12900 -u USERNAME -p PASSWORD -w 10 -c 20
CRITICAL - Can not connect to Graylog2 API|time=0.000000;;;; total=0;;;; sources=0;;;; throughput=0;;;; index_failures=0;;;;

$ ./check_graylog2 -l https://localhost -insecure -u USERNAME -p PASSWORD -w 10 -c 20
UNKNOWN - Port number is missing. Try https://hostname:port|time=0.000000;;;; total=0;;;; sources=0;;;; throughput=0;;;; index_failures=0;;;;

$ ./check_graylog2 -l http://localhost:12900 -u USERNAME -p PASSWORD -w 10 -c 20
CRITICAL - Index Failure above Critical Limit!
Service is running
533732628 total events processed
21 index failures
297 throughput
1 sources
Check took 94ms
|time=0.0094;;;; total=533732628;;;; sources=1;;;; throughput=297;;;; index_failures=21;;;;
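
The text after the `|` in each example is Nagios performance data: every entry is `label=value` followed by four semicolon-separated fields (warn, crit, min, max), which are left empty here. A minimal sketch of producing such a line is shown below. The `perfData` name and the format verbs are assumptions; they reproduce the all-zero line from the error examples, but the plugin may format the `time` value differently.

```go
package main

import "fmt"

// perfData formats the five metrics as a Nagios performance data string.
// The warn/crit/min/max fields after each value are intentionally empty.
func perfData(elapsed, total, sources, throughput, indexFailures float64) string {
	return fmt.Sprintf(
		"|time=%f;;;; total=%.f;;;; sources=%.f;;;; throughput=%.f;;;; index_failures=%.f;;;;",
		elapsed, total, sources, throughput, indexFailures)
}

func main() {
	// Error case: no values could be collected, so everything is zero.
	fmt.Println("CRITICAL - Can not connect to Graylog2 API" + perfData(0, 0, 0, 0, 0))
}
```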

Return Values:

Nagios return codes are used.

0 = OK
1 = WARNING
2 = CRITICAL
3 = UNKNOWN
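
The following sketch shows how these codes relate to the `-w` and `-c` limits. The constant names match those used in the diff quoted in the issues below; `classify` and `statusName` are hypothetical helpers that mirror the threshold comparisons in that diff rather than the plugin's exact code.

```go
package main

import (
	"fmt"
	"os"
)

// Nagios plugin return codes.
const (
	OK       = 0
	WARNING  = 1
	CRITICAL = 2
	UNKNOWN  = 3
)

// statusName returns the label printed at the start of the plugin output.
func statusName(code int) string {
	switch code {
	case OK:
		return "OK"
	case WARNING:
		return "WARNING"
	case CRITICAL:
		return "CRITICAL"
	default:
		return "UNKNOWN"
	}
}

// classify maps an index failure count to a return code: at or above the
// critical limit is CRITICAL, at or above the warning limit is WARNING.
func classify(failures, warn, crit float64) int {
	switch {
	case failures >= crit:
		return CRITICAL
	case failures >= warn:
		return WARNING
	default:
		return OK
	}
}

func main() {
	state := classify(0, 10, 20) // 0 failures with -w 10 -c 20
	fmt.Printf("%s - %d\n", statusName(state), state)
	os.Exit(state) // Nagios reads the state from the process exit code
}
```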

License:

© Antonino Catinello - BSD-License

nagios-check-graylog2's People

Contributors

catinello, kahluagenie, theherodied

nagios-check-graylog2's Issues

On Ubuntu 16.04.2 LTS

UNKNOWN - Can not parse JSON from Graylog2 API|time=0.000000;;;; total=0;;;; sources=0;;;; throughput=0;;;; index_failures=0;;;;

I am running the latest Graylog, and the API works fine in a browser (tested with Chrome).

/usr/local/nagios/libexec/check_graylog2 -l http://192.168.1.95:9000/api/ -u admin -p admin

Trouble executing in librenms

I have successfully pulled down the script and installed its dependencies. I am able to run the script from the command line; however, importing it into librenms causes a service error. Usually this is because of either wrong arguments or firewall issues.

I copied the command line arguments. I am able to reach the Graylog instance with the script.

Any other recommendations to check the error?

UNKNOWN - Cannot parse JSON from Graylog2 API error

Hi,

I am running Graylog 2.1.2 on CentOS 7 64-bit and Icinga2 on CentOS 6 64-bit. When I try to check the performance of Graylog through Icinga, I get the "UNKNOWN - Cannot parse JSON from Graylog2 API" error.

./check_graylog2 -l https://syslogsrv-ndi.example.com:9000/ -u admin -p server -insecure

UNKNOWN - Can not parse JSON from Graylog2 API|time=0.000000;;;; total=0;;;; sources=0;;;; throughput=0;;;; index_failures=0;;;;

What could be the issue? Is a package missing, for example?

Thanks & Regards
Ankush Grover

Authentication issue

Hello,

First of all, thanks a lot for this great plugin.

Here is my issue: I can't authenticate using a non-admin user. I created a limited Graylog user (role: reader), but when I run check_graylog2 I get a 403 error (with an admin user it works).

[root@localhost cacti]# ./check_graylog2 -l http://192.168.1.2:12900/ -insecure -u limited_user -p big_password

CRITICAL - Graylog2 API replied with HTTP code 403|time=0.000000;;;; total=0;;;; sources=0;;;; throughput=0;;;; index_failures=0;;;;

Is there any way to use a non-admin user? As the password is used and stored in clear text, I would prefer not to use an admin user.

Also, can you tell me which Graylog metrics you use to check Graylog's health? I found some at https://www.graylog.org/blog/86-back-to-basics-monitoring-graylog but I could not find out how to check whether the service is running or how to check the number of sources.

Thanks for your help !

Nagios Graphs

Is there a way of generating graphs of "index failures"?

Include additional performance data

It would be great to have the following counters included in your plugin as performance data, with their own warning/critical thresholds, as for index errors:

  • unprocessed messages (in case Graylog cannot write to ES, or has performance problems due to bad pipelines, grok patterns, etc.)
  • journal size in percent (same reason as above)
  • Input Buffer, Process Buffer, Output Buffer

Of course, everything optional - but for larger installations a tracking of each metric would be very useful.

Suggestion for errors count

If your server generates a lot of error messages, the current polling can be extremely slow. I changed the API call from /system/indexer/failures to /system/indexer/failures/count?since=date and the check time went from 2 minutes to 500 ms. Here is my diff:

--- main.go	2018-03-16 06:12:54.000000000 -0500
+++ /home/schmit/main.go	2018-08-06 10:55:30.400785192 -0500
@@ -141,7 +141,7 @@
 		quit(WARNING, fmt.Sprintf("lb_status: %v", system["lb_status"].(string)), nil)
 	}
 
-	index := query(c+"/system/indexer/failures", *user, *pass)
+	index := query(c+"/system/indexer/failures/count?since="+time.Now().AddDate(0, 0, -1).Format("2006-01-02"), *user, *pass)
 	tput := query(c+"/system/throughput", *user, *pass)
 	inputs := query(c+"/system/inputs", *user, *pass)
 	total := query(c+"/count/total", *user, *pass)
@@ -149,12 +149,12 @@
 	elapsed := time.Since(start)
 
 	// generate performance data output
-	perf(elapsed.Seconds(), total["events"].(float64), inputs["total"].(float64), tput["throughput"].(float64), index["total"].(float64))
+	perf(elapsed.Seconds(), total["events"].(float64), inputs["total"].(float64), tput["throughput"].(float64), index["count"].(float64))
 
 	// fix for backwards compatiblity if no index error threshold is set
 	if len(*indexwarn) == 0 || len(*indexcrit) == 0 {
 		quit(OK, fmt.Sprintf("Service is running!\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
-			total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+			total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
 	}
 
 	// convert indexwarn and indexcrit strings to float64 variables for comparison below
@@ -168,17 +168,17 @@
 	}
 
 	// handle index thresholds
-	if index["total"].(float64) < indexwarn2 && index["total"].(float64) < indexcrit2 {
+	if index["count"].(float64) < indexwarn2 && index["count"].(float64) < indexcrit2 {
 		quit(OK, fmt.Sprintf("Service is running!\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
-			total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+			total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
 	}
-	if index["total"].(float64) >= indexwarn2 && index["total"].(float64) < indexcrit2 {
+	if index["count"].(float64) >= indexwarn2 && index["count"].(float64) < indexcrit2 {
 		quit(WARNING, fmt.Sprintf("Index Failure above Warning Limit!\nService is running\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
-			total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+			total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
 	}
-	if index["total"].(float64) >= indexcrit2 {
+	if index["count"].(float64) >= indexcrit2 {
 		quit(CRITICAL, fmt.Sprintf("Index Failure above Critical Limit!\nService is running\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
-			total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+			total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
 	}
 
 }
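
The URL construction from this diff can be isolated into a small helper, which makes the one-day window easy to see and to test. In this sketch `failuresCountURL` and `exampleNow` are hypothetical names, and trimming a trailing slash from the base URL is an addition that is not part of the diff.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// exampleNow is a fixed timestamp used so the example output is stable.
var exampleNow = time.Date(2018, time.August, 6, 10, 55, 30, 0, time.UTC)

// failuresCountURL builds the index failure count URL for the last day,
// using the same AddDate and Format calls as the diff.
func failuresCountURL(base string, now time.Time) string {
	since := now.AddDate(0, 0, -1).Format("2006-01-02")
	// Assumption: tolerate a trailing slash on the base URL.
	return strings.TrimRight(base, "/") + "/system/indexer/failures/count?since=" + since
}

func main() {
	fmt.Println(failuresCountURL("http://localhost:12900", exampleNow))
	// prints http://localhost:12900/system/indexer/failures/count?since=2018-08-05
}
```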
