
catinello / nagios-check-graylog2


Nagios check that verifies the availability of the Graylog2 service via its REST API.

License: BSD 2-Clause "Simplified" License

Go 100.00%
nagios graylog2 monitoring

nagios-check-graylog2's Issues

Suggestion for errors count

If your server generates a lot of error messages, the current polling can be extremely slow. I changed the API call from /system/indexer/failures to /system/indexer/failures/count?since=date, and the check time dropped from 2 minutes to 500 ms. Here is my diff:

--- main.go	2018-03-16 06:12:54.000000000 -0500
+++ /home/schmit/main.go	2018-08-06 10:55:30.400785192 -0500
@@ -141,7 +141,7 @@
 		quit(WARNING, fmt.Sprintf("lb_status: %v", system["lb_status"].(string)), nil)
 	}
 
-	index := query(c+"/system/indexer/failures", *user, *pass)
+	index := query(c+"/system/indexer/failures/count?since="+time.Now().AddDate(0, 0, -1).Format("2006-01-02"), *user, *pass)
 	tput := query(c+"/system/throughput", *user, *pass)
 	inputs := query(c+"/system/inputs", *user, *pass)
 	total := query(c+"/count/total", *user, *pass)
@@ -149,12 +149,12 @@
 	elapsed := time.Since(start)
 
 	// generate performance data output
-	perf(elapsed.Seconds(), total["events"].(float64), inputs["total"].(float64), tput["throughput"].(float64), index["total"].(float64))
+	perf(elapsed.Seconds(), total["events"].(float64), inputs["total"].(float64), tput["throughput"].(float64), index["count"].(float64))
 
 	// fix for backwards compatiblity if no index error threshold is set
 	if len(*indexwarn) == 0 || len(*indexcrit) == 0 {
 		quit(OK, fmt.Sprintf("Service is running!\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
-			total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+			total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
 	}
 
 	// convert indexwarn and indexcrit strings to float64 variables for comparison below
@@ -168,17 +168,17 @@
 	}
 
 	// handle index thresholds
-	if index["total"].(float64) < indexwarn2 && index["total"].(float64) < indexcrit2 {
+	if index["count"].(float64) < indexwarn2 && index["count"].(float64) < indexcrit2 {
 		quit(OK, fmt.Sprintf("Service is running!\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
-			total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+			total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
 	}
-	if index["total"].(float64) >= indexwarn2 && index["total"].(float64) < indexcrit2 {
+	if index["count"].(float64) >= indexwarn2 && index["count"].(float64) < indexcrit2 {
 		quit(WARNING, fmt.Sprintf("Index Failure above Warning Limit!\nService is running\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
-			total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+			total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
 	}
-	if index["total"].(float64) >= indexcrit2 {
+	if index["count"].(float64) >= indexcrit2 {
 		quit(CRITICAL, fmt.Sprintf("Index Failure above Critical Limit!\nService is running\n%.f total events processed\n%.f index failures\n%.f throughput\n%.f sources\nCheck took %v\n",
-			total["events"].(float64), index["total"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
+			total["events"].(float64), index["count"].(float64), tput["throughput"].(float64), inputs["total"].(float64), elapsed), nil)
 	}
 
 }

Authentication issue

Hello,

First of all, thanks a lot for this great plugin.

Here is my issue: I can't authenticate using a non-admin user. I created a limited Graylog user (role: reader), but when I run check_graylog2 I get a 403 error (with an admin user, it works).

[root@localhost cacti]# ./check_graylog2 -l http://192.168.1.2:12900/ -insecure -u limited_user -p big_password

CRITICAL - Graylog2 API replied with HTTP code 403|time=0.000000;;;; total=0;;;; sources=0;;;; throughput=0;;;; index_failures=0;;;;

Is there any way to use a non-admin user? Since the password is used and stored in clear text, I would prefer not to use an admin user.

Also, can you tell me which Graylog metrics you use to check Graylog health? I found some at https://www.graylog.org/blog/86-back-to-basics-monitoring-graylog, but I couldn't find how to check whether the service is running or how to check the number of sources.

Thanks for your help !

Trouble executing in librenms

I have successfully pulled down the script and installed its dependencies. I am able to run the script from the command line; however, importing it into LibreNMS causes a service error. Usually this is caused by either wrong arguments or firewall issues.

I copied the command-line arguments, and I am able to reach the Graylog instance with the script.

Any other recommendations to check the error?

On Ubuntu 16.04.2 LTS

UNKNOWN - Can not parse JSON from Graylog2 API|time=0.000000;;;; total=0;;;; sources=0;;;; throughput=0;;;; index_failures=0;;;;

I am running the latest Graylog, and the API works fine in a browser (tested with Chrome).

/usr/local/nagios/libexec/check_graylog2 -l http://192.168.1.95:9000/api/ -u admin -p admin
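
One thing worth ruling out with a command like this is the trailing slash. The diff in the first issue on this page shows request URLs built as c+"/system/...". If c is the -l value used unchanged (an assumption, not checked against the plugin source), a base ending in "/" produces a double slash such as /api//system/throughput, which some servers do not treat the same as a single slash. A minimal sketch of a join that avoids this, with joinURL as a made-up helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// joinURL joins base and path with exactly one slash between them,
// regardless of whether either side already carries one.
func joinURL(base, path string) string {
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(path, "/")
}

func main() {
	// Both calls print http://192.168.1.95:9000/api/system/throughput
	fmt.Println(joinURL("http://192.168.1.95:9000/api/", "/system/throughput"))
	fmt.Println(joinURL("http://192.168.1.95:9000/api", "system/throughput"))
}
```

A quick manual test is to rerun the check with the trailing slash removed from the -l value.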

Include additional performance data

It would be great to have the following counters included in your plugin as performance data, with their own Warning/Critical thresholds like the ones for index errors:

  • unprocessed messages (in case Graylog cannot write to ES, or has performance problems due to bad pipelines, grok patterns, etc.)
  • journal size in percent (same reason as above)
  • Input Buffer, Process Buffer, Output Buffer

Of course, all of this could be optional, but for larger installations tracking each metric would be very useful.
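
As a rough sketch of what such output could look like, the snippet below formats extra metrics in the standard Nagios perfdata layout, label=value[UOM];warn;crit;min;max, which is the same shape as the time=0.000000;;;; fields the plugin prints today. The metric type, the label names, and all values and thresholds are made up for illustration; nothing here reflects how the plugin is actually implemented or which Graylog endpoints would supply the numbers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// metric is a hypothetical container for one perfdata entry.
type metric struct {
	label      string
	value      float64
	uom        string // unit of measurement, e.g. "%"; empty for a plain number
	warn, crit string // thresholds as given on the command line; empty means unset
}

// String renders the entry as label=value[UOM];warn;crit;min;max,
// leaving min and max empty.
func (m metric) String() string {
	// 'f' with precision -1 gives the shortest decimal form without an exponent.
	v := strconv.FormatFloat(m.value, 'f', -1, 64)
	return fmt.Sprintf("%s=%s%s;%s;%s;;", m.label, v, m.uom, m.warn, m.crit)
}

// perfdata joins all entries with single spaces, as Nagios expects.
func perfdata(ms []metric) string {
	parts := make([]string, len(ms))
	for i, m := range ms {
		parts[i] = m.String()
	}
	return strings.Join(parts, " ")
}

func main() {
	// Sample values only, not read from a Graylog server.
	fmt.Println(perfdata([]metric{
		{label: "unprocessed_messages", value: 0, warn: "100", crit: "1000"},
		{label: "journal_utilization", value: 12.5, uom: "%", warn: "80", crit: "90"},
	}))
}
```

Keeping warn and crit per metric lets each counter have its own limits while leaving them empty when no threshold is configured.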

UNKNOWN - Cannot parse JSON from Graylog2 API error

Hi,

I am running Graylog 2.1.2 on CentOS 7 64-bit and Icinga2 on CentOS 6 64-bit. When I try to check the performance of Graylog2 through Icinga, I get the "UNKNOWN - Cannot parse JSON from Graylog2 API" error:

./check_graylog2 -l https://syslogsrv-ndi.example.com:9000/ -u admin -p server -insecure

UNKNOWN - Can not parse JSON from Graylog2 API|time=0.000000;;;; total=0;;;; sources=0;;;; throughput=0;;;; index_failures=0;;;;

What could be the issue? Is a package missing, for example?

Thanks & Regards
Ankush Grover
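
A parse error like this does not necessarily point to a missing package. One possible cause (an assumption, since the issue does not include the raw reply) is that the URL given to -l answers with something other than JSON, for example an HTML page from the web interface or an error status. The sketch below shows the kind of check that tells these cases apart; describeReply is a made-up helper and the sample replies in main are invented.

```go
package main

import (
	"encoding/json"
	"fmt"
	"mime"
)

// describeReply explains why a reply cannot be used as JSON,
// or returns an empty string if it looks usable.
func describeReply(status int, contentType string, body []byte) string {
	// Strip parameters such as "; charset=utf-8" from the header value.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		mediaType = contentType
	}
	switch {
	case status != 200:
		return fmt.Sprintf("unexpected HTTP status %d", status)
	case mediaType != "application/json":
		return fmt.Sprintf("unexpected content type %q", mediaType)
	case !json.Valid(body):
		return "body is not valid JSON"
	}
	return ""
}

func main() {
	// Invented sample replies for illustration.
	fmt.Println(describeReply(200, "text/html; charset=utf-8", []byte("<!DOCTYPE html>")))
	fmt.Println(describeReply(200, "application/json", []byte(`{"throughput": 0}`)))
}
```

Fetching the same URL with a command-line HTTP client and looking at the status line and Content-Type header gives the same information without writing any code.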

Nagios Graphs

Is there a way of generating graphs of "index failures"?
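
The plugin output quoted in the issues on this page already carries an index_failures field after the "|" separator, which is the performance data that Nagios graphing add-ons normally read. For anyone who wants to feed that value into their own tooling, here is a minimal sketch that extracts the numbers from one output line. parsePerfdata is a made-up helper, the sample line is invented in the same shape as the quoted output, and values with a unit suffix are skipped because the plugin's fields have none.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePerfdata returns the numeric value of each label found after
// the "|" separator of a plugin output line.
func parsePerfdata(output string) map[string]float64 {
	values := make(map[string]float64)
	_, perf, found := strings.Cut(output, "|")
	if !found {
		return values
	}
	for _, field := range strings.Fields(perf) {
		label, rest, ok := strings.Cut(field, "=")
		if !ok {
			continue
		}
		raw, _, _ := strings.Cut(rest, ";") // drop warn;crit;min;max
		if v, err := strconv.ParseFloat(raw, 64); err == nil {
			values[label] = v
		}
	}
	return values
}

func main() {
	// Invented sample line in the same shape as the plugin output quoted above.
	line := "OK - Service is running!|time=0.5;;;; total=1200;;;; index_failures=7;;;;"
	fmt.Println(parsePerfdata(line)["index_failures"])
}
```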
