appuio / nagios-plugins-openshift

Nagios/Icinga 2 Plugins for monitoring OpenShift clusters

License: BSD 3-Clause "New" or "Revised" License

Makefile 0.95% Python 63.23% Shell 35.82%

nagios nagios-plugins icinga icinga-plugins openshift openshift-monitoring monitoring monitoring-plugins

nagios-plugins-openshift's Introduction

Nagios/Icinga plugins for monitoring OpenShift 3.x

This package provides Nagios-compatible plugins to verify the operation of OpenShift clusters.

Prerequisites

Ubuntu 16.04 LTS
oc binary

The plugins are tested with Icinga 2.5 or newer, but they should also work with other consumers of Nagios-compatible plugins.

Getting started

Install the plugins from our Ubuntu PPA, or build the Debian packages from source.

List of plugins

Each plugin has a list of parameters available using the argument -h.

`check_hawkular_machine_timestamp`

Check whether the monitoring data in Hawkular has been updated recently.

`check_openshift_cert_expiry_report`

Check status of all certificates managed and reported on by OpenShift Ansible.

`check_openshift_es_stats`

Collect statistics from Elasticsearch instance (i.e. part of the aggregated logging system) with optional limits.

`check_openshift_node`

Check status of a node within a cluster.

`check_openshift_node_fluentd`

Check whether a Fluentd pod is running on every machine.

`check_openshift_node_list`

Check whether list of nodes in cluster matches passed list. Reports on unexpected and missing nodes.
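
The comparison this plugin describes amounts to two set differences. The following is only an illustrative sketch, not the plugin's code; the function name `compare_node_lists` and the node names are hypothetical.

```python
def compare_node_lists(expected, actual):
    """Return (missing, unexpected) node names as sorted lists.

    Hypothetical helper for illustration; not taken from the plugin.
    """
    expected_set = set(expected)
    actual_set = set(actual)
    # Nodes that were passed in but are absent from the cluster.
    missing = sorted(expected_set - actual_set)
    # Nodes present in the cluster that were not passed in.
    unexpected = sorted(actual_set - expected_set)
    return missing, unexpected


# Made-up node names, purely for demonstration.
missing, unexpected = compare_node_lists(
    ["node-01", "node-02", "node-03"],
    ["node-02", "node-03", "node-04"],
)
print("missing:", missing)
print("unexpected:", unexpected)
```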

`check_openshift_node_log_heartbeat`

Query Elasticsearch to determine whether node has recently submitted timestamped log message to logging component.

`check_openshift_node_resources`

Check whether node resources (CPU, memory) are within given limits.

`check_openshift_object_stats`

Compute statistics on a number of cluster objects and apply given limits.

`check_openshift_pod_count`

Check whether number of running pods for a given namespace and selector is equal to or larger than expected.

`check_openshift_pod_cpu_usage`

Retrieve and apply limits to CPU usage by pods. Requires the OpenShift metrics component.

`check_openshift_pod_memory`

Retrieve and apply limits to memory usage by pods. Requires the OpenShift metrics component.

`check_openshift_pod_node_alloc`

Check whether all pods matching given selector are running on disparate nodes.

`check_openshift_pod_status_count`

Retrieve metrics over the whole cluster for each recognized pod status, e.g. Running or CrashLoopBackOff.

`check_openshift_project_phase`

Check whether all projects are in a healthy status, i.e. active.

`check_openshift_pv_avail`

Apply limits to the number of available persistent volumes for a given selector or capacity.

`check_openshift_pvc_phase`

Check for unhealthy persistent volume claims.

Contributions

Every contribution is welcome, be it an issue or a pull request. We're happy to accept pull requests so long as they meet the existing code quality and design.

Fork repository (https://github.com/appuio/nagios-plugins-openshift/fork)
Create feature branch (git checkout -b my-new-feature)
Commit changes (git commit -av)
Push to branch (git push origin my-new-feature)
Create a pull request

nagios-plugins-openshift's Issues

Installation breaks the oc path

Trying to install using make install:

[root@HIDDEN nagios-plugins-openshift]# make install
mkdir -p /usr/lib/nagios-plugins-openshift
cp new-app-and-wait "/usr/lib/nagios-plugins-openshift/new-app-and-wait"
sed -r \
	-e 's#\b(OPENSHIFT_CLIENT_BINARY=)/usr/bin/oc\b#\1usr/lib/openshift-origin-client-tools/oc#' \
	< utils \
	> "/usr/lib/nagios-plugins-openshift/utils"
sed -r \
	-e 's#(^\. )/usr/lib(/nagios-plugins-openshift/utils)$#\1usr/lib\2#g' \
	< write-config \
	> "/usr/lib/nagios-plugins-openshift/write-config"
chmod +x /usr/lib/nagios-plugins-openshift/{utils,write-config,new-app-and-wait}
mkdir -p /usr/lib/nagios/plugins
set -e && for i in check_*; do \
	echo "Patching $i ..." >&2 && \
	sed -re 's#(^\. )/usr/lib(/nagios-plugins-openshift/utils)$#\1usr/lib\2#g' \
		< "$i" \
		> "/usr/lib/nagios/plugins/$(basename "$i")"; \
done
Patching check_hawkular_machine_timestamp ...
Patching check_openshift_cert_expiry_report ...
Patching check_openshift_endtoend_result ...
Patching check_openshift_es_stats ...
Patching check_openshift_node ...
Patching check_openshift_node_fluentd ...
Patching check_openshift_node_list ...
Patching check_openshift_node_resources ...
Patching check_openshift_pod_count ...
Patching check_openshift_pod_cpu_usage ...
Patching check_openshift_pod_memory ...
Patching check_openshift_pod_node_alloc ...
Patching check_openshift_pod_status_count ...
Patching check_openshift_project_phase ...
Patching check_openshift_project_pod_phase ...
Patching check_openshift_pv_avail ...
Patching check_openshift_pvc_phase ...
chmod +x /usr/lib/nagios/plugins/check_*
mkdir -p /usr/share/icinga2/include/plugins-contrib.d
cp -v openshift*.conf /usr/share/icinga2/include/plugins-contrib.d
'openshift.conf' -> '/usr/share/icinga2/include/plugins-contrib.d/openshift.conf'
'openshift-dns.conf' -> '/usr/share/icinga2/include/plugins-contrib.d/openshift-dns.conf'

But execution fails:

[root@HIDDEN nagios-plugins-openshift]# /usr/lib/nagios/plugins/check_openshift_node -f /path/to/kubeconfig -n ocp-node-01.domain.tld
/usr/lib/nagios/plugins/check_openshift_node: line 5: usr/lib/nagios-plugins-openshift/utils: No such file or directory
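
The error is consistent with the sed replacement shown in the make output, `\1usr/lib\2`, which has no leading slash. Here is a minimal sketch of that substitution using Python's `re` module. It assumes the plugin's source line is `. /usr/lib/nagios-plugins-openshift/utils`, which is inferred from the sed pattern rather than taken from the file itself. Why the replacement lost its slash (the Makefile is not shown here) is left open.

```python
import re

# Pattern copied from the make output above.
PATTERN = r"(^\. )/usr/lib(/nagios-plugins-openshift/utils)$"

# Assumed content of the line that sources the utils file.
source_line = ". /usr/lib/nagios-plugins-openshift/utils"

# Replacement as it appears in the make output: the path becomes relative.
broken = re.sub(PATTERN, r"\1usr/lib\2", source_line)

# The same replacement with a leading slash keeps the path absolute.
fixed = re.sub(PATTERN, r"\1/usr/lib\2", source_line)

print(broken)
print(fixed)
```

The first result is a relative path, so the shell resolves it against the current working directory, which matches the "No such file or directory" message.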

check_openshift_es_stats shows only one critical when several thresholds are exceeded

First of all, thank you for this Elasticsearch plugin. We'll use it for Elasticsearch in general, since it has exactly the functionality we're looking for.

We just noticed one thing (just a test in this case):

/usr/lib64/nagios/plugins/check_openshift_es_stats --endpoint http://127.0.0.1:9200 --jvm-heap-used-percent-critical 20 --total-jvm-heap-used-percent-critical 20 --fs-used-percent-critical 1

All three thresholds are exceeded, but the plugin reports only one as critical.

Example:

STATSQUERY CRITICAL - fs-used-percent-CtOHZDU is 66% (outside range 0:10) | 'fs-available-CtOHZDU'=13367MB;;;0;40188 'fs-available-percent-CtOHZDU'=34.0%;;;0;100 'fs-used-CtOHZDU'=26821MB;;;0;40188 'fs-used-percent-CtOHZDU'=66%;;10;0;100 'jvm-gc-collector-old-collection-count-CtOHZDU'=2;;;0 'jvm-gc-collector-old-collection-time-CtOHZDU'=0s;;;0 'jvm-gc-collector-old-collection-time-avg-CtOHZDU'=0.0445s;;;0 'jvm-gc-collector-old-collection-time-percent-CtOHZDU'=1e-05%;;;0;100 'jvm-gc-collector-young-collection-count-CtOHZDU'=2719;;;0 'jvm-gc-collector-young-collection-time-CtOHZDU'=15s;;;0 'jvm-gc-collector-young-collection-time-avg-CtOHZDU'=0.00567s;;;0 'jvm-gc-collector-young-collection-time-percent-CtOHZDU'=0.00159%;;;0;100 'jvm-heap-available-CtOHZDU'=427MB;;;0;990 'jvm-heap-available-percent-CtOHZDU'=44.0%;;;0;100 'jvm-heap-used-CtOHZDU'=563MB;;;0;990 'jvm-heap-used-percent-CtOHZDU'=56%;;20;0;100 'jvm-non-heap-used-CtOHZDU'=154MB;;;0 'node-fs-available-max'=13367MB;;;0 'node-fs-available-min'=13367MB;;;0 'node-fs-available-percent-max'=34.0%;;;0;100 'node-fs-available-percent-min'=34.0%;;;0;100 'node-fs-used-max'=26821MB;;;0 'node-fs-used-min'=26821MB;;;0 'node-fs-used-percent-max'=66%;;;0;100 'node-fs-used-percent-min'=66%;;;0;100 'node-jvm-heap-available-max'=427MB;;;0 'node-jvm-heap-available-min'=427MB;;;0 'node-jvm-heap-available-percent-max'=44.0%;;;0;100 'node-jvm-heap-available-percent-min'=44.0%;;;0;100 'node-jvm-heap-used-max'=563MB;;;0 'node-jvm-heap-used-min'=563MB;;;0 'node-jvm-heap-used-percent-max'=56%;;;0;100 'node-jvm-heap-used-percent-min'=56%;;;0;100 'process-cpu-percent-CtOHZDU'=0%;;;0 'total-fs-available'=13367MB;;;0;40188 'total-fs-available-percent'=33%;;;0;100 'total-fs-used'=26821MB;;;0;40188 'total-fs-used-percent'=66%;;;0;100 'total-jvm-heap-available'=427MB;;;0;990 'total-jvm-heap-available-percent'=43%;;;0;100 'total-jvm-heap-used'=563MB;;;0;990 'total-jvm-heap-used-percent'=56%;;20;0;100

It should also report the heap usage in this case, but it doesn't seem to.

Is this easy to solve? Or isn't this plugin meant to be used like this? :/ :)
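
The behaviour asked for here, listing every exceeded threshold in the status line, could be sketched as follows. This is not the plugin's implementation: `outside_range` and `summarize` are hypothetical names, the ranges are simplified to plain `low:high` pairs, and the `STATSQUERY OK` text is an assumption. The metric values and critical limits are taken from the performance data above.

```python
def outside_range(value, low, high):
    """True when value lies outside the inclusive range low:high."""
    return value < low or value > high


def summarize(checks):
    """Build one status line naming every violated critical range.

    checks is a list of (name, value, low, high) tuples.
    Hypothetical helper for illustration only.
    """
    violations = [
        "{} is {}% (outside range {}:{})".format(name, value, low, high)
        for name, value, low, high in checks
        if outside_range(value, low, high)
    ]
    if violations:
        return "STATSQUERY CRITICAL - " + "; ".join(violations)
    return "STATSQUERY OK"


# Values and critical limits as they appear in the performance data.
checks = [
    ("fs-used-percent-CtOHZDU", 66, 0, 10),
    ("jvm-heap-used-percent-CtOHZDU", 56, 0, 20),
    ("total-jvm-heap-used-percent", 56, 0, 20),
]
result = summarize(checks)
print(result)
```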

check_openshift_node_log_heartbeat not working with oc 3.11

Running
check_openshift_node_log_heartbeat --token-from token --endpoint https://endpoint.xyz nodename
against oc 3.11 does not seem to work; it always reports critical:
HEARTBEAT CRITICAL - Node(s) not reporting heartbeat or not contained in query result: nodename | doc_count_error_upper_bound=0 sum_other_doc_count=0;0

This may be because the date field in Elasticsearch does not match the format '%Y-%m-%dT%H:%M:%S.%f%z': it has an unexpected : in the time zone, for example '2019-12-09T15:05:38.647015+00:00'.
My OpenShift version is below:

oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://endpoint.xyz 
openshift v3.11.135
kubernetes v1.11.0+d4cacc0

Thanks
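
The reporter's hypothesis about the time zone colon can be checked in isolation. In Python, `strptime` accepts a colon in `%z` from version 3.7 onward; older versions, such as the 3.5 shipped with Ubuntu 16.04, reject it. The sketch below removes the colon before parsing so that it works on both. Whether the plugin actually parses timestamps this way is not confirmed here, and `parse_timestamp` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta, timezone

# Format string quoted in the issue.
FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_timestamp(value):
    """Parse a timestamp whose UTC offset may be written as +HH:MM.

    Hypothetical helper: rewrites a trailing "+HH:MM" to "+HHMM" so that
    strptime's %z also accepts it on Python versions before 3.7.
    """
    normalized = re.sub(r"([+-]\d{2}):(\d{2})$", r"\1\2", value)
    return datetime.strptime(normalized, FORMAT)


# Example value quoted in the issue.
parsed = parse_timestamp("2019-12-09T15:05:38.647015+00:00")
print(parsed.isoformat())
```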

How do I use these plugins? Can you guide me?

Please check these four commands with your team.

check_openshift_pod_cpu_usage - Need exact parameters
Retrieve and apply limits to CPU usage by pods. Requires the OpenShift metrics component.

check_openshift_pod_memory - Need exact parameters
Retrieve and apply limits to memory usage by pods. Requires the OpenShift metrics component.

check_openshift_es_stats - Need exact parameters
Collect statistics from Elasticsearch instance (i.e. part of the aggregated logging system) with optional limits.

check_openshift_node_log_heartbeat - Need exact parameters
Query Elasticsearch to determine whether node has recently submitted timestamped log message to logging component.

SSL Error

How would I get around the SSL errors or the need of a trusted SSL certificate?
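
The issue does not say which plugin or client raised the error, so the following is only a general Python sketch of the two usual options: trusting a specific CA bundle, or disabling verification for testing. It does not describe any flag of these plugins. `build_ssl_context` and its parameters are hypothetical names.

```python
import ssl


def build_ssl_context(cafile=None, insecure=False):
    """Return an SSLContext for HTTPS requests.

    cafile:   path to a PEM bundle containing the cluster's CA (assumed to
              be supplied by the caller); None uses the system trust store.
    insecure: skip certificate and hostname checks. Only for testing.
    """
    context = ssl.create_default_context(cafile=cafile)
    if insecure:
        # check_hostname must be disabled before verify_mode can be relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


default_context = build_ssl_context()
insecure_context = build_ssl_context(insecure=True)
print(default_context.verify_mode, insecure_context.verify_mode)
```

Such a context can be passed to `urllib.request.urlopen(url, context=...)`. Supplying the CA bundle is generally preferable to turning verification off.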
