appuio / nagios-plugins-openshift

Nagios/Icinga 2 Plugins for monitoring OpenShift clusters

License: BSD 3-Clause "New" or "Revised" License

Makefile 0.95% Python 63.23% Shell 35.82%
nagios nagios-plugins icinga icinga-plugins openshift openshift-monitoring monitoring monitoring-plugins

nagios-plugins-openshift's Introduction

Nagios/Icinga plugins for monitoring OpenShift 3.x

This package provides Nagios-compatible plugins to verify the operation of OpenShift clusters.

Prerequisites

  • Ubuntu 16.04 LTS
  • oc binary

The plugins are tested with Icinga 2.5 or newer, but they should also work with other consumers of Nagios-compatible plugins.

Getting started

Install the plugins from our Ubuntu PPA, or build the Debian packages from source.

List of plugins

Each plugin lists its available parameters when invoked with the -h argument.

check_hawkular_machine_timestamp

Check whether the monitoring data in Hawkular has been updated recently.

check_openshift_cert_expiry_report

Check status of all certificates managed and reported on by OpenShift Ansible.

check_openshift_es_stats

Collect statistics from an Elasticsearch instance (e.g. the one that is part of the aggregated logging system), with optional limits.

check_openshift_node

Check status of a node within a cluster.

check_openshift_node_fluentd

Check whether a Fluentd pod is running on every machine.

check_openshift_node_list

Check whether list of nodes in cluster matches passed list. Reports on unexpected and missing nodes.
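
As a rough illustration of the comparison described above, the following sketch computes missing and unexpected nodes with set differences. It is not the plugin's actual code; the function name compare_node_lists and the node names are made up for the example.

```python
def compare_node_lists(expected, actual):
    """Return (missing, unexpected) node names, each sorted."""
    expected_set, actual_set = set(expected), set(actual)
    # Nodes that were passed on the command line but are absent from the cluster.
    missing = sorted(expected_set - actual_set)
    # Nodes that the cluster reports but that were not passed.
    unexpected = sorted(actual_set - expected_set)
    return missing, unexpected


# Hypothetical node names, for illustration only.
missing, unexpected = compare_node_lists(
    ["node-01", "node-02", "node-03"],
    ["node-01", "node-03", "node-04"],
)
print("missing:", missing)
print("unexpected:", unexpected)
```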

check_openshift_node_log_heartbeat

Query Elasticsearch to determine whether node has recently submitted timestamped log message to logging component.

check_openshift_node_resources

Check whether node resources (CPU, memory) are within given limits.

check_openshift_object_stats

Compute statistics on a number of cluster objects and apply given limits.

check_openshift_pod_count

Check whether number of running pods for a given namespace and selector is equal to or larger than expected.

check_openshift_pod_cpu_usage

Retrieve and apply limits to CPU usage by pods. Requires the OpenShift metrics component.

check_openshift_pod_memory

Retrieve and apply limits to memory usage by pods. Requires the OpenShift metrics component.

check_openshift_pod_node_alloc

Check whether all pods matching given selector are running on disparate nodes.

check_openshift_pod_status_count

Retrieve metrics over whole cluster for each recognized pod status, e.g. Running or CrashLoopBackOff.

check_openshift_project_phase

Check whether all projects are in a healthy status, i.e. active.

check_openshift_pv_avail

Apply limits to number of available persistent volumes for given selector or capacity.

check_openshift_pvc_phase

Check for unhealthy persistent volume claims.

Contributions

Every contribution is very welcome, be it an issue or a pull request. We're happy to accept pull requests as long as they match the existing code quality and design.

  1. Fork repository (https://github.com/appuio/nagios-plugins-openshift/fork)
  2. Create feature branch (git checkout -b my-new-feature)
  3. Commit changes (git commit -av)
  4. Push to branch (git push origin my-new-feature)
  5. Create a pull request

nagios-plugins-openshift's People

Contributors

54nd20, hansmi, kallies, megian, mhutter, simu, srueg, tobru

nagios-plugins-openshift's Issues

Installation breaks oc path

Trying to install using make install:

[root@HIDDEN nagios-plugins-openshift]# make install
mkdir -p /usr/lib/nagios-plugins-openshift
cp new-app-and-wait "/usr/lib/nagios-plugins-openshift/new-app-and-wait"
sed -r \
	-e 's#\b(OPENSHIFT_CLIENT_BINARY=)/usr/bin/oc\b#\1usr/lib/openshift-origin-client-tools/oc#' \
	< utils \
	> "/usr/lib/nagios-plugins-openshift/utils"
sed -r \
	-e 's#(^\. )/usr/lib(/nagios-plugins-openshift/utils)$#\1usr/lib\2#g' \
	< write-config \
	> "/usr/lib/nagios-plugins-openshift/write-config"
chmod +x /usr/lib/nagios-plugins-openshift/{utils,write-config,new-app-and-wait}
mkdir -p /usr/lib/nagios/plugins
set -e && for i in check_*; do \
	echo "Patching $i ..." >&2 && \
	sed -re 's#(^\. )/usr/lib(/nagios-plugins-openshift/utils)$#\1usr/lib\2#g' \
		< "$i" \
		> "/usr/lib/nagios/plugins/$(basename "$i")"; \
done
Patching check_hawkular_machine_timestamp ...
Patching check_openshift_cert_expiry_report ...
Patching check_openshift_endtoend_result ...
Patching check_openshift_es_stats ...
Patching check_openshift_node ...
Patching check_openshift_node_fluentd ...
Patching check_openshift_node_list ...
Patching check_openshift_node_resources ...
Patching check_openshift_pod_count ...
Patching check_openshift_pod_cpu_usage ...
Patching check_openshift_pod_memory ...
Patching check_openshift_pod_node_alloc ...
Patching check_openshift_pod_status_count ...
Patching check_openshift_project_phase ...
Patching check_openshift_project_pod_phase ...
Patching check_openshift_pv_avail ...
Patching check_openshift_pvc_phase ...
chmod +x /usr/lib/nagios/plugins/check_*
mkdir -p /usr/share/icinga2/include/plugins-contrib.d
cp -v openshift*.conf /usr/share/icinga2/include/plugins-contrib.d
'openshift.conf' -> '/usr/share/icinga2/include/plugins-contrib.d/openshift.conf'
'openshift-dns.conf' -> '/usr/share/icinga2/include/plugins-contrib.d/openshift-dns.conf'

But execution fails:

[root@HIDDEN nagios-plugins-openshift]# /usr/lib/nagios/plugins/check_openshift_node -f /path/to/kubeconfig -n ocp-node-01.domain.tld
/usr/lib/nagios/plugins/check_openshift_node: line 5: usr/lib/nagios-plugins-openshift/utils: No such file or directory
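
The make output above shows sed replacements of the form \1usr/lib\2, without a leading slash, which would explain the relative path in the error message. The sketch below replays that substitution with Python's re module on the line the pattern is evidently meant to match. Why the slash is missing (for example, how the Makefile variables were set) is not shown in the report, so the "fixed" replacement is only an assumption about the intended result.

```python
import re

# Pattern copied from the sed command in the make output.
PATTERN = r"(^\. )/usr/lib(/nagios-plugins-openshift/utils)$"
source_line = ". /usr/lib/nagios-plugins-openshift/utils"

# Replacement as it appears in the make output: no leading slash.
broken = re.sub(PATTERN, r"\1usr/lib\2", source_line)

# Hypothetical replacement that keeps the path absolute.
fixed = re.sub(PATTERN, r"\1/usr/lib\2", source_line)

print(broken)
print(fixed)
```

The first result sources a path relative to the current working directory, which matches the "No such file or directory" error when the plugin is run from the repository checkout.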

With multiple critical values, only 1 is shown by check_openshift_es_stats

First of all, thank you for this plugin for Elasticsearch. We'll use it for Elasticsearch in general, since it has exactly the functionality we're looking for.

We just noticed 1 thing (just a test in this case):

/usr/lib64/nagios/plugins/check_openshift_es_stats --endpoint http://127.0.0.1:9200 --jvm-heap-used-percent-critical 20 --total-jvm-heap-used-percent-critical 20 --fs-used-percent-critical 1

All three are critical, but the plugin reports only 1 as critical.

Example:

STATSQUERY CRITICAL - fs-used-percent-CtOHZDU is 66% (outside range 0:10) | 'fs-available-CtOHZDU'=13367MB;;;0;40188 'fs-available-percent-CtOHZDU'=34.0%;;;0;100 'fs-used-CtOHZDU'=26821MB;;;0;40188 'fs-used-percent-CtOHZDU'=66%;;10;0;100 'jvm-gc-collector-old-collection-count-CtOHZDU'=2;;;0 'jvm-gc-collector-old-collection-time-CtOHZDU'=0s;;;0 'jvm-gc-collector-old-collection-time-avg-CtOHZDU'=0.0445s;;;0 'jvm-gc-collector-old-collection-time-percent-CtOHZDU'=1e-05%;;;0;100 'jvm-gc-collector-young-collection-count-CtOHZDU'=2719;;;0 'jvm-gc-collector-young-collection-time-CtOHZDU'=15s;;;0 'jvm-gc-collector-young-collection-time-avg-CtOHZDU'=0.00567s;;;0 'jvm-gc-collector-young-collection-time-percent-CtOHZDU'=0.00159%;;;0;100 'jvm-heap-available-CtOHZDU'=427MB;;;0;990 'jvm-heap-available-percent-CtOHZDU'=44.0%;;;0;100 'jvm-heap-used-CtOHZDU'=563MB;;;0;990 'jvm-heap-used-percent-CtOHZDU'=56%;;20;0;100 'jvm-non-heap-used-CtOHZDU'=154MB;;;0 'node-fs-available-max'=13367MB;;;0 'node-fs-available-min'=13367MB;;;0 'node-fs-available-percent-max'=34.0%;;;0;100 'node-fs-available-percent-min'=34.0%;;;0;100 'node-fs-used-max'=26821MB;;;0 'node-fs-used-min'=26821MB;;;0 'node-fs-used-percent-max'=66%;;;0;100 'node-fs-used-percent-min'=66%;;;0;100 'node-jvm-heap-available-max'=427MB;;;0 'node-jvm-heap-available-min'=427MB;;;0 'node-jvm-heap-available-percent-max'=44.0%;;;0;100 'node-jvm-heap-available-percent-min'=44.0%;;;0;100 'node-jvm-heap-used-max'=563MB;;;0 'node-jvm-heap-used-min'=563MB;;;0 'node-jvm-heap-used-percent-max'=56%;;;0;100 'node-jvm-heap-used-percent-min'=56%;;;0;100 'process-cpu-percent-CtOHZDU'=0%;;;0 'total-fs-available'=13367MB;;;0;40188 'total-fs-available-percent'=33%;;;0;100 'total-fs-used'=26821MB;;;0;40188 'total-fs-used-percent'=66%;;;0;100 'total-jvm-heap-available'=427MB;;;0;990 'total-jvm-heap-available-percent'=43%;;;0;100 'total-jvm-heap-used'=563MB;;;0;990 'total-jvm-heap-used-percent'=56%;;20;0;100

It should also notify about the heap in this case, but it doesn't seem to.

Probably easy to solve? Or isn't this plugin made to be used like this? :/ :)
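
To illustrate the behaviour asked for here, the following sketch scans Nagios-style performance data and collects every metric that exceeds its critical threshold, instead of stopping at the first one. It is not the plugin's code: the function name critical_violations is invented, the sample is a shortened excerpt of the output above, and only plain numeric thresholds (treated as the range 0:N) are handled.

```python
import re

# One perfdata item: 'label'=value[unit];warn;crit (further fields are ignored).
PERFDATA_RE = re.compile(
    r"'(?P<label>[^']+)'="
    r"(?P<value>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)(?P<unit>[a-zA-Z%]*)"
    r";(?P<warn>[^; ]*);(?P<crit>[^; ]*)"
)


def critical_violations(perfdata):
    """Return one message per metric outside its critical range 0:N."""
    messages = []
    for match in PERFDATA_RE.finditer(perfdata):
        crit = match.group("crit")
        if not crit:
            continue  # no critical threshold configured for this metric
        value = float(match.group("value"))
        if value < 0 or value > float(crit):
            messages.append(
                f"{match.group('label')} is {match.group('value')}"
                f"{match.group('unit')} (outside range 0:{crit})"
            )
    return messages


# Shortened excerpt of the performance data from the report.
SAMPLE = (
    "'fs-used-percent-CtOHZDU'=66%;;10;0;100 "
    "'jvm-heap-used-percent-CtOHZDU'=56%;;20;0;100 "
    "'jvm-non-heap-used-CtOHZDU'=154MB;;;0 "
    "'total-jvm-heap-used-percent'=56%;;20;0;100"
)

violations = critical_violations(SAMPLE)
print("STATSQUERY CRITICAL - " + "; ".join(violations))
```

With this excerpt, all three metrics that carry a critical threshold end up in the status line, while the metric without a threshold is skipped.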

check_openshift_node_log_heartbeat not working with oc 3.11

Hello,
Running
check_openshift_node_log_heartbeat --token-from token --endpoint https://endpoint.xyz nodename
does not seem to work with oc 3.11; it always reports critical:
HEARTBEAT CRITICAL - Node(s) not reporting heartbeat or not contained in query result: nodename | doc_count_error_upper_bound=0 sum_other_doc_count=0;0

This may be because the date field in Elasticsearch is not in the format '%Y-%m-%dT%H:%M:%S.%f%z': the time zone contains an unexpected ':', for example '2019-12-09T15:05:38.647015+00:00'.
My OpenShift version is below:

oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://endpoint.xyz 
openshift v3.11.135
kubernetes v1.11.0+d4cacc0

Thanks
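
The format mismatch suspected in this report can be worked around as sketched below: the colon is removed from a trailing UTC offset before the timestamp is parsed with the format string quoted above. Whether this is the actual cause of the failing check is not confirmed by the report, and the helper name parse_es_timestamp is made up for this example. Python 3.7 and newer accept a colon in %z directly, so the normalization mainly matters on older interpreters.

```python
import re
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_es_timestamp(text):
    """Parse a timestamp whose UTC offset may be written as +HH:MM."""
    # Turn a trailing "+HH:MM" or "-HH:MM" into "+HHMM" / "-HHMM".
    normalized = re.sub(r"([+-]\d{2}):(\d{2})$", r"\1\2", text)
    return datetime.strptime(normalized, TIMESTAMP_FORMAT)


# Example value taken from the report.
ts = parse_es_timestamp("2019-12-09T15:05:38.647015+00:00")
print(ts.isoformat())
```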

How to use these plugins? Can you guide me with this?

Please check these 4 commands with your team.

check_openshift_pod_cpu_usage - Need exact parameters
Retrieve and apply limits to CPU usage by pods. Requires the OpenShift metrics component.

check_openshift_pod_memory - Need exact parameters
Retrieve and apply limits to memory usage by pods. Requires the OpenShift metrics component.

check_openshift_es_stats - Need exact parameters
Collect statistics from Elasticsearch instance (i.e. part of the aggregated logging system) with optional limits.

check_openshift_node_log_heartbeat - Need exact parameters
Query Elasticsearch to determine whether node has recently submitted timestamped log message to logging component.

SSL Error

How would I get around the SSL errors or the need of a trusted SSL certificate?
