aswen / nagios-plugins Goto Github PK

32.0 32.0 51.0 89 KB

Scripts and plugins for Nagios

Shell 100.00%

nagios-plugins's Issues

last_run_report.yaml is not readable by the check_puppet_agent script

There is apparently "secure" info in last_run_report so puppetlabs have made the file not world readable. That means that it is not possible for the script to read that file.

eval is evil.

If somebody used quotes (or backticks) in a message, eval would let them execute malicious code, on purpose or by accident.

For example, if backticks are used, you would be able to reboot a server.
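
The risk described above can be illustrated with a small sketch. The message text and the helper name print_message are hypothetical and are not taken from check_puppet_agent; the point is only that data printed with printf is never re-parsed by the shell, while data passed to eval is.

```shell
#!/bin/sh
# Hypothetical message, as it might be read from a report file.
# The backticks are literal text here because of the single quotes.
msg='change failed: `touch /tmp/eval_demo_marker`'

# Unsafe: eval re-parses the string, so the backticks would run as a command.
# eval "echo $msg"      # left commented out on purpose

# Safer: print the data without ever handing it to the shell parser.
print_message() {
  printf '%s\n' "$1"
}

print_message "$msg"
```

With this approach the backticks stay part of the printed text and nothing inside them is executed.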

Multi-line Error Messages get Cut Off

When an error message is "short enough" to be picked up by check_puppet_agent but spans multiple lines, the script prints all of those lines, and Nagios cuts the output off.

Output: CRITICAL: Last run had 1 or more errors. Check the logs. FIRST_ERROR ('change from purged to present failed: Execution of ''/bin/yum -d 0
-e 0 -y list '' returned 1: Error: No matching Packages
to list')

Expected: CRITICAL: Last run had 1 or more errors. Check the logs. FIRST_ERROR ('change from purged to present failed: Execution of ''/bin/yum -d 0 -e 0 -y list '' returned 1: Error: No matching Packages to list')
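
One possible way to get the expected single-line output is to collapse newlines before printing. This is a minimal sketch, not the plugin's actual code: flatten_message is a hypothetical helper name and the sample error text is shortened from the report above.

```shell
#!/bin/sh
# flatten_message is a hypothetical helper: it joins a multi-line message
# into one line so that Nagios can show all of it.
flatten_message() {
  # tr turns newlines into spaces; sed squeezes repeated spaces
  # and trims a trailing one.
  printf '%s' "$1" | tr '\n' ' ' | sed -e 's/  */ /g' -e 's/ $//'
}

# Sample multi-line error, modelled on the output shown above.
first_error="Execution of '/bin/yum -d 0
-e 0 -y list' returned 1: Error: No matching Packages
to list"

echo "CRITICAL: Last run had 1 or more errors. FIRST_ERROR ('$(flatten_message "$first_error")')"
```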

Not working through nrpe

It's not working for me for some reason. I can execute the script fine on the client:
[root@dev003 ~]# /usr/lib64/nagios/plugins/check_puppet_agent
OK: Puppet agent "3.3.0" running catalogversion 1423524659, and executed at Mon 09 Feb 2015 04:24:53 PM PST for last time

But it won't work from the server using nrpe:

/usr/lib64/nagios/plugins/check_nrpe -H dev003 -c check_puppet_agent

CRITICAL: Last run was 1423529323 seconds ago. crit is 7200

It looks like it's using puppet version as the last run time. Both systems are CentOS-6

Ubuntu 18.04/16.04 and RHEL5 support broken

As of commit 0dcd691, associative arrays (available as of Bash version 4) are now a requirement for check_puppet_agent. That means Ubuntu 16.04/18.04 LTS, which link /bin/sh to /bin/dash, no longer function due to Dash not having associative array support. RHEL5 ships with Bash 3 which doesn't have associative array support so that broke too.

One option is to revert 0dcd691. The commit doesn't add functionality from what I can tell, only an alternate style to accomplish the same thing.

Another option, and this fix is working for my Ubuntu 16.04/18.04 fleet, is to change the first #!/bin/sh to #!/bin/bash (assuming Bash is installed of course). This doesn't reconcile the Bash 4 requirement that older OS's such as RHEL5 don't have, though.

If it helps, I could submit a PR to change to /bin/bash and maybe add in a Bash version check or something. But I feel the best choice is to revert 0dcd691 if supporting older Bash versions is a priority.

-Paul
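
The Bash version check mentioned above could look roughly like this. It is a sketch under assumptions: require_bash4 is a hypothetical helper name, and exit status 3 is used because that is the conventional Nagios UNKNOWN status.

```shell
#!/bin/bash
# require_bash4 is a hypothetical helper: associative arrays need Bash 4+.
# $1 is the major version of the running shell.
require_bash4() {
  if [ "${1:-0}" -ge 4 ]; then
    return 0
  fi
  echo "UNKNOWN: Bash 4 or newer is required (found major version ${1:-0})"
  return 3
}

# BASH_VERSINFO is unset in shells such as dash, so default to 0.
# In the plugin this line would end with "|| exit 3" instead of the echo.
require_bash4 "${BASH_VERSINFO:-0}" || echo "would exit with status 3"
```

On RHEL5 (Bash 3) or when started through dash, the check prints a clear UNKNOWN message instead of failing later with a syntax error.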

sudoers requested are not matching sudoers required

The plugin runs one of:

nagios-plugins/check_puppet_agent

Line 224 in 2455f72

puppet_config_print="sudo $PUPPET config print all"

nagios-plugins/check_puppet_agent

Line 227 in 2455f72

puppet_config_print="sudo $PUPPET config print"

nagios-plugins/check_puppet_agent

Line 230 in 2455f72

puppet_config_print="sudo $PUPPET config print --section agent"

as executed by:

nagios-plugins/check_puppet_agent

Line 234 in 2455f72

puppet_config_output="$($puppet_config_print)"

While the sudoers entry requested

nagios-plugins/check_puppet_agent

Line 41 in 2455f72

 # Cmnd_Alias PUPPETCHECK=/usr/bin/puppet config print --section agent runinterval,\ 

does not include this command.

The result via NRPE looks like this:
UNKNOWN: Internal error: Puppet version unknown from Error: Could not initialize global default settings: Permission denied @ dir_s_mkdir - /root/.puppetlabs
As the requested sudoers entries are much more narrowly tailored, I would suggest considering the least-privilege model before deciding whether to:

change the sudoers entries requested
change the code to avoid the call

Removed sudo from line 126, now it's working on CentOS 6.6

Hi,
I have removed the sudo from line 126: https://github.com/aswen/nagios-plugins/blob/master/check_puppet_agent#L126

My version:
[ -z "${lastrunfile}" ] && lastrunfile=$(/usr/bin/puppet config print lastrunfile)

With sudo this command didn't work and $lastrunfile was empty. Now that I removed it the plugin works.

Socket timeout after 10 seconds

Hi. I added your check_puppet_agent module to /usr/local/nagios/libexec, and gave it 755 perms with nagios.nagios ownership.

When I run this command:

/usr/local/nagios/libexec/j1n_check_puppet_agent -w 1800 -c 3600

I get expected output, but when I run the following command I keep getting socket timeouts:

/usr/local/nagios/libexec/check_nrpe -H service-a-2.sn1.vpc1.j1n.us -c check_local_puppet_agent

Here is the excerpt from /usr/local/nagios/etc/nrpe.cfg:

command[check_local_puppet_agent]=/usr/bin/sudo /usr/local/nagios/libexec/j1n_check_puppet_agent -w 1800 -c 3600

And here is the sudo entry in /etc/sudoers:

Cmnd_Alias PUPPETCHECK=/usr/bin/puppet config print runinterval,\
                       /usr/bin/puppet config print splay,\
                       /usr/bin/puppet config print splaylimit,\
                       /usr/bin/puppet config print agent_disabled_lockfile,\
                       /usr/bin/puppet config print lastrunfile,\
                       /usr/bin/puppet config print pidfile, \
                       /usr/local/nagios/libexec/j1n_check_puppet_agent
Defaults:nagios !requiretty
nagios ALL = NOPASSWD: PUPPETCHECK

What am I missing?

Could not initialize global default settings: Permission denied @ dir_s_mkdir

Check_puppet_agent can only be run as root on my install.

str@puppetpal:/usr/lib/nagios/plugins$ sudo -u nagios ./check_puppet_agent
Error: Could not initialize global default settings: Permission denied @ dir_s_mkdir - /home/str/.puppetlabs
UNKNOWN: Internal error: Puppet version unknown from Error: Could not initialize global default settings: Permission denied @ dir_s_mkdir - /home/str/.puppetlabs

cat /etc/sudoers

User_Alias NAGIOS=nagios
Cmnd_Alias PUPPETCHECK=/usr/bin/puppet config print runinterval,\
                       /usr/bin/puppet config print splay,\
                       /usr/bin/puppet config print splaylimit,\
                       /usr/bin/puppet config print agent_disabled_lockfile,\
                       /usr/bin/puppet config print lastrunfile,\
                       /usr/bin/puppet config print lastrunreport,\
                       /usr/bin/puppet config print pidfile
NAGIOS     ALL=NOPASSWD:PUPPETCHECK

Execution speed improvement for check_puppet_agent

Hi,
Right now the script spends most of its time running 'puppet config print' 4 times. That is inefficient and gives the script a noticeable runtime. It's not a big deal, but when a monitoring system runs many small scripts it's better for them to run efficiently, so that the total execution time and load from monitoring are as small as possible. It would be nice to request all needed options with a single 'puppet config print' run.
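
A rough sketch of the single-call idea follows. It assumes that 'puppet config print' with several setting names prints one "name = value" line per setting; that format should be checked against the installed Puppet version. get_setting is a hypothetical helper, and the sample values are made up so the sketch runs without Puppet.

```shell
#!/bin/sh
# get_setting is a hypothetical helper: pick one value out of "name = value" lines.
# $1 = setting name, $2 = captured output of the single puppet call.
get_setting() {
  printf '%s\n' "$2" | sed -n "s/^$1 = //p"
}

# In the plugin this would be a single call, for example:
#   config_output=$(puppet config print runinterval lastrunfile pidfile)
# Sample output in the assumed format (values are illustrative):
config_output='runinterval = 1800
lastrunfile = /var/lib/puppet/state/last_run_summary.yaml
pidfile = /var/run/puppet/agent.pid'

runinterval=$(get_setting runinterval "$config_output")
lastrunfile=$(get_setting lastrunfile "$config_output")
echo "runinterval=$runinterval lastrunfile=$lastrunfile"
```

The Puppet startup cost is then paid once, and each setting is extracted from the captured text.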

reporting "OK" when puppet run failed

I have a consistent issue with the script on Debian 8 with Puppet 4.8.2.

A failed puppet agent run is still being reported as OK by this script. It does report the correct catalog version.

Perhaps the last_run_report.yaml format has changed? I notice the script is looking for "status: failure", but in the report the line is "status: unchanged".

last_run_summary.yaml when agent can't retrieve catalog from remote server

Hopefully I'm not jumping the gun here. I will probably do some additional digging, but wanted to report what I'm seeing.

I'm using puppet agent 4.9.2

The plugin is able to parse last_run_summary.yaml if my puppet run is a success, but consider the case where a puppet run results in the following.

"Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Resource Statement, Could not find declared class...."

In that case my last_run_summary.yaml looks something like this.

version:
  config:
  puppet: 4.9.2
time:
  last_run: 1487266691

This means that around lines 312-322 of the plugin it fails to parse last_run_summary.yaml resulting in a "UNKNOWN: last_run_summary.yaml not found, not readable or incomplete".

It seems that if a catalog can't be retrieved from the puppet master, the resulting last_run_summary.yaml isn't valid for the purposes of this nagios plugin. I'm not sure how unique this case is, but I think it would be preferable to identify this case and report that the agent was unable to retrieve its catalog on the last puppet run, instead of UNKNOWN.

You should be able to replicate this case by removing a module from your puppet environment that contains classes used in your manifests.
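
One way to tell this case apart from a missing or unreadable file is sketched below. It rests on an assumption: a complete summary contains a top-level "resources:" section, while a run that could not retrieve its catalog only has the version and time sections shown above. summary_state is a hypothetical helper, not part of the plugin.

```shell
#!/bin/sh
# summary_state is a hypothetical helper that classifies a summary file.
summary_state() {
  if [ ! -r "$1" ] || [ ! -s "$1" ]; then
    echo "unreadable"
  elif grep -q '^resources:' "$1"; then
    echo "complete"
  elif grep -q 'last_run:' "$1"; then
    # Only version/time present: assumed to mean no catalog was applied.
    echo "no_catalog"
  else
    echo "incomplete"
  fi
}

# Sample file modelled on the summary shown in this report.
sample=$(mktemp)
printf 'version:\n  config:\n  puppet: 4.9.2\ntime:\n  last_run: 1487266691\n' > "$sample"

case $(summary_state "$sample") in
  no_catalog) echo "CRITICAL: last run did not retrieve a catalog" ;;
  complete)   echo "summary looks complete" ;;
  *)          echo "UNKNOWN: last_run_summary.yaml not found, not readable or incomplete" ;;
esac
rm -f "$sample"
```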

Not really an issue but a patch -

I'm new to this, but I figured I would give back to you for writing a nagios script I needed.

I made a few minor updates:

added a simple usage statement
parsed the -w, -c options with getopts
added a default warning value of 3600
added a default critical value of 7200

Here is the patch

--- check_puppet_agent  2012-02-14 16:05:30.745482918 -0600
@@ -35,9 +35,10 @@

 # CHANGELOG:
 #20120126     A.Swen  created.
+# 20120214    trey85stang Modified, added getopts, usage, defaults

 # SETTINGS
-statefile=/var/lib/puppet/agent/state/last_run_summary.yaml
+statefile=/var/lib/puppet/state/last_run_summary.yaml

 # FUNCTIONS
 result () {
@@ -53,13 +54,43 @@
   exit $rc
 }

-# SCRIPT
-if [ $# -lt 2 ];then
-  result 5
-else
-  WARN=$1
-  CRIT=$2
-fi
+usage () {
+  echo ""
+  echo "USAGE: "
+  echo "  $0 -w 3600 -c 7200"
+  echo "    -w warning threshold"
+  echo "    -c critical threshold"
+  echo ""
+  exit 1
+}
+
+CRIT=7200
+WARN=3600
+
+while getopts "w:c:" opt
+do
+  case $opt in
+    w)
+      if ! echo $OPTARG | grep -q "[A-Za-z]" && [ -n "$OPTARG" ]
+      then
+        WARN=$OPTARG
+      else
+        usage
+      fi
+    ;;
+    c)
+      if ! echo $OPTARG | grep -q "[A-Za-z]" && [ -n "$OPTARG" ]
+      then
+        CRIT=$OPTARG
+      else
+        usage
+      fi
+    ;;
+    *)
+      usage
+    ;;
+  esac
+done

 # check if state file exists
 [ -s ${statefile} ] || result 1
@@ -94,3 +125,4 @@
 result 0

 # END
+
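
The grep test in the patch rejects values containing letters but still accepts input such as "12-3". A stricter variant is sketched below as an illustration, not as part of the submitted patch; is_uint and parse_thresholds are hypothetical helper names.

```shell
#!/bin/sh
# is_uint is a hypothetical helper: succeeds only for a non-empty string of digits.
is_uint() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
    *)           return 0 ;;
  esac
}

# Defaults, as in the patch above.
WARN=3600
CRIT=7200

# parse_thresholds is a hypothetical wrapper around getopts;
# it returns 1 on any invalid option or non-numeric value.
parse_thresholds() {
  OPTIND=1
  while getopts "w:c:" opt; do
    case $opt in
      w) is_uint "$OPTARG" || return 1; WARN=$OPTARG ;;
      c) is_uint "$OPTARG" || return 1; CRIT=$OPTARG ;;
      *) return 1 ;;
    esac
  done
}

parse_thresholds -w 1800 -c 3600 && echo "warn=$WARN crit=$CRIT"
```

In the plugin, a failing parse_thresholds "$@" would be followed by a call to usage.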

sudo - config print lastrunfile output mismatch

So, with sudo being used for commands, and having to add nagios user to sudoers per docs at top of script, I noticed the following behavior.

If I assume the nagios user:

$ -> sudo sudo -s -u nagios
bash-4.1$ /usr/bin/puppet --configprint lastrunfile
/var/spool/nagios/.puppet/var/state/last_run_summary.yaml

But then run the same command as su:

$ -> puppet --configprint lastrunfile
/var/lib/puppet/state/last_run_summary.yaml

The result when run via nagios and su do not match, which is why the parse_yaml function was hanging indefinitely. After hardcoding the path the parse_yaml function worked as expected.

Sidenote: You should be able to remove the sudo references in the script. If NRPE was compiled to run under the nagios user (or is set to via nrpe_user in nrpe.cfg), and the script has 755 perms and is accessible as nagios, then you shouldn't need to mess with sudoers etc.

last_run_summary.yaml not readable

Hi,

Running the script in debug mode, it fails with:

nagios@x:~$ bash -x /usr/local/custom/nagios/nagios-plugins-aswen/check_puppet_agent -w 3600 -c 7200 -s /var/lib/puppet/state/last_run_summary.yaml -d 0
[...]
++ sudo /usr/bin/puppet config print agent_disabled_lockfile
+ agent_disabled_lockfile=/var/lib/puppet/state/agent_disabled.lock
+ '[' -f /var/lib/puppet/state/agent_disabled.lock ']'
+ '[' -z /var/lib/puppet/state/last_run_summary.yaml ']'
+ '[' -s /var/lib/puppet/state/last_run_summary.yaml -a -r /var/lib/puppet/state/last_run_summary.yaml ']'
+ result 1
+ case $1 in
+ echo 'UNKNOWN: last_run_summary.yaml not found, not readable or incomplete'
UNKNOWN: last_run_summary.yaml not found, not readable or incomplete
+ rc=3
+ exit 3
nagios@x:~$

The reason why:
https://tickets.puppetlabs.com/browse/PUP-6936

The current stupid resolution:
chmod +x /var/lib/puppet

Feel free to suggest a way to close it... I think that this should be fixed on the puppet side, and that a workaround like the one above should be suggested in the meantime.

(Updated by @aswen: put console output in code block)

Check_puppet_agent fails to detect the use of cached catalogs if new catalog compilation fails

My colleague @benwtr wrote:

I have a theory- this might be normal behaviour.
When puppet-agent fails to retrieve the catalog from the master, I'm pretty sure it happily uses a cached catalog and runs just fine, but it looks like the --test flag for puppet-agent turns that off and fails instead of using a cached copy of the catalog.
If this is the case, it might not be what we want. We could probably disable catalog caching, or maybe update the script so it alerts if the catalog couldn't be retrieved from the puppet-master.

I think he's right.

Count failed catalog compilations?

Hello:

I noticed that the puppet check doesn't check for failed runs of the form:

Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

If we change the awk calls in the variables, we could then check for NR != 1; in such runs, the failure parameter isn't even in the file.

[root@host state]# grep failure last_run_summary.yaml
    failure: 0
[root@host state]# awk -v count=0 '/failure:/ {count++; count+=$2} END{print count}' last_run_summary.yaml 
1
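
The counting idea above can be wrapped in a small helper for experimenting. This is only a sketch: count_failure_lines and classify_summary are hypothetical names, and the sample files contain just enough lines to show the cases.

```shell
#!/bin/sh
# count_failure_lines is a hypothetical helper: the number of "failure:" lines
# plus the sum of their values, using the awk call from this report.
count_failure_lines() {
  awk -v count=0 '/failure:/ {count++; count+=$2} END{print count}' "$1"
}

# classify_summary is a hypothetical helper that interprets the count.
classify_summary() {
  case $(count_failure_lines "$1") in
    0) echo "no failure key: the run probably did not apply a catalog" ;;
    1) echo "no failures" ;;
    *) echo "failures reported" ;;
  esac
}

tmp=$(mktemp)
printf 'events:\n    failure: 0\n' > "$tmp"
classify_summary "$tmp"
printf 'time:\n  last_run: 1487266691\n' > "$tmp"
classify_summary "$tmp"
rm -f "$tmp"
```

A count of 1 means the key is present with zero failures; any other count signals either real failures or a summary without the key.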

Checking if puppet is running... Not working.

[ "$(ps axf|egrep "/usr/bin/ruby /usr/sbin/puppetd|/usr/bin/ruby1.8 /usr/bin/puppet agent")" ] || result 4

Hello aswen, the line above does not work (at least not on my CentOS systems). I've been running this check for a long time, but I only just noticed that it doesn't work, or I would have let you know sooner. Anyway, the egrep line is always matched in the ps axf output, because the egrep process itself shows up there. You could easily add egrep -v egrep, but I'm not a fan of multiple greps.

I think the line below will cover most cases on all Linux OSes. If you are running a puppet master on the same box you could get a false positive. I don't think Unix systems come with this command, but I don't think that matters, since egrep probably doesn't come with them either.

Anyway, the above can be replaced with:

[ "$(pgrep puppet)" ] || result 4

after which checking if puppet is running should work as expected.

Once again, thanks for this plugin. It is appreciated.
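
For systems where pgrep is not available, the self-match described above can also be avoided with a single grep. The sketch below is an illustration only; has_puppet_agent is a hypothetical helper, and the pattern is a simplified version of the one in the original line.

```shell
#!/bin/sh
# has_puppet_agent is a hypothetical helper that reads a process listing on stdin.
# The [p] bracket keeps the pattern from matching grep's own command line,
# because that command line contains "[p]uppet", not "puppet".
has_puppet_agent() {
  grep -Eq '[p]uppet agent|[p]uppetd'
}

if ps axww | has_puppet_agent; then
  echo "puppet agent is running"
else
  echo "puppet agent is not running"
fi
```

In the plugin, the else branch would call result 4 instead of printing a message.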

Strict checks for either agent or master

An edge case was discovered which was causing check_puppet_agent to hang. If a node was running puppetmasterd but not puppetd, then the check_puppet_agent script would hang indefinitely when run through nrpe. The quick fix would be to tighten the regex in line 233:

ps axfww|egrep "/usr(/local)?/bin/ruby[^ ]* /usr(/local)?/s?bin/puppet(d)?$"|grep -v egrep

I would suggest a more elegant approach: allow a role to be passed with -r, either agent (the default) or master. This way we can change the regex conditionally, yet still use the same script. The only difference would be to add a second line to nrpe.cfg, something like:

command[check_local_puppet_agent]=/usr/local/nagios/libexec/j1n_check_puppet -w 1800 -c 3600
command[check_local_puppet_master]=/usr/local/nagios/libexec/j1n_check_puppet -r master
