aswen / nagios-plugins
Scripts and plugins for Nagios
There is apparently "secure" info in last_run_report, so puppetlabs have made the file not world-readable. That means the script cannot read that file.
If somebody used quotes (or backticks) in some kind of message, using eval would allow them to execute malicious code, on purpose or by accident.
For example, if any backticks are used, you would be able to reboot a server.
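One way around the eval problem described above is to treat report text purely as data. The following is a minimal sketch, assuming simple "key: value" lines; the function name yaml_value is hypothetical and not part of check_puppet_agent.

```shell
#!/bin/sh
# Hypothetical helper (not from the plugin): print the value of the first
# "key: value" line in a file. The text is only handled as data by awk,
# so quotes or backticks in it are never executed by the shell.
yaml_value() {
    # $1 = key name, $2 = file to read
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop indentation
            prefix = key ": "
            if (index(line, prefix) == 1) {     # line starts with "key: "
                print substr(line, length(prefix) + 1)
                exit
            }
        }
    ' "$2"
}
```

The value can then be assigned with a plain command substitution, for example msg=$(yaml_value message "$file"), without any eval.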
When an error message is "short enough" to be picked up by check_puppet_agent but spans multiple lines, the script prints all of the lines, and Nagios cuts the output off.
Output: CRITICAL: Last run had 1 or more errors. Check the logs. FIRST_ERROR ('change from purged to present failed: Execution of ''/bin/yum -d 0
-e 0 -y list '' returned 1: Error: No matching Packages
to list')
Expected: CRITICAL: Last run had 1 or more errors. Check the logs. FIRST_ERROR ('change from purged to present failed: Execution of ''/bin/yum -d 0 -e 0 -y list '' returned 1: Error: No matching Packages to list')
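A possible way to get the expected single-line output is to flatten the message before printing it. This is only a sketch; flatten_message is a hypothetical name, and note that it also squeezes any repeated spaces that were already in the message.

```shell
#!/bin/sh
# Hypothetical helper: collapse a multi-line message into one line so the
# status text is not cut at the first newline.
flatten_message() {
    # $1 = message; newlines and tabs become spaces, then runs of spaces
    # are squeezed into a single space
    printf '%s' "$1" | tr '\n\t' '  ' | tr -s ' '
}
```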
It's not working for me for some reason. I can execute the script fine on the client:
[root@dev003 ~]# /usr/lib64/nagios/plugins/check_puppet_agent
OK: Puppet agent "3.3.0" running catalogversion 1423524659, and executed at Mon 09 Feb 2015 04:24:53 PM PST for last time
But it won't work from the server using nrpe:
CRITICAL: Last run was 1423529323 seconds ago. crit is 7200
It looks like it's using puppet version as the last run time. Both systems are CentOS-6
As of commit 0dcd691, associative arrays (available as of Bash version 4) are now a requirement for check_puppet_agent. That means the script no longer works on Ubuntu 16.04/18.04 LTS, which link /bin/sh to /bin/dash, because Dash has no associative array support. RHEL5 ships with Bash 3, which doesn't have associative array support either, so that broke too.
One option is to revert 0dcd691. The commit doesn't add functionality from what I can tell, only an alternate style to accomplish the same thing.
Another option, and this fix is working for my Ubuntu 16.04/18.04 fleet, is to change the first #!/bin/sh to #!/bin/bash (assuming Bash is installed, of course). This doesn't address the Bash 4 requirement, which older OSes such as RHEL5 can't meet, though.
If it helps, I could submit a PR to change to /bin/bash and maybe add in a Bash version check or something. But I feel the best choice is to revert 0dcd691 if supporting older Bash versions is a priority.
-Paul
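For illustration, a Bash version check along the lines mentioned above could look roughly like this. It is a sketch, not the plugin's code: the function name is hypothetical, and it relies only on the BASH_VERSION variable, which is empty when the script is run by another shell such as Dash.

```shell
#!/bin/sh
# Hypothetical guard: succeed only when the given Bash version string is
# 4 or newer, i.e. when associative arrays are available.
bash_supports_assoc_arrays() {
    # $1 = version string such as "4.3.48(1)-release"; empty when not Bash
    major=${1%%.*}
    case "$major" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: not Bash, or unknown
    esac
    [ "$major" -ge 4 ]
}

# Assumed usage near the top of a plugin:
#   if ! bash_supports_assoc_arrays "${BASH_VERSION:-}"; then
#       echo "UNKNOWN: Bash 4 or newer is required (associative arrays)"
#       exit 3
#   fi
```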
The plugin runs one of:
nagios-plugins/check_puppet_agent
Line 224 in 2455f72
nagios-plugins/check_puppet_agent
Line 227 in 2455f72
nagios-plugins/check_puppet_agent
Line 230 in 2455f72
As by:
nagios-plugins/check_puppet_agent
Line 234 in 2455f72
While the sudoers entry requested is:
nagios-plugins/check_puppet_agent
Line 41 in 2455f72
The result on nrpe is like:
UNKNOWN: Internal error: Puppet version unknown from �[1:31mError: Could not initialize global default settings: Permission denied @ dir_s_mkdir - /root/.puppetlabs�[0m
As the requested sudoers entries are much more tailored, I would suggest checking the least-privilege model before deciding if:
Hi,
I have removed the sudo from line 126: https://github.com/aswen/nagios-plugins/blob/master/check_puppet_agent#L126
My version:
[ -z "${lastrunfile}" ] && lastrunfile=$(/usr/bin/puppet config print lastrunfile)
With sudo, this command didn't work and $lastrunfile was empty. Now that I have removed it, the plugin works.
Hi. I added your check_puppet_agent module to /usr/local/nagios/libexec, and gave it 755 perms with nagios.nagios ownership.
When I run this command:
/usr/local/nagios/libexec/j1n_check_puppet_agent -w 1800 -c 3600
I get expected output, but when I run the following command I keep getting socket timeouts:
/usr/local/nagios/libexec/check_nrpe -H service-a-2.sn1.vpc1.j1n.us -c check_local_puppet_agent
Here is the excerpt from /usr/local/nagios/etc/nrpe.cfg:
command[check_local_puppet_agent]=/usr/bin/sudo /usr/local/nagios/libexec/j1n_check_puppet_agent -w 1800 -c 3600
And here is the sudo entry in /etc/sudoers:
Cmnd_Alias PUPPETCHECK=/usr/bin/puppet config print runinterval,\
/usr/bin/puppet config print splay,\
/usr/bin/puppet config print splaylimit,\
/usr/bin/puppet config print agent_disabled_lockfile,\
/usr/bin/puppet config print lastrunfile,\
/usr/bin/puppet config print pidfile, \
/usr/local/nagios/libexec/j1n_check_puppet_agent
Defaults:nagios !requiretty
nagios ALL = NOPASSWD: PUPPETCHECK
What am I missing?
check_puppet_agent can only be run as root on my install.
str@puppetpal:/usr/lib/nagios/plugins$ sudo -u nagios ./check_puppet_agent
Error: Could not initialize global default settings: Permission denied @ dir_s_mkdir - /home/str/.puppetlabs
UNKNOWN: Internal error: Puppet version unknown from Error: Could not initialize global default settings: Permission denied @ dir_s_mkdir - /home/str/.puppetlabs
cat /etc/sudoers
User_Alias NAGIOS=nagios
Cmnd_Alias PUPPETCHECK=/usr/bin/puppet config print runinterval,\
/usr/bin/puppet config print splay,\
/usr/bin/puppet config print splaylimit,\
/usr/bin/puppet config print agent_disabled_lockfile,\
/usr/bin/puppet config print lastrunfile,\
/usr/bin/puppet config print lastrunreport,\
/usr/bin/puppet config print pidfile
NAGIOS ALL=NOPASSWD:PUPPETCHECK
Hi,
Right now the script spends most of its time running 'puppet config print' 4 times. This is inefficient and gives the script a noticeable runtime. It's not a big deal, but when a monitoring system runs many small scripts, it's better for them to run efficiently so that total execution time and load stay as small as possible. It would be nice to request all needed options with a single 'puppet config print' run.
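As a rough sketch of what that could look like: this assumes that puppet config print accepts several setting names in one call and then prints them as "name = value" lines, which should be checked against the installed Puppet version. The helper name config_value is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract one setting from captured "name = value" lines.
config_value() {
    # $1 = setting name, $2 = captured output of one config print call
    printf '%s\n' "$2" | awk -v name="$1" '
        index($0, name " = ") == 1 {
            print substr($0, length(name) + 4)   # skip past "name = "
            exit
        }
    '
}

# Assumed usage (one puppet invocation instead of four):
#   settings=$(puppet config print runinterval splay splaylimit lastrunfile)
#   lastrunfile=$(config_value lastrunfile "$settings")
```

Matching on the full "name = " prefix keeps splay from also matching the splaylimit line.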
I have a consistent issue with the script on Debian 8 with Puppet 4.8.2.
A failed puppet agent run is still reported as OK by this script. It reports the correct catalogversion.
Perhaps the last_run_report.yaml format has changed? I notice the script is looking for "status: failure", but in the report the line is "status: unchanged".
Hopefully I'm not jumping the gun here. I will probably do some additional digging, but wanted to report what I'm seeing.
I'm using puppet agent 4.9.2
The plugin is able to parse last_run_summary.yaml if my puppet run is a success, but consider the case where a puppet run results in the following.
"Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Resource Statement, Could not find declared class...."
version:
  config:
  puppet: 4.9.2
time:
  last_run: 1487266691
This means that around lines 312-322 of the plugin it fails to parse last_run_summary.yaml resulting in a "UNKNOWN: last_run_summary.yaml not found, not readable or incomplete".
It seems that if a catalog can't be retrieved from the puppet master, the resulting last_run_summary.yaml isn't valid for the purposes of this Nagios plugin. I'm not sure how unique this case is, but I think it would be preferable to identify this case and report that the agent was unable to retrieve its catalog on the last puppet run, instead of UNKNOWN.
You should be able to replicate this case by removing a module from your puppet environment that contains classes used in your manifests.
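One way the plugin could tell this case apart, as a sketch only: the truncated summary shown above contains just the version and time sections, so the absence of a "failure:" counter could be taken as a sign that no catalog was applied. The function name is hypothetical, and the assumption that a normal summary always contains a failure key should be verified.

```shell
#!/bin/sh
# Hypothetical check: succeed when the summary contains a "failure:" counter.
# A summary without it is treated as "no catalog was applied on the last run".
summary_has_run_data() {
    # $1 = path to last_run_summary.yaml
    grep -q '^[[:space:]]*failure:' "$1"
}

# Assumed usage:
#   if ! summary_has_run_data "$lastrunfile"; then
#       echo "CRITICAL: last run has no resource data, catalog retrieval may have failed"
#       exit 2
#   fi
```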
I'm new to this, but I figured I would give back to you for writing a Nagios script I needed.
I made a few minor updates.
Here is the patch:
--- check_puppet_agent 2012-02-14 16:05:30.745482918 -0600
@@ -35,9 +35,10 @@
# CHANGELOG:
#20120126 A.Swen created.
+# 20120214 trey85stang Modified, added getopts, usage, defaults
# SETTINGS
-statefile=/var/lib/puppet/agent/state/last_run_summary.yaml
+statefile=/var/lib/puppet/state/last_run_summary.yaml
# FUNCTIONS
result () {
@@ -53,13 +54,43 @@
exit $rc
}
-# SCRIPT
-if [ $# -lt 2 ];then
- result 5
-else
- WARN=$1
- CRIT=$2
-fi
+usage () {
+ echo ""
+ echo "USAGE: "
+ echo " $0 -w 3600 -c 7200"
+ echo " -w warning threshold"
+ echo " -c critical threshold"
+ echo ""
+ exit 1
+}
+
+CRIT=7200
+WARN=3600
+
+while getopts "w:c:" opt
+do
+ case $opt in
+ w)
+ if ! echo $OPTARG | grep -q "[A-Za-z]" && [ -n "$OPTARG" ]
+ then
+ WARN=$OPTARG
+ else
+ usage
+ fi
+ ;;
+ c)
+ if ! echo $OPTARG | grep -q "[A-Za-z]" && [ -n "$OPTARG" ]
+ then
+ CRIT=$OPTARG
+ else
+ usage
+ fi
+ ;;
+ *)
+ usage
+ ;;
+ esac
+done
# check if state file exists
[ -s ${statefile} ] || result 1
@@ -94,3 +125,4 @@
result 0
# END
+
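A note on the threshold validation in the patch: the grep test rejects letters but would still let through values such as 12.5 or -3. A stricter check could be sketched like this (is_uint is a hypothetical name, not part of the patch):

```shell
#!/bin/sh
# Hypothetical validator: succeed only for a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *)           return 0 ;;
    esac
}

# Assumed usage inside the getopts loop:
#   w) is_uint "$OPTARG" && WARN=$OPTARG || usage ;;
```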
So, with sudo being used for commands, and having to add nagios user to sudoers per docs at top of script, I noticed the following behavior.
If I assume the nagios user:
$ -> sudo sudo -s -u nagios
bash-4.1$ /usr/bin/puppet --configprint lastrunfile
/var/spool/nagios/.puppet/var/state/last_run_summary.yaml
But then run the same command as su:
$ -> puppet --configprint lastrunfile
/var/lib/puppet/state/last_run_summary.yaml
The results when run as nagios and as su do not match, which is why the parse_yaml function was hanging indefinitely. After hardcoding the path, the parse_yaml function worked as expected.
Sidenote: You should be able to remove the sudo references in the script. If NRPE was compiled to run under the nagios user, or runs as it via nrpe_user in nrpe.cfg, and the script has 755 perms and is accessible as nagios, then you shouldn't need to mess with sudoers etc.
Hi,
Running the script in debug mode, it fails with:
nagios@x:~$ bash -x /usr/local/custom/nagios/nagios-plugins-aswen/check_puppet_agent -w 3600 -c 7200 -s /var/lib/puppet/state/last_run_summary.yaml -d 0
[...]
++ sudo /usr/bin/puppet config print agent_disabled_lockfile
+ agent_disabled_lockfile=/var/lib/puppet/state/agent_disabled.lock
+ '[' -f /var/lib/puppet/state/agent_disabled.lock ']'
+ '[' -z /var/lib/puppet/state/last_run_summary.yaml ']'
+ '[' -s /var/lib/puppet/state/last_run_summary.yaml -a -r /var/lib/puppet/state/last_run_summary.yaml ']'
+ result 1
+ case $1 in
+ echo 'UNKNOWN: last_run_summary.yaml not found, not readable or incomplete'
UNKNOWN: last_run_summary.yaml not found, not readable or incomplete
+ rc=3
+ exit 3
nagios@x:~$
The reason why:
https://tickets.puppetlabs.com/browse/PUP-6936
The current stupid resolution:
chmod +x /var/lib/puppet
Feel free to suggest a way to close this... I think that this should be fixed on the puppet side, and suggest a workaround like the one above in the meantime.
(Updated by @aswen: put console output in code block)
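To find out which directory is getting in the way in a situation like this, a small diagnostic run as the nagios user can help. This is a sketch with a hypothetical function name; it only checks the execute (traverse) bit on each parent directory.

```shell
#!/bin/sh
# Hypothetical diagnostic: print the outermost parent directory of a file
# that the current user cannot traverse, or nothing if all parents are fine.
first_blocked_dir() {
    # $1 = absolute path to the file
    dir=$(dirname "$1")
    blocked=""
    while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        # Walking upward, so the last directory recorded is the outermost one
        [ -x "$dir" ] || blocked=$dir
        dir=$(dirname "$dir")
    done
    [ -n "$blocked" ] && printf '%s\n' "$blocked"
    return 0
}

# Assumed usage:
#   first_blocked_dir /var/lib/puppet/state/last_run_summary.yaml
```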
My colleague @benwtr wrote:
I have a theory- this might be normal behaviour.
When puppet-agent fails to retrieve the catalog from the master, I'm pretty sure it happily uses a cached catalog and runs just fine, but it looks like the --test flag for puppet-agent turns that off and fails instead of using a cached copy of the catalog.
If this is the case, it might not be what we want. We could probably disable catalog caching, or maybe update the script so it alerts if the catalog couldn't be retrieved from the puppet-master.
I think he's right.
Hello:
I noticed that the puppet check doesn't check for failed runs of the form:
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
If we change the awk calls in the variables, we could then check for NR != 1; in such runs, the failure parameter isn't even in the file.
[root@host state]# grep failure last_run_summary.yaml
failure: 0
[root@host state]# awk -v count=0 '/failure:/ {count++; count+=$2} END{print count}' last_run_summary.yaml
1
[ "$(ps axf|egrep "/usr/bin/ruby /usr/sbin/puppetd|/usr/bin/ruby1.8 /usr/bin/puppet agent")" ] || result 4
Hello aswen, the line above does not work (at least not on my CentOS systems). I've been running this check for a long time, but I only just noticed that it doesn't work, or I would have let you know sooner. Anyway, the egrep line is always matched in the ps axf output. You could easily add egrep -v egrep, but I'm not a fan of multiple greps.
I think the line below will cover most cases on all Linux OSes. If you are running puppet master on the same box, you could get a false positive. I don't think Unixes come with this command, but I don't think that matters, since egrep probably doesn't come with them either.
Anyway, the above can be replaced with:
[ "$(pgrep puppet)" ] || result 4
after which checking if puppet is running should work as expected.
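Another way to avoid the self-match, without a second grep and without depending on pgrep, is to capture the process list first and search it afterwards. A minimal sketch, with a hypothetical function name and the pattern taken from the original line:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the process list on stdin matches the
# given extended regex. Because the list is captured before grep starts,
# grep can never find itself in it.
process_list_matches() {
    # $1 = extended regex; the process list is read from stdin
    grep -E -q "$1"
}

# Assumed usage:
#   pattern='/usr/bin/ruby /usr/sbin/puppetd|/usr/bin/ruby1.8 /usr/bin/puppet agent'
#   procs=$(ps -eo args)          # ps has finished before grep runs
#   printf '%s\n' "$procs" | process_list_matches "$pattern" || result 4
```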
Once again, thanks for this plugin. It is appreciated.
An edge case was discovered which was causing check_puppet_agent to hang. If a node was running puppetmasterd but not puppetd, then the check_puppet_agent script would hang indefinitely when run through nrpe. The quick fix would be to tighten the regex in line 233:
ps axfww|egrep "/usr(/local)?/bin/ruby[^ ]* /usr(/local)?/s?bin/puppet(d)?$"|grep -v egrep
I would suggest a more elegant approach: allow us to pass -r with a role, either agent (default) or master. This way we can conditionally change the regex accordingly, yet still use the same script. The only difference would be to add a 2nd line to nrpe.cfg, something like:
command[check_local_puppet_agent]=/usr/local/nagios/libexec/j1n_check_puppet -w 1800 -c 3600
command[check_local_puppet_agent]=/usr/local/nagios/libexec/j1n_check_puppet -r master
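A rough sketch of how such a -r option might be wired up follows. Both function names are hypothetical. The agent pattern is the tightened regex suggested above; the master pattern is purely an assumption and would need checking against real process listings.

```shell
#!/bin/sh
# Hypothetical mapping from a role name to the process pattern to look for.
role_pattern() {
    case "$1" in
        agent)  printf '%s\n' '/usr(/local)?/bin/ruby[^ ]* /usr(/local)?/s?bin/puppet(d)?$' ;;
        # Assumption: the master runs as puppetmasterd under the same paths
        master) printf '%s\n' '/usr(/local)?/bin/ruby[^ ]* /usr(/local)?/s?bin/puppetmasterd' ;;
        *)      return 1 ;;
    esac
}

# Parse -r from the given arguments, defaulting to the agent role.
# Prints the role on success, fails on an unknown role or option.
parse_role() {
    role=agent
    OPTIND=1
    while getopts "r:" opt; do
        case "$opt" in
            r) role=$OPTARG ;;
            *) return 1 ;;
        esac
    done
    role_pattern "$role" >/dev/null || return 1
    printf '%s\n' "$role"
}
```

With the trailing $ in the agent pattern, a puppetmasterd process no longer counts as a running agent.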