dangmocrang / check_idrac
A script to monitoring DELL IDRAC via SNMP
License: Other
Hi guys,
DIMM Socket B6 has the status NONCRITICAL, but I don't receive any alert. The result is below. I'm using the new version 2.2rc4.
Memory 1 (DIMM Socket A1) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: D2268486]
Memory 2 (DIMM Socket A2) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: B8267986]
Memory 3 (DIMM Socket A3) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CD268D86]
Memory 4 (DIMM Socket A4) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CA267D86]
Memory 5 (DIMM Socket A5) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CE268186]
Memory 6 (DIMM Socket A6) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CE267E86]
Memory 7 (DIMM Socket A7) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CE226FB9]
Memory 8 (DIMM Socket A8) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CE226EB9]
Memory 9 (DIMM Socket B1) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: D2268F86]
Memory 10 (DIMM Socket B2) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: D1268286]
Memory 11 (DIMM Socket B3) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: 2826E089]
Memory 12 (DIMM Socket B4) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CD267D86]
Memory 13 (DIMM Socket B5) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CA268386]
Memory 14 (DIMM Socket B6) 8.0 GB/1333 MHz: ENABLED/NONCRITICAL [DDR3, Kingston, S/N: CD267986]
Memory 15 (DIMM Socket B7) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CF267C86]
Memory 16 (DIMM Socket B8) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: D1267A86]
Regards,
Marcos Dutra
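The behaviour expected in this report can be sketched as a mapping from the status text to a Nagios exit code, with the check returning the most severe one. This is a minimal sketch and not the plugin's actual logic: the mapping table, the function names, and the choice to rank CRITICAL above UNKNOWN are all assumptions made for illustration.

```python
import re

# Hypothetical mapping from the status text in the output to Nagios exit codes.
STATUS_TO_EXIT = {"OK": 0, "NONCRITICAL": 1, "CRITICAL": 2, "NONRECOVERABLE": 2}
UNKNOWN = 3


def line_exit_code(line):
    """Return the Nagios exit code for one 'ENABLED/<STATUS>' output line."""
    match = re.search(r"ENABLED/([A-Z]+)", line)
    if match is None:
        return UNKNOWN
    return STATUS_TO_EXIT.get(match.group(1), UNKNOWN)


def worst_exit_code(lines):
    """Pick the most severe code: CRITICAL > UNKNOWN > WARNING > OK (an assumed ranking)."""
    severity = {0: 0, 1: 1, 3: 2, 2: 3}
    codes = (line_exit_code(line) for line in lines)
    return max(codes, key=lambda code: severity[code], default=UNKNOWN)
```

With the output above, the B6 line would map to 1 (WARNING), so the whole MEM check would be expected to return WARNING rather than OK.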
First of all thanks for the plugin!
We're trying to implement it in our environment, but we have hit a dead end with monitoring the storage system.
We're using a RAID 1 configuration for our system disk, and if one disk is removed or faulted it should show up as a critical event.
But in our tests, after removing one of the two disks, the result is only reported as non-critical, and the check raises neither a warning nor a critical state.
Is there a way to set the alert threshold so that a failing or removed disk triggers a warning?
Thank you.
Best Regards!
As in the title.
I specified the host IP address in idrac_2.1.conf but still get this error. Any idea why?
It would also be good to have the installation instructions as bullet points.
The example in MANUAL.md shows checking a specific DIMM and not using a configuration file, but when I try it states that I need to use a configuration file.
Check specific hardware
./idrac2.1.py -H 10.10.10.20 -v2c -c public -w MEM#3
Thank you for making this plugin and putting it all together.
Can I use the equivalent of -v2c public in the configuration file?
I get the following error when specifying this in the configuration file:
[snmp]
version = '2c'
community = 'public'
This is the error I get in Nagios:
iDRAC Server Health;WARNING;notify-service-by-email;snmp version "None" is not supported!
When I run this manually and append -v2c everything works however - e.g.
/usr/lib64/nagios/plugins/idrac_2.2rc4 -H 10.12.64.131 -v2c -c public -m /usr/share/snmp/mibs/idrac-smiv2.mib
I've also tried version = '2' to no avail.
I see it's unset here in the code: https://github.com/dangmocrang/check_idrac/blob/master/idrac_2.2rc4#L17 but seems ugly to force it there, I'd like to use the config file you are shipping.
What am I doing wrong?
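One thing worth checking, as an assumption rather than something confirmed from the plugin source: if the file is read with Python's standard config parser (`ConfigParser` in Python 2, `configparser` in Python 3), quotes around a value are kept as part of the value. In that case `version = '2c'` would yield the string `'2c'` including the quotes, which would not match `2c`. The helper name below is hypothetical.

```python
import configparser

SAMPLE = """
[snmp]
version = '2c'
community = 'public'
"""


def read_snmp_option(text, option):
    """Return (raw value, value with surrounding quotes stripped) from the [snmp] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser.get("snmp", option)
    return raw, raw.strip("'\"")


raw, cleaned = read_snmp_option(SAMPLE, "version")
print(repr(raw), repr(cleaned))
```

If this is the cause, writing the values without quotes (`version = 2c`) may be enough.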
Hi,
thanks for this neat plugin.
I noticed some issues with the battery module.
1.
The web interface is giving me two batteries:
System Board CMOS Battery - Good
PERC1 ROMB Battery - Good
I'm wondering why the plugin gives back PERC1 and 2 and the second with a "0"?
BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC1 ROMB Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC2 ROMB Battery: ENABLED/OK [0(!)]
I think this shouldn't be listed (or can I catch this state in the conf?)
2.
Another server but maybe the same issue:
The web interface just has:
System Board CMOS Battery - Good
Plugin returns:
BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC1 ROMB Battery: ENABLED/OK [-2147483648(!)]
--PERC2 ROMB Battery: ENABLED/OK [-2147483648(!)]
Regards
Ricardo
Because the Nagios config file treats # as a comment, I am unable to use # in a specific hardware check such as FAN#1:
./check_idrac -m /usr/share/snmp/mibs/idrac.mib -H -v2c -c @apfin@nce -w FAN#1
When I define the service description in the Nagios config, everything after # is taken as a comment. Could we use "-" instead of "#"? Please reply.
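A minimal sketch of what accepting both separators could look like. The function name and the accepted syntax are hypothetical and are not part of the plugin.

```python
import re


def parse_hardware_selector(selector):
    """Split 'FAN#1' or 'FAN-1' into ('FAN', 1); a bare 'FAN' gives ('FAN', None)."""
    match = re.fullmatch(r"([A-Za-z]+)(?:[#-](\d+))?", selector)
    if match is None:
        raise ValueError("unrecognised hardware selector: %r" % selector)
    name, index = match.groups()
    return name.upper(), (int(index) if index is not None else None)
```

With this, `-w FAN-1` and `-w FAN#1` would select the same item, so the `#` could be avoided in Nagios object definitions.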
Greetings.
I'm having trouble with the output of the check_idrac plugin.
The output and the performance data are not formatted correctly. According to https://nagios-plugins.org/doc/guidelines.html , the output should look like this:
SERVICE STATUS: text of output | performance data
However, the check_idrac plugin output looks like this:
System Board Inlet Temp: 23.0 C ENABLED/OK | Temperature=23.0;;;;
CPU1 Temp: 53.0 C ENABLED/OK | Temperature=53.0;;;;
CPU2 Temp: 46.0 C ENABLED/OK | Temperature=46.0;;;;
Nagios is not interpreting the output correctly, and this is causing problems.
Can you please look into this and adjust the output to match the Nagios plugin standards?
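A rough sketch of the requested layout: one status line, followed by a single perfdata section with a unique label per sensor. The function name and the input shape (a list of name, value, unit tuples) are assumptions for illustration, not the plugin's data model.

```python
def format_plugin_output(service, status, readings):
    """Build 'SERVICE STATUS: text | perfdata' from (name, value, unit) readings."""
    text = ", ".join("%s: %s %s" % (name, value, unit) for name, value, unit in readings)
    # Perfdata labels should be unique; spaces are replaced so no quoting is needed.
    perf = " ".join("%s=%s" % (name.replace(" ", "_"), value) for name, value, _ in readings)
    return "%s %s: %s | %s" % (service, status, text, perf)


print(format_plugin_output("SENSOR", "OK", [("CPU1 Temp", 53.0, "C"), ("CPU2 Temp", 46.0, "C")]))
```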
Hi,
awesome tool, but we have some problems getting the status of our physical disks. Compared with the group view, three values are switched in the view of a single disk, which causes a healthy drive to throw a warning.
Here is an example:
I assume that the order of the values returned by snmpget differs from the order returned by snmpwalk (which is used for groups and for all hardware), which then causes the wrong output and a false warning alert.
We have tested with different systems, which all produce the same result.
When I set the value_on_alert array in line 655 to [3,8,9] instead of [3,7,9], I can get rid of the false warning but still get the switched output. To fix the output I have to change the order in lines 667 - 669, which then messes up the group output of PDisk.
If you need any more information, let me know,
best regards
Raphael Rehberg
[root@REDACTED ~]# /usr/local/nagios/libexec/idrac_2.0b9.py -H REDACTED -c REDACTED -w FAN --perf
System Board Fan1A: 2400 RPM - ENABLED/OK
System Board Fan1B: 1560 RPM - ENABLED/OK
System Board Fan2A: 3720 RPM - ENABLED/OK
System Board Fan2B: 2640 RPM - ENABLED/OK
System Board Fan3A: 2520 RPM - ENABLED/OK
System Board Fan3B: 1800 RPM - ENABLED/OK
System Board Fan4A: 2520 RPM - ENABLED/OK
System Board Fan4B: 1800 RPM - ENABLED/OK
System Board Fan5A: 2400 RPM - ENABLED/OK
System Board Fan5B: 1800 RPM - ENABLED/OK
[root@REDACTED ~]# /usr/local/nagios/libexec/idrac_2.0b9.py -H REDACTED -c REDACTED -w FAN
System Board Fan1A: 2400 RPM - ENABLED/OK
System Board Fan1B: 1560 RPM - ENABLED/OK
System Board Fan2A: 3720 RPM - ENABLED/OK
System Board Fan2B: 2640 RPM - ENABLED/OK
System Board Fan3A: 2520 RPM - ENABLED/OK
System Board Fan3B: 1800 RPM - ENABLED/OK
System Board Fan4A: 2520 RPM - ENABLED/OK
System Board Fan4B: 1800 RPM - ENABLED/OK
System Board Fan5A: 2400 RPM - ENABLED/OK
System Board Fan5B: 1800 RPM - ENABLED/OK
Hi Dung,
Thanks for your great job regarding this check.
I ran into some issues with the PS check on different servers (2 x PE R420, 1 x PE R720, 2 x PE R620, 2 x PE R630, 2 x PE R430).
Let's have a look at a PE R630:
root@srv020:/usr/lib/nagios/plugins# ./check_idrac -H srv001-idrac -m ./idrac-smiv2.mib -c public -w PS -f check_idrac.conf
PS 1: OK, Volt I/O: 264 V/(N/A) V, Current: 0.6 A, Watt I/O: 594.0 W/495.0 W
PS 2: OK, Volt I/O: 264 V/(N/A) V, Current: 0.2 A, Watt I/O: 594.0 W/495.0 W
root@srv020:/usr/lib/nagios/plugins# ./check_idrac -H srv001-idrac -m ./idrac-smiv2.mib -c public -w PS#1 -f check_idrac.conf
CRIT - PS 1: OK, Volt I/O: 264 V/0(!!) V, Current: 0.6(!!) A, Watt I/O: 594.0 W/495(!!) W
root@srv020:/usr/lib/nagios/plugins# ./check_idrac -H srv001-idrac -m ./idrac-smiv2.mib -c public -w PS#2 -f check_idrac.conf
CRIT - PS 2: OK, Volt I/O: 264 V/0(!!) V, Current: 0.2(!!) A, Watt I/O: 594.0 W/495(!!) W
My check_idrac.conf is the default one. Why does the check exit with CRITICAL status when the reported state is OK?
I get this result with every server. The check runs on Debian Wheezy with Python 2.7.3.
Thanks for your help.
Bruno
Hello,
Thank you for your wonderful plugin. With the latest version I keep getting this error:
your MIB may out of dated!
I downloaded the latest MIBs from Dell. I'm running a Dell R730, but I get the same error no matter what I try:
./idrac_2.2rc4 -H 192.168.1.24 -v2c -c public -m /root/tools/check_idrac-master/mymibs/mibs/iDRAC-SMIv2.mib
Here are my specs :
FreeBSD nagios 11.0-RELEASE-p2 FreeBSD 11.0-RELEASE-p2 #0: Mon Oct 24 06:55:27 UTC 2016 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64
root@nagios:~/tools/check_idrac-master # python --version
Python 2.7.13
Can you please help me out with this.
Hello, I'm running iDrac 9 on an R620 server.
This is the output of my command: ./idrac_2.2rc4 -H 10.50.19.66 -c public -v2c -m ./iDRAC-SMIv2.mib
PS
Traceback (most recent call last):
  File "./idrac_2.2rc4", line 848, in <module>
    result, tmp_code = PARSER().main()
  File "./idrac_2.2rc4", line 662, in main
    output.append(tmp % (self.hardware[2], hw[0], hw[1].split()[-1].replace('"', ''),
IndexError: list index out of range
Any idea ?
After upgrading the firmware of iDrac 7 to 2.60.60.60, systemBattery group (1.3.6.1.4.1.674.10892.5.4.600.50.1.5) does not have information about the PERC battery (BBU) anymore.
The following OID still works:
1.3.6.1.4.1.674.10892.5.5.1.20.130.15
Hello there.
[root@localhost check_idrac-master]# ./idrac_2.0b7 -H localhost -c public
Can somebody help with this issue?
Thanks,
Hello,
I have been using the script for two days now. I have new Dell servers, and one of them failed today, but I got no Nagios notifications.
In Nagios I see:
systemStateGlobalSystemStatus critical
But there are no alerts; the service shows a green (OK) status.
When I change the --critical= option, I get the same green status.
Hi,
I just configured your plugin with nagios and I managed to make it work with a server that uses iDrac9, I had to use the latest version you posted on another issue because I was having the same problem with warnings on disks when queried singularly.
But I still have warnings on most of my disks like this one:
PDisk 5 (0:1:0) 3726.02 GB: NON-RAID(!), PowerStat: SPUNUP, HotSpare: no [TOSHIBA, HDD, S/N: 39A8K5Z7F7DE] isFailing: 0
This generates a warning in Nagios, I am guessing because of the NON-RAID flag. But it is supposed to be like that: those disks are not attached to a RAID controller, because they form an array of disks used in a ZFS pool.
I think a flag to suppress this warning is needed.
Otherwise great plugin!
Thanks.
check_idrac currently doesn't support IPv6.
Adding IPv6 support is easy, because it only requires prefixing the host value with 'udp6:' when calling the SNMP commands.
Please find attached a patch:
idrac_2.2rc4_ipv6.patch.txt
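The idea in the patch description can be sketched as follows. This is not the attached patch: the function name is hypothetical, and detecting an IPv6 literal with the standard `ipaddress` module is an assumption about how one might decide when to add the prefix. Depending on the Net-SNMP setup, a literal address may additionally need square brackets, which this sketch does not add.

```python
import ipaddress


def snmp_agent_address(host):
    """Prefix IPv6 literals with 'udp6:'; leave IPv4 addresses and hostnames unchanged."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # not an IP literal, for example a DNS name
    return "udp6:" + host if is_v6 else host
```

A hostname that only resolves to an IPv6 address is not detected here; handling that case would need a DNS lookup or an explicit option.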
Hi
Have you had a chance to test it on iDrac 8? I'm getting an error.
[user@admin]$ python idrac_2.2rc1 -H 10.199.227.208 -c public -v2c
your MIB may out of dated!
error - systemBattery: Unknown Object Identifier (Sub-id not found: (top) -> systemBattery)
Hi,
I'm seeing a timeout for one of my dracs for the Fan check when using the Check_iDRAC for DELL iDRAC found on https://exchange.nagios.org/directory/P ... AC/details.
PS
--PS 1: OK, Volt I/O: 264 V/230 V, Current: 0.6 A, Watt I/O: 900.0 W/750 W
--PS 2: OK, Volt I/O: 264 V/230 V, Current: 0.2 A, Watt I/O: 900.0 W/750 W
SNMP timeout!
It's not exactly a timeout, though. If I comment out the elif statement that prints 'SNMP timeout!', as shown below, I can see that values for the Fan check are actually found. Can you please help with this issue?
def get_snmp(self, oids):
    cmd_v3 = '%s %s -O q -v %s -u %s -l %s -a %s -A %s -x %s -X %s %s -m %s' \
             % (self.snmp_command, self.host, conf['snmp_version'], conf['snmp_security_name'],
                conf['snmp_security_level'],
                conf['snmp_authentication_protocol'], conf['snmp_authentication_password'],
                conf['snmp_privacy_protocol'], conf['snmp_privacy_password'],
                oids, self.mib_file)
    cmd_v2 = '%s %s -O q -v %s -c %s %s -m %s' \
             % (self.snmp_command, self.host, conf['snmp_version'],
                conf['snmp_community'],
                oids, self.mib_file)
    available_cmd = {'3': cmd_v3, '2c': cmd_v2}
    snmp_cli = available_cmd[conf['snmp_version']]
    status, output = run(snmp_cli)  # query snmp data
    if status != 0:
        if 'Unknown Object Identifier' in output:
            print 'your MIB may out of dated!'
            print 'error -', output.replace('\n', '. ')
        #elif 'Timeout:' in output:
        else:
            print output
        sys.exit(1)
IDRAC-MIB-SMIv2::coolingDeviceReading.1.1 3720
IDRAC-MIB-SMIv2::coolingDeviceReading.1.2 3840
IDRAC-MIB-SMIv2::coolingDeviceReading.1.3 3840
IDRAC-MIB-SMIv2::coolingDeviceReading.1.4 3720
IDRAC-MIB-SMIv2::coolingDeviceReading.1.5 3720
IDRAC-MIB-SMIv2::coolingDeviceReading.1.6 3840
IDRAC-MIB-SMIv2::coolingDeviceReading.1.7 3840
IDRAC-MIB-SMIv2::coolingDeviceReading.1.8 2400
IDRAC-MIB-SMIv2::coolingDeviceReading.1.9 2400
IDRAC-MIB-SMIv2::coolingDeviceReading.1.10 2400
IDRAC-MIB-SMIv2::coolingDeviceReading.1.11 2280
IDRAC-MIB-SMIv2::coolingDeviceReading.1.12 2280
IDRAC-MIB-SMIv2::coolingDeviceReading.1.13 2400
IDRAC-MIB-SMIv2::coolTimeout: No Response from 172.16.0.39
ingDeviceReading.1.14 2280
IDRAC-MIB-SMIv2::coolingDeviceType.1.1 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.2 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.3 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.4 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.5 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.6 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.7 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.8 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.9 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.10 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.11 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.12 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.13 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.14 coolingDeviceTypeIsAFan
Hi,
and thank you for this great plugin!
Is it possible to monitor a group of hardware items in the Nagios way, without a check for every single module?
Take the memory, for example. I use this check:
idrac_2.0b9 -H 10.0.1.3 -f check_idrac.conf -p -w MEM
and i get this result:
Memory 1 (DIMM Socket A1) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD683]
Memory 2 (DIMM Socket A2) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD684]
Memory 3 (DIMM Socket A3) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD686]
Memory 4 (DIMM Socket A4) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD685]
Memory 5 (DIMM Socket B1) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD673]
Memory 6 (DIMM Socket B2) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD670]
Memory 7 (DIMM Socket B3) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD67F]
Memory 8 (DIMM Socket B4) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD682]
Memory 9 (DIMM Socket A5) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD583]
Memory 10 (DIMM Socket A6) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD582]
Memory 11 (DIMM Socket A7) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD61F]
Memory 12 (DIMM Socket A8) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD579]
Memory 13 (DIMM Socket B5) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD57F]
Memory 14 (DIMM Socket B6) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD581]
Memory 15 (DIMM Socket B7) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD57E]
Memory 16 (DIMM Socket B8) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD580]
Memory 17 (DIMM Socket A9) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE535B]
Memory 18 (DIMM Socket A10) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE575A]
Memory 19 (DIMM Socket A11) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE570C]
Memory 20 (DIMM Socket A12) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE5365]
Memory 21 (DIMM Socket B9) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE533E]
Memory 22 (DIMM Socket B10) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE534C]
Memory 23 (DIMM Socket B11) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE57D6]
Memory 24 (DIMM Socket B12) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE57F6]
Checking every module is painful. Is it possible to get the global status of all memory modules without checking every single module?
Thank you in advance
barosch
Hi Dung,
thanks for your script, it's so helpful for me. :-)
When I check all components of a PowerEdge R730xd, I always receive a warning state. I think that's because some SSDs are in "PowerStat: 5(!)" (and not e.g. "PowerStat: SPUNUP", like the HDDs). You can see this in my output at PDisk 16 + 17 (SSDs) when checking all states:
PS
--PS 1: OK, Volt I/O: 264 V/(N/A) V, Current: 0.8 A, Watt I/O: 900.0 W/750 W
--PS 2: OK, Volt I/O: 264 V/(N/A) V, Current: 0.2 A, Watt I/O: 900.0 W/750 W
FAN
--System Board Fan1: 3600 RPM - ENABLED/OK
--System Board Fan2: 3720 RPM - ENABLED/OK
--System Board Fan3: 3600 RPM - ENABLED/OK
--System Board Fan4: 3480 RPM - ENABLED/OK
--System Board Fan5: 3600 RPM - ENABLED/OK
--System Board Fan6: 3600 RPM - ENABLED/OK
BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC1 ROMB Battery: ENABLED/OK [PRESENCEDETECTED]
PU
--PU 1: ENABLED/OK, RedundancyStatus: FULL, SystemBoard Pwr Consumption: 182 W
MEM
--Memory 1 (DIMM Socket A1) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ]
--Memory 2 (DIMM Socket A2) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 3 (DIMM Socket A3) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 4 (DIMM Socket A4) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 5 (DIMM Socket B1) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 6 (DIMM Socket B2) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 7 (DIMM Socket B3) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 8 (DIMM Socket B4) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
VDISK
--VDisk 1 (System): OK/ONLINE, RAID-1 (185.75 GB), BadBlock: 0 [Virtual Disk 0 on Integrated RAID Controller 1]
--VDisk 2 (Backup2Disk): OK/ONLINE, RAID-6 (7820.75 GB), BadBlock: 0 [Virtual Disk 1 on Integrated RAID Controller 1]
--VDisk 3 (Datadepot): OK/ONLINE, RAID-5 (4469.0 GB), BadBlock: 0 [Virtual Disk 2 on Integrated RAID Controller 1]
DISK
--PDisk 1 (0:1:0) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 2 (0:1:1) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 3 (0:1:2) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 4 (0:1:3) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 5 (0:1:4) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 6 (0:1:5) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 7 (0:1:6) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 8 (0:1:7) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 9 (0:1:8) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 10 (0:1:12) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 11 (0:1:13) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 12 (0:1:14) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 13 (0:1:15) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 14 (0:1:16) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 15 (0:1:23) 1117.25 GB: READY, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 16 (0:1:24) 185.75 GB: ONLINE, PowerStat: 5(!), HotSpare: no [TOSHIBA, S/N: *********]
--PDisk 17 (0:1:25) 185.75 GB: ONLINE, PowerStat: 5(!), HotSpare: no [TOSHIBA, S/N: ***********]
SENSOR
--System Board Inlet Temp: 27.0 C ENABLED/OK
--System Board Exhaust Temp: 38.0 C ENABLED/OK
--CPU1 Temp: 44.0 C ENABLED/OK
--CPU2 Temp: 47.0 C ENABLED/OK
CPU
--CPU 1 (8 cores/16 threads): ENABLED/OK [Intel(R) Xeon(R) CPU E5-2630L v3 @ 1.80GHz]
--CPU 2 (8 cores/16 threads): ENABLED/OK [Intel(R) Xeon(R) CPU E5-2630L v3 @ 1.80GHz]
Also a screenshot of my iDRAC Web GUI at PDisk 16 (0:1:24):
Thanks in advance for any help solving the problem!
Greetings from Germany
Nick
Dear @dangmocrang!
Thanks for your great job creating and maintaining this awesome check.
In my setup it is always showing "PS" (first line of output of script) - as you see in the screenshots.
My question:
Is there a way to show more information in the "Status Information" field when there is an error with hardware?
My Nagios configuration - commands.cfg:
# 'check_idrac_full' command definition
define command{
    command_name    check_idrac_full
    command_line    /usr/local/nagios/libexec/check_idrac/idrac_2.2rc4 -H $HOSTADDRESS$ -f /usr/local/nagios/libexec/check_idrac/idrac_2.1.conf -m /usr/local/nagios/libexec/check_idrac/idrac-smiv2.mib
}
My Nagios configuration - hosts.cfg:
define service{
    use                     generic-service
    host_name               dell
    service_description     iDRAC Full
    check_command           check_idrac_full
    notifications_enabled   0
}
Thank you and greeeeetz from Austria,
Florian Miesenberger
[root@host dell_monitors]# ./check_idrac -m /root/.snmp/mibs/idrac-smiv2.mib -H 74.201.248.76 -v2c -v2c -c public -w FAN#1 --fan-warn=4000,6000 --fan-crit=3000,7000 -p
Traceback (most recent call last):
  File "./check_idrac", line 842, in <module>
    result, exit_code = PARSER().main()
  File "./check_idrac", line 671, in main
    perf_data = ' | RPM=%s;%s;%s;;' % (hw[3].split('(')[0], conf['fan_min_warn'], conf['fan_min_crit'])
KeyError: 'fan_min_warn'
[root@host dell_monitors]# vi check_idrac
[root@host dell_monitors]# ./check_idrac -m /root/.snmp/mibs/idrac-smiv2.mib -H 74.201.248.76 -v2c -v2c -c public -w FAN#1 --fan-warn=4000,6000 --fan-crit=3000,7000 -p -f conf.conf
Traceback (most recent call last):
  File "./check_idrac", line 842, in <module>
    result, exit_code = PARSER().main()
  File "./check_idrac", line 671, in main
    perf_data = ' | RPM=%s;%s;%s;;' % (hw[3].split('(')[0], conf['fan_min_warn'], conf['fan_min_crit'])
KeyError: 'fan_min_warn'
And for all fans:
[root@host dell_monitors]# ./check_idrac -m /root/.snmp/mibs/idrac-smiv2.mib -H 74.201.248.76 -v2c -v2c -c public -w FAN --fan-warn=4000,6000 --fan-crit=3000,7000
System Board Fan1A: 4920 RPM - ENABLED/OK
System Board Fan1B: 4440 RPM - ENABLED/OK
System Board Fan2A: 4920 RPM - ENABLED/OK
System Board Fan2B: 4440 RPM - ENABLED/OK
System Board Fan3A: 4920 RPM - ENABLED/OK
System Board Fan3B: 4440 RPM - ENABLED/OK
System Board Fan4A: 4920 RPM - ENABLED/OK
System Board Fan4B: 4440 RPM - ENABLED/OK
System Board Fan5A: 4920 RPM - ENABLED/OK
System Board Fan5B: 4440 RPM - ENABLED/OK
System Board Fan6A: 4680 RPM - ENABLED/OK
System Board Fan6B: 4440 RPM - ENABLED/OK
[root@host dell_monitors]# ./check_idrac -m /root/.snmp/mibs/idrac-smiv2.mib -H 74.201.248.76 -v2c -v2c -c public -w FAN --fan-warn=4000,6000 --fan-crit=3000,7000 -p
Traceback (most recent call last):
  File "./check_idrac", line 842, in <module>
    result, exit_code = PARSER().main()
  File "./check_idrac", line 671, in main
    perf_data = ' | RPM=%s;%s;%s;;' % (hw[3].split('(')[0], conf['fan_min_warn'], conf['fan_min_crit'])
KeyError: 'fan_min_warn'
But when run without -p
[root@host dell_monitors]# ./check_idrac -m /root/.snmp/mibs/idrac-smiv2.mib -H 74.201.248.76 -v2c -v2c -c public -w FAN#1 --fan-warn=4000,6000 --fan-crit=3000,7000
OK - System Board Fan1A: 4920 RPM - ENABLED/OK
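The tracebacks show that `conf['fan_min_warn']` does not exist when the perfdata string is built. Which keys the `--fan-warn` and `--fan-crit` options actually populate is not visible from this report, so the following is only a sketch of tolerating a missing key; the function name is hypothetical.

```python
def fan_perf_data(rpm_field, conf):
    """Build the RPM perfdata suffix, leaving thresholds empty when they are not set."""
    rpm = rpm_field.split("(")[0]  # drop a trailing '(!)' or '(!!)' marker
    warn = conf.get("fan_min_warn", "")
    crit = conf.get("fan_min_crit", "")
    return " | RPM=%s;%s;%s;;" % (rpm, warn, crit)
```

Using `dict.get` with an empty default keeps the perfdata valid, since Nagios allows empty warn and crit fields.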
Hello
I think that when the credentials are wrong or parameters are missing, the check should return the UNKNOWN state and a valid error message.
For example, this applies to:
Bad operator (INTEGER): At line 73 in /usr/share/mibs/ietf/SNMPv2-PDU
Unlinked OID in IPATM-IPMC-MIB: marsMIB ::= { mib-2 57 }
Undefined identifier: mib-2 near line 18 of /usr/share/mibs/ietf/IPATM-IPMC-MIB
Expected "::=" (RFC5644): At line 493 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Expected "{" (EOF): At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Bad object identifier: At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Bad parse of OBJECT-IDENTITY: At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
snmpwalk: Authentication failure (incorrect password, community or key) (Sub-id not found: (top) -> virtualDiskTable)
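A minimal sketch of the suggested behaviour, assuming the plugin has the combined output of the SNMP command available as a string. Exit code 3 is the Nagios code for UNKNOWN; the function names and message texts are hypothetical.

```python
import sys

UNKNOWN = 3  # Nagios exit code for the UNKNOWN state


def classify_snmp_failure(output):
    """Return a one-line UNKNOWN message for common SNMP failures, or None."""
    if "Authentication failure" in output:
        return "UNKNOWN - SNMP authentication failed (check community or credentials)"
    if "Timeout" in output:
        return "UNKNOWN - no SNMP response from host"
    if "Unknown Object Identifier" in output:
        return "UNKNOWN - OID not found (MIB file may be missing or outdated)"
    return None


def exit_unknown_if_failed(output):
    """Print a single-line message and exit with UNKNOWN when a failure is recognised."""
    message = classify_snmp_failure(output)
    if message is not None:
        print(message)
        sys.exit(UNKNOWN)
```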
Hello,
Thanks for the plugin. I tried to check iDRAC v8 and it worked, but I cannot see the disk and vdisk information with the command "./check_idrac_2.2 -H x.x.x.x -v 2c -c secret". I use the MIB file from Dell, iDRAC-SMIv2.mib.
How can I debug this to fix my problem?
Thank you.
On some Dell hosts I get the following error...
Plugin output
PS
--PS 1: OK, Volt I/O: 264 V/228.0 V, Current: 0.8 A, Watt I/O: 900.0 W/750.0 W
--PS 2: OK, Volt I/O: 264 V/228.0 V, Current: 0.2 A, Watt I/O: 900.0 W/750.0 W
FAN
--System Board Fan1: 4560 RPM - ENABLED/OK
--System Board Fan2: 4680 RPM - ENABLED/OK
--System Board Fan3: 4680 RPM - ENABLED/OK
--System Board Fan4: 4560 RPM - ENABLED/OK
--System Board Fan5: 4560 RPM - ENABLED/OK
--System Board Fan6: 4440 RPM - ENABLED/OK
BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC1 ROMB Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC2 ROMB Battery: ENABLED/OK [0]
PU
--PU 1: ENABLED/OK, RedundancyStatus: FULL, SystemBoard Pwr Consumption: 196 W
MEM
--Memory 1 (DIMM Socket A1) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 2 (DIMM Socket A2) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 3 (DIMM Socket A3) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 4 (DIMM Socket A4) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 5 (DIMM Socket A5) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 6 (DIMM Socket B1) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 7 (DIMM Socket B2) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 8 (DIMM Socket B3) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 9 (DIMM Socket B4) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 10 (DIMM Socket B5) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 11 (DIMM Socket A7) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 12 (DIMM Socket A8) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 13 (DIMM Socket B7) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 14 (DIMM Socket B8) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
VDISK
--VDisk 1 (): OK/ONLINE, RAID-1 (136.13 GB), BadBlock: 0 [Virtual Disk 0 on Integrated RAID Controller 1]
DISK
--PDisk 1 (0:1:0) 136.13 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [TOSHIBA, S/N: YYYYYYYYYYYY]
--PDisk 2 (0:1:1) 136.13 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [TOSHIBA, S/N: YYYYYYYYYYYY]
Traceback (most recent call last):
  File "/usr/lib/nagios/plugins/check_dell_idrac.py", line 885, in <module>
    result, tmp_code = PARSER(host, hw_info, hw_order, hw_no_alert, hw_mib, perf).main()
  File "/usr/lib/nagios/plugins/check_dell_idrac.py", line 718, in main
    else: hw_3 = round(float(hw[3])/10, 1)
ValueError: could not convert string to float: "CPU1 TEMP"
On one other host I get a timeout (like this):
PS
--PS 1: OK, Volt I/O: 264 V/(N/A) V, Current: 0.2 A, Watt I/O: 900.0 W/750.0 W
--PS 2: OK, Volt I/O: 264 V/(N/A) V, Current: 21.0 A, Watt I/O: 900.0 W/750.0 W
FAN
--System Board Fan1: 3000 RPM - ENABLED/OK
--System Board Fan2: 2880 RPM - ENABLED/OK
--System Board Fan3: 3000 RPM - ENABLED/OK
--System Board Fan4: 2880 RPM - ENABLED/OK
--System Board Fan5: 3000 RPM - ENABLED/OK
--System Board Fan6: 3000 RPM - ENABLED/OK
BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
PU
--PU 1: ENABLED/OK, RedundancyStatus: FULL, SystemBoard Pwr Consumption: 210 W
MEM
--Memory 1 (DIMM Socket A1) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 2 (DIMM Socket A2) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 3 (DIMM Socket A3) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 4 (DIMM Socket A4) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 5 (DIMM Socket A5) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 6 (DIMM Socket A6) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 7 (DIMM Socket A7) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 8 (DIMM Socket A8) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 9 (DIMM Socket B1) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 10 (DIMM Socket B2) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 11 (DIMM Socket B3) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 12 (DIMM Socket B4) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 13 (DIMM Socket B5) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 14 (DIMM Socket B6) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 15 (DIMM Socket B7) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 16 (DIMM Socket B8) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
VDISK
--VDisk 1 (Volume0): OK/ONLINE, RAID-1 (185.75 GB), BadBlock: 0 [Virtual Disk 0 on Integrated RAID Controller 1]
DISK
--PDisk 1 (0:1:0) 185.75 GB: ONLINE, PowerStat: ON, HotSpare: no [ATA, S/N: YYYYYYYYYYYY]
--PDisk 2 (0:1:1) 185.75 GB: ONLINE, PowerStat: ON, HotSpare: no [ATA, S/N: YYYYYYYYYYYY]
SNMP Timeout!
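Regarding the ValueError in the first output: the traceback suggests that `hw[3]` held the probe name ("CPU1 TEMP") where a numeric reading in tenths was expected, so the fields may be shifted on that host. The sketch below does not fix the field order; it only shows, under that assumption, how the conversion could avoid crashing. The function name is hypothetical.

```python
def tenths_to_units(raw):
    """Convert a string reading in tenths (e.g. '530') to units (53.0); None if not numeric."""
    try:
        return round(float(raw.strip().strip('"')) / 10, 1)
    except ValueError:
        return None
```

A `None` result could then be reported as UNKNOWN for that sensor instead of ending the whole check with a traceback.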
Hello,
Is it possible to treat the NON-RAID status as not being a warning? A lot of servers have NON-RAID as their normal setup.
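What such an option could do internally, as a sketch only: the plugin output in these reports marks alerting values with `(!)`, and a user-supplied ignore list would skip that marking for states the user considers normal. The function name and the ignore list are hypothetical; nothing in these reports shows that the plugin offers this.

```python
def mark_state(state, alert_states, ignored_states=frozenset()):
    """Append '(!)' only to states that should alert and were not ignored by the user."""
    if state in alert_states and state not in ignored_states:
        return state + "(!)"
    return state
```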
When using warning/critical limits, e.g.:
./idrac_2.2rc4 -p -H x.x.x.x -v2c -c public -m /idrac-smiv2.mib -w SENSOR#4 --temp-warn 29,40 --temp-crit 25,75
the (!)/(!!) marker is appended to the recorded numeric value of the monitored item:
tmp[key][stat_t] += '(!)'
Later threshold checks fail after this is added with the following error:
Traceback (most recent call last):
  File "./idrac_2.2rc4", line 841, in <module>
    result, exit_code = PARSER().main()
  File "./idrac_2.2rc4", line 679, in main
    hw_dict, exit_code = self.raise_alert(hw_dict, value_on_alert)
  File "./idrac_2.2rc4", line 582, in raise_alert
    if float(tmp[key][stat_t]) <= conf['sensor_thresholds'][0]:
As a workaround I've stripped these characters from the recorded value:
- if float(tmp[key][stat_t]) <= conf['sensor_thresholds'][0]:
+ if float(tmp[key][stat_t].strip('(!)')) <= conf['sensor_thresholds'][0]:
Maybe there's a better way to do it, but this did the trick for us.
Regards,
Ovidiu
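A slightly stricter variant of this workaround, as a sketch: `str.strip('(!)')` removes any of the characters `(`, `!`, `)` from both ends, which works here, while a regular expression can remove exactly one trailing marker. The helper name is hypothetical.

```python
import re

# Matches a trailing '(!)' or '(!!)' alert marker.
_ALERT_MARK = re.compile(r"\(!{1,2}\)$")


def numeric_reading(value):
    """Return the float reading with a trailing '(!)' or '(!!)' marker removed."""
    return float(_ALERT_MARK.sub("", value.strip()))
```

Keeping the numeric value and the alert marker in separate fields would avoid the problem entirely, but that would be a larger change to the script.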
Line 697 in the script should be replaced with
perf_data = ' | \'%s\'=%s;;;%s;%s' \
so that temperature labels containing spaces are quoted and read correctly by the monitoring system.
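The Nagios plugin guidelines require single quotes around a perfdata label that contains spaces, and a literal single quote inside a label is written as two single quotes. A small sketch of that rule, with hypothetical helper names:

```python
def perf_label(label):
    """Quote a perfdata label in single quotes when it contains spaces."""
    label = label.replace("'", "''")  # a literal quote is written as two single quotes
    return "'%s'" % label if " " in label else label


def perf_item(label, value, warn="", crit=""):
    """Build one 'label=value;warn;crit' perfdata item."""
    return "%s=%s;%s;%s" % (perf_label(label), value, warn, crit)
```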
Hi there,
I'm trying to monitor my Dell server hardware through the iDrac, but I get the error: SNMP timeout!
I have followed the instructions and copied the MIB and check_idrac files into the correct folder; the permissions should be OK as well.
I have created my command in Nagios, which is: "$USER1$/check_idrac -H
I'm able to ping the host from Nagios with the check_icmp command.
Any idea, please?
many thanks in advance,
djano
Hi
When running the latest version of this script, 2.2rc4, with a simple "./check_idrac -H x.x.x.x -v2c -c public", the output looks good until the end, where it displays the following error at the command line:
PS
--PS 1: OK, Volt I/O: 264 V/(N/A) V, Current: 0.1 A, Watt I/O: 1260.0 W/1100 W
--PS 2: OK, Volt I/O: 264 V/(N/A) V, Current: 0.0 A, Watt I/O: 1260.0 W/1100 W
DISK
--PDisk 1 (0:1:0) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: 0EGV1HGF]
--PDisk 2 (0:1:1) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: 0EGV79GF]
--PDisk 3 (0:1:2) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: 0EGV6BXF]
--PDisk 4 (0:1:3) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: 0EGV4X7F]
FAN
--System Board Fan1: 1440 RPM - ENABLED/OK
BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC ROMB Battery: ENABLED/OK [PRESENCEDETECTED]
PU
--PU 1: ENABLED/OK, RedundancyStatus: FULL, SystemBoard Pwr Consumption: 84 W
MEM
--Memory 1 (DIMM Socket A1) 8.0 GB/2400 MHz: ENABLED/OK [26, Micron Technology, S/N: 12E5CCF2]
--Memory 2 (DIMM Socket A2) 8.0 GB/2400 MHz: ENABLED/OK [26, Micron Technology, S/N: 12E5CEB0]
--Memory 3 (DIMM Socket A3) 8.0 GB/2400 MHz: ENABLED/OK [26, Micron Technology, S/N: 12E5CE55]
--Memory 4 (DIMM Socket A4) 8.0 GB/2400 MHz: ENABLED/OK [26, Micron Technology, S/N: 12E5CCBA]
VDISK
--VDisk 1 (DATA): OK/ONLINE, RAID-10 (2234.5 GB), BadBlock: 0 [Virtual Disk 0 on RAID Controller in Slot 3]
Traceback (most recent call last):
File "./check_idrac", line 847, in <module>
result, tmp_code = PARSER().main()
File "./check_idrac", line 643, in main
hw_dict = self.classifier(snmp_data, hw_dict) # classify data
File "./check_idrac", line 412, in classifier
item_order = int(_.split()[0].split('.')[-1])
ValueError: invalid literal for int() with base 10: 'Bad'
I believe this is why I get a status of WARNING when the check runs within Nagios. When the items are checked individually rather than all at once, the output is OK.
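The traceback shows the index being taken from the last dot-separated part of the first token on a line, which fails when a line starts with a word such as "Bad". A minimal sketch of a tolerant version follows. The helper name item_index and the sample line shapes in the usage note are hypothetical, and whether such lines should be skipped or reported is a design decision for the script.

```python
def item_index(line):
    """Return the trailing numeric index of the first token, or None."""
    tokens = line.split()
    if not tokens:
        return None
    try:
        return int(tokens[0].split(".")[-1])
    except ValueError:
        # For example a line whose first token is "Bad", as in the traceback.
        return None
```

With this, a line shaped like "name.3 = OK" gives 3, and a line that does not start with an indexed name gives None, so the caller can skip it instead of crashing.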
Hello, I can't run the script from the command line with Python.
I run: python idrac_2.2rc4 -H xxx -c public -v 2c
Then it says that no community name is specified and shows the snmpwalk usage:
USAGE: snmpwalk [OPTIONS] AGENT [OID]
Hello,
Many thanks for this script. It saved us when monitoring ESXi hosts of different versions through Nagios.
Everything is working fine, but I noticed a small bug.
All files are the default files from git, and I have the latest version.
On one server there is a warning for a correctable memory error. The check finds that the RAM DIMM is in the NONCRITICAL state, but it reports OK.
On other servers, when a component is NONCRITICAL, for example the ROM Battery, the Nagios check is in the Warning state.
[root@lhvmsrv120 check_idrac-master]# /usr/lib64/nagios/plugins/check_idrac-master/idrac_2.2rc4 -H 10.168.2.64 -v 2c -c public -m /usr/lib64/nagios/plugins/check_idrac-master/idrac-smiv2.mib -w MEM#6
OK - Memory 6 (DIMM Socket A6) 32.0 GB/2133 MHz: ENABLED/NONCRITICAL [26, Hynix Semiconductor, S/N: 10CAA11A]
Can you please check?
Thank you,
John
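One way to express the behaviour requested here is an explicit table from component status to Nagios exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is a minimal sketch. The mapping is an assumption covering only the statuses visible in the reports (OK, NONCRITICAL, CRITICAL), and the names are hypothetical rather than taken from the script.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping; only these three statuses appear in the reports above.
STATUS_TO_EXIT = {"OK": OK, "NONCRITICAL": WARNING, "CRITICAL": CRITICAL}


def exit_code_for(status):
    """Map a status such as "ENABLED/NONCRITICAL" to a Nagios exit code."""
    # Use the part after the last "/" so "ENABLED/OK" is read as "OK".
    health = status.split("/")[-1].strip().upper()
    return STATUS_TO_EXIT.get(health, UNKNOWN)
```

Under this mapping, the "ENABLED/NONCRITICAL" DIMM from the output above would produce exit code 1 instead of 0.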
Hi,
I've an interesting issue with check_idrac on one of our Dell servers (out of probably 100). When I'm running check_idrac, it successfully executes until "System Board Inlet Temp" and then it raises the following exception:
--PDisk 12 (0:1:12) 278.88 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: XXXXXXXX]
--PDisk 13 (0:1:13) 278.88 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: YYYYYYYY]
Traceback (most recent call last):
File "/etc/icinga2/scripts/idrac", line 842, in <module>
result, tmp_code = PARSER().main()
File "/etc/icinga2/scripts/idrac", line 688, in main
hw_dict, exit_code = self.raise_alert(hw_dict, value_on_alert)
File "/etc/icinga2/scripts/idrac", line 576, in raise_alert
tmp[key][stat_t] = float(tmp[key][stat_t].strip('(!)'))/10
ValueError: could not convert string to float: "System Board Inlet Temp"
On other servers the output looks like:
--PDisk 14 (0:1:13) 278.88 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: YYYYYYYY]
--System Board Inlet Temp: 21.0 C ENABLED/OK | Temperature=21.0;;;;
I did some debugging, and it looks like tmp[key][stat_t] holds different values on a working and a non-working server:
210
"System Board Inlet Temp"
Thank you,
Matthias
Compared to generation 1, I get a lot of SNMP timeouts when I ask the script to check all hardware (i.e. when I do not provide a -w option). I'm not sure why that is, but I've returned to generation 1 for the time being.
By the way, when the timeouts occur, v. 2.0b6 yields a WARNING (code 1). I don't think that is right: it should yield an UNKNOWN (code 3) state. So I suggest adjusting line 439 so that
sys.exit(1)
is changed to
sys.exit(3)
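In the Nagios plugin convention, exit code 3 (UNKNOWN) signals that the state could not be determined, which is the case when the device never answered. A minimal sketch of that decision follows. The function name is hypothetical, and detecting a timeout by looking for the word "timeout" in the collected output is an assumption, not the script's actual logic.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def exit_code_for_output(output, normal_code=OK):
    """Return UNKNOWN when the output reports a timeout, else the normal code."""
    # Assumption: a timeout is recognisable by the word "timeout" in the output.
    if "timeout" in output.lower():
        return UNKNOWN
    return normal_code
```

The caller would then pass the result to sys.exit, so that a timeout is shown as UNKNOWN rather than WARNING.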
Hi -
I can't seem to figure out why this does not output any perfdata that Nagios can consume. I read that --no-alert disables performance data, but even the bare output does not contain any performance data in the form Nagios expects. Nagios expects it pipe-delimited (everything after the | is perfdata), as label=value pairs (data_point=value_point) separated by whitespace, to indicate what should be plotted.
Did I miss something?
Thanks for your help! Awesome script overall!
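To make the expected layout concrete, here is a minimal sketch that splits a plugin output line into its text and its perfdata items. It only illustrates the format described above and is not how Nagios itself parses output. The function name and the sample line in the usage note are hypothetical.

```python
import shlex


def split_plugin_output(line):
    """Split "text | label=value ..." into (text, {label: value})."""
    text, _, perf = line.partition("|")
    items = {}
    # shlex keeps a single-quoted label with spaces together as one token.
    for token in shlex.split(perf):
        label, sep, rest = token.partition("=")
        if sep:
            items[label] = rest
    return text.strip(), items
```

For a line such as "OK - Inlet Temp: 21.0 C | 'Inlet Temp'=21.0;;;; fan1=1440", the second element of the result holds one entry per plotted data point.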
Hi,
I would like to use your script to monitor my Dell iDRAC, but when I launch it against any of my hosts, I get the following error:
Traceback (most recent call last):
File "./idrac_2.2rc4", line 848, in <module>
result, tmp_code = PARSER().main()
File "./idrac_2.2rc4", line 765, in main
else: hw_4 = float(hw[4])/10
ValueError: could not convert string to float: (n/a)
Do you have any idea how to address this, please?
Guillaume T.
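The conversion in the traceback divides a raw reading by 10, which fails when the device reports a placeholder such as "(n/a)" instead of a number. A minimal sketch of a guarded conversion follows. The helper name tenths_to_units is hypothetical, and returning None for a missing value is one possible design choice, not the script's behaviour.

```python
def tenths_to_units(raw):
    """Convert a reading given in tenths (e.g. "210") to units, or None."""
    try:
        return float(raw) / 10
    except (TypeError, ValueError):
        # Non-numeric placeholder such as "(n/a)", or a missing value.
        return None
```

The caller can then print "(N/A)" when the result is None rather than aborting the whole check.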
Hi,
When I remove power from a PS to test a critical problem, I get a "hardware not found" message for it (PS#2).
/usr/lib/nagios/plugins/check_idrac.py -H 10.x.x.x -v2c -c public -w PS#1
OK - PS 1: OK, Volt I/O: 264 V/208 V, Current: 0.4 A, Watt I/O: 900.0 W/750 W
/usr/lib/nagios/plugins/check_idrac.py -H 10.x.x.x -v2c -c public -w PS#2
hardware not found! If you sure the hw exists then you may want to edit TRANSLATOR code (line 612).
/usr/lib/nagios/plugins/check_idrac.py -H 10.x.x.x -v2c -c public -w PS
PS 1: OK, Volt I/O: 264 V/208 V, Current: 0.4 A, Watt I/O: 900.0 W/750 W
PS 2: CRITICAL(!!), Volt I/O: 264 V/(N/A) V, Current: (N/A) A, Watt I/O: 900.0 W/750 W
I was able to correct this, but I don't think the solution is a permanent one.
I added these lines at line 388:
if self.hardware[2] == 'PS':
    output = output.replace('No Such Instance currently exists at this OID','0')
else:
That way I get a CRIT on PS#2.
/usr/lib/nagios/plugins/check_idrac.py -H 10.x.x.x -v2c -c public -w PS#2
CRIT - PS 2: CRITICAL(!!), Volt I/O: 264 V/(N/A) V, Current: 0.0 A, Watt I/O: 900.0 W/750 W
Regards,
Mike
Hi,
Is it possible to raise a WARNING when a NONCRITICAL error occurs?
Via iDRAC I found "Correctable memory error rate exceeded for DIMM_B12".
This is reported via this script as:
Memory 24 (DIMM Socket B12) 32.0 GB/1333 MHz: ENABLED/NONCRITICAL [DDR3, Hynix Semiconductor, S/N: 318E5BCD]
But the exit code is still 0 and a status of OK is returned in Nagios. I would like to see this as a WARNING state.
Thanks for a great plugin.
Right now, when I use the script to check one of my servers, it prints all the hardware status, even if it's OK. This is a lot of data to sort through.
Is there a way to only output hardware status if the hardware has a problem?
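Since the script appends "(!)" or "(!!)" to values it considers a problem, one possible way to shorten the output is to keep only lines carrying such a marker. This is a minimal sketch that post-processes the script's output lines. The function name only_alerts is hypothetical, and it assumes every problem is marked, which the NONCRITICAL reports in this thread suggest is not always true.

```python
def only_alerts(lines):
    """Keep only the output lines that carry a (!) or (!!) marker."""
    return [line for line in lines if "(!)" in line or "(!!)" in line]
```

Applied to the two PS lines shown in an earlier report, it would keep only the line for PS 2.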
I am kind of confused:
Where are the Python files you are talking about in the howto?
This check seems to be exactly what I am looking for. Please help.
Regards
Hi,
I tested your script today and saw that there is never any data for the fans. Is this correct?
Regards,
Marcus
The following occurs when I run check_idrac against a server with iDRAC 8 that has currently lost redundancy on 1 PSU:
Traceback (most recent call last):
File "./check_idrac", line 874, in <module>
result, tmp_code = PARSER().main()
File "./check_idrac", line 773, in main
else: hw_4 = float(hw[4])/10
I (temporarily) fixed this by changing line 773 as follows:
else: hw_4 = hw[4].split('(')[0]
I'm no advanced programmer, and I have no idea if that was the proper thing to do, but it seemed reasonable as I was looking at some of the surrounding code.
It should be noted that there is no issue with the original code when it runs against a healthy iDRAC 8 machine.
Checking all hardware with a config file does not work: it shows a "SNMP version None not supported" error message. Checking a single hardware item with a config file works fine. Is a config_check missing in the check_all part?
./idrac_2.0b5 -H SOME.HOST.NAME -c public
yields:
coercing to Unicode: need string or buffer, NoneType found