
check_idrac's People

Contributors

andrewpkent, ballestr, chas0rde, dangmocrang, giudig, odenbach, ovidiustanila, technotaff, theinvisible

check_idrac's Issues

Memory alert

Hi guys,

I have DIMM Socket B6 with status NONCRITICAL and I don't receive any alert. The result is below; I'm using the new version 2.2rc4.

Memory 1 (DIMM Socket A1) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: D2268486]
Memory 2 (DIMM Socket A2) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: B8267986]
Memory 3 (DIMM Socket A3) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CD268D86]
Memory 4 (DIMM Socket A4) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CA267D86]
Memory 5 (DIMM Socket A5) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CE268186]
Memory 6 (DIMM Socket A6) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CE267E86]
Memory 7 (DIMM Socket A7) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CE226FB9]
Memory 8 (DIMM Socket A8) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CE226EB9]
Memory 9 (DIMM Socket B1) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: D2268F86]
Memory 10 (DIMM Socket B2) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: D1268286]
Memory 11 (DIMM Socket B3) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: 2826E089]
Memory 12 (DIMM Socket B4) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CD267D86]
Memory 13 (DIMM Socket B5) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CA268386]
Memory 14 (DIMM Socket B6) 8.0 GB/1333 MHz: ENABLED/NONCRITICAL [DDR3, Kingston, S/N: CD267986]
Memory 15 (DIMM Socket B7) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: CF267C86]
Memory 16 (DIMM Socket B8) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Kingston, S/N: D1267A86]

Regards,

Marcos Dutra

iDRAC8 / Poweredge R930 / DISK ON(!)

Hi!
DISK check is WARNING, because it returns ON(!). Can you implement that state? Atm I have the -n option activated...which is not best practice :D Or am I doing smth wrong?

[Screenshot: 2015-11-05 at 14:09:42]

Thank you! Great great Plugin!

Handling of DEGRADED vDisk

First of all thanks for the plugin!
We're trying to implement it in our environment, but we have hit a dead end with monitoring the storage system.
We’re using a RAID 1 configuration for our system disk, and if one disk is removed or faulted it should show up as a critical event.
But in our tests, after removing one of the two disks, the result was only non-critical, and the system logged neither a warning nor a critical state.
Is there a way to set the alert threshold so that a failing or removed disk triggers a warning?
Thank you.
Best Regards!

Specifying hardware number requires configuration file?

The example in MANUAL.md shows checking a specific DIMM and not using a configuration file, but when I try it states that I need to use a configuration file.

Check specific hardware
./idrac2.1.py -H 10.10.10.20 -v2c -c public -w MEM#3

Using -v2c public for snmpwalk calls

Thank you for making this plugin and putting it all together.

Can I use the equivalent of -v2c public ?

I get the following errors specifying this in the configuration file

[snmp]
version = '2c'
community = 'public'

This is the error I get in Nagios:

iDRAC Server Health;WARNING;notify-service-by-email;snmp version "None" is not supported!

When I run this manually and append -v2c everything works however - e.g.

/usr/lib64/nagios/plugins/idrac_2.2rc4 -H 10.12.64.131 -v2c -c public -m /usr/share/snmp/mibs/idrac-smiv2.mib

I've also tried version = '2' to no avail.

I see it's unset here in the code: https://github.com/dangmocrang/check_idrac/blob/master/idrac_2.2rc4#L17 but seems ugly to force it there, I'd like to use the config file you are shipping.

What am I doing wrong?
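
One possible explanation, not confirmed against the plugin source: INI parsers such as Python's `configparser` return quoted values verbatim, so `'2c'` (quotes included) would not match a lookup keyed on `2c`. The sketch below shows the effect; the `unquote` helper is hypothetical, and the section and option names are taken from the snippet above.

```python
import configparser

CONFIG_TEXT = """
[snmp]
version = '2c'
community = 'public'
"""


def unquote(value):
    """Strip one pair of matching single or double quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# configparser keeps the quotes as part of the value.
raw_version = parser.get("snmp", "version")
snmp_version = unquote(raw_version)
```

If this is the cause, writing `version = 2c` without quotes in the configuration file may be enough.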

ROMB Battery returns value of 0 or 2147483648 when not available

Hi,

thanks for this neat plugin.
I noticed some issues with the battery module.

1.
The web interface is giving me two batteries:
System Board CMOS Battery - Good
PERC1 ROMB Battery - Good

I'm wondering why the plugin gives back PERC1 and 2 and the second with a "0"?

BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC1 ROMB Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC2 ROMB Battery: ENABLED/OK [0(!)]

I think this shouldn't be listed (or can I catch this state in the conf?)

2.
Another server but maybe the same issue:

The web interface just has:
System Board CMOS Battery - Good

Plugin returns:

BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC1 ROMB Battery: ENABLED/OK [-2147483648(!)]
--PERC2 ROMB Battery: ENABLED/OK [-2147483648(!)]

Regards
Ricardo
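
A minimal sketch of how such rows could be hidden, assuming that a reading of `0` or `-2147483648` (the minimum 32-bit signed integer) means "no battery present". That interpretation is a guess based on the output above, and `present_batteries` is a hypothetical helper, not part of the plugin.

```python
INT32_MIN = -2147483648

# Assumption: these raw readings mean the battery slot is not populated.
ABSENT_READINGS = {"0", str(INT32_MIN)}


def present_batteries(rows):
    """Keep only (name, reading) rows whose reading is not a sentinel value."""
    return [(name, reading) for name, reading in rows
            if reading.strip() not in ABSENT_READINGS]


rows = [
    ("System Board CMOS Battery", "PRESENCEDETECTED"),
    ("PERC1 ROMB Battery", "-2147483648"),
    ("PERC2 ROMB Battery", "0"),
]
visible = present_batteries(rows)
```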

Unable to use specific hardware such as FAN#1 in nagios config

Because the Nagios config file treats # as a comment, I am unable to use # in a specific hardware check such as FAN#1:
./check_idrac -m /usr/share/snmp/mibs/idrac.mib -H -v2c -c @apfin@nce -w FAN#1
When the service is defined in the Nagios config, everything after # is taken as a comment. Can we use "-" instead of "#"? Please reply.
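
A sketch of what accepting both separators could look like. `parse_hardware_spec` is a hypothetical helper, not the plugin's actual argument handling; it only assumes the `GROUP#N` form shown in the command above.

```python
import re

# Group name, optionally followed by '#' or '-' and an item number.
_SPEC = re.compile(r"^([A-Za-z]+)(?:[#-](\d+))?$")


def parse_hardware_spec(spec):
    """Return (group, index); index is None when the whole group is meant."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("unrecognised hardware spec: %r" % spec)
    group, index = match.group(1).upper(), match.group(2)
    return group, (int(index) if index is not None else None)
```

With this, `FAN-1` and `FAN#1` select the same item, so the Nagios definition never needs a `#`.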

output problems

Greetings.

I'm having trouble with the output of the check_idrac plugin.

The output and the performance data are not formatted correctly. According to https://nagios-plugins.org/doc/guidelines.html , the output should look like this:

SERVICE STATUS: text of output | performance data

However, the check_idrac plugin output looks like this:

System Board Inlet Temp: 23.0 C ENABLED/OK | Temperature=23.0;;;;
CPU1 Temp: 53.0 C ENABLED/OK | Temperature=53.0;;;;
CPU2 Temp: 46.0 C ENABLED/OK | Temperature=46.0;;;;

Nagios is not interpreting the output correctly, and this is causing problems.

Can you please look into this and adjust the output to match the Nagios plugin standards?
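
For illustration, a sketch of output shaped the way the guidelines describe: one summary line with all performance data after a single `|`, followed by the per-item detail lines. The service name `IDRAC_TEMP` and the function `format_plugin_output` are hypothetical; each perfdata label is made unique by using the sensor name instead of a repeated `Temperature`.

```python
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def format_plugin_output(service, code, summary, details=(), perfdata=()):
    """Build 'SERVICE STATUS: summary | perfdata' plus optional detail lines."""
    first = "%s %s: %s" % (service, STATUS_NAMES[code], summary)
    if perfdata:
        first += " | " + " ".join(perfdata)
    return "\n".join([first] + list(details))


example = format_plugin_output(
    "IDRAC_TEMP", 0, "3 sensors checked",
    details=[
        "System Board Inlet Temp: 23.0 C ENABLED/OK",
        "CPU1 Temp: 53.0 C ENABLED/OK",
        "CPU2 Temp: 46.0 C ENABLED/OK",
    ],
    perfdata=[
        "'System Board Inlet Temp'=23.0",
        "'CPU1 Temp'=53.0",
        "'CPU2 Temp'=46.0",
    ],
)
```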

Pulling a single PDisk gives different result than pulling the whole group.

Hi,

awesome tool, but we have some problems getting the status of our physical disks. Three values are switched from the group view in comparison to the view of a single disk which causes an unproblematic drive to throw a warning.

Here is an example:

[Screenshot: idrac_pdisk]

I assume that the order of the values returned by snmpget is different from the order returned by snmpwalk (which is used for groups and for all hardware), which then causes the wrong output and a false warning alert.

We have tested with different systems which all produce the same result.

When I set the value_on_alert array in line 655 to [3,8,9] instead of [3,7,9], I can get rid of the false warning but still get the switched output. To fix the output I have to change the order in lines 667 - 669, which then messes up the group output of PDisk.

If you need any more information, let me know,

best regards

Raphael Rehberg
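
If the cause is indeed the order of returned values, one way to make the parsing independent of it is to key each value by its column name and row index instead of its position. The sketch below assumes `-O q` style lines of the form `MIB::column.index value`, as in the snmpwalk output quoted elsewhere on this page; `parse_snmp_lines` is a hypothetical helper.

```python
def parse_snmp_lines(text):
    """Map row index -> {column name: value}, regardless of line order."""
    table = {}
    for line in text.splitlines():
        oid, sep, value = line.strip().partition(" ")
        if not sep or "::" not in oid:
            continue  # skip lines that are not 'MIB::column.index value'
        column, _, index = oid.split("::", 1)[1].partition(".")
        table.setdefault(index, {})[column] = value.strip().strip('"')
    return table


walk_order = (
    "IDRAC-MIB-SMIv2::coolingDeviceReading.1.1 3720\n"
    "IDRAC-MIB-SMIv2::coolingDeviceType.1.1 coolingDeviceTypeIsAFan\n"
)
get_order = (
    "IDRAC-MIB-SMIv2::coolingDeviceType.1.1 coolingDeviceTypeIsAFan\n"
    "IDRAC-MIB-SMIv2::coolingDeviceReading.1.1 3720\n"
)
```

Both orderings produce the same dictionary, so a lookup such as `table["1.1"]["coolingDeviceReading"]` does not depend on which command produced the data.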

--perf has no effect

[root@REDACTED ~]# /usr/local/nagios/libexec/idrac_2.0b9.py -H REDACTED -c REDACTED -w FAN --perf
System Board Fan1A: 2400 RPM - ENABLED/OK
System Board Fan1B: 1560 RPM - ENABLED/OK
System Board Fan2A: 3720 RPM - ENABLED/OK
System Board Fan2B: 2640 RPM - ENABLED/OK
System Board Fan3A: 2520 RPM - ENABLED/OK
System Board Fan3B: 1800 RPM - ENABLED/OK
System Board Fan4A: 2520 RPM - ENABLED/OK
System Board Fan4B: 1800 RPM - ENABLED/OK
System Board Fan5A: 2400 RPM - ENABLED/OK
System Board Fan5B: 1800 RPM - ENABLED/OK
[root@REDACTED ~]# /usr/local/nagios/libexec/idrac_2.0b9.py -H REDACTED -c REDACTED -w FAN
System Board Fan1A: 2400 RPM - ENABLED/OK
System Board Fan1B: 1560 RPM - ENABLED/OK
System Board Fan2A: 3720 RPM - ENABLED/OK
System Board Fan2B: 2640 RPM - ENABLED/OK
System Board Fan3A: 2520 RPM - ENABLED/OK
System Board Fan3B: 1800 RPM - ENABLED/OK
System Board Fan4A: 2520 RPM - ENABLED/OK
System Board Fan4B: 1800 RPM - ENABLED/OK
System Board Fan5A: 2400 RPM - ENABLED/OK
System Board Fan5B: 1800 RPM - ENABLED/OK

Issue with PS check

Hi Dung,
Thanks for your great job regarding this check.
I ran into some issues with the PS check on different servers (2 x PE R420, 1 x PE R720, 2 x PE R620, 2 x PE R630, 2 x PE R430).
Let's have a look at a PE R630:
root@srv020:/usr/lib/nagios/plugins# ./check_idrac -H srv001-idrac -m ./idrac-smiv2.mib -c public -w PS -f check_idrac.conf
PS 1: OK, Volt I/O: 264 V/(N/A) V, Current: 0.6 A, Watt I/O: 594.0 W/495.0 W
PS 2: OK, Volt I/O: 264 V/(N/A) V, Current: 0.2 A, Watt I/O: 594.0 W/495.0 W
root@srv020:/usr/lib/nagios/plugins# ./check_idrac -H srv001-idrac -m ./idrac-smiv2.mib -c public -w PS#1 -f check_idrac.conf
CRIT - PS 1: OK, Volt I/O: 264 V/0(!!) V, Current: 0.6(!!) A, Watt I/O: 594.0 W/495(!!) W
root@srv020:/usr/lib/nagios/plugins# ./check_idrac -H srv001-idrac -m ./idrac-smiv2.mib -c public -w PS#2 -f check_idrac.conf
CRIT - PS 2: OK, Volt I/O: 264 V/0(!!) V, Current: 0.2(!!) A, Watt I/O: 594.0 W/495(!!) W

My check_idrac.conf is the default one. Why does it quit with Critical status whereas the comment is OK?
I got this result with each server. The check runs on Debian Wheezy with Python 2.7.3

Thanks for your help.

Bruno

your MIB may out of dated!

Hello,

Thank you for your wonderful plugin, with the latest version I keep getting this error :

your MIB may out of dated!

I downloaded the latest MIBs from Dell. I'm running a Dell R730, but I get the same error no matter what I try:

./idrac_2.2rc4 -H 192.168.1.24 -v2c -c public -m /root/tools/check_idrac-master/mymibs/mibs/iDRAC-SMIv2.mib

Here are my specs :

FreeBSD nagios 11.0-RELEASE-p2 FreeBSD 11.0-RELEASE-p2 #0: Mon Oct 24 06:55:27 UTC 2016 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64

root@nagios:~/tools/check_idrac-master # python --version
Python 2.7.13

Can you please help me out with this.

index out of range

Hello, I'm running iDRAC 9 on an R620 server.
This is the output of my command: ./idrac_2.2rc4 -H 10.50.19.66 -c public -v2c -m ./iDRAC-SMIv2.mib
PS

Traceback (most recent call last):
  File "./idrac_2.2rc4", line 848, in <module>
    result, tmp_code = PARSER().main()
  File "./idrac_2.2rc4", line 662, in main
    output.append(tmp % (self.hardware[2], hw[0], hw[1].split()[-1].replace('"', ''),
IndexError: list index out of range

Any idea ?
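
One expression on the failing line that can raise this error is `hw[1].split()[-1]`: when the value is empty or only whitespace, `split()` returns an empty list and `[-1]` fails. Whether that is what happens here is an assumption (a too-short `hw` list would raise the same error). A hypothetical `last_token` helper that tolerates empty values could look like this:

```python
def last_token(text, default=""):
    """Return the last whitespace-separated token without double quotes.

    Falls back to `default` when the text has no tokens at all.
    """
    parts = text.split()
    return parts[-1].replace('"', "") if parts else default
```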

PERC BBU is not available on systemBattery

After upgrading the firmware of iDrac 7 to 2.60.60.60, systemBattery group (1.3.6.1.4.1.674.10892.5.4.600.50.1.5) does not have information about the PERC battery (BBU) anymore.

The following OID still works:
1.3.6.1.4.1.674.10892.5.5.1.20.130.15

EMPTY values output

Hello there.

I have a problem with using this script.

[root@localhost check_idrac-master]# ./idrac_2.0b7 -H localhost -c public

PS

FAN

BATTERY

PU

MEM

VDISK

DISK

SENSOR

CPU


Can somebody help with this issue?
Thanks,

systemStateGlobalSystemStatus

Hello,

I have been using the script for two days now. I have new Dell servers, and one of them failed today, but I got no Nagios messages.

In Nagios I see:

systemStateGlobalSystemStatus critical

But there are no alerts; the service shows a green (OK) message.
When I change the --critical= value, it gives the same green message.

DISK NON-RAID flag generates a warning

Hi,

I just configured your plugin with Nagios and managed to make it work with a server that uses iDRAC9. I had to use the latest version you posted on another issue, because I was having the same problem with warnings on disks when they are queried individually.
But I still have warnings on most of my disks like this one:

PDisk 5 (0:1:0) 3726.02 GB: NON-RAID(!), PowerStat: SPUNUP, HotSpare: no [TOSHIBA, HDD, S/N: 39A8K5Z7F7DE] isFailing: 0

This generates a warning in Nagios, I am guessing because of the NON-RAID flag. However, it is supposed to be like that: those disks are not attached to a RAID controller, as they form an array of disks used in a ZFS pool.

I think a flag to suppress this warning is needed.

Otherwise great plugin!

Thanks.
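
A sketch of what such a switch could do internally, assuming the plugin compares each disk state against a list of acceptable states. `disk_state_code` and its `extra_ok_states` parameter are hypothetical; `ONLINE` and `READY` are used as defaults only because they appear without a `(!)` marker in output quoted on this page.

```python
# Assumption: states that never raise an alert by default.
DEFAULT_OK_STATES = frozenset(["ONLINE", "READY"])


def disk_state_code(state, extra_ok_states=()):
    """Return 0 (OK) for accepted states and 1 (WARNING) for anything else."""
    ok_states = DEFAULT_OK_STATES | {s.upper() for s in extra_ok_states}
    return 0 if state.upper() in ok_states else 1
```

A site that runs pass-through disks for ZFS would then pass `extra_ok_states=("NON-RAID",)`, while everyone else keeps the warning.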

Add IPv6 support

check_idrac currently does not support IPv6.

Adding IPv6 support is easy, because it only requires prefixing the host value with 'udp6:' when calling the SNMP commands.

Please find attached a patch:
idrac_2.2rc4_ipv6.patch.txt
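
The attached patch is not reproduced here. As a rough sketch of the idea, the hypothetical helper below adds the `udp6:` prefix only for IPv6 literals and leaves IPv4 addresses and hostnames untouched. Wrapping the literal in square brackets follows the Net-SNMP transport address syntax as I understand it; treat that detail as an assumption to verify against your Net-SNMP version.

```python
import ipaddress


def snmp_agent_address(host):
    """Return the agent address to hand to the Net-SNMP command line tools."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return host  # not an IP literal (e.g. a hostname): pass through
    if address.version == 6:
        return "udp6:[%s]" % address
    return host
```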

iDrac 8

Hi

Did you have any chance to test it on iDRAC 8? I'm getting an error.
[user@admin]$ python idrac_2.2rc1 -H 10.199.227.208 -c public -v2c

PS

FAN

your MIB may out of dated!
error - systemBattery: Unknown Object Identifier (Sub-id not found: (top) -> systemBattery)

Timeout

Hi,

I'm seeing a timeout for one of my dracs for the Fan check when using the Check_iDRAC for DELL iDRAC found on https://exchange.nagios.org/directory/P ... AC/details.

/usr/local/nagios/libexec/check_idrac_health -H 172.16.0.39 -v2c -c XXXXX

PS
--PS 1: OK, Volt I/O: 264 V/230 V, Current: 0.6 A, Watt I/O: 900.0 W/750 W
--PS 2: OK, Volt I/O: 264 V/230 V, Current: 0.2 A, Watt I/O: 900.0 W/750 W
SNMP timeout!

It's not exactly a timeout, though: if I comment out the elif statement that prints 'SNMP timeout!', as shown below, I can see that values for the Fan check are actually found. Can you please help with this issue?

def get_snmp(self, oids):
    cmd_v3 = '%s %s -O q -v %s -u %s -l %s -a %s -A %s -x %s -X %s %s -m %s' \
             % (self.snmp_command, self.host, conf['snmp_version'], conf['snmp_security_name'],
                conf['snmp_security_level'],
                conf['snmp_authentication_protocol'], conf['snmp_authentication_password'],
                conf['snmp_privacy_protocol'], conf['snmp_privacy_password'],
                oids, self.mib_file)
    cmd_v2 = '%s %s -O q -v %s -c %s %s -m %s' \
             % (self.snmp_command, self.host, conf['snmp_version'],
                conf['snmp_community'],
                oids, self.mib_file)
    available_cmd = {'3': cmd_v3, '2c': cmd_v2}
    snmp_cli = available_cmd[conf['snmp_version']]
    status, output = run(snmp_cli)  # query snmp data
    if status != 0:
        if 'Unknown Object Identifier' in output:
            print 'your MIB may out of dated!'
            print 'error -', output.replace('\n', '. ')
        #elif 'Timeout:' in output:
        #    print 'SNMP timeout!'
        else:
            print output
        sys.exit(1)

IDRAC-MIB-SMIv2::coolingDeviceReading.1.1 3720
IDRAC-MIB-SMIv2::coolingDeviceReading.1.2 3840
IDRAC-MIB-SMIv2::coolingDeviceReading.1.3 3840
IDRAC-MIB-SMIv2::coolingDeviceReading.1.4 3720
IDRAC-MIB-SMIv2::coolingDeviceReading.1.5 3720
IDRAC-MIB-SMIv2::coolingDeviceReading.1.6 3840
IDRAC-MIB-SMIv2::coolingDeviceReading.1.7 3840
IDRAC-MIB-SMIv2::coolingDeviceReading.1.8 2400
IDRAC-MIB-SMIv2::coolingDeviceReading.1.9 2400
IDRAC-MIB-SMIv2::coolingDeviceReading.1.10 2400
IDRAC-MIB-SMIv2::coolingDeviceReading.1.11 2280
IDRAC-MIB-SMIv2::coolingDeviceReading.1.12 2280
IDRAC-MIB-SMIv2::coolingDeviceReading.1.13 2400
IDRAC-MIB-SMIv2::coolTimeout: No Response from 172.16.0.39
ingDeviceReading.1.14 2280
IDRAC-MIB-SMIv2::coolingDeviceType.1.1 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.2 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.3 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.4 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.5 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.6 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.7 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.8 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.9 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.10 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.11 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.12 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.13 coolingDeviceTypeIsAFan
IDRAC-MIB-SMIv2::coolingDeviceType.1.14 coolingDeviceTypeIsAFan
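
The line `coolTimeout: No Response from 172.16.0.39` followed by `ingDeviceReading.1.14 2280` looks like an error message written into the middle of a data line, which would happen if the plugin's `run` helper merges stderr into stdout. That is an assumption; `run` is not shown. A sketch that captures the two streams separately, with a small Python child process standing in for snmpwalk:

```python
import subprocess
import sys


def run_command(argv, timeout=30):
    """Run argv and return (exit code, stdout text, stderr text) separately."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in for snmpwalk: one data line on stdout, a timeout notice on stderr.
child = [sys.executable, "-c",
         "import sys; print('coolingDeviceReading.1.14 2280'); "
         "sys.stderr.write('Timeout: No Response\\n'); sys.exit(1)"]
code, out, err = run_command(child)
```

With separate streams, the data in `out` stays parseable, and the caller can decide whether a non-zero exit code with partial data should be reported as a warning instead of aborting.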

check a group of

hi,
and thank you for this great plugin!
is it possible to monitor a group of hardware items in the nagios way without a check for every single module?
for example the memory. i use this check
idrac_2.0b9 -H 10.0.1.3 -f check_idrac.conf -p -w MEM
and i get this result:
Memory 1 (DIMM Socket A1) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD683]
Memory 2 (DIMM Socket A2) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD684]
Memory 3 (DIMM Socket A3) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD686]
Memory 4 (DIMM Socket A4) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD685]
Memory 5 (DIMM Socket B1) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD673]
Memory 6 (DIMM Socket B2) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD670]
Memory 7 (DIMM Socket B3) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD67F]
Memory 8 (DIMM Socket B4) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD682]
Memory 9 (DIMM Socket A5) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD583]
Memory 10 (DIMM Socket A6) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD582]
Memory 11 (DIMM Socket A7) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD61F]
Memory 12 (DIMM Socket A8) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD579]
Memory 13 (DIMM Socket B5) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD57F]
Memory 14 (DIMM Socket B6) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD581]
Memory 15 (DIMM Socket B7) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD57E]
Memory 16 (DIMM Socket B8) 8.0 GB/1333 MHz: ENABLED/OK [DDR3, Micron Technology, S/N: E09DD580]
Memory 17 (DIMM Socket A9) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE535B]
Memory 18 (DIMM Socket A10) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE575A]
Memory 19 (DIMM Socket A11) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE570C]
Memory 20 (DIMM Socket A12) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE5365]
Memory 21 (DIMM Socket B9) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE533E]
Memory 22 (DIMM Socket B10) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE534C]
Memory 23 (DIMM Socket B11) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE57D6]
Memory 24 (DIMM Socket B12) 16.0 GB/1600 MHz: ENABLED/OK [DDR3, Samsung, S/N: 01BE57F6]

checking every module is painful. is it possible to get the global status of all memory modules without checking every single module?
thank you in advance
barosch
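
A sketch of how a whole group could be reduced to one check result: take the worst status of all items, put a summary on the first line, and list only the items that are not OK. `summarize_group` is hypothetical, and the severity ordering (CRITICAL above UNKNOWN above WARNING) is a common convention, not something taken from the plugin.

```python
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
# Assumed ordering from least to most severe.
SEVERITY = [0, 1, 3, 2]


def summarize_group(name, items):
    """items is a list of (status code, text); return (worst code, output)."""
    if not items:
        return 3, "%s UNKNOWN: no items returned" % name
    worst = max((code for code, _ in items), key=SEVERITY.index)
    problems = [text for code, text in items if code != 0]
    if problems:
        summary = "%d of %d items not OK" % (len(problems), len(items))
    else:
        summary = "all %d items OK" % len(items)
    first = "%s %s: %s" % (name, STATUS_NAMES[worst], summary)
    return worst, "\n".join([first] + problems)
```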

Check of all components of Dell PowerEdge R730xd with SSDs gives warning

Hi Dung,

thanks for your script, it's so helpful for me. :-)
When I check all components of a PowerEdge R730xd, I always receive a warning state. I think that's because some SSDs are in "PowerStat: 5(!)" (and not e.g. "PowerStat: SPUNUP", like the HDDs). You can see this in my output at PDisk 16 + 17 (SSDs) when checking all states:

PS
--PS 1: OK, Volt I/O: 264 V/(N/A) V, Current: 0.8 A, Watt I/O: 900.0 W/750 W
--PS 2: OK, Volt I/O: 264 V/(N/A) V, Current: 0.2 A, Watt I/O: 900.0 W/750 W
FAN
--System Board Fan1: 3600 RPM - ENABLED/OK
--System Board Fan2: 3720 RPM - ENABLED/OK
--System Board Fan3: 3600 RPM - ENABLED/OK
--System Board Fan4: 3480 RPM - ENABLED/OK
--System Board Fan5: 3600 RPM - ENABLED/OK
--System Board Fan6: 3600 RPM - ENABLED/OK
BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC1 ROMB Battery: ENABLED/OK [PRESENCEDETECTED]
PU
--PU 1: ENABLED/OK, RedundancyStatus: FULL, SystemBoard Pwr Consumption: 182 W
MEM
--Memory 1 (DIMM Socket A1) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ]
--Memory 2 (DIMM Socket A2) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 3 (DIMM Socket A3) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 4 (DIMM Socket A4) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 5 (DIMM Socket B1) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 6 (DIMM Socket B2) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 7 (DIMM Socket B3) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
--Memory 8 (DIMM Socket B4) 8.0 GB/2133 MHz: ENABLED/OK [26, Samsung, S/N: ***]
VDISK
--VDisk 1 (System): OK/ONLINE, RAID-1 (185.75 GB), BadBlock: 0 [Virtual Disk 0 on Integrated RAID Controller 1]
--VDisk 2 (Backup2Disk): OK/ONLINE, RAID-6 (7820.75 GB), BadBlock: 0 [Virtual Disk 1 on Integrated RAID Controller 1]
--VDisk 3 (Datadepot): OK/ONLINE, RAID-5 (4469.0 GB), BadBlock: 0 [Virtual Disk 2 on Integrated RAID Controller 1]
DISK
--PDisk 1 (0:1:0) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 2 (0:1:1) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 3 (0:1:2) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 4 (0:1:3) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 5 (0:1:4) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 6 (0:1:5) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 7 (0:1:6) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 8 (0:1:7) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 9 (0:1:8) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 10 (0:1:12) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 11 (0:1:13) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 12 (0:1:14) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 13 (0:1:15) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 14 (0:1:16) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 15 (0:1:23) 1117.25 GB: READY, PowerStat: SPUNUP, HotSpare: no [SEAGATE, S/N: ***]
--PDisk 16 (0:1:24) 185.75 GB: ONLINE, PowerStat: 5(!), HotSpare: no [TOSHIBA, S/N: *********]
--PDisk 17 (0:1:25) 185.75 GB: ONLINE, PowerStat: 5(!), HotSpare: no [TOSHIBA, S/N: ***********]
SENSOR
--System Board Inlet Temp: 27.0 C ENABLED/OK
--System Board Exhaust Temp: 38.0 C ENABLED/OK
--CPU1 Temp: 44.0 C ENABLED/OK
--CPU2 Temp: 47.0 C ENABLED/OK
CPU
--CPU 1 (8 cores/16 threads): ENABLED/OK [Intel(R) Xeon(R) CPU E5-2630L v3 @ 1.80GHz]
--CPU 2 (8 cores/16 threads): ENABLED/OK [Intel(R) Xeon(R) CPU E5-2630L v3 @ 1.80GHz]

Also a screenshot of my iDRAC Web GUI at PDisk 16 (0:1:24):
[Screenshot: idrac]

Thanks in advance for any help solving the problem!

Greetings from Germany
Nick

Show more information in the "Status Information" field

Dear @dangmocrang!

Thanks for your great job creating and maintaining this awesome check.
In my setup it always shows "PS" (the first line of the script's output), as you can see in the screenshots.

[Screenshots: ok, crit, crit-with-details]

My question:
Is there a way to show more information in the "Status Information" field when there is an error with hardware?

My Nagios configuration - commands.cfg:

# 'check_idrac_full' command definition
define command{
        command_name       check_idrac_full
        command_line           /usr/local/nagios/libexec/check_idrac/idrac_2.2rc4 -H $HOSTADDRESS$ -f /usr/local/nagios/libexec/check_idrac/idrac_2.1.conf -m /usr/local/nagios/libexec/check_idrac/idrac-smiv2.mib
        }

My Nagios configuration - hosts.cfg:

define service{
        use                                  generic-service
        host_name                      dell
        service_description          iDRAC Full
        check_command             check_idrac_full
        notifications_enabled       0
        }

Thank you and greeeeetz from Austria,
Florian Miesenberger

performance data does not work (stack traces)

[root@host dell_monitors]# ./check_idrac -m /root/.snmp/mibs/idrac-smiv2.mib -H 74.201.248.76 -v2c -v2c -c public -w FAN#1 --fan-warn=4000,6000 --fan-crit=3000,7000 -p
Traceback (most recent call last):
  File "./check_idrac", line 842, in <module>
    result, exit_code = PARSER().main()
  File "./check_idrac", line 671, in main
    perf_data = ' | RPM=%s;%s;%s;;' % (hw[3].split('(')[0], conf['fan_min_warn'], conf['fan_min_crit'])
KeyError: 'fan_min_warn'
[root@host dell_monitors]# vi check_idrac
[root@host dell_monitors]# ./check_idrac -m /root/.snmp/mibs/idrac-smiv2.mib -H 74.201.248.76 -v2c -v2c -c public -w FAN#1 --fan-warn=4000,6000 --fan-crit=3000,7000 -p -f conf.conf
Traceback (most recent call last):
  File "./check_idrac", line 842, in <module>
    result, exit_code = PARSER().main()
  File "./check_idrac", line 671, in main
    perf_data = ' | RPM=%s;%s;%s;;' % (hw[3].split('(')[0], conf['fan_min_warn'], conf['fan_min_crit'])
KeyError: 'fan_min_warn'

And for all fans:

[root@host dell_monitors]# ./check_idrac -m /root/.snmp/mibs/idrac-smiv2.mib -H 74.201.248.76 -v2c -v2c -c public -w FAN --fan-warn=4000,6000 --fan-crit=3000,7000
System Board Fan1A: 4920 RPM - ENABLED/OK
System Board Fan1B: 4440 RPM - ENABLED/OK
System Board Fan2A: 4920 RPM - ENABLED/OK
System Board Fan2B: 4440 RPM - ENABLED/OK
System Board Fan3A: 4920 RPM - ENABLED/OK
System Board Fan3B: 4440 RPM - ENABLED/OK
System Board Fan4A: 4920 RPM - ENABLED/OK
System Board Fan4B: 4440 RPM - ENABLED/OK
System Board Fan5A: 4920 RPM - ENABLED/OK
System Board Fan5B: 4440 RPM - ENABLED/OK
System Board Fan6A: 4680 RPM - ENABLED/OK
System Board Fan6B: 4440 RPM - ENABLED/OK
[root@host dell_monitors]# ./check_idrac -m /root/.snmp/mibs/idrac-smiv2.mib -H 74.201.248.76 -v2c -v2c -c public -w FAN --fan-warn=4000,6000 --fan-crit=3000,7000 -p
Traceback (most recent call last):
  File "./check_idrac", line 842, in <module>
    result, exit_code = PARSER().main()
  File "./check_idrac", line 671, in main
    perf_data = ' | RPM=%s;%s;%s;;' % (hw[3].split('(')[0], conf['fan_min_warn'], conf['fan_min_crit'])
KeyError: 'fan_min_warn'

But when run without -p

[root@host dell_monitors]# ./check_idrac -m /root/.snmp/mibs/idrac-smiv2.mib -H 74.201.248.76 -v2c -v2c -c public -w FAN#1 --fan-warn=4000,6000 --fan-crit=3000,7000
OK - System Board Fan1A: 4920 RPM - ENABLED/OK
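
The tracebacks show that `conf` has no `fan_min_warn` key at that point, even though `--fan-warn` was given; how the plugin fills `conf` from the options is not shown here. As a stopgap sketch, the perfdata string can be built with `dict.get`, so missing thresholds become empty fields instead of a crash. `fan_perfdata` is a hypothetical helper, not the plugin's code.

```python
def fan_perfdata(rpm, conf):
    """Build the RPM perfdata suffix, leaving unset thresholds empty."""
    warn = conf.get("fan_min_warn", "")
    crit = conf.get("fan_min_crit", "")
    return " | RPM=%s;%s;%s;;" % (rpm, warn, crit)
```

This avoids the stack trace but does not explain why the command-line thresholds never reach `conf`; that would still need to be traced in the option parsing.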

Authentication error handling

Hello
I think that if credentials are wrong or parameters are missing, the check should return the UNKNOWN state and a valid error message.

For example, this applies to:

Bad operator (INTEGER): At line 73 in /usr/share/mibs/ietf/SNMPv2-PDU
Unlinked OID in IPATM-IPMC-MIB: marsMIB ::= { mib-2 57 }
Undefined identifier: mib-2 near line 18 of /usr/share/mibs/ietf/IPATM-IPMC-MIB
Expected "::=" (RFC5644): At line 493 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Expected "{" (EOF): At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Bad object identifier: At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Bad parse of OBJECT-IDENTITY: At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
snmpwalk: Authentication failure (incorrect password, community or key) (Sub-id not found: (top) -> virtualDiskTable)
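
A sketch of the requested behaviour: scan the command output for known failure messages, ignore the MIB parser noise, and return the Nagios UNKNOWN exit code (3) with a one-line message. `classify_snmp_failure` is hypothetical; the matched strings are the ones that appear in output quoted on this page.

```python
UNKNOWN = 3


def classify_snmp_failure(output):
    """Return (exit code, one-line message) for a failed SNMP command."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    for line in lines:
        if "Authentication failure" in line:
            return UNKNOWN, ("UNKNOWN - SNMP authentication failed "
                             "(check community, user or keys)")
        if line.startswith("Timeout:"):
            return UNKNOWN, "UNKNOWN - no SNMP response from host"
    last = lines[-1] if lines else "no output"
    return UNKNOWN, "UNKNOWN - SNMP query failed: %s" % last
```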

Vdisk and disk not show in the results

Hello,

Thanks for the plugin. I tried to check iDRAC v8 and it worked, but I cannot see the disk and vdisk information with the command "./check_idrac_2.2 -H x.x.x.x -v 2c -c secret". I use the MIB file from Dell, iDRAC-SMIv2.mib.

How can I debug this to fix my problem?

Thank you.

Error on some hosts

On some Dell hosts I get the following error:

Plugin output
PS
--PS 1: OK, Volt I/O: 264 V/228.0 V, Current: 0.8 A, Watt I/O: 900.0 W/750.0 W
--PS 2: OK, Volt I/O: 264 V/228.0 V, Current: 0.2 A, Watt I/O: 900.0 W/750.0 W
FAN
--System Board Fan1: 4560 RPM - ENABLED/OK
--System Board Fan2: 4680 RPM - ENABLED/OK
--System Board Fan3: 4680 RPM - ENABLED/OK
--System Board Fan4: 4560 RPM - ENABLED/OK
--System Board Fan5: 4560 RPM - ENABLED/OK
--System Board Fan6: 4440 RPM - ENABLED/OK
BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC1 ROMB Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC2 ROMB Battery: ENABLED/OK [0]
PU
--PU 1: ENABLED/OK, RedundancyStatus: FULL, SystemBoard Pwr Consumption: 196 W
MEM
--Memory 1 (DIMM Socket A1) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 2 (DIMM Socket A2) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 3 (DIMM Socket A3) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 4 (DIMM Socket A4) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 5 (DIMM Socket A5) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 6 (DIMM Socket B1) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 7 (DIMM Socket B2) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 8 (DIMM Socket B3) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 9 (DIMM Socket B4) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 10 (DIMM Socket B5) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 11 (DIMM Socket A7) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 12 (DIMM Socket A8) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 13 (DIMM Socket B7) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
--Memory 14 (DIMM Socket B8) 32.0 GB/1333 MHz: ENABLED/OK [DDR3, Samsung, S/N: YYYYYYYYYYYY]
VDISK
--VDisk 1 (): OK/ONLINE, RAID-1 (136.13 GB), BadBlock: 0 [Virtual Disk 0 on Integrated RAID Controller 1]
DISK
--PDisk 1 (0:1:0) 136.13 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [TOSHIBA, S/N: YYYYYYYYYYYY]
--PDisk 2 (0:1:1) 136.13 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [TOSHIBA, S/N: YYYYYYYYYYYY]
Traceback (most recent call last):
  File "/usr/lib/nagios/plugins/check_dell_idrac.py", line 885, in <module>
    result, tmp_code = PARSER(host, hw_info, hw_order, hw_no_alert, hw_mib, perf).main()
  File "/usr/lib/nagios/plugins/check_dell_idrac.py", line 718, in main
    else: hw_3 = round(float(hw[3])/10, 1)
ValueError: could not convert string to float: "CPU1 TEMP"

On one other host I get a timeout, like this:

PS
--PS 1: OK, Volt I/O: 264 V/(N/A) V, Current: 0.2 A, Watt I/O: 900.0 W/750.0 W
--PS 2: OK, Volt I/O: 264 V/(N/A) V, Current: 21.0 A, Watt I/O: 900.0 W/750.0 W
FAN
--System Board Fan1: 3000 RPM - ENABLED/OK
--System Board Fan2: 2880 RPM - ENABLED/OK
--System Board Fan3: 3000 RPM - ENABLED/OK
--System Board Fan4: 2880 RPM - ENABLED/OK
--System Board Fan5: 3000 RPM - ENABLED/OK
--System Board Fan6: 3000 RPM - ENABLED/OK
BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
PU
--PU 1: ENABLED/OK, RedundancyStatus: FULL, SystemBoard Pwr Consumption: 210 W
MEM
--Memory 1 (DIMM Socket A1) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 2 (DIMM Socket A2) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 3 (DIMM Socket A3) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 4 (DIMM Socket A4) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 5 (DIMM Socket A5) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 6 (DIMM Socket A6) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 7 (DIMM Socket A7) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 8 (DIMM Socket A8) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 9 (DIMM Socket B1) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 10 (DIMM Socket B2) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 11 (DIMM Socket B3) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 12 (DIMM Socket B4) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 13 (DIMM Socket B5) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 14 (DIMM Socket B6) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 15 (DIMM Socket B7) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
--Memory 16 (DIMM Socket B8) 32.0 GB/2400 MHz: ENABLED/OK [26, Samsung, S/N: YYYYYYYYYYYY]
VDISK
--VDisk 1 (Volume0): OK/ONLINE, RAID-1 (185.75 GB), BadBlock: 0 [Virtual Disk 0 on Integrated RAID Controller 1]
DISK
--PDisk 1 (0:1:0) 185.75 GB: ONLINE, PowerStat: ON, HotSpare: no [ATA, S/N: YYYYYYYYYYYY]
--PDisk 2 (0:1:1) 185.75 GB: ONLINE, PowerStat: ON, HotSpare: no [ATA, S/N: YYYYYYYYYYYY]
SNMP Timeout!

Pdisk status

Hello,
Is it possible to treat the NON-RAID status as not a warning? A lot of servers have NON-RAID as their normal setup.

ValueError: invalid literal for float(): <numeric value>(!)

When using warning/critical limits, e.g.:
./idrac_2.2rc4 -p -H x.x.x.x -v2c -c public -m /idrac-smiv2.mib -w SENSOR#4 --temp-warn 29,40 --temp-crit 25,75
the (!)/(!!) marker is appended to the recorded numeric value of the monitored sensor:
tmp[key][stat_t] += '(!)'

Later threshold checks then fail, because the value is no longer numeric:
Traceback (most recent call last):
File "./idrac_2.2rc4", line 841, in <module>
result, exit_code = PARSER().main()
File "./idrac_2.2rc4", line 679, in main
hw_dict, exit_code = self.raise_alert(hw_dict, value_on_alert)
File "./idrac_2.2rc4", line 582, in raise_alert
if float(tmp[key][stat_t]) <= conf['sensor_thresholds'][0]:

As a workaround I've stripped these characters from the recorded value:

-                        if float(tmp[key][stat_t]) <= conf['sensor_thresholds'][0]:
+                        if float(tmp[key][stat_t].strip('(!)')) <= conf['sensor_thresholds'][0]:

I used this for all threshold checks.

Maybe there's a better way to do it, but this did the trick for us.

Regards,
Ovidiu
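
A slightly narrower variant of the workaround above, as a minimal sketch. `str.strip('(!)')` removes any of the characters `(`, `!` and `)` from both ends of the string, which works here but is broader than needed. The helper name `sensor_value` is hypothetical and not part of the script.

```python
import re

# Matches a trailing alert marker such as "(!)" or "(!!)" appended to a reading.
_ALERT_SUFFIX = re.compile(r"\(!+\)\s*$")


def sensor_value(raw):
    """Return the numeric part of a reading, ignoring a trailing (!)/(!!) marker."""
    return float(_ALERT_SUFFIX.sub("", str(raw)).strip())


print(sensor_value("210(!)"))   # 210.0
print(sensor_value("210(!!)"))  # 210.0
print(sensor_value(215))        # 215.0
```

Keeping the marker out of the stored value entirely (and adding it only when the output line is built) would avoid the problem at its source, but that needs a larger change to the script.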

Surround temperature perfdata labels with '

Line 697 in the script should be replaced with
perf_data = ' | \'%s\'=%s;;;%s;%s' \

so that temperature labels containing spaces are quoted and read correctly by the monitoring system.
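
To illustrate the quoting, here is a minimal sketch of a perfdata formatter that follows the Nagios plugin guidelines (`'label'=value;warn;crit;min;max`). The function name `perfdata` and its parameters are made up for this example and do not come from the script.

```python
def perfdata(label, value, warn="", crit="", minimum="", maximum=""):
    """Format one perfdata item as 'label'=value;warn;crit;min;max."""
    # Single quotes keep labels with spaces together; a literal quote
    # inside the label is written as two single quotes.
    quoted = "'%s'" % str(label).replace("'", "''")
    return "%s=%s;%s;%s;%s;%s" % (quoted, value, warn, crit, minimum, maximum)


line = "OK - System Board Inlet Temp: 21.0 C | " + perfdata("System Board Inlet Temp", 21.0)
print(line)
# OK - System Board Inlet Temp: 21.0 C | 'System Board Inlet Temp'=21.0;;;;
```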

SNMP timeout !

Hi there,

I'm trying to monitor my Dell server hardware through the iDRAC, but I get the error: SNMP timeout!
I have followed the instructions and copied the MIB and check_idrac files into the correct folder; the permissions should be OK as well.
I have created my command in Nagios, which is: "$USER1$/check_idrac -H $HOSTADDRESS$ -v2c -c public"
I'm able to ping the host from Nagios with the check_icmp command.
Any idea, please?

many thanks in advance,
djano

ValueError: invalid literal for int() with base 10: 'Bad'

Hi

When running the latest version of this script, 2.2rc4, with a simple "./check_idrac -H x.x.x.x -v2c -c public", the output looks good until the end, where it displays the following error at the command line:
PS
--PS 1: OK, Volt I/O: 264 V/(N/A) V, Current: 0.1 A, Watt I/O: 1260.0 W/1100 W
--PS 2: OK, Volt I/O: 264 V/(N/A) V, Current: 0.0 A, Watt I/O: 1260.0 W/1100 W
DISK
--PDisk 1 (0:1:0) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: 0EGV1HGF]
--PDisk 2 (0:1:1) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: 0EGV79GF]
--PDisk 3 (0:1:2) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: 0EGV6BXF]
--PDisk 4 (0:1:3) 1117.25 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: 0EGV4X7F]
FAN
--System Board Fan1: 1440 RPM - ENABLED/OK
BATTERY
--System Board CMOS Battery: ENABLED/OK [PRESENCEDETECTED]
--PERC ROMB Battery: ENABLED/OK [PRESENCEDETECTED]
PU
--PU 1: ENABLED/OK, RedundancyStatus: FULL, SystemBoard Pwr Consumption: 84 W
MEM
--Memory 1 (DIMM Socket A1) 8.0 GB/2400 MHz: ENABLED/OK [26, Micron Technology, S/N: 12E5CCF2]
--Memory 2 (DIMM Socket A2) 8.0 GB/2400 MHz: ENABLED/OK [26, Micron Technology, S/N: 12E5CEB0]
--Memory 3 (DIMM Socket A3) 8.0 GB/2400 MHz: ENABLED/OK [26, Micron Technology, S/N: 12E5CE55]
--Memory 4 (DIMM Socket A4) 8.0 GB/2400 MHz: ENABLED/OK [26, Micron Technology, S/N: 12E5CCBA]
VDISK
--VDisk 1 (DATA): OK/ONLINE, RAID-10 (2234.5 GB), BadBlock: 0 [Virtual Disk 0 on RAID Controller in Slot 3]
Traceback (most recent call last):
File "./check_idrac", line 847, in <module>
result, tmp_code = PARSER().main()
File "./check_idrac", line 643, in main
hw_dict = self.classifier(snmp_data, hw_dict) # classify data
File "./check_idrac", line 412, in classifier
item_order = int(_.split()[0].split('.')[-1])
ValueError: invalid literal for int() with base 10: 'Bad'

I believe this is why I get a WARNING status when the check is run from Nagios. When each item is checked individually rather than all at once, the output is OK.
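
One way to keep a stray line from aborting the whole check is to skip lines whose first field does not end in a numeric index. This is only a sketch: it assumes the failing input is a non-data line beginning with "Bad", which the traceback suggests but does not confirm. The helper `item_index` and both sample lines are hypothetical.

```python
def item_index(line):
    """Return the trailing OID index of an snmpwalk output line, or None."""
    fields = line.split()
    if not fields:
        return None
    try:
        # Same expression as in the traceback, guarded against non-numeric text.
        return int(fields[0].split(".")[-1])
    except ValueError:
        return None


lines = [
    "EXAMPLE-MIB::exampleObject.12 = INTEGER: 3",  # hypothetical data line
    "Bad value reported here",                      # hypothetical non-data line
]
usable = [l for l in lines if item_index(l) is not None]
print(usable)  # ['EXAMPLE-MIB::exampleObject.12 = INTEGER: 3']
```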

cant use the cli python

Hello, I can't use the CLI with python.

I run python idrac_2.2rc4 -H xxx -c public -v 2c

It then reports that no community name was specified and prints the snmpwalk usage:

USAGE: snmpwalk [OPTIONS] AGENT [OID]

MEM checks

Hello,

Many thanks for this script. It saved us when monitoring ESXi hosts of different versions through Nagios.
Everything is working fine, but I noticed a small bug.
All files are the defaults from git, and I have the latest version.

On one server there is a warning for a correctable memory error. The check finds that the DIMM is in the NONCRITICAL state, but it reports OK.
On other servers, when a component is NONCRITICAL (for example, the ROM battery), the Nagios check goes to the WARNING state.

[root@lhvmsrv120 check_idrac-master]# /usr/lib64/nagios/plugins/check_idrac-master/idrac_2.2rc4 -H 10.168.2.64 -v 2c -c public -m /usr/lib64/nagios/plugins/check_idrac-master/idrac-smiv2.mib -w MEM#6
OK - Memory 6 (DIMM Socket A6) 32.0 GB/2133 MHz: ENABLED/NONCRITICAL [26, Hynix Semiconductor, S/N: 10CAA11A]

Can you please check ?

Thank you,
John

ValueError: could not convert string to float: "System Board Inlet Temp"

Hi,

I've an interesting issue with check_idrac on one of our Dell servers (out of probably 100). When I'm running check_idrac, it successfully executes until "System Board Inlet Temp" and then it raises the following exception:

--PDisk 12 (0:1:12) 278.88 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: XXXXXXXX]
--PDisk 13 (0:1:13) 278.88 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: YYYYYYYY]
Traceback (most recent call last):
  File "/etc/icinga2/scripts/idrac", line 842, in <module>
    result, tmp_code = PARSER().main()
  File "/etc/icinga2/scripts/idrac", line 688, in main
    hw_dict, exit_code = self.raise_alert(hw_dict, value_on_alert)
  File "/etc/icinga2/scripts/idrac", line 576, in raise_alert
    tmp[key][stat_t] = float(tmp[key][stat_t].strip('(!)'))/10
ValueError: could not convert string to float: "System Board Inlet Temp"

On other servers the output looks like:

--PDisk 14 (0:1:13) 278.88 GB: ONLINE, PowerStat: SPUNUP, HotSpare: no [HGST, HDD, S/N: YYYYYYYY]
--System Board Inlet Temp: 21.0 C ENABLED/OK | Temperature=21.0;;;;

I did some debugging, and it looks like tmp[key][stat_t] holds different values on a working and a non-working server:

  • Working server: 210
  • Non-working server: "System Board Inlet Temp"

Thank you,
Matthias

Timeouts should generate UNKNOWN, not WARN

Compared to generation 1, I get a lot of SNMP timeouts when I ask the script to check all hardware (i.e. when I do not provide a -w option). I'm not sure why that is, but I've returned to generation 1 for the time being.

By the way, when the timeouts occur, v. 2.0b6 yields a WARNING (code 1). I believe it should yield UNKNOWN (code 3) instead, so I suggest adjusting line 439 so that
sys.exit(1)
is changed to
sys.exit(3)
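
For reference, the standard Nagios plugin return codes can be named once and reused, which makes the intent of the change explicit. This is a sketch only; `timeout_result` is a hypothetical helper, not a function in the script.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def timeout_result(host):
    """Return (message, exit_code) for a check that got no SNMP answer."""
    return "UNKNOWN - SNMP timeout while querying %s" % host, UNKNOWN


message, code = timeout_result("x.x.x.x")
print("%s (exit code %d)" % (message, code))
# A plugin would then finish with: sys.exit(code)
```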

iDRAC8 / Poweredge R930 / Threshold Errors

Hi!
Great plugin! Just one little problem: I am unable to define thresholds to receive perfdata on a PowerEdge R930. It's not working at all. Here is an example from the temperature sensor:

(screenshot: 2015-11-05 at 14:01:25)

Any advice? Do you need further information?

Kind regards - P

No perf data?

Hi -

I can't seem to figure out why this does not output any perfdata that Nagios can consume. I read that --no-alert disables performance data, but even the bare output does not contain performance data in the form Nagios expects. Nagios expects it pipe-delimited (everything after | is perfdata), as label=value pairs (data_point=value_point) separated by whitespace to indicate what should be plotted.

Did I miss something?

Thanks for your help! Awesome script overall!

Python error when launching

Hi,

I would like to use your script to monitor my Dell iDRAC, but when I launch it against any of my hosts, I get the following error:

Traceback (most recent call last):
  File "./idrac_2.2rc4", line 848, in <module>
    result, tmp_code = PARSER().main()
  File "./idrac_2.2rc4", line 765, in main
    else: hw_4 = float(hw[4])/10
ValueError: could not convert string to float: (n/a)

Do you have an idea how to address this, please?

Guillaume T.

Parser problem for PS

Hi,

When I remove power from a PS to test a critical problem, I get a "hardware not found" error for PS#2.

/usr/lib/nagios/plugins/check_idrac.py -H 10.x.x.x -v2c -c public -w PS#1
OK - PS 1: OK, Volt I/O: 264 V/208 V, Current: 0.4 A, Watt I/O: 900.0 W/750 W

/usr/lib/nagios/plugins/check_idrac.py -H 10.x.x.x -v2c -c public -w PS#2
hardware not found! If you sure the hw exists then you may want to edit TRANSLATOR code (line 612).

/usr/lib/nagios/plugins/check_idrac.py -H 10.x.x.x -v2c -c public -w PS
PS 1: OK, Volt I/O: 264 V/208 V, Current: 0.4 A, Watt I/O: 900.0 W/750 W
PS 2: CRITICAL(!!), Volt I/O: 264 V/(N/A) V, Current: (N/A) A, Watt I/O: 900.0 W/750 W

I was able to correct this, but I don't think the solution is a permanent one.
I added these lines at line 388:

            if self.hardware[2] == 'PS' :
                output = output.replace('No Such Instance currently exists at this OID','0')
            else:

That way I get a CRIT on PS#2.
/usr/lib/nagios/plugins/check_idrac.py -H 10.x.x.x -v2c -c public -w PS#2
CRIT - PS 2: CRITICAL(!!), Volt I/O: 264 V/(N/A) V, Current: 0.0 A, Watt I/O: 900.0 W/750 W

Regards,
Mike

Return warning for NONCRITICAL error

Hi,

Is it possible to return WARNING when a NONCRITICAL error occurs?

Via iDRAC I found "Correctable memory error rate exceeded for DIMM_B12".
This is reported via this script as:
Memory 24 (DIMM Socket B12) 32.0 GB/1333 MHz: ENABLED/NONCRITICAL [DDR3, Hynix Semiconductor, S/N: 318E5BCD]

But the exit code is still 0 and a status OK is returned in Nagios, I would like to see this as a WARNING state.

Thanks for a great plugin
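
A sketch of the requested behaviour, assuming the status words shown in the plugin output (OK, NONCRITICAL, CRITICAL) are what gets mapped. The table `STATUS_TO_CODE`, the NONRECOVERABLE entry and the helper `worst_code` are assumptions for illustration, not code from the script.

```python
# Assumed mapping from the status text in the plugin output to Nagios codes.
STATUS_TO_CODE = {
    "OK": 0,
    "NONCRITICAL": 1,      # reported as WARNING
    "CRITICAL": 2,
    "NONRECOVERABLE": 2,   # assumption: treated like CRITICAL
}

# UNKNOWN (3) ranks below CRITICAL (2) but above WARNING (1).
_SEVERITY = {0: 0, 1: 1, 3: 2, 2: 3}


def worst_code(statuses):
    """Return the most severe Nagios code for a list of status strings."""
    codes = [STATUS_TO_CODE.get(s.upper(), 3) for s in statuses]
    if not codes:
        return 3
    return max(codes, key=lambda c: _SEVERITY[c])


print(worst_code(["OK", "NONCRITICAL", "OK"]))  # 1
```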

Only display hardware status if warning/critical?

Right now, when I use the script to check one of my servers, it prints all the hardware status, even if it's OK. This is a lot of data to sort through.

Is there a way to only output hardware status if the hardware has a problem?
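
Until the script offers such an option, the output could be filtered after the fact. The sketch below relies on the (!)/(!!) markers that the script appends to alerting values, as seen in other reports here; items that raise no marker would be dropped. `problems_only` is a hypothetical helper.

```python
def problems_only(output):
    """Keep only the lines that carry a (!) or (!!) alert marker."""
    return [line for line in output.splitlines()
            if "(!)" in line or "(!!)" in line]


sample = (
    "PS 1: OK, Volt I/O: 264 V/208 V, Current: 0.4 A, Watt I/O: 900.0 W/750 W\n"
    "PS 2: CRITICAL(!!), Volt I/O: 264 V/(N/A) V, Current: (N/A) A, Watt I/O: 900.0 W/750 W"
)
for line in problems_only(sample):
    print(line)
```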

How to use it

I am kind of confused:

where are the python files you are talking about in the howto?

This check seems to be exactly what I am looking for. Please help.

Regards

No Fan Data?

Hi,

I tested your script today and saw that there is never any data for the fans. Is this correct?

Regards,

Marcus

Line 773 - ValueError: could not convert string to float: (n/a)

The following occurs when I run check_idrac against a server with iDRAC 8 that currently has lost redundancy on 1 PSU:

Traceback (most recent call last):
  File "./check_idrac", line 874, in <module>
    result, tmp_code = PARSER().main()
  File "./check_idrac", line 773, in main
    else: hw_4 = float(hw[4])/10

I (temporarily) fixed this by changing line 773 as follows:
else: hw_4 = hw[4].split('(')[0]

I'm no advanced programmer, and I have no idea if that was the proper thing to do, but it seemed reasonable as I was looking at some of the surrounding code.

It should be noted that there is no issue with the original code when it runs against a healthy iDRAC 8 machine.
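
An alternative to splitting on "(" is to attempt the conversion and fall back when the field is not numeric. The sketch below assumes the raw value is reported in tenths, as the division by 10 in the original line suggests. The helper name `tenths` and the fallback text are illustrative.

```python
def tenths(raw, fallback="(N/A)"):
    """Convert a reading given in tenths to a float; return fallback if not numeric."""
    try:
        return round(float(raw) / 10, 1)
    except (TypeError, ValueError):
        return fallback


print(tenths("210"))    # 21.0
print(tenths("(n/a)"))  # (N/A)
```

Unlike the split-based change, this keeps the division by 10 for healthy readings and only substitutes the fallback when the conversion fails.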

config_check all hardware

Checking the whole hardware with a config file does not work. It shows a "SNMP version None not supported" error message. Checking a single hardware item with a config file works fine. Is a config_check call missing in the check_all part?

Checking all hardware fails

./idrac_2.0b5 -H SOME.HOST.NAME -c public
yields:
coercing to Unicode: need string or buffer, NoneType found
