A Nagios/Icinga plugin for monitoring hardware status on Dell PowerEdge servers
I have a few Dell servers on hand, PE2850 and PE R710 models, and wanted to add hardware status monitoring for them to Icinga. Most of what is available online depends on the SNMP service of Dell OpenManage, which never felt quite right to me. I know little about SNMP, especially the OIDs: long strings of digits whose meaning I cannot tell.

A few days ago I found that omreport, the command that ships with OpenManage, can be run directly, so I wrote this script. It is very simple: it checks chassis (the base components, including the motherboard and power supplies) and storage.

1. Script

    vim /usr/local/nagios/libexec/check_dell_omreport

    #!/bin/bash
    # Program : check_dell_omreport
    # Version : 1.0
    # Date    : Jul 28 2012
    # Author  : huky - alonerhu@yahoo.com.cn
    # Summary : a simple nagios/icinga plugin that checks the status of chassis &
    #           storage on Dell PowerEdge servers with omreport in Dell OpenManage
    # Licence : GPL - summary below, full text at http://www.fsf.org/licenses/gpl.txt

    # OpenManage install path; the default is /opt/dell/srvadmin
    DELL_SRV_DIR=/opt/dell/srvadmin
    PATH=$PATH:$DELL_SRV_DIR/oma/bin:$DELL_SRV_DIR/bin:$DELL_SRV_DIR/sbin
    #OMREPORT=`find $DELL_SRV_DIR -name omreport 2> /dev/null`
    STOR_CTRL=/tmp/dell.storage.ctr
    LOG_FILE=/tmp/dell_omreport.log

    STATE_OK=0
    STATE_WARNING=1
    STATE_CRITICAL=2
    STATE_UNKNOWN=3

    if [ ! -d "$DELL_SRV_DIR" ]; then
        echo "Please install OpenManage and set DELL_SRV_DIR to its install path"
        exit $STATE_UNKNOWN
    fi

    # omreport needs the dataeng service to be running
    if ! /etc/init.d/dataeng status > /dev/null 2>&1; then
        echo "Please start the service dataeng"
        exit $STATE_UNKNOWN
    fi

    # check chassis: keep only component lines whose severity is not Ok
    omreport chassis | grep -v '^Ok' | grep ":" | sed '/COMPONENT/d' > $LOG_FILE

    # check storage: collect the controller IDs first
    omreport storage controller | grep "^ID" | cut -d":" -f2 > $STOR_CTRL
    if [ ! -s $STOR_CTRL ]; then
        echo "Have you installed the package for storage?" >> $LOG_FILE
    fi

    # for each controller, join every ID/Status/Name block onto one line
    # and keep only the entries that are not Ok
    for CONTR_ID in `cat $STOR_CTRL`
    do
        omreport storage controller controller=$CONTR_ID | grep -2 ^Status | sed '/--/d' | awk '{if (NR%5==0){print $0} else {printf"%s ",$0}}' | grep -v Ok | tr -s " *" " " >> $LOG_FILE
    done

    if [ -s $LOG_FILE ]; then
        paste -s $LOG_FILE > $LOG_FILE.2
        # WARNING if every "Critical" match is really "Non-Critical", else CRITICAL
        if [ `grep -c "Critical" $LOG_FILE` -eq `grep -c "Non-Critical" $LOG_FILE` ]; then
            echo `cat $LOG_FILE.2` && exit $STATE_WARNING
        else
            echo `cat $LOG_FILE.2` && exit $STATE_CRITICAL
        fi
    else
        echo "Machine is healthy" && exit $STATE_OK
    fi

2. Installation

2.1 Place the script in the appropriate location on the monitored host (by default /usr/local/nagios/libexec/check_dell_omreport).

2.2 Then edit the NRPE configuration file on the monitored host:

    vim /usr/local/nagios/etc/nrpe.cfg

and add one line:

    command[check_omreport]=/usr/local/nagios/libexec/check_dell_omreport

3. Monitoring

On the monitoring server, update the corresponding monitoring configuration. I put these services into one service group, as follows:

    define service {
        use                 generic-service
        host_name           hostname1,hostname2,hostname3,hostname4,hostname5
        service_description Dell_OM
        check_command       check_nrpe_1arg!check_omreport
    }

    define servicegroup {
        servicegroup_name   Hardware_Status
        alias               Hardware Status
        members             hostname1,Dell_OM,hostname2,Dell_OM,hostname3,Dell_OM,hostname4,Dell_OM,hostname5,Dell_OM
    }

4. Testing

    # /usr/local/icinga/libexec/check_nrpe -H 192.168.10.121 -c check_omreport
    Controllers ID : 0 Status : Non-Critical Name : PERC H700 Integrated Slot ID : Embedded Physical Disks ID : 0:0:0 Status : Non-Critical Name : Physical Disk 0:0:0 State : Online ID : 0:0:1 Status : Non-Critical Name : Physical Disk 0:0:1 State : Online ID : 0:0:2 Status : Non-Critical Name : Physical Disk 0:0:2 State : Online

5. Enabling

After restarting the service, the relevant information shows up in the service group.
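The status mapping in the script from section 1 (anything not Ok is at least a warning, and a plain Critical is critical) can be sketched as a small function that is testable on a machine without omreport. This is only an illustration under stated assumptions: the function name classify_severity is hypothetical, and the "SEVERITY : COMPONENT" line layout is assumed from the filters the script applies to the omreport chassis output. Unlike the original, which counts matches in a log file, this version reads lines from stdin and returns the worst state it has seen.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original plugin: map omreport-style
# "SEVERITY : COMPONENT" lines read from stdin to a Nagios/Icinga exit code.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2

classify_severity() {
    local worst=$STATE_OK line severity
    while IFS= read -r line; do
        # Skip anything that is not of the form "<severity> : <component>".
        case "$line" in
            *:*) ;;
            *) continue ;;
        esac
        # Take the text before the first colon and strip all whitespace.
        severity=${line%%:*}
        severity=${severity//[[:space:]]/}
        case "$severity" in
            Ok|SEVERITY) ;;                      # healthy row, or the header row
            Critical) worst=$STATE_CRITICAL ;;   # worst possible state
            *)  # Non-Critical or any other non-Ok value counts as a warning
                if [ "$worst" -lt "$STATE_WARNING" ]; then
                    worst=$STATE_WARNING
                fi ;;
        esac
    done
    return "$worst"
}
```

To try it, pipe a few sample lines into the function, for example `printf 'Ok : Fans\nNon-Critical : Temperatures\n' | classify_severity`, and inspect `$?` afterwards; on a monitored host the input would come from `omreport chassis` instead.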