I know I’m late to the game with this part of my setup, but nonetheless, I’m happy with the results. The short form of it is that Debian’s nagios-nrpe-server package lets my central Nagios server keep track of my clients’ disk space, load averages, etc. Granted, I already had most of that visible through Ganglia, too, but Ganglia’s more of a grapher and collector, and not a notifier. I used to keep track of that stuff with Spong, but when I switched over to Nagios, that functionality went missing. Now, if one of the graduate students (let’s call him “new guy”) fills up 288 GB of /tmp and tells his advisor “the program crashed for some reason”, I won’t be finding the bloody remains days later when another student asks me why their program won’t run at all.
Puppet and other configuration excerpts follow.
classes/nagios-nrpe-server.pp:
class nagios-nrpe-server {
    # nagios-nrpe-server for remote monitoring
    package {
        [ "nagios-nrpe-server", "nagios-plugins" ]:
            ensure => installed;
    }
    file {
        "/etc/nagios":
            ensure => directory,
            owner  => root,
            group  => root,
            mode   => 0755;
        "/etc/nagios/nrpe_local.cfg":
            source => "puppet:///files/apps/nagios-nrpe-server/nrpe_local.cfg";
    }
    service {
        "nagios-nrpe-server":
            ensure    => running,
            pattern   => "/usr/sbin/nrpe",
            subscribe => File["/etc/nagios/nrpe_local.cfg"],
            require   => Package["nagios-nrpe-server"];
    }
}
files/apps/nagios-nrpe-server/nrpe_local.cfg:
allowed_hosts=127.0.0.1,NAGIOSIP
command[check_disk_root]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /
command[check_disk_boot]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /boot
command[check_disk_tmp]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /tmp
command[check_disk_var]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /var
command[check_disk_amanda]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /opt/amanda
command[check_disk_home]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /home
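The check_nrpe_1arg command that the Nagios service definitions rely on is not defined anywhere in these excerpts. On Debian it normally comes with the nagios-nrpe-plugin package on the Nagios server. The sketch below shows roughly what that definition looks like; treat the exact layout as an assumption and compare it with the file your own package installed.

```cfg
# Sketch of the packaged command definition; verify against your own install.
# $ARG1$ receives the name inside command[...] in nrpe_local.cfg,
# e.g. check_disk_tmp.
define command{
        command_name    check_nrpe_1arg
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
```

One detail worth double-checking in nrpe_local.cfg: check_disk reads bare numbers for -w and -c as free space in its default unit (MB), so -w 20 -c 10 fires at 20 MB and 10 MB free. Writing 20% and 10% would give percentage thresholds instead.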
excerpt from /etc/nagios2/conf.d/hosts.cfg:
define service{
        use                     generic-service
        host_name               ch226-21, ch226-22, ch226-23, ch226-24, ch226-25, ch226-26, ch226-27, ch226-28, ch226-29, ch226-30, ch226-31, ch226-32
        service_description     Disk Usage - /
        check_command           check_nrpe_1arg!check_disk_root
        contact_groups          admins
        }

define service{
        use                     generic-service
        host_name               ch226-21, ch226-22, ch226-23, ch226-24, ch226-25, ch226-26, ch226-27, ch226-28, ch226-29, ch226-30, ch226-31, ch226-32
        service_description     Disk Usage - /tmp
        check_command           check_nrpe_1arg!check_disk_tmp
        contact_groups          admins
        }
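The other commands from nrpe_local.cfg can be wired up the same way. Since every service repeats the same twelve host names, one option is to put the machines in a hostgroup and attach services to the group instead. This is only a sketch: the group name ch226-nodes and its alias are made up for illustration, and it assumes the hosts themselves are already defined elsewhere. check_disk_var is the command from nrpe_local.cfg.

```cfg
# Hypothetical hostgroup; the name and alias are placeholders.
define hostgroup{
        hostgroup_name          ch226-nodes
        alias                   CH226 compute nodes
        members                 ch226-21, ch226-22, ch226-23, ch226-24, ch226-25, ch226-26, ch226-27, ch226-28, ch226-29, ch226-30, ch226-31, ch226-32
        }

# Same pattern as the / and /tmp checks, applied to the whole group.
define service{
        use                     generic-service
        hostgroup_name          ch226-nodes
        service_description     Disk Usage - /var
        check_command           check_nrpe_1arg!check_disk_var
        contact_groups          admins
        }
```

With the group in place, adding a machine means editing one members line rather than every service definition.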
