I know I’m late to the game with this part of my setup, but I’m happy with the results nonetheless. The short version is that Debian’s nagios-nrpe-server package lets my central Nagios server keep track of my clients’ disk space, load averages, etc. Granted, I already had most of that visible through Ganglia, but Ganglia is more of a grapher and collector than a notifier. I used to keep track of that stuff with Spong, but when I switched over to Nagios, that functionality went missing. Now, if one of the graduate students (let’s call him “new guy”) fills up 288 GB of /tmp and tells his advisor “the program crashed for some reason”, I won’t be finding the bloody remains days later when another student asks me why their program won’t run at all.
Puppet and other configuration excerpts follow.
classes/nagios-nrpe-server.pp:
class nagios-nrpe-server {
  # nagios-nrpe-server for remote monitoring
  package {
    [ "nagios-nrpe-server", "nagios-plugins" ]:
      ensure => installed;
  }
  file {
    "/etc/nagios":
      ensure => directory,
      owner  => root,
      group  => root,
      mode   => 0755;
    "/etc/nagios/nrpe_local.cfg":
      source => "puppet:///files/apps/nagios-nrpe-server/nrpe_local.cfg";
  }
  service {
    "nagios-nrpe-server":
      ensure    => running,
      pattern   => "/usr/sbin/nrpe",
      subscribe => File["/etc/nagios/nrpe_local.cfg"],
      require   => Package["nagios-nrpe-server"];
  }
}
files/apps/nagios-nrpe-server/nrpe_local.cfg:
allowed_hosts=127.0.0.1,NAGIOSIP
command[check_disk_root]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /
command[check_disk_boot]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /boot
command[check_disk_tmp]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /tmp
command[check_disk_var]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /var
command[check_disk_amanda]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /opt/amanda
command[check_disk_home]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /home
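One thing worth double-checking in those thresholds: as far as I know, check_disk reads a bare integer for -w and -c as an amount of free space in its default units (megabytes), not as a percentage. So -w 20 -c 10 would warn below 20 MB free and go critical below 10 MB free. That may be exactly what is wanted here. If percentages are the goal, a variant along these lines might be closer; the 20% and 10% values are illustrative assumptions, not taken from the setup above:

```cfg
# Hypothetical variant: thresholds as percent of space free instead of MB free
command[check_disk_tmp]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /tmp
```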
excerpt from /etc/nagios2/conf.d/hosts.cfg:
define service{
        use                     generic-service
        host_name               ch226-21, ch226-22, ch226-23, ch226-24, ch226-25, ch226-26, ch226-27, ch226-28, ch226-29, ch226-30, ch226-31, ch226-32
        service_description     Disk Usage - /
        check_command           check_nrpe_1arg!check_disk_root
        contact_groups          admins
        }

define service{
        use                     generic-service
        host_name               ch226-21, ch226-22, ch226-23, ch226-24, ch226-25, ch226-26, ch226-27, ch226-28, ch226-29, ch226-30, ch226-31, ch226-32
        service_description     Disk Usage - /tmp
        check_command           check_nrpe_1arg!check_disk_tmp
        contact_groups          admins
        }
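These service definitions depend on a check_nrpe_1arg command existing on the Nagios server, which the excerpt does not show. On Debian it typically comes from the nagios-nrpe-plugin package rather than being written by hand. For reference, a minimal sketch of what that definition looks like, assuming the same plugin directory used on the clients above (verify against the file your package actually installs):

```cfg
# Sketch of the server-side command used by the services above.
# $ARG1$ is the text after the "!" in check_command, e.g. check_disk_tmp,
# which must match a command[...] name in the client's nrpe_local.cfg.
define command{
        command_name    check_nrpe_1arg
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
```

The "1arg" form passes only the remote command name, so all thresholds stay in nrpe_local.cfg on the client and NRPE does not need to accept command arguments over the network.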