Watching Remote System Status with Nagios and NRPE

I know I’m late to the game with this part of my setup, but I’m happy with the results. In short, Debian’s nagios-nrpe-server package lets my central Nagios server keep track of my clients’ disk space, load averages, and so on. Granted, I already had most of that visible through Ganglia, but Ganglia is more of a collector and grapher than a notifier. I used to keep track of that stuff with Spong, but that functionality went missing when I switched over to Nagios. Now, if one of the graduate students (let’s call him “new guy”) fills up 288 GB of /tmp and tells his advisor “the program crashed for some reason”, I won’t be finding the bloody remains days later when another student asks me why their program won’t run at all.

[Screenshot: nrpe-output.png]

Puppet and other configuration excerpts follow.

classes/nagios-nrpe-server.pp:

class nagios-nrpe-server {
    # nagios-nrpe-server for remote monitoring
    package {
        [ "nagios-nrpe-server", "nagios-plugins" ]:
            ensure => installed;
    }
    file {
        "/etc/nagios":
            ensure => directory,
            owner  => root,
            group  => root,
            mode   => "0755";
        "/etc/nagios/nrpe_local.cfg":
            source => "puppet:///files/apps/nagios-nrpe-server/nrpe_local.cfg";
    }
    service {
        "nagios-nrpe-server":
            ensure    => running,
            pattern   => "/usr/sbin/nrpe",
            subscribe => File["/etc/nagios/nrpe_local.cfg"],
            require   => Package["nagios-nrpe-server"];
    }
}

files/apps/nagios-nrpe-server/nrpe_local.cfg:

allowed_hosts=127.0.0.1,NAGIOSIP
command[check_disk_root]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /
command[check_disk_boot]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /boot
command[check_disk_tmp]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /tmp
command[check_disk_var]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /var
command[check_disk_amanda]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /opt/amanda
command[check_disk_home]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /home
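
One thing worth double-checking in the thresholds above: as I read the check_disk help text, a bare integer for -w/-c means free space in units (megabytes by default), while a value with a trailing % means percentage of the filesystem free. If that reading is right, "-w 20 -c 10" only complains once less than 20 MB is left, which is very late on a 288 GB /tmp. The Python sketch below only illustrates that difference; it is not the plugin's actual logic, and the classify and free_below functions are hypothetical names invented for the example.

```python
def free_below(threshold, free_mb, total_mb):
    """Return True if free space is below the threshold.

    Assumption: a trailing '%' means percent of the filesystem free,
    and a bare number means megabytes free.
    """
    if threshold.endswith("%"):
        return 100.0 * free_mb / total_mb < float(threshold[:-1])
    return free_mb < float(threshold)


def classify(free_mb, total_mb, warn, crit):
    """Map free space to a Nagios-style state name (critical wins over warning)."""
    if free_below(crit, free_mb, total_mb):
        return "CRITICAL"
    if free_below(warn, free_mb, total_mb):
        return "WARNING"
    return "OK"


total = 288 * 1024  # a 288 GB /tmp, expressed in MB
free = 5 * 1024     # 5 GB still free, i.e. under 2% of the filesystem

print(classify(free, total, "20", "10"))    # bare integers: prints OK
print(classify(free, total, "20%", "10%"))  # percentages: prints CRITICAL
```

If percentages are what you want, the same command lines would take "-w 20% -c 10%" instead.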

Excerpt from /etc/nagios2/conf.d/hosts.cfg:

define service{
        use                     generic-service
        host_name               ch226-21, ch226-22, ch226-23, ch226-24, ch226-25, ch226-26, ch226-27, ch226-28, ch226-29, ch226-30, ch226-31, ch226-32
        service_description     Disk Usage - /
        check_command           check_nrpe_1arg!check_disk_root
        contact_groups          admins
}
define service{
        use                     generic-service
        host_name               ch226-21, ch226-22, ch226-23, ch226-24, ch226-25, ch226-26, ch226-27, ch226-28, ch226-29, ch226-30, ch226-31, ch226-32
        service_description     Disk Usage - /tmp
        check_command           check_nrpe_1arg!check_disk_tmp
        contact_groups          admins
}
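
The two stanzas differ only in the service description and the NRPE command name, so stanzas for the other commands in nrpe_local.cfg (/boot, /var, /home and so on) could be generated rather than typed out. Here is a rough Python sketch of that idea, assuming the same host list and generic-service template as above, and assuming every host really has every mount point (probably not true for /opt/amanda). The HOSTS and CHECKS tables and the render_service helper are hypothetical names for this example, not anything Nagios provides.

```python
# Hosts ch226-21 through ch226-32, as in the excerpt above
HOSTS = ["ch226-%d" % n for n in range(21, 33)]

# Mount point -> NRPE command name defined in nrpe_local.cfg
CHECKS = {
    "/": "check_disk_root",
    "/boot": "check_disk_boot",
    "/tmp": "check_disk_tmp",
    "/var": "check_disk_var",
    "/opt/amanda": "check_disk_amanda",
    "/home": "check_disk_home",
}

# Doubled braces are literal braces in str.format templates
TEMPLATE = """define service{{
        use                     generic-service
        host_name               {hosts}
        service_description     Disk Usage - {mount}
        check_command           check_nrpe_1arg!{command}
        contact_groups          admins
}}"""


def render_service(mount, command, hosts):
    """Return one Nagios service stanza for the given mount point."""
    return TEMPLATE.format(hosts=", ".join(hosts), mount=mount, command=command)


if __name__ == "__main__":
    # Print one stanza per check; redirect to a file under conf.d to use it
    for mount, command in CHECKS.items():
        print(render_service(mount, command, HOSTS))
```

The output would still need a look before it goes anywhere near the Nagios server, since a host without one of these filesystems will just report an error for that check.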