Invoking Cfengine from Nagios

Nagios and Cf3 each have their strengths:

Nagios plugins, frankly, are hard to duplicate in Cfengine. Check out this Cf3 implementation of a web server check:

bundle agent check_tcp_response {
  vars:
    "read_web_srv_response" string  => readtcp("php.net", "80", "GET /manual/en/index.php HTTP/1.1$(const.r)$(const.n)Host: php.net$(const.r)$(const.n)$(const.r)$(const.n)", 60);

  classes:
    "expectedResponse" expression   => regcmp(".*200 OK.*\n.*", "$(read_web_srv_response)");

  reports:
    !expectedResponse::
      "Something is wrong with php.net - see for yourself: $(read_web_srv_response)";

}

That simply does not compare with this Nagios stanza:

define service{
    use                             local-service         ; Name of service template to use
    hostgroup_name                  http-servers
    service_description             HTTP
    check_command                   check_http
}
define command{
    command_name                    check_http
    command_line                    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}

My idea, which I totally stole from this article, was to invoke Cfengine from Nagios when necessary, and let Cf3 restart the service. Example: I've got this one service that monitors a disk array for faults. It's flaky, and needs to be restarted when it stops responding. I've already got a check for the service in Nagios, so I added an event handler:

define service{
    use                             local-service         ; Name of service template to use
    host_name                       diskarray-mon
    service_description             diskarray-mon website
    check_command                   check_http!-H diskmon.example.com -S -u /login.html
    event_handler                   invoke_cfrunagent
}
define command{
    command_name invoke_cfrunagent
    command_line $USER2$/invoke_cfrunagent.sh  -n "$SERVICEDESC$" -s $SERVICESTATE$ -t $SERVICESTATETYPE$ -a $HOSTADDRESS$
}


Leaving out some getopt() stuff, invoke_cfrunagent.sh looks like this:

# Convert "diskarray-mon website" to "diskarray-mon_website" (bash; replaces every space):
SVC=${SVC// /_}
STATE="nagios_$STATE"
TYPE="nagios_$TYPE"

# Debugging
echo "About to run sudo /var/cfengine/bin/cf-runagent -D $SVC -D $STATE -D $TYPE" | /usr/bin/logger
# We allow this in sudoers:
sudo /var/cfengine/bin/cf-runagent -D "$SVC" -D "$STATE" -D "$TYPE"
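
For completeness, here is a rough sketch of what the omitted option parsing might look like. It is not the actual script: the function name parse_args and the ADDR variable are made up for illustration, and only the flag letters (-n, -s, -t, -a) come from the Nagios command definition above. It assumes bash.

```shell
#!/bin/bash
# Hypothetical sketch of the omitted option parsing. Only the flag letters
# are taken from the Nagios command_line; everything else is an assumption.
parse_args() {
    local opt
    OPTIND=1                      # reset so the function can be called more than once
    SVC="" STATE="" TYPE="" ADDR=""
    while getopts "n:s:t:a:" opt; do
        case "$opt" in
            n) SVC="$OPTARG" ;;   # $SERVICEDESC$
            s) STATE="$OPTARG" ;; # $SERVICESTATE$
            t) TYPE="$OPTARG" ;;  # $SERVICESTATETYPE$
            a) ADDR="$OPTARG" ;;  # $HOSTADDRESS$
            *) echo "usage: invoke_cfrunagent.sh -n desc -s state -t type -a addr" >&2
               return 1 ;;
        esac
    done
    # Refuse to continue without the three values that become classes.
    if [ -z "$SVC" ] || [ -z "$STATE" ] || [ -z "$TYPE" ]; then
        echo "missing -n, -s or -t" >&2
        return 1
    fi
}
```

In the script this would be called as parse_args "$@" || exit 1 before the variables are rewritten.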


cf-runagent is a request, not an order, to the running cf-serverd process to run the already-configured policy; it's like saying "If you don't mind, could you please run now?"
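
Since Nagios runs event handlers on soft and hard state changes as well as on recoveries, the script above sends a request for every one of them and leaves the decision to Cf3. If that turns out to be too chatty, one option is to filter in the script first. This is only a sketch of that alternative; should_invoke is a hypothetical helper name, and it expects the raw Nagios values, before the "nagios_" prefix is added.

```shell
# Hypothetical helper: succeed only for the one state the policy acts on.
# Arguments are the raw $SERVICESTATE$ and $SERVICESTATETYPE$ values.
should_invoke() {
    local state="$1" type="$2"
    [ "$state" = "CRITICAL" ] && [ "$type" = "HARD" ]
}
```

Used as should_invoke "$STATE" "$TYPE" || exit 0 near the top of the script, this would skip the cf-runagent call for everything except a hard CRITICAL.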

Finally, this was to be detected in Cf3 like so:

  methods:
    diskarray-mon_website.nagios_CRITICAL.nagios_HARD::
      "Restart the diskarray monitoring service" usebundle => restart_diskarray_monitor();


(This stanza is in a bundle that I know is called on the disk array monitor.)
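
One thing that may be worth checking here: as far as I know, Cf3 class names can only contain letters, digits and underscores (that is what its own canonify() function produces), so a class with a hyphen in it might never match. A sketch of doing the same conversion on the Nagios side, in bash, with a hypothetical helper that borrows the Cf3 function's name:

```shell
# Hypothetical helper: replace every character that is not a letter, digit
# or underscore with an underscore, roughly what Cf3's canonify() does.
canonify() {
    local s="$1"
    printf '%s\n' "${s//[^A-Za-z0-9_]/_}"
}
```

With SVC=$(canonify "$SVC") in the script, the class to test for in the policy would be diskarray_mon_website rather than diskarray-mon_website.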

Here's what works:

What doesn't work:

What might work better is using this Cf3 wrapper for Nagios plugins (which I think is the same approach, or possibly code, discussed in this mailing list post).

Anyhow... this is a sort of half-assed attempt, done in a morning, to get something working. Not there yet.