It's a love affair...mainly Nagios and my network

I can get really, really focussed sometimes. Every now and then that happens with Nagios.

Yesterday I had some time to kill before I went home, so I looked over my tickets in RT. (I work in a small shop, so a lot of the time the tickets in RT are a way of adding things to my to-do list.) There was one that said to watch for changes in our web site's main page; I'd added that one after MySQL had problems one time -- ran out of connections, I think -- and Mambo had displayed a nice "Whoops! Can someone please tell the sysadmin?" page (a nice change from the usual cryptic error when there's no database connection). Someone did, but it would've been nice to be paged about it.

At home I use WebSec to keep track of some pages that don't change very often (worse luck…), and I thought of using that. It sends you the new web page with the different bits highlighted, which is a nice touch. But I wanted something tied in with Nagios, rather than another separate and special system.

So I started looking at the Nagios plugins I had, and I was surprised to find that check_http has a raft of different options, including the ability to check for regexes in the content. Sweet! I added a couple strings that'll almost certainly be there until The Next Big Redesign(tm), and done.
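For the record, here's roughly what that looks like as Nagios config. It's a sketch, not my actual setup: the command name, the host name "www", the "generic-service" template and the "Latest News" string are all made up for illustration. The -u (URL) and -r (regex) options are real check_http flags.

```
# Hypothetical command: fetch a URL and go CRITICAL if the regex isn't in the page.
define command {
    command_name    check_http_content
    command_line    $USER1$/check_http -H $HOSTADDRESS$ -u $ARG1$ -r '$ARG2$'
}

# Hypothetical service: watch the main page for a string that should always be there.
define service {
    use                     generic-service
    host_name               www
    service_description     Main page content
    check_command           check_http_content!/!Latest News
}
```

If the database falls over and the CMS serves its "Whoops!" page instead, the string goes missing and the check goes red. There's also -s if all you want is a plain string match rather than a regex.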

I started looking at the other plugins, and noticed check_hpjd. A few minutes later I was checking our printers for errors...just in time to notice a weird error that someone had emailed me about 30 seconds before. Nice!
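Setting up check_hpjd is about as small as it gets. A sketch, assuming the printer answers SNMP with the community string "public" and that you have a host called "printer1" and a "generic-service" template (all assumptions, adjust to taste):

```
# check_hpjd talks SNMP to the JetDirect card; -C is the community string.
define command {
    command_name    check_hpjd
    command_line    $USER1$/check_hpjd -H $HOSTADDRESS$ -C $ARG1$
}

# Hypothetical service for one printer.
define service {
    use                     generic-service
    host_name               printer1
    service_description     Printer status
    check_command           check_hpjd!public
}
```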

This morning (I work from home on Saturdays in return for getting Wednesdays off to take care of Arlo) I was checking Cacti (which rocks even if they do call it a solution). /home/visitors with no free space? Wha'? Someone had run a job that'd managed to fill the whole damned partition.

Well, there's check_disk, but that's only for mounted disks — and I don't want the monitoring machine freezing if there's a problem with NFS. SNMP should do this, right? Right — the net-snmp project has the ability to throw errors if there's less than a certain amount of free space on a disk. For some reason I'd never set that up before, nor got Nagios to monitor for it. A few minutes later and check_snmp was looking for non-empty error messages:

$USER1$/check_snmp -H $HOSTADDRESS$ -o UCD-SNMP-MIB::dskErrorMsg.$ARG1$ -s ""
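To spell out both halves: the monitored machine needs a disk line in snmpd.conf, and Nagios needs a command and service wrapped around that check_snmp line. This is a sketch; the 10% threshold, the command name, the host name "fileserver" and the "generic-service" template are assumptions, not what I actually run.

```
# snmpd.conf on the monitored host:
# flag an error when /home/visitors has less than 10% free.
disk /home/visitors 10%
```

```
# Nagios side: OK only while the error message for that disk entry is empty.
define command {
    command_name    check_snmp_disk
    command_line    $USER1$/check_snmp -H $HOSTADDRESS$ -o UCD-SNMP-MIB::dskErrorMsg.$ARG1$ -s ""
}

# $ARG1$ is the index of the disk entry in the dskTable; the first disk line is 1.
define service {
    use                     generic-service
    host_name               fileserver
    service_description     Free space on /home/visitors
    check_command           check_snmp_disk!1
}
```

The nice part is that the monitoring box never touches the filesystem itself, so a hung NFS mount only hurts the machine that already has the problem.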

I looked ahead in snmpd.conf and noticed the process section. Well, hell! It's all very good to check that the web server is running, but what if there are too many Apache processes? Or too few MySQL processes? Or no Postfix at all? Can't believe I never set this up before…
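The process checks follow the same pattern as the disk one. Again a sketch: the process names (httpd, mysqld, and master for Postfix) and the limit of 50 are assumptions that depend on your distro and your load, and the command name is made up.

```
# snmpd.conf on the monitored host. Syntax: proc NAME [MAX [MIN]]
# With no numbers given, the default is "at least one".
proc httpd 50 1
proc mysqld
proc master
```

```
# Nagios side: OK only while the error message for that proc entry is empty.
# $ARG1$ is the index of the proc line (1 for httpd, 2 for mysqld, and so on).
define command {
    command_name    check_snmp_proc
    command_line    $USER1$/check_snmp -H $HOSTADDRESS$ -o UCD-SNMP-MIB::prErrMessage.$ARG1$ -s ""
}
```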

I've finally come up for breath. This wasn't what I planned on doing this morning, but I love it when a plan comes together.