I can get really, really focussed sometimes. Every now and then that happens with Nagios.
Yesterday I had some time to kill before I went home, so I looked over my tickets in RT. (I work in a small shop, so a lot of the time the tickets in RT are a way of adding things to my to-do list.) There was one that said to watch for changes in our web site's main page; I'd added that one after MySQL'd had problems one time -- ran out of connections, I think -- and Mambo had displayed a nice "Whoops! Can someone please tell the sysadmin?" page (a nice change from the usual cryptic error when there's no database connection). Someone did, but it would've been nice to be paged about it.
At home I use WebSec to keep track of some pages that don't change very often (worse luck…), and I thought of using that. It sends you the new web page with the different bits highlighted, which is a nice touch. But I wanted something tied in with Nagios, rather than another separate and special system.
So I started looking at the Nagios plugins I had, and I was surprised to find that check_http has a raft of different options, including the ability to check for regexes in the content. Sweet! I added a couple of strings that'll almost certainly be there until The Next Big Redesign(tm), and that was that.
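To make the idea concrete, here is a minimal Python sketch of what a content check like this boils down to: fetch the page, look for each expected pattern, and report a Nagios-style status (0 for OK, 2 for CRITICAL). It is not check_http itself, which already does this through its -s (string) and -r (regex) options. The function names, URL and patterns below are hypothetical placeholders.

```python
import re
import urllib.request

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_content(body: str, patterns: list[str]) -> tuple[int, str]:
    """Return (status, message): OK only if every regex in `patterns` matches somewhere in `body`."""
    missing = [p for p in patterns if re.search(p, body) is None]
    if missing:
        return CRITICAL, "CRITICAL - pattern(s) not found: " + ", ".join(missing)
    return OK, f"OK - all {len(patterns)} pattern(s) found"


def fetch_and_check(url: str, patterns: list[str], timeout: float = 10.0) -> tuple[int, str]:
    """Fetch `url` and run check_content on the body; a failed fetch counts as CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError, HTTPError and socket timeouts all subclass OSError
        return CRITICAL, f"CRITICAL - could not fetch {url}: {exc}"
    return check_content(body, patterns)


# As a plugin script you would print the message and exit with the status, e.g.
#   status, message = fetch_and_check("http://www.example.com/", [r"Welcome"])
#   print(message); sys.exit(status)
# (URL and pattern are hypothetical placeholders.)
```

Checking for a couple of fixed strings, rather than alerting on any change at all, means routine edits to the page don't page anyone, while a "Whoops!" database-error page that lacks those strings does.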
I started looking at the other plugins and noticed one for checking printers. A few minutes later I was checking our printers for errors... just in time to notice a weird error that someone had emailed me about 30 seconds before. Nice!
This morning (I work from home on Saturdays in return for getting Wednesdays off to take care of Arlo) I was checking Cacti (which rocks even if they do call it a solution) and spotted a partition with no free space. Wha'? Someone had run a job that'd managed to fill the whole damned partition.
There's check_disk, but that's only for mounted disks, and I don't want the monitoring machine freezing if there's a problem with NFS. SNMP should do this, right? Right: net-snmp's agent can throw an error if there's less than a certain amount of free space on a disk. For some reason I'd never set that up before, nor had I got Nagios to monitor for it. A few minutes later, Nagios was looking for non-empty error messages:
$USER1$/check_snmp -H $HOSTADDRESS$ -o UCD-SNMP-MIB::dskErrorMsg.$ARG1$ -s ""
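The `-s ""` is what makes that command work: check_snmp's -s option asks for an exact string match, so the service stays OK only while dskErrorMsg for that disk is empty, and any error text snmpd fills in trips it. Below is a minimal Python sketch of that decision, plus a hypothetical helper that applies it to every row of the disk table at once. It assumes snmpd.conf already has `disk` lines (for example `disk / 10%`, a made-up threshold), and the index-to-message mapping is something you would have to build from an SNMP walk yourself; the function names are illustrative only.

```python
# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_error_msg(msg: str) -> tuple[int, str]:
    """Mimic `check_snmp -s ""`: OK only if the error message is exactly empty."""
    if msg == "":
        return OK, "OK - no error"
    return CRITICAL, f"CRITICAL - {msg}"


def check_disk_table(error_msgs: dict[int, str]) -> tuple[int, str]:
    """Hypothetical extension: check every dskErrorMsg.N in one go.

    `error_msgs` maps a dskIndex (the $ARG1$ in the command above) to its
    dskErrorMsg value, e.g. as parsed from a walk of
    UCD-SNMP-MIB::dskErrorMsg (parsing not shown here).
    """
    problems = []
    for index in sorted(error_msgs):
        status, _ = check_error_msg(error_msgs[index])
        if status != OK:
            problems.append(f"disk {index}: {error_msgs[index]}")
    if problems:
        return CRITICAL, "CRITICAL - " + "; ".join(problems)
    return OK, f"OK - {len(error_msgs)} disk(s) clean"
```

One Nagios service per disk index, as in the command above, gives separate alerts and acknowledgements per partition; a single table-wide check means less configuration but lumps every disk into one alert.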
I looked ahead in snmpd.conf and noticed the process section. Well, hell! It's all very well to check that the web server is running, but what if there are too many Apache processes? Or too few MySQL ones? Or no Postfix at all? Can't believe I'd never set this up before…
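The process section works on the same principle: each `proc` line names a process and, optionally, a maximum and then a minimum count (e.g. `proc httpd 50 5`), and snmpd flags an error in its process table when the count falls outside that range, which check_snmp could presumably watch via UCD-SNMP-MIB::prErrMessage.N with `-s ""` just like the disks. Here is a minimal Python sketch of that range check. The process names and limits are hypothetical, and treating a missing maximum as "unlimited" is a simplification of snmpd's own defaulting rules, not a copy of them.

```python
from dataclasses import dataclass
from typing import Optional

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


@dataclass
class ProcLimit:
    """One hypothetical threshold, loosely like an snmpd.conf `proc NAME MAX MIN` line."""
    name: str
    max_count: Optional[int]  # None means no upper limit in this sketch
    min_count: int = 1


def check_processes(counts: dict[str, int], limits: list[ProcLimit]) -> tuple[int, str]:
    """Compare observed process counts against each limit's min/max range."""
    problems = []
    for limit in limits:
        n = counts.get(limit.name, 0)  # a process missing from `counts` isn't running
        if n < limit.min_count:
            problems.append(f"{limit.name}: {n} running, want at least {limit.min_count}")
        elif limit.max_count is not None and n > limit.max_count:
            problems.append(f"{limit.name}: {n} running, want at most {limit.max_count}")
    if problems:
        return CRITICAL, "CRITICAL - " + "; ".join(problems)
    return OK, "OK - all process counts within limits"


# Hypothetical limits for the cases above: too many Apache, too few MySQL, no Postfix.
LIMITS = [
    ProcLimit("httpd", max_count=50, min_count=5),
    ProcLimit("mysqld", max_count=None, min_count=1),
    ProcLimit("master", max_count=None, min_count=1),  # Postfix's master daemon
]
```

Keeping both a floor and a ceiling on Apache catches a runaway fork storm as well as a dead server, which a plain "is port 80 answering?" check would miss.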
I've finally come up for breath. This wasn't what I'd planned on doing this morning, but I love it when a plan comes together.