Quotas are on, right?

Tomorrow I'm upgrading the firmware on a disk array that's attached to a small cluster I manage; yesterday, in preparation for that, I ran a full backup of the disks in question. The home directories were taking longer than I expected, so I checked how full they were. The answer was 97%. Oh, fuck.
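Checking how full a filesystem is can be automated so that 97% doesn't come as a surprise during a backup. Below is a minimal Python sketch of a disk-space check that follows the usual Nagios plugin conventions (exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, one line of output). The 80% and 90% thresholds, the default `/home` target and the function names are hypothetical, not part of the setup described here.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to an (exit code, label) pair."""
    if percent_used >= crit:
        return CRITICAL, "CRITICAL"
    if percent_used >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path, warn=80.0, crit=90.0):
    """Return (exit code, one-line status) for the filesystem holding path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    # Same basis as df's Use% column: used / (used + available to users),
    # so blocks reserved for root don't make the disk look emptier.
    usable = usage.used + usage.free
    if usable == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: no usable blocks reported"
    percent = 100.0 * usage.used / usable
    code, label = classify(percent, warn, crit)
    return code, f"DISK {label} - {path} is {percent:.1f}% full"


def main(argv):
    """Print the status line and return the plugin exit code."""
    target = argv[0] if argv else "/home"
    code, message = check_disk(target)
    print(message)
    return code
```

Run as a script via `sys.exit(main(sys.argv[1:]))`, this would print a `DISK CRITICAL` line and exit with status 2 for a `/home` that is 97% full.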

The prof whose cluster this is had asked for quotas to be set up for everyone; he didn't have a lot of disk space to attach, and wanted to impose some discipline on his lab. And I'd done so...only somehow, the quotas were now off, probably because I'd left them off the last time I had to fiddle with them. As a result, one user was taking up nearly half the disk, and another almost a third. To make things worse, I had not set up my usual Nagios monitoring for this machine (disk space checks, say) because Ganglia was already running on it, and I'd vaguely thought that running two such systems would be silly...so I was not getting my usual "OMG WTF BBQ" messages from Nagios.
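Since the quotas had been switched off without anyone noticing, it's worth checking their state rather than assuming it. A minimal sketch, assuming the Linux quota tools' `quotaon -p` (print state) option reports lines shaped like `user quota on /home (/dev/sdb1) is on`; the parser is kept separate so it can be checked against sample text, and the function names are hypothetical.

```python
import subprocess


def parse_quota_state(output):
    """Map (quota type, mount point) -> True if on, False if off.

    Assumes quotaon -p lines shaped like:
        user quota on /home (/dev/sdb1) is on
    Lines that don't match that shape are ignored.
    """
    states = {}
    for line in output.splitlines():
        words = line.split()
        if (len(words) == 7 and words[1:3] == ["quota", "on"]
                and words[5] == "is" and words[6] in ("on", "off")):
            states[(words[0], words[3])] = words[6] == "on"
    return states


def user_quota_enabled(mountpoint):
    """True/False for user quotas on mountpoint, or None if undetermined."""
    try:
        # Only the printed output is parsed; quotaon's exit status is not
        # relied on here.
        result = subprocess.run(["quotaon", "-p", "-u", mountpoint],
                                capture_output=True, text=True)
    except OSError:
        return None  # quota tools missing or not runnable
    return parse_quota_state(result.stdout).get(("user", mountpoint))
```

The lookup is a plain string match, so the mount point has to be passed exactly as it appears in the output (`/home`, not `/home/`). A `False` or `None` result is the condition to alert on.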

It gets worse. I'd put in cron scripts that maintained the quota files and nagged users by email, CC'ing me...but the permissions were 544, which meant they never ran. No email? Well, then, everything must be fine, right? Sigh.
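A cron job that never runs also never complains, so a separate check for scripts that can't be executed closes that gap. A minimal sketch that lists regular files in a directory which the calling user cannot execute, using `os.access(path, os.X_OK)`. Because that call answers for whoever runs it, this assumes the check runs as the same account cron uses for the jobs; the function name is hypothetical.

```python
import os
import stat


def non_executable(directory):
    """Return (path, octal mode) for each regular file in directory that the
    current user cannot execute, sorted by name."""
    flagged = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and not os.access(path, os.X_OK):
            mode = stat.S_IMODE(os.stat(path).st_mode)
            flagged.append((path, format(mode, "o")))
    return flagged
```

An empty list is the healthy result; anything else is worth reporting through a channel other than the jobs being checked, since those are exactly the ones that stay silent.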

So: