Niet zo goed

30 Nov 2012

So yesterday I got an email from another sysadmin: "Hey, looks like there's a lot of blocked connections to your server X. Anything happening there?" Me: "Well, I moved email from X to Y on Tuesday...but I changed the MX to point at Y. What's happening there?"

Turns out I'd missed a fucking domain: I'd left the MX pointing to the old server instead of moving it to the new one. And when I turned off the mail server on the old domain, delivery to this domain stopped. Fortunately I was able to get things going again: I changed the MX to point at the new server, and turned on the old server again to handle things until the new record propogated.

So how in hell did this happen? I can see two things I did wrong:

Poor planning: my plans and checklists included all the steps I needed to do, but did not mention the actual domains being moved. I relied on memory, which meant I remembered (and tested) two and forgot the third. I should have included the actual domains: both a note to check the settings and a test of email delivery.
No email delivery check by Nagios: Nagios checks that the email server is up, displays a banner and so on, but does not check actual email delivery for the domains I'm responsible for. There's a plugin for that, of course, and I'm going to be adding that.

I try to make a point of writing down things that go bad at $WORK, along with things that go well. This is one of those things.

Name (required)
Email (required)
Website
The site is named after Saint ________ the Carpeted: (required)
Comment

Carousel is a LIE!

Niet zo goed

Add a comment:

Related Posts

QRP weekend 08 Oct 2018

Open Source Cubesat Workshop 2018 03 Oct 2018

mpd crash? try removing files in /var/lib/mpd/ 11 Aug 2018