A nice thing about working at a university is that you get all this
time off at Xmas; however, it's also the best
possible time to do all the stuff you've been saving up. Last year my
time was split between this job and my last; now, the time's all mine,
baby.
Today will be my last of three days in a row where the machines have
been all mine to play with^W^Wupgrade. I've been able to twiddle the
firewall's NIC settings, upgrade CentOS using Cfengine, and set up a
new LDAP server using Cobbler and CentOS Directory Server.
I've tested our UPS' ATS, but discovered that NUT is
different from APCUPSD in one important way: it doesn't easily
allow you to say "shut down now, even though there's 95% battery
left". I may have to leave testing of that for another day.
It hasn't all gone smoothly, but I've accomplished almost all the
important things. This is a nice surprise; I'm always hesitant when
I estimate how long something will take, because I feel like I have no
way of knowing in advance (interruptions, unexpected obstacles...you
know the drill). In this case, the time estimates for individual
tasks were, in fact, 'way paranoid, but that gave me the buffer that I
needed.
One example: after upgrading CentOS, two of our three servers attached
to StorageTek 2500 disk arrays reported problems with the disks. Upon
closer inspection, they were reporting problems with half of the LUNs
that the array was presenting to them -- and they were reporting them
in different ways. It had been a year or longer since I'd set them
up, and my documentation was pretty damn slim, so it took me a while
to figure it out. (Had to sleep on it, even.)
The servers have dual paths to the arrays. In Linux, the multipath
drivers don't work so well with these, so we used the Sun drivers
instead. But:
- You have to rebuild the drivers after a kernel change.
- This only showed up on two servers because the third server had not
upgraded its kernel (or indeed, any of its packages). Why? cfservd
had refused its connection because I had the MaxConnections
parameter too low.
- And of the two that did upgrade, the one machine we'd tested the
Linux drivers on still had an old multipath.conf file in /etc, which,
even though the multipathd service wasn't starting up, was enough
to get those drivers loaded. This took a while to figure out because I'd
completely forgotten how to tell which driver was in use.
I got it fixed in the end, and I expanded the documentation
considerably. (49,000 words and counting in the wiki. Damn right I'm
bragging!)
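For the "rebuild the drivers after a kernel change" gotcha, here's a
rough sketch of the kind of check that could catch it before the disks
start complaining: look under /lib/modules for the running kernel and
see whether the out-of-tree modules are actually there. The function
name and the module names below are placeholders I made up for
illustration, not the real names of the Sun driver modules; substitute
whatever your driver package actually installs.

```python
import os
from pathlib import Path


def missing_modules(release, names, root="/lib/modules"):
    """Return the entries of `names` with no .ko file under root/release.

    `names` are module names without the .ko suffix. Compressed modules
    (.ko.gz, .ko.xz) are matched as well.
    """
    tree = Path(root) / release
    found = set()
    if tree.is_dir():
        for path in tree.rglob("*.ko*"):
            # "foo.ko" and "foo.ko.gz" both reduce to "foo"
            found.add(path.name.split(".ko")[0])
    return [name for name in names if name not in found]


if __name__ == "__main__":
    release = os.uname().release
    # Placeholder names: replace with the modules your driver provides.
    wanted = ["example_driver_a", "example_driver_b"]
    for name in missing_modules(release, wanted):
        print(f"{name}: no module built for kernel {release}")
```

Run after a kernel upgrade (or from whatever does the upgrading), this
only tells you that a rebuild is needed; it doesn't tell you which
driver is currently loaded.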
Putting off 'til next time, tempted though I am: reinstalling CentOS
on the monitoring machine, which, due to a mix of EPEL and
Dag repos and operator error, appears to be stuck in a
corner, unable to upgrade without ripping out (say) Cacti. I moved
the web server to a backup machine on Tuesday, and I'll be moving it
back today; this is not the time to fiddle with the thing that's
going to tell me I've moved everything back correctly.
(Incidentally, thanks to Matt for the rubber duck, who
successfully talked me down off the roof when I was mulling this
over. Man, that duck is so wise...)
Last day today. (Like, ever!) If I remember correctly I'm going to
test the water leak detector...and I forget the rest; it's all in my
daytimer and I'm too lazy to get up and look right now. Wish me luck.
And best of 2010 to all of you!