(Mostly) Done, thank the gods

Saturday I upgraded the big machine at work to Solaris 10 11/06. This did not go well.

First off, I ended up installing onto a disk that held home directories. The install was a manual one, and I'd carefully noted in advance the disk I'd be installing to: the second internal hard drive, the one I'd tried running luactivate on a couple weeks ago.

Only the disk targets/names/whatever had changed, and so c1t0d1 (say) was now one of the home partitions mounted from the external StorEdge array. Fuck. There were backups: I'd taken one before starting the install. Unfortunately, it was taken 3 hours before the install started, and the machine had been up and running the whole time in between. The install started at 8am, so I'm hopeful not too much was lost between 5am and 8am. But don't think I'm trying to minimize that mistake.

Second, I'd also managed to bork the disklabel for the original Solaris 9 install. I dug up the original disklabel somewhere (it wasn't in the documentation we've got, and I should have put it in there a long time ago) and restored everything to the way it was. The disk itself hadn't been reformatted, so everything was okay.

Third, when it came up only one of the three external drives from the StorEdge was present, and I could not figure out where the others had gone. (It took me a while to even notice this; when I first realized my first mistake, I thought I'd installed over all the home directories. That was an awful moment.)

It took a lot of Googling to figure out what I should have already known about Solaris in general, and what should have been documented about this machine in particular: /kernel/drv/sd.conf had been modified to add entries for LUNs that Solaris otherwise wouldn't have looked for.

(Many thanks to Brandon Hutchinson, whose entry on this very subject saved my butt. I wrote him a grateful email, and I wish him the best.)
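For anyone who hasn't run into this: sd.conf entries are one line per target/LUN pair the sd driver should probe. The sketch below is a hypothetical helper, not what was actually on this machine; the target and LUN numbers are made up for illustration, and the exact entries your array needs (and whether you need a parent= property) depend on your hardware. After editing sd.conf you still need a reconfiguration reboot for the new entries to be picked up.

```python
# Hypothetical helper: generate sd.conf-style entries so the sd driver
# probes extra LUNs behind a target. The targets/LUNs used below are
# illustrative assumptions, not this machine's real StorEdge layout.


def sd_conf_entries(targets, luns):
    """Return one sd.conf line per (target, lun) pair."""
    lines = []
    for target in targets:
        for lun in luns:
            lines.append(f'name="sd" class="scsi" target={target} lun={lun};')
    return lines


if __name__ == "__main__":
    # Assumed example: one array on target 1 exposing LUNs 0 through 2.
    for line in sd_conf_entries(targets=[1], luns=range(3)):
        print(line)
```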

(Incidentally, a reconfiguration reboot on a V480 takes between 10 and 25 minutes. It's not a fast process. Installing Solaris patches isn't fast either; all told I spent at least two hours on that, not counting reconfiguration reboots.)

I restored the one home directory (having recreated it in ZFS, the one bright spot in all that) and mounted the others. All this got me, at 6pm, to where I should have been at noon.

I was there 'til 11:30pm on Saturday fixing things up to the point where it was more or less ready for SSH-based logins. Then I took a cab home. Then I came in yesterday at 10am and got almost everything else working: SunRays (oh, the new desktop is beautiful), printing, software, and I can't even remember what all at this point.

I took lots of notes and did everything from within screen with logging turned on. (Bonus points for next time: set the prompt to show the time, so I can tell what order I did things in.) I'll be going over all of it to do things better next time.

Here's some stuff I already know:

(Incidentally, on that front I owe Blastwave an apology: right on the goddamn HOWTO page there's a section on automation. My mistake. But I still don't like the fact that the remove option (-r) is undocumented, presumably because of the warning it prints saying it's not very smart and shouldn't be used.)

Sometimes it really amazes me that I get paid to do this work, because it's so much fun. And sometimes I'm amazed because I figure I shouldn't be allowed to touch computers with a ten-foot pole.

I'm feeling pretty damned humble this morning. With luck that feeling will stay.