Saturday I upgraded the big machine at work to Solaris 10 11/06. This
did not go well.
First off, I ended up installing onto a disk that held home
directories. The install was a manual one, and I'd carefully noted in
advance the disk I'd be installing to: the second internal hard drive,
the one I'd tried doing the luactivate on a couple weeks ago.
Only the disk targets/names/whatever changed, and so c1t0d1 (say)
was now one of the home partitions mounted from the external StorEdge
array. Fuck. There were backups: I'd taken a backup before starting
the install. Unfortunately, they were taken 3 hours before the install
started, and during that time the machine had been up and running. The
install started at 8am, so I'm hopeful there wasn't too much lost
between 5am and 8am. But don't think I'm trying to minimize that.
Second, I'd also managed to bork the disklabel for the original
Solaris 9 install. I dug up the original disklabel somewhere — it
wasn't in the documentation we've got, and I should have put it in
there a long time ago — and restored everything to the way it was. It
hadn't been formatted, so everything was okay.
Third, when it came up only one of the three external drives from the
StorEdge was present, and I could not figure out where the others had
gone. (It took me a while to figure this out; when I realized my
first mistake, I thought I'd installed over all the home
directories. That was an awful moment.)
It took a lot of Googling to figure out what I should have already
known about Solaris in general, and what should have been documented
about this machine in particular: that
/kernel/drv/sd.conf had been
modified to add additional entries for LUNs that otherwise Solaris
wouldn't have looked for.
(Many thanks to Brandon Hutchinson, whose entry on this very
subject saved my butt. I wrote him a grateful email, and I wish
him the best.)
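For the record, here's a rough sketch of the kind of entries I mean. The target and LUN numbers are made up for illustration, not what this machine actually uses, and the line format is the usual one for sd.conf as far as I know; check it against the existing file before trusting it.

```python
def sd_conf_entries(target, luns):
    """Build sd.conf lines asking the sd driver to probe extra LUNs on one SCSI target."""
    return [
        f'name="sd" class="scsi" target={target} lun={lun};'
        for lun in luns
    ]


if __name__ == "__main__":
    # Hypothetical values: target 1, LUNs 1 through 3.
    for line in sd_conf_entries(1, range(1, 4)):
        print(line)
```

Lines like these get appended to /kernel/drv/sd.conf, and they only take effect after a reconfiguration reboot.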
(Incidentally, a reconfiguration reboot on a VS480 takes between 10
and 25 minutes. It's not a fast process. Also not a fast process is
installing Solaris patches; I spent at least two hours on this all
told, not counting reconfiguration reboots.)
I restored the one home directory (having recreated it in ZFS…one
bright spot in all that) and mounted the others. All this got me, at
6pm, where I should have been at noon.
I was there 'til 11:30pm on Saturday fixing things up to the point
where it was more or less ready for SSH-based logins. Then I took a
cab home. Then I came in yesterday at 10am and got almost everything
else working: SunRays (oh, the new desktop is beautiful), printing,
software, and I can't even remember what all at this point.
I took lots of notes and did everything in a session with
logging turned on. (Bonus points for next time: set the prompt to show
the time, so I can tell what order I did things in.) I'll be going
over all of it to do things better next time.
Here's some stuff I already know:
Backups. It's said you never know how much you need 'em 'til you
need 'em. True 'nuff.
DOCUMENTATION. I spent a good part of yesterday getting information
on every disk while waiting for other software to install. I
should have done this long, long ago.
(Incidentally, on that front I owe Blastwave an apology: right on the
goddamn HOWTO page there's a section on automation. My
mistake. But I still don't like the fact that the remove option
is undocumented, and presumably undocumented because of the warning it
prints that it's not very smart and shouldn't be used.)
Know what you're dealing with. The home partition I erased was
bigger than the disk I expected to install on, but I wasn't sure of
Stop if you're not sure. I should have stopped at the last point.
Be paranoid. Usually I am, but it would have been good to disconnect
every superfluous drive rather than go through all this hell.
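The "know what you're dealing with" and "stop if you're not sure" points could be turned into a dumb little check like the one below. This is only a sketch: the function name, the 5% tolerance, and the sizes are all made up, and the reported size would have to come from whatever the installer or format says about the disk.

```python
def safe_to_install(expected_bytes, reported_bytes, tolerance=0.05):
    """Return True only if the reported capacity is close to the expected one."""
    if expected_bytes <= 0:
        raise ValueError("expected_bytes must be positive")
    # A target much bigger (or smaller) than expected is a reason to stop.
    return abs(reported_bytes - expected_bytes) <= tolerance * expected_bytes


if __name__ == "__main__":
    # Hypothetical sizes: expecting a ~73 GB internal disk.
    print(safe_to_install(73_000_000_000, 73_400_000_000))
    print(safe_to_install(73_000_000_000, 146_000_000_000))
```

If it says no, stop and find out why before going any further.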
Sometimes it really amazes me that I get paid to do this work because
it's so much fun. And sometimes I'm amazed because I figure I
shouldn't be allowed to touch computers with a ten-foot pole.
I'm feeling pretty damned humble this morning. With luck that feeling