The Life of a Sysadmin

Carousel is a lie!

Entries from August 2007.

Interesting website generation
Thu Aug 2 15:29:49 EDT 2007

Just came across norman.walsh.name while looking for information on Mercurial, and I'm intrigued. I'll have to take a look at the Makefile and maybe steal some ideas...by the beard of Saint Tim, this site could use a rewrite.

Tags: meta.
Solaris Live Upgrade
Sat Aug 4 14:30:23 EDT 2007

I'm running Solaris Live Upgrade at work to upgrade our main server from Solaris 9 to Solaris 10, and one thing I haven't seen mentioned in all the things I've read about it is how long it takes.

Right now, for example, I'm running luactivate to activate the new boot environment. It's been running for half an hour now, with no indication about how long it's going to take. If I'd known it would take this long, I'd have scheduled it for earlier this morning. And yeah, it would've been obvious if I'd thought about it…

Shet my mouth, it just finished after 38 minutes. For the record, this is on a V480 w/ 16GB of RAM, and call it 50GB total of disks to be synced.

Tags: solaris.
Well, *that* happened
Sun Aug 5 15:34:08 EDT 2007

The upgrade to Solaris 10 did not work. The main problem was that logging in at the console (even as root!) simply would not work: I'd get logged right back out again each time, with no error message or anything. WTF?

I managed to go into single-user mode, provide the root password (see? they do trust me) and get access that way. But I still couldn't figure out what was going wrong. Eventually I came across this entry in the logs:

svc.startd[7]: [ID 694882 daemon.notice] instance svc:/system/console-login:default exited with status 16

And /var/svc/log/system-console-login:default.log said:

[ Aug  4 14:23:48 Executing start method ("/lib/svc/method/console-login") ]
[ Aug  4 14:24:05 Stopping because all processes in service exited. ]

Eventually I had to give up and revert back to Solaris 9. That part worked well, at least.

I've no idea what went wrong at this point, but since I haven't come across this before with other Solaris 10 installs I'm starting to wonder if it's a product of luupgrade attempting to merge the machine's current settings with Sol10. Between that suspicion and the increase in disk space needed to run luupgrade (not sure why, but for example /usr needed a couple extra GB of space in order to complete luupgrade; I presume something's being added or kept around, but there's no explanation I can find for this), I'm starting to think that just going with a clean install of Sol10 is the way to go.

Arghh. Live Upgrade was supposed to just work.

Tags: solaris, upgrade, warstory.
New Gibson!
Mon Aug 6 10:10:23 EDT 2007

I had no idea. And he's speaking about it here in Vancouver. 12 years here and I still haven't run into him, unlike folks I know. Here's hoping I win tickets.

Tags: books.
Time to fire up the IPv6 tunnel again
Mon Aug 6 14:57:19 EDT 2007

I've been fiddling with IPv6 for years, but have never actually done anything serious with it. When I started work at Dowco, and my web server was a 200MHz Pentium I inherited from friends of mine, my plan was to get a tunnel from Hurricane Electric, then run a tunnel broker service of my own for customers. (There was a burning thirst for IPv6 subnets, let me tell you…) It foundered when it got to the point of coming up with a website that'd let you register; cookies and sessions and I don't know what all just bored me to tears.

At my next/last job, IPv6 was used in-house. The sysadmin before me had set up 6to4 because he wanted to connect to his machine at work without NAT. I kept it going long past the time he left, and as far as I know it's still there. But beyond presenting many more ways for DNS problems to screw things up, not much was ever done with it.

Last year I signed up for another account from HE, got a prefix, then lost track of it when it came time to add IPv6 rules for the firewall. Of course, there was other stuff going on too.

This year HE's registration form is borked, saying that it can't insert my MD5'd password into MySQL, so I've applied for an account with SixXS. (Sadly, it seems that despite appearances, my ISP isn't interested.) I've got a week's vacation coming up, so along with moving the server from Atlanta to home I think I'll try to get IPv6 working as well.

Next beer in Jerusalem! (Which, shet my mouth, is not even close to original.)

Tags: ipv6.
Memo to myself(2)
Thu Aug 9 19:06:23 EDT 2007

There is always time to document something. Even if it's just throwing a typescript file on a wiki. And there is always time to turn the typescript dump into real documentation once things have calmed down.

No tags
Bats and Leathermen and Hunter
Fri Aug 10 23:36:14 EDT 2007

When I got my first job in IT, a friend of mine bought me a copy of the third edition of Unix in a Nutshell. (Incidentally, why does O'Reilly's search, which in my client returns "Sorry, no matches were found containing ." (sic), suck so much?) Sure, it was help desk on a small ISP, but it was something. I read that book front to back on the bus to and from work, and filled it full of stickers from all the servers or PCs I assembled.

The sysadmin at that first job also had a cordless drill, and that made things so much easier when assembling or racking servers. I wanted one, but I didn't buy one 'cos I figured I hadn't earned it yet. When my Italian millwright father-in-law bought me one, I felt like it was a vote of confidence in a way.

Another thing the sysadmin had was a Leatherman Wave. Again, I wanted one, but I didn't think I'd earned it yet. Last week, I decided to get one; and if I was going to get one, I was going to wear the damn thing. I started wearing the sheath on my belt, and waited for a chance to use it.

Today I had that chance.

I got to work and went to the kitchen to grab a coffee. "There's a bat behind the fridge," I heard.

What?

The cleaning woman pointed. "I moved out the fridge to clean it," she said. "There was a bat behind it. I don't want to touch it."

I looked, and sure enough there was one hanging by the edge of the cupboard. It was small, like a mouse wearing an overcoat. (Goth mouse?)

And then my moment came.

There were no gloves (I was worried about rabies), but there was a towel. I draped the towel over the bat while frightened coworkers watched, and then covered it with a recycling bin.

And then I took out the Leatherman, and flipped out the knife. "I need help cutting cardboard," I said, and the receptionist came to help. She sliced up a cardboard box and gave me a square of it. I slid it between the cupboard and the towel, sandwiching the bat gently between it and the towel, with the recycling bin behind.

I carried it outside to a clump of trees (ah, the advantages of living on a beautiful campus), found a stick, coaxed it onto it and then left it up a tree.

But I couldn't have done it...

...without the Leatherman.

(This writing style brought to you by my third reading of Battlefield Earth. Our motto: Yeah, it's trash...so what?)

In other news, Hunter Matthews is giving a workshop on server room best practices at LISA '07. I met him at LISA last year, when he was another attendee of an otherwise thin tutorial on setting up server rooms/closets. He was also at the documentation BOF, and the one who said "I've got one user who considers 7-bit ASCII a luxury compared to what you can get from 5 or 6 bits." (Oh, and: "Cooperative collaboration. Yeah, it's part of our vision statement.") He's a good guy and a good teacher, and if you're going to LISA you could do a lot worse than going to his workshop.

Tags: books, lisa.
That took a while...
Mon Aug 13 21:46:58 PDT 2007

The move of all the websites and mail from the server in Atlanta to home took longer than I thought. First I came across problems with the quad-hme interface in the Sparc Ultra 1 workstation I'd been using as a firewall, and I had to resurrect Francisco, an AMD Pentium clone, and install OpenBSD 4.1 on it. Then using pf and spamd to do greylisting didn't work so well, and I had to turn it off. Then some DNS/routing stuff I'd missed before…

Done, though, at long last. Time to sleep.

Tags: meta.
Holy crap, I got aggregated!
Tue Aug 14 16:24:40 PDT 2007

While obsessively prowling my referrers today, I noticed that I've been aggregated on Planet Sysadmin. I'm incredibly flattered. Looks like there's some damn fine reading there, and it looks like I have to fix my RSS feed...apologies for the lack of paragraph breaks.

Tags: meta.
The deluge opens
Tue Aug 14 20:14:08 PDT 2007

Somehow in the move of the websites and files from Linode back to Thornhill (home server on the other end of DSL; 1.5GHz Sempron and 1GB of RAM in a nice Shuttle box), I copied ~/.spamassassin to the wrong directory...and wow, did this ever make a difference to spam filtering. My mailbox was flooded with stuff coming in to an old (12 years!) address that I pretty much just use for WHOIS contacts these days.

I didn't realize what was going on at first, so I tried training it on my saved spam and ham. 90k messages later, it still didn't do it properly. I did some digging, then figured out what had happened and copied the files to the right place. Boom — the sweet, sweet sound of a nearly-empty inbox.

The user_prefs files were the same each time, so it was just the Bayes token files that were different. The only thing I can think of is that the working files were the result of training SA on its mistakes, rather than on its successes.

Of course, I should probably just get the address cancelled or changed…the last time I looked, well over 95% of the spam I've got came to that address. But still, I'm starting to think that I should be keeping the Bayes files under revision control...
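
A minimal sketch of what "Bayes files under revision control" might look like. It assumes sa-learn and Mercurial are installed, and that BAYES_REPO (a name made up for this example) points at a directory that has already been through hg init. Rather than committing the binary database files, it commits the plain-text dump that sa-learn --backup produces, which should diff better.

```shell
#!/bin/sh
# Sketch only: snapshot the SpamAssassin Bayes database into a Mercurial
# repository. BAYES_REPO and snapshot_bayes are hypothetical names, not
# part of SpamAssassin.

snapshot_bayes() {
    repo="${BAYES_REPO:-$HOME/bayes-history}"
    dump="$repo/bayes-backup.txt"

    # sa-learn --backup writes the Bayes database to stdout as text.
    # Write to a temp file first so a failed dump never clobbers the
    # last good snapshot.
    sa-learn --backup > "$dump.tmp" || { rm -f "$dump.tmp"; return 1; }
    mv "$dump.tmp" "$dump"

    # -A adds the dump on the first run; hg commit exits non-zero when
    # nothing has changed, which is harmless here.
    ( cd "$repo" && \
      hg commit -q -A -m "Bayes snapshot $(date '+%Y-%m-%d %H:%M:%S')" )
}
```

Calling snapshot_bayes from cron would give a history to roll back through; a known-good dump can be loaded again with sa-learn --restore bayes-backup.txt.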

Tags: spam.
IPv6, Gibson, missing links
Fri Aug 17 05:56:13 PDT 2007

I spent the better part of the day yesterday setting up IPv6 at home now that I've got my subnet from SixXS. I'm running rtadvd on my OpenBSD firewall, and was testing it with rtsold on a laptop running OpenBSD. I'm not sure what I was doing wrong, but for the longest time all the laptop would pick up was the gateway; it would not set up a global address, but stick with the link-local address only. Every time I tried to ping the dancing turtle it would try sending it with the fe80 address, which of course did not work.

In the end, after a few reboots of both machines, it did work. My notes were a little thin (hey, this is my vacation here :-), but I can't think of what changed…the laptop just started setting itself a global address, routing worked, and that was that. Weird.
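
For next time, a quick way to see whether autoconfiguration has taken is to check what kind of address the interface is holding. The sketch below is only a prefix test: link-local addresses live in fe80::/10, so the first group runs from fe80 to febf, and anything else is reported as "other" (which is not the same as proving it's a working global address). The function name and the sample addresses are made up for the example.

```shell
#!/bin/sh
# Sketch only: classify an IPv6 address by its prefix.
# fe80::/10 is the link-local range, i.e. first group fe80 through febf.
addr_scope() {
    case "$1" in
        [fF][eE][89aAbB][0-9a-fA-F]:*) echo link-local ;;
        *)                             echo other ;;
    esac
}

addr_scope fe80::20c:29ff:fe12:3456   # prints: link-local
addr_scope 2001:db8::1                # prints: other
```

Feeding it the inet6 addresses from ifconfig on the laptop would have shown at a glance that only the fe80 address was there.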

Next up will be to get the website working on IPv6. Maybe a dancing daemon or something…

And hey, I won tickets to see William Gibson speak! "Hey, Mr. Gibson...you know that book you wrote called Virtual Light? ...It was really cool." Ah, fanboys. But my wife wants to go too, 'cos she loved Pattern Recognition. Should be a fun night.

And I just realized that although I've been generating an RSS2 feed, I've never linked to the RSS2 feed until now. Enjoy.

Tags: books, ipv6.
This is ridiculous
Mon Aug 20 15:32:57 PDT 2007

I've complained about Blastwave before, but this is just terrible.

Trying to install VLC on a Solaris 10 machine using Blastwave. Says that CSWcommon is out of date, so please run pkg-get -u. As this always includes thousands of prompts that look like this:

The following package is currently installed:
CSWoldapclient  openldap_client - OpenLDAP client executables (oldapclient)
               (sparc) 2.3.31,REV=2007.01.07

Do you want to remove this package? [y,n,?,q] y

## Removing installed package instance <CSWoldapclient>
## Verifying package <CSWoldapclient> dependencies in global zone
WARNING:
The <CSWoldap> package depends on the package currently
being removed.
Dependency checking failed.

Do you want to continue with the removal of this package [y,n,?,q]

...I look around for a way to automate this. And surprise, there is, and I've missed it the whole time. My bad. So: pkg-get -f upgrade it is, then.

It runs for 45 minutes and stops with an error about CSWcommon:

Current administration requires that a unique instance of the
<CSWcommon> package be created.  However, the maximum number of
instances of the package which may be supported at one time on the
same system has already been met.

Hm, sez I. That's strange, but maybe that's what it's like for package managers that suck. pkg-get -r common and pkg-get -i common, and I'm ready for the upgrade again.

Somehow in the process I managed to remove the pkg_get package, which (surprise) contains the pkg-get command. Fortunately I have a backup copy around and use that to install pkg_get. Life continues.

And it's not for another 15 minutes after that that I notice that the package manager is going in loops. It keeps going over the same packages again and again, giving the same error about unique instances each time. A quick search turns up this link, which tells me I'm a fool for believing the help offered by pkg-get:

$ pkg-get -h
pkg-get,   by Philip Brown , phil@bolthole.com
 (Internal SCCS code revision 3.6)
Originally from http://www.bolthole.com/solaris/pkg-get.html

pkg-get is used to install free software packages
pkg-get
Need one of 'install', 'upgrade', 'available','compare'
  '-i|install'   installs a package
  '-u|upgrade'   upgrades already installed packages if possible
  '-a|available' lists the available packages in the catalog
  '-c|compare'   shows installed package versions vs available
  '-l|list'      shows installed packages by software name only

Optional modifiers:
  '-d|download'  just download the package, not install
  '-D|describe'  describe available packages, or search for one
  '-U|updatecatalog'   updates download site inventory
  '-S|sync'      Makes update mode sync to version on mirror site
  '-f'           dont ask any questions: force default pkgadd behaviour
         Normally used with an override admin file
         See /var/pkg-get/admin-fullauto

  '-s ftp://site/dir'  temporarily override site to get from

and that the correct way to do what I want is to run:

true | sudo pkg-get upgrade
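
Why would piping from true help? One plausible reading, and it's an assumption since nothing here shows pkg-get's or pkgadd's internals, is that every prompt then reads end-of-file straight away and falls back to its default answer, whereas yes would answer "y" to everything. The sketch below uses a stand-in function called confirm (hypothetical, not pkg-get code) to show the difference between the two pipes.

```shell
#!/bin/sh
# Sketch only: confirm is a stand-in for an interactive installer prompt.
# It prints the reply it read, or the default (first argument) when stdin
# is at end-of-file or the reply is empty.
confirm() {
    default="$1"
    if read -r reply && [ -n "$reply" ]; then
        echo "$reply"
    else
        echo "$default"
    fi
}

true | confirm n   # true writes nothing and exits: EOF, so prints n
yes  | confirm n   # yes writes "y" forever: prints y
```

If that reading is right, true | gets you the defaults without a human in the loop, and yes | gets you the drinking bird.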

I admit that I neither knew nor sought to find out what "default pkgadd behaviour" would be, so that's my fault. I admit that I was the one who borked things by removing the pkg-get command. I admit that I did not think to record all of this with script, so at the moment I'm going on scribbled notes and memory. This is not a bug report, which is what I really should be writing. These are all things I did wrong or badly.

But isn't this what apt has fixed? On its worst day, I've never had to set up yes to be the drinking bird that would let me get stuff done. And — when all was done, and I got to go back to installing VLC — I've never had it depend on gcc.

Arghh. Arghh arghh arghh.

1 comment. Tags: packagemanagement, rant, solaris.
Wha'?
Wed Aug 22 05:40:40 PDT 2007

I never expected to read that Ken MacLeod has Prince tickets to sell.

(Incidentally, if you haven't read his books already I can't recommend them enough. Start with Cosmonaut Keep and just keep on going.)

Tags: books.
(Mostly) Done, thank the gods
Mon Aug 27 05:57:24 PDT 2007

Saturday I upgraded the big machine at work to Solaris 10 11/06. This did not go well.

First off, I ended up installing onto a disk that held home directories. The install was a manual one, and I'd carefully noted in advance the disk I'd be installing to: the second internal hard drive, the one I'd tried doing the luactivate on a couple weeks ago.

Only the disk targets/names/whatever changed, and so c1t0d1 (say) was now one of the home partitions mounted from the external StorEdge array. Fuck. There were backups: I'd taken a backup before starting the install. Unfortunately, they were taken 3 hours before the install started, and during that time the machine had been up and running. The install started at 8am, so I'm hopeful there wasn't too much lost between 5am and 8am. But don't think I'm trying to minimize that mistake.

Second, I'd also managed to bork the disklabel for the original Solaris 9 install. I dug up the original disklabel somewhere — it wasn't in the documentation we've got, and I should have put it in there a long time ago — and restored everything to the way it was. It hadn't been formatted, so everything was okay.

Third, when it came up only one of the three external drives from the StorEdge was present, and I could not figure out where the others had gone. (It took me a while to figure this out; when first I realized my first mistake, I thought I'd installed over all the home directories. That was an awful moment.)

It took a lot of Googling to figure out what I should have already known about Solaris in general, and what should have been documented about this machine in particular: that /kernel/drv/sd.conf had been modified to add additional entries for LUNs that otherwise Solaris wouldn't have looked for.
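
For the record, the extra entries look roughly like what the sketch below prints. The target and LUN numbers are invented for the example and are not the ones from this machine; the real values depend on how the array is presented. The helper name sd_lun_entries is made up too.

```shell
#!/bin/sh
# Sketch only: print sd.conf-style entries for extra LUNs on one target.
# Usage: sd_lun_entries TARGET LUN [LUN...]
sd_lun_entries() {
    target="$1"
    shift
    for lun in "$@"; do
        printf 'name="sd" class="scsi" target=%s lun=%s;\n' "$target" "$lun"
    done
}

# Hypothetical numbers: LUNs 1 and 2 on target 0.
sd_lun_entries 0 1 2
```

Lines like these get appended to /kernel/drv/sd.conf (after saving a copy of the original), followed by a reconfiguration reboot so the new LUNs are probed.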

(Many thanks to Brandon Hutchinson, whose entry on this very subject saved my butt. I wrote him a grateful email, and I wish him the best.)

(Incidentally, a reconfiguration reboot on a V480 takes between 10 and 25 minutes. It's not a fast process. Also not a fast process is installing Solaris patches; I spent at least two hours on this all told, not counting reconfiguration reboots.)

I restored the one home directory (having recreated it in ZFS…one bright spot in all that) and mounted the others. All this got me, at 6pm, where I should have been at noon.

I was there 'til 11:30pm on Saturday fixing things up to the point where it was more or less ready for SSH-based logins. Then I took a cab home. Then I came in yesterday at 10am and got almost everything else working: SunRays (oh, the new desktop is beautiful), printing, software, and I can't even remember what all at this point.

I took lots of notes and did everything from within screen with logging turned on. (Bonus points for next time: set the prompt to show the time, so I can tell what order I did things in.) I'll be going over all of it to do things better next time.
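
The bonus points are cheap to earn. A minimal sketch, assuming the shell inside screen is bash (other shells spell their prompt escapes differently, or not at all):

```shell
# Sketch only, assuming bash. Prompt escapes used:
#   \t  current time as HH:MM:SS
#   \u  user name, \h short hostname, \w working directory
#   \$  "#" for root, "$" otherwise
PS1='[\t] \u@\h:\w\$ '
export PS1

# With logging turned on in screen (for example by starting it as
# "screen -L"), every command in the log is then preceded by the time
# it was typed.
```

Putting this in the root account's bash startup file before the next upgrade would mean the screen log doubles as a timeline.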

Here's some stuff I already know:

(Incidentally, on that front I owe Blastwave an apology: right on the goddamn HOWTO page there's a section on automation. My mistake. But I still don't like the fact that the remove option (-r) is undocumented, and presumably undocumented because of the warning it prints that it's not very smart and shouldn't be used.)

Sometimes it really amazes me that I get paid to do this work because it's so much fun. And sometimes I'm amazed because I figure I shouldn't be allowed to touch computers with a ten-foot pole.

I'm feeling pretty damned humble this morning. With luck that feeling will stay.

Tags: solaris, upgrade.
