3 new workstations with OpenSuSE. Can't figure out the autoinstall,
so it's checklist time, baby.
Software upgrade for a fairly important server + 3 slave nodes.
Natch, after rebooting, one of the ILOMs for the servers just...went
away. Can't ping it from the network. Works fine with an
interactive ilom shell from Linux. Sometimes I really hate Dell
software.
Got a call from the reseller for a major hardware vendor who just
got taken over by a major database vendor; said db vendor has just
turned off educational discounts we'd spent THREE MONTHS
negotiating/waiting to have approved. I am unimpressed. Strongly
tempted to call up random hardware vendors and throw money at them
'til they give us stuff.
Finally got leak detection working in the server room. Stupidly
long time, it took.
Working on a "Lessons Learned" presentation for LISA that'll include
mention of the leak detection (among other things). Not sure how
it'll be received, but I figure it's their job to tell me it
sucks, not mine.
New term coming, so about six new people coming. But at least I
know about them in advance.
And this...and this...just amuse me. (Warning: Flash
eats babies and sells them to Chinese hackers.)
Taxes.
But hey! Turns out we live in a constitutional democracy after
all. There was some debate about this at 24 Sussex Drive, I
understand. Score one for the good guys.
Just got off the phone w/a Sun rep who called up to see how I was
doing, did I need any coasters, etc. I took the opportunity to put a
bug in his ear about Solaris.
If Oracle removes the entitlement to run Solaris on non-Sun
hardware, then what the hell do I have to play with? I've got a bunch
of Sun hardware, but only one machine running Solaris -- and that's
in production, holding home directories on ZFS; I'm not playing with
that.
OpenSolaris folks are asking for answers and not getting
any. And saying "Go run OpenSolaris" ignores the problem of
figuring out what's in Solaris proper, what's going to be there RSN,
and what's two or more releases out.
If Solaris disappears, then I'm not going to figure out how it's
better; that's just how:
my brain (I teach myself)
my budget (Dell hardware irritates me but it's easily half the price)
and my career (Open Source/Free Software FTW)
all work.
I like Solaris for precisely two things: ZFS and DTrace. Solaris has
more, I know, but those are the things that matter to me. In all
other respects, for me and my situation, Linux or the BSDs are good
enough or better. And oh: FreeBSD has DTrace; DragonFly BSD has
HAMMER; Linux has *@#%$)%! packaging.
No good ending for this, so we'll just call it quits.
I'm at work today. There was a scheduled power outage in the building
that holds our server room. It was set for 7am to 11am; I got in at
6am 'cos I'm a keener and wanted to make sure I had lots of time to
swap to the backup website.
Power came back on at 10:30 or so. 10:45 I wandered over to the
building to see if that was it; didn't want to rush anyone, but
thought it'd be worth asking. There was no one there. Sweet, thinks
I, let's head down and turn on some machines. (Most are Sun machines
and therefore work just fine remotely; some are older IBM machines,
where I haven't figured out how to do IPMI over the network, and some
are Dell machines where, whee! who knows how it'll work today?)
Turns out A/C's still off. Call the university folks; turns out
someone's still working on it. But they're off for lunch, so no idea
just yet how long that'll be. I'll be calling back in an hour to
see if we have any idea. If they're working on it 'cos something went
wrong, well, that's life. If they're working on it 'cos they had
scheduled something, then I'm irritated I wasn't told beforehand. Oh
well, we'll see what happens. (Update: Our rooftop A/C failed to
come on after the power came back on. A full investigation, with
twelve helicopters full of determined journalist-engineers, will be
launched tomorrow. THIS GOES ALL THE WAY TO THE TOP, PEOPLE.)
In other news, it wasn't a complete waste of time today:
I found the Python LDAP module and used it to add a bunch of
AutoFS entries to the tree. The simplebrowser.py script included in
the examples is quite nice. (Though it does make me wonder why I
haven't started using one of the billion-and-three LDAP browser
tools instead of a) complaining about phpLDAPadmin or b) trying to
remember the syntax for ldapsearch).
I got to wear my safety shoes.
Downloading Rocks to try installing it on three old servers
hanging around.
Discovered that a workstation's hard drive is failing; fortunately
it's not needed right away.
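For the curious, adding AutoFS entries with the Python LDAP module looks roughly like the sketch below. This is an illustration, not the script I ran: the server URI, bind DN, map DN and mount options are made up, and it assumes the RFC 2307bis automount schema (automountKey / automountInformation). Older trees key the entries on cn instead, so adjust to taste.

```python
"""Sketch: add AutoFS map entries to an LDAP tree with python-ldap."""


def build_automount_entry(map_dn, key, info):
    """Return (dn, attrs) for one automount entry under the map at map_dn.

    Assumption: RFC 2307bis automount schema. Keys are assumed to be
    simple names that need no DN escaping.
    """
    dn = "automountKey=%s,%s" % (key, map_dn)
    attrs = [
        ("objectClass", [b"top", b"automount"]),
        ("automountKey", [key.encode("utf-8")]),
        ("automountInformation", [info.encode("utf-8")]),
    ]
    return dn, attrs


def add_entries(uri, bind_dn, password, map_dn, mounts):
    """Bind to the server and add one entry per item in the mounts dict."""
    # Imported here so build_automount_entry() can be used (and tested)
    # on a machine without python-ldap installed.
    import ldap

    conn = ldap.initialize(uri)
    conn.simple_bind_s(bind_dn, password)
    try:
        for key, info in sorted(mounts.items()):
            dn, attrs = build_automount_entry(map_dn, key, info)
            conn.add_s(dn, attrs)
    finally:
        conn.unbind_s()
```

Calling it would be something like add_entries("ldap://ldap.example.org", "cn=admin,dc=example,dc=org", password, "automountMapName=auto.home,dc=example,dc=org", {"alice": "-fstype=nfs,rw fileserver:/export/home/alice"}), with every one of those values being a placeholder.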
We've got four new Dell R410 servers at work. Natch, I want 'em
working with serial consoles so I don't have to sit in the server
room. Three of them worked; the fourth did not, despite having
identical BIOS/Grub settings.
The symptom was quite maddening: After getting past the various BIOS
checks, the Grub menu would not appear unless you sat there and
typed something. After that, you'd get the usual Grub entries and
could boot as usual. If you did not hit a key, the machine would just
hang -- no response to keypresses at all, and you'd have to power cycle.
I spent a stupid amount of time comparing BIOS and Grub settings but
was unable to find anything different. Finally today I typed "grub
console timeout serial dell" into Google and found this bug in
Launchpad, with this comment as the last one:
Having the same hanging issue at the Grub 1.5 stage on brand new R200
Dell servers running OpenSuse 10.3. The terminal timeout is set to 10
and we get 10 press any key to continue messages and then a full
system hang requiring a hard reboot.
If we do press any key on a connected console (using Dell's Serial
Over Lan) or locally before the end of the timeout then it boots fine
so seems to be a bug in continuing at the end of the wait time.
Removing the terminal line from /boot/grub/menu.lst seems to fix the
issue on our servers. The console in this case is sent by BMC to both
the local screen and the remote console with no timeout so works a
treat. This may only work with Dell's BMC/SOL but thought I'd mention
it in case anyone else has spent a day getting frustrated with this
like we have.
This worked a treat, with the added bit of weirdness that I had two
"terminal" lines:
terminal --timeout=2 serial console
serial --unit=0 --speed=9600
default=0
timeout=5
serial --unit=1 --speed=115200
terminal --timeout=5 serial console
and now I have one:
terminal --timeout=2 serial console
serial --unit=0 --speed=9600
default=0
timeout=5
serial --unit=1 --speed=115200
# terminal --timeout=5 serial console
Yes, I know that's redundant, but again: it worked on the other three
machines.
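If I had more than four of these to fix, I'd script the edit. Here's a minimal sketch of what I did by hand: keep the first "terminal" directive in a GRUB legacy menu.lst and comment out any later ones. The function name is mine, and whether this cures the hang on anything other than my one stubborn R410 is pure assumption.

```python
def comment_out_extra_terminal_lines(text):
    """Comment out every 'terminal' directive after the first one.

    Lines that are already commented start with '#', so they never
    match and are passed through untouched.
    """
    seen_terminal = False
    out = []
    for line in text.splitlines(keepends=True):
        if line.split()[:1] == ["terminal"]:
            if seen_terminal:
                line = "# " + line
            seen_terminal = True
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    import sys

    # Reads a menu.lst on stdin and writes the edited copy to stdout,
    # so nothing is modified in place.
    sys.stdout.write(comment_out_extra_terminal_lines(sys.stdin.read()))
```

Run it as a filter and diff the result against the original before copying anything into /boot/grub.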
I don't know if this is a problem with Grub, with Dell's firmware or
something else, but Gott im Himmel I hate bugs like this.
Some kind of verb, some kind of moving thing
Something unseen, some hand is motioning to rise, to rise, to rise
Too fat fat, you must cut clean
You gotta take the elevator to the mezzanine
Chump change, and it's on, super bon bon
Super bon bon, super bon bon...
"Super Bon Bon", Soul Coughing
Tonight was a great deal of fun. I met up with Matt, who had
invited me out for Turkish food earlier. I found that the group also
included Tom Limoncelli and Doug Hughes, who is one of the
Invited Talks coordinators and a very fun guy to boot.
We walked maybe 20 minutes across town to Cazbar on North Charles
Street, which I can recommend to anyone wanting good food. I had
a lovely lamb and mozzarella Pide (like a pizza but more ethnic :-),
did not like the Raki, but enjoyed the Sierra Nevada well enough.
Lovely food and fun conversation...like the guy who needed a Windows
box to run Dell monitoring software, but decided to replace Explorer
with Blackbox window manager and some kind of Apple Spotlight-like
tool for Windows. My jaw dropped. "You've come this close to
making Windows enjoyable for me."
After settling up the bill (non-trivial with 20 people, but we made
it) we walked back again. I got to talk with Tom, which was neat (see
2006 entries from LISA re: accidental stalking); always fun to indulge
in a little bit of hero worship.
Me: Oh, check it out: it's the Barnes and Noble store! Let's go
party there!
Tom: What?
Me: Yeah, I've heard all about it! Free tequila shots at the door,
cashiers dancing on top of their tills, DJs 'til 10am...
Tom: Oh, you're thinking of Borders.
I got to see the USS Constitution which, since I've been
devouring the Master and Commander books over the last year or so,
I simply must visit. (Don't know when exactly...)
And so back to the bar. And so to bed. (tm Samuel Pepys.)
Ran into a little problem this week when I tried to do a restore from
a backup at work. Bacula loaded the tape, then said it couldn't read
the label. Wha?
After much investigation, during which I completely neglected to
cut-n-paste the error messages, I think I've figured out what
happened:
I upgraded the license key for our storage library;
I rebooted the library, 'cos that's what you gotta do;
but the tape was still in there, say halfway through after the last
batch of backups;
so the drive rewound the tape after being power-cycled;
and Bacula didn't know this;
so it wrote the next backups that night at the beginning of the
tape, not realizing this would be a Bad Thing(tm).
Ack. Needless to say, this was not good. Fortunately, the file in
question was not a terribly important one; unfortunately, that's about
the last 2 weeks of incrementals gone. Lesson learned: don't assume
your backup program knows what's going on when hardware reboots from
under it.
In other news: on Thursday I got 5 new Dell servers. Woot! One of
'em will be our new LDAP/web/email/FTP server (Xen ftw!); the rest are
going to be running protein search engines for various researchers
across BC. They're racked and I'm stoked, except that it turns out
the difference between the DRAC6 Express and Enterprise, besides a few
hundred dollars, is that the Enterprise does console redirection and
the Express doesn't. Dammit.
I'm going to see if there's any trickery that can be done, but I'm not
holding out hope. I have got a 32-port console server, but it's two
racks away...might have to run a small batch o' cables up and over to
make this work.
The flash demo for Dell's ML6000 tape library boasts that it's "completely self-aware". Not sure I want SkyNet running my backups…
O'Reilly has an upcoming webcast on -- deep breath -- "Advanced Twitter for Business". (At least they didn't call it a webinar.) When I told my wife about this, she said "So...you and O'Reilly break up yet?"
And did I mention the dream I had a while back about a Sun laptop that looked like an X4200 server folded in half? In the dream it ran nearly perfectly, except when you tried to go to a web page with flash; then it would crash, and a movie of Matt Stone would play, apologizing on behalf of Jonathan Schwartz and everyone else at Sun.
I'm playing with the CVS version of Emacs after reading about some of the new features in what will become Emacs 23. It's nice, but the daemon mode isn't quite multi-tty — you can run Emacs server, detached from any TTY, but if you try connecting to it with multiple emacsclient instances, the first one is where all the TTY action goes. Not sure what I'm missing.
New Dell 2950 server. 2 x quad-core Xeons, 2 x 6MB cache on each die,
16GB RAM, 6 x 300GB SAS 10K SCSI drives in a RAID-6 array using the
PERC/6 controller.
/usr/src/linux-source-2.6.18# time make -j 9 bzImage
[snip]
Root device is (8, 3)
Boot sector 512 bytes.
Setup is 7295 bytes.
System is 1222 kB
Kernel: arch/i386/boot/bzImage is ready (#1)
real 0m22.668s
user 2m20.425s
sys 0m14.537s
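Back-of-envelope check on those numbers: dividing CPU time (user + sys) by wall-clock time (real) gives a rough idea of how many cores the build kept busy. The helper names below are mine, and this is a crude measure that ignores I/O waits and the serial parts of the build.

```python
import re


def parse_time(field):
    """Convert a shell `time` field such as '2m20.425s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", field)
    if match is None:
        raise ValueError("unrecognised time field: %r" % field)
    return int(match.group(1)) * 60 + float(match.group(2))


def busy_cores(real, user, sys_):
    """Average number of cores kept busy: (user + sys) / real."""
    return (parse_time(user) + parse_time(sys_)) / parse_time(real)


if __name__ == "__main__":
    # Figures from the kernel build above.
    print(round(busy_cores("0m22.668s", "2m20.425s", "0m14.537s"), 1))
```

For the build above that works out to a bit under 7 of the 8 cores, which seems respectable for make -j 9.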
Dude, my laptop screen just turned blue. I'd booted into OpenBSD (4.2) and was trying to figure out how to turn off the audible bell. I'd gone from X to a virtual console to see if the problem happened there (it did), then tried ctrl-alt-f5 to get back to X.
My laptop screen turned from black with white text to grey with grey text to light blue with dark blue text, over the course of a minute or so. I thought I'd suddenly borked the LCD screen, but when I rebooted to Debian it was all fine. Just tried switching to a console, then back to X (also in Debian), and that's fine too. Bizarre.
Just checked the logs in OpenBSD and found a series of entries like this:
Nov 1 16:47:17 laptop /bsd: agp_release_helper: mem 0 is bound
Nov 1 16:47:17 laptop /bsd: agp_release_helper: mem 1 is bound
Nov 1 16:47:17 laptop /bsd: agp_release_helper: mem 2 is bound
Nov 1 16:47:17 laptop /bsd: agp_release_helper: mem 3 is bound
Nov 1 16:47:17 laptop /bsd: agp_release_helper: mem 4 is bound
Nov 1 16:47:24 laptop /bsd: agp_release_helper: mem 5 is bound
Nov 1 16:47:24 laptop /bsd: agp_release_helper: mem 6 is bound
Nov 1 16:47:24 laptop /bsd: agp_release_helper: mem 7 is bound
Nov 1 16:47:24 laptop /bsd: agp_release_helper: mem 8 is bound
Nov 1 16:47:24 laptop /bsd: agp_release_helper: mem 9 is bound
Nov 1 16:47:31 laptop /bsd: agp_release_helper: mem 10 is bound
Nov 1 16:47:31 laptop /bsd: agp_release_helper: mem 11 is bound
Nov 1 16:47:31 laptop /bsd: agp_release_helper: mem 12 is bound
Nov 1 16:47:31 laptop /bsd: agp_release_helper: mem 13 is bound
Nov 1 16:47:31 laptop /bsd: agp_release_helper: mem 14 is bound
Nov 1 16:47:38 laptop /bsd: agp_release_helper: mem 15 is bound
Nov 1 16:47:38 laptop /bsd: agp_release_helper: mem 16 is bound
Nov 1 16:47:38 laptop /bsd: agp_release_helper: mem 17 is bound
Nov 1 16:47:38 laptop /bsd: agp_release_helper: mem 18 is bound
Nov 1 16:47:38 laptop /bsd: agp_release_helper: mem 19 is bound
Very weird. On the bus, so Googling that'll have to wait. Although I do have the code on that partition…here we go: says it's the AGPIOC_RELEASE ioctl for agp. Aha! Maybe I'll explain money laundering while I'm at it.
And btw, here's a memo for the world: if you're on the toilet, don't take a phone call. It's really not that important.
Update, October 15 2008: Still happening with OpenBSD 4.3. And for the record, this is a Dell C400 laptop.
Turns out you can get the built-in Broadcom wireless card in my
laptop (Dell C400) to work, but it did take me a bit of effort.
First off, I'd been looking at the wrong web page for the BCM43XX
project — the right one, as Prakash pointed out, is much
more up-to-date.
Second, again at Prakash's suggestion (thanks for that!), I downloaded
the drivers for the Dell 1370. Running the .exe in Wine extracted the
.sys file successfully. However, when I pointed fwcutter at it I got
this message:
Sorry, the input file is either wrong or not supported by b43-fwcutter.
This file has an unknown MD5sum 8d49f11238815a320880fee9f98b2c92.
So that .sys file was one not supported…at least, not for a while
now. That commit message was one of the few I could find that
mentioned this number. So I checked out revision 396 from the
Subversion repo, compiled it and pointed it at the .sys file…success!
Extraction!
Except that it still didn't work:
bcm43xx: Error: Microcode "bcm43xx_microcode5.fw" not available or load failed.
Turns out it had extracted all the files to /lib/firmware/bcm430x_*,
rather than /lib/firmware/bcm43xx_*. Quick little shell-fu:
for i in bcm430x_* ; do j=$(echo "$i" | sed -e 's/bcm430x/bcm43xx/') ; sudo ln -s "$i" "$j" ; done
and it worked when next I inserted the module…working right now, in
fact, despite lots of error messages like:
No idea why I had to go through so much rigamarole, but hopefully
this will save time for someone else. Oh, and for the record: this is
with Debian Etch, 2.6.22 kernel from backports.org.
The laptop I bought off eBay arrived at work on Wednesday...which is my
day at home with Arlo. Thursday I was off sick with flu. Yesterday I
was back at work and slashing open the box it came in, eager to see
what I'd got.
Well, I already knew: it's a Dell C400. 12" screen, 1.2GHz P3 (but
running at 800MHz with SpeedStep and all), 256MB RAM and a 30GB
drive. Not a whole lot of memory, and a bigger hard drive would
always be nice, but I can always upgrade. There's no CD drive in this
thing, and I hadn't plumped for the docking station, so I set up PXE
booting to install Debian. It was a trifle slow, but it worked!
(Especially the second time, after I'd accidentally overwritten Debian
trying to install OpenBSD on another partition. :-)
I'm surprised at how much Just Works in this thing: X.org (no
configuration needed, just start up XDM...man, that's nice),
suspend-to-disk, ethernet (well, it's a 3c905; what do you
expect?). Even the battery, which I'd written off in advance, appears
to hold a decent charge -- about four hours so far. The one thing
that's dicey is the onboard wireless, a Dell 1370 from everybody's
favourite company. But again, I'd written that off in advance.
Next up: I've ordered the OpenBSD 4.2 CD set, so I'll be
installing that once it arrives. And Noah has shown the way to
longer battery life; I'm getting my 2.6.22 kernel now from
Backports. (Oh, the shame of not compiling my own kernel...)
On another note, I think someone had one too many Dilbert moments:
$ dig newcastle.edu.au mx
; <<>> DiG 8.3 <<>> newcastle.edu.au mx
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->> HEADER <<- opcode: QUERY, status: NOERROR, id: 2
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 4, ADDITIONAL: 4
;; QUERY SECTION:
;; newcastle.edu.au, type = MX, class = IN
;; ANSWER SECTION:
newcastle.edu.au. 11h59m12s IN MX 10 proactive.newcastle.edu.au.
newcastle.edu.au. 11h59m12s IN MX 10 synergy.newcastle.edu.au.
Just updated my resume for the first time since starting my
current job. It's nice to look back at what you've done and
realize that, hey, there's been a lot.
In other news, I finally gave in to lust the other day and bought a
Dell C400 on eBay. Nothing too special — 1.2GHz, 256MB, 30GB hard
drive — but I was mainly after the 12" screen, so that I'd be able to
(say) debug raw ethernet frames on my daily commute. About $280 when
all was said and done; the strong Canuckistan peso was part of the
incentive to buy now. Should be at the office in a week or so, and I
can't wait.
It amazed me to see how many off-lease laptops were available, and
just how cheap you could pick them up. A while back my boss got
a D420; with extra memory and a few other things, it came in at about
$1700 or so Canadian. But if you look around, there are plenty of
D400s and D410s around for less than $500 — even less than $400 if you
look hard. Add another $100 (say) for a working battery, and you're in
pretty good shape.
On Tuesday, I'm giving a short presentation on my work's subnet at
SNAG, the UBC System and Network Administrator's Group. I found
Bruce in OpenBSD's ports tree on my laptop; the documentation is
(ahem) thin, but it works. Wish me luck.