The Life of a Sysadmin

Carousel is a lie!

Entries from November 2007.

Vishnu ate my laptop
Thu Nov 1 21:02:39 PDT 2007

Dude, my laptop screen just turned blue. I'd booted into OpenBSD (4.2) and was trying to figure out how to turn off the audible bell. I'd gone from X to a virtual console to see if the problem happened there (it did), then tried ctrl-alt-f5 to get back to X.

My laptop screen turned from black with white text to grey with grey text to light blue with dark blue text, over the course of a minute or so. I thought I'd suddenly borked the LCD screen, but when I rebooted to Debian it was all fine. Just tried switching to a console, then back to X (also in Debian), and that's fine too. Bizarre.

Just checked the logs in OpenBSD and found a series of entries like this:

Nov  1 16:47:17 laptop /bsd: agp_release_helper: mem 0 is bound
Nov  1 16:47:17 laptop /bsd: agp_release_helper: mem 1 is bound
Nov  1 16:47:17 laptop /bsd: agp_release_helper: mem 2 is bound
Nov  1 16:47:17 laptop /bsd: agp_release_helper: mem 3 is bound
Nov  1 16:47:17 laptop /bsd: agp_release_helper: mem 4 is bound
Nov  1 16:47:24 laptop /bsd: agp_release_helper: mem 5 is bound
Nov  1 16:47:24 laptop /bsd: agp_release_helper: mem 6 is bound
Nov  1 16:47:24 laptop /bsd: agp_release_helper: mem 7 is bound
Nov  1 16:47:24 laptop /bsd: agp_release_helper: mem 8 is bound
Nov  1 16:47:24 laptop /bsd: agp_release_helper: mem 9 is bound
Nov  1 16:47:31 laptop /bsd: agp_release_helper: mem 10 is bound
Nov  1 16:47:31 laptop /bsd: agp_release_helper: mem 11 is bound
Nov  1 16:47:31 laptop /bsd: agp_release_helper: mem 12 is bound
Nov  1 16:47:31 laptop /bsd: agp_release_helper: mem 13 is bound
Nov  1 16:47:31 laptop /bsd: agp_release_helper: mem 14 is bound
Nov  1 16:47:38 laptop /bsd: agp_release_helper: mem 15 is bound
Nov  1 16:47:38 laptop /bsd: agp_release_helper: mem 16 is bound
Nov  1 16:47:38 laptop /bsd: agp_release_helper: mem 17 is bound
Nov  1 16:47:38 laptop /bsd: agp_release_helper: mem 18 is bound
Nov  1 16:47:38 laptop /bsd: agp_release_helper: mem 19 is bound

Very weird. On the bus, so Googling that'll have to wait. Although I do have the code on that partition…here we go: says it's the AGPIOC_RELEASE ioctl for agp. Aha! Maybe I'll explain money laundering while I'm at it.

And btw, here's a memo for the world: if you're on the toilet, don't take a phone call. It's really not that important.

Update, October 15 2008: Still happening with OpenBSD 4.3. And for the record, this is a Dell C300 laptop.

Tags: bsd, dell, hardware.
Greylisting bug with Exchange
Fri Nov 2 21:15:44 PDT 2007

Earlier this week the boss forwarded some bounced emails to me and asked me to figure out what had gone wrong. The weird thing was that the email was being greylisted, so it shouldn't have bounced:

This is the Symantec Mail Security program at host
mail.globalsuite.net.

I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to <postmaster>

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

                        The Symantec Mail Security program

<example@example.com>: host smtpbackup.example.com said: 451
<example@example.com>: Recipient address rejected: Please
try sending again. (in reply to RCPT TO command)

It turns out that Symantec Mail Security is meant to sit in front of an Exchange server, and that Exchange has a bug (or had; I'm unsure if it's been fixed) where it doesn't requeue email that's been greylisted, and later bounces it back to the sender without ever having retried.
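
For anyone else stuck explaining this: greylisting leans on the SMTP convention that a 4xx reply is a temporary failure, so the sender is supposed to queue the message and try again later; only a 5xx reply is a permanent failure that justifies a bounce. Here's a minimal shell sketch of that decision. The name classify_smtp_reply is made up for illustration, and this is not how Exchange or Symantec Mail Security is actually implemented.

```shell
#!/bin/sh
# Hypothetical helper: decide what a sending MTA should do with an SMTP reply.
# Only the first digit of the three-digit reply code matters here.
classify_smtp_reply() {
    case "$1" in
        2[0-9][0-9]) echo "delivered" ;;  # success
        4[0-9][0-9]) echo "retry" ;;      # temporary failure: requeue, try later
        5[0-9][0-9]) echo "bounce" ;;     # permanent failure: return to sender
        *)           echo "unknown" ;;    # not a reply code this sketch handles
    esac
}

# The reply code from the bounce message above:
classify_smtp_reply "451"
```

The 451 in the bounce above lands in the retry bucket, which is why the message should have been requeued rather than returned.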

From what I can tell, globalsuite.net is run by guest-tek.com, which provides high-speed access for hotels…so I'm probably not the only one being asked to explain this bug. :-)

Tags: spam.
pkgtool error: "libtool: ar: not found"
Sat Nov 3 16:23:50 PDT 2007

One of the things about pkgsrc is that it's very sensitive to paths and which compiler you use. (And fair enough; the whole process of bootstrapping a working set of tools for eight hundred thousand different OSes is ridiculous enough that it's a wonder it works at all. But I digress.)

Case in point: Solaris 10 machine today, installing pkgsrc on it for the first time. I successfully compiled gcc34, added GCC_REQD=3.4 to mk.conf, and then went to compile kile. While compiling MesaLib, one of its 3.2x10^6 dependencies, I got this error during the final linking phase:

/opt/pkg/bin/libtool: ar: not found

Naturally it was there in my path, so WTF?

I eventually came across a message to the pkgsrc-users list which suggested rebuilding libtool-base. This made a certain amount of sense to me, as I'd built that package with the bootstrap (i.e., not-installed-from-pkgsrc) version of gcc; it was before I figured out the GCC_REQD directive. So I ran:

$ pkg_delete libtool
$ cd /opt/pkgsrc/devel/libtool
$ bmake clean && bmake install
$ cd /opt/pkgsrc/graphics/MesaLib
$ bmake clean && bmake install

and everything was right again.
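
In hindsight, one quick sanity check is to look at which archiver the installed libtool script was configured with, since a generated libtool script carries its tool choices from build time rather than looking them up fresh on each run. A rough sketch, assuming a POSIX shell and assuming the script contains a line of the form AR="..."; libtool_ar is a made-up helper name, and whether this was the actual cause here is a guess:

```shell
#!/bin/sh
# Hypothetical helper: print the archiver a generated libtool script was
# configured with. Assumes the script contains a line like: AR="ar"
libtool_ar() {
    sed -n 's/^AR="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$1" | head -n 1
}

# Usage: check the libtool from the error message, if it exists on this box.
if [ -r /opt/pkg/bin/libtool ]; then
    ar_cmd=$(libtool_ar /opt/pkg/bin/libtool)
    if command -v "$ar_cmd" >/dev/null 2>&1; then
        echo "libtool wants '$ar_cmd': found"
    else
        echo "libtool wants '$ar_cmd': not found in PATH=$PATH"
    fi
fi
```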

Tags: packagemanagement, solaris.
Social Message Transport Protocol
Mon Nov 5 16:12:18 PST 2007

This is hilarious.

pkgsrc is still kicking my ass. The latest is a dupe of this bug; I can't tell right now if it's more weirdness with switching GCCs too soon, or something else.

OTOH, I came across MyReview today, and holy crap does it ever look like something my work could use. I've emailed the project thanking them profusely, and suggesting a Freshmeat page (am I the only one who turns there first when looking for Free software goodness?).

Tags: packagemanagement.
Hiding behind the desk
Mon Nov 5 20:12:49 PST 2007

Every now and then it occurs to me that the great part of being a sysadmin, for me, is being able to hide behind the desk. I'm what you might call retiring (read: introverted) and for the most part I'm happy being by myself. I don't want to talk to people, most of the time; I want to stare at something and understand it, make it do neat things.

The last few weeks I haven't been doing that very much. The boss has taken an interest in the long-delayed upgrade to our website, and so that has become my priority. That means talking to people: soliciting proposals from contractors, talking with the communications person, talking to staff to figure out what's needed, what works, and what we'd like if money were no object.

I sometimes think that last part is exactly the wrong thing for me to be doing. I'm pretty comfortable with technology, I like the command line, and I don't do the work that other people do (filling out forms, dealing with money, writing theses, etc.). My needs are obvious to me but difficult to explain to someone not familiar with my job; that's no less true for an accountant, or an administrator, or a student.

It's hard for me to understand sometimes why Exchange really might be the best scheduling software for someone who doesn't have to take care of it. (The snide tone of that comment is made w/o any experience of administering an Exchange server, so please discount it.) Since I don't add records to the database all day, it can be hard for me to really be motivated to add that extra feature, rather than do the odd SQL insert every now and then. And since it's obvious to me that word processors cause chromosome damage, keeping up with the latest versions just doesn't appeal when (say) it's obvious that the firewall rules are in serious need of revision. (Actually I just took a look at them today and they're not as bad as I thought. Either my standards are slipping or my memory is.)

No great insight at the end of all this...

No tags
Power outage
Tue Nov 6 20:23:23 PST 2007

We had a power outage today at work. The good news is, the UPSes worked. The bad news is, the servers were not set to shut themselves down automatically, and the UPSes ran out literally two minutes before the power came back on. Arghh.
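
The obvious fix is to have something watching the UPS that shuts the servers down cleanly before the battery is gone. The decision itself is simple; here's a sketch of it. should_shut_down, its arguments, and the five-minute margin are all hypothetical, and how you'd actually read the power source and remaining runtime depends entirely on your UPS and its monitoring software.

```shell
#!/bin/sh
# Hypothetical decision helper, not tied to any particular UPS software.
#   $1 = power source: "line" or "battery"
#   $2 = estimated battery runtime remaining, in seconds
#   $3 = safety margin in seconds (time needed for a clean shutdown)
# Returns success (0) only when on battery and inside the safety margin.
should_shut_down() {
    [ "$1" = "battery" ] && [ "$2" -le "$3" ]
}

# Example: on battery with 4 minutes left and a 5-minute margin.
if should_shut_down battery 240 300; then
    echo "would shut down now"
fi
```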

Having a flashlight in the server room is a good thing. So is making sure that your servers are all connected to switches powered by the UPS. So is making sure that you have a laptop with a charged battery and a ready-to-use serial cable connected to your otherwise-accessible-through-SSH console server. So is Sun making an x86-based OS that doesn't hang every time it reboots badly.

In other news: as mentioned on the Dragonfly BSD digest, ICANN blogs (!). They've taken this moment to let us know that the address of L.ROOT-SERVERS.NET has changed. Now you know.

Tags: hardware.
f(220R) = 280R
Wed Nov 21 20:56:03 PST 2007

At work, our mail server is an aging E220R. While underpowered for all it does, it has behaved well, more or less, until recently.

A couple of months ago it power cycled itself for no apparent reason. This weekend, it did the same thing. This is exactly the same behaviour I saw from another E220R at $other_university, and in that case it got progressively worse. Another sysadmin here says he's seen the same behaviour with two in his care. I'm preparing for the worst.

Part of that has meant preparing to move its functionality to another machine; this has been an excellent chance to delve into the bowels of our mail and list system. I've been steadily improving (read: creating) this for some time now, but this points out some bits I hadn't. So that's good.

Plan C is a loaner E280R from the other sysadmin (op cit.). I ran into trouble getting it working, though. First, I couldn't get a serial console working. (Getting a serial port working always seems to be a pain for me, no matter what the machine.) It has two of the old DB-25 ports; no problem, since I had a splitter and had got that working on the E220R. Except that it didn't work: no matter which port I hooked it up to, I couldn't see any output. I tried flipping the key around to diagnostic mode, but I still didn't see anything. (The manual said that you should be able to force output to ttyA by power-cycling the machine and hitting the power button twice when the amber service LED started blinking…but I never saw the blinking.)

This was especially weird to me because I had been able to get output from the RSC card using the same setup: OpenBSD laptop -> usb serial adapter -> DB-9 to RJ-45 adapter -> Cat 5 cable -> RJ-45 on RSC card. (The only difference was that, with the DB-25 port, the Cat5 cable had fit into the back of the DB-25 splitter.) But I couldn't log into the RSC card, and a quick Google turned up no easy way of resetting its password. (Putting it into the other E280 I have, which runs our database and website, was not an option.)

Out of desperation I finally hooked up the Cat5 to the DB-25 splitter on one side, and the console server on the other…and that worked. Damned if I know what was going on.

But then I had another problem: when it booted, I kept seeing line after line of I2C reset error; after a while, it would power-cycle itself and the pattern would start again. I remembered that op cit. had slotted the second CPU for me, so what the hell: I reseated it, and that did the trick.

Next up is detaching $failing_machine's second hard drive from the mirror and seeing if I can get it to boot in the 280. Let's hope.

In other news, LinuxFest Northwest is calling for papers. Were that not right around the due date of Project U-14, I might try submitting something and see what happens. Oh well...next beer in Jerusalem!

And there's the laptop battery...shoulda charged it at work.

Tags: hardware, solaris.
Scratch that
Fri Nov 23 05:55:34 PST 2007

E280R takes different SCSI drives than the E220R. Serial ports and SCSI connectors: A Study in Nemesisssysadminss. Discuss.

Tags: hardware.
The pain
Fri Nov 23 12:38:11 PST 2007

$ sudo -u sympa /opt/pkg/bin/perl /opt/pkg/sympa/bin/sympa.pl --help
Line 38, unknown field: bounce_path in sympa.conf
No web archives directory: /opt/pkg/arc

MHonArc is not installed or /usr/bin/mhonarc is not executable.
Language::SetLang(), missing locale parameter
Missing Return-Path in mail::smtpto()
Missing directory '/opt/pkg/bounce' (defined by 'bounce_path' parameter)
Configuration file /opt/pkg/etc/sympa.conf has errors.

What this error message doesn't bother saying is that it has silently sourced wwsympa.conf as well as sympa.conf, and that the errors come from that file. And no, there is no explicit sourcing of wwsympa.conf in sympa.conf.
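
When a message like this blames one file, it's worth grepping every file the program might be reading. A small sketch; which_conf is a made-up helper, and the location of wwsympa.conf next to sympa.conf is an assumption rather than something the error output shows:

```shell
#!/bin/sh
# Hypothetical helper: show which config files set a given parameter.
# Prints file:line:content for lines that start with the parameter name.
# /dev/null is passed so grep prefixes the filename even for a single file.
which_conf() {
    param=$1
    shift
    grep -n "^[[:space:]]*${param}[[:space:]]" /dev/null "$@"
}

# Assumed paths; adjust to wherever your install keeps them.
which_conf bounce_path /opt/pkg/etc/sympa.conf /opt/pkg/etc/wwsympa.conf 2>/dev/null || true
```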

God, I hate this software.

Tags: rant.
