The Life of a Sysadmin

Carousel is a lie!

Entries from December 2007.

New disk
Sat Dec 1 11:13:35 PST 2007

At $other_university today adding a new hard drive to our server here: 300GB, instead of 30GB. The users will be very happy. And what with the snow coming down, I'll be very happy if transit keeps running 'til I'm all done.

And now a story about how sometimes it's not all Sympa's fault…

As part of the preemptive strike against the mail server's impending failure, I upgraded Sympa (shudder) on $big_server using pkgsrc. After copying the list config files over and pointing it at a separate database, I tried telling it to update its list of subscribers. When I compared a few lists to the already-existing ones, though, each was short by a seemingly random number of subscribers. One that I used as a test case, for example, was short about 100 subscribers. That was a concern to me.

The lists that were short all seemed to be ones that grabbed information from the LDAP server, so I tried looking at the queries that Sympa made. The query itself was pretty simple:

(&(objectclass=inetLocalMailRecipient)(gidNumber=10000))

with Sympa being told that mailLocalAddress was the important bit. Should be simple to compare the results from this server and the one where things work… But I lost a good hour of my life, and possibly a couple years of life expectancy, when I became convinced that, somehow, replication to that server was failing, and big chunks of information (like mailLocalAddress) were being lost. Finally, though, I figured out that I'd stupidly been querying the two servers using different credentials. Unfortunately, I figured that out on the way home Friday night.

(Obviously I'd forgotten Æleen Frisch's rules about system administration:

  1. It's a permission problem.
  2. If it's not a permission problem, it's a DNS problem.)

Monday, though, was a fresh start and a whole new day. I started by looking again at the queries Sympa made. On $old_server, we'd get about 210 results, with mailLocalAddress in each one of 'em. But on $big_server, we'd get about 210 results, with mailLocalAddress missing in about 100 of them. No wonder Sympa was short.

I double-checked by specifically requesting mailLocalAddress on the problematic server, and it was returned. But $big_server didn't volunteer that information.
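
The comparison above boils down to checking each returned entry for the attribute, not just counting entries. A minimal sketch in Python, under assumptions: the search results are modelled as plain dicts keyed by DN rather than fetched from a live directory, and the DNs, addresses and the function name missing_attribute are made up for illustration.

```python
def missing_attribute(entries, attribute):
    """Return the sorted DNs of entries that lack `attribute` (or have it empty)."""
    return sorted(dn for dn, attrs in entries.items() if not attrs.get(attribute))


# Hypothetical results of the same filter run against the two servers.
old_server = {
    "uid=alice,ou=people,dc=example,dc=org": {"mailLocalAddress": ["alice@example.org"]},
    "uid=bob,ou=people,dc=example,dc=org": {"mailLocalAddress": ["bob@example.org"]},
}
big_server = {
    "uid=alice,ou=people,dc=example,dc=org": {"mailLocalAddress": ["alice@example.org"]},
    "uid=bob,ou=people,dc=example,dc=org": {"gidNumber": ["10000"]},  # attribute not returned
}

# Both servers return the same number of entries, so a count alone hides the problem.
assert len(old_server) == len(big_server)

print(missing_attribute(old_server, "mailLocalAddress"))  # []
print(missing_attribute(big_server, "mailLocalAddress"))
```

Running the same check against both result sets with the same credentials shows which entries come back without the attribute, which is the number Sympa ends up short by.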

I did some digging, and it seems this may be a bug in Sun Directory Server: it should be returning mailLocalAddress as one of the attributes, but it does not do so for all entries, even when the querying user should have permission to see them. Unfortunately, I'm unable to see the PR that the thread mentions, since we're not paying for support. (Thank you, Sun.)

Digging into Sympa, though, I found out that this was not the entire reason for the failure. Sympa uses Perl's Net::LDAP module to do its queries. It turns out that Net::LDAP wants a list when you're asking for particular attributes. But in List.pm's _include_users_ldap function, the search is created like so:

     $fetch = $ldaph->search ( base => "$ldap_suffix",
                                       filter => "$ldap_filter",
                                       attrs => "$ldap_attrs",
                                       scope => "$param->{'scope'}");

Changing one line:

$ rcsdiff -r1.1 List.pm
===================================================================
RCS file: RCS/List.pm,v
retrieving revision 1.1
diff -r1.1 List.pm
8646c8646,8647
<                                     attrs => "$ldap_attrs",
---
>                                     # attrs => "$ldap_attrs",
>                                     attrs => ["$ldap_attrs"],

meant that, instead of asking for the default attributes (which $big_server was calculating incorrectly), it was asking for mailLocalAddress and succeeding.
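
A toy model of that behaviour, sketched in Python rather than Perl. It is not Net::LDAP's actual code, only an illustration of the rule as described here: an attribute request is honoured when it arrives as a list, and anything else falls back to the server's defaults. The function name and the convention that an empty list means "server defaults" are assumptions made for the example.

```python
def requested_attributes(attrs):
    """Toy model: honour `attrs` only when it is a list.

    Anything else (such as a bare string) is ignored, and the empty list
    returned stands for "let the server pick its default attributes".
    """
    return list(attrs) if isinstance(attrs, list) else []


ldap_attrs = "mailLocalAddress"

print(requested_attributes(ldap_attrs))    # [] -> server defaults
print(requested_attributes([ldap_attrs]))  # ['mailLocalAddress']
```

The one-line change in the diff corresponds to the second call: the same string, wrapped in a list.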

And now you know the rest of the story.

No tags
Stay on target...
Fri Dec 21 15:23:09 PST 2007

Holy crap, it's been a while since I last wrote here. Mainly that's because I've been working on web stuff at work and have felt very little like a sysadmin of late. Thankfully we've got a webmaster hired, and to some extent the work'll be shifted to him in the new year. Of course, that still leaves the redesign of the website and its back end…that's not done 'til it's done.

This week, though, has been slow, and I've been catching up a little on sysadmin work. Part of it was setting up a devel server for the webmaster, and detailing what I was doing in Cfengine as I went along. It was gratifying to get LDAP working (I haven't done that on a Linux machine before; shame on me), and irritating when I realized that I couldn't mount the home directories from the server because I hadn't restarted nscd on the server.

The last two days were spent trying to get encrypted Bacula working between here and $other_university. This was an enormous pain in the ass for two reasons:

  1. The Right Way (tm) of doing it is by using TLS, which is what the kids are calling SSL these days, and I have never fully grokked SSL, or the openssl command. I know that there's encryption going on; I know that there are certificates signed by CAs; I know that there's a lot of negotiating of different options. But start throwing in x509 versus PEM, Diffie-Hellman parameters and the single most cryptic set of error messages I've ever come across, and I just feel thick. I was reduced to looking at tcpdump output of the negotiation to figure out what was going on, and I couldn't; the Bacula FD client complained that the Bacula Director wasn't producing a certificate, and that was all I knew. The otherwise incredibly excellent docs from Bacula were a trifle thin on all of this, and I couldn't find out much about my situation (going the self-CA route).

  2. So okay, fuckit, right? That's why God invented OpenSSH. So whee, start tunnelling port 9102 over SSH so the Director can contact the FD at $other_university, and 9103 back so the FD can contact the Storage Daemon. Only it turns out (my bad for not knowing this before) that not only does the client want to contact the SD, so does the director. Thus, my plan to tunnel to the firewall at the other end and tell the client that it could find the Storage Daemon there didn't work, 'cos the director wanted to contact it there too. (I did briefly try allowing the director to contact the tunnel at the other end: so even though the Storage was working on the same machine as the director, for that one job the Director's connection to it was going to the remote end and getting tunnelled back over SSH. But:

    1. that's horrible, and
    2. I was afraid that when it came time to restore, the Director would figure that it had to contact the Storage Daemon remotely again, complicating an already complicated setup.)
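
For reference, the pair of forwards described in point 2 might look roughly like this. It is a sketch only: the host names are placeholders, the helper name tunnel_command is made up, and the Storage Daemon is assumed to sit on the machine that opens the tunnel. It shows the tunnel itself and does nothing about the Director also wanting to reach the Storage Daemon.

```python
import shlex


def tunnel_command(gateway, fd_host, sd_host, fd_port=9102, sd_port=9103):
    """Build an ssh command forwarding the two Bacula ports described above."""
    return [
        "ssh", "-N",                             # -N: forward ports only, run no remote command
        "-L", f"{fd_port}:{fd_host}:{fd_port}",  # Director side -> remote File Daemon
        "-R", f"{sd_port}:{sd_host}:{sd_port}",  # remote side -> Storage Daemon back home
        gateway,
    ]


# Placeholder host names; "localhost" is resolved on the machine running ssh.
cmd = tunnel_command("firewall.example.edu", "client.example.edu", "localhost")
print(shlex.join(cmd))
```

One caveat with this shape: sshd binds remote forwards to the loopback interface unless GatewayPorts is enabled on the gateway, so a client on a separate machine could not reach port 9103 there without that setting.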

And why was I trying to connect to the remote firewall via SSH, rather than the client I'm trying to back up itself? Because that client is a Solaris machine authenticating against LDAP, and that turns out to bork key-based logins over SSH. What a crock.

Oh well. I did add three other machines here to Bacula this week, so that's good.

Project U-13 is coming along. I'm pretty close to a 0.0.2 release (woot), which should have the following working:

And by "working" I mean "installed". But I've got a decent setup on my laptop for building and testing it, which means I get up to a couple hours a day to work on it (New Westminster -> UBC == long). Thanks to Andy, he of the amazing speaking skills, for kicking my ass into action.

I'm learning a bit more about Mercurial in the process. After coming from CVS and Subversion, it seems really weird to me that the usual way of branching is "Go ahead, clone another repo! We're Mercurial! We don't care! Repos for everyone!" But if you figure on distributed development (something more Linux-y than a controlled work environment), then it makes sense. Not that I think I'll have lots of people working on this thing, but it makes sense that if someone were to take this for their own ends, they wouldn't want to bother copying all the branches… just the one(s) they're interested in.

Last word to my son:

Q: What does a Camel say, Arlo? A: Purhl!

2 comments. Tags: cfengine, projectu13.
Merry Xmas!
Tue Dec 25 08:06:54 PST 2007

Arlo and the Xmas tree

Tags: geekdad.
