Carousel is a lie!

Entries from September 2009.

zypper problems
2nd September 2009

I recently ran into problems with a home-made repo for OpenSuSE. (Weirdly enough, this seems to have cropped up after the repo was already in use.) When I tried to install a package from the repo, I got this error:

Retrieving repository 'foo' metadata [error]
Repository 'foo' is invalid.
File /var/tmp/TmpFile.0aLr5H doesn't contain public key data
Please check if the URIs defined for this repository are pointing to a
valid repository.
Warning: Disabling repository 'foo' because of the above error.

There wasn't much to find about this problem; even re-installing the key didn't help. Finally I thought to look in the webserver logs, where I found this:

[Wed Sep 02 09:59:59 2009] [error] [client 10.0.0.1] File does not exist: /var/www/repo/opensuse/11.1/x86_64/repodata/repomd.xml.key

That led to this article, and the solution was easy:

gpg -a --export "Repository Key" > /var/www/repo/opensuse/11.1/x86_64/repodata/repomd.xml.key

Sweet!
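For next time, a quick sanity check on the repo would save some digging. This is only a minimal sketch: check_repo_key is a name I made up (it is not part of zypper or any repo tooling), and all it verifies is that the file exists and starts with the ASCII-armor header that gpg -a --export writes.

```shell
#!/bin/sh
# Hypothetical helper, not part of zypper: succeed only if the given file
# exists and begins with an ASCII-armored public key header.
check_repo_key() {
    keyfile=$1
    if [ ! -f "$keyfile" ]; then
        echo "missing: $keyfile" >&2
        return 1
    fi
    if head -n 1 "$keyfile" | grep -q -- '^-----BEGIN PGP PUBLIC KEY BLOCK-----$'; then
        return 0
    fi
    echo "not an armored public key: $keyfile" >&2
    return 2
}
```

Pointing it at the repomd.xml.key path from the webserver log would have flagged the missing file straight away.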

In other news: my tortilla filled with fried rice is falling apart. Film at 11.

Tags: opensuse, packagemanagement.

mmm_mysql
4th September 2009

I've spent many hours today at $WORK banging my head against the keyboard, trying to figure out why MMM-MySQL didn't work. MMM is meant to switch write roles, or master-slave roles, among different database servers for failover and such.

While the task as a whole is complex, the steps are simple enough: the monitor daemon accepts commands from a client, then forwards those commands to agents on the different MySQL servers. At its heart it's a bunch of Perl scripts that do the things this task entails: switching IP addresses, sending ARP packets, toggling read-only status on the databases, and so on.

The problem came when, for example, the monitor would tell everyone to change their IP addresses and report success -- only I could see that wasn't working. Or the agent would run the command to make the database writable and report success, yet I could see that it wasn't working.

There were two factors at work here.

In the latter example, the agent would run the command bin/mysql_allow_write. Here's the relevant bit of code, edited for clarity:

# Read config file and status
our $config = ReadConfig("mmm_agent.conf");

print MySqlAllowWrite();

exit(0);

sub MySqlAllowWrite($) {

    [snip]

    # connect to server
    my $dsn = "DBI:mysql:host=$host;port=$port";
    my $dbh = DBI->connect($dsn, $user, $pass, { PrintError => 0 });
    return "ERROR: Can't connect to MySQL (host = $host:$port, user = $user)!" unless ($dbh);

    # set read_only to OFF
    (my $read_only) = $dbh->selectrow_array(q{select @@read_only});
    return "ERROR: SQL Query Error: " . $dbh->errstr unless (defined $read_only);
    return "OK" unless ($read_only);

    my $sth = $dbh->prepare("set global read_only=0");
    my $res = $sth->execute;
    return "ERROR: SQL Query Error: " . $dbh->errstr unless($res);
    $sth->finish;

    $dbh->disconnect();
    $dbh = undef;

    return "OK";
}

The subroutine reports errors as strings, but nothing watches for them: the script prints whatever comes back and then exits 0 regardless. The code that calls the script just uses backticks and never checks the exit status:

sub ExecuteBin {
    my $command = shift;
    my $params = shift;
    my $return_all = shift;

    my $path = "$config->{bin_path}/$command";

    return undef unless (-x $path);
    LogDebug("Core: Execute_bin('$path $params')");
    my $res = `$path $params`;

    unless ($return_all) {
        my @lines = split /\n/, $res;
        return pop(@lines);
    }

    return $res
}

The code to change IP address is much the same:

sub AddInterfaceIP($$) {
    my $if = shift;
    my $ip = shift;

    if ($^O eq 'linux') {
        `/sbin/ip addr add $ip/32 dev $if`;
    } elsif ($^O eq 'solaris') {
        `/usr/sbin/ifconfig $if addif $ip`;
        my $logical_if = FindSolarisIF($ip);
        unless ($logical_if) {
            print "ERROR: Can't find logical interface with IP = $ip\n";
            exit(1);
        }
        `/usr/sbin/ifconfig $logical_if up`;
    } else {
        print "ERROR: Unsupported platform!\n";
        exit(1);
    }
}

Needless to say I'll be filing bug reports.
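For what it's worth, here's roughly the kind of check I'd like to see. It's a sketch in shell rather than Perl, and run_checked is my own made-up name, not anything in MMM. The idea is to treat a command as failed if it exits non-zero or if its last line of output starts with ERROR, which is the convention the scripts above use.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of MMM): run a command, then fail if either
# the exit status is non-zero or the last line of output begins with "ERROR".
run_checked() {
    if output=$("$@"); then
        status=0
    else
        status=$?
    fi
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*" >&2
        return 1
    fi
    last_line=$(printf '%s\n' "$output" | tail -n 1)
    case $last_line in
        ERROR*)
            echo "FAILED (reported error): $last_line" >&2
            return 1
            ;;
    esac
    # Success: hand back the last line, as ExecuteBin does by default.
    printf '%s\n' "$last_line"
}
```

Something like `run_checked /sbin/ip addr add 10.0.0.2/32 dev eth0 || exit 1` would then stop at the first failure instead of carrying on and reporting success.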

The other factor was my ignorance about the tools I was using. I couldn't figure out why the ip addr add and ip addr del commands weren't working. The agent would report success adding addresses, yet ifconfig didn't show them. What I didn't realize was that ip can manipulate addresses that ifconfig doesn't seem to see. With ifconfig, you add an additional address to an interface like so:

ifconfig eth0:0 10.0.0.2

and you see a new device called eth0:0. But with ip, you do that like so:

ip addr add 10.0.0.2/32 dev eth0

and you don't see additional devices and ifconfig doesn't see the additional address. I wasn't thinking hard enough about what I meant by "I can see that it doesn't work" -- something I'm all too prone to take other people to task for (or at least act smugly about).
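What does show the extra address is ip addr show. Here's a sketch of the check I should have been doing. has_addr is a made-up helper that reads `ip -o -4 addr show` output on stdin; it assumes the one-line-per-address format, where the third field is "inet" and the fourth is the address with its prefix length.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and succeed
# if the given IPv4 address is configured (exact match, prefix length ignored).
has_addr() {
    awk -v want="$1" '
        $3 == "inet" {
            split($4, parts, "/")
            if (parts[1] == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on a live system:
#   ip -o -4 addr show dev eth0 | has_addr 10.0.0.2 && echo "address is there"
```

Checking with the same tool that made the change is the point here; asking ifconfig about something ip did is how I fooled myself.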

Ah well...the good news is that I learned something. The other good news is that, since at least a couple of these errors are in the latest versions of mmm_control, I should be able to spend some time at work improving them. Hasta la source, baby! (Or something like that...)

1 comment. Tags: bugs, mysql.

Wordpress worm
7th September 2009

Just spent the better part of five hours cleaning up four old, out-of-date Wordpress installations after they got infected with this worm. I host nine sites on my home server for friends and family; I'm cutting that down to three (just family), and maybe looking at mu-wordpress, as of Real Soon Now.

Happy Labour Day, everyone!

Update: I meant to add in here a few things I looked for, because this info was hard to track down.

Tags: bug, crackers, spam.

Start of school
8th September 2009

It's the start of school here at $UNIVERSITY, and for some reason I find myself noticing it more than last year. Then and now, my job hasn't been one that gets flooded with new students in September (unlike the jobs of a lot of my friends and coworkers); it's more like a steady trickle. Grad students show up days or even weeks late; new faculty come in when they're good and ready; no one really has a firm idea when someone's showing up, but everyone's confident they'll be here Real Soon Now.

As a result, the biggest effect this usually has on me is the press of humanity in the bus and SkyTrain. My commute is a long one -- bus, SkyTrain, then another bus -- and it takes between 90 and 100 minutes, door to door. I get a lot of reading done, or I listen to podcasts, or if the Lithium Ion Gods are with me I fiddle with Emacs. This happens no matter what, but in September you've got all the people learning how the bus works, how far in advance they need to show up, and so on. The buses and SkyTrains are crowded because everyone's afraid it's the last one, or they'll be late for class, or everyone else is getting on so they must know something I don't.

And then it calms down. Some get tired of the bus and drive. Most figure out how late they can sleep in. (I vaguely remember that, in the same way that I vaguely remember kindergarten.) Things thin out. Before you know it it's December and it gets really empty. Winter brings humidity and rain, wet smells and drips on book pages.

And then it's spring, and then summer, and things get positively luxurious. There's room to stretch out, room for laptops, and lots to see. The kids' birthdays come around.

And then...September again.

No tags

OpenBSD needs help
9th September 2009

I just saw on Undeadly.org that orders for OpenBSD CDs are 'way down this year. Without OpenSSH and pf, I wouldn't be able to do my job nearly as well as I do. I've ordered a set for work (good excuse to upgrade that firewall), and ordered a set for home and tossed 'em $50 as well. I encourage you to do the same.

In the words of the original rant:

Do you use OpenBSD for fun? Contribute. Do you use OpenBSD for work? Contribute. Does OpenBSD allow you to worry about the problem you are trying to solve rather than the tools? Contribute. Do you wish your employer used the OpenBSD quality standard in your work? Contribute. Does your employer use OpenBSD? Ask them to contribute (after you do, of course). Do you bundle OpenBSD or subprojects like OpenSSH into your product? Contribute big! (you won't, you rarely do, but hey, I'll ask anyway) Do you find yourself wondering why so few take computer software quality seriously? Contribute!

Tags: openbsd, wontyoupleaselendahand.

Bad Time Equals LDAP Failure
9th September 2009

Just ran into an interesting problem: after replacing memory on a server, CentOS booting hung at "Starting system message bus..."

So what does dbus have to do with anything? This turned out to be an LDAP failure; dbus was trying to run as UID root, and since the LDAP server couldn't be contacted it hung. Why couldn't the LDAP server be contacted? The LDAP server logs only showed this:

[09/Sep/2009:12:04:32 -0700] conn=41492 op=-1 fd=112 closed - SSL peer cannot verify your certificate.

The CA cert I use was in place, and another machine had just rebooted w/o problems (all this is taken care of with cfengine, so they were identical in this respect). I could connect to the LDAP server on the right port without any problems.

I finally figured out what was going on when I ran:

openssl s_client -connect ldap.example.com:636 -CApath /path/to/cacert_directory

and saw:

Verify return code: 9 (certificate is not yet valid)

date said it was December 31, 2001. What the what now? ntpdate to set things correctly, then I got:

Verify return code: 0 (ok)

I figure the CMOS clock (or whatever the kids are calling it these days) got reset when we had to remove the CPU daughtercard to get at the memory underneath.
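A cheap guard against this is to refuse to trust a clock that's obviously in the past. This is just a sketch: clock_is_sane is a made-up name, and the floor of 1230768000 (midnight UTC on January 1, 2009) is an arbitrary "can't possibly be earlier than this" value I picked for illustration.

```shell
#!/bin/sh
# Hypothetical check: succeed if the clock (epoch seconds) is at or after a
# known floor. The second argument defaults to the current system time.
clock_is_sane() {
    floor=$1
    now=${2:-$(date +%s)}
    [ "$now" -ge "$floor" ]
}

# Usage, e.g. early in boot before anything that needs valid certificates:
#   clock_is_sane 1230768000 || echo "clock looks wrong; fix it before LDAP" >&2
```

A clock reading December 31, 2001 (1009756800 in epoch seconds) fails the check, which is exactly the case that bit me.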

And now you know...the rest of the story.

Tags: cfengine, ldap.

My submission to Canada's Consultation on Copyright
11th September 2009

In the spirit of Michael Geist, here's my submission on copyright reform. Originally I intended to write about how this affects me as a sysadmin, but then the stuff about my kids just came out...

1 comment. Tags: copyright.

I'm going to LISA '09!
16th September 2009

Just got the approval from the boss...LISA, here I come! w00t!

1 comment. Tags: lisa.

Can I send email spam from your servers?
21st September 2009

Depressing.

No tags

What to ask when taking over external servers?
21st September 2009

At $WORK, I'm going to be taking over the administration of four servers that currently do stuff for a variety of researchers scattered around the province. There are a number of players here:

The owning agency has also ponied up for an upgrade to the four servers; I'll be taking delivery some time next week.

I've got some preliminary information -- what the servers do, how the users use the thing, etc -- but I'm preparing a more detailed plan. In the meantime, I've compiled a list of questions for my local contact.

In the middle of that, it occurred to me that this would be a good discussion topic. Have I missed anything? Let me know!

2 comments. Tags: migration, work.

Server cracked, restored
28th September 2009

"I say we take off and nuke the entire site from orbit. It's the only way to be sure."

Saturday afternoon my home web server got cracked. I found out because Google started refusing my searches, asking me to fill out a CAPTCHA form (incidentally, I hate the word CAPTCHA, and even typing it gives me hives) to prove I was human. What the hell?

So I checked on the server, which is also our firewall, which isn't good but frankly I was tired of maintaining a complex network at home, and sure enough there was some perl script running as user www-data (which Debian uses to run the webserver), sending off tons of Google queries and taking commands on IRC the way I keep hearing nobody does anymore. Crap.

Fortunately I've been running Bacula for a while now, backing up to an external hard drive, and so I figured that even though it probably would go away when I rebooted, I'd Do The Right Thing(tm) and rebuild from scratch.

This had to wait 'til the evening, so I shut down the webserver, ran backups a bunch more times, got more info, and moved the machine (a tiny li'l Shuttle box) from my youngest son's bedroom (apparently the only room in the house w/a phone outlet not covered by an ADSL filter) to our bedroom upstairs, running the network cable up the stairs.

In the end, it all went pretty smoothly. I was able to get all my packages back and restore from backup; the only thing I messed up was getting the ownership wrong on my restored crontab. (Debian uses a pool of UIDs for daemons, so you're not guaranteed to get the same UIDs if you reinstall.)

As a bandaid, I've firewalled off www-data from initiating connections out. I should have done this long before. Now I'm starting to think about the next step -- Xen, maybe, or SELinux. (I did briefly consider other distros, or even a BSD: CentOS for SELinux, FreeBSD for pf and jails. But I decided that one problem at a time was quite enough, thanks.)
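The bandaid looks something like the rule below. This is a sketch of the idea, not my actual ruleset: it assumes iptables with the owner and state matches available, and print_block_rule is a made-up helper that only prints the rule so it can be eyeballed before anyone runs it as root.

```shell
#!/bin/sh
# Hypothetical helper: print (do not apply) an iptables rule that rejects new
# outbound connections from processes owned by the given user.
print_block_rule() {
    user=$1
    echo "iptables -A OUTPUT -m owner --uid-owner $user -m state --state NEW -j REJECT"
}

# Usage: review the output, then run it as root if it looks right.
#   print_block_rule www-data
```

As written the rule would also catch connections over loopback, so a webserver that talks to a local database over TCP would need an ACCEPT rule for the lo interface ahead of it.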

Tags: linux, nukeitfromorbit, security.

LISA updates
30th September 2009

I've come across a few LISA items today, and it's only 9am...

Man, I'm looking forward to this.

1 comment. Tags: lisa.
