Mmm_mysql


title: mmm_mysql
date: Fri Sep 4 15:09:07 PDT 2009
tags: bugs, mysql

I've spent many hours today at $WORK banging my head against the keyboard, trying to figure out why MMM-MySQL didn't work. MMM is meant to switch write roles, or master-slave roles, among different database servers for failover and such.

While the task as a whole is complex, the steps are simple enough: the monitor daemon accepts commands from a client, then forwards those commands to agents on the different MySQL servers. At its heart it's a bunch of Perl scripts that do the things this task entails: switching IP addresses, sending ARP packets, toggling read-only status on the databases, and so on.

The problem came when, for example, the monitor would tell everyone to change their IP addresses and report success -- only I could see that the addresses hadn't changed. Or the agent would run the command to make the database writable and report success, yet I could see that it hadn't worked.

There were two factors at work here.

In the latter example, the agent would run the command bin/mysql_allow_write. Here's the relevant bit of code, edited for clarity:

# Read config file and status
our $config = ReadConfig("mmm_agent.conf");

print MySqlAllowWrite();

exit(0);

sub MySqlAllowWrite($) {

    [snip]

    # connect to server
    my $dsn = "DBI:mysql:host=$host;port=$port";
    my $dbh = DBI->connect($dsn, $user, $pass, { PrintError => 0 });
    return "ERROR: Can't connect to MySQL (host = $host:$port, user = $user)!" unless ($dbh);

    # set read_only to OFF
    (my $read_only) = $dbh->selectrow_array(q{select @@read_only});
    return "ERROR: SQL Query Error: " . $dbh->errstr unless (defined $read_only);
    return "OK" unless ($read_only);

    my $sth = $dbh->prepare("set global read_only=0");
    my $res = $sth->execute;
    return "ERROR: SQL Query Error: " . $dbh->errstr unless($res);
    $sth->finish;

    $dbh->disconnect();
    $dbh = undef;

    return "OK";
}

The subroutine returns error strings, but nothing watches for them: the script prints whatever comes back and then exits with status 0 either way. The code that calls the script just uses backticks and does no checking; it never looks at $?:

sub ExecuteBin {
    my $command = shift;
    my $params = shift;
    my $return_all = shift;

    my $path = "$config->{bin_path}/$command";

    return undef unless (-x $path);
    LogDebug("Core: Execute_bin('$path $params')");
    my $res = `$path $params`;

    unless ($return_all) {
        my @lines = split /\n/, $res;
        return pop(@lines);
    }

    return $res
}
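
For what it's worth, here is a minimal sketch of what checking might look like. It is not MMM's code: run_checked is a name I made up, and it assumes the helper scripts keep their convention of printing "OK" or "ERROR: ..." as the last line of output. Because mysql_allow_write exits 0 even when it fails, the sketch looks at both the exit status and that last line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MMM): run a command, capture its output,
# and return (ok_flag, message) instead of trusting whatever came back.
sub run_checked {
    my @cmd = @_;

    # List-form pipe open bypasses the shell, so arguments are not re-parsed.
    open(my $fh, '-|', @cmd)
        or return (0, "ERROR: can't run $cmd[0]: $!");
    my @lines = <$fh>;
    close($fh);    # closing a pipe sets $? to the child's wait status

    my $status = $?;
    chomp @lines;
    my $last = @lines ? $lines[-1] : '';

    return (0, "ERROR: $cmd[0] killed by signal " . ($status & 127))
        if $status & 127;
    return (0, "ERROR: $cmd[0] exited with code " . ($status >> 8))
        if $status >> 8;

    # The scripts shown above exit 0 even on failure, so check the text too.
    return (0, $last) if $last =~ /^ERROR/;
    return (1, $last);
}
```

A caller would then write something like `my ($ok, $msg) = run_checked($path, @params);` and log or bail out when $ok is false.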


The code to change the IP address has the same flaw. The commands run in backticks, and both their output and their exit status are thrown away:

sub AddInterfaceIP($$) {
    my $if = shift;
    my $ip = shift;

    if ($^O eq 'linux') {
        `/sbin/ip addr add $ip/32 dev $if`;
    } elsif ($^O eq 'solaris') {
        `/usr/sbin/ifconfig $if addif $ip`;
        my $logical_if = FindSolarisIF($ip);
        unless ($logical_if) {
            print "ERROR: Can't find logical interface with IP = $ip\n";
            exit(1);
        }
        `/usr/sbin/ifconfig $logical_if up`;
    } else {
        print "ERROR: Unsupported platform!\n";
        exit(1);
    }
}
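
A rough sketch of a more careful version of the Linux branch, under my own assumptions: try_system and add_interface_ip are hypothetical names, the Solaris branch is left out for brevity, and dying on failure is just one possible policy. The point is only that system() in list form reports what happened, where bare backticks in void context do not.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run a command without a shell.
# Returns an error string on failure, or undef on success.
sub try_system {
    my @cmd = @_;
    no warnings 'exec';    # we report exec failures ourselves
    my $rc = system { $cmd[0] } @cmd;
    return "failed to execute $cmd[0]: $!"          if $rc == -1;
    return "$cmd[0] died with signal " . ($? & 127) if $? & 127;
    return "$cmd[0] exited with code " . ($? >> 8)  if $? >> 8;
    return;
}

# Hypothetical replacement for the Linux case of AddInterfaceIP.
sub add_interface_ip {
    my ($if, $ip) = @_;
    die "ERROR: Unsupported platform!\n" unless $^O eq 'linux';

    my $err = try_system('/sbin/ip', 'addr', 'add', "$ip/32", 'dev', $if);
    die "ERROR: Can't add $ip to $if: $err\n" if defined $err;
    return 1;
}
```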

Needless to say, I'll be filing bug reports.

The other factor that was going on was my ignorance about the tools I was using. I couldn't figure out why the ip addr add and ip addr del commands weren't working. The agent would report success adding addresses, yet ifconfig didn't show them. What I didn't realize was that ip can manipulate addresses that ifconfig doesn't seem to see. With ifconfig, you add an additional address to an interface like so:

ifconfig eth0:0 10.0.0.2

and you see a new device called eth0:0. But with ip, you do that like so:

ip addr add 10.0.0.2/32 dev eth0

and you don't see additional devices and ifconfig doesn't see the additional address; you have to ask ip itself (ip addr show dev eth0) to see it. I wasn't thinking hard enough about what I meant by "I can see that it doesn't work" -- something I'm all too prone to take other people to task for (or at least act smugly about).
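
Had I wanted to check properly from Perl, something along these lines would have done it. This is only a sketch: parse_ip_addrs and interface_has_ip are names I invented, and it assumes ip lives at /sbin/ip (as in the MMM code above) and that its one-line output (-o) for IPv4 (-4) puts each address after the word "inet".

```perl
use strict;
use warnings;

# Extract IPv4 addresses (with prefix length) from the text printed by
# `ip -o -4 addr show`.
sub parse_ip_addrs {
    my ($text) = @_;
    my @addrs;
    for my $line (split /\n/, $text) {
        push @addrs, $1 if $line =~ /\binet\s+(\d+\.\d+\.\d+\.\d+\/\d+)/;
    }
    return @addrs;
}

# Hypothetical check: is $ip (e.g. "10.0.0.2") configured on interface $if?
sub interface_has_ip {
    my ($if, $ip) = @_;
    open(my $fh, '-|', '/sbin/ip', '-o', '-4', 'addr', 'show', 'dev', $if)
        or die "Can't run ip: $!\n";
    my $out = do { local $/; <$fh> };
    close($fh) or die "ip exited with status " . ($? >> 8) . "\n";

    # Compare only the address part, ignoring the /prefix that ip prints.
    return scalar grep { /^\Q$ip\E\// } parse_ip_addrs($out // '');
}
```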

Ah well...the good news is that I learned something. The other good news is that, since at least a couple of these errors are in the latest versions of mmm_control, I should be able to spend some time at work improving them. Hasta la source, baby! (Or something like that...)