I've spent many hours today at $WORK banging my head against the keyboard, trying to figure out why MMM-MySQL didn't work. MMM is meant to switch write roles, or master-slave roles, among different database servers for failover and such.
While the task as a whole is complex, the steps are simple enough: the monitor daemon accepts commands from a client, then forwards those commands to agents on the different MySQL servers. At its heart it's a bunch of Perl scripts that do the things this task entails: switching IP addresses, sending ARP packets, toggling read-only status on the databases, and so on.
The problem came when, for example, the monitor would tell everyone to change their IP addresses and report success -- only I could see that it wasn't working. Or the agent would run the command to make the database writable and report success, yet I could see that it wasn't working.
There were two factors at work here.
In the latter example, the agent would run the command that allows writes on the database. Here's the relevant bit of code, edited for clarity:
    # Read config file and status
    our $config = ReadConfig("mmm_agent.conf");
    print MySqlAllowWrite();

    sub MySqlAllowWrite($) {
        # connect to server
        my $dsn = "DBI:mysql:host=$host;port=$port";
        my $dbh = DBI->connect($dsn, $user, $pass, { PrintError => 0 });
        return "ERROR: Can't connect to MySQL (host = $host:$port, user = $user)!" unless ($dbh);

        # set read_only to OFF
        (my $read_only) = $dbh->selectrow_array(q{select @@read_only});
        return "ERROR: SQL Query Error: " . $dbh->errstr unless (defined $read_only);
        return "OK" unless ($read_only);

        my $sth = $dbh->prepare("set global read_only=0");
        my $res = $sth->execute;
        return "ERROR: SQL Query Error: " . $dbh->errstr unless($res);

        $dbh = undef;
        return "OK";
    }
The subroutine is reporting errors but nothing watches for them. The code that calls the script itself just uses backticks and does no checking:
    sub ExecuteBin {
        my $command = shift;
        my $params = shift;
        my $return_all = shift;

        my $path = "$config->{bin_path}/$command";
        return undef unless (-x $path);

        LogDebug("Core: Execute_bin('$path $params')");
        my $res = `$path $params`;

        unless ($return_all) {
            my @lines = split /\n/, $res;
            return pop(@lines);
        }
        return $res;
    }
The code to change IP address is much the same:
    sub AddInterfaceIP($$) {
        my $if = shift;
        my $ip = shift;

        if ($^O eq 'linux') {
            `/sbin/ip addr add $ip/32 dev $if`;
        } elsif ($^O eq 'solaris') {
            `/usr/sbin/ifconfig $if addif $ip`;
            my $logical_if = FindSolarisIF($ip);
            unless ($logical_if) {
                print "ERROR: Can't find logical interface with IP = $ip\n";
            }
            `/usr/sbin/ifconfig $logical_if up`;
        } else {
            print "ERROR: Unsupported platform!\n";
        }
    }
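For what it's worth, here's a rough sketch of the kind of checking I'd like to see around those backticks. This is not MMM code: run_checked is a name I made up for illustration, and the idea is simply to look at the exit status in $? after the command finishes rather than throwing it away.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MMM): run an external command, capture
# its output, and report failure instead of discarding the exit status.
# Returns (1, $output) on success or (0, $error_message) on failure.
sub run_checked {
    my @cmd = @_;

    # List-form pipe open runs the program directly, without a shell,
    # so the arguments are not re-parsed.
    open(my $fh, '-|', @cmd)
        or return (0, "ERROR: can't run $cmd[0]: $!");

    # Slurp everything the command wrote to stdout.
    my $output = do { local $/; <$fh> };
    $output = '' unless defined $output;

    # close() on a pipe waits for the child and sets $? to its wait status.
    close($fh);
    my $status = $?;

    return (0, "ERROR: '@cmd' died from signal " . ($status & 127))
        if $status & 127;
    return (0, "ERROR: '@cmd' exited with status " . ($status >> 8))
        if $status >> 8;
    return (1, $output);
}
```

With something like this, a caller could write `my ($ok, $out) = run_checked('/sbin/ip', 'addr', 'add', "$ip/32", 'dev', $if);` and bail out when $ok is false, instead of assuming the address was added.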
Needless to say I'll be filing bug reports.
The other factor was my ignorance about the tools I was using. I couldn't figure out why the ip addr add and ip addr commands weren't working. The agent would report success adding addresses, yet ifconfig didn't show them. What I didn't realize was that ip can manipulate addresses that ifconfig doesn't seem to see. With ifconfig, you add an additional address to an interface like so:

    ifconfig eth0:0

and you see a new device called eth0:0. But with ip, you do that like so:

    ip add dev eth0

and you don't see additional devices, and ifconfig doesn't see the additional address. I wasn't thinking hard enough about what I meant by "I can see that it doesn't work" -- something I'm all too prone to take other people to task for (or at least act smugly about).
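The lesson, I suppose, is to check addresses with the same tool that added them. Here's a small sketch that parses the one-line-per-address output of `ip -o -4 addr show`. The sample text and its addresses are made up for illustration (they come from a documentation range, not from any real host), and parse_ip_addr is my own name, not anything in MMM.

```perl
use strict;
use warnings;

# Parse the output of `ip -o -4 addr show` and return a hash mapping
# interface name to a list of the IPv4 addresses configured on it.
sub parse_ip_addr {
    my ($text) = @_;
    my %by_if;
    for my $line (split /\n/, $text) {
        # Expected shape: "2: eth0    inet 192.0.2.10/24 ... scope global eth0\ ..."
        next unless $line =~ m{^\d+:\s+(\S+)\s+inet\s+([\d.]+)/\d+};
        push @{ $by_if{$1} }, $2;
    }
    return %by_if;
}

# Illustrative sample only, not captured from a real machine.
my $sample = <<'END';
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: eth0    inet 192.0.2.10/24 brd 192.0.2.255 scope global eth0\       valid_lft forever preferred_lft forever
2: eth0    inet 192.0.2.100/32 scope global eth0\       valid_lft forever preferred_lft forever
END

my %addrs = parse_ip_addr($sample);
print "eth0: @{ $addrs{eth0} }\n";   # prints "eth0: 192.0.2.10 192.0.2.100"
```

On a live box you would feed it the real output instead of the sample, for example `parse_ip_addr(scalar `ip -o -4 addr show`)`. The second eth0 line is the sort of extra address I was looking for in ifconfig and not finding.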
Ah well...the good news is that I learned something. The other good news is that, since at least a couple of these errors are in the latest versions of mmm_control, I should be able to spend some time at work improving them. Hasta la source, baby! (Or something like that...)
