Post-Maintenance Fallout

Two things bit me after a big round of patching yesterday.

First, Cacti's logs suddenly exploded with a crapton of errors like this:

12/20/2012 03:41:41 PM - CMDPHP: Poller[0] ERROR: SQL Assoc Failed!,
Error:'1146', SQL:"SELECT 1 AS id, ph.name, ph.file, ph.function FROM
plugin_hooks AS ph LEFT JOIN plugin_config AS ...

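and on it went. The problem: Cacti got upgraded, but I forgot to run its database upgrade step. MySQL error 1146 means "table doesn't exist," so the new code was querying tables (like plugin_hooks) that were never created because the schema was never upgraded.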
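If you want to confirm that a log flood like this is all one missing-schema problem rather than several different ones, tallying the MySQL error codes helps. The sketch below assumes each Cacti log entry sits on a single line (the excerpt above is wrapped for display) and that the failures look like the "SQL ... Failed!, Error:'NNNN'" lines shown. The function name and the one-entry error table are illustrative, not part of Cacti.

```python
import re
from collections import Counter

# MySQL server error codes worth recognizing; 1146 is ER_NO_SUCH_TABLE
# ("Table '%s.%s' doesn't exist"). Extend as needed.
MYSQL_ERRORS = {
    1146: "table doesn't exist (schema not upgraded?)",
}

# Matches the "Error:'NNNN'" fragment in CMDPHP "SQL <kind> Failed!" lines.
ERROR_RE = re.compile(r"SQL \w+ Failed!.*?Error:'(\d+)'")


def summarize_sql_errors(lines):
    """Count MySQL error codes found in Cacti log lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Shortened sample lines; a real log would be read from Cacti's log file.
    sample = [
        "12/20/2012 03:41:41 PM - CMDPHP: Poller[0] ERROR: SQL Assoc Failed!, "
        "Error:'1146', SQL:\"SELECT 1 AS id, ph.name FROM plugin_hooks AS ph\"",
        "12/20/2012 03:41:41 PM - SYSTEM: an unrelated line",
    ]
    for code, n in summarize_sql_errors(sample).items():
        print(code, n, MYSQL_ERRORS.get(code, "see the MySQL error reference"))
```

If every hit comes back as 1146, the answer is almost certainly "run the upgrade," not "debug the poller."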

Second, LDAP replication broke. The single master (multi-master replication is for people who don't already get enough pain in their lives) suddenly stopped sending updates to its consumer, logging terribly uninformative messages like:

NSMMReplicationPlugin - Replication agreement for agmt="cn=eg-02" (eg-02:636) could not be updated. For replication to take place, please enable the suffix and restart the server

Forcing a re-initialization didn't work, and neither did recreating the agreement, which got me this error:

agmtlist_add_callback: Can't start agreement "cn=eg-02,cn=replica,cn=dc\example\2c dc\3dcom,cn=mapping tree,cn=config"

But that log message did hold the key. As described here, 389/CentOS/Fedora DS/RHDS switched to a new, normalized DN format, and the agreement's DN still had a space after the escaped comma (\2c). Near as I can figure, either some upgrade step didn't work or it simply wasn't there in the first place.

The fix: shut down the server, then edit dse.ldif and change

cn=eg-02,cn=replica,cn=dc\example\2c dc\3dcom,cn=mapping tree,cn=config

to:

cn=eg-02,cn=replica,cn=dc\example\2cdc\3dcom,cn=mapping tree,cn=config
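If you have more than one agreement to fix, a script beats hand-editing. Below is a minimal sketch, assuming the only change needed is the one shown above: dropping whitespace after an escaped comma (\2c) in "dn:" lines. It is a hypothetical helper, not a 389 DS tool. It unfolds LDIF continuation lines first (a long DN can be wrapped onto lines that start with a single space), skips base64 "dn::" values, and keeps a .bak copy. Run it only with the server stopped, as above, since the running server owns dse.ldif.

```python
import re
import shutil
import sys

# Whitespace following an escaped comma (\2c or \2C), e.g. "\2c dc\3dcom".
ESCAPED_COMMA_SPACE = re.compile(r"(\\2c)\s+", re.IGNORECASE)


def unfold_ldif(text):
    """Join LDIF continuation lines (lines starting with one space)."""
    out = []
    for line in text.splitlines():
        if line.startswith(" ") and out:
            out[-1] += line[1:]  # drop the single leading fold space
        else:
            out.append(line)
    return out


def normalize_dn_lines(text):
    """Remove whitespace after \\2c in plain (non-base64) dn: lines."""
    fixed = []
    for line in unfold_ldif(text):
        lower = line.lower()
        if lower.startswith("dn:") and not lower.startswith("dn::"):
            line = ESCAPED_COMMA_SPACE.sub(r"\1", line)
        fixed.append(line)
    return "\n".join(fixed) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # path to the instance's dse.ldif
    shutil.copy2(path, path + ".bak")  # backup before rewriting
    with open(path) as f:
        original = f.read()
    with open(path, "w") as f:
        f.write(normalize_dn_lines(original))
```

Diff the result against the .bak copy before starting the server again; the script writes unfolded lines, which LDIF permits.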

UPDATE: Nope, the problem recurred, leading to this amusing output from the Nagios plugin:

UNKNOWN - WTF is return code 'ERROR'???
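That message suggests the check got back a status string ('ERROR') it had no mapping for, and fell through to UNKNOWN. Nagios reads a plugin's state from its exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN. Here is a minimal sketch of a check that maps status strings explicitly; the status names and the choice to treat 'ERROR' as CRITICAL are assumptions, since the post doesn't say which plugin was in use or what it parses.

```python
import sys

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]

# Hypothetical mapping from replication status strings to Nagios states.
STATUS_MAP = {
    "OK": OK,
    "WARNING": WARNING,
    "ERROR": CRITICAL,
}


def nagios_result(status):
    """Return (exit_code, message) for a replication status string."""
    code = STATUS_MAP.get(status.upper(), UNKNOWN)  # unmapped -> UNKNOWN
    return code, f"{LABELS[code]} - replication status {status!r}"


if __name__ == "__main__" and len(sys.argv) > 1:
    exit_code, message = nagios_result(sys.argv[1])
    print(message)  # first line of output becomes the Nagios status text
    sys.exit(exit_code)
```

Mapping known failure strings to CRITICAL means a broken agreement pages someone instead of hiding in UNKNOWN.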

In unrelated news, I have now switched to keeping account information in flat files distributed by rcp. Replication agreements are for the fucking birds.

SECOND UPDATE: A second re-initialization of the client fixed the problem. In yet more unrelated news, I've submitted a patch to the Linux folks to eliminate UIDs entirely.