Two things bit me after yesterday's big round of patching.
First, Cacti's logs suddenly exploded with a crapton of errors like this:
12/20/2012 03:41:41 PM - CMDPHP: Poller[0] ERROR: SQL Assoc Failed!,
Error:'1146', SQL:"SELECT 1 AS id, ph.name, ph.file, ph.function FROM
plugin_hooks AS ph LEFT JOIN plugin_config AS ...
and on it went. The problem: Cacti got upgraded, but I forgot to run the upgrade step.
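For what it's worth, MySQL error 1146 means "table doesn't exist", which fits a database schema that never got upgraded. A rough Python sketch for tallying these errors by code, assuming each log entry sits on one line in the layout quoted above; the regex and the `count_sql_errors` helper are illustrative only, not anything Cacti ships:

```python
import re
from collections import Counter

# Pattern assumes the layout of the log line quoted above; adjust as needed.
ERROR_RE = re.compile(r"ERROR: SQL Assoc Failed!,\s*Error:'(\d+)'")

def count_sql_errors(lines):
    """Tally 'SQL Assoc Failed' entries by MySQL error code (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# One entry from the log above, joined onto a single line and still truncated.
sample = [
    "12/20/2012 03:41:41 PM - CMDPHP: Poller[0] ERROR: SQL Assoc Failed!, "
    "Error:'1146', SQL:\"SELECT 1 AS id, ph.name, ph.file, ph.function FROM "
    "plugin_hooks AS ph LEFT JOIN plugin_config AS ...",
    "some unrelated log line",
]

for code, total in count_sql_errors(sample).items():
    print(code, total)
```

A pile of 1146s right after a package upgrade is a decent hint that the schema and the code no longer agree.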
Second, LDAP replication stopped working. The single master (multi-master replication is for people who don't get enough pain in their lives already) suddenly stopped replicating, with terribly uninformative log messages like:
NSMMReplicationPlugin - Replication agreement for agmt="cn=eg-02" (eg-02:636) could not be updated. For replication to take place, please enable the suffix and restart the server
Forcing initialization didn't work, and neither did recreating the agreement; that got me this error:
agmtlist_add_callback: Can't start agreement "cn=eg-02,cn=replica,cn=dc\example\2c dc\3dcom,cn=mapping tree,cn=config"
But that log message did hold the key. As described here, 389/CentOS/Fedora DS/RHDS switched to a new DN format. And near as I can figure, either some upgrade step didn't work or it simply wasn't there in the first place.
The solution: Shut down the server. Edit dse.ldif and change
cn=eg-02,cn=replica,cn=dc\example\2c dc\3dcom,cn=mapping tree,cn=config
to:
cn=eg-02,cn=replica,cn=dc\example\2cdc\3dcom,cn=mapping tree,cn=config
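With more than a couple of agreements, the edit could be scripted. A minimal Python sketch, assuming the only difference between the old and new DN formats is the space after the escaped comma (`\2c`), as in the example above, and that each DN sits on a single unfolded line. `fix_dn_spacing` is a hypothetical helper, not part of 389 DS; run anything like it only with the server shut down and a backup of dse.ldif at hand.

```python
import re

# Matches an escaped comma (\2c) followed by spaces, as in the old-style DN above.
ESCAPED_COMMA_SPACE = re.compile(r"(\\2c) +", re.IGNORECASE)

def fix_dn_spacing(ldif_text):
    """Drop spaces after escaped commas on lines that mention the mapping tree.

    Assumption: only lines containing 'cn=mapping tree' need rewriting.
    """
    fixed = []
    for line in ldif_text.splitlines(keepends=True):
        if "cn=mapping tree" in line:
            line = ESCAPED_COMMA_SPACE.sub(r"\1", line)
        fixed.append(line)
    return "".join(fixed)

# The old-style DN from the error message above.
old = r"cn=eg-02,cn=replica,cn=dc\example\2c dc\3dcom,cn=mapping tree,cn=config"
print(fix_dn_spacing(old))
```

Lines that don't mention the mapping tree are left untouched, so ordinary entries with a legitimate space after an escaped comma are not rewritten.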
UPDATE: Nope, the problem recurred, leading to this amusing return from the Nagios plugin:
UNKNOWN - WTF is return code 'ERROR'???
In unrelated news, I have now switched to keeping account information in flat files distributed by rcp. Replication agreements are for the fucking birds.
SECOND UPDATE: A second re-initialization of the client fixed the problem. In yet more unrelated news, I've submitted a patch to the Linux folks to eliminate UIDs entirely.