Two things bit me after a big round of patching yesterday.
First, Cacti's logs suddenly exploded with a crapton of errors like this:
12/20/2012 03:41:41 PM - CMDPHP: Poller[0] ERROR: SQL Assoc Failed!,
Error:'1146', SQL:"SELECT 1 AS id, ph.name, ph.file, ph.function FROM
plugin_hooks AS ph LEFT JOIN plugin_config AS ...
and on it went. The problem: Cacti got upgraded, but I forgot to run the upgrade step.
Second, LDAP Replication stopped working. The single master (multi-master replication is for people who don't get enough pain in their lives already) suddenly stopped, with terribly uninformative log messages like:
NSMMReplicationPlugin - Replication agreement for agmt="cn=eg-02" (eg-02:636) could not be updated. For replication to take place, please enable the suffix and restart the server
Forcing initialization didn't work, and neither did recreating the agreement; that got me this error:
agmtlist_add_callback: Can't start agreement "cn=eg-02,cn=replica,cn=dc\example\2c dc\3dcom,cn=mapping tree,cn=config"
But that log message did hold the key. As described here, 389/CentOS/Fedora DS/RHDS switched to a new DN format. And near as I can figure, either some upgrade step didn't work or it simply wasn't there in the first place.
The solution: Shut down the server. Edit dse.ldif and change
cn=eg-02,cn=replica,cn=dc\example\2c dc\3dcom,cn=mapping tree,cn=config
to:
cn=eg-02,cn=replica,cn=dc\example\2cdc\3dcom,cn=mapping tree,cn=config
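For anyone who'd rather script that edit than do it by hand, here is a minimal Python sketch of the substitution. It assumes the only difference between the old and new DN formats is the space after the escaped comma (`\2c`), which is all I can vouch for from the example above; the function name is made up. Run it against a copy of dse.ldif, with the server stopped.

```python
import re

def normalize_escaped_suffix(line: str) -> str:
    """Remove whitespace that follows an escaped comma (\\2c) in a DN."""
    # Assumption: the new DN format differs from the old one only by
    # the space after "\2c"; check your own dse.ldif before relying on this.
    return re.sub(r"(\\2c)\s+", r"\1", line, flags=re.IGNORECASE)

old = r"cn=eg-02,cn=replica,cn=dc\example\2c dc\3dcom,cn=mapping tree,cn=config"
print(normalize_escaped_suffix(old))
```

The space in "mapping tree" is left alone because it does not follow `\2c`.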
UPDATE: Nope, the problem recurred, leading to this amusing return from the Nagios plugin:
UNKNOWN - WTF is return code 'ERROR'???
In unrelated news, I have now switched to keeping account information in flat files distributed by rcp. Replication agreements are for the fucking birds.
SECOND UPDATE: A second re-initialization of the client fixed the problem. In still yet unrelated news, I've submitted a patch to the Linux folks to eliminate UIDs entirely.
At $WORK I've been setting up a server for a new project. It's going to be the foundation for a bunch of work that they're doing: bag o' passwords, code repository, website + wiki, email. I've been trying to set it up in such a way that a) it'll be a good foundation and b) it'll be easy for them to manage -- at least when it comes to adding/deleting/modifying users, email addresses and so on -- rather than having to come to me every time something needs changing.
I set up CentOS Directory Server and populated it with the root and a
few entries. Thus: cn=Me,ou=People,dc=example,dc=org. I've got
authentication working, restricted to certain groups where necessary,
and now it's time for me to think about how the non-propellerheads are
going to manage accounts. That means a web front end, and I've been
evaluating GOsa for the last week or so.
The good:
It's nice-looking. That's not damning-by-faint-praise; whatever I give to people is going to need to look professional.
It's got plugins for lots of different things: Samba settings, email settings (including easy vacation settings...nice), ACLs, FAI, Nagios (which, near as I can tell, is generated from FAI info), FTP quotas, you name it.
It supports hooks in its config file: postcreate, postremove, postmodify and check. These allow you to (say) create a home directory when a user is created or other fun things.
It includes an easy way to manage the LDAP server: export/import snapshots in CSV and LDIF, and snapshots of the tree that can be restored later. (Haven't tried that out yet.)
Addressbook page included.
The bad:
* Some plugins depend on others, but this isn't shown anywhere.
When I tried to remove the RPMs for some that I wouldn't need, I
found that others I _did_ need failed. If it should _all_ be
installed, I couldn't find that documented anywhere.
* Allowing people to log in to GOsa and change certain details is
important for me, but is documented poorly in the [ACLs
page][2].
* But also, ACLs can be applied only to items GOsa recognizes as
Departments (ie, `objectClass=gosaDepartment` and/or possibly
`objectclass=gosaAdministrativeUnit`). And you can't tell GOsa
that `ou=People` is one of these; it complains and says that's a
reserved keyword.
* The `gosa.conf` config file refers you to the manpage, but that
doesn't appear to be included in the RPMs I downloaded. The
[wiki page][6] on the file refers you to the [raw man page in
the source tree][7].
* It looks like vacation settings depend on an LDAP-enabled
vacation responder like [Gnarwl][3]; this is cool, but wasn't
documented anywhere. The only place I found this mentioned was
on [this blog entry][4], which contained a lot of other info
(like how to set Postfix up; yes, I shoulda RTFM but I didn't
know that Postfix could grab so much info out of LDAP) that I
didn't find on the GOsa website. In fact, that blog entry was
probably the best source of info about GOsa configuration.
* Adding email information for people requires a GOsa-specific
LDAP entry for a mail server; again, this is best documented in
[that blog entry][4].
Taking the ou=People ACL example above as a starting point, GOsa
wants to create its own department entries in your root:
ou=Accounting (say), with ou=Groups and ou=People under that. I
have mixed feelings about this organization; it would work well for
my current employer, but I'm not sure how well it'll fit for this
side project.
GOsa makes use of GOsa-specific LDAP schemas that you add to your server; this lets it label an object with (say) notes that users are allowed to change their own password. Perhaps not bad in itself -- I don't know enough to judge -- but it seems to mean that there are two sets of ACLs to manage: those in GOsa, and those in your LDAP server.
GOsa's schema files are for OpenLDAP syntax (there's a proper name for that but I can't think of it right now). I'm using CentOS DS, which has its own. Finding out how to convert one to the other was hard enough; after that was done, though, CentOS DS still didn't like it, and I had to delete some additional entries. There doesn't appear to be a lot of support for CentOS/Fedora/RedHat/389 DS.
(For the record, here's what I had to do:
$ wget http://directory.fedoraproject.org/download/ol-schema-migrate.pl
$ for i in $(grep 'schema/gosa' /usr/share/doc/gosa/slapd.conf-example | sed -e's/ldap/openldap/' | awk '{print $2}') ; do ../ol-schema-migrate.pl -b "$i" >> 98gosa.ldif; done
And then remove references to gosaLoginRestriction and goFaxDivertNumber.)
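If the grep/sed/awk pipeline is hard to read, here is roughly what it does, sketched in Python. This covers only the path extraction, not the conversion itself; the sample config text is a made-up excerpt, and the function name is mine.

```python
def gosa_schema_paths(conf_text: str) -> list[str]:
    """Roughly mimic: grep 'schema/gosa' | sed 's/ldap/openldap/' | awk '{print $2}'."""
    paths = []
    for line in conf_text.splitlines():
        if "schema/gosa" not in line:
            continue
        # sed without the g flag rewrites only the first "ldap" on the line
        fields = line.replace("ldap", "openldap", 1).split()
        if len(fields) >= 2:
            paths.append(fields[1])
    return paths

# Hypothetical excerpt; the real slapd.conf-example may differ.
sample = """\
include /etc/ldap/schema/core.schema
include /etc/ldap/schema/gosa/samba3.schema
include /etc/ldap/schema/gosa/gosystem.schema
"""
for path in gosa_schema_paths(sample):
    print(path)
```

Each path printed is one file to hand to ol-schema-migrate.pl.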
Overall, I'm not sure how happy I am with GOsa. It's not doing what I
want now, and it looks like the only way to make that happen is to
wipe the tree and start from scratch. -- Well, no, not scratch:
create a GOsa-visible department (ou=Something) and stick everything
in there. Given the sparse documentation, it's probably silly for me
to focus on that, but that's what sticks out at me right now.
Trying to figure out how to add a bunch (well, 6) of LDAP user accounts. So far:
ldapuseradd: neat; wish there was a dry-run mode; wish the hooks supported arbitrary scripts
Python: neat; good to learn Python; have to avoid going down the rabbit hole of classes (Python made OOP clear to me in a way it hasn't before, and now everything looks like a nail)
phpLDAPadmin: arghh; nooooo; can't script
Realistically this won't happen very often, so perhaps I should just go ahead and use the damn browser to do this. But it kinda hurts me a little inside when I do.
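For what it's worth, the Python route can double as the dry-run mode I was wishing for: generate LDIF as text, read it over, and only then feed it to ldapadd. A minimal sketch, assuming plain posixAccount entries under ou=People; the names, UID/GID numbers, shell and home directory layout are all hypothetical.

```python
def posix_account_ldif(uid, cn, uid_number, gid_number,
                       base="ou=People,dc=example,dc=org"):
    """Return an LDIF entry for one posixAccount; nothing is sent to a server."""
    lines = [
        f"dn: uid={uid},{base}",
        "objectClass: account",
        "objectClass: posixAccount",
        f"uid: {uid}",
        f"cn: {cn}",
        f"uidNumber: {uid_number}",
        f"gidNumber: {gid_number}",
        f"homeDirectory: /home/{uid}",  # assumed home directory layout
        "loginShell: /bin/bash",        # assumed default shell
    ]
    return "\n".join(lines) + "\n"

# Hypothetical users; the real list would come from wherever you keep it.
users = [("alice", "Alice", 5001, 5000), ("bob", "Bob", 5002, 5000)]
ldif = "\n".join(posix_account_ldif(*user) for user in users)
print(ldif)
```

Since it only prints, it is safe to run as often as you like while getting the attributes right.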
Just ran into an interesting problem: after replacing memory on a server, CentOS booting hung at "Starting system message bus..."
So what does dbus have to do with anything? This turned out to be an
LDAP failure; dbus was trying to run as UID root, and since the LDAP
server couldn't be contacted it hung. Why couldn't the LDAP server be
contacted? The LDAP server logs only showed this:
[09/Sep/2009:12:04:32 -0700] conn=41492 op=-1 fd=112 closed - SSL peer cannot verify your certificate.
The CA cert I use was in place, and another machine had just rebooted w/o problems (all this is taken care of with cfengine, so they were identical in this respect). I could connect to the LDAP server on the right port without any problems.
I finally figured out what was going on when I ran:
openssl s_client -connect ldap.example.com:636 -CApath /path/to/cacert_directory
and saw:
Verify return code: 9 (certificate is not yet valid)
date said it was December 31, 2001. What the what now? I ran ntpdate
to set things correctly, and then I got:
Verify return code: 0 (ok)
I figure the CMOS clock (or whatever the kids are calling it these days) got reset when we had to remove the CPU daughtercard to get at the memory underneath.
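The failure mode is simple once you see it: the client's clock was earlier than the certificate's notBefore date. Here is a small Python sketch of that comparison, assuming you already have the two validity dates as strings (the ones below are invented; only the 2001 clock reading comes from the story above).

```python
import ssl

def validity_problem(not_before: str, not_after: str, now: float):
    """Compare a certificate's validity window with a clock reading.

    Returns a short description of the problem, or None if `now`
    falls inside the window.
    """
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    if now < start:
        return "certificate is not yet valid"
    if now > end:
        return "certificate has expired"
    return None

# Hypothetical validity dates, in the format Python's ssl module reports.
not_before = "Jan  5 09:34:43 2009 GMT"
not_after = "Jan  5 09:34:43 2019 GMT"
# The clock as it stood after the CMOS reset.
reset_clock = ssl.cert_time_to_seconds("Dec 31 00:00:00 2001 GMT")
print(validity_problem(not_before, not_after, reset_clock))
```

Passing `time.time()` as `now` checks against the machine's current clock instead of a fixed value.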
And now you know...the rest of the story.
Okay, I feel like a bit of a tool for never realizing how cool suspend-to-ram is in a laptop. My new laptop for work is a Dell D630, which I'd got 'cos its hardware is pretty much completely compatible w/Linux. However, I've also figured out that a) Ubuntu does suspend-to-ram quite nicely (aside from a couple times when the keyboard doesn't work, but closing/reopening the lid makes it work), and b) it just sips -- sips, I tell you! -- from the battery.
Now to try and make it work on my own laptop, which is currently sitting at the shop waiting for me to pick it up.
Today's agenda:
See? I am still a sysadmin.
"Physicists are fun to be around. I was watching TV with one, and a commercial came on for OxyClean. The announcer's voice comes in, strong and deep, and says, What's the most powerful force in the universe? The guy I'm with starts pumping his fist and chanting, Strong nuclear force! Strong nuclear force! The announcer comes back and says, That's right, oxygen! Poor bastard looked like someone just shat in his ear."
(Conversation with a friend just now.)
Two things that didn't work:
Explanation: there's ou=Smith and ou=Jones, both of which are under ou=People,dc=example,dc=org. Smith wants to offer Jones the use of a few of his machines, which means setting up accounts for Jones and a few of his folks (cn=Alice, cn=Bob, and cn=Charlie). Obviously, these should be in ou=Jones, right? But if Smith's machines, through the wonders of pam_ldap, are set to check ou=Smith, how do Jones' folks log in?
(Digression: actually, Smith's machines right now check under ou=People — not ou=Smith,ou=People. Smith is the first one to use LDAP, so I stuck with that. I was going to change that at some point anyway, and I thought this might be a good chance to do just that.)
I thought I could try adding an alias, under ou=Smith, that'd point to cn=Alice,ou=Jones. And if I told LDAP that it was a posixAccount as well, then I could look at the account details with id and getent. But the logs showed that it just didn't work:
pam_ldap: error trying to bind as user "uid=Alice,ou=Jones,ou=People,dc=example,dc=org" (Inappropriate authentication)
Couldn't track down the error quickly, so went to plan B: stick with the current setup (machines checking ou=People) and put 'em under ou=Jones. I can always add host restrictions later on.
Explanation: Smith had a bunch of these machines at another location before getting server room space at UBC (and new servers). My access to them previously was via SSH only — there was no console access at all (sigh). Now they're at UBC, and one of 'em's gonna be my monitoring machine/second LDAP server ("The new server room: now with redundancy!") But while it was simple to turn on console redirection and choose PXE booting from the comfort of my office, I ended up borking the kickstart process and having to come back here anyway to set up the install. There's the BMC, which apparently I can access via the serial console if I so choose, but I'm still trying to figure out what that'll get me — ie, I can't find a manual in 11 seconds, so I'm putting that off for now.
Oh, and my new (work) laptop is in. Yay! It's a Dell D630, and aside from its obscene footprint compared to my (ailing) C400, it's great. Ubuntu (Hardy for compatibility with the desktops here) is on so far, and CentOS (server work) and OpenBSD (instant firewall) aren't far behind.
I've been holding off mentioning this 'til all my ducks were in a row, but at last it's settled. The job I've been working at part-time for the last six months will be my full-time job starting next Wednesday. w00t!
I've been spending my time at $job_1 making sure the documentation is complete, getting a spare workstation set up and ready to go, and dumping my brain into the sysadmin who will be helping fill in 'til a new person is hired (which might take a while).
I'm really excited about this. First off, I'll get my lunch hours back; I've been walking between the two offices (mornings at one, afternoons at the other, back to the first for the last half hour), and it'll be nice to have an hour to myself again. But the new job is exciting for me: nice big servers used for scientific computation, the chance to build an infrastructure from scratch, and some big projects. The people are friendly. The boss is nice. The place has funding for the next five years or so. It's all good. About the only thing missing is a rocket pack so I can cut down on this 90-minute commute.
And on top of all that, they're open to the idea of sending me to LISA this year. Now that would be nice…have to see if it works with the family, but I'm keeping my fingers crossed.
In other news:
A few quick notes about building Fedora Directory Server RPMs for CentOS:
$instance_dir points to /etc/dirsrv, not /etc/fedora-ds.
(Partly a memo to myself, and partly to help anyone in the same boat; edits have been disabled in the FDS wiki, so I can't add this right now.)
The Internet Storm Center writes about a new variant on malware that messes with your DNS: it installs a rogue DHCP server.
While not too sophisticated, the whole attack is very interesting. First, it's about a race between the rogue DHCP server and the legitimate one. Second, once a machine has been poisoned it is impossible to detect how it actually got poisoned in the first place - you will have to analyze network traffic to see the MAC address of those DHCP Offer packets to find out where the infected machine actually is.
In other news...all $job_2's new machines are set up and running. Kickstart is very nice…I really wish Debian had something similar; FAI is lovely, but Kickstart has the lovely feature of taking a hand-done installation you've just finished and turning that into a config file for a hands-off version. That saves a huge amount of time.
Next up: turn nscd back on (forgot I'd left it off for debugging LDAP
'til a simple find -exec chown was taking 10 minutes to finish);
relabel the machines with their new names; commit the documentation
I've been piecing together on my laptop; open up to others in the
group; look at either moving the LDAP server over to the server room,
or setting up a slave over there.
I just spent the weekend (well, like an hour a day...kids, life, you know how it is) trying to track down why a bunch of new CentOS 5.2 installs at $job_2 couldn't pipe:
$ ls foo
foo
$ ls | grep foo
$ echo $?
141
(Actually, I didn't think to look at the error code 'til someone else pointed it out…141 turns out to be SIGPIPE: 128 plus signal 13.) In the end, it would have been quicker if I'd simply searched for the first thing I saw when logging in:
-bash: [: =: unary operator expected
-bash: [: -le: unary operator expected
This was particularly aggravating to track down because not every machine was doing this, and no matter what I thought to look at (/etc contents, /tmp permissions (those have a habit of going wonky on me for some reason), SELinux) I couldn't figure out what was different.
Turned out to be an upstream bug in nss_ldap. (The Bugzilla entry makes for some interesting reading, to be sure…) And I didn't see it on each machine because I hadn't upgraded after installation on all machines. (They're not yet in production, and I'm working on getting my kickstart straight.)
Man, it was gratifying to upgrade nss_ldap and see the problem go away…
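Since the exit code was the clue I missed: bash reports a process killed by signal N as 128+N, so 141 decodes to signal 13. A quick Python sketch of that decoding follows. The function name is mine, and it assumes the status really did come from a signal (a program can also just exit with a number above 128 on its own).

```python
import signal

def describe_exit_status(status: int) -> str:
    """Explain a shell exit status, decoding the 128+N convention for signals."""
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            # Not a signal number this platform knows about
            return f"killed by unknown signal {signum}"
    return f"exited with status {status}"

print(describe_exit_status(141))
```

On Linux this prints "killed by SIGPIPE (signal 13)".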
So one of the things I need to set up at $JOB_2 is some kind of
unified bag o' passwords…which, since I hate NIS, pretty much means
LDAP. This is the first chance I've had to set up an LDAP system from
scratch, rather than either being afraid to try or being stuck with
(and, sadly, contributing to the further divergence of) a mishmash of
semi-borked LDAP servers.
I've been trying out Fedora Directory Server the last few days, and so far I'm pretty happy with it. It's nice to have the luxury of learning what the hell I'm doing before it all goes live, of screwing up a bunch of times on a non-production system.
Likes: Welp, it's a lot like Sun's Directory Server…at least as far as the logging and console go, anyhow. Not surprising, given the heritage. You can automate installation by giving it a configuration file — something I didn't realize you could do with Sun's DS.
Other likes: PHPLDAPAdmin is nice. The latest version has E-Z-Reed XML templates for things like account creation, meaning I can keep my ignorance of Javascript intact. (Hurray!)
Minor irritants: there are a few. First off, there are no RPMs for CentOS 5 for the 1.1 series; you have to jump through some hoops to get the FC6 RPMs of 1.1 installed. I'd originally tried the 1.0 series on Debian, and hadn't realized that the 1.1 series does not include the org chart or E-Z-Account-Maker web app. (This is where y'all can go, "Muffin!")
Third, I'm so far not able to get the automated installation
working…can't figure out why. Not terribly important, since $JOB_2
is small and likely to stay that way; a couple of servers is likely to
be the max. But installation of this thing, just like with Sun DS, has
lots of knobs that you can twiddle if you want, and part of the
problem with the mishmash at $JOB_1 is that no one ever standardized
the settings -- never wrote down the answers to the questions, or
scripted it, or came up with a config file, or anything. And it's
hellish if you want to add another install to the mix.
Anyhow...so far it's cool. I've been playing with it on a machine at
$JOB_2 plus an installation of CentOS 5 on my laptop. Still to
learn: SSL, replication, and (maybe) multi-master replication.
(Incidentally, I'm surprised that there isn't a more recent version of O'Reilly's LDAP Administration by Gerald Carter. Yes, there's still OpenLDAP and I don't imagine it's changed very much (feel free to correct me), but something that included Fedora DS, and maybe (maybe) OpenDS would be good.)
(And speaking of Sun gossip, I've been meaning to mention this for a while…and now this.)
There are always timesinks at a job: the things that suck up all your spare time, that interrupt what you're doing and force you on to something else. They're urgent, or they're complicated, or they're obscure and you only ever touch them every six months. If you're really unlucky, they're all three. They drain the life from you; a good day turns shitty, and an already-shitty day becomes nigh-unbearable.
The website is one such timesink at my current job. It's a veritable
Grand Canyon of different technologies, databases, and code. You can
examine it and, like a geologist, date particular pages or code with
great accuracy, judging by clues like composition, surroundings,
indentation patterns ("Oooh, K&R crossed with…crack?"), and
previous experience. When an Urgent Request for Web Changes comes in
(and they're all urgent), figuring out how to do it means figuring out
how that particular page was generated in the first place: static?
dynamic? CMS? And then you have to figure how you can meddle with it:
logging into Mambo, the CMS of the damned? If it's static: does the
URL map nicely to the filesystem, or is there a hidden Apache Alias
directive somewhere? Do you have permissions to open the file, or will
it take sudo to chown it, or another nagging email to a coworker to
please check their changes into RCS? And if, God help you, it's
dynamic…but no; that mess of spaghetti should stay down. There's no
sense bringing it up again simply for prurient purposes.
Sunray terminals are another timesink. When they work they work very well. I like the energy-saving aspects of it — both electrical and my own; one machine to manage is always better than 40. But when they don't work, it's a pain. Has a session become wedged? Is it GNOME's fault? Has Adobe Acrobat decided to eat up all the CPU again? If so, is that worse than the security holes that remain unfixed in the later version? Why is Solaris 10 randomly not sending RST packets when it receives a SYN on a port it's not listening on? (If anyone has any ideas, please let me know.) Has a cheap switch, installed because no one believed that an office meant for one might someday hold four, gone off its meds again?
These things make me throw up my hands and curse my fortune. I have no one unfortunate enough to be my subordinate, so it's up to me to hack and slash through the possibilities until it's finished, or at least put off for another day.
But LDAP is worse.
When it works it works very, very well. Failover works, replication works, and an account created here zips there without a moment's thought. But when it fails, it's urgent and complicated and obscure all at once, and sometimes in degrees polynomial.
At last count we have four different master-master replicas, running three or possibly four different versions of Sun's Directory Server (under six different product names, no less). There are replication agreements spanning versions that aren't even supposed to tolerate each other's existence, using two different encryption protocols and NetBEUI. Two completely different "helpful" management tools vie for our attention, lacking only flash plugins to trigger seizures. Only one server can be poked or prodded with a command line tool. Diagnostics are by turns nonexistent or endearingly fickle.
To be fair, the vendor documentation is vast and makes fine kindling,
though its promise to fully document error codes like error
457758854b: BER error 45775885b4 is best regarded as a bitter joke by
a jaded software engineer who died alone, unloved and without stock
options. (Our own documentation is marginally better: no carbon is
released when it is destroyed.) Thus, keeping track of ACLs (say), and
exactly which unholy wrath you will invite upon your head should you
make a mistake when granting or revoking privileges to read a
particular entry, means digging through half-remembered conversations,
drunken Google searches, year-old notebooks and a quiet, solitary
introspection normally reserved for contemplating your own impending
doom.
On top of everything else, LDAP encompasses everything, or nearly so. Email routing, website privileges, database access, even TCP checksum computation: all are kept in, or depend on, or just like to hold hands with, LDAP. It's enough to make me wistful for the good old days of NIS.
In a few minutes I am going to go back to work and try to figure out why a new account has stopped, in mid-replication, halfway between $UNIVERSITY and $OTHER_UNIVERSITY. It will take me the rest of the afternoon. I will use words that my own son does not know I know. And I will come out of it shrunken, withered, beaten down and humble.
At $other_university today adding a new hard drive to our server here: 300GB, instead of 30GB. The users will be very happy. And what with the snow coming down, I'll be very happy if transit keeps running 'til I'm all done.
And now a story about how sometimes it's not all Sympa's fault…
As part of the preemptive strike against the mail server's impending failure, I upgraded Sympa (shudder) on $big_server using pkg-src. After copying the list config files over, and pointing it at a separate database, I tried telling it to update its list of subscribers. When I compared a few lists to the already-existing ones, though, they were short random numbers of subscribers. One that I used as a test case, for example, was short about 100 subscribers. That was a concern to me.
The lists that were short all seemed to be ones that grabbed information from the LDAP server, so I tried looking at the queries that Sympa made. The query itself was pretty simple:
(&(objectclass=inetLocalMailRecipient)(gidNumber=10000))
with Sympa being told that mailLocalAddress was the important
bit. Should be simple to compare the results from this server and the
one where things work…But I lost a good hour of my life, and possibly
a couple years of life expectancy, when I became convinced that,
somehow, replication to that server was failing, and big chunks of
information (like mailLocalAddress) were being lost. Finally,
though, I figured out that I'd stupidly been querying the two servers
using different credentials. Unfortunately, I figured that out on the
way home Friday night.
(Obviously I'd forgotten Æleen Frisch's rules about system administration:
Monday, though, was a fresh start and a whole new day. I started by
looking again at the queries Sympa made. On $old_server, we'd get
about 210 results, with mailLocalAddress in each one of 'em. But on
$big_server, we'd get about 210 results, with mailLocalAddress
missing in about 100 of them. No wonder Sympa was short.
I double-checked by specifically requesting mailLocalAddress on the
problematic server, and it was returned. But $big_server didn't
volunteer that information.
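Counting those by eye gets old fast. If you have the search results in hand as (dn, attributes) pairs, however you fetched them, a few lines of Python will list the entries that came back without the attribute. This is only a sketch; the entries below are invented, and the function name is mine.

```python
def entries_missing(entries, attr):
    """Return the DNs of entries lacking `attr` (names compared case-insensitively)."""
    wanted = attr.lower()
    missing = []
    for dn, attrs in entries:
        if wanted not in {name.lower() for name in attrs}:
            missing.append(dn)
    return missing

# Hypothetical results, shaped as (dn, {attribute: [values]}) pairs.
results = [
    ("uid=alice,ou=People,dc=example,dc=org",
     {"uid": ["alice"], "mailLocalAddress": ["alice@example.org"]}),
    ("uid=bob,ou=People,dc=example,dc=org",
     {"uid": ["bob"]}),
]
print(entries_missing(results, "mailLocalAddress"))
```

Running the same check on the output of both servers, fetched with the same credentials this time, shows which one is holding out.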
I did some digging, and it seems this may be a bug in Sun Directory Server: it should be returning mailLocalAddress as one of the attributes, but it does not do so for all entries, even when the querying user should have permission to see them. Unfortunately, I'm unable to see the PR that the thread mentions, since we're not paying for support. (Thank you, Sun.)
Digging into Sympa, though, I found out that this was not the
entire reason for the failure. Sympa uses Perl's Net::LDAP module to
do its queries. It turns out that Net::LDAP wants a list when you're
asking for particular attributes. But in List.pm's
_include_users_ldap function, the search is created like so:
$fetch = $ldaph->search ( base => "$ldap_suffix",
filter => "$ldap_filter",
attrs => "$ldap_attrs",
scope => "$param->{'scope'}");
Changing one line:
$ rcsdiff -r1.1 List.pm
===================================================================
RCS file: RCS/List.pm,v
retrieving revision 1.1
diff -r1.1 List.pm
8646c8646,8647
< attrs => "$ldap_attrs",
---
> # attrs => "$ldap_attrs",
> attrs => ["$ldap_attrs"],
meant that, instead of asking for the default attributes (which
$big_server was calculating incorrectly), it was asking for
mailLocalAddress and succeeding.
And now you know the rest of the story.
My lack of experience with LDAP in general, and Sun's (iPlanet|Directory Server( Enterprise Edition)?) in particular, has proven to be a bit of a handicap of late.
Case in point: when I upgraded $big_machine to Solaris 10 at the end of August, I also upgraded its LDAP server from iPlanet 5.1 to DSEE 6 (same software, different name). At the time I had two problems: I was unable to get replication to $big_server (we have a multi-master configuration; not supposed to work with 5.1, but it does/did for us) working over SSL, and replication from $big_server to other machines did not work. There were a lot of things going wrong at that point, so I set up replication in the clear from $little_machine, another LDAP server on the LAN, and left it 'til I had more time. It wasn't ideal, but it would do.
The last two Saturdays I've been trying to figure out why replication wasn't working. I concentrated on getting replication to it working over SSL. This was tough, because the logs didn't tell me much:
Server failed to flush BER data back to client
I swear, this turned up more Googlejuice today than it did a few weeks ago, because this time it turned up the ever-excellent Brandon Hutchinson again. This time he had a truly great set of instructions on installing DSEE6. That led me to this blog entry, very helpful, giving information about the different sorts of databases you can stick your SSL certs into. (Must learn more about SSL/OpenSSL…)
However, in the end it turned out to be a simple and moderately
embarrassing mistake: it's not enough, with DS6, to say dsadm add-cert
and be done with it; you actually have to specify the
certificate to use. As Brandon points out, you have to edit dse.ldif
in order to do so (though I had to stop the server, edit the file and
start it up again, rather than just edit and restart, in order to get
it to work).
The other thing — replication from $big_server elsewhere — is still not working. I suspect this is my fault; in an attempt to get things working, I decided that the thing to try would be initializing $big_server from $little_server, then the other way around. This did not change things, and now $little_server is unable to push its changes elsewhere. I've since been told this is a mistake on my part; arghh.
Unfortunately, there were other things I screwed up in the original install of DS6 on $big_server -- embarrassing and rather pointless to record for Google right now -- and I strongly suspect that I'm going to have to reinstall or reinitialize $big_server just to get things into a reasonably coherent state. Fortunately, there aren't that many changes that ever happen on it, so there shouldn't be many to lose or redo if it's wiped.
And thus my Saturday.
There is nothing worse than a problem that goes away once you restart
the program. Case in point: ls -l /home ran atrociously slow
(slowly?) on a Solaris 10 machine at work today. It's running Sun's DS
5.2 (or whatever they're calling it these days).
I've come across this problem before when I was trying to figure out how to get the thing to bind to itself by default as an LDAP client, rather than to one of the remote servers that're meant to be backups.
This time, though, that simply wasn't the problem: no traffic was
going to the other machines at all. All I saw was looooooooooong
lookup times for simple passwd stuff. Error logs showed
nothing. Access logs swore blind that access times were on the order
of zero nanoseconds. Truss showed it kept mmap()ing things; dtrace
showed a whole lotta reads. I couldn't figure out more than that
(which, natch, is my fault, not the tools).
In the end and out of desperation I restarted the server…which did the trick but left me frustrated that I'm no closer to figuring out what's going on with the damn thing.
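Next time, I'd at least like numbers from before and after the restart. A minimal Python sketch for timing passwd lookups through the name service follows; it measures from the client side only, which is an assumption about where the slowness shows up, and the account names are placeholders.

```python
import pwd
import time

def time_lookups(lookup, keys):
    """Time each lookup(key) call; return (key, seconds) pairs, slowest first."""
    timings = []
    for key in keys:
        start = time.perf_counter()
        try:
            lookup(key)
        except KeyError:
            pass  # a missing account still exercises the name service
        timings.append((key, time.perf_counter() - start))
    return sorted(timings, key=lambda item: item[1], reverse=True)

# "root" exists nearly everywhere; the second name is a hypothetical miss.
for name, seconds in time_lookups(pwd.getpwnam, ["root", "no-such-user"]):
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Feeding it the usernames that own things in /home would come closer to what ls -l actually does.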
When installing openldap22-server on FreeBSD from ports, I got this error when starting slapd:
error loading ucdata (error -127)
This is a permission problem with the directory /usr/local/share/openldap/ucdata; changing the group ownership cleared it up.
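A quick way to see what slapd is up against is to print the directory's mode and owner/group and check whether the group bits allow reading and entering it. Here's a small Python sketch, shown on a throwaway directory. It doesn't know which user or group your slapd runs as, so that comparison is left to you.

```python
import os
import stat
import tempfile

def group_can_enter(path: str) -> bool:
    """True if the directory's mode gives its group read and execute (search) bits."""
    mode = os.stat(path).st_mode
    needed = stat.S_IRGRP | stat.S_IXGRP
    return stat.S_ISDIR(mode) and (mode & needed) == needed

def describe(path: str) -> str:
    """Summarize mode bits and numeric owner/group, similar to `ls -ld`."""
    st = os.stat(path)
    return f"{stat.filemode(st.st_mode)} uid={st.st_uid} gid={st.st_gid} {path}"

# Demonstrated on a temporary directory; on the real box the path would be
# /usr/local/share/openldap/ucdata.
with tempfile.TemporaryDirectory() as demo:
    os.chmod(demo, 0o700)
    print(describe(demo), group_can_enter(demo))
    os.chmod(demo, 0o750)
    print(describe(demo), group_can_enter(demo))
```

If the gid shown isn't one that slapd's user belongs to, the group bits won't help, which matches the group-ownership fix above.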