/etc/ld.so.nohwcap

Now this is interesting.

Guy comes up to me at work and sez, "Hey, that new Linux machine is really slow." How can that be? It's an umpty-GHz processor with a gig of RAM, a nice hard drive and the same 100Mb/s connection to the network that the FreeBSD machine beside it has. "It's just slow." Slow how? Doing what? "It's just slow -- all the time."

Eventually we got it down to a working demonstration: log in. The developers've got a fairly intricate set of .cshrc files, so they echo some progress reports: Setting FOO...setting BAR...setting LD_LIBRARY_PATH...only it's taking for-freaking-ever -- well, relatively speaking: 8 seconds sez /usr/bin/time. cf. ~2 s. on the (if anything, slower) FreeBSD machine right beside it. WTF?

At first I started looking at the rc scripts. By deking out various bits, I could see where the 8 seconds was coming from -- half a second there, two and a half seconds there...but then I came to my senses and realized that looping over half a dozen items should not be causing this kind of nonsense.

I checked DMA on the hard drives. Aha, it's off! But all the home directory access is over NFS, so it's probably not much of an issue. And hdparm -d1 /dev/hda just came back with permission denied (even as root...I seem to remember something about newer Intel chipsets having DMA built in), so I left that out.

Out of desperation more than anything else, I tried running strace /bin/csh /bin/echo foo -- and hot damn if we're not trying to open 209 different paths to find libncurses! Holy hell!
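If you'd rather tally the misses than count strace lines by eye, here's a rough sketch in Python. It assumes strace's usual line shape, open("path", flags) = -1 ENOENT (...), and only loosely handles the openat() form that newer systems log. The function name count_missed_opens and the sample lines (including the /lib path) are made up for illustration.

```python
import re

# Matches lines such as:
#   open("/some/dir/libfoo.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
# The optional AT_FDCWD covers openat(), which newer strace output tends to show.
OPEN_RE = re.compile(
    r'^(?:open|openat)\((?:AT_FDCWD, )?"(?P<path>[^"]*)"[^)]*\)\s*=\s*'
    r'(?P<ret>-?\d+)(?:\s+(?P<errno>[A-Z]+))?'
)

def count_missed_opens(lines, needle):
    """Count open attempts on paths containing `needle` that failed with ENOENT."""
    missed = 0
    for line in lines:
        match = OPEN_RE.match(line.strip())
        if not match:
            continue  # not an open()/openat() line, or a shape this sketch doesn't know
        if needle in match.group("path") and match.group("errno") == "ENOENT":
            missed += 1
    return missed

# Illustrative sample only: one miss, one hit.
sample = [
    'open("/home/foo/lib/bling/tls/libncurses.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'open("/lib/libncurses.so.5", O_RDONLY) = 3',
]
print(count_missed_opens(sample, "libncurses"))
```

To use it on a real trace, write the strace output to a file and pass the open file object as lines; the function only needs something it can iterate over line by line.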

And what is this happy crappy? It's checking out a crapload of directories we haven't even told it about. For example, check out what it does for this one element in LD_LIBRARY_PATH, /home/foo/lib/bling:

open("/home/foo/lib/bling/tls/i686/mmx/cmov/libncurses.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/foo/lib/bling/tls/i686/mmx/libncurses.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/foo/lib/bling/tls/i686/libncurses.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/foo/lib/bling/tls/libncurses.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/foo/lib/bling", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/foo/lib/bling/i686/mmx/cmov/libncurses.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/foo/lib/bling/i686/mmx/libncurses.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/foo/lib/bling/i686/libncurses.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/foo/lib/bling/mmx/cmov/libncurses.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/foo/lib/bling/mmx/libncurses.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/foo/lib/bling/cmov/libncurses.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)

That's right, folks, the dynamic loader (ld.so) has taken it upon itself to do some kinda combinatorial search for hardware-optimized libraries, and as a result a measly thirteen entries in LD_LIBRARY_PATH turn into two hundred and nine lookups. Add to that the aggravation of all these entries being in people's home directories, which are NFS-mounted from elsewhere, and it's not too hard to see why the hell it's slow.
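To see how the numbers blow up, here's a minimal sketch in Python of that kind of expansion. It assumes four capability names (tls, i686, cmov, plus mmx for the MMX directory) and that every subset of them gets tried under each search directory. The function names are hypothetical, and the real loader's list and ordering depend on the CPU and the glibc build, so treat this as an illustration of the arithmetic, not of ld.so itself.

```python
from itertools import combinations

# Assumed capability names; the real list depends on the CPU and glibc build.
HWCAPS = ["tls", "i686", "mmx", "cmov"]

def candidate_dirs(base, caps=HWCAPS):
    """Return `base` extended with every subset of `caps`, longest subsets first."""
    dirs = []
    for size in range(len(caps), -1, -1):
        for subset in combinations(caps, size):
            dirs.append("/".join([base, *subset]))
    return dirs

def candidate_paths(search_path, soname, caps=HWCAPS):
    """Expand a colon-separated search path into every file the sketch would try."""
    paths = []
    for base in search_path.split(":"):
        if not base:
            continue  # skip empty elements such as a trailing colon
        for directory in candidate_dirs(base, caps):
            paths.append(directory + "/" + soname)
    return paths

# Thirteen made-up directories standing in for the real LD_LIBRARY_PATH.
fake_path = ":".join("/home/foo/lib/dir%d" % i for i in range(13))
print(len(candidate_dirs("/home/foo/lib/bling")))
print(len(candidate_paths(fake_path, "libncurses.so.5")))
```

Four capabilities give 2^4 = 16 candidate directories per entry, so thirteen entries come to 208 lookups. That's one short of the 209 in the trace, and this sketch doesn't try to explain where the extra one comes from.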

To be fair, this is meant to do something good: you can have libraries compiled for different hardware capabilities (HWCAP is the term to search for), which I imagine would be handy if you want one disk image for a bunch of different hardware. The trouble, of course, is that you get into these ridiculously long lists of directories that might exist...at least, if you're not using Alpha CPUs.

Fortunately, the folks at Debian have anticipated my whining and have done something about it.

Unfortunately, it's SOOPERSEKRIT.

Fortunately, I dug it out of a very cranky email to debian-devel:

# touch /etc/ld.so.nohwcap

With that in place, the formerly-plodding test that took 8 seconds to finish now runs in 1.5 seconds. And that is one hell of a performance gain from just touching one file.
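If you want to check the before-and-after numbers on your own box without squinting at /usr/bin/time, something like this rough Python sketch would do. The helper name time_command is made up, and the command it runs here is only a stand-in so the sketch works anywhere; swap in whatever reproduces your slow login.

```python
import subprocess
import sys
import time

def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return timings

# Stand-in command so the sketch runs anywhere; swap in something like
# ["/bin/csh", "-c", "/bin/echo foo"] to exercise the shell startup itself.
timings = time_command([sys.executable, "-c", "pass"], runs=3)
print("best of %d: %.3f s" % (len(timings), min(timings)))
```

Run it once before touching the file and once after. Expect the first run of each batch to be the slowest, since later runs probably benefit from caching.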