title: That's weird
date: 2004-08-13 20:44:23

So it was a busy day at work: I had to do some juggling with home directories on our file server for Windows people, and set up a new Linux server for people running User-Mode Linux.

Which, BTW, rocks...but be sure to read this link. I came across this problem today (freezing at "Initializing stdio console driver"), but managed to get around it by installing a new version of uml-utilities. Admittedly, I'm only trying the 2.4 series. But that didn't mean I wasn't able to find a weird thing...

So most of our workstations run FreeBSD. Our main NFS server runs FreeBSD. But we've got a couple of workstations running Red Hat Linux, and this problem was on one of them. It was very weird: every time the user on that machine ran ls in a particular NFS-mounted directory, ls would segfault and dump core. It was just this particular directory. And after some investigation, it turned out to depend on being this particular user.

I tried going to that directory. I could run ls just fine. I was running bash, so I tried running /bin/csh (most of the developers here run csh...poor bastards)...everything worked. I tried getting him to run bash; if he ran it in the problematic directory it dumped core, and if I got him to cd somewhere else and then come back and run ls it dumped core. If I su'd to root and then to him, it dumped core. I tried, as him, going into another directory, very nearby in the tree with the same number of characters in the path. It was fine.

I got desperate and got him to try rebooting his machine. It still dumped core. WTF?

I'm curious enough at this point that I'm seriously considering digging up the source for ls and compiling a debug version, then running it under GDB. This is all far enough beyond my experience that it's ridiculous. Still, I have to know or it's gonna kill me.