The Life of a Sysadmin

Carousel is a lie!

There it was, gone
Fri Oct 30 12:41:27 PDT 2009

Following in Matt's footsteps, I ran into a serious problem just before heading to LISA.

Wednesday afternoon, I'm showing my (sort of) backup how to connect to the console server. Since we're already on the firewall, I get him to SSH to it from there and show him how to connect to a serial port, and we move on.

About an hour later, I get paged about problems with the database server: SSH and SNMP aren't responding. I try to log in, and sure enough it hangs. I connect to its console and log in as root; it works instantly. Uh-oh, I smell LDAP problems... only there's nothing in the logs, and id <uid> works fine. I flip to another terminal and try SSHing to another machine, and that doesn't work either. But already-existing sessions work fine until I try to run sudo or do ls -l. So yeah, that's LDAP.
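If you want to confirm that kind of hang without tying up your terminal, one option is to wrap a name-service lookup in a timeout. This is only a sketch, not what I ran at the time: check_lookup is a made-up helper name, and it assumes a machine with getent and GNU coreutils timeout installed.

```shell
# Hypothetical helper: run a passwd lookup through the name service
# switch (LDAP included, if nsswitch is configured that way) and give
# up after a few seconds instead of hanging forever.
check_lookup() {
    # LOOKUP_TIMEOUT is an assumed knob for this sketch; default 5 seconds.
    timeout "${LOOKUP_TIMEOUT:-5}" getent passwd "$1" >/dev/null
    rc=$?
    case "$rc" in
        0)   echo "$1: lookup ok" ;;
        124) echo "$1: lookup timed out" ;;          # GNU timeout's "gave up" status
        *)   echo "$1: lookup failed (exit $rc)" ;;  # e.g. no such user
    esac
    return "$rc"
}
```

Something like check_lookup root should come back right away on a healthy box; a "timed out" line for a directory-backed account is a hint that the directory, or the path to it, is unhappy.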

I try connecting via openssl to the LDAP server (stick alias telnets='openssl s_client -connect' in your .bashrc today!) and get this:

CONNECTED(00000003)

...and that's all. Wha? I tried connecting to it from the other LDAP server and got the usual (certificate, certificate chain, cipher, driver's license, note from mom, etc.). Now that's just weird.
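For anyone adopting the alias above, here is a function version that gives up on its own when the handshake stalls after CONNECTED. Treat it as a sketch under assumptions: it relies on GNU coreutils timeout being available, TELNETS_TIMEOUT is a variable invented for this example, and ldap.example.com:636 in the comment is a placeholder, not a real host from this story.

```shell
# Function form of the telnets alias, so it also works inside scripts.
telnets() {
    # $1 is host:port, e.g. ldap.example.com:636 (placeholder host).
    # </dev/null lets s_client exit once the handshake completes rather
    # than waiting for input; timeout stops it if the handshake never
    # gets past CONNECTED.
    timeout "${TELNETS_TIMEOUT:-10}" openssl s_client -connect "$1" </dev/null
}
```

With GNU timeout, an exit status of 124 means the command was cut off, which here would match the "CONNECTED and then nothing" symptom.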

After a long and fruitless hour trying to figure out if the LDAP server had suddenly decided that SSL was for suckers and chumps, I finally thought to run tcpdump on the client, the LDAP server and the firewall (which sits between the two). And there it was, plain as day:

Near as I can figure, this was the sequence of events:

This took me two hours to figure out, and another 90 minutes to fix; setting the link speed manually on the firewall just convinced the NIC/driver/kernel that there was no carrier there. In the end, the combination that worked was telling the switch it was a gigabit port, but letting it negotiate duplexiciousnessity.

Gah. Just gah.

Tags: jumboframes, lisa, networking, openbsd, warstory.

Comments On This Entry

Yikes! Good catch, though. It is a bummer on the timing, but at least it happened now, instead of while you were gone. I don't even know how you'd troubleshoot that remotely.

Look me up at LISA. Are you coming to the "Blogger" BoF?

Oh aye. But are you coming to the Conference Organization BoF?

Oh, I didn't see it! I was looking at the Network Automation BoF. Can't I just clone myself? :-)

Maybe I'll switch off. I take it you're going to the Conference Organization BoF?

I'm organizing the CO BoF. :-)