Some time between mid-December and January at $WORK, we noticed that FTP transfers from the NIH NCBI were nearly always failing; maybe one attempt in 15 or so would work. I got it down to this test case:
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE34nnn/GSE34777/matrix/GSE34777_series_matrix.txt.gz
The failed downloads would not fail right away; instead they hung at the point where the data connection from the remote end should have started transferring the file to us. Twiddling passive mode did nothing. If I tried HTTP instead of FTP, I'd get about 16k and then the transfer would hang.
That hostname, ftp.ncbi.nlm.nih.gov, resolves to 4-6 different IP addresses with a TTL of 30 seconds. I found that FTP transfers from these IP addresses failed:
but this one worked:
The A record you get doesn't seem to follow any pattern, so I presume the name is being handled by a load balancer rather than simple round-robin DNS. The working address didn't come up very often, which I think accounts for the low success rate.
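If you want to watch the rotation yourself, a quick loop with dig shows which addresses come up; the 20 iterations and 30-second sleep are just my way of sampling, adjust to taste:
for i in $(seq 1 20); do dig +short ftp.ncbi.nlm.nih.gov A; echo ---; sleep 30; done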
At first I thought this might indicate network problems here at $WORK, but the folks I contacted insisted nothing had changed: we're not behind any additional firewalls, and all our packets take the same route to both sets of addresses. So I checked our firewall and couldn't find anything there -- no blocked packets, and to the best of my knowledge no changed settings. Weirdly, running the wget command on the firewall itself (which runs OpenBSD rather than CentOS Linux like our servers) worked...that was an interesting rabbit hole. But even when I deked the firewall out entirely and put a server outside it, the transfer still failed.
Then I tripped over the fix: lowering the MTU from our usual 9000 bytes to 8500 bytes made the transfers work. (Yes, 8500 and no more: 8501 fails, 8500 or below works.) And what has an MTU of 8500 bytes? Cisco Firewall Services Modules (FWSMs), which are in use here at $WORK -- though not (I thought) on our network. I contacted the network folks again; they double-checked and said no, we're not suddenly behind an FWSM. And in fact, their MTU is 8500 nearly everywhere...which probably didn't happen overnight.
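For anyone who wants to poke at this kind of problem themselves, here's roughly the sort of test I mean. The interface name is an assumption, and ICMP may well be filtered along the path, so treat the ping probe as a sketch rather than gospel:
# Probe the path with the Don't Fragment bit set: 8472 bytes of payload + 8 (ICMP) + 20 (IP) = 8500
ping -c 3 -M do -s 8472 ftp.ncbi.nlm.nih.gov
# One byte more makes an 8501-byte packet; this is the one that should fail
ping -c 3 -M do -s 8473 ftp.ncbi.nlm.nih.gov
# Or temporarily drop the interface MTU (eth0 is a guess)
ip link set dev eth0 mtu 8500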
Changing the MTU here was an imposing thought; I'd have to change it everywhere, at once, and test with reboots...Bleah. Instead, I decided to try TCP MSS clamping with this iptables rule:
iptables -A OUTPUT -p tcp -d 130.14.250.0/24 --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 8460
(8460 is the 8500-byte MTU minus 40 bytes of IP and TCP headers; again, 8460 or below works, while 8461 or above fails.) It's a hack, but it works. I'm going to contact the NCBI folks and ask if anything's changed at their end.
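If you want to confirm the clamping is actually taking effect, watching the outgoing SYNs with tcpdump and checking the advertised MSS option should do it; eth0 is again a guess at the interface name:
tcpdump -ni eth0 'dst net 130.14.250.0/24 and tcp[tcpflags] & tcp-syn != 0'
Each SYN line should show mss 8460 in the TCP options.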
Xmas vacation is when I get to do big, disruptive maintenance with a fairly free hand. Here's some of what I did and what I learned this year.
I made the mistake of rebooting one machine first: the one that held the local CentOS mirror. I did this thinking that it would be a good guinea pig, but then other machines weren't able to fetch updates from it; I had to edit their repo files. Worse, there was no remote console on it, and no time (I thought) to take a look.
Last year I tried getting machines to upgrade using Cfengine like so:
shellcommands:
    centos.some_group_of_servers.Hr14.Day29.December.Yr2009::
        "/usr/bin/yum -q -y clean all"
        "/usr/bin/yum -q -y upgrade"
        "/usr/bin/reboot"
This didn't work well. I hadn't pushed out the changes in advance, because I was paranoid that I'd miss something; when I did push them out, all the machines hit the cfserver at (more or less) the same time and didn't get the updated files, because the server was refusing connections. I ended up doing the upgrades by hand.
This year I pushed out the changes in advance, but it still didn't work because of the problems with the repo mirror. I ended up running cssh, editing the repo files and updating by hand.
This worked okay, but I had to do the machines in separate batches -- some needed to have their firewall tweaked to let them reach a mirror in the first place, some I wanted to watch more carefully, and so on. That meant going through a list of machines, trying to figure out if I'd missed any, adding them by hand to cssh sessions, and so on.
I may need to give in and look at RHEL, or perhaps func or better Cfengine tweaking will do the job.
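One bit of Cfengine tweaking that should at least help with the thundering-herd part is SplayTime, which makes each client wait a pseudo-random chunk of the window before talking to the server instead of everyone piling on at once. A minimal sketch for cfagent.conf -- the ten minutes is an arbitrary number of mine:
control:
    SplayTime = ( 10 )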
Quick and dirty way to make sure you don't overload your PDUs:
sleep $(expr $RANDOM / 200) && reboot    # $RANDOM tops out at 32767, so that's 0-163 seconds of jitter
Rebooting one server took a long time because the ILOM was not working well, and had to be rebooted itself.
Upgrading the database servers with the 3 TB arrays took a long time: the stock MySQL packages conflicted with the official MySQL RPMs, and fscking the arrays takes maybe an hour -- with no sign of life on the console while it runs. Problems with one machine's ILOM meant I couldn't even get a console for it.
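Next time I'll probably run the check by hand ahead of the reboot, if only to get a progress bar out of it; e2fsck's -C flag does that for ext3 (the device name here is made up):
e2fsck -C0 -f /dev/sdb1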
Holy mother of god, what an awful time this was. I spent eight hours on upgrades for just nine desktop machines. Sadly, most of it was my fault, or at least bad configuration:
Graphics drivers: awful. Four different versions, and I'd used the local install scripts rather than creating an RPM and installing that. (Though to be fair, that would just rebuild the driver from scratch when it was installed, rather than do something sane like build a set of modules for a particular kernel.) And I didn't figure out where the uninstall script was 'til 7pm, meaning lots of fun trying to figure out why the hell one machine wouldn't start X.
Lesson: This really needs to be automated.
Lesson: The ATI uninstall script is at /usr/share/ati/fglrx-uninstall.sh. Use it.
Lesson: Next time, uninstall the driver and build a goddamn RPM.
Lesson: A better way of managing xorg.conf would be nice.
Lesson: Look for prefetch options for zypper. And start a local mirror.
Lesson: Pick a working version of the driver, and commit that fucker to Subversion.
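For what it's worth, I believe the ATI installer can build a distribution package itself rather than splattering files all over the filesystem -- the --listpkg/--buildpkg options, if memory serves -- and the result is something you can actually put under version control. The installer filename, package target and RPM name below are all guesses:
sh ati-driver-installer-10-2-x86.x86_64.run --listpkg
sh ati-driver-installer-10-2-x86.x86_64.run --buildpkg SuSE/SUSE113-AMD64
svn add fglrx*.rpm && svn commit -m "fglrx driver build that actually works"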
These machines run some scientific software: one master, three slaves. When the master starts up at boot time, it tries to SSH to the slaves to copy over the binary. There appears to be no, or poor, rate throttling; if the slaves are not available when the master comes up, you end up with the following symptoms:
The problem is that umpty scp processes on the slave are holding open the binary, and the kernel gets confused trying to run it.
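My guess is this is the classic "Text file busy" (ETXTBSY) situation: you can't exec a binary that something else still has open for writing. Something like this on a slave will show the culprits -- the path is made up:
fuser -v /opt/science/bin/worker    # list processes holding the binary open
lsof /opt/science/bin/worker        # same information, more detail
fuser -k /opt/science/bin/worker    # if they're all stray scps, kill them off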
I also ran into problems with a duff cable on the master; confusingly, both the kernel and the switch said it was still up. This took a while to track down.
It turned out that a couple of my KVM-based VMs did not have jumbo frames turned on. I had to use virt-manager to shut down the machines, switch their network interfaces over to virtio, then boot them again. However, kudzu on the VMs then saw these as new interfaces and did not configure them correctly. This caused problems because the machines are LDAP clients, and they hang when the network is unavailable.
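For reference, the guest-side jumbo-frame setting is just an MTU line in the usual CentOS ifcfg file; a minimal sketch, with placeholder addresses and assuming the interface comes up as eth0:
# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.0.2.10
NETMASK=255.255.255.0
MTU=9000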
Arghh...I just spent 24 hours trying to figure out why shadow migration was causing our new 7310 to hang. The answer? Because jumbo frames were not enabled on the switch the 7310 was on, and they were on the machine we're migrating from. Arghh, I say!
Following in Matt's footsteps, I ran into a serious problem just before heading to LISA.
Wednesday afternoon, I'm showing my (sort of) backup how to connect to the console server. Since we're already on the firewall, I get him to SSH to it from there, I show him how to connect to a serial port, and we move on.
About an hour later, I get paged about problems with the database server: SSH and SNMP aren't responding. I try to log in, and sure enough it hangs. I connect to its console and log in as root; it works instantly. Uhoh, I smell LDAP problems...only there's nothing in the logs, and id <uid> works fine. I flip to another terminal and try SSHing to another machine, and that doesn't work either. But already-existing sessions work fine until I try to run sudo or do ls -l. So yeah, that's LDAP.
I try connecting via openssl to the LDAP server (stick alias telnets='openssl s_client -connect' in your .bashrc today!) and get this:
CONNECTED(00000003)
...and that's all. Wha? I tried connecting to it from the other LDAP server and got the usual (certificate, certificate chain, cipher, driver's license, note from mom, etc). Now that's just weird.
After a long and fruitless hour trying to figure out if the LDAP server had suddenly decided that SSL was for suckers and chumps, I finally thought to run tcpdump on the client, the LDAP server and the firewall (which sits between the two). And there it was, plain as day:
Near as I can figure, this was the sequence of events:
This took me two hours to figure out, and another 90 minutes to fix; setting the link speed manually on the firewall just convinced the NIC/driver/kernel that there was no carrier at all. In the end, the combination that worked was telling the switch it was a gigabit port, but letting it negotiate duplexiciousnessity.
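On a Cisco IOS switch -- and I'm only assuming that's what this port hangs off -- that combination looks something like the following; the port name is made up:
interface GigabitEthernet0/24
 speed 1000
 duplex auto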
Gah. Just gah.