Anyone who's anyone
11 Nov 2010

I missed my chance, but I think I'm gonna get another...
-- Sloan
Thursday morning brought Brendan Gregg's (formerly of Sun, then Oracle, and now Joyent) talk about data visualization. He introduced himself as the shouting guy, and talked about how heat maps allowed him to see what that video demonstrated in a much more intuitive way. But in turn, these require accurate measurement and quantification of performance: not just "I/O sucks" but "the whole op takes 10 ms, 1 of which is CPU and 9 of which is latency."
Some assumptions to avoid when dealing with metrics:
The available metrics are correctly implemented. Are you sure there's not a kernel bug in how something is measured? He's come across them.
The available metrics are designed by performance experts. Mostly they were written by kernel developers who were trying to debug their own work, and then found that their tool shipped.
The available metrics are complete. Unless you're using DTrace, you simply won't always find what you're looking for.
He's not a big fan of using IOPS to measure performance. There are a lot of questions when you start talking about IOPS. Like what layer?
- app
- library
- sync call
- VFS
- filesystem
- RAID
- device
(He didn't add political and financial, but I think that would have been funny.)
Once you've got a number, what's good or bad? The number can change radically depending on things like library/filesystem prefetching or readahead (IOPS inflation), read caching or write cancellation (deflation), the size of a read (he had an example demonstrating how measured capacity/busy-ness changes depending on the size of reads)...probably your company's stock price, too. And iostat or your local equivalent averages things, which means you lose outliers...and those outliers are what slow you down.
IOPS and bandwidth are good for capacity planning, but latency is a much better measure of performance.
And what's the best way of measuring latency? That's right, heatmaps. Coming from someone who worked on Fishworks, that's not surprising, but he made a good case. It was interesting to see how it's as much art as science...and given that he's exploiting the visual cortex to make things clear that never were, that's true in a few different ways.
This part of the presentation was so visual that it's best for you to go view the recording (and anyway, my notes from that part suck).
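Still, to make the latency-over-IOPS point concrete: here's a minimal DTrace sketch (my own reconstruction, not something from the talk) that times every block I/O and aggregates the elapsed time into a power-of-two histogram -- which is exactly the kind of raw data a latency heat map gets built from:

    #!/usr/sbin/dtrace -s
    /* Remember when each block I/O started, keyed by its buf pointer. */
    io:::start
    {
            start[arg0] = timestamp;
    }

    /* On completion, add the elapsed time to a power-of-two histogram. */
    io:::done
    /start[arg0]/
    {
            @["block I/O latency (ns)"] = quantize(timestamp - start[arg0]);
            start[arg0] = 0;
    }

Run it for a while, hit Ctrl-C, and you get a distribution of latencies instead of a single averaged number -- the outliers that iostat would have smoothed away are sitting right there in the tail.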
During the break, I talked with someone who had worked at Nortel before it imploded. Sign that things were going wrong: new execs come in (RUMs: Redundant Unisys Managers) and all of a sudden everyone is on the chargeback model. Networks charges ops for bandwidth; ops charges networks for storage and monitoring; both are charged by backups for backups, and backups in turn is charged by them for bandwidth, storage and monitoring.
The guy I was talking to figured out a way around this, though. Backups had a penalty clause for non-performance that no one ever took advantage of, but he did: he requested things from backup and proved that the backups were corrupt. It got to the point where the backup department was paying his department every month. What a clusterfuck.
After that, a quick trip to the vendor area to grab stickers for the kids, then back to the presentations.
Next was the 2nd day of Practice and Experience Reports ("Lessons Learned"). First up was the network admin (?) for ARIN about IPv6 migration. This was interesting, particularly as I'd naively assumed that, hey, they're ARIN and would have no problems at all on this front...instead of realizing that they're out in front to take a bullet for ALL of us, man. Yeah. They had problems, they screwed up a couple times, and came out battered but intact. YEAH!
Interesting bits:
Routing is not as reliable, not least because for a long time (and perhaps still) admins were treating IPv6 as an experiment, something still in beta: there were times when whole countries in Europe would disappear off the map for weeks as admins tried out different things.
Understanding ICMPv6 is a must. Naive assumptions brought over from IPv4 firewalls, like "Hey, let's block all ICMP except ping," will break things in wonderfully subtle ways. For example: in IPv6 it's up to the client, not the router, to fragment packets. That means the client needs to discover the path MTU. That depends on ICMPv6.
Not all transit is equal; ask your vendor if they're using the same equipment to route both protocols, or if IPv6 is on the old, crappy stuff they were going to eBay. Ask if they're using tunnels; tunnels aren't bad in themselves, but can add multiple layers to things and make things trickier to debug. (This goes double if you've decided to firewall ICMPv6...)
They're very happy with OpenBSD's pf as an IPv6 firewall (see the pf sketch after this list for the ICMPv6 point above).
Dual-stack OSes make things easy, but can make policy complex. Be aware that DHCPv6 is not fully supported (and yes, you need it to hand out things like DNS and NTP), and some clients (I believe he said XP) would not do DNS lookups over v6 -- only over v4 -- though they'd happily go to v6 servers once they got the DNS records.
IPv6 security features are a double-edged sword: yes, you can set up encrypted VPNs, but so can botnets. Security vendors are behind on this; he's watching for neat tricks that'll allow you to figure out private keys for traffic and thus decrypt e.g. botnet C&C, but it's not there yet. (My notes are fuzzy on this, so I may have it wrong.)
Multicast is an attack-and-discovery protocol, and he's a bit worried about a possible return of reflection attacks (Smurfv6). He's hopeful that the many, many lessons learned since then mean it won't happen, but it is a whole new protocol and set of stacks for baddies to explore and discover. (RFC 4942 was apparently important...)
Proxies are good for v4-only hosts: mod_proxy, squid and 6tunnel have worked well (6tunnel in particular).
Gotchas: reverse DNS can be painful, because v6 macros/$GENERATE statements don't work in BIND yet; and IPv6 takes precedence in most cases, so watch for SSH barfing on unknown host keys when it suddenly starts reaching hosts over v6.
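To make the ICMPv6 point concrete, here's a minimal pf.conf sketch (mine, not from the talk; the interface name is made up) that passes the ICMPv6 types IPv6 actually depends on, rather than blocking everything but ping:

    # Hypothetical external interface.
    ext_if = "em0"

    # Path MTU discovery: in v6 the source host fragments, so "packet too
    # big" (and its friends) has to get through.
    pass in on $ext_if inet6 proto icmp6 icmp6-type { unreach, toobig, timex, paramprob }

    # Neighbor discovery and router advertisement -- v6's replacement for ARP.
    pass in on $ext_if inet6 proto icmp6 icmp6-type { neighbrsol, neighbradv, routersol, routeradv }

    # And plain old ping, if you want it.
    pass in on $ext_if inet6 proto icmp6 icmp6-type { echoreq, echorep }

The exact list is a judgment call, but the point stands: blanket-dropping ICMPv6 quietly breaks path MTU discovery and neighbor discovery, and those are miserable to debug.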
Next up: Internet on the Edge, a good war story about bringing wireless through trees for DARPA that won best PER. Worth watching. (Later on, I happened across the person who presented and his boss in the elevator, and I congratulated him on his presentation. "See?" says his boss, and digs him in the ribs. "He didn't want to present it.")
Finally there was the report from one of the admins who helped set up Blue Gene 6, purchased from IBM. (The speaker was much younger than the others: skinny, pale guy with a black t-shirt that said GET YOUR WAR ON. "If anyone's got questions, I'm into that...") This report was extremely interesting to me, especially since I've got a purchase of a (much, much smaller) cluster coming up.
Blue Gene is a supercomputer with something like 10k nodes, and it uses 10 Gb/s Myrinet (Myricom's interconnect) cards and network for communication. Each node does source routing, so latency is extremely low, throughput correspondingly high, and the core routers correspondingly simple. To make this work, every card needs a map of the network so it knows where to send stuff, and that map needs to be generated by a daemon that then distributes it everywhere. Fine, right? Wrong:
The Myricom switch is admin'd by a web interface only: no CLI of any sort, no logging to syslog, nothing. Using this web interface becomes impractical when you've got thousands of nodes...
There's an inherent fragility in this design: a problem with a card means you need to turn off the whole node; a problem with the mapping daemon means things can get corrupt real quick.
And guess what? They had problems with the cards: a bad batch of transceivers meant that, over the 2-year life of the machine, they lost a full year's worth of computing. It took a long time to realize the problem, a long time to get the vendor to acknowledge it, and longer still to get it fixed. (FIXME: Did he ever get it fixed?)
So, lessons learned:
Vendor relations should not start with a problem. If the first time you call them up is to say "Your stuff is breaking," you're doomed. Dealing with vendor problems calls for social skills first and tech skills second. Get to know more than just the sales team; get familiar with the tech team before you need them.
Know your systems inside and out before they break; part of their problem was not being as familiar with things as they should have been.
Have realistic expectations when someone says "We'll give you such a deal on this equipment!" That's why they went w/Myricom -- it was dirt cheap. They saved money on that, but it would have been better spent on hiring more people. (I realize that doesn't exactly make sense, but that's what's in my notes.)
Don't pay the vendor 'til it works. Do your acceptance testing, but be aware of subcontractor relations. In this case, IBM was providing Blue Gene but had subcontracted Myricom -- and already paid them. Oops, no leverage. (To be fair, he said that Myricom did help once they were convinced...but see the next point.)
Have an agreement in advance with your vendor about how much failure is too much. In their case, the failure rate was slow but steady, and Myricom kept saying "Oh, let's just let it shake out a little longer..." It took a lot of work to get them to agree to replace the cards.
Don't let vendors talk to each other through you. In their case, IBM would tell them something, and they'd have to pass that on to Myricom, and then the process would reverse. There were lots of details to keep track of, and no one had the whole picture. Setting up a weekly phone meeting with the vendors helped immensely.
Don't wait for the vendors to do your work. Don't assume that they'll troubleshoot something for you.
Don't buy stuff with a web-only interface. Make sure you can monitor things. (I'm looking at you, Dell C6500.)
Stay positive at all costs! This was a huge, long-running problem that impaired an expensive and important piece of equipment, and resisting pessimism was important. Celebrate victories locally; give positive feedback to the vendors; keep reminding everyone that you are making progress.
Question from me: How much of this advice depends on being involved in negotiations? Answer: maybe 50%; acceptance testing is a big part of it (and see previous comments about that) but vendor relations is the other part.
I was hoping to talk to the presenter afterward, but it didn't happen; there were a lot of other people who got to him first. :-) But what I heard (and heard again later from Victor) confirmed the low opinion of the Myrinet protocol/cards...man, there's nothing there to inspire confidence.
And after that came the talk by Adam Moskowitz on becoming a senior sysadmin. It was a list of (at times strongly) suggested skills -- hard, squishy, and soft -- that you'll need. Overarching all of it was the importance of knowing the business you're in and the people you're responsible to: why you're doing something ("it supports the business by making X, Y and Z easier" is the correct answer; "it's cool" is not), explaining it to the boss and the boss' boss, respecting the people you work with and not looking down on them because they don't know computers. Worth watching.
That night, Victor, his sister and I drove up to San Francisco to meet Noah and Sarah at the 21st Amendment brewpub. The drive took two hours (four accidents on the way), but it was worth it: good beer, good food, good friends, great time. Sadly I was not able to bring any back; the Noir et Blanc was awesome.
One good story to relate: there was an illustrator at the party who told us about (and showed pictures of) a coin she's designing for a client. They gave her the Three Wolves artwork to put on the coin. Yeah.