Electric Version
07 Dec 2006
Sound of tires, sound of God...
"Electric Version", The New Pornographers.
Thursday morning came far too early. My roommate offered some of his 800mg ibuprofens, and I accepted. First thing I attended was the presentation "Drowning in the Data Tsunami" by Lee Damon and Evan Marcus. It was interesting, but seemed to be mostly about US data regulations (HIPAA/SOX et al.) and wasn't really relevant to me. I had been expecting more of an outline of, say, how in God's name we're going to preserve information for, say, a hundred years (heroic efforts of the Internet Archive notwithstanding). There was mention of an interesting approach to simply not accumulating cruft as you upgrade storage (because it's easier than sorting through to see what can be discarded; "Why bother weeding out 200MB when the new disk is 800GB?"): a paper by Radia Perlman (sp?) (she of OSPF fame) that proposes an encrypted data storage system (called The Ephemerizer) combined with key escrow that, to expire data, simply deletes the key when the time is up. Still, I moved on before too long.
...Which was good, because I sat in on Alva Couch's presentation on his and Mark Burgess' paper, "Modelling Next-Generation Configuration Management Tools". Some very, very confusing stuff about aspects, promises and closures -- confusing because the bastard didn't preface his talk with "This is what Hugh from Vancouver will need to know to understand this." (May be in the published paper; will check later.) Here's what I could gather:
- System administration could be described as the Pinky and the Brain problem: "What are we going to do tonight, Pinky?" "Same thing we do every night, Brain: try to take over the world!"
- IOW, the problem is too big -- and in the meantime you have all these competing theories (aspects from Luke/Puppet (I think), promise theory from Burgess (which I had heard about) and closures from the bcfg2 people) that need to be integrated, but currently aren't.
- Many tools model/modify configuration, not behaviour -- and implicit in there is the (unproven?) assumption that correct behaviour emerges from correct configuration as if by magic. There is no understanding in cfengine of outside forces.
- A promise, in sysadmin terms, is a promise to do something. For example, an NFS server promises to make certain files available over the network. A client mounting a disk from the server promises to access some of those files.
- Closure is the whole of the problem: in the case of the NFS server, it's DNS plus routing plus mountd running plus nfsd running plus proper ACLs (which, I only found out at this conference, nearly everyone pronounces "ackles" rather than "ay see ells").
- His model: closures encompass promises encompass aspects. By dividing up the problem this way, you no longer have to take over the whole world.
- His model accounts for site policy by designating it a soft aspect.
I will do the right thing and read his paper, and I may update this later; these are just my notes and impressions, and aren't gospel. Couch is an incredibly enthusiastic speaker, and even though I didn't understand a lot of it I ended up excited anyway. :-) He gave another talk later in the week that Ricky went to, about how system administration will have to become more automatic; as a result, we'd all better learn how to think high-level and to be better communicators, because more and more of our stuff will be management -- and not just in the sense of managing computers. I'm going to seek out more of his stuff and see if it'll fit in my head.
After the break was a talk on "QA and the System Administrator", presented by a Google sysadmin. I went because it was Google, and frankly it wasn't that interesting. One thing that did jump out at me was when he described a Windows tool called Eggplant, a QA/validation tool. It has OCR built-in to recognize a menu, no matter where it is on the screen. This astounded me; when you start needing OCR to script things, that's broken. I don't doubt that it's a good tool, and I can think of lots of ways that would come in handy. But come on. I mean, a system that requires that is just so ugly.
I went out to lunch with Jay, a sysadmin from a shop that's just got permission from the boss to BSD a unit-testing program they've come up with for OpenBSD firewalls: it uses QEMU instances to fully test a firewall with production IP addresses, making sure that you're blocking and allowing everything you want. It sounds incredibly cool, and he's promised to send me a copy when he gets back. I can't wait to have a look at it.
After that was the meet-the-author session. I got to thank Tom Limoncelli for "Time Management for System Administrators", and got an autograph sticker from him and Strata Rose Chalup, his co-author for Ed 2. Sadly, I didn't get a chance to thank Tobias Oetiker (who I nearly ran into at lunch the day before).
Next up was the talk from Tom Limoncelli and Adam Moskowitz (Adam's looking for a job! Somebody hire him!) about how to get your paper accepted at LISA. Probably basic stuff if you've written a paper before, but I haven't so it was good to know. Things like how to write a good abstract, what kind of paper is good for LISA, and how you shouldn't say things like "...and if our paper is accepted, we'll start work right away on the solution." Jay asked whether a paper on the pf testing tool would be good, and they both nodded enthusiastically.
- When talking about papers that go over the same subject, a paper from a previous LISA was mentioned that surveyed 8 years of papers on data storage and found identifiable cycles from "Oh no, we've got more data than disks!" to "Oh no, we've got more data than tape!" (This made me feel better about skipping out on the 9am talk.)
- Apparently, Sun reimplemented cat(1) and improved performance 10x.
Quotes from the talk:
- Tom: "You're not supposed to publish your paper on your website until it's published at LISA. And if you're cool, you'll do that with a cron job."
- From an audience member at another conference presentation: "At any point, did you step back and look at your work? And if so, were you sufficiently disgusted?"
- Tom again, on how audience criticism is a good thing: "Every theory paper needs someone to go up to the mic and say, 'Okay, Buck Rogers, but I live in reality.'"
At this point I started getting fairly depressed. Part of it was just being tired, but I kept thinking that not only could I not think of something to write a paper about, I could not think of how I'd get to find something to write about. I wandered over to the next talk feeling rather sad and lost.
The next talk was from Andy Seely on being a sysadmin in US Armed Forces Command and Control. Jessica was there, and we chatted a bit about how this talk conflicted with Tom Limoncelli's Time Management Guru session, and maybe ducking over to see that. Then Andy came over and asked Jessica to snap some pictures, so she ended up staying. I was prepared to give it five minutes before deciding whether or not to leave.
Well, brother, let me tell you: Andy Seely is one of the best goddamned speakers on the planet. He was funny, engaging, and I could no more leave the room than I could get my jaw to undrop. Not only that, his talk was fascinating, and not just because he's a sysadmin for the US Armed Forces while simultaneously having a ponytail, earrings and tattoos. You can read the article in ;login: (FIXME: Add link) that it was based on, but he expanded on it considerably. Let me see what I can recall:
- One slide, a computer display of a map of the Middle East with lots of dots: "This is a map of people dying." This is what a screw-up or a service outage means in his job: people across the planet die.
- "There are databases where you can search on anyone in Afghanistan named Mohammed. It's an entertaining database optimization problem, let me tell you."
- On deadlines: "The more you work with government, the more you find dates...well, they're filled with humour."
- "We've got headquarters with systems everywhere -- no surprise, where haven't we invaded yet?" (laughter) I yell out "Canada!" "We're thinking about it. But we're looking for some place that'll fight back." (more laughter) "I'm sorry, that came out wrong. But it was funny." This made it to IRC, which prompted Ricky and others to ditch what they were doing and come over to this talk. (I met Andy later on and he apologized profusely, saying that he meant Canada was an ally, so why would the US invade them in the first place? We had a duel, my shot grazed his shoulder, and Canada's honour was regained.)
- Having to support an app where there's strong debate over whether it's written in C, Ada or Java, or whether it uses UDP or TCP.
- Being told that an app that keeps failing is single-threaded, so throwing more CPUs at it won't do anything; it's RAM that it needs. Later investigation confirms that, in fact, it's multi-threaded and needs more CPUs, not RAM...which the vendor eventually confirms.
- He can't install a compiler, or a debugger, or anything that doesn't come with a default install of Solaris 8, or 7, or 2.x. That would be a huge security offence.
- A Sun E4000 mainboard blows up in the Middle East. Getting one through regular channels would take too long, so where do you go? That's right: eBay. He's a contractor, so he has no budget...but he does have a government credit card with a $2500 limit. So he calls up the guy selling it and cuts a deal to buy the thing for $2500 (shipping was billed separately). Put it on a C-130, and off she goes.
- Not being allowed to write a program...but he is allowed to string shell commands together...and sometimes those commands get written down in a file for reference purposes. If he's lucky, Perl's on the machine as well.
Longer story: Because of the nature of his work, he's got boxes that he has to keep working when he knows next to nothing about what they're meant to do. Case in point: a new Sun box arrives ("and it's literally painted black!"), but the person responsible for it wants to send it back because it doesn't work -- which means that when they click the icon to start the app it's meant to run, it doesn't launch and there's no visible sign that it's running. There's no documentation. And yet he's obligated to support this application. What do you do?
Even tracking down the path to the program launched by the icon is a challenge, but he does, tracks down the nested shell scripts and finally finds the jar that is the app ("Aha! It is Java!"). He finds log files which are verbose but useless. He contacts the company that wrote it, and is told he needs a support contract...which the government, when putting together the contract for the thing, did not think to include. So he calls back an hour later, talks to the help desk and tells them he's lost the number -- "Can you help a brother out?" They do, but they're stumped as well, and say they've never seen anything like this.
Time to pull out truss, which produces a huge amount of output. Somewhere in the middle of all that he notices a failing hard read of a file in /bin: it was trying to read 6 bytes and failing. Turns out the damned thing was trying to keep state in /bin, and failing because the file was zero bytes long. He removed the file, and suddenly the app works.
Andy also talked about trying to get a multi-gigabyte dump file from Florida to Qatar. Physical transport was not an option, because arranging it would take too long. So he tries FTPing the file -- which works until he goes home for the day, at which point the network connection goes down and he loses a day. So he writes a Perl script that divides the file into 300MB chunks, then sends those one at a time. It works!
At this point, someone yells out "What about split?" Andy says, "What?" He hadn't known about it. There was a lot of good-natured laughter. He asked, "Is there an unsplit?" "Cat!" came the response from all over the room. He smacked his forehead and laughed. "This is why I come to LISA," he said. "At my job, I've been there 10 years. People come to me 'cos I'm the smart one. Here, I'm the dumb one. I love that."
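For the curious, here's roughly what the chunk-and-reassemble trick looks like. This is my own sketch in Python, not Andy's Perl script (which I never saw); the function names and the .000/.001 suffix scheme are made up for illustration, and the only number taken from the talk is the 300MB chunk size.

```python
import shutil

CHUNK_SIZE = 300 * 1024 * 1024  # 300MB, the chunk size mentioned in the talk


def split_file(path, chunk_size=CHUNK_SIZE):
    """Write path.000, path.001, ... and return the chunk names in order.

    Hypothetical helper: the naming scheme is an assumption, and each
    chunk is read into memory whole, which is fine for a sketch.
    """
    chunks = []
    index = 0
    with open(path, "rb") as src:
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            name = f"{path}.{index:03d}"
            with open(name, "wb") as out:
                out.write(data)
            chunks.append(name)
            index += 1
    return chunks


def join_files(chunks, dest):
    """The 'unsplit': concatenate the chunks in order, the way cat would."""
    with open(dest, "wb") as out:
        for name in chunks:
            with open(name, "rb") as part:
                shutil.copyfileobj(part, out)
```

Sending the pieces one at a time means a dropped connection only costs you the chunk in flight, not the whole day. And of course split -b 300m on one end and cat on the other do the same job with no code at all, which was the room's point.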
There are two things I would like to say at this point.
First off, Andy is at least the tenth coolest person on the entire Eastern seaboard. No, he didn't know about cat -- but not only did he reimplement it in Perl rather than give up, he didn't even flinch when being told about it in the middle of giving a talk at LISA. I would probably have self-combusted from embarrassment ("foomp!"), and I would have felt awful. Andy's attitude? "I learned something." That's incredibly strong. (Although he told a story later about being in the elevator with some Google people. They recognized him and said, "Hey, it's the 'man cat' guy!")
Second, when he said, "Here, I'm the dumb one. I love that" I sat up straight and thought, "Holy shit, he's right." Here I am at LISA for the first time ever. I've met people who can help me, and people I can help. I've made a crapload of new friends and have learned more in one week than I would've thought possible. And I'm worried 'cos it might be a few years before I can think about presenting a paper? That's messed up. I tend to set unreasonably high goals for myself and then get depressed when I can't reach them. Andy's statement made me feel a whole lot better.
During Q & A I asked what he did for peer support, since his ability to (say) post to a mailing list asking for help must be pretty restricted. He said that he's started a wiki for internal use and it's getting used...but both the culture and the job function mean that it's slow going. He's also started a conference for fellow sysadmins: 100 or so this year, and he's hoping for more next year.
In conclusion: if you ever get the chance to go see him, do so. And then buy him a beer.