Sound of tires, sound of God...
"Electric Version", The New Pornographers.
Thursday morning came far too early. My roommate offered some of
his 800mg ibuprofen tablets, and I accepted. The first thing I attended was the
presentation "Drowning in the Data Tsunami" by Lee Damon and Evan
Marcus. It was interesting, but seemed to be mostly about US data
regulations (HIPAA/SOX et al.) and wasn't really relevant to me. I had
been expecting more of an outline of, say, how in God's name we're
going to preserve information for, say, a hundred years (heroic
efforts of the Internet Archive notwithstanding). There was mention of
an interesting approach to simply not accumulating cruft as you
upgrade storage (because it's easier than sorting through to see what
can be discarded; "Why bother weeding out 200MB when the new disk is
800GB?"): a paper by Radia Perlman (she of spanning tree fame) that
proposes an encrypted data storage system combined with key escrow
that, to expire data, simply deletes the key when the time is
up. Still, I moved on before too long.
...Which was good, because I sat in on Alva Couch's presentation on
his and Mark Burgess' paper, "Modelling Next-Generation Configuration
Management Tools". Some very, very confusing stuff about aspects,
promises and closures -- confusing because the bastard didn't preface
his talk with "This is what Hugh from Vancouver will need to know to
understand this." (May be in the published paper; will check later.)
Here's what I could gather:
- System administration could be described as the Pinky and the Brain problem: "What are we going to do tonight, Brain?" "Same thing we do every night, Pinky: try to take over the world!"
- IOW, the problem is too big -- and in the meantime you have all these competing theories (aspects from Luke/Puppet (I think), promise theory from Burgess (which I had heard about) and closures from the bcfg2 people) that need to be integrated, but currently aren't.
- Many tools model/modify configuration, not behaviour -- and implicit in there is the (unproven?) assumption that correct behaviour emerges from correct configuration as if by magic. There is no understanding in cfengine of outside forces.
- A promise, in sysadmin terms, is a promise to do something. For example, an NFS server promises to make certain files available over the network. A client mounting a disk from the server promises to access some of those files.
- Closure is the whole of the problem: in the case of the NFS server, it's DNS plus routing plus mountd running plus nfsd running plus proper ACLs (which I only found out at this conference that nearly everyone pronounces "ackles" rather than "ay see ells").
- His model: closures encompass promises encompass aspects. By dividing up the problem this way, you no longer have to take over the whole world.
- His model accounts for site policy by designating it a soft aspect.
I will do the right thing and read his paper, and I may update this
later; these are just my notes and impressions, and aren't
gospel. Couch is an incredibly enthusiastic speaker, and even though
I didn't understand a lot of it I ended up excited anyway. :-) He gave
another talk later in the week that Ricky went to, about how system
administration will have to become more automatic; as a result, we'd
all better learn how to think high-level and to be better
communicators, because more and more of our stuff will be management
-- and not just in the sense of managing computers. I'm going to seek
out more of his stuff and see if it'll fit in my head.
After the break was a talk on "QA and the System Administrator",
presented by a Google sysadmin. I went because it was Google, and
frankly it wasn't that interesting. One thing that did jump out at me
was when he described a Windows tool called Eggplant, a QA/validation
tool. It has OCR built in to recognize a menu, no matter where it is
on the screen. This astounded me; when you start needing OCR to script
things, that's broken. I don't doubt that it's a good tool, and I
can think of lots of ways that would come in handy. But come on. I
mean, a system that requires that is just so ugly.
I went out to lunch with Jay, a sysadmin from a shop that's just got
permission from the boss to BSD a unit-testing program they've come up
with for OpenBSD firewalls: it uses QEMU instances to fully test a
firewall with production IP addresses, making sure that you're
blocking and allowing everything you want. It sounds incredibly
cool, and he's promised to send me a copy when he gets back. I can't
wait to have a look at it.
After that was the meet-the-author session. I got to thank Tom
Limoncelli for "Time Management for System Administrators", and got an
autograph sticker from him and Strata Rose Chalup, his co-author for
Ed 2. Sadly, I didn't get a chance to thank Tobias Oetiker (who I
nearly ran into at lunch the day before).
Next up was the talk from Tom Limoncelli and Adam Moskowitz (Adam's
looking for a job! Somebody hire him!) about how to get your paper
accepted at LISA. Probably basic stuff if you've written a paper
before, but I haven't so it was good to know. Things like how to write
a good abstract, what kind of paper is good for LISA, and how you
shouldn't say things like "...and if our paper is accepted, we'll
start work right away on the solution." Jay asked whether a paper on
the pf testing tool would be good, and they both nodded
enthusiastically.
Must Google:
- When talking about papers that go over the same subject, a paper from a previous LISA was mentioned that surveyed 8 years of papers on data storage and found identifiable cycles from "Oh no, we've got more data than disks!" to "Oh no, we've got more data than tape!" (This made me feel better about skipping out on the 9am talk.)
- Apparently, Sun reimplemented cat(1) and improved performance 10x.
Quotes from the talk:
- Tom: "You're not supposed to publish your paper on your website until it's published at LISA. And if you're cool, you'll do that with a cron job."
- From an audience member at another conference presentation: "At any point, did you step back and look at your work? And if so, were you sufficiently disgusted?"
- Tom again, on how audience criticism is a good thing: "Every theory paper needs someone to go up to the mic and say, 'Okay, Buck Rogers, but I live in reality.'"
At this point I started getting fairly depressed. Part of it was just
being tired, but I kept thinking that not only could I not think of
something to write a paper about, I could not think of how I'd get to
find something to write about. I wandered over to the next talk
feeling rather sad and lost.
The next talk was from Andy Seely on being a sysadmin in US Armed
Forces Command and Control. Jessica was there, and we chatted a bit
about how this talk conflicted with Tom Limoncelli's Time Management
Guru session, and maybe ducking over to see that. Then Andy came over
and asked Jessica to snap some pictures, so she ended up staying. I was
prepared to give it five minutes before deciding whether or not to
leave.
Well, brother, let me tell you: Andy Seely is one of the best
goddamned speakers on the planet. He was funny, engaging, and I could
no more leave the room than I could get my jaw to undrop. Not only
that, his talk was fascinating, and not just because he's a sysadmin
for the US Armed Forces while simultaneously having a ponytail,
earrings and tattoos. You can read the article in ;login: (FIXME: Add
link) that it was based on, but he expanded on it considerably. Let me
see what I can recall:
- One slide, a computer display of a map of the Middle East with lots of dots: "This is a map of people dying." This is what a screw-up or a service outage means in his job: people across the planet die.
- "There are databases where you can search on anyone in Afghanistan named Mohammed. It's an entertaining database optimization problem, let me tell you."
- On deadlines: "The more you work with government, the more you find dates...well, they're filled with humour."
- "We've got headquarters with systems everywhere -- no surprise, where haven't we invaded yet?" (laughter) I yell out "Canada!" "We're thinking about it. But we're looking for some place that'll fight back." (more laughter) "I'm sorry, that came out wrong. But it was funny." This made it to IRC, which prompted Ricky and others to ditch what they were doing and come over to this talk. (I met Andy later on and he apologized profusely, saying that he meant Canada was an ally, so why would the US invade them in the first place? We had a duel, my shot grazed his shoulder, and Canada's honour was regained.)
- Having to support an app where there's strong debate over whether it's written in C, Ada or Java, or whether it uses UDP or TCP.
- Being told that an app that keeps failing is single-threaded, so throwing more CPUs at it won't do anything; it's RAM that it needs. Later investigation confirms that, in fact, it's multi-threaded and needs more CPUs, not RAM...which the vendor eventually confirms.
- He can't install a compiler, or a debugger, or anything that doesn't come with a default install of Solaris 8, or 7, or 2.x. That would be a huge security offence.
- A Sun E4000 mainboard blows up in the Middle East. Getting one through regular channels would take too long, so where do you go? That's right: eBay. He's a contractor, so he has no budget...but he does have a government credit card with a $2500 limit. So he calls up the guy selling it and cuts a deal to buy the thing for $2500 (shipping was billed separately). Put it on a C-130, and off she goes.
- Not being allowed to write a program...but he is allowed to string shell commands together...and sometimes those commands get written down in a file for reference purposes. If he's lucky, Perl's on the machine as well.
Longer story: Because of the nature of his work, he's got boxes that
he has to keep working when he knows next to nothing about what
they're meant to do. Case in point: a new Sun box arrives ("and it's
literally painted black!"), but the person responsible for it wants to
send it back because it doesn't work -- which means that when they
click the icon to start the app it's meant to run, it doesn't launch
and there's no visible sign that it's running. There's no
documentation. And yet he's obligated to support this
application. What do you do?
Even tracking down the path to the program launched by the icon is a
challenge, but he does, tracks down the nested shell scripts and
finally finds the jar that is the app ("Aha! It is Java!"). He finds
log files which are verbose but useless. He contacts the company that
wrote it, and is told he needs a support contract...which the
government, when putting together the contract for the thing, did not
think to include. So he calls back an hour later, talks to the help
desk and tells them he's lost the number -- "Can you help a brother
out?" They do, but they're stumped as well, and say they've never seen
anything like this.
Time to pull out truss, which produces a huge amount of
output. Somewhere in the middle of all that he notices a failing hard
read of a file in /bin: it was trying to read 6 bytes and
failing. Turns out the damned thing was trying to keep state in
/bin, and failing because the file was zero bytes long. He removed the
file, and suddenly the app works.
Andy also talked about trying to get a multi-gigabyte dump file from
Florida to Qatar. Physical transport was not an option, because
arranging it would take too long. So he tries FTPing the file -- which
works until he goes home for the day, at which point the network
connection goes down and he loses a day. So he writes a Perl script
that divides the file into 300MB chunks, then sends those one at a
time. It works!
At this point, someone yells out "What about split?" Andy says,
"What?" He hadn't known about it. There was a lot of good-natured
laughter. He asked, "Is there an unsplit?" "Cat!" came the response
from all over the room. He smacked his forehead and laughed. "This is
why I come to LISA," he said. "At my job, I've been there 10
years. People come to me 'cos I'm the smart one. Here, I'm the dumb
one. I love that."
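For anyone else who hasn't met split: here's a minimal sketch, in Python
rather than Andy's Perl, of the chunk-and-reassemble idea. I never saw his
script, so this is a guess at the general shape and not what he wrote; the
function names, the chunk naming scheme and the tiny chunk size in the demo
are all made up for illustration.

```python
import os
import shutil
import tempfile


def split_file(path, chunk_size, prefix):
    """Cut the file at `path` into pieces of at most `chunk_size` bytes.

    Pieces are written as prefix.0000, prefix.0001, ... (a hypothetical
    naming scheme) and their paths are returned in order.
    Note: one whole chunk is held in memory at a time.
    """
    chunk_paths = []
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:  # end of file
                break
            chunk_path = f"{prefix}.{index:04d}"
            with open(chunk_path, "wb") as out:
                out.write(data)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths


def join_files(chunk_paths, dest):
    """Concatenate the chunks, in the order given, into `dest`."""
    with open(dest, "wb") as out:
        for chunk_path in chunk_paths:
            with open(chunk_path, "rb") as chunk:
                shutil.copyfileobj(chunk, out)


if __name__ == "__main__":
    # Tiny demo with 300-byte chunks standing in for 300MB ones.
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "dump.bin")
    with open(original, "wb") as f:
        f.write(os.urandom(1000))

    chunks = split_file(original, 300, os.path.join(workdir, "chunk"))
    rebuilt = os.path.join(workdir, "rebuilt.bin")
    join_files(chunks, rebuilt)

    with open(original, "rb") as a, open(rebuilt, "rb") as b:
        print(len(chunks), "chunks; files match:", a.read() == b.read())
```

The point of sending chunks is that a dropped connection only costs you the
piece in flight, not the whole day. From the shell, split -b does the cutting
and cat does the joining, which is what the room was yelling about.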
There are two things I would like to say at this point.
First off, Andy is at least the tenth coolest person on the entire
Eastern seaboard. No, he didn't know about split or cat -- but not only did he
reimplement split in Perl rather than give up, he didn't even flinch when
being told about it in the middle of giving a talk at LISA. I would
probably have self-combusted from embarrassment ("foomp!"), and I would
have felt awful. Andy's attitude? "I learned something." That's
incredibly strong. (Although he told a story later about being in the
elevator with some Google people. They recognized him and said, "Hey,
it's the 'man cat' guy!")
Second, when he said, "Here, I'm the dumb one. I love that" I sat up
straight and thought, "Holy shit, he's right." Here I am at LISA for
the first time ever. I've met people who can help me, and people I can
help. I've made a crapload of new friends and have learned more in one
week than I would've thought possible. And I'm worried 'cos it might
be a few years before I can think about presenting a paper? That's
messed up. I tend to set unreasonably high goals for myself and then
get depressed when I can't reach them. Andy's statement made me feel a
whole lot better.
During Q & A I asked what he did for peer support, since his ability
to (say) post to a mailing list asking for help must be pretty
restricted. He said that he's started a wiki for internal use and it's
getting used...but both the culture and the job function mean that
it's slow going. He's also started a conference for fellow sysadmins:
100 or so this year, and he's hoping for more next year.
In conclusion: if you ever get the chance to go see him, do so. And
then buy him a beer.