Deep thoughts

I've been listening to the presentations from LISA07, and I have a few observations.

Trey Darley's presentation reminded me a lot of my last job, but much more intense: fast growth, no control, and no budget. The difference is that he had the experience and the chops to deal with it well. Also, if he can present at LISA, so can I.

Andrew Hume's presentation, "No Terabyte Left Behind", was interesting, by which I mean frightening. When it comes to storage, people mostly just trust that hardware does what it says it does. But that doesn't always work: he tells the story of a prof he worked with who checksummed all his files once a week. When a checksum changed (and one did about every 6 months), he'd retrieve the file from backup. His rough guess for undetectable errors: 1 per 10 terabyte-years. And we're getting to the point where that's going to be significant very soon.
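
For the curious, here's a rough sketch of what that weekly checksum run might look like, plus the back-of-envelope arithmetic for the error rate. To be clear, this is my guess at the idea, not the prof's actual script: the function names are made up, SHA-256 is my choice, and treating "digest changed but mtime didn't" as the sign of silent corruption is my own assumption about how you'd tell bit rot from an ordinary edit.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root):
    """Map each file's path (relative to root) to (mtime_ns, digest)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            manifest[rel] = (os.stat(full).st_mtime_ns, sha256_of(full))
    return manifest


def suspect_files(old, new):
    """Files whose digest changed although their mtime did not.

    Assumption: a legitimate edit updates the mtime, so a changed digest
    with an unchanged mtime is worth restoring from backup.
    """
    return sorted(
        path
        for path, (mtime, digest) in new.items()
        if path in old and old[path][0] == mtime and old[path][1] != digest
    )


def expected_errors(terabytes, years, tb_years_per_error=10.0):
    """Expected undetected errors at one error per N terabyte-years."""
    return terabytes * years / tb_years_per_error


# At 1 error per 10 terabyte-years, 100 TB held for a year works out to
# about 10 expected undetected errors.
print(expected_errors(100, 1))  # 10.0
```

You'd save the output of scan() each week and compare it against last week's with suspect_files(). The arithmetic is the scary part: at that rate, a shop with 100 TB should expect something like ten silent errors a year.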

Tony Cass' presentation on grid computing for CERN was fascinating. This is the place I wanted to work (though as a particle physicist). UBC/TRIUMF is doing some work for this project as well, which makes me think I should jump over.

David Josephson's presentation was interesting, as much for the Q&A afterward as for his point. Which was? Glad you asked: that focussing on IP-based spam filtering (RBLs, greylisting) provides an incentive to spammers to hijack network prefixes via BGP attacks, and generally do nasty things to the Internet; please switch to content-based filtering post-haste. (To clarify, he was talking in particular about fast naive Bayesian classifiers, not SpamAssassin.) Since IP-based filtering treats IPs as valuable things — tokens that demonstrate your email is worth accepting — spammers steal IP addresses.

I'm not sure how much I buy his argument; he kept promising that the BGP attacks he described were only part of the problem, but he never seemed to get beyond that. But during the Q&A Brad Knowles got up and said (my summary) that content filtering doesn't scale, at least in his experience (as Senior Internet Mail Systems Administrator for AOL). At that point, another guy got up and said (again, my summary) that this sort of thing is heard all the time, but with no data to back it up. The responder had co-authored a paper with Josephson that got the Best Paper award at LISA '04, and they'd made damn sure to include a ton of footnotes. If their conclusions were wrong, people were free to challenge them; if Knowles' were wrong, they were unchallengeable because there was no data to back them up. It was all just story that got passed along and became myth.

Knowles' response was "I don't have time to write papers; I'm a technician, not an academic." Which is true, in lots of ways. And I don't mean any insult to Knowles; he's done things I will probably never match, we are all flooded with work, and so on. I'm one guy, working at a small shop, with none of his experience, or chops, or rep, or audience.

But there's a reason my .signature says "Because the plural of Anecdote is Myth": it's to remind me that unless you can back something up with facts, preferably written down and logged and repeatable, all you've got is a bunch of stories that become more and more True the more you repeat them.

It's obnoxious to sneer and say, "Cite, please"; it's worse to be ignorant.

Lots more listening to do. If you haven't downloaded the presentations yet, you really should.