I've been listening to the presentations from LISA07, and I have
a few observations.
Trey Darley's presentation reminded me a lot of my last job, but
much more intense: fast growth, no control, and no budget. The
difference is that he had the experience and the chops to deal with it
well. Also, if he can present at LISA, so can I.
Andrew Hume's presentation, "No Terabyte Left Behind", was
interesting, by which I mean frightening. When it comes to storage,
people mostly just trust that hardware does what it claims to do.
But that doesn't always work: he tells the story of a prof he
worked with who checksummed all his files once a week. When a checksum
changed (and one did about every 6 months), he'd retrieve the file from
backup. His rough guess for undetectable errors: 1 per 10
terabyte-years. And we're getting to the point where that's going to
be significant very soon.
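For the curious, here's a minimal sketch of what that weekly check might look like. This is my own illustration, not the prof's actual setup: the function names, the choice of SHA-256, and the in-memory manifest are all assumptions. It also assumes the files are not supposed to change between runs; a real version would have to tell legitimate edits from silent corruption (say, by comparing modification times). The last helper just does the arithmetic on Hume's rough guess of 1 error per 10 terabyte-years.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old, new):
    """Names present in both manifests whose checksums differ."""
    return sorted(name for name in old.keys() & new.keys() if old[name] != new[name])


def expected_errors(terabytes, years, tb_years_per_error=10):
    """Expected error count, taking the rough guess from the talk as the default rate."""
    return terabytes * years / tb_years_per_error
```

Run `build_manifest` once, save the result, run it again a week later, and anything `changed_files` reports is a candidate for restoring from backup. At the quoted rate, `expected_errors(100, 1)` works out to about 10 errors a year on 100 terabytes.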
Tony Cass' presentation on grid computing for CERN was
fascinating. This is the place I wanted to work (though as a particle
physicist). UBC/TRIUMF is doing some work for this project as
well, which makes me think I should jump over.
David Josephson's presentation was interesting, as much for the
Q&A afterward as for his point. Which was? Glad you asked: that
focussing on IP-based spam filtering (RBLs, greylisting) provides an
incentive to spammers to hijack network prefixes via BGP attacks, and
generally do nasty things to the Internet; please switch to
content-based filtering post-haste. (To clarify, he was talking in
particular about fast naive Bayesian classifiers, not SpamAssassin.)
Since IP-based filtering treats IPs as valuable things — tokens that
demonstrate your email is worth accepting — spammers steal IP
addresses.
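To make "content-based" concrete, here's a toy naive Bayesian classifier. It is strictly a sketch under my own assumptions and not the classifier Josephson described: the class name, the whitespace tokenizer, and the add-one smoothing are all choices I made for illustration. It scores a message by its words alone, with no reference to the sending IP.

```python
import math
from collections import Counter


class ToySpamFilter:
    """Word-count naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()

    def train(self, label, text):
        """Record one labelled message; label must be 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_log_odds(self, text):
        """Return log(P(spam | text) / P(ham | text)); positive means spam-like."""
        spam, ham = self.word_counts["spam"], self.word_counts["ham"]
        vocabulary = set(spam) | set(ham)
        spam_total = sum(spam.values()) + len(vocabulary)
        ham_total = sum(ham.values()) + len(vocabulary)
        # Smoothed prior from the number of training messages per class.
        score = math.log(
            (self.message_counts["spam"] + 1) / (self.message_counts["ham"] + 1)
        )
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            p_word_spam = (spam[word] + 1) / spam_total
            p_word_ham = (ham[word] + 1) / ham_total
            score += math.log(p_word_spam / p_word_ham)
        return score
```

Train it with a few calls like `train("spam", "buy cheap pills now")` and `train("ham", "meeting notes attached")`, and `spam_log_odds` comes back positive for spam-looking text and negative for ham-looking text. Whether this sort of thing scales to a big mail system is exactly what the Q&A was arguing about.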
I'm not sure how much I buy his argument; he kept promising that the
BGP attacks he described were only part of the problem, but he never
seemed to get beyond that. But during the Q&A Brad Knowles
got up and said (my summary) that content filtering doesn't scale, at
least in his experience (as Senior Internet Mail Systems Administrator
for AOL). At that point, another guy got up and said (again, my
summary) that sort of thing is heard all the time, but with no data
to back it up. The responder had co-authored a paper with Josephson
that got the Best Paper award at LISA '04, and they'd made damn sure to
include a ton of footnotes. If their conclusions were wrong, people
were free to challenge them; if Knowles' were wrong, they were
unchallengeable because there was no data to back them up. It was all
just stories that got passed along and became myth.
Knowles' response was "I don't have time to write papers; I'm a
technician, not an academic." Which is true, in lots of ways. And I
don't mean any insult to Knowles; he's done things I will probably
never match, we are all flooded with work, and so on. I'm one guy,
working at a small shop, with none of his experience, or chops, or
rep, or audience.
But there's a reason my .signature says "Because the plural of
Anecdote is Myth": it's to remind me that unless you can back
something up with facts, preferably written down and logged and
repeatable, all you've got is a bunch of stories that become more and
more True the more you repeat them.
It's obnoxious to sneer and say, "Cite, please"; it's worse to be
ignorant.
Lots more listening to do. If you haven't downloaded the presentations
yet, you really should.