LISA again! This is the fifth? (Washington, Baltimore, San Diego, San Jose, Seattle, and now Boston) sixth! that I've been to. Saturday's flight in was fairly uneventful, except a) it didn't bother my sciatica too much, so yey and b) I forgot my coat on the plane, and it doesn't look like Air Canada has a working system to take "Hey, did you see a coat?" calls.
Fortunately I can count on the kindness of Andy Seely, who brought an extra coat and loaned it to me. For his kindness I have given him a "Taggart Transcontinental" t-shirt, and let him buy me supper. I'm nothing if not generous.
Sunday I spent the entire day at the Google SRE tutorial, which was very, very cool; a big part of it was an exercise to architect a system that would read and join logfiles. It took a long time to wrap my head around how everyone was thinking about this, but writing down the moving parts made it all a lot clearer. In the end, my team's proposal approximated the final example config presented by Google, so that was good. Final sol'n, BTW, used 101 machines. The math all worked out, but it still made my jaw drop. When I asked the presenters about this, they grinned. "We've forgotten how to count small," one of them said.
Today was spent in "Everything you ever wanted to know about operating systems but were afraid to ask", aka "Caskey's Brain Dump". It was a pretty awesome talk, covering everything from silicon through filesystems. Well worth it; I'd love a recording of it, since the slides simply don't do it justice.
Today's title from the subject line of some spam I just got. ("a spam"? "a spammy email"? just "spam"?)
Mystery flu-like illness continues, or at least its fallout; I've had lower back pain for the last ~ 4 weeks. Doctor says removing spine is "not an option" but I've done some Googling and
$WORK continues apace. After taking a week of Python training, we're using Go for a new tool we're building. Haven't got a good sense for what it's like just yet, but so far I don't seem to be making a mess of things.
Tried out drone.io at $WORK yesterday and holy god, is it good. Auth with our internal Github, then activate repos, and boom! it runs tests on every new commit on any branch, watches for PRs, the whole nine yards. When I think of the amount of work we had to do to get Jenkins to do this, it's insane. Plus the whole run-as-a-Docker-container, fire-up-sibling-docker-containers-for-tests thing is very, very impressive.
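The config lives right in the repo as .drone.yml, too. Something like this -- a sketch from memory, since the format has shifted between drone versions, and the image and commands here are just placeholders for our Go tool:

```
# hypothetical .drone.yml: run the commands in the named Docker image
# on every push and pull request (details approximate)
pipeline:
  test:
    image: golang:1.7
    commands:
      - go vet ./...
      - go test ./...
```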
Sportsball has started up again with a vengeance: practices on Monday and Wednesday, games on Fridays and Saturdays. Somebody stop this merry-go-round!
I've registered for LISA 16, woot! This will be my fifth -- wait, sixth? -- LISA, ten years after my first time attending. Not sure who's gonna be the theme band this year -- I've done New Pornographers, Josh Rouse, Soul Coughing and Sloan. And since he's co-chair this year, it seems like a good time to pull out that picture of Matt Simmons (@standaloneSA) as a PHP dev:
This year I'm blogging for the USENIX blog, so we'll see how much I actually put up here...but the thought of going w/o updating my own just makes me sad, so here we go.
Took the bus down, which was completely uneventful and pleasant. Walked from King St station to the conference hotel, which was a bit of a hike but welcome exercise. I'm on the 25th floor and have a pretty skookum view of local neon and such. Got supper and some groceries, then went out for drinks w/Matt and his wife Amy, Pat Cable (who I'm meeting in person now for the first time), Bob and Alf, and Ken Schumacher. Good times, with lots of good teasings of Matt in as well. Missing Ben Cotton, which is a shame; the two of us could pretty much get Matt to cry if we tried hard enough.
First tutorial this AM was "Stats for Ops", and it was amazing. Discovered that using a spreadsheet is a really good skill to have. I have to learn that at some point...
And now off for next tutorial.
So my latest blog post for LISA just got posted -- and that's the last long(ish) one; next week BeerOps, Mark Lamourine and I will be posting daily updates as we're there. Also, I've volunteered to help Julie Miller, the Marketing Communications Manager for USENIX, with the opening orientation on Saturday night. I seem to remember taking that the first year I went, though I don't seem to have written it down...
By the way, shouts out to BeerOps, Mark Lamourine, Matt Simmons and Noah Meyerhans for all the help during LISA Bloggity Sprint 2014. There are beers/chocolate/what-you-owed in plentitude.
On another note: I'm auditioning a Chromebook, an Acer C720, to see how it works out. Right now I'm using Debian Jessie (testing) via Crouton, which lets you install Linux to a chroot within Chrome. So far: the keyboard is smaller than I'm used to, and the Canadian keyboard in particular is annoying -- they've crammed in tons of extra keys and split the Enter and Shift keys to do so. But overall it's okay; I can run tests for Yogurty in 3 seconds (cf. 12 on my old P3 laptop/server), and even Stellarium seems to run just fine. I've got a refurbished 4GB model on order w/Walmart in the states, and I can pick that up while I'm at LISA. So, you know, looking good.
Bridget Kromhout's latest post, The First Rule of DevOps Club, is awesome. Quote:
But when the open space opening the next day had an anecdote featuring "ops guys", I'd had enough. I went up, took the mic, and told the audience of several hundred people (of whom perhaps 98% were guys) how erased I feel when I hear that.
I said what I always think (and sometimes say) when this comes up. If you are a guy, and you like to date women, would you place a personal ad that says this? "I'd like to meet a wonderful guy to fall in love and spend my life with. This guy must like long walks on the beach and holding hands, and must also be female." If that sounds ludicrous to you, then you don't actually think "guy" is gender-neutral.
That's a small part of a much longer post; go read the rest.
Much at $WORK; I've got a new team mate from Belgium who's awesome, I'm starting to find a sense of rhythm, and organizing time is as challenging as ever. There are lots, LOTS of fun things to do, and it's damn hard sometimes to say "I'm just gonna put that on the TODO list and walk away."
This week my youngest son has switched from "The Wizard of Oz" to "Treasure Island" for story time. He got bored of TWOO and we didn't finish it; I'm curious to see how long he'll stick with TI. Still so much fun to read to them both.
Busy, yo:
This was my first week on call at $WORK, and naturally a few things came up -- nothing really huge, but enough that the rhythm I'd been slowly developing (and coming to relish) was pretty much lost. And then Friday night/Saturday morning I was paged three times (11pm, 1am and 5.30am) -- mostly minor things, but enough that I was pretty much a wreck yesterday. I'm coming to dread the sad trombone.
Besides that, I've also been blogging about the LISA14 conference for USENIX, along with Katherine Daniels (@beerops) and Mark Lamourine (@markllama). They've got some excellent articles up; Mark wrote about LISA workshops, and Katherine described why she's going to LISA. Awesome stuff and worth your time.
I managed to brew last week for the first time since thrice-blessed February; it's a saison (yeast, wheat malt, acidulated malt) with a crapton of homegrown hops (roughly a pound). I'm looking forward to this one.
Going to San Francisco again week after next for $WORK. (Prospective busyness.)
Kids are back to school! Youngest is in grade 1 and oldest in grade 3. Wow.
Got SSL set up for both my web and email servers; created pull request for Duraconf in the process.
Traded in a crapton of telescope eyepieces for a couple nice upgrades: 2" 31mm Antares modified Erfle (74 deg FOV), and 1.25" 17mm Antares Speers-Waler (82 deg FOV). I also got a 2" diagonal and connector for the Meade; the Dob had a 2" focuser already. I kept my 12mm Vixen (50 deg FOV) and a random 7.5mm Plossl. All this was done at Vancouver Telescope, who are an incredibly awesome bunch of people.
Ordered flocking paper for the Meade (I think the focuser tube needs it) and the Peterson EZ focus kit (satisfied customers).
Went with the family to the PNE. Pics to come!
Visited brother and his wife in Kelowna with my parents.
First LISA blog post up at the USENIX blog. (Gotta write more about that too...)
First day back at $WORK after the winter break yesterday, and some...interesting...things. Like finding out about the service that didn't come back after a power outage three weeks ago. Fuck. Add the check to Nagios, bring it up; when the light turns green, the trap is clean.
Or when I got a page about a service that I recognized as having, somehow, to do with a webapp we monitor, but no real recollection of what it does or why it's important. Go talk to my boss, find out he's restarted it and it'll be up in a minute, get the 25-word version of what it does, add him to the contact list for that service and add the info to documentation.
I start to think about how to include a link to documentation in Nagios alerts, and a quick search turns up "Default monitoring alerts are awful", a blog post by Jeff Goldschrafe about just this. His approach looks damned cool, and I'm hoping he'll share how he does this. Inna meantime, there are the Nagios config options "notes", "notes_url" and "action_url", which I didn't know about. I'll start adding stuff to the Nagios config. (Which really makes me wish I had a way of generating Nagios config...sigh. Maybe NConf?)
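For reference, this is roughly what those options look like in a service definition (host, check and URLs entirely made up):

```
# hypothetical service definition showing notes_url/action_url
define service {
    use                   generic-service
    host_name             webapp01
    service_description   Mystery webapp
    check_command         check_http
    notes                 The boss usually restarts this when it wedges.
    notes_url             https://wiki.example.com/webapp
    action_url            https://wiki.example.com/webapp/restart
}
```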
But also on Jeff's blog I found a post about Kaboli, which lets you interact with Nagios/Icinga through email. That's cool. Repo here.
Planning. I want to do something better with planning. I've got RT to catch problems as they emerge, and track them to completion. Combined with orgmode, it's pretty good at giving me a handy reference for what I'm working on (RT #666) and having the whole history available. What it's not good at is big-picture planning...everything is just a big list of stuff to do, not sorted by priority or labelled by project, and it's a big intimidating mess. I heard about Kanban when I was at LISA this year, and I want to give it a try...not sure if it's exactly right, but it seems close.
And then I came across Behaviour-driven infrastructure through Cucumber, a blog post from Lindsay Holmwood. Which is damn cool, and about which I'll write more another time. Which led to the Github repo for a cucumber/nagios plugin, and reading more about Cucumber, and behaviour-driven development versus test-driven development (hint: they're almost exactly the same thing).
My god, it's full of stars.
I'm still digesting all the stuff that came out of LISA this year. But there are a number of things I want to try out:
I learned a little bit about agile development, mainly from Geoff Halprin's training material and keynote, and it seemed interesting. One of the things that resonated with me was the idea of only having a small number of work stages for stuff: in the queue, next, working, and done. (Going from memory here, so quite possibly wrong.) I like that: work on stuff in two-week chunks, commit to that and get it done. That seems much more manageable than having stuff in the queue with no real idea of a schedule. And a two-week chunk is at least a good place to start: interruptions aren't about to go away any time soon, and I can adjust this as necessary.
A corollary is that it's probably not best to plan more than two such things in a month. I'm thinking about things like switching from Nagios to Icinga, setting up Ganeti, and such: more than I can do in an hour, less than a semester's work.
I really want to work on eliminating pain points this year. Icinga's one; Nagios' web interface is painful. (I'd also like to look at Sensu.) I want to make backups better. I want to add proper testing for Cfengine with Vagrant and Git, so I can go on more than a wing and a prayer when pushing changes.
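Something like this is the workflow I have in mind -- a sketch, with paths and flags approximate:

```
# spin up a throwaway VM, run the candidate policy against it, tear it down
# (policy path assumes the repo is synced to /vagrant)
vagrant up
vagrant ssh -c 'sudo cf-agent -K -I -f /vagrant/promises.cf'
vagrant destroy -f
```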
I also need to work more closely with the faculty in my department. Part of that is committing to more manageable work, and part of that is just following through more. Part of it, though, is working with people that intimidate me, and letting them know what I can do for them.
I need to manage my time better, and I think a big part of that is interruptions. I've just been told I'm getting an office, which is a mixed blessing. There's a certain amount of flux in the office, and I've been making friends with the people around me lately. I'll miss them/that, but I think the ability to retreat and work on something is going to be valuable.
Another part of managing time is, I think/hope, a better routine. Like: one hour every day for long-term project work. (The office makes this easier to imagine.) Set times for the things I want to get done at home (where my free time comes in one-hour chunks). Deciding if I want to work on transit (I can take my laptop home with me, and it's a 90 minute commute), and how (fun projects? stuff I can't get done at work? blue-sky stuff?). "If" because a) my eyes will bug out if I stare at a screen all day, and b) I firmly intend to keep a limit on my work time. So it'd probably be a couple days a week, to allow time for all the podcasts and books I want to inhale.
Microboxing for productivity. Interesting stuff.
Kanban. Related to Agile, but I forgot about it. Pomodoro + Emacs + Orgmode, too.
Probably more as I think of it...but right now it's time to sleep. 5.30am comes awful early after 11 days off...
A collection of stuff that didn't fit anywhere else:
St Vidicon of Cathode. Only slightly spoiled by the disclaimer "Saint Vidicon and his story are the intellectual property of Christopher Stasheff."
A Vagrant box for OmniOS, the Illumos distro I heard about at LISA.
A picture of Matt Simmons. When I took this, he was committing some PHP code and mumbling something like "All I gotta do is enable globals and everything'll be fine..."
Where I'm going, you cannot come...
"Theologians", Wilco
At 2.45am, I woke up because a) my phone was buzzing with a page from work, and b) the room was shaking. I was quite bagged, since I'd been up 'til 1 finishing yesterday's blog entry, and all I could think was "Huh...earthquake. How did Nagios know about this?" Since the building didn't seem to be falling, I went back to sleep. In the morning, I found out it was a magnitude 6.2 earthquake.
I was going to go to the presentation by the CBC on "What your CDN won't tell you" (initially read as "What your Canadian won't tell you": "Goddammit, it's pronounced BOOT") but changed my mind at the last minute and went to the Cf3 "Guru is in" session with Diego Zamboni. (But not before accidentally going to the Cf3 tutorial room; I made an Indiana Jones-like escape as Mark Burgess was closing the door.) I'm glad I went; I got to ask what people are doing for testing, and got some good hints.
Vagrant's good for testing (and also awesome in general). I'm trying to get a good routine set up for this, but I have not started using the Cf3 provider for Vagrant...because of crack? Not sure.
You might want to use different directories in your revision control; that makes it easy to designate dev, testing, and production machines (don't have to worry about getting different branches; just point them at the directories in your repo).
Make sure you can promote different branches in an automated way (merging branches, whatever). It's easy to screw this up, and it's worth taking the time to make it very, very easy to do it right.
If you've got a bundle meant to fix a problem, deliberately break a machine to make sure it actually does fix the problem.
Consider using git + gerrit + jenkins to test and review code.
The Cf3 sketch tool still looks neat. The Enterprise version looked cool, too; it was the first time I'd seen it demonstrated, and I was intrigued.
At the break I got drugs^Wcold medication from Jennifer. Then I sang to Matt:
(and the sailors say) MAAAA-AAAT you're a FIIINNE girl what a GOOOD WAAAF you would be but my life, my love and my LAY-ee-daaaay is the sea (DOOOO doo doo DOO DOO doot doooooooo)
I believe Ben has video; I'll see if it shows up.
BTW, Matt made me sing "Brandy" to him when I took this picture:
I discussed Yo Dawg Compliance with Ben ("Yo Dawg, I put an X in your Y so you could X when you Y"; == self-reference), and we decided to race each other to @YoDawgCompliance on Twitter. (Haha, I got @YoDawgCompliance2K. Suck it!)
(Incidentally, looking for a fully-YoDawg compliant ITIL implementation? Leverage @YoDawgCompliance2K thought leadership TODAY!)
Next up was the talk on the Greenfield HPC by @arksecond. I didn't know the term, and earlier in the week I'd pestered him for an explanation. Explanation follows: Greenfield is a term from the construction industry, and denotes a site devoid of any existing infrastructure, buildings, etc where one might do anything; Brownfield means a site where there are existing buildings, etc and you have to take those into account. Explanation ends. Back to the talk. Which was interesting.
They're budgeting 25 kW/rack, twice what we do. For cooling they use spot cooling, but they also were able to quickly prototype aisle containment with duct tape and cardboard. I laughed, but that's awesome: quick and easy, and it lets you play around and get it right. (The cardboard was replaced with plexiglass.)
Lunch was with Matt and Ken from FOO National Labs, then Sysad1138 and Scott. Regression was done, fun was had and phones were stolen.
The plenary! Geoff Halprin spoke about how DevOps has been done for a long time, isn't new and doesn't fix everything. Q from the audience: I work at MIT, and we turn out PhDs, not code; what of this applies to me? A: In one sense, not much; this is not as relevant to HPC, edu, etc; not everything looks like enterprise setups. But look at the techniques, underlying philosophy, etc and see what can be taken.
That's my summary, and the emphasis is prob. something he'd disagree with. But it's Friday as I write this and I am tired as I sit in the airport, bone tired and I want to be home. There are other summaries out there, but this one is mine.
Silly simple lies
They made a human being out of you...
"Flair", Josh Rouse
Thursday I gave my Lightning Talk. I prepared for it by writing it out, then rehearsing a couple times in my room to get it down to five minutes. I think it helped, since I got in about two seconds under the wire. I think I did okay; I'll post it separately. Pic c/o Bob the Viking:
Some other interesting talks:
@perlstalker on his experience with Ceph (he's happy);
@chrisstpierre on what XML is good for (it's code with a built-in validator; don't use it for setting syslog levels);
the guy who wanted to use retired aircraft carriers as floating data centres;
Dustin on MozPool (think cloud for Panda Boards);
Stew (@digitalcrow) on Machination, his homegrown hierarchical config management tool (users can set their preferences; if needed for the rest of their group, it can be promoted up the hierarchy as needed);
Derek Balling on megacity.org/timeline (keep your fingers crossed!);
a Google dev on his experience bringing down GMail.
Afterward I went to the vendor booths again, and tried the RackSpace challenge: here's a VM and its root password; it needs to do X, Y and Z. GO. I was told my time wasn't bad (8.5 mins; wasn't actually too hard), and I may actually win something. Had lunch with John again and discussed academia, fads in theoretical computer science and the like.
The afternoon talk on OmniOS was interesting; it's an Illumos version/distro with a rigorous update schedule. The presenter's company uses it in a LOT of machines, and their customers expect THEM to fix any problems/security problems...not say "Yeah, the vendor's patch is coming in a couple weeks." Stripped down; they only include about 110 packages (JEOS: "Just Enough Operating System") in the default install. "Holy wars" slide: they use IPS ("because ALL package managers suck") and vi (holler from audience: "Which one?"). They wrote their own installer: "If you've worked with OpenSolaris before, you know that it's actually pretty easy getting it to work versus fucking getting it on the disk in the first place."
At the break I met with Nick Anderson (@cmdln_) and Diego Zamboni (@zzamboni, author of "Learning Cfengine 3"). Very cool to meet them both, particularly as they did not knee me in the groin for my impertinence in criticising Cf3 syntax. Very, very nice and generous folk.
The next talk, "NSA on the Cheap", was one I'd already heard from the USENIX conference in the summer (downloaded the MP3), so I ended up talking to Chris Allison. I met him in Baltimore on the last day, and it turns out he's Matt's coworker (and both work for David Blank-Edelman). And when he found out that Victor was there (we'd all gone out on our last night in Baltimore) he came along to meet him. We all met up, along with Victor's wife Jennifer, and caught up even more. (Sorry, I'm writing this on Friday; quality of writing taking a nosedive.)
And so but Victor, Jennifer and I went out to Banker's Hill, a restaurant close to the hotel. Very nice chipotle bacon meatloaf, some excellent beer, and great conversation and company. Retired back to the hotel and we both attended the .EDU BoF. Cool story: someone who's unable to put a firewall on his network (he's in a department, not central IT, so not an option for him) woke up one day to find his printer not only hacked, but the firmware running a proxy of PubMed to China ("Why is the data light blinking so much?"). Not only that, but he couldn't upgrade the firmware because the firmware reloading code had been overwritten.
Q: How do you know you're dealing with a Scary Viking Sysadmin?
A: Service monitoring is done via two ravens named Huginn and Muninn.
Careful with words -- they are so meaningful
Yet they scatter like the booze from our breath...
"The White Trash Period Of My Life", Josh Rouse
I woke up at a reasonable time and went down to the lobby for free wireless; finished up yesterday's entry (2400 words!), posted and ate breakfast with Andy, Alf ("I went back to the Instagram hat store yesterday and bought the fedora. But now I want to accessorize it") and...Bob in full Viking drag.
Andy: "Now you...you look like a major in the Norwegian army."
Off to the Powershell tutorial. I've been telling people since that I like two things from Microsoft: the Natural Keyboard, and now Powershell. There are some very, very nice features in there:
common args/functions for each command, provided by the PS library
directory-like listings for lots of things (though apparently manipulating the registry through PS is sub-optimal); feels Unix/Plan 9-like
$error contains all the errors in your interactive cycle
"programming with hand grenades": because just 'bout everything in PS is an object, you can pass that along through a pipe and the receiving command explodes it and tries to do the right thing.
My notes are kind of scattered: I was trying to install version 3 (hey MS: please make this easier), and then I got distracted by something I had to do for work. But I also got to talk to Steve Murawski, the instructor, during the afternoon break, as we were both on the LOPSA booth. I think MS managed to derive a lot of advantage from being the last to show up at the party.
Interestingly, during the course I saw on Twitter that Samba 4 has finally been released. My jaw dropped. It looks like there are still some missing bits, but it can be an AD now. [Keanu voice] Whoah.
During the break I helped staff the LOPSA booth and hung out with a sysadmin from NASA; one of her users is a scientist who gets data from the ChemCam (I think) on Curiosity. WAH.
The afternoon's course was on Ganeti, given by Tom Limoncelli and Guido Trotter. THAT is my project for next year: migrating my VMs, currently on one host, to Ganeti. It seems very, very cool. And on top of that, you can test it out in VirtualBox. I won't put in all my notes, since I'm writing this in a hurry (I always fall behind as the week goes on) and a lot of it is available in the documentation. But:
You avoid needing a SAN by letting it do DRBD on different pairs of nodes. Need to migrate a machine? Ganeti will pass it over to the other pair.
If you've got a pair of machines (which is about my scale), you've just gained failover of your VMs. If you've got more machines, you can declare a machine down (memory starts crapping out, PS failing, etc) and migrate the machines over to their alternate. When the machine's back up, Ganeti will do the necessary to get the machine back in the cluster (sync DRBDs, etc). (There's a rough command sketch after this list.)
You can import already-existing VMs (Tom: "Thank God for summer interns.")
There's a master, but there are master candidates ready to take over if requested or if the master becomes unavailable.
There's a web manager to let users self-provision. There's also Synnefo, an AWS-like web FE that's commercialized as Okeanos.io (free trial: 3-hour lifetime VMs)
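A rough sketch of what the pair-of-machines case looks like from the command line, going by my notes (instance and node names made up, flags from memory, so check the docs):

```
# create an instance with its disk mirrored over DRBD across two nodes (hypothetical names)
gnt-instance add -t drbd -n node1:node2 -o debootstrap+default -s 20G web01

# planned maintenance on node1: live-migrate the instance to its DRBD secondary
gnt-instance migrate web01

# node1's hardware is getting flaky: rebuild the DRBD secondaries it holds elsewhere
gnt-node evacuate -s node1
```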
I talked with Scott afterward, and learned something I didn't know: NFS over GigE works fine for VM images. Turn on hard mounts (you want to know when something goes wrong), use TCP, use big block sizes, but it works just fine. This changes everything.
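Something like this, as I understand it (server and paths hypothetical; the block sizes are a tuning knob):

```
# hard-mount the image store over TCP with large read/write sizes (hostname made up)
mount -t nfs -o hard,proto=tcp,rsize=32768,wsize=32768 \
    filer01:/export/vm-images /var/lib/vm-images
```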
In the evening the bar was full and the hotel restaurant was definitely outside my per diem, so I took a cab downtown to the Tipsy Crow. Good food, nice beer, and great people watching. (Top tip for Canadians: here, the hipsters wear moustaches even when it's not Movember. Prepare now and get ahead of the curve.) Then back to the hotel for the BoFs. I missed Matt's on small infrastructure (damn) but did make the amateur astronomy BoF, which was quite cool. I ran into John Hewson, my roommate from the Baltimore LISA, and found out he's presenting tomorrow; I'll be there for that.
Q: How do you know you're with a Scary Viking Sysadmin?
A: Prefaces new cool thing he's about to show you with "So I learned about this at the last sysadmin Althing...."
And if I ever was myself,
I wasn't that night...
"Handshake Drugs", Wilco
Wednesday was opening day: the stats (1000+ attendees) and the awards (the Powershell devs got one for "bringing the power of automated system administration to Windows, where it was previously largely unsupported"). Then the keynote from Vint Cerf, co-designer of TCP and yeah. He went over a lot of things, but made it clear he was asking questions, not proposing answers. Many cool quotes, including: "TCP/IP runs over everything, including you if you're not paying attention." Discussed the recent ITU talks a lot, and what exactly he's worried about there. Grab the audio/watch the video.
Next talk was about a giant scan of the entire Internet (/0) for SIP servers. Partway through my phone rang and I had to take it, but by the time I got out to the hall it'd stopped and it turned out to be a wrong number anyway. Grr.
IPv6 numbering strategies was next. "How many hosts can you fit in a /48? ALL OF THEM." Align your netblocks by nibble boundaries (hex numbers); it makes visual recognition of demarcation so much easier. Don't worry about packing addresses, because there's lots of room and why complicate things? You don't want to be doing bitwise math in the middle of the night.
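To make the nibble thing concrete, here's a made-up plan using the documentation prefix -- every level of the hierarchy lines up with a hex digit, so you can read the structure straight off the address:

```
2001:db8:1234::/48        the whole site (example prefix)
2001:db8:1234:1000::/52   building 1   (16 x /52 per /48)
2001:db8:1234:1100::/56   building 1, floor 1   (16 x /56 per /52)
2001:db8:1234:1110::/60   building 1, floor 1, closet 1
2001:db8:1234:1111::/64   one VLAN in that closet
```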
Lunch, and the vendor tent. But first an eye-wateringly expensive burrito -- tasty, but $9. It was NOT a $9-sized burrito. I talked to the CloudStack folks and the Ceph folks, and got cool stuff from each. Both look very cool, and I'm going to have to look into them more when I get home. Boxer shorts from the Zenoss folks ("We figured everyone had enough t-shirts").
I got to buttonhole Mark Burgess, tell him how much I'm grateful for what he's done but OMG would he please do something about the mess of brackets. Like the Wordpress sketch:
    commands:
      !wordpress_tarball_is_present::
        "/usr/bin/wget -q -O $($(params)[_tarfile]) $($(params)[_downloadurl])"
          comment => "Downloading latest version of WordPress.";
His response, as previously, was "Don't do that, then." To be fair, I didn't have this example and was trying to describe it verbally ("You know, dollar bracket dollar bracket variable square bracket...C'mon, I tweeted about it in January!"). And he agreed yes, it's a problem, but it's in the language now, and indirection is a problem no matter what. All of which is true, and I realize it's easy for me to propose work for other people without coming up with patches. And I let him know that this was a minor nit, that I really was grateful for Cf3. So there.
I got to ask Dru Lavigne about FreeBSD's support for ZFS (same as Illumos) and her opinion of DragonflyBSD (neat, thinks of it as meant for big data rather than desktops, "but maybe I'm just old and crotchety").
I talked with a PhD student who was there to present a paper. He said it was an accident he'd done this; he's not a sysadmin, and though his nominal field is CS, he's much more interested in improving the teaching of undergraduate students. ("The joke is that primary/secondary school teachers know all about teaching and not so much about the subject matter, and at university it's the other way around."). In CompSci it's all about the conferences -- that's where/how you present new work, not journals (Science, Nature) like the natural sciences. What's more, the prestigious conferences are the theoretical ones run by the ACM and the IEEE, not a practical/professional one like LISA. "My colleagues think I'm slumming."
Off to the talks! First one was a practice and experience report on the config and management of a crapton (700) of iPads for students at an Australian university. The iPads belonged to the students -- so whatever profile was set up had to be removable when the course was over, and locking down permanently was not an option.
No suitable tools for them -- so they wrote their own. ("That's the way it is in education.") Started with Django, which the presenter said should be part of any sysadmin's toolset; easy to use, management interface for free. They configured one iPad, copied the configuration off, de-specified it with some judicious search and replace, and then prepared it for templating in Django. To install it on the iPad, the students would connect to an open wireless network, auth to the web app (which was connected to the university LDAP), and the iPad would prompt them to install the profile.
The open network was chosen because the secure network would require a password....which the iPad didn't have yet. And the settings file required an open password in it for the secure wireless to work. The reviewers commented on this a lot, but it was a conscious decision: setting up the iPad was one of ten tasks done on their second day, and a relatively technical one. And these were foreign students, so language comprehension was a problem. In the end, they felt it was a reasonable risk.
John Hewson was up next, talking about ConfSolve, his declarative configuration language connected to/written with a constraint solver. ("Just cross this red wire with this blue wire...") John was my roommate at the Baltimore LISA, and it was neat to see what he's been working on. Basically, you can say things like "I want this VM to have 500 GB of disk" and ConfSolve will be all like, "Fuck you, you only have 200 GB of storage left". You can also express hard limits and soft preferences ("Maximize memory use. It'd be great if you could minimise disk space as well, but just do your best"). This lets you do things like cloudbursting: "Please keep my VMs here unless things start to suck, in which case move my web, MySQL and DNS to AWS and leave behind my SMTP/IMAP."
After his presentation I went off to grab lunch, then back to the LISA game show. It was surprisingly fun and funny. And then, Matt and I went to the San Diego Maritime Museum, which was incredibly awesome. We walked through The Star of India, a huge three-masted cargo ship that still goes out and sails. There were actors there doing Living History (you could hear the caps) with kids, and displays/dioramas to look at. And then we met one of the actors who told us about the ship, the friggin' ENORMOUS sails that make it go (no motor), and about being the Master at Arms in the movie "Master and Commander". Which was our cue to head over to the HMS Surprise, used in the filming thereof. It's a replica, but accurate and really, really neat to see. Not nearly as big as the Star of India, and so many ropes...so very, very many ropes. And after that we went to a Soviet (!) Foxtrot-class submarine, where we had to climb through four circular hatches, each about a metre in diameter. You know how they say life in a submarine is claustrophobic? Yeah, they're not kidding. Amazing, and I can't recommend it enough.
We walked back to the hotel, got some food and beer, and headed off to the LOPSA annual meeting. I did not win a prize. Talked with Peter from the University of Alberta about the lightning talk I promised to do the next day about reproducible science. And thence to bed.
Q: How do you know you're with a Scary Viking Sysadmin?
A: When describing multiple roles at the office, says "My other hat is made of handforged steel."
And my conscience has it stripped down to science
Why does everything displease me?
Still, I'm trying...
"Christmas with Jesus", Josh Rouse
At 3am my phone went off with a page from $WORK. It was benign, but do you think I could get back to sleep? Could I bollocks. I gave up at 5am and came down to the hotel lobby (where the wireless does NOT cost $11/day for 512 Kb/s, or $15 for 3Mb/s) to get some work done and email my family. The music volume was set to 11, and after I heard the covers of "Living Thing" (Beautiful South) and "Stop Me If You Think That You've Heard This One Before" (Marc Ronson; disco) I retreated back to my hotel room to sit on my balcony and watch the airplanes. The airport is right by both the hotel and the downtown, so when you're flying in you get this amazing view of the buildings OH CRAP RIGHT THERE; from my balcony I can hear them coming in but not see them. But I can see the ones that are, I guess, flying to Japan; they go straight up, slowly, and the contrail against the morning twilight looks like rockets ascending to space. Sigh.
Abluted (ablated? hm...) and then down to the conference lounge to stock up on muffins and have conversations. I talked to the guy giving the .EDU workshop ("What we've found is that we didn't need a bachelor's degree in LDAP and iptables"), and with someone else about kids these days ("We had a rich heritage of naming schemes. Do you think they're going to name their desktop after Lord of the Rings?" "Naw, it's all gonna be Twilight and Glee.")
Which brought up another story of network debugging. After an organizational merger, network problems persisted until someone figured out that each network had its own DNS servers that had inconsistent views. To make matters worse, one set was named Kirk and Picard, and the other was named Gandalf and Frodo. Our Hero knew then what to do, and in the post-mortem Root Cause Diagnosis, Executive Summary, wrote "Genre Mismatch." [rimshot]
(6.48 am and the sun is rising right this moment. The earth, she is a beautiful place.)
And but so on to the HPC workshop, which intimidated me. I felt unprepared. I felt too small, too newbieish to be there. And when the guy from fucking Oak Ridge got up and said sheepishly, "I'm probably running one of the smaller clusters here," I cringed. But I needn't have worried. For one, maybe 1/3rd of the people introduced themselves as having small clusters (smallest I heard was 10 nodes, 120 cores), or being newbies, or both. For two, the host/moderator/glorious leader was truly excellent, in the best possible Bill and Ted sense, and made time for everyone's questions. For three, the participants were also generous with time and knowledge, and whether I asked questions or just sat back and listened, I learned so much.
Participants: Oak Ridge, Los Alamos, a lot of universities, and a financial trading firm that does a lot of modelling and some really interesting, regulatory-driven filesystem characteristics: nothing can be deleted for 7 years. So if someone's job blows up and it litters the filesystem with crap, you can't remove the files. Sure, they're only 10-100 MB each, but with a million jobs a day that adds up. You can archive...but if the SEC shows up asking for files, they need to have them within four hours.
The guy from Oak Ridge runs at least one of his clusters diskless: fewer moving parts to fail. Everything gets saved to Lustre. This became a requirement when, in an earlier cluster, a node failed and it had Very Important Data on a local scratch disk, and it took a long time to recover. The PI (==principal investigator, for those not from an .EDU; prof/faculty member/etc who leads a lab) said, "I want to be able to walk into your server room, fire a shotgun at a random node, and have it back within 20 minutes." So, diskless. (He's also lucky because he gets biweekly maintenance windows. Another admin announces his quarterly outages a year in advance.)
There were a lot of people who ran configuration management (Cf3, Puppet, etc) on their compute nodes, which surprised me. I've thought about doing that, but assumed I'd be stealing precious CPU cycles from the science. Overwhelming response: Meh, they'll never notice. OTOH, using more than one management tool is going to cause admin confusion or state flapping, and you don't want to do that.
One guy said (both about this and the question of what installer to use), "Why are you using anything but Rocks? It's federally funded, so you've already paid for it. It works and it gets you a working cluster quickly. You should use it unless you have a good reason not to." "I think I can address that..." (laughter) Answer: inconsistency with installations; not all RPMs get installed when you're doing 700 nodes at once, so he uses Rocks for a bare-ish install and Cf3 after that -- a lot like I do with Cobbler for servers. And FAI was mentioned too, which apparently has support for CentOS now.
One .EDU admin gloms all his lab's desktops into the cluster, and uses Condor to tie it all together. "If it's idle, it's part of the cluster." No head node, jobs can be submitted from anywhere, and the dev environment matches the run environment. There's a wide mix of hardware, so part of user education is a) getting people to specify minimal CPU and memory requirements and b) letting them know that the ideal job is 2 hours long. (Actually, there were a lot of people who talked about high-turnover jobs like that, which is different from what I expected; I always thought of HPC as letting your cluster go to town for 3 weeks on something. Perhaps that's a function of my lab's work, or having a smaller cluster.)
User education was something that came up over and over again: telling people how to efficiently use the cluster, how to tweak settings (and then vetting jobs with scripts).
I asked about how people learned about HPC; there's not nearly the wealth of resources that there are for programming, sysadmin, networking, etc. Answer: yep, it's pretty quiet out there. Mailing lists tend to be product-specific (though are pretty excellent), vendor training is always good if you can get it, but generally you need to look around a lot. ACM has started a SIG for HPC.
I asked about checkpointing, which was something I've been very fuzzy about. Here's the skinny:
Checkpointing is freezing the process so that you can resurrect it later. It protects against node failures (maybe with automatic moving of the process/job to another node if one goes down) and outages (maybe caused by maintenance windows.)
Checkpointing can be done at a few different layers:
* The easiest and best by far is for the app to do it. It knows its
state intimately and is in the best position to do this. However,
the app needs to support this. Not necessary to have it explicitly
save the process (as in, kernel-resident memory image, registers,
etc); if it can look at logs or something and say "Oh, I'm 3/4
done", then that's good too.
* The Condor scheduler supports this, *but* you have to do this by
linking in its special libraries when you compile your program. And
none of the big vendors do this (Matlab, Mathematica, etc).
* BLCR: "It's 90% working, but the 10% will kill you." Segfaults,
restarts only work 2/3 of the time, etc. Open-source project from a
federal lab and until very recently not funded -- so the response to
"There's this bug..." was "Yeah, we're not funded. Can't do nothing
for you." Funding has been obtained recently, so keep your fingers
crossed.
One admin had problems with his nodes: random slowdowns, not caused
by cstates or the other usual suspects. It's a BIOS problem of some
sort and they're working it out with the vendor, but in the meantime
the only way around it is to pull the affected node and let the power
drain completely. This was pointed out by a user ("Hey, why is my job
suddenly taking so long?") who was clever enough to write a
dirt-simple 10 million iteration for-loop that very, very obviously
took a lot longer on the affected node than the others. At this point
I asked if people were doing regular benchmarking on their clusters to
pick up problems like this. Answer: no. They'll do benchmarking on
their cluster when it's stood up so they have something to compare it
to later, but users will unfailingly tell them if something's slow.
I asked about HPL; my impression when setting up the cluster was, yes,
benchmark your own stuff, but benchmark HPL too 'cos that's what you
do with a cluster. This brought up a host of problems for me, like
compiling it and figuring out the best parameters for it. Answers:
* Yes, HPL is a bear. Oak Ridge: "We've got someone for that and
that's all he does." (Response: "That's your answer for everything
at Oak Ridge.")
* Fiddle with the params P, Q and N, and leave the rest alone. You
can predict the FLOPS you should get on your hardware, and if you
get within 90% or so of that you're fine. (There's a worked example
after this list.)
* HPL is not that relevant for most people, and if you tune your
cluster for linear algebra (which is what HPL does) you may get
crappy performance on your real work.
* You can benchmark it if you want (and download Intel's binary if you
do; FIXME: add link), but it's probably better and easier to stick
to your own apps.
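On the "predict the FLOPS" bit, the arithmetic is just
peak = nodes x cores x clock x FLOPs-per-cycle, and you want your HPL
result to land within 90% or so of that. A made-up example for a small
cluster (all numbers hypothetical; FLOPs/cycle depends on the CPU
generation):

```
# 8 nodes x 12 cores x 2.6 GHz x 8 double-precision FLOPs/cycle, in GFLOPS
# (numbers are invented for illustration)
echo "8 * 12 * 2.6 * 8" | bc        # 1996.8 GFLOPS theoretical peak
echo "8 * 12 * 2.6 * 8 * 0.9" | bc  # ~1797 GFLOPS would count as fine
```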
Random:
* There's a significant number of clusters that expose interactive
sessions to users via qlogin; that had not occurred to me.
* Recommended tools:
* ubmod: accounting graphs
* Healthcheck scripts (Warewulf)
* stress: cluster stress test tool
* munin: to collect arbitrary info from a machine
* collectl: good for e.g. millisecond resolution of traffic spikes
* "So if a box gets knocked over -- and this is just anecdotal -- my
experience is that the user that logs back in first is the one who
caused it."
* A lot of the discussion was prompted by questions like "Is anyone
else doing X?" or "How many people here are doing Y?" Very helpful.
* If you have to return warranty-covered disks to the vendor but you
really don't want the data to go, see if they'll accept the metal
cover of the disk. You get to keep the spinning rust.
* A lot of talk about OOM-killing in the bad old days ("I can't tell
you how many times it took out init."). One guy insisted it's a lot
better now (3.x series).
* "The question of changing schedulers comes up in my group every six
months."
* "What are you doing for log analysis?" "We log to /dev/null."
(laughter) "No, really, we send syslog to /dev/null."
* Splunk is eye-wateringly expensive: 1.5 TB data/day =~ $1-2 million
annual license.
* On how much disk space Oak Ridge has: "It's...I dunno, 12 or 13 PB?
It's 33 tons of disks, that's what I remember."
* Cheap and cheerful NFS: OpenSolaris or FreeBSD running ZFS. For
extra points, use an Aztec Zeus for a ZIL: a battery-backed 8GB
DIMM that dumps to a compact flash card if the power goes out.
* Some people monitor not just for overutilization, but for
underutilization: it's a chance for user education ("You're paying
for my time and the hardware; let me help you get the best value for
that"). For Oak Ridge, though, there's less pressure for that:
scientists get billed no matter what.
* "We used to blame the network when there were problems. Now their
app relies on SQL Server and we blame that."
* Sweeping for expired data is important. If it's scratch, then
*treat* it as such: negotiate expiry dates and sweep regularly.
* Celebrity resemblances: Michael Moore and the guy from Dead Poet's
Society/The Good Wife. (Those are two different sysadmins, btw.)
* Asked about my .TK file problem; no insight. Take it to the lists.
(Don't think I've written about this, and I should.)
* On why one lab couldn't get Vendor X to supply DKMS kernel modules
for their hardware: "We're three orders of magnitude away from
their biggest customer. We have *no* influence."
* Another vote for SoftwareCarpentry.org as a way to get people up to
speed on Linux.
* A lot of people encountered problems upgrading to Torque 4.x and
rolled back to 2.5. "The source code is disgusting. Have you ever
looked at it? There's 15 years of cruft in there. The devs
acknowledged the problem and announced they were going to be taking
steps to fix things. One step: they're migrating to C++.
[Kif sigh]"
* "Has anyone here used Moab Web Services? It's as scary as it sounds.
Tomcat...yeah, I'll stop there." "You've turned the web into RPC. Again."
* "We don't have regulatory issues, but we do have a
physicist/geologist issue."
* 1/3 of the Top 500 use SLURM as a scheduler. Slurm's srun =~
Torque's pbsdsh; I have the impression it does not use MPI (well,
okay, neither does Torque, but a lot of people use Torque + mpirun),
but I really need to do more reading.
* lmod (FIXME: add link) is an Environment Modules-compatible (works
with old module files) replacement that fixes some problems with old
EM, actively developed, written in Lua.
* People have had lots of bad experiences with external Fermi GPU
boxes from Dell, particularly when attached to non-Dell equipment.
* Puppet has git hooks that let you pull out a particular branch on a node.
And finally:
Q: How do you know you're with a Scary Viking Sysadmin?
A: They ask for Thor's Skullsplitter Mead at the Google Bof.
Hotel in Arizona made us all wanna feel like stars...
"Hotel Arizona", Wilco
Sunday morning I was down in the lobby at 7.15am, drinking coffee purchased with my $5 gift certificate from the hotel for passing up housekeeping ("Sheraton Hotels Green Initiative"). I registered for the conference, came back to my hotel room to write some more, then back downstairs to wait for my tutorial on Amazon Web Services from Bill LeFebvre (former LISA chair and author of top(1)) and Marc Chianti. It was pretty damned awesome: an all-day course that introduced us to AWS and the many, many services they offer. For reasons that vary from budgeting to legal we're unlikely to move anything to AWS at $WORK, but it was very, very enlightening to learn more about it. Like:
Amazon lights up four new racks a day, just keeping up with increased demand.
Their RDS service (DB inna box) will set up replication automagically AND apply patches during configurable regular downtime. WAH.
vmstat(1) will, for a VM, show CPU cycles stolen by/for other VMs in the ST column
Amazon will not really guarantee CPU specs, which makes sense (you're one guest on a host with 20 VMs, many hardware generations, etc). One customer they know will spin up a new instance and immediately benchmark it to see if performance is acceptable; if not, they'll destroy it and try again.
Netflix, one of AWS' biggest customers, does not use EBS (persistent) storage for its instances. If there's an EBS problem -- and this probably happens a few times a year -- they keep trucking.
It's quite hard to "burst into the cloud" -- to use your own data centre most of the time, then move stuff to AWS at Xmas, when you're Slashdotted, etc. The problem is: where's your load balancer? And how do you make that available no matter what?
One question I asked: How would you scale up an email service? 'Cos for that, you don't just need CPU power, but also (say) expanded disk space, shared across instances. A: Either do something like GlusterFS on instances to share FS, or just stick everything in RDS (AWS' MySQL service) and let them take care of it.
The instructors know their stuff and taught it well. If you have the chance, I highly recommend it.
Lunch/Breaks:
Met someone from Mozilla who told me that they'd just decommissioned the last of their community mirrors in favour of CDNs -- less downtime. They're using AWS for a new set of sites they need in Brazil, rather than opening up a new data centre or some such.
Met someone from a flash sale site: they do sales every day at noon, when they'll get a million visitors in an hour, and then it's quiet for the next 23 hours. They don't use AWS -- they've got enough capacity in their data centre for this, and they recently dropped another cloud provider (not AWS) because they couldn't get the raw/root/hypervisor-level performance metrics they wanted.
Saw members of (I think) this show choir wearing spangly skirts and carrying two duffel bags over each shoulder, getting ready to head into one of the ballrooms for a performance at a charity lunch.
Met a sysadmin from a US government/educational lab, talking about fun new legal constraints: to keep running the lab, the gov't required not a university but an LLC. For SLAC, that required a new entity called SLAC National Lab, because Stanford was already trademarked and you can't delegate a trademark like you can DNS zones. And, it turns out, we're not the only .edu getting fuck-off prices from Oracle. No surprise, but still reassuring.
I saw Matt get tapped on the shoulder by one of the LISA organizers and taken aside. When he came back to the table he was wearing a rubber Nixon mask and carrying a large clanking duffel bag. I asked him what was happening and he said to shut up. I cried, and he slapped me, then told me he loved me, that it was just one last job and it would make everything right. (In the spirit of logrolling, here he is scoping out bank guards:
Where does the close bracket go?)
After that, I ran into my roommate from the Baltimore LISA in 2009 (check my tshirt...yep, 2009). Very good to see him. Then someone pointed out that I could get free toothpaste at the concierge desk, and I was all like, free toothpaste?
And then who should come in but Andy Seely, Tampa Bay homeboy and LISA Organizing Committee member. We went out for beer and supper at Karl Strauss (tl;dr: AWESOME stout). Discussed fatherhood, the ageing process, free-range parenting in a hangar full of B-52s, and just how beer is made. He got the hang of it eventually:
I bought beer for my wife, he took a picture of me to show his wife, and he shared his toothpaste by putting it on a microbrewery coaster so I didn't have to pay $7 for a tube at the hotel store, 'cos the concierge was out of toothpaste. It's not a euphemism.
Q: How do you know you're with a Scary Viking Sysadmin?
A: They insist on hard drive destruction via longboat funeral pyre.
Wasted days, wasted nights
Try to downplay being uptight...
-- "(nothinsevergonnastandinmyway) Again", Wilco
Saturday I headed out the door at 5.30am -- just like I was going into work early. I'd been up late the night before finishing up "Zone One" by Colson Whitehead, which ZOMG is incredible and you should read, but I did not want to read while alone and feeling discombobulated in a hotel room far from home. Cab to the airport, and I was surprised to find I didn't even have to opt out; the L3 scanners were only being used irregularly. I noticed the hospital curtains set up for the private screening area; it looked a bit like God's own shower curtain.
The customs guard asked me where I was going, and whether I liked my job. "That's important, you know?" Young, a shaved head and a friendly manner. Confidential look left, right, then back at me. "My last job? I knew when it was time to leave that one. You have a good trip."
The gate for the airline I took was way out on a side wing of the airport, which I can only assume meant that airline lost a coin toss or something. The flight to Seattle was quick and low, so it wasn't until the flight to San Diego that a) we climbed up to our cruising altitude of $(echo "39000/3.3" | bc) 11818 meters and b) my ears started to hurt. I've got a cold and thought that my aggressive taking of cold medication would help, but no. The first seatmate had a shaved head, a Howie Mandel soul patch, a Toki watch and read "Road and Track" magazine, staring at the ads for mag wheels; the other seatmate announced that he was in the Navy, going to his last command, and was going to use the seat tray as a headrest as soon as they got to cruising. "I was up late last night, you know?" I ate my Ranch Corn Nuggets (seriously).
Once at the hotel, I ran into Bob the Norwegian, who berated me for being surprised that he was there. "I've TOLD you this over and over again!" Not only that, but he was there with three fellow Norwegian sysadmins, including his minion. I immediately started composing Scary Viking Sysadmin questions in my head; you may begin to look forward to them.
We went out to the Gaslamp district of San Diego, which reminds me a lot of Gastown in Vancouver; very familiar feel, and a similar arc to its history. Alf the Norwegian wanted a hat for cosplay, so we hit two -- TWO -- hat stores. The second resembled nothing so much as a souvenir shop in a tourist town, but the first was staffed by two hipsters looking like they'd stepped straight out of Instagram:
They sold $160 Panama hats. I very carefully stayed away from the merchandise. Oh -- and this is unrelated -- from the minibar in my hotel room:
We had dinner at a restaurant whose name I forget; stylish kind of place, with ten staff members (four of whom announced, separately, that they would be our server for the night). They seemed disappointed when I ordered a Wipeout IPA ("Yeah, we're really known more for our Sangria"), but Bob made up for it by ordering a Hawaiian Hoo-Hoo:
We watched the bar crawlers getting out of cabs dressed in Sexy Santa costumes ("The 12 Bars of Xmas Pub Crawl 2012") and discussed Agile Programming (which phrase, when embedded in a long string of Norwegian, sounds a lot like "Anger Management".)
Q: How do you know you're with a Scary Viking Sysadmin?
A: They explain the difference between a fjord and a fjell in terms of IPv6 connectivity.
There was also this truck in the streets, showing the good folks of San Diego just what they were missing by not being at home watching Fox Sports:
We headed back to the hotel, and Bob and I waited for Matt to show up. Eventually he did, with Ben Cotton in tow (never met him before -- nice guy, gives Matt as much crap as I do -> GOOD) and Matt regaled us with tales of his hotel room:
Matt: So -- I don't wanna sound special or anything -- but is your room on the 7th floor overlooking the pool and the marina with a great big king-sized bed? 'Cos mine is.
Me: Go on.
Matt: I asked the guy at the desk when I was checking in if I could get a king-size bed instead of a double --
Me: "Hi, I'm Matt Simmons. You may know me from Standalone Hyphen Sysadmin Dot Com?"
Ben: "I'm kind of a big deal on the Internet."
Matt: -- and he says sure, but we're gonna charge you a lot more if you trash it.
Not Matt's balcony:
(UPDATE: Matt read this and said "Actually, I'm on the 9th floor? Not the 7th." saintaardvarkthecarpeted.com regrets the error.)
I tweeted from the bar using my laptop ("It's an old AOLPhone prototype"). It was all good.
My friend Andy, who blogs at Tampa Bay Breakfasts, got an article written about him here. Like his blog, it's good reading. You should read both.
He's also a sysadmin who's on the LISA organizing committee this year, and I'm going to be seeing him in a few days when I head down to San Diego. The weather is looking shockingly good for this Rain City inhabitant. I'm looking forward to it. Now I just have to pick out my theme band for this year's conference....I'm thinking maybe Josh Rouse.
Yesterday I finally moved the $WORK mail server (well, services) from a workstation under my desk to a proper VM and all. Mailman, Postfix, Dovecot -- all went. Not only that, but I've got them running under SELinux no less. Woot!
Next step was to update all the documentation, or at least most of it, that referred to the old location. In the process I came across something I'd written in advance of the last time I went to LISA: "My workstation is not important. It does no services. I mention this so that no one will panic if it goes down."
Whoops: not true! While migrating to Cfengine 3, I'd set up the Cf3 master server on my workstation. After all, it was only for testing, right? Heh. We all know how that goes. So I finally bit the bullet and moved it over to a now-even-more-important VM (no, not the mail server) and put the policy files under /masterfiles so that bootstrapping works. Now we're back to my workstation only holding my stuff. Hurrah!
And did I mention that I'm going to LISA? True story. Sunday I'm doing Amazon Web Services training; Monday I'm in the HPC workshop; Tuesday I'm doing Powershell Fundamentals (time to see how the other half lives, and anyway I've heard good things about Powershell) and Ganeti (wanted to learn about that for a while). As for the talks: I'm not as overwhelmed this year, but the Vint Cerf speech oughta be good, and anyhow I'm sure there will be lots I can figure out on the day.
Completely non-tech-related link of the day: "On Drawing". This woman is an amazing writer.
I've been asked to revisit Hadoop at $WORK. About a year ago I got a small cluster (3 nodes) working and was plugging away at Myrna...but then our need for Myrna disappeared, and Hadoop was left fallow. The need this time around seems more permanent.
So far I'm trying to get a simple streaming job working. The initial script is pretty simple:
samtools view input.bam | cut -f 3 | uniq -c | sed 's/^[ \t]*//' | sort -k1,1nr > output.txt
This breaks down to: `samtools view` turns the binary BAM file into text SAM records; `cut -f 3` pulls out the reference-sequence column; `uniq -c` counts runs of identical reference names; `sed` strips the leading whitespace that `uniq` adds; and `sort -k1,1nr` orders the result by count, descending.
which, invoked Hadoop-style, should be:

```
hstream -input input.bam \
    -file mapper.sh -mapper "mapper.sh" \
    -file reducer.sh -reducer "reducer.sh" \
    -output output.txt
```
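The post doesn't show mapper.sh and reducer.sh themselves; a plausible split of the pipeline above -- purely my own guess at their contents -- would be something like:

```
#!/bin/bash
# mapper.sh (hypothetical): emit the reference-sequence name (SAM column 3)
# for each record arriving on stdin
cut -f 3
```

```
#!/bin/bash
# reducer.sh (hypothetical): count runs of identical reference names, strip the
# leading whitespace uniq adds, and sort by count, descending
uniq -c | sed 's/^[ \t]*//' | sort -k1,1nr
```

One caveat: Hadoop streaming feeds the mapper plain text lines, and a BAM file is binary, so the `samtools view` step would have to happen somewhere before the mapper sees the data -- which may even be related to the broken pipe below.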
Running the mapper.sh/reducer.sh files on their own works fine; under Hadoop, though, it fails:
    2012-11-06 12:07:30,106 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
    2012-11-06 12:07:30,110 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
    2012-11-06 12:07:30,111 WARN org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Broken pipe
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
I'm unsure right now if that's [this error][3] or something else I've
done wrong. Oh well, it'll be fun to turn on debugging and see what's
going on under the hood...
...unless, of course, unless I'm wasting my time. A quick search
turned up a number of Hadoop-based bioinformatics tools
([Biodoop][4], [Seqpiq][5] and [Hadoop-Bam][6]), and I'm sure there
are a crapton more.
Other chores:
* Duplicating pythonbrew/modules work on another server since our
cluster is busy
* Migrating our mail server to a VM
* Setting up printing accounting with Pykota (latest challenge:
dealing with usernames that aren't in our LDAP tree)
* Accumulated paperwork
* Renewing lapsed support on a Very Important Server
Oh well, at least I'm registered for [LISA][7]. Woohoo!
I've been catching up on the talks at LISA last year, and one of them was Mark Burgess' talk "3 Myths and 3 Challenges to Bring System Administration out of the Dark Ages". (Anyone else reminded of "7 things about lawyers the occult can't explain?") If I'd been there, I'd have made this comment; as it is, I'll leave it here.
One of his points was that in this brave new world, we need to let go of serialism ("A follows B follows C, and that's Just The Way It Is(tm)"). That's the old way of thinking, he said, the Industrial way; we can do much more in parallel than we ever could in serial.
It occurs to me that it might be better to say that it's needless serialism we can let go of. Like a Makefile: the final executable depends on all the object files; without them, there's no sense trying to create it. But the object files typically depend on a file or two each (a .c and .h file, say), and there's no reason they can't be compiled in parallel ("make -j9"). Dependencies are there for a reason, and it is no bad thing to hold on to them.
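To make that concrete, here's a minimal shell sketch of the same idea (file names made up): the independent compiles run in parallel, but the link keeps its dependency on all of them.

```
# compile the independent object files in parallel...
gcc -c foo.c -o foo.o &
gcc -c bar.c -o bar.o &
wait                       # ...but honour the dependency: don't link until both exist
gcc foo.o bar.o -o prog    # the serialism that's actually needed
```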
(Kinda like the misquoting of Emerson. Often, you hear "Consistency is the hobgoblin of little minds." But the quote actually begins "A foolish consistency..." And now, having demonstrated my superiority by quoting Wikipedia, I will disappear up my own ass.)
How many times have I tried
Just to get away from you, and you reel me back?
How many times have I lied
That there's nothing that I can do?
-- Sloan
Friday morning started with a quick look at Telemundo ("Próxima: Esclavas del sexo!"), then a walk to Phillz coffee. This time I got the Tesora blend (their hallmark) and wow, that's good coffee. Passed a woman pulling two tiny dogs across the street: "C'mon, peeps!" Back at the tables I checked my email and got an amazing bit of spam about puppies, and how I could buy some rare breeds for ch33p.
First up was the Dreamworks talk. But before that, I have to relate something.
Earlier in the week I ran into Sean Kamath, who was giving the talk, and told him it looked interesting and that I'd be sure to be there. "Hah," he said, "Wanna bet? Tom Limoncelli's talk is opposite mine, and EVERYONE goes to a Tom Limoncelli talk. There's gonna be no one at mine."
Then yesterday I happened to be sitting next to Tom during a break, and he was discussing attendance at the different presentations. "Mine's tomorrow, and no one's going to be there." "Why not?" "Mine's opposite the Dreamworks talk, and EVERYONE goes to Dreamworks talks."
Both were quite amused -- and possibly a little relieved -- to learn what the other thought.
But back at the ranch: FIXME in 2008, Sean gave a talk on Dreamworks and someone asked afterward "So why do you use NFS anyway?" This talk was meant to answer that.
So, why? Two reasons:
They use lots of local caching (their filers come from NetApp, and they also have a caching box), a global namespace, a data hierarchy (varying along the scales of fast, reliable and expensive), heavy leverage of the automounter, and 10Gb core links everywhere -- and it works.
FTP/rcp/rdist? Nope. SSH? Won't handle the load. AFS lacks commercial support -- and it's hard to get the head of a billion-dollar business to buy into anything without commercial support.
They cache for two reasons: global availability and scalability. First, people in different locations -- like on different sides of the planet (oh, what an age we live in!) -- need access to the same files. (Most data has location affinity, but this will not necessarily be true in the future.) Geographical distribution and the speed of light do cause some problems: while data reads and getattr()s are helped a lot by the caches, first opens, sync()s and writes are slow when the file is in India and it's being opened in Redwood City. They're thinking about improvements to the UI to indicate what's happening, to reduce user frustration. But overall, it works and works well.
Scalability is just as important: thousands of machines hitting the same filer will melt it, and the way scenes are rendered, you will have just that situation. Yes, caching adds latency, but it's still faster than an overloaded filer. (It also requires awareness of close-to-open consistency.)
Automounter abuse is rampant at DW; if one filer is overloaded, they move some data somewhere else and change the automount maps. (They're grateful for the automounter version in RHEL 5: it no longer requires that the node be rebooted to reload the maps.) But like everything else it requires a good plan, or it gets confusing quickly.
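As a rough sketch of what that looks like with autofs (filer names, volumes and options here are my own invention, not DreamWorks'):

```
# /etc/auto.master: anything under /data gets mounted on demand from the map below
/data   /etc/auto.data

# /etc/auto.data: each key points at whichever filer currently holds that data.
# Shifting load = copy the data, edit one line here; RHEL 5's autofs rereads the
# map without a reboot.
show1   -rw,hard,intr   filer07:/vol/show1
show2   -rw,hard,intr   filer12:/vol/show2
```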
Oh, and quick bit of trivia: they're currently sourcing workstations with 96GB of RAM.
One thing he talked about was that there are two ways to do sysadmin: rule-enforcing and policy-driven ("No!") or creative, flexible approaches to helping people get their work done. The first is boring; the second is exciting. But it does require careful attention to customers' needs.
So for example: the latest film DW released was "Megamind". This project was given a quota of 85 TB of storage; they finished the project with 75 TB in use. Great! But that doesn't account for the 35 TB of global temp space that they used.
When global temp space was first brought up, the admins said, "So let me be clear: this is non-critical and non-backed up. Is that okay with you?" "Oh sure, great, fine." So the admins bought cheap-and-cheerful SATA storage: not fast, not reliable, but man it's cheap.
Only it turns out that non-backed up != non-critical. See, the artists discovered that this space was incredibly handy during rendering of crowds. And since space was only needed overnight, say, the space used could balloon up and down without causing any long-term problems. The admins discovered this when the storage went down for some reason, and the artists began to cry -- a day or two of production was lost because the storage had become important to one side without the other realizing it.
So the admins fixed things and moved on, because the artists need to get things done. That's why he's there. And if he does his job well, the artists can do wonderful things. He described watching "Madagascar", and seeing the crowd scenes -- the ones the admins and artists had sweated over. And they were good. But the rendering of the water in other scenes was amazing -- it blew him away, it was so realistic. And the artists had never even mentioned that; they'd just made magic.
Understand that your users are going to use your infrastructure in ways you never thought possible; what matters is what gets put on the screen.
Challenges remain:
Sometimes data really does need to be at another site, and caching doesn't always prevent problems. And problems in a data render farm (which is using all this data) tend to break everything else.
Much needs to be automated: provisioning, re-provisioning and allocating storage is mostly done by hand.
Disk utilization is hard to get in real time with > 4 PB of storage worldwide; it can take 12 hours to get a report on usage by department on 75 TB, and that doesn't make the project managers happy. Maybe you need a team for that...or maybe you're too busy recovering from knocking over the filer by walking 75 TB of data to get usage by department.
Notifications need to be improved. He'd love to go from "Hey, a render farm just fell over!" to "Hey, a render farm's about to fall over!"
They still need configuration management. They have a homegrown one that's working so far. QOTD: "You can't believe how far you can get with duct tape and baling wire and twine and epoxy and post-it notes and Lego and...we've abused the crap out of free tools."
I went up afterwards and congratulated him on a good talk; his passion really came through, and it was amazing to me that a place as big as DW uses the same tools I do, even if it is on a much larger scale.
I highly recommend watching his talk (FIXME: slides only for now). Do it now; I'll be here when you get back.
During the break I got to meet Ben Rockwood at last. I've followed his blog for a long time, and it was a pleasure to talk with him. We chatted about Ruby on Rails, Twitter starting out on Joyent, upcoming changes in Illumos now that they've got everyone from Sun but Jonathan Schwartz (no details except to expect awesome and a renewed focus on servers, not desktops), and the joke that Joyent should just come out with it and call itself "Sun". Oh, and Joyent has an office in Vancouver. Ben, next time you're up drop me a line!
Next up: Twitter. 165 million users, 90 million tweets per day, 1000 tweets per second....unless the Lakers win, in which case it peaks at 3085 tweets per second. (They really do get TPS reports.) 75% of those are by API -- not the website. And that percentage is increasing.
Lessons learned:
Nothing works the first time; scale using the best available tech and plan to build everything more than once.
(Cron + ntp) x many machines == enough load on, say, the central syslog collector to cause micro outages across the site; splaying the start times helps (see the sketch after this list). (Oh, and speaking of logging: don't forget that syslog truncates messages bigger than the packet MTU.)
RRDtool isn't good for them, because by the time you want to figure out what that one-minute outage was about two weeks ago, RRDtool has averaged away the data. (At this point Tobi Oetiker, a few seats down from me, said something I didn't catch. Dang.)
Ops mantra: find the weakest link; fix; repeat. OPS stats: MTTD (mean time to detect problem) and MTTR (MT to recover from problem).
It may be more important to fix the problem and get things going again than to have a post-mortem right away.
At this scale, at this time, system administration turns into a large programming project (because all your info is in your config. mgt tool, correct?). They use Puppet + hundreds of Puppet modules + SVN + post-commit hooks to ensure code reviews.
Occasionally someone will make a local change, then change permissions so that Puppet won't change it. This has led to a sysadmin mantra at Twitter: "You can't chattr +i with broken fingers."
Curve fitting and other basic statistical tools can really help -- they were able to predict the Twitpocalypse (first tweet ID > 2^32) to within a few hours.
Decomposition is important to resiliency. Take your app and break it into n different independent, non-interlocked services. Put each of them on a farm of 20 machines, and now you no longer care if a machine that does X fails; it's no longer the machine that does X.
Because of this Nagios was not a good fit for them; they don't want to be alerted about every single problem, they want to know when 20% of the machines that do X are down.
Config management + LDAP for users and machines at an early, early stage made a huge difference in ease of management. But this was a big culture change, and management support was important.
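Back to that cron-plus-NTP thundering herd: the usual fix (my sketch, not necessarily Twitter's) is to give every host a stable, pseudo-random delay before it phones home, so thousands of perfectly synced clocks don't all hit the collector at once.

```
#!/bin/sh
# splay.sh (hypothetical): sleep a host-specific 0-239 seconds, then run the real job,
# so NTP-synced hosts don't stampede the central syslog box at the top of the minute
sleep $(( $(hostname | cksum | cut -d' ' -f1) % 240 ))
exec "$@"
```

Cron then calls `splay.sh /usr/local/bin/report` (or whatever the job is) instead of the job directly.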
And then...lunch with Victor and his sister. We found Good Karma, which had really, really good vegan food. I'm definitely a meatatarian, but this was very tasty stuff. And they've got good beer on tap; I finally got to try Pliny the Elder, and now I know why everyone tries to clone it.
Victor talked about one of the good things about config mgt for him: yes, he's got a smaller number of machines, but when he wants to set up a new VM to test something or other, he can get that many more tests done because he's not setting up the machine by hand each time. I hadn't thought of this advantage before.
After that came the Facebook talk. I paid a little less attention to this, because it was the third ZOMG-they're-big talk I'd been to today. But there were some interesting bits:
Everyone talks about avoiding hardware as a single point of failure, but software is a single point of failure too. Don't compound things by pushing errors upstream.
During the question period I asked them if it would be totally crazy to try different versions of software -- something like the security papers I've seen that push web pages through two different VMs to see if any differences emerge (though I didn't put it nearly so well). Answer: we push lots of small changes all the time for other reasons (problems emerge quickly, so easier to track down), so in a way we do that already (because of staged pushes).
Because we've decided to move fast, it's inevitable that problems will emerge. But you need to learn from those problems. The Facebook outage was an example of that.
Always do a post-mortem when problems emerge, and if you focus on learning rather than blame you'll get a lot more information, engagement and good work out of everyone. (And maybe the lesson will be that no one was clearly designated as responsible for X, and that needs to happen now.)
The final speech of the conference was David Blank-Edelman's keynote on the resemblance between superheroes and sysadmins. I watched for a while and then left. I think I can probably skip closing keynotes in the future.
And then....that was it. I said goodbye to Bob the Norwegian and Claudio, then I went back to my room and rested. I should have slept but I didn't; too bad, 'cos I was exhausted. After a while I went out and wandered around San Jose for an hour to see what I could see. There was the hipster cocktail bar called "Cantini's" or something; billiards, flood pants, cocktails, and the sign on the door saying "No tags -- no colours -- this is a NEUTRAL ZONE."
I didn't go there; I went to a generic looking restaurant with room at the bar. I got a beer and a burger, and went back to the hotel.
I missed my chance, but I think I'm gonna get another...
-- Sloan
Thursday morning brought Brendan Gregg's (nee Sun, then Oracle, and now Joyent) talk about data visualization. He introduced himself as the shouting guy, and talked about how heat maps allowed him to see what the video demonstrated in a much more intuitive way. But in turn, these require accurate measurement and quantification of performance: not just "I/O sucks" but "the whole op takes 10 ms, 1 of which is CPU and 9 of which is latency."
Some assumptions to avoid when dealing with metrics:
The available metrics are correctly implemented. Are you sure there's not a kernel bug in how something is measured? He's come across them.
The available metrics are designed by performance experts. Mostly, they're kernel developers who were trying to debug their work, and found that their tool shipped.
The available metrics are complete. Unless you're using DTrace, you simply won't always find what you're looking for.
He's not a big fan of using IOPS to measure performance. There are a lot of questions when you start talking about IOPS. Like what layer?
(He didn't add political and financial, but I think that would have been funny.)
Once you've got a number, what's good or bad? The number can change radically depending on things like library/filesystem prefetching or readahead (IOPS inflation), read caching or write cancellation (deflation), the size of a read (he had an example demonstrating how measured capacity/busy-ness changes depending on the size of reads)...probably your company's stock price, too. And iostat or your local equivalent averages things, which means you lose outliers...and those outliers are what slow you down.
IOPS and bandwidth are good for capacity planning, but latency is a much better measure of performance.
And what's the best way of measuring latency? That's right, heatmaps. Coming from someone who worked on Fishworks, that's not surprising, but he made a good case. It was interesting to see how it's as much art as science...and given that he's exploiting the visual cortex to make things clear that never were, that's true in a few different ways.
This part of the presentation was so visual that it's best for you to go view the recording (and anyway, my notes from that part suck).
During the break, I talked with someone who had worked at Nortel before it imploded. Sign that things were going wrong: new execs come in (RUMs: Redundant Unisys Managers) and alla sudden everyone is on the chargeback model. Networks charges ops for bandwidth; ops charges networks for storage and monitoring; both are charged by backups for backups, and in turn are charged by them for bandwidth and storage and monitoring.
The guy I was talking to figured out a way around this, though. Backups had a penalty clause for non-performance that no one ever took advantage of, but he did: he requested things from backup and proved that the backups were corrupt. It got to the point where the backup department was paying his department every month. What a clusterfuck.
After that, a quick trip to the vendor area to grab stickers for the kids, then back to the presentations.
Next was the 2nd day of Practice and Experience Reports ("Lessons Learned"). First up was the network admin (?) for ARIN about IPv6 migration. This was interesting, particularly as I'd naively assumed that, hey, they're ARIN and would have no problems at all on this front...instead of realizing that they're out in front to take a bullet for ALL of us, man. Yeah. They had problems, they screwed up a couple times, and came out battered but intact. YEAH!
Interesting bits:
Routing is not as reliable, not least because for a long time (and perhaps still) admins were treating IPv6 as an experiment, something still in beta: there were times when whole countries in Europe would disappear off the map for weeks as admins tried out different things.
Understanding ICMPv6 is a must. Naive assumptions brought over from IPv4 firewalls like "Hey, let's block all ICMP except ping" will break things in wonderfully subtle ways. Like: it's up to the client, not the router, to fragment packets. That means the client needs to discover the path MTU. That depends on ICMPv6 (see the sketch after this list).
Not all transit is equal; ask your vendor if they're using the same equipment to route both protocols, or if IPv6 is on the old, crappy stuff they were going to eBay. Ask if they're using tunnels; tunnels aren't bad in themselves, but can add multiple layers to things and make things trickier to debug. (This goes double if you've decided to firewall ICMPv6...)
They're very happy with OpenBSD's pf as an IPv6 firewall.
Dual-stack OSes make things easy, but can make policy complex. Be aware that DHCPv6 is not fully supported (and yes, you need it to hand out things like DNS and NTP), and some clients (believe he said XP) would not do DNS lookups over v6 -- only v4, though they'd happily go to v6 servers once they got the DNS records.
IPv6 security features are a double-edged sword: yes, you can set up encrypted VPNs, but so can botnets. Security vendors are behind on this; he's watching for neat tricks that'll allow you to figure out private keys for traffic and thus decrypt eg. botnet C&C, but it's not there yet. (My notes are fuzzy on this, so I may have it wrong.)
Multicast is an attack-and-discovery protocol, and he's a bit worried about a possible return of reflection attacks (Smurfv6). He's hopeful that the many, many lessons learned since then mean it won't happen, but it is a whole new protocol and set of stacks for baddies to explore and discover. (RFC 4942 was apparently important...)
Proxies are good for v4-only hosts: mod_proxy, squid and 6tunnel have worked well (6tunnel in particular).
Gotchas: reverse DNS can be painful, because v6 macros/generate statements don't work in BIND yet; IPv6 takes precedence in most cases, so watch for SSH barfing when it suddenly starts seeing new hosts.
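For that ICMPv6 point above, here's a minimal Linux-flavoured illustration of what "don't just block it all" means -- an illustrative rule set of my own, not anything ARIN presented:

```
# path-MTU discovery is the client's job in v6, so packet-too-big MUST get through
ip6tables -A INPUT -p ipv6-icmp --icmpv6-type packet-too-big          -j ACCEPT
# neighbour discovery replaces ARP; block it and hosts can't even find their router
ip6tables -A INPUT -p ipv6-icmp --icmpv6-type neighbour-solicitation  -j ACCEPT
ip6tables -A INPUT -p ipv6-icmp --icmpv6-type neighbour-advertisement -j ACCEPT
# and yes, ping
ip6tables -A INPUT -p ipv6-icmp --icmpv6-type echo-request            -j ACCEPT
```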
Next up: Internet on the Edge, a good war story about bringing wireless through trees for DARPA that won best PER. Worth watching. (Later on, I happened across the person who presented and his boss in the elevator, and I congratulated him on his presentation. "See?" says his boss, and digs him in the ribs. "He didn't want to present it.")
Finally there was the report from one of the admins who helped set up Blue Gene 6, purchased from IBM. (The speaker was much younger than the others: skinny, pale guy with a black t-shirt that said GET YOUR WAR ON. "If anyone's got questions, I'm into that...") This report was extremely interesting to me, especially since I've got a purchase of a (much, much smaller) cluster coming up.
Blue Gene is a supercomputer with something like 10k nodes, and it uses 10Gb/s Myrinet/Myricom (FIXME: Clarify which that is) cards/network for communication. Each node does source routing, and so latency is extremely low, throughput correspondingly high, and core routers correspondingly simple. To make this work, every card needs to have a map of the network so they know where to send stuff, and that map needs to be generated by a daemon that then distributes the map everywhere. Fine, right? Wrong:
The Myricom switch is admin'd by a web interface only: no CLI of any sort, no logging to syslog, nothing. Using this web interface becomes impractical when you've got thousands of nodes...
There's an inherent fragility in this design: a problem with a card means you need to turn off the whole node; a problem with the mapping daemon means things can get corrupt real quick.
And guess what? They had problems with the cards: a bad batch of transceivers meant that, over the 2-year life of the machine, they lost a full year's worth of computing. It took a long time to realize the problem, it took a long time to get the vendor to realize it, and it took longer to get it fixed (FIXME: Did he ever get it fixed?)
So, lessons learned:
Vendor relations should not start with a problem. If the first time you call them up is to say "Your stuff is breaking", you're doomed. Dealing with vendor problems calls for social skills first, and tech skills second. Get to know more than just the sales team; get familiar with the tech team before you need them.
Know your systems inside and out before they break; part of their problem was not being as familiar with things as they should have been.
Have realistic expectations when someone says "We'll give you such a deal on this equipment!" That's why they went w/Myricom -- it was dirt cheap. They saved money on that, but it would have been better spent on hiring more people. (I realize that doesn't exactly make sense, but that's what's in my notes.)
Don't pay the vendor 'til it works. Do your acceptance testing, but be aware of subcontractor relations. In this case, IBM was providing Blue Gene but had subcontracted Myricom -- and already paid them. Oops, no leverage. (To be fair, he said that Myricom did help once they were convinced...but see the next point.)
Have an agreement in advance with your vendor about how much failure is too much. In their case, the failure rate was slow but steady, and Myricom kept saying "Oh, let's just let it shake out a little longer..." It took a lot of work to get them to agree to replace the cards.
Don't let vendors talk to each other through you. In their case, IBM would tell them something, and they'd have to pass that on to Myricom, and then the process would reverse. There were lots of details to keep track of, and no one had the whole picture. Setting up a weekly phone meeting with the vendors helped immensely.
Don't wait for the vendors to do your work. Don't assume that they'll troubleshoot something for you.
Don't buy stuff with a web-only interface. Make sure you can monitor things. (I'm looking at you, Dell C6500.)
Stay positive at all costs! This was a huge, long-running problem that impaired an expensive and important piece of equipment, and resisting pessimism was important. Celebrate victories locally; give positive feedback to the vendors; keep reminding everyone that you are making progress.
Question from me: How much of this advice depends on being involved in negotiations? Answer: maybe 50%; acceptance testing is a big part of it (and see previous comments about that) but vendor relations is the other part.
I was hoping to talk to the presenter afterward, but it didn't happen; there were a lot of other people who got to him first. :-) But what I heard (and heard again later from Victor) confirmed the low opinion of the Myrinet protocol/cards...man, there's nothing there to inspire confidence.
And after that came the talk by Adam Moskowitz on becoming a senior sysadmin. It was a list of (at times strongly) suggested skills -- hard, squishy, and soft -- that you'll need. Overarching all of it was the importance of knowing the business you're in and the people you're responsible to: why you're doing something ("it supports the business by making X, Y and Z easier" is the correct answer; "it's cool" is not), explaining it to the boss and the boss' boss, respecting the people you work with and not looking down on them because they don't know computers. Worth watching.
That night, Victor, his sister and I drove up to San Francisco to meet Noah and Sarah at the 21st Amendment brewpub. The drive took two hours (four accidents on the way), but it was worth it: good beer, good food, good friends, great time. Sadly I was not able to bring any back; the Noir et Blanc was awesome.
One good story to relate: there was an illustrator at the party who told us about (and showed pictures of) a coin she's designing for a client. They gave her the Three Wolves artwork to put on the coin. Yeah.
Footnotes:
+10 LART of terror. (Quote from Matt.)
I raise my glass to the cut-and-dried,
To the amplified
I raise my glass to the b-side.
-- Sloan, "A-Side Wins"
Tuesday morning I got paged at 4:30am about /tmp filling up on a webserver at work, and I couldn't get back to sleep after that. I looked out my window at Venus, Saturn, Spica and Arcturus for a while, blogged & posted, then went out for coffee. It was cold -- around 4 or 5C. I walked past the Fairmont and wondered at the expensive cars in their front parking space; I'd noticed something fancy happening last night, and I've been meaning to look it up.
Two buses with suits pulled up in front of the Convention Centre; I thought maybe there was going to be a rumble, but they were here for the Medevice Conference that's in the other half of the Centre. (The Centre, by the way, is enormous. It's a little creepy to walk from one end to the other, in this enormous empty marble hall, followed by Kenny G the whole way.)
And then it was tutorial time: Cfengine 3 all day. I'd really been looking forward to this, and it was pretty darn good. (Note to myself: fill out the tutorial evaluation form.) Mark Burgess his own bad self was the instructor. His focus was on getting things done with Cfengine 3: start small and expand the scope as you learn more.
At times it dragged a little; there was a lot of time spent on niceties of syntax and the many, many particular things you can do with Cf3. (He spent three minutes talking about granularity of time measurement in Cf3.)
Thus, by the 3rd quarter of the day we were only halfway through his 100+ slides. But then he sped up by popular request, and this was probably the most valuable part for me: explaining some of the principles underlying the language itself. He cleared up a lot of things that I had not understood before, and I think I've got a much better idea of how to use it. (Good thing, too, since I'm giving a talk on Cf2 and Cf3 for a user group in December.)
During the break, I asked him about the Community Library. This is a collection of promises -- subroutines, basically -- that do high-level things like add packages, or comment-out sections of a file. When I started experimenting with Cf3, I followed the tutorials and noticed that there were a few times where the CL promises had changed (new names, different arguments, etc). I filed a bug and the documentation was fixed, but this worried me; I felt like libc's printf() had suddenly been renamed showstuff(). Was this going to happen all the time?
The answer was no: the CL is meant to be immutable; new features are appended, and don't replace old ones. In a very few cases, promises have been rewritten if they were badly implemented in the first place.
At lunch, I listened to some people in Federal labs talk about supercomputer/big cluster purchases. "I had a thirty-day burnin and found x, y and z wrong..." "You had 30 days? Man, we only have 14 days." "Well, this was 10 years ago..." I was surprised by this; why wouldn't you take a long time to verify that your expensive hardware actually worked?
User pressure is one part; they want it now. But the other part is management. They know that vendors hate long burn-in periods, because there's a bunch of expensive shiny that you haven't been paid for yet getting banged around. So management will use this as a bargaining chip in the bidding process: we'll cut down burn-in if you'll give us something else. It's frustrating for the sysadmins; you hope management knows what they're doing.
I talked with another sysadmin who was in the Cf3 class. He'd recently gone through the Cf2 -> Cf3 conversion; it took 6 months and was very, very hard. Cf3 is so radically different from Cf2 that it took a long time to wrap his head around how it/Mark Burgess thought. And then they'd come across bugs in documentation, or bugs in implementation, and that would hold things up.
In fact, version 3.1 has apparently just come out, fixing a bug that he'd tripped across: inserting a file into the middle of another file truncated that file. Cf3 would divide the first file in two (as requested), insert the bit you wanted, then throw away the second half rather than glom it back on. Whoops.
As a result, they're evaluating Puppet -- yes, even after 6 months of effort to port...in fact, because it took 6 months of effort to port. And because Puppet does hierarchical inheritance, whereas Cf3 only does sets and unions of sets. (Which MB says is much more flexible and simple: do Java class hierarchies really simplify anything?)
After all of that, it was time for supper. Matt and I met up with a few others and headed to The Loft, based on some random tweet I'd seen. There was a long talk about interviews, and I talked to one of the people about what it's like to work in a secret/secretive environment.
Secrecy is something I keep bumping up against at LISAs; there are military folks, government folks (and not just US), and folks from private companies that just don't talk a lot about what they do. I'm very curious about all of this, but I'm always reluctant to ask...I don't want to put anyone in an awkward spot. OTOH, they're probably used to it.
After that, back to the hotels to continue the conversation with the rapidly dwindling supplies of free beer, then off to the Fedora 14 BoF that I promised Beth Lynn I'd attend. It was interesting, particularly the mention of Fedora CSI ("Tonight on NBC!"), a set of CC-licensed system administration documentation. David Nalley introduced it by saying that, if you change jobs every few years like he does, you probably find yourself building the same damn documentation from scratch over and over again. Oh, and the Fedora project is looking for a sysadmin after burning through the first one. Interesting...
And then to bed. I'm not getting nearly as much sleep here as I should.
Growing up was wall-to-wall excitement, but I don't recall
Another who could understand at all...
-- Sloan
Monday: day two of tutorials. I found Beth Lynn in the lobby and congratulated her on being very close to winning her bet; she's a great deal closer than I would have guessed. She convinced me to show up at the Fedora 14 BoF tomorrow.
First tutorial was "NASes for the Masses" with Lee Damon, which was all about how to do cheap NASes that are "mostly reliable" -- which can be damn good if your requirements are lower, or your budget smaller. You can build a multi-TB RAID array for about $8000 these days, which is not that bad at all. He figures these will top out at around 100 users...200-300 users and you want to spend the money on better stuff.
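A minimal sketch of the cheap-NAS idea -- the device names, filesystem and export below are my own guesses, not anything from Damon's slides:

```
# software RAID 6 across eight commodity SATA disks, one big filesystem, exported over NFS
mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[b-i]
mkfs.ext4 /dev/md0
mkdir -p /export/data && mount /dev/md0 /export/data
echo '/export/data 10.0.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra
```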
The tutorial was good, and a lot of it was stuff I'd have liked to know about five years ago when I had no budget. (Of course, the disk prices weren't nearly so good back then...) At the moment I've got a good-ish budget -- though, like Damon, Oracle's ending of their education discount has definitely cut off a preferred supplier -- so it's not immediately relevant for me.
QOTD:
Damon: People load up their file servers with too much. Why would you put MSSQL on your file server?
Me: NFS over SQL.
Matt: I think I'm going to be sick.
Damon also told us about his experience with Linux as an NFS server: two identical machines, two identical jobs run, but one ran with the data mounted from Linux and the other with the data mounted from FreeBSD. The FreeBSD server gave a 40% speed increase. "I will never use Linux as an NFS server again."
Oh, and a suggestion from the audience: smallnetbuilder.com for benchmarks and reviews of small NASes. Must check it out.
During the break I talked to someone from a movie studio who talked about the legal hurdles he had to jump in his work. F'r example: waiting eight weeks to get legal approval to host a local copy of a CSS file (with an open-source license) that added mouseover effects, as opposed to just referring to the source on its original host.
Or getting approval for showing 4 seconds of one of their movies in a presentation he made. Legal came back with questions: "How big will the screen be? How many people will be there? What quality will you be showing it at?" "It's a conference! There's going to be a big screen! Lots of people! Why?" "Oh, so it's not going to be 20 people huddled around a laptop? Why didn't you say so?" Copyright concerns? No: they wanted to make sure that the clip would be shown at a suitably high quality, showing off their film to the best effect. "I could get in a lot of trouble for showing a clip at YouTube quality," he said.
The afternoon was "Recovering from Linux Hard Drive Disasters" with Ted Ts'o, and this was pretty amazing. He covered a lot of material, starting with how filesystems worked and ending with deep juju using debugfs. If you ever get the chance to take this course, I highly recommend it. It is choice.
Bits:
ReiserFS: designed to be very, very good at handling lots of little files, because of Reiser's belief that the line between databases and filesystems should be erased (or at least a lot thinner than it is). "Thus, ReiserFS is the perfect filesystem if you want to store a Windows registry."
Fsck for ReiserFS works pretty well most of the time; it scans the partition looking for btree nodes (is that the right term?) (ReiserFS uses btrees throughout the filesystem) and then reconstructs the btree (ie, your filesystem) with whatever it finds. Where that falls down is if you've got VM images which themselves have ReiserFS filesystems...everything gets glommed together and it is a big, big mess.
BtrFS and ZFS both very cool, and nearly feature-identical though they take very different paths to get there. Both complex enough that you almost can't think of them as a filesystem, but need to think of them in software engineering terms.
ZFS was the cure for the "filesystems are done" syndrome. But it took many, many years of hard work to get it fast and stable. BtrFS is coming up from behind, and still struggles with slow reads and slowness in an aged FS.
Copy-on-write FS like ZFS and BtrFS struggle with aged filesystems and fragmentation; benchmarking should be done on aged FS to get an accurate idea of how it'll work for you.
Live demos with debugfs: Wow.
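Not his demo -- just a made-up taste of the sort of poking around debugfs allows (device and inode number are hypothetical; all of this is read-only):

```
debugfs -R 'stat /etc/passwd' /dev/sda1             # raw inode details: blocks, timestamps, flags
debugfs -R 'lsdel' /dev/sda1                        # list recently deleted inodes
debugfs -R 'dump <12345> /tmp/recovered' /dev/sda1  # copy a file back out by inode number
```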
I got to ask him about fsync() O_PONIES; he basically said if you run bleeding edge distros on your laptop with closed-source graphics drivers, don't come crying to him when you lose data. (He said it much, much nicer than that.) This happens because ext4 assumes a stable system -- one that's not crashing every few minutes -- and so it can optimize for speed (which means, say, delaying sync()s for a bit). If you're running bleeding-edge stuff, the filesystem has to take a more conservative approach to data preservation instead, and you lose speed.
I also got to ask him about RAID partitions for databases. At $WORK we've got a 3TB disk array that I made into one partition, slapped ext3 on, and put MySQL there. One of the things he mentioned during his tutorial made me wonder if that was necessary, so I asked him what the advantages/disadvantages were.
Answer: it's a tradeoff, and it depends on what you want to do. DB vendors benchmark on raw devices because it gets a lot of kernel stuff out of the way (volume management, filesystems). And if you've got a SAN where you can a) say "Gimme a 2.25TB LUN" without problems, and b) expand it on the fly because you bought an expensive SAN (is there any other kind?), then you've got both speed and flexibility.
OTOH, maybe you've got a direct-attached array like us and you can't just tell the array to double the LUN size. So what you do is hand the raw device to LVM and let it take care of resizing and such -- maybe with a filesystem, maybe not. You get flexibility, but you have to give up a bit of speed because of the extra layers (vol mgt, filesystem).
Or maybe you just say "Screw it" like we have, and put a partition and filesystem on like any other disk. It's simple, it's quick, it's obvious that there's something important there, and it works if you don't really need the flexibility. (We don't; we fill up 3TB and we're going to need something new anyhow.)
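The LVM middle road, as a rough sketch (device, volume names and sizes are made up):

```
pvcreate /dev/sdb                       # hand the raw array to LVM
vgcreate dbvg /dev/sdb
lvcreate -n mysql -L 2T dbvg            # carve out what MySQL needs today
mkfs.ext3 /dev/dbvg/mysql
mount /dev/dbvg/mysql /var/lib/mysql
# later, when it fills up: grow the LV, then the filesystem (ext3 can grow while mounted)
lvextend -L +500G /dev/dbvg/mysql
resize2fs /dev/dbvg/mysql
```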
And that was that. I called home and talked to the wife and kids, grabbed a bite to eat, then headed to the OpenDNS BoF. David Ulevitch did a live demo of how anycast works for them, taking down one of their servers to show the routing tables adjust. (If your DNS lookup took an extra few seconds in Amsterdam, that's why.) It was a little unsettling to see the log of queries flash across the screen, but it was quick and I didn't see anything too interesting.
After that, it was off to the Gordon Biersch pub just down the street. The food was good, the beer was free (though the Marzen tasted different than at the Fairmont...weird), and the conversation was good. Matt and Claudio tried to set me straight on US voter registration (that is, registering as a Democrat/Republican/Independent); I think I understand now, but it still seems very strange to me.
Hey you!
We've been around for a while.
If you'll admit that you were wrong, then we'll admit that we're right.
-- Sloan
After posting last night, a fellow UBCianiite and I went looking for drinks. We eventually settled on the bar at the Fairmont. The Widsomething Imperial IPA was lovely, as was the Gordon Biersch (spelling, I'm sure) Marzen...never had a Marzen before and it was lovely. (There was a third beer, but it wasn't very good. Mentioning it would ruin my rhythm.) What was even lovelier was that the coworker picked up the tab for the night. I'm going to invite him drinking a lot more from now on.
Sunday was day one of tutorials. In the morning was "Implementing DNSSEC". As some of the complaints on Twitter mentioned, the implementation details were saved for the last quarter of the tutorial. I'm not very familiar with DNSSEC, though, so I was happy with the broader scope...and as the instructor pointed out, BIND 9.7 has made a lot of it pretty easy, and the walkthrough is no longer as detailed as it once had to be.
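For a taste of how straightforward the BIND 9.7-era tooling is, here's a rough sketch of signing a hypothetical zone; the key-file names with 11111/22222 are placeholders for the IDs that dnssec-keygen actually prints.

```
dnssec-keygen -a RSASHA256 -b 2048 -n ZONE example.com          # zone-signing key (ZSK)
dnssec-keygen -a RSASHA256 -b 2048 -f KSK -n ZONE example.com   # key-signing key (KSK)
cat Kexample.com.+008+*.key >> db.example.com                   # publish the DNSKEYs in the zone
dnssec-signzone -o example.com -k Kexample.com.+008+11111 db.example.com Kexample.com.+008+22222
# then hand the resulting DS record to your registrar (assuming they support DS at all)
```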
Some interesting things:
He mentioned not being a big believer in dynamic zones previously...and now he runs 40 zones and they're ALL dynamic. This is an especially nice thing now that he's running DNSSEC.
Rackspace is authoritative for 1.1 million zones...so startup time of the DNS server is important; you can't sit twiddling your thumbs for several hours while you wait for the records to load.
BIND 10 (did I mention he works for the ISC?) will have a database backend built right in. Not sure if he meant that text records would go away entirely, or if this would be another backend, or if it'd be used to generate text files. Still, interesting.
DNSSEC failure -- ie, a failure of your upstream resolver to validate the records/keys/whatever -- is reported as SERVFAIL rather than something more specific. Why? To keep (say) Windows 3.1 clients, necessary to the Education Department of the fictional state of East Carolina, working...they are not going to be updated, and you can't break DNS for them.
Zone signatures: root (.) is signed (by Verisign; uh-oh); .net is signed as of last week; .com is due next March. And there are still registrars that shrug when you ask them when they're going to support DS records. As he said, implement it now or start hemorrhaging customers.
Another reason to implement it now, if you're an ISP: because the people who will call in to notify you of problems are the techie early adopters. Soon, it'll be Mom and Dad, and they're not going to be able to help you diagnose it at all.
Go look at dnsviz.net
Question that he gets a lot: what kind of hardware do I need to serve X many customers? Answer: there isn't one; too many variables. But what he does suggest is to take your hardware budget, divide by 3, and buy what you can for that much. Congratulations: you now have 3 redundant DNS servers, which is a lot better than trying to guess the right size for just one.
A crypto offload card might be a good thing to look at if you have a busy resolver. But they're expensive. If your OS supports it, look into GPU support; a high-end graphics card is only a few hundred dollars, and apparently works quite well.
On why DNSSEC is important:
"I put more faith in the DNS system than I do in the public water system. I check my email in bed with my phone before I have a shower in the morning."
"As much as I have privacy concerns about Google, I have a lot more concerns about someone pretending to be Google."
On stupid firewall assumptions about DNS:
AOL triggered heartburn a ways back when replies re: MX records started exceeding 512 bytes...which everyone knew was impossible and/or wrong. (It's not.) Suddenly people had weird problems trying to email AOL.
Some version of Cisco's stateful packet inspection assumes that any DNS reply over 512 bytes is clearly bogus. It's not, especially with DNSSEC.
If I remember correctly (notes are fuzzy on this point), a reply over 512 bytes gets you a UDP packet that'll hold what it can, with a flag set that says "query over TCP for the full answer, please." But there are a large number of firewall tutorials that advise you to turn off DNS over TCP; there's a quick way to test for this, sketched below. (My own firewall may be set up like that...need to fix that when I get back.)
When giving training on DNS in early 2008, he came to a slide about cache poisoning. There was another ISC engineer there to help him field questions, give details, etc, and he was turning paler and paler as he talked about this. This was right before the break; as soon as the class was dismissed, the engineer came up to him and said, "How many more of those damn slides do you have?" "That's all, why?" "I can't tell you. But let's just say that in a year, DNSSEC will be a lot more important."
The instructor laughed in his face, because he'd been banging his head against that brick wall for about 10 years. But the engineer was one of the few who knew about the Kaminsky attack, and had been sworn to secrecy.
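That quick test for broken DNS-over-TCP, for what it's worth -- the resolver address here is just an example, swap in your own:

```
# ask for a DNSSEC-sized answer with a deliberately small UDP buffer;
# on a sane path the reply shows the 'tc' (truncated) flag and dig retries over TCP itself
dig +dnssec +bufsize=512 @8.8.8.8 org DNSKEY
# force the TCP query directly; if this hangs, something on the path is eating DNS over TCP
dig +tcp @8.8.8.8 org DNSKEY
```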
Lunch! Good lunch, and I happened, along with Bob the Norwegian, to be nearly first in line. Talked to tablemates from a US gov't lab, and they mentioned the competition between labs. They described how they moved an old supercomputer over beside a new supercomputing cluster, and got onto the Top 500 list for...a week, 'til someone else got it. And there were a couple admins from the GPS division of John Deere, because tractors are all GPS-guided these days when plowing the fields.
Sunday afternoon was "Getting it out the door successfully", a tutorial on project management, with Strata Rose-Chalup. This was good; there were some things I already knew (but was glad to see confirmed), and a lot more besides...including stuff I need to implement. Like: if startup error messages are benign, then a) don't emit them, and b) at least document them so that other people (customers, testers, future coders) know this.
QOTD:
"What do you do if your product owner is an insane jackass?" "If your product owner is an insane jackass, then you have a typical product..." But srsly: many people choose to act like this when they feel they're not being listened to them. Open up your meetings and let them see what's on the table. Bring in their peers, too; that way their choice will be to act like a jackass in front of their peers, or to moderate their demands.
Tip from the audience: when faced with impossible requests, don't say "No". Just bring up the list of stuff you're already working on, and the requests/features/bugfixes that have already been agreed to, and ask them where this fits in. They'll either modify their request ('cos it's not that important to them), or you'll find a lot of other stuff moved out of your way ('cos that other stuff isn't that important to them).
After that was supper with Andy, who I hadn't seen since last year's LISA. We hit up a small Mexican place for supper (not bad), the Britannia Arms for a beer (where Matt tried to rope us into Karaoke and kept asking us to do "Freebird" with him), then the Fairmont hotel bar so Andy could get his Manhattan. (He's a bit intense about Manhattans.) It was a good time.
There's been debate and some speculation
Have you heard?
-- Sloan
I figure two months is long enough.
I'm at LISA again, this time in sunny San Jose. I took the train down this year (no reason, why do you ask?), which...well, it took a long time: I got on a bus to Seattle at 5:30am on Friday, and arrived at the San Jose train station at 10am on Saturday. I went coach; a sleeper would have been a nice addition, as the chairs are not completely comfortable for sleeping. (Probably would have got me access to the wireless too, which Amtrak's website does not mention is only available to T3h El33+.)
But oh, the leg room! I nearly wept. And the people-watching....my wife is the champ, but I can get into it too. Overheard snippets of conversation in the observation car were the best. Like this guy with silver hair, kind of like the man from Glad:
Silver: So yeah, she got into animal husbandry then and just started doing every drug on the planet. I mean, when I started doing pot, I told my parents. I told my grandparents. But she...I mean, EVERY drug on the planet.
Or the two blue-collar guys who met in the observation car and became best buds:
Buddy: Aw man, you ever go to the casinos? Now that I'm up in Washington now, I think I'm gonna check 'em out.
Guy: I dunno, I go with my friends sometimes. I don't gamble, but I'll have a few beers.
Buddy: You hear who's coming to the Tulalip? Joe Satriani, man. JOOOOOOOOOE. Joe Satriani!
Guy: Yeah, I'll hit the buffet...
And then later:
Silver: I knew it was a bad thing. I mean, she was a ten. I'm okay, but she was a TEN, you know what I mean? The other tenants were going to get jealous, and I only had enough of them to pay the mortgage.
Buddy: (separate conversation) And we caught one of those red crabs when we were up in Alaska?
Guy: Man, you won't catch me eatin' that shit.
Silver: And then she says, do you mind if I take a trip up the mountains with this doctor I met? I say, what do I have to say about it?
Buddy: What? Man, they're good eatin'. We just dropped it in a pot and boiled the sonuvabitch.
Silver: And that's when I realize she thinks we're in a relationship. I guess she's got this thing about men.
I slept badly, woke up at 3:30am and read for a while before realizing that the book of disturbing scifi stories is not really good 3:30am reading. I watched San Francisco and San Jose approach from the observation car; tons and tons of industrial land, occasionally interrupted by beautiful parks and seashore.
San Jose came at last. I had thought about walking to the convention centre, but decided against it. Glad I did, since a) it's a little further than I thought; b) it's surprisingly warm here; c) more industrial land, and d) when I did go out walking later on I managed to get completely turned around twice. I was looking for Phillz Coffee, based on a recommendation from Twitter (can't bring myself yet to say "tweet"; give me six months) and got lost in Innitek land (complete with Adobe) and a Vietnamese neighbourhood before finding it at last. The coffee was pretty good; they have about two dozen varieties and they make it one cup at a time. Not sure it was worth $3.50 for a 12 oz coffee, though...NOT THAT I'M UNGRATEFUL. Thank you, @perwille.
Gotta say, downtown SJ on a Saturday is...dead. I saw maybe a dozen people in six blocks despite stores, a nearby university (they call them high schools here) and I think three museums. I have no idea where one might go for a fun time at night, but I imagine it involves another city.
So then I took a bus to sunny Cupertino. Why? To visit the retail outlet of Orion Telescopes. I've got into astronomy again (loved it as a kid), and I'm thinking of buying one of their telescopes in about a year. Since the store was only ten miles away, why not go? And since the bus goes right from the hotel to, well, pretty close, seems like it's a requirement.
Now that was fun; even more people-watching on the train. Like the Hispanic gentleman w/a handlebar moustache, a cowboy hat, tight polyester pants (he had the roundest buttocks I've ever seen on a man. I could only wonder in great admiration), a silk shirt with "K-Paz" embroidered on the back, and a button that said, in Spanish, something that was probably "DO X NOW! ASK ME HOW!" And the study in ringtones: the elderly Hispanic grandmother who had Mexican accordion music vs. the middle-aged African-American guy who had Michael Jackson's "Thriller." Man, you just don't get that where I come from.
And the contrast in neighbourhoods between San Jose (out of downtown, it was all Hispanic shops), Santa Clara ("ALL-AMERICAN CITY 2001" said the sign; Restoration Hardware to prevent white panic) and Cupertino (duelling car dealerships (Audi, Land Rover and Lexus) and antivirus companies (Symantec and Trend Micro); Critical Mass, only with scooters instead of bikes; Harley driver wearing a leather jacket with an Ed Hardy embroidered patch on the back).
Anyhow, the telescopes were neat; it was the first chance I'd really had to look at them closely. I didn't buy one (relax, Pre!). They didn't have a floor model of the one I really want, but I've got a better idea of the size, and of what I want out of one.
And now...to post, then register. Which means going to the business centre, since Internet access costs $78/day at the Hilton with a 3KB daily cap. And the Russian mob's attempt to get my banking data by setting up a "Free DownTown WiFi" network is NOT GOING TO WORK, tvaritch.
Thursday night (November 5th...god I'm behind) was NIGHT OF BoFs. (Dramatic music!) First up was my conference organizer's BoF. In a nutshell: I wanna start a conference; what do I need to know?
There were only a handful of people there, but hey, quality not quantity:
Easiest part of organizing a conference: getting speakers. This surprised me, but everyone likes to talk about themselves. WIPs (work-in-progress posters/talks) will get everyone engaged.
Hardest part:
defining the scope/theme of your event. This is important because a) you need your elevator pitch and b) otherwise it's just Saint Aardvark's Conference About Totally Interesting Stuff, and if you don't happen to be SAtC (poor you!) you may not be all that interested.
the last week: death by a thousand papercuts + dread
Gotta have it:
Swag bag. Contact local (or not!) sponsors early. For some reason I'm hung up on t-shirts being TOTES ESSENTIAL, but this is not necessarily the case.
Chance to meet in advance; break the ice, get the newbies (and we're all newbies) to relax and make friends. If your event is on a Saturday, this is why Friday night was invented. Don't forget to have organizers working the floor.
Everyone in the same room for meals -- either bring it in, or have one place close by designated and ready. You don't want people scattering to the four winds to eat...they'll never come back. And make the vegetarians/vegans happy; if all they get to eat is crackers and soy bologna, you will hear about it.
Random tips:
Price the event according to what you aim to give people.
Think about having a fun track beside one or two serious tracks.
Record the sessions and offer Ogg/MP3 downloads. Don't forget slides and papers, too.
Lead time: 9 months probably isn't enough time to organize an event with 300-400 attendees...but 6 months should do for 50 attendees. (That's more the scale I'm aiming at.)
Careful with vendors; being sold at all day is a definite turnoff
Re: sysadmin conference in particular: Survey local businesses and see what they need, what they'd send people to see.
Always look for ways to delegate stuff, or you'll run yourself ragged.
Getting people back next year:
Finish your closing speech with "See you next year!" ie, ask people to come back, and to spread the word.
Meet within a month of finishing the conference with next year's organizers and start making plans. Put checklists and improvements on a wiki so that the info doesn't get lost.
Get new blood every year, both attendees and in the organizing committee.
Also got various contacts and other suggestions from people...thanks very much!
After that came Matt's two BoFs: small infrastructure and bloggers. Unfortunately, my notes suck from these two events...but it was good talk at both. I was surprised to see how many people were there because they're professional writers; I keep thinking of this as just my way of scribbling on the walls.
First up was Elizabeth Zwicky's talk on distinguishing data from non-data, and how to deal with each when solving problems. She warned us that she was not a statistician, and what she was going to say would probably give a real statistician hives, but that it would be useful for dealing with computers -- "nothing with an ethics board."
Her talk was laced with examples from her career...like the time she tried to track down missing truck axles from a major defense contractor; this was complicated by their complete lack of data collection ("How many do you make in a week?" "The schedule calls for 100." "How many of those are completed by Friday?" "We're not collecting that data."). Or the time she broke into her CEO's office ("It has a lock!") by pushing up a ceiling tile, then reaching down with a coat hanger and pulling up the handle. Lesson learned: "If it stops at the ceiling, it's not really a wall."
Funny stories aside (and they were funny; I recommend listening to the talk), the point was the danger of assuming too much from initial observations -- we schedule X, so we must produce X; it looks like a wall, so it must be impervious. Data is observations, numbers with context -- not hearsay, or conclusions, or numbers without context. Again, listen to the talk; it's worth your time.
Hell, download every MP3 on this page and listen to them; that's what I'm going to do, and I've been to some of them.
Okay, after that came the refereed papers. Mostly I was there for the SEEdit paper, which describes the SEEdit tool (available on Sourceforge!) for editing/creating SELinux policy in a high-level language. After what Rik Farrow said about policy approaching his rule-of-thumb for human comprehension, I was interested to see if this could be used to generate/edit the existing policy. I tried asking this, but I don't think I made myself clear...and I meant to follow up with the presenter later, but I didn't. My bad.
The paper on the SSH-based toolkit was interesting, but it seemed complex; from what I could gather, you SSH'd to a machine, then forwarded connections to (say) POP or SMTP over the tunnel to a daemon at the other end, which would then forward it to the right destination. It kept seeming kludgy and complicated to me, especially compared to something like authpf plus the usual sort of encryption that should be on (say) POP or SMTP to start with. I asked him about this, and he wasn't familiar with authpf; he did say it was similar to another sort of tool, which I didn't write down in my notes. I'm guessing that I missed something.
With that the conference was over for the day; my roommate used my CD to install Ubuntu on his laptop (I knew bringing it along would come in handy!).
(Turns out you need at least three good, verbose albums to come up with that many quotable lyrics.)
Thursday morning (November 5 2009):
While waiting for the room to fill up for the Planck telescope talk, I had a ponies moment and realized that Tobi Oetiker has the coolest Beatles haircut ever. That is all.
The Planck (pronounced almost like "plonk") telescope is going to give the highest resolution maps of the cosmic microwave background, and it's going to be dealing with a metric fuckton (my words) of data -- on the order of 10^12 observations, or 10^8 sky pixels, or 10^4 power spectra (which is where the really interesting data is). To do this, you need a metric fuckton of computing power, and that's NERSC...which, the presenter said, has gone from being a data producer to a data sink, as more stuff comes in to be processed. (Even that has changed; scaling limits and other constraints have changed the math that they use to analyze the data.)
To handle all this data, they use a variety of techniques and hardware:
They've got 60PB of storage in 10 Sun Ultrium 4 tape libraries (but as he said later, that's a made-up number based on maximum capacity; to keep retrieval times down, they use a mix of Ultrium 3 and Ultrium 4)
A 130 TB disk cache (!)
About 400TB of storage in GPFS
"One of the tricks to doing large data is: don't use I/O." Fast I/O is great, but avoiding it entirely is better. One byte/s of I/O is about 1000x the cost of one FLOP/s. It's easier to calculate it and keep it in memory than to look it up again.
Having common data models across the community of users, to avoid duplication/remunging of data; it's a social challenge as much as a technical challenge, but addressing it early pays off.
And remember: data from observations and experiments tends to increase in value over time (due to new analysis techniques), while data from simulations decreases in value over time (as computing capacity increases).
One question from the audience: Do you use GPU computing? A: No; lack of ECC is the biggest reason. PCI speed also a factor, but we already deal w/different speeds in different subsystems.
After that came the presentation for Anton, which is a specially-built supercomputer for molecular dynamics simulations. It was an interesting talk, and I'll be pointing one of the faculty members I work with at the slides and paper when they're available. Top quote: "Our user community is faster than our monitoring system."
Google has just released a new firewall generation tool called Capirca. I'm in the middle of the presentation right now and it's very exciting. It not only generates firewall ACLs for Cisco, Juniper and iptables, but it will also VALIDATE them against netflow info. No support yet for OpenBSD's pf, but they say it should be easy to add. And (correction) Apache-licensed to boot!
Ha! Slides here!
Wednesday:
Many miles wandering from room to room
Many trees slain just to write it to you...
"Soundtrack to Mary", Soul Coughing
Wednesday started with a test of the Emergency Viva System. My roommate had to defend a thesis with the University of Manchester, and they'd told him they were going to do it over the phone today at about mid-morning our time. What they didn't tell him was that they were going to call at 5am our time to make sure the phones worked.
So I got an early start to the day. I wrote yesterday's entry, then wandered down to the lobby to get coffee from the coffee shop (which had a sign saying "Now serving...Oatmeal and Grits". Hurl) and a free cinnamon bun from a sweet little old lady (no, really) in a hotel uniform. I met Matt and Bob the Norwegian (#6, I believe), and we discussed:
Matt: That's it, I give up. I've got eye cancer.
Bob the Norwegian: You've got eye cancer? You're crazy.
Me: ...said the guy with the 8 versions of the Gummi Bear theme song on his music player...
Bob the Norwegian: 8 languages. I have more versions than that at home. Want to hear the ska version?
Me: ...so you're in no position to throw the crazy brick around this room.
And then it was...opening time! As it happened I grabbed a seat right up at the front, and noticed Dr. Werner Vogels, CTO of Amazon.com, standing at the wall a couple of feet to my left checking his email and waiting to give the keynote speech. "Oh...hello. I thought you'd be wearing a suit." "Nah." Jeans, Harley-Davidson t-shirt, denim long-sleeved shirt untucked.
Highlights from Adam Moskovitz' speech (he's the organizer):
Very quick speech; he knows his stuff.
And then it was the keynote. Dr. Vogels was talking about Amazon Web Services. This was interesting and entertaining and fascinating and all kinds of good gubbins. Highlights:
He gave the example of Animoto, which is a startup that figured out how to detect rhythm and melody changes in music. They use it to automatically generate slideshows using slides submitted by users, or grabbed from their Flickr album. They offer a 30-second snippet, and then you can pay $x.95 to get the full version.
He showed one that used photos of him at a conference, and I forget what the music was but it was very disco-y and made the thing jaw-dropping, both because of the cheese and because the thing was utterly, completely addictive.
He showed a graph of their orders; it was climbing slowly from April 16th through the 18th, and then they released a Facebook app on April 19th. The app would grab pictures from a photo album, compose the slideshow, then notify all the user's friends that they had something cool to watch.
The graph went exponential. They had 25,000 customers signing up per hour. Their conversion rate is astonishingly high, because they ensured that the slideshow was available in 5 minutes or less.
And they own no servers at all: it's all done with Amazon virtual machines. They went from using 50 machines to a peak of 3500. "They're just a bunch of guys in New York with laptops; they use Amazon as their server park. Can you imagine going to VCs and saying, 'Give us $5 million 'cos we're going to release a Facebook app'?"
I thought it was really, really well done and interesting -- aside from one pretty noticeable hiccup. However, others disagreed. The USENIX summary is here. When the recordings/slides are up, I'll post a link.
Wednesday (cont):
Put the fake goatee on
And it moves as cool as sugar free jazz.
"Sugar Free Jazz", Soul Coughing
During the break I got into a conversation with Ali and George about cfengine and Python. I recommended "Dive into Python", and George agreed; "There's no time for yet another 'hello, world!' programming book."
And then I met up with Noah from MIT. w00t! I hadn't known he was coming, but then on Monday he was called by the Rock Star Sysadmin o' the Year contest guys, who asked if he was coming: "No, not in the budget this year." "Really? Are you sure you're not coming?" "Um..." So here he was. We ducked briefly into the Guru session on Zenoss, but it was not for us and we moved on to the papers session.
The first one was "Pushing Boulders Uphill: The Difficulty of Network Intrusion Recovery". And holy cow, they weren't kidding. The state of the art for intrusion recovery, as the presenter said, is wipe and reinstall from backups. Okay, maybe you can do that with one or two machines -- maybe even a few more than that. But what do you do when your system is massively compromised? When there aren't just some Code Red packets but when every single machine has a rootkit?
Reinstalling from backups is no longer satisfactory, and yet no one wants to share solutions they might have come up with: "What, I should put it on my resume? 'Got pantsed in front of Slashdot.' I don't think so." So, without identifying the people involved, he shared the story for the purpose of "adding to the lore" (great term).
In a nutshell, the gold server of an academic department at an American university -- the server from which they pushed updates to one thousand workstations -- got compromised. Now the workstations had rootkits on them. They only found this out by accident when various processes were crashing in weird ways. And they found it out in the middle of December, right before exams and Xmas, right before half their IT staff was leaving for unrelated reasons. (You could hear gasps around the room as the story was told. Six of those were mine.)
So what do you do? Do you take everything offline and screw over the students? Do you reset passwords? They didn't know exactly when the compromise had occurred, so backups were out. That left reinstalling -- but with what? Same distro, when you don't know if it's vulnerable, or something else? How do you make sure it's all going to work? The state of the art addresses very little of this, and does nothing to help with the entirely reasonable gut-clenching panic.
(I admit I have not read the paper yet. But once I get some time, it's going to be one of the first.)
The second paper I tuned out of, only to hear Tom Limoncelli get up at the question time and say, "I think this paper is crazy. I think that's good, because LISA needs more crazy papers. But I wonder if you realize how crazy it is." The speaker nodded and said, "Oh, yes."
The third paper was a comparison of two big mail migrations...again, it had the feel of adding to the lore (a good thing). It was an entertaining story, well told, about how all the preparation they'd done had not covered every eventuality. The presenter mentioned that one of the reviewer's comments was "You must not have done enough testing." "And I thought: I know! I'm in the future now, too!" They finished their talk with a video of raised-flooring packing-foam air hockey...fun times.
During the break I talked to a woman who was attending the conference for free, in return for volunteering at the USENIX desk. She ran her own business, and with the economy tanking she'd had to lay off everyone but herself...which meant that she was the sysadmin, too. She has computer experience but no sysadmin experience, so she came here to learn. I sold her on joining LOPSA by talking about how much the mailing lists had helped me.
The talk on Eucalyptus was next, and man, do I have mixed feelings about this presentation. On the one hand, cool stuff: open-source implementation of the AWS API so that researchers can have an actual cloud (based on the only instance of a cloud that everyone agrees on) to do research. What could be wrong with that?
OTOH, the way this guy talked gave me the same feeling as when I read Marshall McLuhan: it's English, but not as I know it. The one example I wrote down (he spoke at about 300 wpm) was when he described a server as "an aggregated set of state updates." That said, my roommate (who's doing a Ph.D. in this sort of thing) thought he was brilliant, so I'm perfectly willing to admit I may have been out of my depth at times.
He was quite funny at times:
"At the end of the first week after the release, there was a cadre of users who had root who wanted desperately to remove it from their machines." -- on the sysadmin-vs-researcher fight in grid computing (not the cloud stuff he's doing now.)
"If you do an open-source project like this, people often want to tell you things. A lot. And they want to tell you at 4 am."
And one last thing: he said he was quite impressed with Amazon's API. He kept seeing cases where people would change the API, as Eucalyptus had implemented it, in an attempt to improve it; the changes would almost invariably lower the amount that Eucalyptus could scale.
The LOPSA meeting was that night, and it was interesting. They're up to about 500 members, but they need more -- partly to keep it growing and partly to get access to things like O'Reilly Safari. (The magic number for stuff like that is 1000 members.) They mentioned the ties they're making with other countries -- Australia, Ireland, a group in India, "and we've just been talking with someone who wants to start a conference in Vancouver."
Lightning talks! In the spirit of the thing, bullet-point summaries:
(If I've missed any, let me know.)
I talked to the organizer afterward and asked how many people he'd had sign up in advance; the answer was none, and he'd had to go after people in hallways to get them to present. I felt bad for not doing so...I had meant to but I got distracted. Next time, I will Do The Right Thing!(tm)
Rock Star Sysadmin of the Year award...first the good: both Matt and Noah got Finalist and Runner-Up awards (respectively). This is cool and all the winners are to be congratulated. There were cool prizes given out, and the grand prize winner donated his to charity. There was cake. Yay everyone!
Now the bad: my cheeseometer was pinned. As someone pointed out, the presenter looked like Guy Smiley; he had spiky marketer hair and was just smarmy. And the band, for reasons I can only guess at, was the pet band of a guy who's a cake chef/baker in Baltimore and has a TV show about cakes that he makes. I thought the music was awful (but then, Noah liked it a lot and he's the one with the sysadmin prize :-), but more than that it was loud. Fortunately I had earplugs or there would've been blood running out of my ears.
(No, you're old!)
Oh, and there were TV cameras (marketing material? next week's cake episode? memo to myself: must tape cake show) filming the women (who I think were there with the vendor but I could be wrong about that) dancing up at the front of the stage; what the cameras didn't show was that they were pretty much the only dancers up there.
There was an escape to the LOPSA suite. I signed up two more people, then headed off for the hotel bar with Noah and a few other folks. I meant to call it an early night, but that did not happen. Oh well.
Tuesday morning:
And I wondered with great admiration...
"Moon Sammy", Soul Coughing
I got up this morning to find that the weather was absolutely gorgeous; blue sky, sun, and a wonderful look to the part of Baltimore that I could see: church spires, ship's masts, brick towers. I took a short walk around the harbour and found a wooden clipper ship tied up close by. I was hoping I could get to the Constellation, but I think it was further off than I thought.
Back to the conference and to Tom Limoncelli's morning class on time management. I've already devoured his book (seriously, if you don't have it you need to; the link throws Tom a few shekels) and I was looking forward to his course. It was a new approach to time management, based on the idea of looking ahead at your day and treating it accordingly. A day filled with meetings would be focused on making those meetings productive; a day without meetings would be focused on focus itself, making the most of those (blessed, blessed) long stretches of time and handling interrupts.
Some of the material was straight out of TMMSA; after all, the basics are in there. Also, the course was only a half day and that limited the amount of material that could be presented, new or not. And much of the material was aimed, I think, at much larger departments than my own (which == 1), which did limit some of the applicability to my situation.
But. Tom is a wonderful speaker and presenter, and it's well worth going to his course if you haven't before. The course was packed, as was his afternoon course, and I saw at least one guy who was attending Tom's course for the second time. And there was some new material in there that I noted for immediate use.
Some quotes:
On the inapplicability of other time management systems to our profession: "System administration is not like real life."
On the problem of mentoring: "If your boss is technical, she can't give you advice; she's just as screwed up as you."
On using the term "meeting" in his training to mean any large, immovable block of time: "A change window is sort of like a meeting with a router."
"The benefit of a paper planner is it can't play games. I check my calendar with my iPhone, and...let's play Tetris! The paper planner does have a Tic-Tac-Toe implementation, but it's single-user...it gets boring, so I quit. It'd be worse if I lost."
On limiting distractions at your workstation: "I don't know what IM client you're using, but I bet it has a quit feature."
LISA does this thing every year where you have to go around getting signatures from people; it's a good ice-breaker. In 2006 the organizers had their pictures on the card; this year, it was a scavenger hunt. You had to find someone who had, say, a LISA t-shirt on, or was part of the program board, or supported more than 1000 users. Ten signatures got you a spin of the prize wheel at the registration desk.
At the beginning of his class, Tom asked people from the audience for help filling out his card. ("The trick to doing the card well is to have a PA system. But we'll be talking about abusing power later on.") As it turns out, I was his tenth, since I have a Hallowe'en costume (the OMG PONIES shirt; I'm going to be Slashdot from April Fool's Day 2006). He got to sign my card (he's a vendor, since he's with the Google presence here), and he was my tenth. Card buddies 4ever!
At the break I went to spin the wheel; there was a woman in front of me who actually won a free prize to next year's LISA, which is damned cool. I got the "Jump To Conclusions" mat....no, but it is a little keychain where you press a button and one of three lights comes on: ACK, NAK and EQN. (Gotta verify what EQN means; enquiry?) It's cute.
And during lunch I actually went and napped. I've been up late and up early every morning this week -- there are just so many people to meet here! -- and I'm starting to feel it.
Tuesday afternoon:
And I hear a rumbling
I hear transmission grind
I bear witness
I have the clutch now...
"City of Motors", Soul Coughing
Tuesday afternoon was another Tom Limoncelli class for me: "Design Patterns for System Administrators". I think of design patterns as being a step above algorithms in the abstraction scale. (Tom told us that the term was first used in architecture and city planning; I need to add the titles for the books and maybe look them up too.) DP was a way of capturing passive knowledge: the knowledge you only get from experience.
The course was interesting, and I will be keeping the slides handy for future reference. It was also crowded -- there was not a free seat in the house. However, some of the material was already familiar from Tom's books, and some of it just did not apply to me because it was aimed at much bigger departments.
At the break I talked with Ludmilla, who managed to cram into my brain a better understanding of cross-site scripting attacks; this has always been a mysterious subject to me.
Stopped by the LOPSA desk to ask if they'd be interested in helping me at all with my (still vague and nebulous) sysadmin conference for Vancouver. They pinged the IRC channel (horrible mix of metaphors) and said sure, send an email. We talked about some upcoming changes on the LOPSA website, and I suggested sending a feed to planetsysadmin.com
For supper I headed out to a nice Italian restaurant with a few folks. I heard complaints about Red Hat support; an upgrade from RHEL 4 to RHEL 5 produced massive disk corruption on their SAN. Red Hat and the disk vendor pointed their fingers at each other for a year. Finally the disk vendor came out with a beta/testing firmware upgrade, which fixed the problem, but a final release has not come out yet. He's left deeply unimpressed with RHEL support: they were paying buckets of money and were left in the lurch. And I've heard that from a number of people here.
We got back late, so we hung out in the hallways talking to folks. I ended up talking to a sysadmin from the University of Alberta who, it turns out, can practically touch the OpenBSD FTP server from his desk. He talked about a move on the campus to switch to Google Mail for the entire university.
This was controversial a while back, when Lakehead University in Ontario tried it; one of the groups on campus (teacher's union?) sued because they said it violated privacy restrictions to place their email w/in reach of the Patriot Act. So I was surprised to hear that they were giving it another try. There were two things that made this a not-wasted effort: first, apparently Ontario's privacy commissioner had ruled that email is just not private, so it was okay. The second is that UofA has invited the Alberta privacy commissioner to participate, so they're hoping to avoid any problems from the start.
So why are they doing it? First off it's free; Google gives it away to universities. Second, there are something like thirty separate email systems at UofA and no unified calendaring system. These are good things but it's interesting to hear of a university-wide concern about this; UBC is balkanized/decentralized to the point that implementing a campus-wide system like this would be pretty much a non-starter.
After a while I headed up to the LOPSA suite. One of the members said, "Hey, are you the Vancouver guy interested in starting a convention there? How would one or two speakers work?" Cazart! I made it clear that it's still in my head and I don't know what I'm doing...but OTOH a recent IT re-organization at UBC means that HR there is interested in making a clear career path for IT folks there, both in the central department and the individual faculties, so they may be interested in helping with this. And of course, university == cheap space in the summer. Anyhow, it's all early days and I still need to email them to remind them, but still...woot!
And then there was the guy who drove five hours after a regular workday to get to LISA. He'd come up on his own dime to organize a BOF but more importantly to make contacts; he's unhappy at his current job and wants to jump ship. "Man, I'm gonna stay here as long as it takes and if I gotta drive all night to get back at 9am, I'm doing it."
Well, I'm here to tell you that within THREE MINUTES he had two different guys fighting over him ("What's your specialty?...Damn! Yeah, talk to that guy...dammit, dammit, dammit..."). It was the feelgood story of the evening, and he was a damn friendly guy to boot. And when I left for the night, he was talking to Bill Lefebvre ("Hey, do you know who this guy is? He wrote top!").
I worked my magic (hot-cha!) throughout the night; persuaded Matt (almost) to join the FSF, and one of the 8 Norwegian sysadmins I've met to join LOPSA (on sale! $10 off the rest of the week!). I asked Tom Limoncelli about my idea for training on "The n things a sysadmin must know about development"; he thought it was a good idea, suggested I look at the open-source tools that exist to help w/the situations I described, mentioned that Strata Rose-Chalup had pitched a book about this (but sadly the deal fell through), and suggested I get experience doing training, and doing training on this, by volunteering at my local LUG.
Finally, I spent a good bunch of time -- in both senses -- talking to a manager about what the appeal of the job was for him. He confirmed what the tutorial instructor had said: it is really, really neat to help people improve, to make the environment that allows them to do that and keeps them happy, and to see them get better and climb the ladder. It's not always easy and there are not-fun, difficult decisions to make, but the rewards are there.
I asked him if he'd always known he'd want to climb the ladder, or if this was something he found out later on. He thought a bit, and said that when he was younger he'd had a false sense of what was important; that not having a family had allowed him to focus on tech fun to the exclusion of all else. Now that he was older and had kids, the long nights once spent on tech had shifted to family, and his focus had switched to helping his team -- which was much more rewarding.
Monday night:
Los Angeles beckons the teenagers to come to her on buses
Los Angeles loves love
It is 5am, and you are listening to Los Angeles.
"Screewriter's Blues", Soul Coughing
Monday I met up with Donny and Ludmilla for supper...and who's there but Tobi Oetiker! Another chance for geekish hero worship, hurrah!
After thanking him for MRTG and RRDTool, I asked him what had happened to the call centre he had spent all that time debugging. He said that it was kind of in limbo: the troublesome app had been replaced by a web-based app and was slowly being rolled out...but since it didn't do everything the old app did the old one was being kept around and people were reluctant to upgrade. But because the old app was on the way out, no one wanted to spend money tracking down the problems with it. I have to say, I expect more neatly wrapped-up story endings from the people I admire. :-)
Also along were Walter and Kyle, two sysadmins from Boston's TERC. This was handy, because Kyle had lived once in Baltimore and was able to take us to DuClaw's brewpub, which was not too far from the hotel. The sampler included about 10 different beers:
Despite being from the German-speaking part of Switzerland, Tobi was not interested in drinking the beer, but appeared to be fascinated by the interest we took in it. Crazy Swiss, what are you gonna do?
Tobi also talked about coming to love JQuery and qooxdoo. Everyone kept asking him to repeat that name, and finally he wrote it down while we guessed how it was spelled. None of us were right, because we'd all been guessing crazy Dutch-German variations.
Kyle and Walter talked about their setup a bit. They're in kind of the same boat I am in that (being at an educational institute) funding is erratic yet the results (websites, curricula, etc) need to be around forever. Thus, they still have an NT4 web server which was only last month migrated to a VM. (Walter dulled the pain by asking the bartender to make him something sweet with rum. The procedure had to be repeated once, but then he was good to go.)
After that, we headed off to the James Joyce pub where OpenDNS was engaged in a COMPLETELY FUTILE attempt to gain my good will by buying the entire bar drinks all night. (Futile, do you hear?)
I didn't get to meet the OpenDNS folks, but that didn't stop Ludmilla from pasting OpenDNS stickers on everyone's shirt. And I did get to talk to another Norwegian sysadmin.
So he works for a Norwegian newspaper, whose website half of Norway starts their morning with. (Apparently he went to a talk (previous LISA?) where Facebook was talking about their traffic levels; Facebook's traffic was less than their own and they used 1/5th the number of servers Facebook did.) They were using Squid in front of their webservers, but were looking for something to do better. Commercial/proprietary options didn't measure up. What to do?
Well, like any good Norwegians they decided to bring in a fellow Scandinavian. After determining that Linus Torvalds was not interested (not entirely sure how serious that part was), they asked Poul-Henning Kamp if he was interested; he wasn't. "I'm a kernel guy with 20 years of experience doing kernels," he said; "I'm just not interested in doing application work."
But then he comes back two weeks later and says, now that he's had some time to think about it, he is interested in the idea of a caching app that exploits the underlying OS to the hilt. N months later, Varnish was ready to go.
They roll it out at a big news conference, with The Register and others attending. The boss gives a speech while they watch the graph of request latency scroll across the screen; they throw the switch. The line goes down from 300 ms to 30 ms and stays that way.
Also met Dan, who works for the U of Kansas Center for Remote Sensing of Ice Sheets. "I keep wanting to go down to Antarctica, but they keep not sending me there."
Monday afternoon:
Born to be a god among salesmen
Working the skinny tie
Slugging down fruit juice
Extra tall, extra wide
"Blueeyed Devil", Soul Coughing
Lunch time I talked with a gov't contractor who was in on the Hadoop tutorial. She talked about using a filesystem that was forty years old -- yes, that's a four zero -- which had lots of "warm" data (her term; I assume between hot caching and archiving to tape) cached to tape, but done very badly. The directory structure needs to be preserved, perhaps not at all costs but nearly; there are instances of old (maybe not 40 years old but close) documentation that refers to old paths that must not be broken. Interesting problem.
Also heard about this problem, which just gobsmacked me with its fullbore crazy.
Also, from other quarters, heard about a lab that lost its funding, which leaves it in a difficult position as it has a crapload of old G4s or G5s, watercooled, about half of which they discovered are leaking...
(Trying not to turn into Perez Hilton here. Not sure how well I'm doing.)
In the afternoon I took the Packaging for Sysadmins tutorial, which would have been much better (IMHO) handled as a hands-on workshop. I came back for the second half, but honestly it was a close thing...and yet when someone asked him, the instructor dropped gems of info about Func and Cobbler, which I'm going to be looking into as soon as I can.
During the break I talked with Derek, who's a sysadmin at a NYC trading firm. This was an absolutely fascinating talk, and only partly because I wasn't really aware of the whole high-frequency/low-latency trading...um, culture? algorithm? So:
He has small data centres -- like, racks -- scattered across NYC in order to be rilly rilly close to the exchanges. Also works well for redundancy. The colos that are close to the exchanges are filled with fellow trading firms.
The idea is that if you get your data from the exchange soon, analyze it soon, then get an order back to the exchange soon, you can make a lot of money. As a result, a 2.5 ms difference, like in swimming or 100-metre dashes, is absolutely huge.
Improvements in speed are looked for all over the place: RT Linux, not running NTP on machines (partly because of the overhead it introduces, and partly because it doesn't have high enough resolution; better to take timing info from ethernet frames, which'll get you down to 7.03 nanosecond accuracy), RT Java (which I didn't know existed), and even running apps on switches that run Linux (which, yes, may be slower than big servers, but are that much closer to the exchange, so it makes up for it)
So his server rooms are small-ish and many (which if I was a better man I could turn into a full-on Dr. Seuss book), but get this: a trader's desk will have four workstations at it, each with four big-ass monitors sitting on top of their desk so that they can monitor the stocks they're trading. His power and cooling issues are at the desktop as well as in the server room. Madness.
After that back to my room, where my roommate (who's British) and I wondered at the madness of looking to the UK Conservatives for relief from a right-wing Labour agenda. Madness upon madness.
Monday morning:
I've seen the rains of the real world come forward on the plains
I've seen the Kansas of your sweet little myth...
I'm half-drunk on babble you transmit
Through your true dreams of Wichita.
"True Dreams of Wichita", Soul Coughing
This morning I had the SELinux tutorial, held by Rik Farrow. I took a moment to shake his hand and tell him that ;login: magazine, like, changed my life, man, you know? If you haven't picked up copies of that magazine/journal, you owe it to yourself to do so. (And if you have and you agree with me, send him an email -- he usually only gets email as editor when there's a problem.)
Matt was there, as was Jay, who I met back in 2006.
The course was quite interesting. Some choice bits:
"How many of you are using SELinux?" (Two hands) "How many of you have disabled SELinux?" (a hundred hands and six tentacles; yes, even Cthulhu disables SELinux) "See, that's why I came up with this course; I kept seeing instructions that started with 'Disable SELinux' and I wanted to know why."
Telling Matt about Jay's firewall testing script.
Me: So how do the big guys test their firewall changes?
Matt: I dunno...probably separate routers, duplicate hardware...
Me: Probably golden coffee cup holders, too.
Matt: Jerks.
You don't write SELinux policy. SELinux policy is hard. It's NP-complete and makes baby Knuth cry. Instead, you use what other people have written, and make use of booleans to toggle different bits of policy. (Quick example below, after these notes.)
However, the SELinux policy is big and only getting bigger. There are something like 85,000 or more rules in recent versions of RHEL/CentOS. This is very close to RF's rule of thumb that a really, really smart and experienced person, who's been intimately involved in its creation, can only comprehend about 100,000 lines of code. This worries him.
Also, the problem of using SELinux is complicated by a lack of up-to-date documentation; like everything else it's a fast-moving target, and a book published in 2007 is now half out-of-date.
But this should not stop you from using SELinux now; it's handy, it's here, get used to it. Example of SELinux stopping ntpd from running /bin/bash; the SELinux audit file was the only sign.
"In a multi-level secure system, files tend to migrate to higher security levels, and the system becomes less unusable. But that's beyond the scope of this class."
(On programs with long histories of serious security problems) "Flash is the Sendmail of -- what do we call this decade? the naughts?"
(On the difficulty of trying to decode SELinux audit logs) "It says the program 'local' had a problem. 'Local'. What the heck is that? Part of Postfix. Oh, good. Thanks for the descriptive name, Wietse."
Something I hope to quiz him further on: "Most Linux systems have a single filesystem." Really?
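Since the booleans bit was new to me, here's roughly what that looks like in practice on a RHEL/CentOS box -- a minimal sketch, not from Rik's slides; the specific boolean is just a common example, so check getsebool -a for what your policy actually ships:
# Is SELinux actually enforcing?
getenforce
# List the policy booleans and pick out the httpd-related ones
getsebool -a | grep httpd
# Flip one persistently (-P survives reboots), e.g. let httpd make outbound connections
setsebool -P httpd_can_network_connect on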
During the break I met a guy who works with the Norwegian Meteorological service. This was interesting. He's got 250TB in production right now, and increasing CPU power means that their models can increase their spatial resolution, which means increasing (doubling?) their storage requirements. He talked briefly about running into problems with islands of storage, but I got distracted before I could quiz him further...
...by his story of building a new server room where they were capturing the waste heat and using it to heat the building. Interesting; what kind of contribution would it be making to the overall heating budget? Probably not much, but it all just goes on the grid anyhow, like the hot water from the garbage dump. What?
Turns out that there is a city-wide network of hot-water pipes that collects heat from, among other places, water heaters powered by waste methane from rotting garbage. So they don't use the methane to make electricity and dump it in the electrical grid; they use it to heat hot water and dump that in the hot water grid, consisting of insulated water pipes buried in the ground, which places around the city (and beyond!) will use. We've got what you could call a steam grid at UBC and probably other universities, but I'd never thought of doing this city-wide.
Oh, and he signed my LISA card, which was the second time he got asked today; he was wearing a LISA t-shirt and so he was fair game.
At lunch I buttonholed Jay a bit. I asked him about his coworker's firewall unit testing scheme. He said he's no longer working at that place, but it ended up being a lot less useful than they thought it would be. When I asked why, he said that 90% worked but 10% didn't; that 10% was things like network isolation (to avoid problems with using real IP addresses), and the fact that the interface to the three machines was QEMU serial connections...less than ideal.
The conversation shifted to firewalling, and another guy who was there mentioned that he loved OpenBSD's pf, but had to use iptables because of driver problems that prevented getting full performance out of 10GigE NICs with OpenBSD. Jay said they'd looked at the same problem at his place o' work, and in his words "It was cheaper to throw 8 GigE NICs in a box and pay someone to make Linux interface bonding not suck."
QOTD:
Some kind of verb, some kind of moving thing
Something unseen, some hand is motioning to rise, to rise, to rise
Too fat fat, you must cut clean
You gotta take the elevator to the mezzanine
Chump change, and it's on, super bon bon
Super bon bon, super bon bon...
"Super Bon Bon", Soul Coughing
Tonight was a great deal of fun. I met up with Matt, who had invited me out for Turkish food earlier. I found that the group also included Tom Limoncelli and Doug Hughes, who is one of the Invited Talks coordinators and a very fun guy to boot.
We walked maybe 20 minutes across town to Cazbar on North Charles Street, which I can recommend to anyone wanting good food. I had a lovely lamb and mozzarella pide (like a pizza but more ethnic :-), did not like the raki, but enjoyed the Sierra Nevada well enough.
Lovely food and fun conversation...like the guy who needed a Windows box to run Dell monitoring software, but decided to replace Explorer with Blackbox window manager and some kind of Apple Spotlight-like tool for Windows. My jaw dropped. "You've come this close to making Windows enjoyable for me."
After settling up the bill (non-trivial with 20 people, but we made it) we walked back again. I got to talk with Tom, which was neat (see 2006 entries from LISA re: accidental stalking); always fun to indulge in a little bit of hero worship.
Me: Oh, check it out: it's the Barnes and Noble store! Let's go party there!
Tom: What?
Me: Yeah, I've heard all about it! Free tequila shots at the door, cashiers dancing on top of their tills, DJs 'til 10am...
Tom: Oh, you're thinking of Borders.
I got to see the USS Constellation; since I've been devouring the Master and Commander books over the last year or so, I simply must visit it properly. (Don't know when exactly...)
And so back to the bar. And so to bed. (tm Samuel Pepys.)
QOTD:
I got the will to drive myself sleepless
I got the will to drive myself sleepless
Sleepless....
"Sleepless", Soul Coughing
That time is how I feel, not the time it really is; not only is it Easter but it's Standard time, not DST, which means that the change caught me off guard this morning. I woke up my roommate thinking it was time for us to shift our asses, but no such luck. Oh well.
(Turns out that alarm clocks these days, at least of the sort that were developed for the DOD and have been provided under NDAs to major hotel chains, have a switch on the bottom for DST adjustments with three settings: On +1, Off, and Auto. That is one of the best ideas ever.)
8:35am and registration is good; I've got a cool IPv6 sticker and a copy of all the training material on a USB stick I'm going to try hard not to lose.
First day's training is an all-day course called "Management Skills, or Don't Panic!". It's not the sort of thing I'd usually sign up for -- soft skills, avoidance thereof -- but I figure it's probably a Good Thing for me to do, like exercise and eating right. It's interesting; there are some good anecdotes and quotes in there:
"How do you deal with a visionary-type manager? How do you get him to support your project?" Audience: "Tell him you read it in a Neal Stephenson book."
At the end of the course I had a question: I'd taken this course defensively, in order to pick up some skills that I lack -- but I enjoy the technical side of my job very much. I enjoy learning new things, but the problems involved in management seem best, to me, enjoyed in the abstract and at a distance. You give up your techie skills and joys; what compensating joys are there?
She had two answers. The first joy was seeing, and helping, people develop skills and at best exceed their teacher. The second was the fun of finding the problems that lay in organizations' way, no matter how many disparate groups or layers they might span (techies, mgt, suppliers, finance, cultural), and talking with those different groups/layers in order to solve those problems.
As I said, it was interesting. I'm still not entirely sold on management...but then there's the example of a friend of mine who's been doing this since '92. In a lot of ways, when it comes to technical problems he's been there and done that...so management is a (possible) way to keep it interesting for him.
On another topic: Lunch time I got into a very interesting discussion with a woman who figures that MS will lose majority market share on June 30, 2011. Her reasons:
First off, it was a two-year prediction made at a conference in June; she had to come up with some kind of date. But also, MS only has majority market share in web browsers, PC OSes and office suites. Of those, she figures the stats for web browsers are cooked for marketing purposes, and says that there is very little actual independent, large-scale data; however, data from W3 Schools shows increasing FF share. PC OS: less and less important as people move to Google Docs and Gmail, which let's face it are plenty good enough for most home use. And the increasing capability of OpenOffice and other tools means that the domination of office suites is on the way down too.
Check out (her own? not sure) website at http://www.whatwillweuse.com.
After the course I met up with Matt and finally got to put a face to the name. He was there on Official Usenix Bizness(tm), as he's blogging for LISA and wanted to interview the instructor. Very friendly guy who's doing a lot to spread his knowledge around. And as it turns out he also got bit by DST, though worse than me. Poor bastard...
Thanks to this conference's theme band, Soul Coughing!
Saskatoon is in the room
Pyongyang is in the room...
Is Chicago
Is not Chicago
"Is Chicago, Is Not Chicago" -- Soul Coughing
Midway through my flight to Baltimore and I'm in Chicago, listening to periodic announcements that the Threat Advisory Level is Orange. The wifi here isn't working for me (associates fine but no address by DHCP), so I'm sitting at my gate, with two hours 'til I leave, wondering if any of the people around me are going to LISA as well.
The airport here has this amazing tunnel that goes between two concourses. Again, it made me think I was in Logan's Run and it was only the thought of being arrested that kept me from running down the moving sidewalk, shouting "Carousel is a LIE!"
Departure was entirely uneventful; I didn't even get pulled over for extra questions. One odd thing was that (like O'Hare) the customs section of YVR was quite warm, and each of the customs officers had identical clip-on fans placed above them. The cords curled down out of sight, and the reflection in the cubicle glass reminded me of spines; I kept thinking they were skeleton decorations for Hallowe'en.
Hey, everyone -- I'm organizing a BoF at LISA this year on conference organization. For a couple of years, I've wanted to create a local conference on system administration here in Vancouver, but I've been unsure how to start. I figure what better place to brainstorm and seek advice than at LISA?
So if you have questions or knowledge to share on:
then drop on by the Dover C room on Thursday, November 5th, between 8:30 and 9:30pm. C'mon, you've gotta kill that hour before Matt's BoFs somehow...
Following in Matt's footsteps, I ran into a serious problem just before heading to LISA.
Wednesday afternoon, I'm showing my (sort of) backup how to connect to the console server. Since we're already on the firewall, I get him to SSH to it from there, I show him how to connect to a serial port, and we move on.
About an hour later, I get paged about problems with the database server: SSH and SNMP aren't responding. I try to log in, and sure enough it hangs. I connect to its console and log in as root; it works instantly. Uh oh, I smell LDAP problems...only there's nothing in the logs, and id <uid> works fine. I flip to another terminal and try SSHing to another machine, and that doesn't work either. But already-existing sessions work fine until I try to run sudo or do ls -l. So yeah, that's LDAP.
I try connecting via openssl to the LDAP server (stick alias telnets='openssl s_client -connect' in your .bashrc today!) and get this:
CONNECTED(00000003)
...and that's all. Wha? I tried connecting to it from the other LDAP server and got the usual (certificate, certificate chain, cipher, driver's license, note from mom, etc). Now that's just weird.
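(For reference, here's the alias and the sort of check I was running -- the hostname is made up, obviously:)
# in ~/.bashrc: a quick TLS-aware "telnet"
alias telnets='openssl s_client -connect'
# a healthy LDAP-over-SSL server should cough up its whole certificate chain:
telnets ldap1.example.com:636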
After a long and fruitless hour trying to figure out if the LDAP server had suddenly decided that SSL was for suckers and chumps, I finally thought to run tcpdump on the client, the LDAP server and the firewall (which sits between the two). And there it was, plain as day:
Near as I can figure, this was the sequence of events:
This took me two hours to figure out, and another 90 minutes to fix; setting the link speed manually on the firewall just convinced the nic/driver/kernel that there was no carrier there. In the end the combination that worked was telling the switch it was a gigabit port, but letting it negotiate duplexiciousnessity.
Gah. Just gah.
So this morning, again, I got paged about machines in our server room dropping off the network. And again, it was the bridge that was the problem. This time, though, I think I've figured out what the problem is.
The firewall has two interfaces, em0 (on the outside) and em1 (on the inside), which are bridged. em1 has an IP address. I was able to SSH to the machine from the outside and poke around a bit. I still didn't find anything in the logs, but I did notice this (edited for brevity):
$ ifconfig
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 9000
lladdr 00:15:17:ab:cd:ef
media: Ethernet autoselect (1000baseT full-duplex)
status: active
inet6 fe80::215:17ff:feab:cdef%em0 prefixlen 64 scopeid 0x1
em1: flags=8d43<UP,BROADCAST,RUNNING,PROMISC,OACTIVE,SIMPLEX,MULTICAST> mtu 9000
lladdr 00:15:17:ab:cd:ee
groups: egress
media: Ethernet autoselect (1000baseT full-duplex)
status: active
inet 10.0.0.1 netmask 0xffffff80 broadcast 10.0.0.127
inet6 fe80::215:17ff:feab:cdee%em1 prefixlen 64 scopeid 0x2
See that? em1 has OACTIVE set. A quick search turned up some interesting hits, so for fun I tried resetting the interface:
$ sudo ifconfig em1 down
$ sudo ifconfig em1 up
and huzzah! it worked.
When I got to work I did some more digging and figured out that this and the earlier outage were almost certainly caused by running a full backup, via Bacula, of the /home partition on the machine. The timing was just about exact. The weird thing, though, is that the data on that partition is smaller than what's on /var, which was backed up successfully both times:
$ df -hl
Filesystem Size Used Avail Capacity Mounted on
/dev/sd0a 509M 42.4M 442M 9% /
/dev/sd0g 106G 11.4G 89.1G 11% /home
/dev/sd0d 3.9G 6.0K 3.7G 0% /tmp
/dev/sd0f 15.7G 2.4G 12.5G 16% /usr
/dev/sd0e 15.7G 13.6G 1.4G 91% /var
The bacula file daemon logged this on the firewall:
Oct 28 02:46:15 bacula-fd: backup-fd JobId 3761: Fatal error: backup.c:892 Network send error to SD. ERR=Broken pipe
Oct 28 02:46:15 bacula-fd: backup-fd JobId 3761: Error: bsock.c:306 Write error sending 36841 bytes to Storage daemon:backup.example.com:9103: ERR=Broken pipe
With the earlier outage it was 65536 bytes, but otherwise the same error.
Okay, so the firewall's working again...now what? I'm about to head off to LISA in three days, so I can't very well upgrade to the latest OpenBSD right now. I settled for a watchdog script, run from cron, that checks the interface for the OACTIVE flag and, if found, resets it. Hopefully that'll keep things going 'til I get back.
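It amounts to something like this -- a minimal sketch, assuming the interface name from the ifconfig output above; the logging is just how I'd do it:
#!/bin/sh
# Watchdog sketch: if em1 is wedged with OACTIVE set, bounce the interface.
# Meant to be run from cron every few minutes.
IF=em1
if ifconfig "$IF" | grep -q OACTIVE; then
    logger -t oactive-watch "OACTIVE set on $IF, resetting interface"
    ifconfig "$IF" down
    ifconfig "$IF" up
fi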
I've come across a few LISA items today, and it's only 9am...
Matt Simmons is going, and got one of the blogger gigs too.
The BOFs are starting to fill up: Matt's got one for bloggers and another for small infrastructure, there's one for lightning talks, and one for uninvited talks.
OpenDNS is hosting a happy hour at a nice-looking pub, which alleges it was actually "designed and built in Ireland and shipped over in the fall of 2002, where it was then fitted on site." Huh.
Man, I'm looking forward to this.
Just got the approval from the boss...LISA, here I come! w00t!
Work...hell, life is busy these days.
At work, our (only) tape drive failed a couple of weeks ago; Bacula asked for a new tape, I put it in, and suddenly the "Drive Error" LED started blinking and the drive would not eject the tape. No combination of power cycling, paperclips or pleading would help. Fortunately, $UNIVERSITY_VENDOR had an external HP Ultrium 960 tape drive + 24 tapes in a local warehouse. Hurray for expedited shipping from Richmond!
Not only that, the Ultrium 3 drive can still read/write our Ultrium 2 media. By this I mean that a) I'd forgotten that the LTO standard calls for R/W for the last generation, not R/O, and b) the few tests I've been able to do with reading random old backups and reading/writing random new backups seem to go just fine.
Question for the peanut gallery: Has anyone had an Ultrium tape written by one drive that couldn't be read by another? I've read about tapes not being readable by drives other than the one that wrote it, but haven't heard any accounts first-hand for modern stuff.
Another question for the peanut gallery: I ended up finding instructions from HP that showed how to take apart a tape drive and manually eject a stuck tape. I did it for the old Ultrium 2. (No, it wasn't an HP drive, but they're all made in Hungary...so how many companies can be making these things, really?) The question is, do I trust this thing or not? My instinct is "not as far as I can throw it", but the instructions didn't mention anything one way or the other.
In other news, $NEW_ASSIGNMENT is looking to build a machine room in the basement of a building across the way, and I'm (natch) involved in that. Unfortunately, I've never been involved in one before. Fortunately, I got training on this when I went to LISA in 2006, and there's also Limoncelli, Hogan and Chalup to help out. (That link sends the author a few pennies, BTW; if you haven't bought it yet, get your boss to buy it for you.)
As part of the movement of servers from one data centre across town to new, temporary space here (in advance of this new machine room), another chunk of $UNIVERSITY has volunteered to help out with backups by sucking data over the ether with Tivoli. Nice, neighbourly thing of them to do!
I met with the two sysadmins today and got a tour of their server room. (Not strictly necessary when arranging for backups, but was I gonna turn down the chance to tour a 1500-node cluster? No, I was not.) And oh, it was nice. Proper cable management...I just about cried. :-) Big racks full of blades, batteries, fibre everywhere, and a big-ass robotic Ultrium 2 tape cabinet. (I was surprised that it was 2, and not U3 or U4, but they pointed out that this had all been bought about four or five years ago…and like I've heard about other government-funded efforts, there's millions for capital and little for maintenance or upgrades.)
They told me about assembling most of it from scratch...partly for the experience, partly because they weren't happy with the way the vendor was doing it ("learning as they went along" was how they described it). I urged them to think about presenting at LISA, and was surprised that they hadn't heard of the conference or considered writing up their efforts.
Similarly, I was arranging for MX service for the new place with the university IT department, and the guy I was speaking to mentioned using Postfix. That surprised me, as I'd been under the impression that they used Sendmail, and I said so. He said that they had, but they switched to Postfix a year ago and were quite happy with it: excellent performance as an MTA (I think he said millions of emails per day, which I think is higher than my entire career total :-) and much better Milter performance than Sendmail. I told him he should make a presentation to the university sysadmin group, and he said he'd never considered it.
Oh, and I've completely passed over the A/C leak in my main job's server room…or the buttload of new servers we're gonna be getting at the new job…or adding the Sieve plugin for Dovecot on a CentOS box...or OpenBSD on a Dell R300 (completely fine; the only thing I've got to figure out is how it'll handle the onboard RAID if a drive fails). I've just been busy busy busy: two work places, still a 90-minute commute by transit, and two kids, one of whom is about to wake up right now.
Not that I'm complaining. Things are going great, and they're only getting better.
Last note: I'm seriously considering moving to Steve Kemp's Chronicle engine. Chris Siebenmann's note about the attraction of file-based systems for techies is quite true, as is his note about it being hard to do well. I haven't done it well, and I don't think I've got the time to make it good. Chronicle looks damn nice, even if it does mean opening up comments via the web again…which might mean actually getting comments every now and then. Anyhow, another project for the pile.
USENIX has done a wonderful thing: their conference proceedings are now open to the public, rather than requiring a USENIX membership.
This is very, very good. If you haven't gone through the list of presentations and papers from LISA, FAST, WOOT, or the USENIX conference itself, you really need to.
Come to that, if you haven't picked up a membership yet to USENIX and SAGE, you really need to. A dead-tree copy of ;login: magazine is the most interesting single publication I've found about computing in general, and system administration in particular. You owe it to yourself.
I've been listening to the presentations from LISA07, and I have a few observations.
Trey Darley's presentation reminded me a lot of my last job, but much more intense: fast growth, no control, and no budget. The difference is that he had the experience and the chops to deal with it well. Also, if he can present at LISA, so can I.
Andrew Hume's presentation, "No Terabyte Left Behind", was interesting, by which I mean frightening. People mostly just trust that hardware does what it says it does/will do when it comes to storage. But that doesn't always work: he tells the story of a prof he worked with who checksummed all his files once a week. When a checksum changed — and it did about every 6 months — he'd retrieve it from backup. His rough guess for undetectable errors: 1 per 10 terabyte-years. And we're getting to the point where that's going to be significant very soon.
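That weekly-checksum habit is easy enough to copy, by the way. A minimal sketch, run from cron once a week -- the paths are made up, and I'm assuming GNU sha256sum is handy:
#!/bin/sh
# Weekly bit-rot check, roughly as described above.
BASE=$HOME/data
MANIFEST=$HOME/.checksums
if [ -f "$MANIFEST" ]; then
    # Complain about any file whose contents no longer match last week's checksum.
    sha256sum --check --quiet "$MANIFEST" || echo "checksum mismatch -- time to restore from backup"
fi
# Rebuild the manifest for next week's run.
find "$BASE" -type f -print0 | xargs -0 sha256sum > "$MANIFEST"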
Tony Cass' presentation on grid computing for CERN was fascinating. This is the place I wanted to work (though as a particle physicist). UBC/TRIUMF is doing some work for this project as well, which makes me think I should jump over.
David Josephson's presentation was interesting, as much for the Q&A afterward as for his point. Which was? Glad you asked: that focussing on IP-based spam filtering (RBLs, greylisting) provides an incentive to spammers to hijack network prefixes via BGP attacks, and generally do nasty things to the Internet; please switch to content-based filtering post-haste. (To clarify, he was talking in particular about fast naive Bayesian classifiers, not SpamAssassin.) Since IP-based filtering treats IPs as valuable things — tokens that demonstrate your email is worth accepting — spammers steal IP addresses.
I'm not sure how much I buy his argument; he kept promising that the BGP attacks he described were only part of the problem, but he never seemed to get beyond that. But during the Q&A, Brad Knowles got up and said (my summary) that content filtering doesn't scale, at least in his experience (as Senior Internet Mail Systems Administrator for AOL). At that point, another guy got up and said (again, my summary) that this sort of thing is heard all the time, but with no data to back it up. The responder had co-authored a paper with Josephson that won the Best Paper award at LISA '04, and they'd made damn sure to include a ton of footnotes. If their conclusions were wrong, people were free to challenge them; if Knowles' were wrong, they were unchallengeable, because there was no data to back it up -- it was all just a story that got passed along and became myth.
Knowles' response was "I don't have time to write papers; I'm a technician, not an academic." Which is true, in lots of ways. And I don't mean any insult to Knowles; he's done things I will probably never match, we are all flooded with work, and so on. I'm one guy, working at a small shop, with none of his experience, or chops, or rep, or audience.
But there's a reason my .signature says "Because the plural of Anecdote is Myth": it's to remind me that unless you can back something up with facts, preferably written down and logged and repeatable, all you've got is a bunch of stories that become more and more True the more you repeat them.
It's obnoxious to sneer and say, "Cite, please"; it's worse to be ignorant.
Lots more listening to do. If you haven't downloaded them yet, you really should.
At last: I'm finally coming to the end of working with the verdammt web registration forms. We're going from our awful hack of a glued-together mess of Mambo and custom PHP, to something that'll mainly be Drupal with no custom code. Allegedly it's six weeks 'til launch date; the registration forms in use right now will limp along 'til they're no longer needed (end of the summer).
The registration form I'm working on now is not complicated in the absolute sense, but it's the most complicated one we've got. Last year I was afraid to touch the (old, legacy, ugly) code, and mostly just changed dates. This year I thought "fuck it" and rewrote nearly all of it, using the tools and skills I'd picked up in the meantime. (I'm still not a great programmer, understand, but I have improved some over last year.)
After a full day banging my head against it, I'm finally coming to the point where I'm pretty confident that the code will do what it's supposed to. And that's a relief. Therefore, in the stylee du Chromatic, I give thanks to:
In other news...just downloaded the second dev preview of Indiana, which I'd managed to not hear about at all (the preview releases, that is). I love university bandwidth; 640MB in about 1 minute. Sweet. I'll give it a try at home and see how it feels.
I've just finished reading the summaries of LISA '07 in the latest issue of ;login:. I feel…incredibly left out. I'm starting to think this profession might not be such a simple thing, you know, man? Sir? The presentations on autonomic computing have left me feeling a bit like a buggy whip maker with his nose to the grindstone.
And yes, it's a way off, and yes, small shops and generalists will probably be around for a while to come. But I'm not sure how much I want to keep being at a small shop. Which means learning the big stuff. Which, natch, is hard to do when you're trying to figure out how to properly test registration forms. Sigh.
But: I just stuck my head out a door at work and saw a chickadee. It chirped for a while, sitting on a tree near our building, then flew off. On a rare sunny day in Vancouver in February, after a week of messed-up sleep and feeling like I've been spinning my wheels, this is nice.
When I got my first job in IT, a friend of mine bought me a copy of the third edition of Unix in a Nutshell. (Incidentally, why does O'Reilly's search, which in my client returns "Sorry, no matches were found containing ." (sic), suck so much?) Sure, it was help desk on a small ISP, but it was something. I read that book front to back on the bus to and from work, and filled it full of stickers from all the servers or PCs I assembled.
The sysadmin at that first job also had a cordless drill, and that made things so much easier when assembling or racking servers. I wanted one, but I didn't buy one 'cos I figured I hadn't earned it yet. When my Italian millwright father-in-law bought me one, I felt like it was a vote of confidence in a way.
Another thing the sysadmin had was a Leatherman Wave. Again, I wanted one, but I didn't think I'd earned it yet. Last week, I decided to get one; and if I was going to get one, I was going to wear the damn thing. I started wearing the sheath on my belt, and waited for a chance to use it.
Today I had that chance.
I got to work and went to the kitchen to grab a coffee. "There's a bat behind the fridge," I heard.
What?
The cleaning woman pointed. "I moved out the fridge to clean it," she said. "There was a bat behind it. I don't want to touch it."
I looked, and sure enough there was one hanging by the edge of the cupboard. It was small, like a mouse wearing an overcoat. (Goth mouse?)
And then my moment came.
There were no gloves (I was worried about rabies), but there was a towel. I draped the towel over the bat while frightened coworkers watched, and then covered it with a recycling bin.
And then I took out the Leatherman, and flipped out the knife. "I need help cutting cardboard," I said, and the receptionist came to help. She sliced up a cardboard box and gave me a square of it. I slid it between the cupboard and the towel, sandwiching the bat gently between it and the towel, with the recycling bin behind.
I carried it outside to a clump of trees (ah, the advantages of living on a beautiful campus), found a stick, coaxed it onto the stick, and then left it up a tree.
But I couldn't have done it...
...without the Leatherman.
(This writing style brought to you by my third reading of Battlefield Earth. Our motto: Yeah, it's trash...so what?)
In other news, Hunter Matthews is giving a workshop on server room best practices at LISA '07. I met him at LISA last year, when he was another attendee of an otherwise thin tutorial on setting up server rooms/closets. He was also at the documentation BOF, and the one who said "I've got one user who considers 7-bit ASCII a luxury compared to what you can get from 5 or 6 bits." (Oh, and: "Cooperative collaboration. Yeah, it's part of our vision statement.") He's a good guy and a good teacher, and if you're going to LISA you could do a lot worse than going to his workshop.
Memo to myself: Don't eat the Turkey sashimi.
In other news: I don't usually post links to things just to say "go read this". However, I'll make an exception in these cases.
First, I was recently going to use the word "Manichean" to mean "dualistic, good-vs-evil view of the universe, with an implied inevitable battle between the two". However, when I Googled for it to check the spelling, I came across this article explaining why that wasn't a terribly accurate use of the word. Interesting stuff...I certainly didn't know there were any Buddhist-influenced ascetics hanging around Baghdad in the 3rd century.
Second, there's some interesting and contradictory stuff on the procedures for GPG/PGP keysigning parties here and here. Why does publicizing a public key "slightly reduce the security of a key pair"? I don't know. I've had a quick look through my copy of Applied Cryptography (3rd Ed.), donated by the kind man behind Pangolin Systems, but can't find anything from Saint Bruce about this. Anyone?
Third, there's an excellent set of tools for keysigning parties available here. One of the people who signed my key at LISA had used caff to send it back, which is a nice wrapper around the whole procedure (grab the key, sign the key, encrypt the key with itself, email it back to each of the key's email addresses). The lack of understandable (but see next paragraph's self-ass-kicking) documentation for GPG means that a) this automation is very nice, and b) I'm kicking myself for not buying Michael Lucas' book from the No Starch Press booth at LISA.
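(For my own reference, the manual version of what caff automates looks roughly like this. The key ID is made up, and I'm going from memory, so check the docs before trusting it:)

# roughly the steps caff automates; 0xDEADBEEF is a made-up key ID
gpg --recv-keys 0xDEADBEEF                          # grab the key
gpg --sign-key 0xDEADBEEF                           # sign it (after checking the fingerprint!)
gpg --armor --export 0xDEADBEEF > signed-key.asc    # export the freshly-signed copy
gpg --armor --encrypt -r 0xDEADBEEF signed-key.asc  # encrypt it to the key itself
# ...then mail the resulting signed-key.asc.asc to each address on the key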
Fourth, if'n you've got GPG, it's worth reading the documentation, like the FAQ or the GNU Privacy Handbook. Shame on me for not doing that previously. (And shame on me for taking so long to email people's keys back to them.)
Fifth, you can find some pretty stats here, or the trust path from me to Wietse Venema. Geek Pride!
Sixth and finally, there is this handy little page about how to set up a CPAN library in your home directory. Since it took me a while to track this down, I'm throwing it in here so's I can find it quicker next time.
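The recipe, as best I remember it (and I won't swear this is exactly what that page says), boils down to something like:

# my paraphrase of the recipe, not a quote from the page
perl -MCPAN -e shell
cpan> o conf makepl_arg "PREFIX=~/perl LIB=~/perl/lib/perl5"
cpan> o conf commit
cpan> install WWW::Mechanize

# ...and tell perl where to find the result (in ~/.bashrc or similar):
export PERL5LIB=$HOME/perl/lib/perl5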
One of the great things about going to LISA is that you get the proceedings and/or training for everything on CD or dead tree. (Well, nearly everything...I've heard that some people didn't or couldn't make their training materials available (though I've not been motivated to confirm this yet), and some of the talks didn't do this (Tom, where are your slides?)). There is some wonderful stuff to be found in them...
...like WWW::Mechanize, which is just perfect for testing out this conference registration form I'm working on. Only I've run into a bug that comes when trying to specify which button to click on:
$agent->click_button(value => 'Okay to submit');
That li'l chunk gave me this error:
Can't call method "header" on an undefined value at /home/admin/hugh/perl/lib/perl5/WWW/Mechanize.pm line 2003.
One guy reported the same trouble, but got no response. And the RT queue is fulla spam.
But aha, I found out how to use the Perl debugger in Emacs (M-x perldb. Shhhh!) and was able to track things down. Turns out there are a couple things going on:
In the page that I'm parsing, there are actually two forms, not one; one sends you back to correct mistakes, one sends you forward to keep going. Since I was not specifying which one to use, it used the first...and in that one, there is no button labelled "Okay to submit". Once I specified the right form ($agent->form_number(2);) everything was good.
But of course, this sort of thing shouldn't happen, right? Right. There are a couple subroutines/methods in this module that don't check whether they actually got what they expected. One of 'em is click_button, which has this loop:
my $request;
...
elsif ( $args{value} ) {
    my $i = 1;
    while ( my $input = $form->find_input(undef, 'submit', $i) ) {
        if ( $args{value} && ($args{value} eq $input->value) ) {
            $request = $input->click( $form, $args{x}, $args{y} );
            last;
        }
        $i++;
    } # while
} # $args{value}
return $self->request( $request );
No test/case for not finding a button named whatever, so it just blithely returns $self->request( $request ). But of course, request does the same thing:
sub request {
    my $self = shift;
    my $request = shift;
    $request = $self->_modify_request( $request );
    if ( $request->method eq "GET" || $request->method eq "POST" ) {
        $self->_push_page_stack();
    }
    $self->_update_page($request, $self->_make_request( $request, @_ ));
}
Again, no check that $request is actually defined before it gets used -- which is where the "undefined value" error comes from. And having just read the Test::Tutorial manpage, I'm all about unit testing and such, baby.
Come on, come out of the rain.
You're not oppressed, you're just too learned...
"Streets of Fire", The New Pornographers
Friday afternoon, a bunch of us were standing in the lobby. Jessica came by and said she was having problems getting into her home machine to get her boarding pass info. She was using the business centre, which only had locked-down Windows machines with no SSH client. The wireless was $87/hr or some such, and the free wireless set up by Usenix was way the hell over on the other side of the hotel. She was just about resigned to get up and go when a guy beside her piped up and said, "Hey, there's this tool that should help you out..."
"So I use it," she said, "and it turns out it tunnels SSH over DNS. It was the slowest connection I've ever used, but it was usable, and I got into my home machine."
I looked at her with wide eyes. "Was that...was that Dan Kaminsky who helped you?"
"I dunno," she said, "I've never meen him before. What does he look like?"
Normally I suck at descriptions, but I had this one down. "He looks like Brendan Frasier," I said confidently.
She shrugged. "I dunno, I don't think that was him...oh wait, there's the guy there."
We all turned to see Dan Kaminsky grinning. "That's one of the few times I've seen that tool actually be useful," he said.
Turns out he's a very friendly and funny guy, and if I heard him right he was roommates with the guy who started Friendster, who Jessica also knew. I foamed at the mouth for a bit in fanboyish wonder, then told him about IPoD and William Shatner's rap of the "Friends, Romans, Countrymen" speech from Free Enterprise. And of course, he wore the tracking monkey:
After that we split up for a bit, then re-united for supper. We hit FIXME, where we found a cute Mongolian waitress ("How many times can you say that?" asked Andy) and Bill Clinton burgers. We hit The Angry Inch in search of Angry Ale, which they no longer sold. Andy bought a t-shirt ("I'm never coming back to this place. And the last time I said I wasn't coming back to a place, I bought the place a round. This is cheaper").
Then we headed back to the final LISA party. It was in the original hotel building, and it was the biggest goddamned suite I've ever seen. It had to be bigger than any two apartments I've lived in put together. There were lots of people there. I drank toasts with Wout (Cisco IT guy from Belgium; friendly, funny and BEST NAME EVAR) and Noah to Strata Rose Chalup, drinking this godawful Romanian plum moonshine...oh god, it was harsh. I spent a good 15 minutes with one of the board members of LOPSA trying to figure out the purpose of one of the suite's alcoves (we were stumped). And natch, I got more pix of the tracking monkey with William Lefebvre (top, 'member?):
and many, many others.
Eventually it came time to go home, so I said goodbye and collapsed in my suite.
Quotes I missed earlier:
I stole a page from your book, and a line from your page
And flew into a lesbian rage...
"Chump Change", The New Pornographers
Friday morning was Dan Fucking Kaminsky's talk, which I'd really been looking forward to. I dragged Ricky to it, telling him he rilly rilly needed to go, kthxbye.
My notes could not possibly do justice to his presentation, which was both funny and awe-inspiring. Anyway, Dan also makes the best slide shows I've seen; they're a whole textbook on their own. Go read all his stuff. And go see him talk! He's intelligent and friendly on rye bread.
Some random observations/quotes:
Ricky allowed as how Dan Fucking Kaminsky might have been worth getting up early for.
Okay, but after that came the bitter pill of (FIXME: full name, title) Dmitri. This was a depressing, scary talk about network threats and how they're driven by very, very successful criminals. I'd heard this before, but the facts and stats he brought in were enough to just crush your soul.
The usual list:
Dan Kaminsky asked if maybe the answer was to abandon persistence on the desktop, and just hand out Knoppix disks to everyone. Dmitri replied that would just push the attack to web databases and such that held the user's settings. DK pointed out that would mean a much smaller number of machines to secure, which Dmitri conceded.
Q: I work for a web farm; what can we do? A: Watch your netflows carefully and learn your normal traffic. (cf Dan Klein's presentation).
Q: I use the fuzzy OCR plugin for SA and it works fine. A: You might not be seeing adaptation yet, but you will. OCR is bound to fail; too easy to trick.
He closed his talk by saying the obvious: he's very, very pessimistic, he sees no magic bullet, and he can't see any light at the end of the tunnel.
Introducing for the first time, Pharaoh on the microphone!
Sing: All hail what will be revealed today
From the fear of the great unknown, from the line to the throne.
"The Laws Have Changed", The New Pornographers
Thursday night was the USENIX Carnival Of Fun: lots of carnival games that got you more tickets for the door prizes (which were a huge pile of No Starch Press books plus a Monty Python box set). I wandered around for a while, looking at the huge crowd and fighting the temptation to run to the balcony and shout, "Carousel is a lie! You can LIVE!"
I talked for a while to a woman I'd been running into the whole week, a sysadmin at a defence contractor. She had been to Andy's talk as well. One difference between her job and Andy's is that she's responsible both for classified and unclassified networks. One effect of this is that she's able to contact more people for support...but there are limits.
For example, she had to send off logs from one app that was failing to the vendor for them to pore over. The app was on a classified computer; she was forbidden to copy any data from that machine directly to an unclassified network, so that meant no SSH, no ftp, no USB disk, no burning of CDs, nothing. What did she do? She printed out the logs, verified that nothing in there was classified, then put them through a scanner and used OCR to munge the images back into text.
Later, an engineer from another vendor came to poke at an app running on an unclassified computer, and it was her job not just to supervise him, but to run the big K-Mart Special flashing blue light to let everyone around her know that there was someone without clearance in the room, and to watch their mouths and adjust their monitors appropriately. In other situations, she's had to sit at the keyboard and type what the engineer told her to...because without clearance, you're not allowed to touch the machine.
I wandered on, and picked up a tracking monkey. There was a security consultant with a huge bag of stuffed monkeys that were meant to wrap around your arm or shoulder or something. I couldn't make that work, so I wrapped it around my neck. A little tight, but it was worth it: when people would ask what it was or where I'd got it, I'd fix them with a stern look and ask suspiciously, "Where's your tracking monkey, citizen?"
Eventually I hooked up with Noah (CSAIL) and Deb (FSF). Deb made us smack things (Noah won the strength test) and throw things (she cheated at skeeball, but I managed to win another ticket so that was okay). When the draw came around I dragged over Ricky the Bostonian/iite/aniananan for luck, since at least 8 people who'd been w/in 70 feet of him had won. However, turns out his luck function really peaks at 70 feet, and at 4 feet away it's pretty minimal. Oh well.
We went to check out the Google BOF, but on the way out Deb dared me to play Logan. I dragged her up to the balcony overlooking the ball room and yelled my line, but sadly it got lost in the noise. The lineup for the Google BOF was insane; someone told us that they were giving away a MacBook Pro. <post-hoc rationalization> We decided to form a Bass BOF and headed to the bar.</post-hoc rationalization> (Sorry I couldn't make your scotch BOF, Jessica!)
There was massive talk about salting the cod (which just sounds like the best euphemism anywhere, and I really want everyone to pick up on that, so go!), places to drink in Boston (incl. one place that has 100 beers on tap), and many, many other things. After a while we headed to the LOPSA room, where a lot of people ended up. I talked briefly to Andy, the guy who talked about Command and Control:
I got a lot of pictures with the tracking monkey, including Tom Limoncelli:
and dkap and Melanie Rieback:
And when the night wound down, we went back down to the bar to verify that their supplies were still good. (They were.) Man, it's been a long time since I've closed a bar. :-)
Sound of tires, sound of God...
"Electric Version", The New Pornographers.
Thursday morning came far too early. My roommate offered some of his 800mg Ibuprofens, and I accepted. First thing I attended was the presentation "Drowning in the Data Tsunami" by Lee Damon and Evan Marcus. It was interesting, but seemed to be mostly about US data regulations (HIPAA/SOX et al.) and wasn't really relevant to me. I had been expecting more of an outline of, say, how in God's name we're going to preserve information for a hundred years (heroic efforts of the Internet Archive notwithstanding). There was mention of an interesting approach to simply not accumulating cruft as you upgrade storage (because it's easier than sorting through to see what can be discarded; "Why bother weeding out 200MB when the new disk is 800GB?"): a paper by Radia Perlman (sp?) (she of OSPF fame) that proposes an encrypted data storage system (called The Ephemerizer) combined with key escrow that, to expire data, simply deletes the key when the time is up. Still, I moved on before too long.
...Which was good, because I sat in on Alva Couch's presentation on his and Mark Burgess' paper, "Modelling Next-Generation Configuration Management Tools". Some very, very confusing stuff about aspects, promises and closures -- confusing because the bastard didn't preface his talk with "This is what Hugh from Vancouver will need to know to understand this." (May be in the published paper; will check later.) Here's what I could gather:
I will do the right thing and read his paper, and I may update this later; these are just my notes and impressions, and aren't gospel. Couch is an incredibly enthusiastic speaker, and even though I didn't understand a lot of it I ended up excited anyway. :-) He gave another talk later in the week that Ricky went to, about how system administration will have to become more automatic; as a result, we'd all better learn how to think high-level and to be better communicators, because more and more of our stuff will be management -- and not just in the sense of managing computers. I'm going to seek out more of his stuff and see if it'll fit in my head.
After the break was a talk on "QA and the System Administrator", presented by a Google sysadmin. I went because it was Google, and frankly it wasn't that interesting. One thing that did jump out at me was when he described a Windows tool called Eggplant, a QA/validation tool. It has OCR built-in to recognize a menu, no matter where it is on the screen. This astounded me; when you start needing OCR to script things, that's broken. I don't doubt that it's a good tool, and I can think of lots of ways that would come in handy. But come on. I mean, a system that requires that is just so ugly.
I went out to lunch with Jay, a sysadmin from a shop that's just got permission from the boss to BSD-license a unit-testing program they've come up with for OpenBSD firewalls: it uses QEMU instances to fully test a firewall with production IP addresses, making sure that you're blocking and allowing everything you want. It sounds incredibly cool, and he's promised to send me a copy when he gets back. I can't wait to have a look at it.
After that was the meet-the-author session. I got to thank Tom Limoncelli for "Time Management for System Administrators", and got an autograph sticker from him and Strata Rose Chalup, his co-author for Ed 2. Sadly, I didn't get a chance to thank Tobias Oetiker (who I nearly ran into at lunch the day before).
Next up was the talk from Tom Limoncelli and Adam Moskowitz (Adam's looking for a job! Somebody hire him!) about how to get your paper accepted at LISA. Probably basic stuff if you've written a paper before, but I haven't, so it was good to know. Things like how to write a good abstract, what kind of paper is good for LISA, and how you shouldn't say things like "...and if our paper is accepted, we'll start work right away on the solution." Jay asked whether a paper on the pf testing tool would be good, and they both nodded enthusiastically.
Must Google:
Quotes from the talk:
At this point I started getting fairly depressed. Part of it was just being tired, but I kept thinking that not only could I not think of something to write a paper about, I could not think of how I'd get to find something to write about. I wandered over to the next talk feeling rather sad and lost.
The next talk was from Andy Seely on being a sysadmin in US Armed Forces Command and Control. Jessica was there, and we chatted a bit about how this talk conflicted with Tom Limoncelli's Time Management Guru session, and maybe ducking over to see that. Then Andy came over and asked Jessica to snap some pictures, so she ended up staying. I was prepared to give it five minutes before deciding whether or not to leave.
Well, brother, let me tell you: Andy Seely is one of the best goddamned speakers on the planet. He was funny, engaging, and I could no more leave the room than I could get my jaw to undrop. Not only that, his talk was fascinating, and not just because he's a sysadmin for the US Armed Forces while simultaneously having a ponytail, earrings and tattoos. You can read the article in ;login: (FIXME: Add link) that it was based on, but he expanded on it considerably. Let me see what I can recall:
Longer story: Because of the nature of his work, he's got boxes that he has to keep working when he knows next to nothing about what they're meant to do. Case in point: a new Sun box arrives ("and it's literally painted black!"), but the person responsible for it wants to send it back because it doesn't work -- which means that when they click the icon to start the app it's meant to run, it doesn't launch and there's no visible sign that it's running. There's no documentation. And yet he's obligated to support this application. What do you do?
Even tracking down the path to the program launched by the icon is a challenge, but he does, tracks down the nested shell scripts and finally finds the jar that is the app ("Aha! It is Java!"). He finds log files which are verbose but useless. He contacts the company that wrote it, and is told he needs a support contract...which the government, when putting together the contract for the thing, did not think to include. So he calls back an hour later, talks to the help desk and tells them he's lost the number -- "Can you help a brother out?" They do, but they're stumped as well, and say they've never seen anything like this.
Time to pull out truss, which produces a huge amount of output. Somewhere in the middle of all that he notices a failing hard read of a file in /bin: it was trying to read 6 bytes and failing. Turns out the damned thing was trying to keep state in /bin, and failing because the file was zero bytes long. He removed the file, and suddenly the app works.
Andy also talked about trying to get a multiple GB dump file from Florida to Qatar. Physical transport was not an option, because arranging it would take too long. So he tries FTPing the file -- which works until he goes home for the day, at which point the network connection goes down and he loses a day. So he writes a Perl script that divides the file into 300MB chunks, then sends those one at a time. It works!
At this point, someone yells out "What about split?" Andy says, "What?" He hadn't known about it. There was a lot of good-natured laughter. He asked, "Is there an unsplit?" "Cat!" came the response from all over the room. He smacked his forehead and laughed. "This is why I come to LISA," he said. "At my job, I've been there 10 years. People come to me 'cos I'm the smart one. Here, I'm the dumb one. I love that."
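(For my own future reference, the split/cat version of what Andy hand-rolled in Perl looks something like this. Filenames are made up, and your split's size syntax may vary:)

# filenames are made up; size suffix syntax varies between split implementations
split -b 300m bigdump.dat bigdump.part.     # carve the dump into 300MB chunks
# ...ftp/scp the bigdump.part.* files one at a time, re-sending only
# the chunks that didn't make it across before the link died...
cat bigdump.part.* > bigdump.dat            # reassemble on the far end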
There are two things I would like to say at this point.
First off, Andy is at least the tenth coolest person on the entire Eastern seaboard. No, he didn't know about cat -- but not only did he reimplement it in Perl rather than give up, he didn't even flinch when being told about it in the middle of giving a talk at LISA. I would probably have self-combusted from embarrassment ("foomp!"), and I would have felt awful. Andy's attitude? "I learned something." That's incredibly strong. (Although he told a story later about being in the elevator with some Google people. They recognized him and said, "Hey, it's the 'man cat' guy!")
Second, when he said, "Here, I'm the dumb one. I love that" I sat up straight and thought, "Holy shit, he's right." Here I am at LISA for the first time ever. I've met people who can help me, and people I can help. I've made a crapload of new friends and have learned more in one week than I would've thought possible. And I'm worried 'cos it might be a few years before I can think about presenting a paper? That's messed up. I tend to set unreasonably high goals for myself and then get depressed when I can't reach them. Andy's statement made me feel a whole lot better.
During Q & A I asked what he did for peer support, since his ability to (say) post to a mailing list asking for help must be pretty restricted. He said that he's started a wiki for internal use and it's getting used...but both the culture and the job function mean that it's slow going. He's also started a conference for fellow sysadmins: 100 or so this year, and he's hoping for more next year.
In conclusion: if you ever get the chance to go see him, do so. And then buy him a beer.
You looked as though I'd picked your name out of a hat
Next thing I know, you're fast asleep in someone's lap...
"The Bleeding Heart Show", The New Pornographers
Small shops BOF is coming up tonight, not last night. Wednesday's BOFs were:
Should you roll your own config tool? I was actually looking for Tobias Oetiker's (!) (who received an award here for MRTG and RRDTool, and who I nearly tripped over at lunch yesterday) BOF on his tools but wandered into this one by mistake. I left after a few minutes, as a lot of the concepts were over my head. A shame, because apparently I missed a big discussion between Luke (Puppet), the bcfg2 devs and (if I remember right) Mark Burgess (cfengine). Mark Burgess lost the fight and committed seppuku, with William Lefebvre as his second. Blood everywhere. USENIX is gonna lose the damage deposit for sure.
Splunk: Someone directed me here because this was where the beer was. Again, all that was left was Bud (Lite). Splunk isn't really my bag, although the guy I've met who's gotta deal with 50GB of logfiles a day (or some such) was quite interested.
LOPSA: Interesting, not least because someone else asked the question I was afraid to: What the hell is up with LOPSA and SAGE? Short answer: SAGE is about advancing the profession (research, training); LOPSA is about advancing you (professional development, support, fellowship...). Long answer: politics and tax laws.
Streaming media at universities: Seven people including me. Everyone else was streaming terabytes of data with multiple servers; I wasn't. That was pretty much it.
To wild homes we go,
To wild homes we return,
To wild homes we go.
"To Wild Homes", The New Pornographers
This morning was the keynote address by Cory Doctorow on "Hollywood's Secret War On Your NOC". Excellent stuff...lots of stuff I was already familiar with, but some specifics that were incredible and/or funny:
Must Google:
Whew! Met up with the Boston sysadmin again, and I pointed him to Windflower -- he's a small enough shop that it may actually be useful for him. Good stuff. Picked up a ribbon that said "Blogger", another that says "Newcomer", and a third that says "Usenix Baby" for Arlo.
After that came technical papers on spam. First up was a paper by Brent Kang et al. on Privilege Messaging (FIXME: Add link). Third-hand, but: allegedly, as of last year, phishing is making more money than drug smuggling. A cite would be really nice for that, but he didn't have one. He also mentioned a recent paper (again, need cite) showing that spam coming from Gmail accounts (not forged, but real accounts) had risen from 1% at the start to 10%...interesting to think of how that might indicate a failure of friend-of-a-friend. OTOH, maybe that's an indication of success of FOAF, since...
...the next paper, on the experience of an Italian research network, showed that their percentage of legit mail (not caught by the spam filters) had, over the last few months, gone as low as 8%. That's fucking incredible. However, he's having excellent results with Bayes and SpamAssassin, so maybe there's some hope.
After that was "A Forensic Analysis of a Distributed Two-Stage Web-Based Spam Attack" by Daniel Klein. Very interesting: showed how regular monitoring of his systems and looking at the graphs it produced let him notice -- the second time it happened -- a very subtle attack that let 5,000 messages go out the door because of a subtle, simple CGI bug. As at least some (and probably most) of the attacks were through web proxies, I asked him (knees knocking; I was very nervous) if he thought it would be worth looking for this sort of traffic, or this sort of traffic on certain pages. He pointed out that actually, this sort of traffic -- distributed, small requests, high in numbers -- was exactly what you wanted from a website, so it was extremely hard to analyze as it happened.
After that, I talked with Noah, a Debian security guy and senior sysadmin at MIT's Artificial Intelligence lab (!). We talked about spam, getting depressed about DRM (him) vs spammers (me), and moving the AI lab to a new building after 40 years (me. no, wait). Very interesting stuff, and a good guy.
The afternoon was taken up with data closet/centre setup training. Very, very good stuff once everyone got talking -- the slides were 'way thin, but my notes filled the rest of the book. Since I've learned what I know about this by making mistakes, it was good to think of maybe shaving a mistake or two off my list from the future.
And then...then the vendor exhibit. Beer (yay!), Budweiser (boo!), and a chance to pick up the cable modem hacking book from No Starch Press' table. I also got a chance to talk with the FSF folks, up/down from Boston, and pick up a t-shirt. No luck convincing a fellow attendee to join, but I'll keep working on him. Splunk had the best booth babes (or so I heard), but Google by far had the most people around their table. Interesting.
Now off to the BOFS. Quite looking forward to the one on life at small shops.
Cities and circles drawn perfect, complete
These are the fables on my street, my street, my street
"My Street", The New Pornographers
Okay, my (lawyers, please note) TOTALLY ACCIDENTAL stalking of Tom Limoncelli continues. I met another sysadmin from Boston (who, BTW, is into LISP. Call that accidental? 'Cos I don't) (alsoplus he's the third guy I've met from a small shop, which is damned reassuring in a conference full o'people from multi-continent corporations/teams) who invited me along to the LOPSA hospitality room. I talked to David Parter from LOPSA about why I should join. He also gave me the sad news that the Burritos-as-big-as-your-head place in Madison, WI is closed. Noooooo!
Nice bunch of people, who'll probably be getting a membership fee from me post-haste. Totally unrelated to the free beer. I met a guy from a Scandinavian hosting company that has, like, 300,000 domains (!). We talked about spam for a while, and PHP's ability to include files remotely (he's a big fan. Oh, wait, no) ("When I meet the guy who put that in..." "You'll punch him in the cock?" "Oh, that's just the start of it."), and Perl vs. C vs. LISP vs. Dvorak keyboards vs. I don't know what all.
And who else is in the room AND stared at my badge trying to figure out who the hell I was? That's right, Tom! Still no chance to lean over casually and say, "So I hear Google's trying to figure out what to do about TCP scalability bringdown. 'Cos, like, my enterprise-fu PHP taint mode will totally nebbish your gubbins. Scalable. Solution. Moving forward. Come back!"
Also went to the Free Beer and Ice Cream BOF, the PGP/CACert BOF, and the Bash scripting BOF. Last challenge: using Bash built-ins only, check to see if a given TCP port on a given host is open. Welp, I did know about Bash's built-in /dev/tcp/host/port, but totally foundered on syntax. We were told to email our scripts to polvi.net...which sounded familiar, and it should, 'cos it was Alex Polvi, who works at the Oregon State University Open Source Lab, they who provide bandwidth to the likes of Gentoo, Mozilla and Kerneltrap. At one point, a few friends of his came in and sat down close to where I was, and he came over and talked to them during one of the challenges. "I think everyone would get freaked out if they knew a Google recruiter was here," he said, laughing. Worked for me.
And, BTW, I thought I was at least quarter-decent at Bash. Hah! It is to laugh.
What the last ten minutes have taught me:
Bet the hand that your money's on
"Letter From An Occupant", The New Pornographers
Attended my first BOF last night on wikis for sysadmin documentation -- amazingly fun and informative. I even managed to contribute to the conversation. And when I told the war story about recovering my wiki from spammers (that's right! because PHPWiki sucks!) I got a gratifying look of sympathy from the audience.
Today's talk was "Habits of Highly Effective Sysadmins". It was aimed at folks like me who've been mostly self-taught, and I thought they hit the mark extremely well. (I've heard lots of people here say that they'll go see anything put on by Lee Damon or Mike Ciavarella just on principal (principle?).) Very, very informative and great teachers, too.
I found out today that Tom Limoncelli's name is pronounced "li-mon-sell-ee", not "li-mon-chell-ee". W/luck, this will save me embarassment later.
Tonight the BOFs start in earnest, including the one that offers free beer and ice cream. Sadly, I will be attending the one on pet counting instead. I will die a little bit inside.
Two sips from the cup of human kindness, and I'm shit-faced
Just laid to waste
If there's a choice between chance and flight, Choose it tonight.
"Choose It", The New Pornographers
Just got back from a whirlwind walk from the Lincoln Memorial to the Washington Monument to the White House. Beautiful, all of it...though a) the White House is small and b) there was something being filmed/videotaped in the courtyard, which made me think of Vancouver.
Training again. AFrisch was good, covering Cfengine quite well; would've liked to see more info about expect. (Apparently there are Perl/Python bindings...I had no idea.) Afternoon course was "Interviewing For System Administrators" by Adam Moskowitz and that was great -- lots of things I didn't know, lots of tips on doing it better next time.
Saw Tom Limoncelli in the hall during a break. Managed to restrain myself. I have the reputation for quiet restraint of a nation to uphold.
Very tired now. Time to go get beer.
As we sift through the bones of an idol
We dig for the bones of an idol
When the will is gone
'Cause something keeps turning us on
"Bones of an Idol", The New Pornographers
Today was Solaris 10 Administration, an all-day course that introduced all the nifty features of Solaris 10. I've only worked with Solaris since July, but I've been reading so much about Solaris 10 that most of the stuff presented (dtrace, SMF, zones) was familiar to me. OTOH, the course was aimed at admins of older versions of Solaris (2.veryearly through 8 and 9), and so the explanation of the differences assumed a lot more familiarity with Solaris than I had. It was a curious sensation.
Still, though, it was worth going to. Good quote: "Oracle DBAs are the most Kool-Aid drinking people I've ever met." And another: "Zones are the most controversial thing we'll be talking about today, and spending the most time on. I saw someone carrying two cups of coffee -- that's the right attitude." Also, Bill Lefebvre, the man I was going to accuse of stealing my underwear, wrote top(1).
Oh, and it's a good thing I brought a second wireless network card; the onboard one in the laptop kept dying, with an entry in syslog that read "fatal firmware error". Now I've got an Orinoco Gold in here, and it's working just fine.
Met a sysadmin today who works in the VOIP department of a phone company; they've moved most of their stuff from racks and racks of old-style Alcatel equipment to one rack of Solaris machines acting as soft switches. I was curious about the difference in reliability and uptime; my understanding is that the demands on telecom equipment are worlds above anything that can be provided by COTS Unix, and asked him how it worked for them.
He said that, yes, you'd get situations where a phone call would be delayed because of a system crash: instead of taking one second to connect, it might take two or even three. And if that was anything beyond a small fraction of their customers, that would be a big problem. However, the soft switches had much better failover ability than the old stuff; the old stuff would be up much longer, but when it failed everything would cascade and the whole system would come tumbling down, at which point a customer would hear "Your call cannot be completed as dialed."
Met another guy who was very excited about ZFS, because of an app at his work that writes 4 TB of data in individual 4 KB files. The best they've heard from their current storage vendor of choice is a block size of 8 KB...which means doubling their storage requirements just to deal with filesystem overhead.
I had alligator jambalaya. It's official: it tastes like salty chicken.
Jackie, you yourself said it best when you said
There's been a break in the continuum
The United States used to be lots of fun...
"Jackie", The New Pornographers
10am CST: Welp, I'm in the air on my way to Chicago, and from thence to Washington for LISA. The laptop is running well (stress-tested by Sloan, The New Pornographers and Yo La Tengo), and I'm using my time to skip watching "Lady in the Water" (not how I want to see this film for the first time) and work on AsciiDoc. I think this is going to work pretty well for my plan: to start having my blog in just plain text for source, and plain HTML for output. I like it a lot, and the less PHP I have to audit the happier I am. (Not that I *do* audit PHP. But I feel guilty when I don't.)
Turned out I was rather stupidly cautious at the airport. The flight left at 6.15am PST, and I was there at 3.45am. What I didn't realize is that the ticket counter didn't open til 4.30am, and customs not til *5am*, thank you. But once they got started, everyone moved along pretty quickly.
I did get pulled over for extra searching, but nothing serious: where was I going, could I open the bag, where do I work. Once that was done, the officer was quite friendly; he urged me to take time to go see the sights, since work was paying for this. I expected worse.
But man, I don't know when I'll have the time. Training starts tomorrow with a full day of Solaris 10, and it just keeps going from there. Plus, of course, there's the free beer and ice cream. The time, she flies, no?
I need to get a haircut. I haven't shaved my head in two weeks, so I've got a damned dirty commie hippie head of hair at the moment.
Wow...over somewhere midwestern now, and the patchwork of land is neat to look at. Not half as beautiful as a city at night from 3000 metres, though...man, that's God's own set of Xmas lights.
12.30pm CST: Later...In O'Hare at Chicago, taking advantage of the free electrical outlets for charging laptops. The wifi access is charged-for, though, same as in Vancouver. And me without OzymanDNS...
10.20pm EST: Now in my hotel room. No wireless from USENIX up here, but it does work in the lobby where there's simply an amazing amount of very dressed-up corporate types. I think it's some sort of Xmas party. The contrast between them and the t-shirts-and-jeans crowd (not to mention me typing away alone on my laptop) is stunning. (Incidentally, my grandmother was both shocked *and* appalled to learn that not only was I not purchasing a new suit for this conference, I would not be wearing a suit at all.)
My luggage, I found out after an hour of waiting, is currently wending its way here from Chicago; I imagine some sort of Die Hard 2-esque leap across the tarmac that failed, but only barely. Allegedly United expected it here at 7pm and will courier it over Real Soon Now. We'll see.
By the time I finally made it to the hotel and checked in, it was 6.30pm. It had been a long time since I'd had anything but Mountain Dew (SPECIAL CAFFEINATED US VERSION!) to eat, so I was just starving enough to go for the -- wait for it -- $13 (US!) cheeseburger in the lobby. That and two Guinnesses pretty much blew my budget for the week; at this point, I'm looking into the carb count in a BSSID beacon frame. (Yes, I'm making that term up.) Worth it, though; my roommate and I exchanged war/horror stories with a Sony engineer/sysadmin from San Francisco over the beer. Good times.
I'm pretty sure I saw Aeleen Frisch in the lobby. I think I saw William LeFebvre, the program chair, at the airport picking up baggage from the SAME BAGGAGE CAROUSEL where my stuff was supposed to be. There's this thing called USENIX bingo, where they give you cards with organizers' photos in it and you're supposed to get them to sign it. I think I'm going to tackle LeFebvre and ask him where my underwear is, then get him to sign my card to affirm that he didn't steal it.
I have not yet seen Tom Limoncelli, and I wouldn't recognize Dan Kaminsky if he queried my DNS server via avian carrier, so my plans to see what they've done with my underwear are, as yet, hazy. If my underwear doesn't show up, I may have to go shopping. I think the nearest Wal-Mart is in Tennessee.
Thank you to our sponsors for the title.
Good news: I'm going to LISA! I convinced my employers to heavily subsidize my trip. I've booked a double room at the hotel; I'll be posting to the roomshare mailing list shortly, but feel free to comment or email if you wanna split the cost.
Bad news: I somehow borked X on my desktop at work yesterday. The symptoms are quite strange, and mostly involve not being able to click on a window and have focus move there. It's IceWM, and I haven't changed focus model, and the symptoms persisted over multiple restarts of KDM (ctrl-alt-backspace). I looked for open files, running processes and even removed .gconf* and .gnome* on principle; nothing. The only thing that was different was running, for the first time, the new(ish - 1.5.0.2) version of Firefox after d/l it from the Mozilla site. The machine is running SuSE 10, and for various reasons I can't update it right now. In the end, I got desperate enough to try a reboot, and of course that fixed it...which is NO FUCKING WAY to solve problems, dammit.
(Interesting how this pokes holes in my manly command-line-only stance; yes, I was able to get some work done by going to the console, but frankly I've become very very used to managing terminals and a browser with IceWM and it's hard to switch back. Damn.)
Weird news: A while back I came across a problem with a Solaris 10 machine: lpq just hung, and eventually timed out with an error (that I haven't written down, so I suck). Eventually figured out it was trying to contact the lpd service on the machine's main interface (handwave goes here about BSD-compatibility printing commands), which should've been run by inetd. Okay, but inetd is now taken care of by inetadm and svcs, not /etc/inetd.conf anymore. And while the command is called in.lpd, the service is actually called svc:/application/print/rfc1179. Which is in maintenance mode, so I start it up...only it doesn't start, and I cannot figure out why: no log files I can see (the scattering of log files in a default Solaris install is really driving me nuts), no reason given, nothing. I ask another sysadmin who admits he's stumped by it, but just for fun tries putting an entry in /etc/inetd.conf and then running inetconv, the way you're not supposed to have to do except for weird legacy stuff that hasn't been moved to svcs yet. And damnitall, it works. Again, no idea why.
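For my own notes, the poking-around went roughly like this (reconstructed from memory, so treat the exact FMRI and output with suspicion):

# reconstructed from memory -- standard SMF spelunking on stock Solaris 10
svcs -x svc:/application/print/rfc1179:default     # why is it in maintenance? (no useful answer)
svcadm clear svc:/application/print/rfc1179:default
svcadm enable svc:/application/print/rfc1179:default
inetadm -l svc:/application/print/rfc1179:default  # inspect the inetd-managed properties

# the "you shouldn't need this anymore" fallback that actually worked:
# add the old-style in.lpd line to /etc/inetd.conf, then convert it to an SMF manifest
inetconv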
And that is it for now. I am tired beyond belief, having moved up my annual snifter of port from Xmas to go out with coworkers last night. I stopped drinking at 7pm and I'm still tired today. Pathetic. Arlo would be so disappointed in me.
I just love clever network hacks.
Speaking of which, I think I'm going to ask my boss if she'll send me to LISA. I didn't realize I had sysadmin heroes 'til I started looking at the program: Æleen Frisch! Michael Lucas! Tom Limoncelli (who's working at Google now, natch)! W. Curtis Preston! But also Dan Fucking Kaminsky, that's who:
I like big graphs and I can't deny...You other hackers can't deny...when a packet routes in with an itty bitty length and a huge string in your face you get sick...cuz you've fuzzed that trick...
...who's going to be presenting the results of a worldwide SSL scan among lots of other stuff.
I think it'd be great to attend, but it's a long shot. Wish me luck.