So one thing that's been hanging around in my mailbox for (checks mailbox) good God, three weeks now is an exchange I had with Jess Males (aka Hefeweizen on IRC, which is a damn good name). He wrote to me about my Infrastructure Code Testing BoF, and asked:
I don't see notes to this effect, so I'll ask: what's the difference between monitoring and test-driven infrastructure? Monitoring started as a way to tell us that the infrastructure we need is available and operating as expected. Test-driven infrastructure serves the role of verifying that the environment we're describing (in code) is being implemented, and thus, operating as expected. Those sound awfully similar to me. Is there a nuance that flies over my head?
Before I insert my uninformed opinions, I'll point you to an excellent article from this year's SysAdvent by Yvonne Lam called How To Talk About Monitors, Tests, and Diagnostics. But if you're still interested, here goes...
First, often (though not always) it's a pain in the ass to point already-existing monitoring at possibly ephemeral VMs, dev machines and the like. Just think of the pain involved in adding a new machine + a bunch of services AND remembering to disable alerts while you do so. Not to say it can't be done, just that it's a source of friction, which means it's less likely to be done.
Second, there are often times when we're building something new, and we don't have already-existing monitoring to point at it. Case in point: I recently set up RabbitMQ at work; this was new to us, and I was completely unfamiliar with it. The tests I added can go on to form the basis of new monitoring, but they emerged from my desire to get familiar with it (there's a sketch of what they look like below).
Third, these tests were also about getting familiar with RabbitMQ (and Puppet, which is new to me), and doubtless there are some things in there that will not be needed for monitoring. These are valuable to have in testing, but don't always need to be kept around.
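To make the second and third points concrete, here's a rough sketch of the kind of serverspec checks I mean. This is not the actual test suite from $WORK; it assumes a stock rabbitmq-server package listening on the default AMQP port (5672), and the file path is made up.

    # spec/rabbitmq/rabbitmq_spec.rb -- illustrative only
    require 'serverspec'
    set :backend, :exec

    # Is the package there, and is the service actually running?
    describe package('rabbitmq-server') do
      it { should be_installed }
    end

    describe service('rabbitmq-server') do
      it { should be_enabled }
      it { should be_running }
    end

    # Default AMQP port; adjust if you've changed rabbitmq.config.
    describe port(5672) do
      it { should be_listening }
    end

    # Ask the broker itself, not just the port.  (Needs enough
    # privilege to read the Erlang cookie -- probably root.)
    describe command('rabbitmqctl status') do
      its(:exit_status) { should eq 0 }
    end

Some of these (say, the rabbitmqctl status check) would map straight onto monitoring; others were just me poking at a new tool and can be thrown away later.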
I fully stipulate that monitoring, as often implemented, falls woefully short of our ideal. More often than not, monitoring is a ping check or a port check. Our test-driven environment should check for page load times or members behind a load-balancer, or &c. If what we really want is better, more accurate measurement of the environment, then know there's a refreshing reimagination of monitoring with #monitoringlove. If they're already marching in our direction, let's join ranks.
True story. I've shot myself in the foot more times than I care to remember by, for example, testing that port 80's open without checking the content coming back.
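In serverspec terms, the difference looks something like this; the URL and the expected string are invented for the example.

    require 'serverspec'
    set :backend, :exec

    # The check that lulls you into a false sense of security:
    describe port(80) do
      it { should be_listening }
    end

    # The check that would have saved me: does the page actually come
    # back with something resembling real content?
    describe command('curl -sf http://localhost/') do
      its(:exit_status) { should eq 0 }
      its(:stdout)      { should match /Welcome to Example App/ }
    end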
Now that I've said this, I think I start to answer my own question of TDI (test-driven infrastructure) vs monitoring. I begin to see these points: write the tests first (duh, devs have been saying this for years), and better test (monitoring artifacts) generation (ideally, automatic).
Test first: yes, particularly when starting with a new technology (see above re: RabbitMQ). Also, in theory you can rip stuff out and try something else in its place (think nginx vs Apache); if the tests still pass, you're golden. Still missing: better test generation. I'd love something that ate serverspec tests and spat out Nagios configs; even as a first draft of the monitoring checks, it'd be valuable.
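On the nginx-vs-Apache point, here's a sketch of what an implementation-agnostic check might look like. Nothing below names a particular web server, so in theory you can swap one for the other and rerun the same spec; the localhost URL and the expected status code are assumptions for the example.

    require 'serverspec'
    set :backend, :exec

    # Note what's *not* here: no describe package('nginx'), no
    # describe service('apache2').  The contract is just "something
    # answers HTTP on port 80 with a 200", whoever provides it.
    describe port(80) do
      it { should be_listening }
    end

    describe command("curl -s -o /dev/null -w '%{http_code}' http://localhost/") do
      its(:stdout) { should eq '200' }
    end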
So LISA's over...has been for a week now. Time to put up some thoughts about what it was like.
First off, the blogging was a lot of work, and big enormous shout-outs to BeerOps and Mark Lamourine, as well as Noah Meyerhans and Matt Simmons, for doing such excellent work. We kept a pretty good cadence going. Our goal was two posts/week in the leadup to the conference, which we pretty much met, and then one post/day per person during the conference itself, which we nailed. I'm glad I did it and I'm glad it's done and I'm glad I went. (Glad glad glad gladdity glad glad, in fact.)
I realized in retrospect that I did not mention anywhere that I got free admission to the conference for our work; that's bad. I love the conference and would have been happy to go anyhow, but I should have made it clear that I was being compensated for it. Not sure if that's necessarily required on the USENIX site itself, but I was also writing here...anyhow: I messed up there.
Speaking of posts, I also wrote a post for $WORK on testing Puppet modules with ServerSpec and Vagrant. (I wasn't paid for that, but it's expected that we'll take turns coming up with posts for the blog; this came out of a lunch-and-learn I did for coworkers on this topic.) That one was a lot of work, too, both for the writing and the example code that goes with it, and I'm glad it's finally done.
But back to LISA. Some highlights from the conference:
I gave the @EFF all my cash & all I got was this pic of @kurtopsahl with the #goldenpony (Oh, + internet freedom.) — Saint Aardvark (@saintaardvark) November 12, 2014
There's more to write, but I need to post this...so I'll come back to the BoF and some questions I got asked about testing code in the first place.