Post-Maintenance Fallout

20 Dec 2012

Two things bit me after doing big patching yesterday.

First, Cacti's logs suddenly exploded with a crapton of errors like this:

12/20/2012 03:41:41 PM - CMDPHP: Poller[0] ERROR: SQL Assoc Failed!,
Error:'1146', SQL:"SELECT 1 AS id, ph.name, ph.file, ph.function FROM
plugin_hooks AS ph LEFT JOIN plugin_config AS ...

and on it went. The problem: Cacti got upgraded, but I forgot to run the upgrade step.

Second, LDAP Replication stopped working. The single master (multi-master replication is for people who don't get enough pain in their lives already) suddenly stopped, with terribly uninformative log messages like:

NSMMReplicationPlugin - Replication agreement for agmt="cn=eg-02" (eg-02:636) could not be updated. For replication to take place, please enable the suffix and restart the server

Forcing initialization didn't work, and neither did recreating the agreement; that got me this error:

agmtlist_add_callback: Can't start agreement "cn=eg-02,cn=replica,cn=dc\example\2c dc\3dcom,cn=mapping tree,cn=config"

But that log message did hold the key. As described here, 389/CentOS/Fedora DS/RHDS switched to a new DN format. And near as I can figure, either some upgrade step didn't work or it simply wasn't there in the first place.

The solution: Shut down the server. Edit dse.ldif and change

cn=eg-02,cn=replica,cn=dc\example\2c dc\3dcom,cn=mapping tree,cn=config

to:

cn=eg-02,cn=replica,cn=dc\example\2cdc\3dcom,cn=mapping tree,cn=config

Notice the space that just went away. NOTICE IT.
Now restart the server.
I also deleted the replication agreements and recreated them; not sure if that was strictly necessary, but there you go.

UPDATE: Nope, the problem recurred, leading to this amusing return from the Nagios plugin:

UNKNOWN - WTF is return code 'ERROR'???

In unrelated news, I have now switched to keeping account information in flat files distributed by rcp. Replication agreements are for the fucking birds.

SECOND UPDATE: A second re-initialization of the client fixed the problem. In still yet unrelated news, I've submitted a patch to the Linux folks to eliminate UIDs entirely.

So THAT'S what it's doing

19 Dec 2012

Rebooting a KVM host at $WORK seems to take a long time -- as in, a long time to actually reboot the host after I type "shutdown -r now". But then the process list shows this:

root     26881  0.0  0.0  64328   824 ?        S    13:30   0:00 /bin/sh /etc/rc6.d/K01libvirt-guests stop
root     26882  0.0  0.0 130220  3504 ?        S    13:30   0:00 virsh managedsave 128b38e0-ce1a-eb4b-5ee5-2746cd0926ce
root     26890  0.0  0.0   8716  1084 ?        S    13:30   0:00 sh -c cat | { dd bs=4096 seek=1 if=/dev/null && dd bs=1048576; } 1<>/var/lib/libvirt/qemu/save//vm-01.example.com.save
root     26891  1.1  0.0   3808   440 ?        S    13:30   0:00 cat
root     26892  0.0  0.0   8716   576 ?        S    13:30   0:00 sh -c cat | { dd bs=4096 seek=1 if=/dev/null && dd bs=1048576; } 1<>/var/lib/libvirt/qemu/save//vm-01.example.com.save

And now I understand.

LISA12 Miscellany

15 Dec 2012

A collection of stuff that didn't fit anywhere else:

St Vidicon of Cathode. Only slightly spoiled by the disclaimer "Saint Vidicon and his story are the intellectual property of Christopher Stasheff."
A Vagrant box for OmniOS, the OpenSolaris distro I heard about at LISA.
A picture of Matt Simmons. When I took this, he was committing some PHP code and mumbling something like "All I gotta do is enable globals and everything'll be fine..."

Standalone PHP Programmer

I finished the scavenger hunt and won the USENIX Dart of Truth:

USENIX Dart of Truth

Theologians

14 Dec 2012

Where I'm going, you cannot come...
"Theologians", Wilco

At 2.45am, I woke up because a) my phone was buzzing with a page from work, and b) the room was shaking. I was quite bagged, since I'd been up 'til 1 finishing yesterday's blog entry, and all I could think was "Huh...earthquake. How did Nagios know about this?" Since the building didn't seem to be falling, I went back to sleep. In the morning, I found out it was a magnitude 6.2 earthquake.

I was going to go to the presentation by the CBC on "What your CDN won't tell you" (initially read as "What your Canadian won't tell you": "Goddammit, it's prounced BOOT") but changed my mind at the last minute and went to the Cf3 "Guru is in" session with Diego Zamboni. (But not before accidentally going to the Cf3 tutorial room; I made an Indiana Jones-like escape as Mark Burgess was closing the door.) I'm glad I went; I got to ask what people are doing for testing, and got a some good hints.

Vagrant's good for testing (and also awesome in general). I'm trying to get a good routine set up for this, but I have not started using the Cf3 provider for Vagrant...because of crack? Not sure.
You might want to use different directories in your revision control; that makes it easy to designate dev, testing, and production machines (don't have to worry about getting different branches; just point them at the directories in your repo).
Make sure you can promote different branches in an automated way (merging branches, whatever). It's easy to screw this up, and it's worth taking the time to make it very, very easy to do it right.
If you've got a bundle meant to fix a problem, deliberately break a machine to make sure it actually does fix the problem.
Consider using git + gerrit + jenkins to test and review code.

The Cf3 sketch tool still looks neat. The Enterprise version looked cool, too; it was the first time I'd seen it demonstrated, and I was intrigued.

At the break I got drugs^Wcold medication from Jennifer. Then I sang to Matt:

(and the sailors say) MAAAA-AAAT you're a FIIINNE girl what a GOOOD WAAAF you would be but my life, my love and my LAY-ee-daaaay is the sea (DOOOO doo doo DOO DOO doot doooooooo)

I believe Ben has video; I'll see if it shows up.

BTW, Matt made me sing "Brandy" to him when I took this picture:

Matt dreams of the sea

I discussed Yo Dawg Compliance with Ben ("Yo Dawg, I put an X in your Y so you could X when you Y"; == self-reference), and we decided to race each other to @YoDawgCompliance on Twitter. (Haha, I got @YoDawgCompliance2K. Suck it!)

(Incidentally, looking for a fully-YoDawg compliant ITIL implementation? Leverage @YoDawgCompliance2K thought leadership TODAY!)

Next up was the talk on the Greenfield HPC by @arksecond. I didn't know the term, and earlier in the week I'd pestered him for an explanation. Explanation follows: Greenfield is a term from the construction industry, and denotes a site devoid of any existing infrastructure, buildings, etc where one might do anything; Brownfield means a site where there is existing buildings, etc and you have to take those into account. Explanation ends. Back to the talk. Which was interesting.

They're budgeting 25 kW/rack, twice what we do. For cooling they use spot cooling, but they also were able to quickly prototype aisle containment with duct tape and cardboard. I laughed, but that's awesome: quick and easy, and it lets you play around and get it right. (The cardboard was replaced with plexiglass.)

Lunch was with Matt and Ken from FOO National Labs, then Sysad1138 and Scott. Regression was done, fun was had and phones were stolen.

The plenary! Geoff Halprin spoke about how DevOps has been done for a long time, isn't new and doesn't fix everything. Q from the audience: I work at MIT, and we turn out PhDs, not code; what of this applies to me? A: In one sense, not much; this is not as relevant to HPC, edu, etc; not everything looks like enterprise setups. But look at the techniques, underlying philosophy, etc and see what can be taken.

That's my summary, and the emphasis is prob. something he'd disagree with. But it's Friday as I write this and I am tired as I sit in the airport, bone tired and I want to be home. There are other summaries out there, but this one is mine.

Flair

13 Dec 2012

Silly simple lies
They made a human being out of you...
"Flair", Josh Rouse

Thursday I gave my Lightning Talk. I prepared for it by writing it out, then rehearsing a couple times in my room to get it down to five minutes. I think it helped, since I got in about two seconds under the wire. I think I did okay; I'll post it separately. Pic c/o Bob the Viking:

Lightning Talk

Some other interesting talks:

@perlstalker on his experience with Ceph (he's happy);
@chrisstpierre on why XML is good for (it's code with a built-in validator; don't use it for setting syslog levels);
the guy who wanted to use retired aircraft carriers as floating data centres;
Dustin on MozPool (think cloud for Panda Boards);
Stew (@digitalcrow) on Machination, his homegrown hierarchical config management tool (users can set their preferences; if needed for the rest of their group, it can be promoted up the hierarchy as needed);
Derek Balling on megacity.org/timeline (keep your fingers crossed!);
a Google dev on his experience bringing down GMail.

Afterward I went to the vendor booths again, and tried the RackSpace challenge: here's a VM and it's root password; it needs to do X, Y and Z. GO. I was told my time wasn't bad (8.5 mins; wasn't actually too hard), and I may actually win something. Had lunch with John again and discussed academia, fads in theoretical computer science and the like.

The afternoon talk on OmniOS was interesting; it's an Illumos version/distro with a rigourous update schedule. The presenter's company uses it in a LOT of machines, and their customers expect THEM to fix any problems/security problems...not say "Yeah, the vendor's patch is coming in a couple weeks." Stripped down; they only include about 110 packages (JEOS: "Just Enough Operating System") in the default install. "Holy wars" slide: they use IPS ("because ALL package managers suck") and vi (holler from audience: "Which one?"). They wrote their own installer: "If you've worked with OpenSolaris before, you know that it's actually pretty easy getting it to work versus fucking getting it on the disk in the first place."

At the break I met with Nick Anderson (@cmdln_) and Diego Zamboni (@zzamboni, author of "Learning Cfengine 3"). Very cool to meet them both, particularly as they did not knee me in the groin for my impertinence in criticising of Cf3 syntax. Very, very nice and generous folk.

The next talk, "NSA on the Cheap", was one I'd already heard from the USENIX conference in the summer (downloaded the MP3), so I ended up talking to Chris Allison. I met him in Baltimore on the last day, and it turns out he's Matt's coworker (and both work for David Blank-Edelman). And when he found out that Victor was there (we'd all gone out on our last night in Baltimore) he came along to meet him. We all met up, along with Victor's wife Jennifer, and caught up even more. (Sorry, I'm writing this on Friday; quality of writing taking a nosedive.)

And so but Victor, Jennifer and I went out to Banker's Hill, a restaurant close to the hotel. Very nice chipotle bacon meatloaf, some excellent beer, and great conversation and company. Retired back to the hotel and we both attended the .EDU BoF. Cool story: someone who's unable to put a firewall on his network (he's in a department, not central IT, so not an option for him) woke up one day to find his printer not only hacked, but the firmware running a proxy of PubMed to China ("Why is the data light blinking so much?"). Not only that, but he couldn't upgrade the firmware because the firmware reloading code had been overwritten.

Q: How do you know you're dealing with a Scary Viking Sysadmin?

A: Service monitoring is done via two ravens named Huginn and Muninn.

The White Trash Period Of My Life

13 Dec 2012

Careful with words -- they are so meaningful
Yet they scatter like the booze from our breath...
"The White Trash Period Of My Life", Josh Rouse

I woke up at a reasonable time and went down to the lobby for free wireless; finished up yesterday's entry (2400 words!), posted and ate breakfast with Andy, Alf ("I went back to the Instagram hat store yesterday and bought the fedora. But now I want to accessorize it") and...Bob in full Viking drag.

Bob the Scary Viking Sysadmin

Andy: "Now you...you look like a major in the Norwegian army."

Off to the Powershell tutorial. I've been telling people since that I like two things from Microsoft: the Natural Keyboard, and now Powershell. There are some very, very nice features in there:

common args/functions for each command, provided by the PS library
directory-like listings for lots of things (though apparently manipulating the registry through PS is sub-optimal); feels Unix/Plan 9-like
$error contains all the errors in your interactive cycle
"programming with hand grenades": because just 'bout everything in PS is an object, you can pass that along through a pipe and the receiving command explodes it and tries to do the right thing.

My notes are kind of scattered: I was trying to install version 3 (hey MS: please make this easier), and then I got distracted by something I had to do for work. But I also got to talk to Steve Murawski, the instructor, during the afternoon break, as we were both on the LOPSA booth. I think MS managed to derive a lot of advantage from being the last to show up at the party.

Interestingly, during the course I saw on Twitter that Samba 4 has finally been released. My jaw dropped. It looks like there are still some missing bits, but it can be an AD now. [Keanu voice] Whoah.

During the break I helped staff the LOPSA booth and hung out with a syadmin from NASA; one of her users is a scientist who gets data from the ChemCam (I think) on Curiosity. WAH.

The afternoon's course was on Ganeti, given by Tom Limoncelli and Guido Trotter. THAT is my project for next year: migrating my VMs, currently on one host, to Ganeti. It seems very, very cool. And on top of that, you can test it out in VirtualBox. I won't put in all my notes, since I'm writing this in a hurry (I always fall behind as the week goes on) and a lot of it is avail on the documentaion. But:

You avoid needing a SAN by letting it do DRBD on different pairs of nodes. Need to migrate a machine? Ganeti will pass it over to the other pair.
If you've got a pair of machines (which is about my scale), you've just gained failover of your VMs. If you've got more machines, you can declare a machine down (memory starts crapping out, PS failing, etc) and migrate the machines over to their alternate. When the machine's back up, Ganeti will do the necessary to get the machine back in the cluster (sync DRBDs, etc).
You can import already-existing VMs (Tom: "Thank God for summer interns.")
There's a master, but there are master candidates ready to take over if requested or if the master becomes unavailable.
There's a web manager to let users self-provision. There's also Synnefo, a AWS-like web FE that's commercialized as Okeanos.io (free trial: 3-hour lifetime VMs)

I talked with Scott afterward, and learned something I didn't know: NFS over GigE works fine for VM images. Turn on hard mounts (you want to know when something goes wrong), use TCP, use big block sizes, but it works just fine. This changes everything.

In the evening the bar was full and the hotel restaurant was definitely outside my per diem, so I took a cab downtown to the Tipsy Crow. Good food, nice beer, and great people watching. (Top tip for Canadians: here, the hipsters wear moustaches even when it's not Movember. Prepare now and get ahead of the curve.) Then back to the hotel for the BoFs. I missed Matt's on small infrastructure (damn) but did make the amateur astronomy BoF, which was quite cool. I ran into John Hewson, my roommate from the Baltimore LISA, and found out he's presenting tomorrow; I'll be there for that.

Q: How do you know you're with a Scary Viking Sysadmin?

A: Prefaces new cool thing he's about to show you with "So I learned about this at the last sysadmin Althing...."

Handshake Drugs

13 Dec 2012

And if I ever was myself,
I wasn't that night...
"Handshake Drugs", Wilco

Wednesday was opening day: the stats (1000+ attendees) and the awards (the Powershell devs got one for "bringing the power of automated system administration to Windows, where it previously largely unsupported"). Then the keynote from Vint Cerf, co-designer of TCP and yeah. He went over a lot of things, but made it clear he was asking questions, not proposing answers. Many cool quotes, including: "TCP/IP runs over everything, including you if you're not paying attention." Discussed the recent ITU talks a lot, and what exactly he's worried about there. Grab the audio/watch the video.

Next talk was about a giant scan of the entire Internet (/0) for SIP servers. Partway through my phone rang and I had to take it, but by the time I got out to the hall it'd stopped and it turned out to be a wrong number anyway. Grr.

IPv6 numbering strategies was next. "How many hosts can you fit in a /48? ALL OF THEM." Align your netblocks by nibble boundaries (hex numbers); it makes visual recognition of demarcation so much easier. Don't worry about packing addresses, because there's lots of room and why complicate things? You don't want to be doing bitwise math in the middle of the night.

Lunch, and the vendor tent. But first an eye-wateringly expensive burrito -- tasty, but $9. It was NOT a $9-sized burrito. I talked to the CloudStack folks and the Ceph folks, and got cool stuff from each. Both look very cool, and I'm going to have to look into them more when I get home. Boxer shorts from the Zenoss folks ("We figured everyone had enough t-shirts").

I got to buttonhole Mark Burgess, tell him how much I'm grateful for what he's done but OMG would he please do something about the mess of brackets. Like the Wordpress sketch:

commands:
  !wordpress_tarball_is_present::
    "/usr/bin/wget -q -O $($(params)[_tarfile]) $($(params)[_downloadurl])"
      comment => "Downloading latest version of WordPress.";

His response, as previously, was "Don't do that, then." To be fair, I didn't have this example and was trying to describe it verbally ("You know, dollar bracket dollar bracket variable square bracket...C'mon, I tweeted about it in January!"). And he agreed yes, it's a problem, but it's in the language now, and indirection is a problem no matter what. All of which is true, and I realize it's easy for me to propose work for other people without coming up with patches. And I let him know that this was a minor nit, that I really was grateful for Cf3. So there.

I got to ask Dru Lavigne about FreeBSD's support for ZFS (same as Illumos) and her opinion of DragonflyBSD (neat, thinks of it as meant for big data rather than desktops, "but maybe I'm just old and crotchety").

I Talked with a PhD student who was there to present a paper. He said it was an accident he'd done this; he's not a sysadmin, and though his nominal field is CS, he's much more interested in improving the teaching of undergraduate students. ("The joke is that primary/secondary school teachers know all about teaching and not so much about the subject matter, and at university it's the other way around."). In CompSci it's all about the conferences -- that's where/how you present new work, not journals (Science, Nature) like the natural sciences. What's more, the prestigious conferences are the theoretical ones run by the ACM and the IEEE, not a practical/professional one like LISA. "My colleagues think I'm slumming."

Off to the talks! First one was a practice and experience report on the config and management of a crapton (700) iPads for students at an Australian university. The iPads belonged to the students -- so whatever profile was set up had to be removable when the course was over, and locking down permanently was not an option.

No suitable tools for them -- so they wrote their own. ("That's the way it is in education.") Started with Django, which the presenter said should be part of any sysadmin's toolset; easy to use, management interface for free. They configured one iPad, copied the configuration off, de-specified it with some judicious search and replace, and then prepared it for templating in Django. To install it on the iPad, the students would connect to an open wireless network, auth to the web app (which was connected to the university LDAP), and the iPad would prompt them to install the profile.

The open network was chosen because the secure network would require a password....which the iPad didn't have yet. And the settings file required an open password in it for the secure wireless to work. The reviewers commented on this a lot, but it was a conscious decision: setting up the iPad was one of ten tasks done on their second day, and a relatively technical one. And these were foreign students, so language comprehension was a problem. In the end, they felt it was a reasonable risk.

John Hewson was up next, talking about ConfSolve, his declarative configuration language connected to/written with a constraint solver. ("Just cross this red wire with this blue wire...") John was my roommate at the Baltimore LISA, and it was neat to see what he's been working on. Basically, you can say things like "I want this VM to have 500 GB of disk" and ConfSolve will be all like, "Fuck you, you only have 200 GB of storage left". You can also express hard limits and soft preferences ("Maximize memory use. It'd be great if you could minimise disk space as well, but just do your best"). This lets you do things like cloudbursting: "Please keep my VMs here unless things start to suck, in which case move my web, MySQL and DNS to AWS and leave behind my SMTP/IMAP."

After his presentation I went off to grab lunch, then back to the LISA game show. It was surprisingly fun and funny. And then, Matt and I went to the San Diego Maritime Museum, which was incredibly awesome. We walked through The Star of India, a huge three-masted cargo ship that still goes out and sails. There were actors there doing Living History (you could hear the caps) with kids, and displays/dioramas to look at. And then we met one of the actors who told us about the ship, the friggin' ENORMOUS sails that make it go (no motor), and about being the Master at Arms in the movie "Master and Commander". Which was our cue to head over to the HMS Surprise, used in the filming thereof. It's a replica, but accurate and really, really neat to see. Not nearly as big as the Star of India, and so many ropes...so very, very many ropes. And after that we went to a Soviet (!) Foxtrot-class submarine, where we had to climb through four circular hatches, each about a metre in diameter. You know how they say life in a submarine is claustrophobic? Yeah, they're not kidding. Amazing, and I can't recommend it enough.

We walked back to the hotel, got some food and beer, and headed off to the LOPSA annual meeting. I did not win a prize. Talked with Peter from the University of Alberta about the lightning talk I promised to do the next day about reproducible science. And thence to bed.

Q: How do you know you're with a Scary Viking Sysadmin?

A: When describing multiple roles at the office, says "My other hat is made of handforged steel."

Christmas With Jesus

10 Dec 2012

And my conscience has it stripped down to science
Why does everything displease me?
Still, I'm trying...

"Christmas with Jesus", Josh Rouse

At 3am my phone went off with a page from $WORK. It was benign, but do you think I could get back to sleep? Could I bollocks. I gave up at 5am and came down to the hotel lobby (where the wireless does NOT cost $11/day for 512 Kb/s, or $15 for 3Mb/s) to get some work done and email my family. The music volume was set to 11, and after I heard the covers of "Living Thing" (Beautiful South) and "Stop Me If You Think That You've Heard This One Before" (Marc Ronson; disco) I retreated back to my hotel room to sit on my balcony and watch the airplanes. The airport is right by both the hotel and the downtown, so when you're flying in you get this amazing view of the buildings OH CRAP RIGHT THERE; from my balcony I can hear them coming in but not see them. But I can see the ones that are, I guess, flying to Japan; they go straight up, slowly, and the contrail against the morning twilight looks like rockets ascending to space. Sigh.

Abluted (ablated? hm...) and then down to the conference lounge to stock up on muffins and have conversations. I talked to the guy giving the .EDU workshop ("What we've found is that we didn't need a bachelor's degree in LDAP and iptables"), and with someone else about kids these days ("We had a rich heritage of naming schemes. Do you think they're going to name their desktop after Lord of the Rings?" "Naw, it's all gonna be Twilight and Glee.")

Which brought up another story of network debugging. After an organizational merger, network problems persisted until someone figured out that each network had its own DNS servers that had inconsistent views. To make matters worse, one set was named Kirk and Picard, and the other was named Gandalf and Frodo. Our Hero knew then what to do, and in the post-mortem Root Cause Diagnosis, Executive Summary, wrote "Genre Mismatch." [rimshot]

(6.48 am and the sun is rising right this moment. The earth, she is a beautiful place.)

And but so on to the HPC workshop, which intimidated me. I felt unprepared. I felt too small, too newbieish to be there. And when the guy from fucking Oak Ridge got up and said sheepishly, "I'm probably running one of the smaller clusters here," I cringed. But I needn't have worried. For one, maybe 1/3rd of the people introduced themselves as having small clusters (smallest I heard was 10 nodes, 120 cores), or being newbies, or both. For two, the host/moderator/glorious leader was truly excellent, in the best possible Bill and Ted sense, and made time for everyone's questions. For three, the participants were also generous with time and knowledge, and whether I asked questions or just sat back and listened, I learned so much.

Participants: Oak Ridge, Los Alamos, a lot of universities, and a financial trading firm that does a lot of modelling and some really interesting, regulatory-driven filesystem characteristics: nothing can be deleted for 7 years. So if someone's job blows up and it litters the filesystem with crap, you can't remove the files. Sure, they're only 10-100 MB each, but with a million jobs a day that adds up. You can archive...but if the SEC shows up asking for files, they need to have them within four hours.

The guy from Oak Ridge runs at least one of his clusters diskless: less moving parts to fail. Everything gets saved to Lustre. This became a requirement when, in an earlier cluster, a node failed and it had Very Important Data on a local scratch disk, and it took a long time to recover. The PI (==principal investigator, for those not from an .EDU; prof/faculty member/etc who leads a lab) said, "I want to be able to walk into your server room, fire a shotgun at a random node, and have it back within 20 minutes." So, diskless. (He's also lucky because he gets biweekly maintenance windows. Another admin announces his quarterly outages a year in advance.)

There were a lot of people who ran configuration management (Cf3, Puppet, etc) on their compute nodes, which surprised me. I've thought about doing that, but assumed I'd be stealing precious CPU cycles from the science. Overwhelming response: Meh, they'll never notice. OTOH, using more than one management tool is going to cause admin confusion or state flapping, and you don't want to do that.

One guy said (both about this and the question of what installer to use), "Why are you using anything but Rocks? It's federally funded, so you've already paid for it. It works and it gets you a working cluster quickly. You should use it unless you have a good reason not to." "I think I can address that..." (laughter) Answer: inconsistency with installations; not all RPMs get installed when you're doing 700 nodes at once, so he uses Rocks for a bare-ish install and Cf3 after that -- a lot like I do with Cobbler for servers. And FAI was mentioned too, which apparently has support for CentOS now.

One .EDU admin gloms all his lab's desktops into the cluster, and uses Condor to tie it all together. "If it's idle, it's part of the cluster." No head node, jobs can be submitted from anywhere, and the dev environment matches the run environment. There's a wide mix of hardware,so part of user education a) is getting people to specify minimal CPU and memory requirements and b) letting them know that the ideal job is 2 hours long. (Actually, there were a lot of people who talked about high-turnover jobs like that, which is different from what I expected; I always thought of HPC as letting your cluster go to town for 3 weeks on something. Perhaps that's a function of my lab's work, or having a smaller cluster.)

User education was something that came up over and over again: telling people how to efficiently use the cluster, how to tweak settings (and then vetting jobs with scripts).

I asked about how people learned about HPC; there's not nearly the wealth of resources that there are for programming, sysadmin, networking, etc. Answer: yep, it's pretty quiet out there. Mailing lists tend to be product-specific (though are pretty excellent), vendor training is always good if you can get it, but generally you need to look around a lot. ACM has started a SIG for HPC.

I asked about checkpointing, which was something I've been very fuzzy about. Here's the skinny:

Checkpointing is freezing the process so that you can resurrect it later. It protects against node failures (maybe with automatic moving of the process/job to another node if one goes down) and outages (maybe caused by maintenance windows.)
Checkpointing can be done at a few different layers:
- the app itself
- the scheduler (Condor can do this; Torque can't)
- the OS (BLCR for Linux, but see below)
- or just suspending a VM and moving it around; I was unclear how ``` many people did this.


* The easiest and best by far is for the app to do it.  It knows its
  state intimately and is in the best position to do this.  However,
  the app needs to support this.  Not necessary to have it explicitly
  save the process (as in, kernel-resident memory image, registers,
  etc); if it can look at logs or something and say "Oh, I'm 3/4
  done", then that's good too.

* The Condor scheduler supports this, *but* you have to do this by
  linking in its special libraries when you compile your program.  And
  none of the big vendors do this (Matlab, Mathematica, etc).

* BLCR: "It's 90% working, but the 10% will kill you." Segfaults,
  restarts only work 2/3 of the time, etc.  Open-source project from a
  federal lab and until very recently not funded -- so the response to
  "There's this bug..." was "Yeah, we're not funded. Can't do nothing
  for you." Funding has been obtained recently, so keep your fingers
  crossed.

One admin had problems with his nodes:  random slowdowns, not caused
by cstates or the other usual suspects.  It's a BIOS problem of some
sort and they're working it out with the vendor, but in the meantime
the only way around it is to pull the affected node and let the power
drain completely.  This was pointed out by a user ("Hey, why is my job
suddenly taking so long?") who was clever enough to write a
dirt-simple 10 million iteration for-loop that very, very obviously
took a lot longer on the affected node than the others.  At this point
I asked if people were doing regular benchmarking on their clusters to
pick up problems like this.  Answer: no.  They'll do benchmarking on
their cluster when it's stood up so they have something to compare it
to later, but users will unfailingly tell them if something's slow.

I asked about HPL; my impression when setting up the cluster was, yes,
benchmark your own stuff, but benchmark HPL too 'cos that's what you
do with a cluster.  This brought up a host of problems for me, like
compiling it and figuring out the best parameters for it.  Answers:

* Yes, HPL is a bear.  Oak Ridge: "We've got someone for that and
  that's all he does."  (Response: "That's your answer for everything
  at Oak Ridge.")

* Fiddle with the params P, Q and N, and leave the rest alone.  You
  can predict the FLOPS you should get on your hardware, and if you
  get 90% or so within that you're fine.

* HPL is not that relevant for most people, and if you tune your
  cluster for linear algebra (which is what HPL does) you may get
  crappy performance on your real work.

* You can benchmark it if you want (and download Intel's binary if you
  do; FIXME: add link), but it's probably better and easier to stick
  to your own apps.

Random:

* There's a significant number of clusters that expose interactive
  sessions to users via qlogin; that had not occurred to me.

* Recommended tools:
  * ubmod: accounting graphs
  * Healthcheck scripts (Werewolf)
  * stress: cluster stress test tool
  * munin: to collect arbitrary info from a machine
  * collectl: good for ie millisecond resolution of traffic spikes

* "So if a box gets knocked over -- and this is just anecdotal -- my
  experience is that the user that logs back in first is the one who
  caused it."

* A lot of the discussion was prompted by questions like "Is anyone
  else doing X?" or "How many people here are doing Y?"  Very helpful.

* If you have to return warranty-covered disks to the vendor but you
  really don't want the data to go, see if they'll accept the metal
  cover of the disk.  You get to keep the spinning rust.

* A lot of talk about OOM-killing in the bad old days ("I can't tell
  you how many times it took out init.").  One guy insisted it's a lot
  better now (3.x series).

* "The question of changing schedulers comes up in my group every six
  months."

* "What are you doing for log analysis?" "We log to /dev/null."
  (laughter) "No, really, we send syslog to /dev/null."

* Splunk is eye-wateringly expensive: 1.5 TB data/day =~ $1-2 million
  annual license.

* On how much disk space Oak Ridge has:  "It's...I dunno, 12 or 13 PB?
  It's 33 tons of disks, that's what I remember."

* Cheap and cheerful NFS:  OpenSolaris or FreeBSD running ZFS. For
  extra points, use an Aztec Zeus for a ZIL: a battery-backed 8GB
  DIMM that dumps to a compact flash card if the power goes out.

* Some people monitor not just for overutilization, but for
  underutilization: it's a chance for user education ("You're paying
  for my time and the hardware; let me help you get the best value for
  that").  For Oak Ridge, though, there's less pressure for that:
  scientists get billed no matter what.

* "We used to blame the network when there were problems.  Now their
  app relies on SQL Server and we blame that."

* Sweeping for expired data is important.  If it's scratch, then
  *treat* it as such: negotiate expiry dates and sweep regularly.

* Celebrity resemblances: Michael Moore and the guy from Dead Poet's
  Society/The Good Wife.  (Those are two different sysadmins, btw.)

* Asked about my .TK file problem; no insight.  Take it to the lists.
  (Don't think I've written about this, and I should.)

* On why one lab couldn't get Vendor X to supply DKMS kernel modules
  for their hardware:  "We're three orders of magnitude away from
  their biggest customer.  We have *no* influence."

* Another vote for SoftwareCarpentry.org as a way to get people up to
  speed on Linux.

* A lot of people encountered problems upgrading to Torque 4.x and
  rolled back to 2.5.  "The source code is disgusting.  Have you ever
  looked at it?  There's 15 years of cruft in there. The devs
  acknowledged the problem and announced they were going to be taking
  steps to fix things. One step: they're migrating to C++.
  [Kif sigh]"

* "Has anyone here used Moab Web Services? It's as scary as it sounds.
  Tomcat...yeah, I'll stop there." "You've turned the web into RPC. Again."

* "We don't have regulatory issues, but we do have a
  physicist/geologist issue."

* 1/3 of the Top 500 use SLURM as a scheduler.  Slurm's srun =~
  Torque's pdbsh; I have the impression it does not use MPI (well,
  okay, neither does Torque, but a lot of people use Torque + mpirun),
  but I really need to do more reading.

* lmod (FIXME: add link) is a Environment Modules-compatible (works
  with old module files) replacement that fixes some problems with old
  EM, actively developed, written in lua.

* People have had lots of bad experiences with external Fermi GPU
  boxes from Dell, particularly when attached to non-Dell equipment.

* Puppet has git hooks that let you pull out a particular branch on a node.

And finally:

Q: How do you know you're with a Scary Viking Sysadmin?

A: They ask for Thor's Skullsplitter Mead at the Google Bof.

Hotel Arizona

10 Dec 2012

Hotel in Arizona made us all wanna feel like stars...
"Hotel Arizona", Wilco

Sunday morning I was down in the lobby at 7.15am, drinking coffee purchased with my $5 gift certificate from the hotel for passing up housekeeping ("Sheraton Hotels Green Initiative"). I registered for the conference, came back to my hotel room to write some more, then back downstairs to wait for my tutorial on Amazon Web Services from Bill LeFebvre (former LISA chair and author of top(1)) and Marc Chianti. It was pretty damned awesome: an all-day course that introduced us to AWS and the many, many services they offer. For reasons that vary from budgeting to legal we're unlikely to move anything to AWS at $WORK, but it was very, very enlightening to learn more about it. Like:

Amazon lights up four new racks a day, just keeping up with increased demand.
Their RDS service (DB inna box) will set up replication automagically AND apply patches during configurable regular downtime. WAH.
vmstat(1) will, for a VM, show CPU cycles stolen by/for other VMs in the ST column
Amazon will not really guarantee CPU specs, which makes sense (you're on guest on a host of 20 VMs, many hardware generations, etc). One customer they know will spin up a new instance and immediately benchmark it to see if performance is acceptable; if not, they'll destroy it and try again.
Netflix, one of AWS' biggest customers, does not use EBS (persistent) storage for its instances. If there's an EBS problem -- and this probably happens a few times a year -- they keep trucking.
It's quite hard to "burst into the cloud" -- to use your own data centre most of the time, then move stuff to AWS at Xmas, when you're Slashdotted, etc. The problem is: where's your load balancer? And how do you make that available no matter what?

One question I asked: How would you scale up an email service? 'Cos for that, you don't only need CPU power, but (say) expanded disk space, and that shared across instances. A: Either do something like GlusterFS on instances to share FS, or just stick everything in RDS (AWS' MySWL service) and let them take care of it.

The instructors know their stuff and taught it well. If you have the chance, I highly recommend it.

Lunch/Breaks:

Met someone from Mozilla who told me that they'd just decommissioned the last of their community mirrors in favour of CDNs -- less downtime. They're using AWS for a new set of sites they need in Brazil, rather than opening up a new data centre or some such.
Met someone from a flash sale site: they do sales every day at noon, when they'll get a million visitors in an hour, and then it's quiet for the next 23 hours. They don't use AWS -- they've got enough capacity in their data centre for this, and they recently dropped another cloud provider (not AWS) because they couldn't get the raw/root/hypervisor-level performance metrics they wanted.
Saw members of (I think) this show choir wearing spangly skirts and carrying two duffel bags over each shoulder, getting ready to head into one of the ballrooms for a performance at a charity lunch.
Met a sysadmin from a US government/educational lab, talking about fun new legal constraints: to keep running the lab, the gov't required not a university but a LLC. For SLAC, that required a new entity called SLAC National Lab, because Stanford was already trademarked and you can't delegate a trademark like you can DNS zones. And, it turns out, we're not the only .edu getting fuck-off prices from Oracle. No surprise, but still reassuring.
I saw Matt get tapped on the shoulder by one of the LISA organizers and taken aside. When he came back to the table he was wearing a rubber Nixon mask and carrying a large clanking duffel bag. I asked him what was happening and he said to shut up. I cried, and he slapped me, then told me he loved me, that it was just one last job and it would make everything right. (In the spirit of logrolling, here he is scoping out bank guards:

Matt scoping out bank guards

Where does the close bracket go?)

After that, I ran into my roommate from the Baltimore LISA in 2009 (check my tshirt...yep, 2009). Very good to see him. Then someone pointed out that I could get free toothpaste at the concierge desk, and I was all like, free toothpaste?

And then who should come in but Andy Seely, Tampa Bay homeboy and LISA Organizing Committee member. We went out for beer and supper at Karl Strauss (tl;dr: AWESOME stout). Discussed fatherhood, the ageing process, free-range parenting in a hanger full of B-52s, and just how beer is made. He got the hang of it eventually:

Andy and beer

I bought beer for my wife, he took a picture of me to show his wife, and he shared his toothpaste by putting it on a microbrewery coaster so I didn't have to pay $7 for a tube at the hotel store, 'cos the concierge was out of toothpaste. It's not a euphemism.

Q: How do you know you're with a Scary Viking Sysadmin?

A: They insist on hard drive destruction via longboat funeral pyre.

(nothinsevergonnagetinmyway) Again

09 Dec 2012

Wasted days, wasted nights
Try to downplay being uptight...
-- "(nothinsevergonnastandinmyway) Again", Wilco

Saturday I headed out the door at 5.30am -- just like I was going into work early. I'd been up late the night before finishing up "Zone One" by Colson Whitehead, which ZOMG is incredible and you should read, but I did not want to read while alone and feeling discombobulated in a hotel room far from home. Cab to the airport, and I was suprised to find I didn't even have to opt out; the L3 scanners were only being used irregularly. I noticed the hospital curtains set up for the private screening area; it looked a bit like God's own shower curtain.

The customs guard asked me where I was going, and whether I liked my job. "That's important, you know?" Young, a shaved head and a friendly manner. Confidential look left, right, then back at me. "My last job? I knew when it was time to leave that one. You have a good trip."

The gate for the airline I took was way out on a side wing of the airport, which I can only assume meant that airline lost a coin toss or something. The flight to Seattle was quick and low, so it wasn't until the flight to San Diego that a) we climbed up to our cruising altitude of $(echo "39000/3.3" | bc) 11818 meters and b) my ears started to hurt. I've got a cold and thought that my aggressive taking of cold medication would help, but no. The first seatmate had a shaved head, a Howie Mandel soul patch, a Toki watch and read "Road and Track" magazine, staring at the ads for mag wheels; the other seatmate announced that he was in the Navy, going to his last command, and was going to use the seat tray as a headrest as soon as they got to cruising. "I was up late last night, you know?" I ate my Ranch Corn Nuggets (seriously).

Once at the hotel, I ran into Bob the Norwegian, who berated me for being surprised that he was there. "I've TOLD you this over and over again!" Not only that, but he was there with three fellow Norwegian sysadmins, including his minion. I immediately started composing Scary Viking Sysadmin questions in my head; you may begin to look forward to them.

We went out to the Gaslamp district of San Diego, which reminds me a lot of Gastown in Vancouver; very familiar feel, and a similar arc to its history. Alf the Norwegian wanted a hat for cosplay, so we hit two -- TWO -- hat stores. The second resembled nothing so much as a souvenir shop in a tourist town, but the first was staffed by two hipsters looking like they'd stepped straight out of Instagram:

Hipster Hat Shop

They sold $160 Panama hats. I very carefully stayed away from the merchandise. Oh -- and this is unrelated -- from the minibar in my hotel room:

Mini bar fees

We had dinner at a restaurant whose name I forget; stylish kind of place, with ten staff members (four of whom announced, separately, that they would be our server for the night). They seemed disappointed when I ordered a Wipeout IPA ("Yeah, we're really known more for our Sangria"), but Bob made up for it by ordering a Hawaiian Hoo-Hoo:

What a Scary Viking Sysadmin drinks

We watched the bar crawlers getting out of cabs dressed in Sexy Santa costumes ("The 12 Bars of Xmas Pub Crawl 2012") and discussed Agile Programming (which phrase, when embedded in a long string of Norwegian, sounds a lot like "Anger Management".)

Q: How do you know you're with a Scary Viking Sysadmin?

A: They explain the difference between a fjord and a fjell in terms of IPv6 connectivity.

There was also this truck in the streets, showing the good folks of San Diego just what they were missing by not being at home watching Fox Sports:

Fox Sports

We headed back to the hotel, and Bob and I waited for Matt to show up. Eventually he did, with Ben Cotton in tow (never met him before -- nice guy, gives Matt as much crap as I do -> GOOD) and Matt regaled us with tales of his hotel room:

Matt: So -- I don't wanna sound special or anything -- but is your room on the 7th floor overlooking the pool and the marina with a great big king-sized bed? 'Cos mine is.

Me: Go on.

Matt: I asked the guy at the desk when I was checking in if I could get a king-size bed instead of a double --

Me: "Hi, I'm Matt Simmons. You may know me from Standalone Hyphen Sysadmin Dot Com?"

Ben: "I'm kind of a big deal on the Internet."

Matt: -- and he says sure, but we're gonna charge you a lot more if you trash it.

Not Matt's balcony:

Not Matt's balcony

(UPDATE: Matt read this and said "Actually, I'm on the 9th floor? Not the 7th." saintaardvarkthecarpeted.com regrets the error.)

I tweeted from the bar using my laptop ("It's an old AOLPhone prototype"). It was all good.

Tampa Bay Breakfasts

05 Dec 2012

My friend Andy, who blogs at Tampa Bay Breakfasts, got an article written about him here. Like his blog, it's good reading. You should read both.

He's also a sysadmin who's on the LISA organizing committee this year, and I'm going to be seeing him in a few days when I head down to San Diego. The weather is looking shockingly good for this Rain City inhabitant. I'm looking forward to it. Now I just have to pick out my theme band for this year's conference....I'm thinking maybe Josh Rouse.

Standalone bundles in Cf3

03 Dec 2012

I always seem to forget how to do this, but it's actually pretty simple. Assume you want to test a new bundle called "test", and it's in a file called "test.cf". First, make sure your file has a control stanza like this:

body common control {
  inputs => { "/var/cfengine/inputs/cfengine_stdlib.cf" } ;
  bundlesequence => { "test" } ;
}

Note:

inputs must not include the file "test.cf" itself -- otherwise, you'll get the error "Redefinition of body "control" for "common" is a broken promise, near token '{'. Think of "inputs" as really being named "additional inputs".
I'm including the cfengine_stdlib.cf file. You should too.
bundlesequence is set to your bundle (which I'm leaving out of this entry for simplicity).

Second, invoke it like so:

sudo /var/cfeing/bin/cf-agent -KI -f /path/to/test.cf

Note:

-K means "run no matter how soon after the last time it was run."
-I shows a list of promises repaired.
-f gives the path to the file you're testing.

Niet zo goed

30 Nov 2012

So yesterday I got an email from another sysadmin: "Hey, looks like there's a lot of blocked connections to your server X. Anything happening there?" Me: "Well, I moved email from X to Y on Tuesday...but I changed the MX to point at Y. What's happening there?"

Turns out I'd missed a fucking domain: I'd left the MX pointing to the old server instead of moving it to the new one. And when I turned off the mail server on the old domain, delivery to this domain stopped. Fortunately I was able to get things going again: I changed the MX to point at the new server, and turned on the old server again to handle things until the new record propogated.

So how in hell did this happen? I can see two things I did wrong:

Poor planning: my plans and checklists included all the steps I needed to do, but did not mention the actual domains being moved. I relied on memory, which meant I remembered (and tested) two and forgot the third. I should have included the actual domains: both a note to check the settings and a test of email delivery.
No email delivery check by Nagios: Nagios checks that the email server is up, displays a banner and so on, but does not check actual email delivery for the domains I'm responsible for. There's a plugin for that, of course, and I'm going to be adding that.

I try to make a point of writing down things that go bad at $WORK, along with things that go well. This is one of those things.

Sniff

30 Nov 2012

Okay, this made me cry.

A sub for Cf3

28 Nov 2012

When sub was released by 37signals, I liked it a lot. Over the last couple of months I've been putting together a sub for Cfengine. Now it's up on Github, and of course my own repo. It's not pretty, but there are some pretty handy things in there. Enjoy!

Things out from under my desk FTMFW

28 Nov 2012

Yesterday I finally moved the $WORK mail server (well, services) from a workstation under my desk to a proper VM and all. Mailman, Postfix, Dovecot -- all went. Not only that, but I've got them running under SELinux no less. Woot!

Next step was to update all the documentation, or at least most of it, that referred to the old location. In the process I came across something I'd written in advance of the last time I went to LISA: "My workstation is not important. It does no services. I mention this so that no one will panic if it goes down."

Whoops: not true! While migrating to Cfengine 3, I'd set up the Cf3 master server on my workstation. After all, it was only for testing, right? Heh. We all know how that goes. So I finally bit the bullet and moved it over to a now-even-more-important VM (no, not the mail server) and put the policy files under /masterfiles so that bootstrapping works. Now we're back to my workstation only holding my stuff. Hurrah!

And did I mention that I'm going to LISA? True story. Sunday I'm doing Amazon Web Services training; Monday I'm in the HPC workshop; Tuesday I'm doing Powershell Fundamentals (time to see how the other half lives, and anyway I've heard good things about Powershell) and Ganeti (wanted to learn about that for a while). As for the talks: I'm not as overwhelmed this year, but the Vint Cerf speech oughta be good, and anyhow I'm sure there will be lots I can figure out on the day.

Completely non-techrelated link of the day: "On Drawing". This woman is an amazing writer.

Deploying SELinux modules from Cfengine

23 Nov 2012

Back in January, yo, I wrote about trying to figure out how to use Cfengine3 to do SELinux tasks; one of those was pushing out SELinux modules. These are encapsulated bits of policy, usually generated by piping SELinux logs to the audit2allow command. audit2allow usually makes two files: a source file that's human-readable, and a sorta-compiled version that's actually loaded by semodule.

So how do you deploy this sort of thing on multiple machines? One option would be to copy around the compiled module...but while that's technically possible, the SELinux developers don't guarantee it'll work (link lost, sorry). The better way is to copy around the source file, compile it, and then load it.

SANSNOC used this approach in puppet. I contacted them to ask if it was okay for me to copy their approach/translate their code to Cf3, and they said go for it. Here's my implementation:

bundle agent add_selinux_module(module) {
  # This whole approach copied/ported from the SANS Institute's puppet modules:
  # https://github.com/sansnoc/puppet
   files:
     centos::
       "/etc/selinux/local/."
         comment        => "Create local SELinux directory for modules, etc.",
         create         => "true",
         perms          => mog("700", "root", "root");

       "/etc/selinux/local/$(module).te"
         comment        => "Copy over module source.",
         copy_from      => secure_cp("$(g.masterfiles)/centos/5/etc/selinux/local/$(module).te", "$(g.masterserver)"),
         perms          => mog("440", "root", "root"),
         classes        => if_repaired("rebuild_$(module)");

       "/etc/selinux/local/setup.cf3_template"
         comment        => "Copy over module source.",
         copy_from      => secure_cp("$(g.masterfiles)/centos/5/etc/selinux/local/setup.cf3_template", "$(g.masterserver)"),
         perms          => mog("750", "root", "root"),
         classes        => if_repaired("rebuild_$(module)");

       "/etc/selinux/local/$(module)-setup.sh"
         comment        => "Create setup script. FIXME: This was easily done in one step in Puppet, and may be stupid for Cf3.",
         create         => "true",
         edit_line      => expand_template("/etc/selinux/local/setup.cf3_template"),
         perms          => mog("750", "root", "root"),
         edit_defaults  => empty,
         classes        => if_repaired("rebuild_$(module)");


  commands:
    centos::
      "/etc/selinux/local/$(module)-setup.sh"
        comment         => "Actually rebuild module.",
        ifvarclass      => canonify("rebuild_$(module)");
}

Here's how I invoke it as part of setting up a mail server:

bundle agent mail_server {
  vars:
    centos::
      "selinux_mailserver_modules" slist => { "postfixpipe",
                                              "dovecotdeliver" };

  methods:
    centos.selinux_on::
      "Add mail server SELinux modules" usebundle => add_selinux_module("$(selinux_mailserver_modules)");
}

(Yes, that really is all I do as part of setting up a mail server. Why do you ask? :-) )

So in the add_selinux_module bundle, a directory is created for local modules. The module source code, named after the module itself, is copied over, and a setup script created from a Cf3 template. The setup template looks like this:

#!/bin/sh
# This file is configured by cfengine.  Any local changes will be overwritten!
#
# Note that with template files, the variable needs to be referenced
# like so:
#
#   $(bundle_name.variable_name)

# Where to store selinux related files
SOURCE=/etc/selinux/local
BUILD=/etc/selinux/local

/usr/bin/checkmodule -M -m -o ${BUILD}/$(add_selinux_module.module).mod ${SOURCE}/$(add_selinux_module.module).te
/usr/bin/semodule_package -o ${BUILD}/$(add_selinux_module.module).pp -m ${BUILD}/$(add_selinux_module.module).mod
/usr/sbin/semodule -i ${BUILD}/$(add_selinux_module.module).pp

/bin/rm ${BUILD}/$(add_selinux_module.module).mod ${BUILD}/$(add_selinux_module.module).pp

Note the two kinds of disambiguating brackets here: {curly} to indicate shell variables, and (round) to indicate Cf3 variables.

As noted in the bundle comment, the template might be overkill; I think it would be easy enough to have the rebuild script just take the name of the module as an argument. But it was a good excuse to get familiar with Cf3 templates.

I've been using this bundle a lot in the last few days as I prep a new mail server, which will be running under SELinux, and it works well. Actually creating the module source file is something I'll put in another post. Also, at some point I should probably put this up on Github FWIW. (SANS had their stuff in the public domain, so I'll probably do BSD or some such... in the meantime,please use this if it's helpful to you.)

UPDATE: It's available on Github and my own server; released under the MIT license. Share and enjoy!

Invoking Cfengine from Nagios

21 Nov 2012

Nagios and Cf3 each have their strengths:

Nagios has nicely-encapsulated checks for lots of different things, and I'm quite familiar with it.
Cfengine is a nice way of sanely ensuring things are the way we want them to be (ie, without running amok and restarting something infinity times).

Nagios plugins, frankly, are hard to duplicate in Cfengine. Check out this Cf3 implementation of a web server check:

bundle agent check_tcp_response {
  vars:
    "read_web_srv_response" string  => readtcp("php.net", "80", "GET /manual/en/index.php HTTP/1.1$(const.r)$(const.n)Host: php.net$(const.r)$(const.n)$(const.r)$(const.n)", 60);

  classes:
    "expectedResponse" expression   => regcmp(".*200 OK.*\n.*", "$(read_web_srv_response)");

  reports:
    !expectedResponse::
      "Something is wrong with php.net - see for yourself: $(read_web_srv_response)";

}

That simply does not compare with this Nagios stanza:

define service{
    use                             local-service         ; Name of service template to use
    hostgroup_name                  http-servers
    service_description             HTTP
    check_command                   check_http
}
define command{
    command_name                    check_http
    command_line                    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}

My idea, which I totally stole from this article, was to invoke Cfengine from Nagios when necessary, and let Cf3 restart the service. Example: I've got this one service that monitors a disk array for faults. It's flaky, and needs to be restarted when it stops responding. I've already got a check for the service in Nagios, so I added an event handler:

define service{
    use                             local-service         ; Name of service template to use
    host_name                       diskarray-mon
    service_description             diskarray-mon website
    check_command                   check_http!-H diskmon.example.com -S -u /login.html
    event_handler                   invoke_cfrunagent
}
define command{
    command_name invoke_cfrunagent
    command_line $USER2/invoke_cfrunagent.sh  -n "$SERVICEDESC" -s $SERVICESTATE$ -t $SERVICESTATETYPE$ -a $HOSTADDRESS$
}

Leaving out some getopt() stuff, invoke_cfrunagent.sh looks like this:

# Convert "diskarray-mon website to disarray-mon_website":
SVC=${SVC/ /_}
STATE="nagios_$STATE"
TYPE="nagios_$TYPE"

# Debugging
echo "About to run sudo /var/cfengine/bin/cf-runagent -D $SVC -D $STATE -D $TYPE" | /usr/bin/logger
# We allow this in sudoers:
sudo /var/cfengine/bin/cf-runagent -D $SVC -D $STATE -D $TYPE

cf-runagent is a request, not an order, to the running cf-server process to fulfill already-configured processes; it's like saying "If you don't mind, could you please run now?"

Finally, this was to be detected in Cf3 like so:

  methods:
    diskarray-mon_website.nagios_CRITICAL.nagios_HARD::
      "Restart the diskarray monitoring service" usebundle => restart_diskarray_monitor();

(This stanza is in a bundle that I know is called on the disk array monitor.)

Here's what works:

If I run cf-agent -- not cf-runagent -- with those args ("-D diskarray-monwebsite -D nagiosCRITICAL -D nagios_HARD"), it'll run the restart script.

What doesn't work:

running cf-runagent, either as root or as nagios. It seems to stop after analyzing classes and such, and not actually do anything. I'm probably misunderstanding how cf-runagent is meant to work.
Nagios will only run an event handler when things change -- not all the time until things get better. That means that if the first attempt by Cf3 to restart doesn't work, for whatever reason, it won't get run again.

What might work better is using this Cf3 wrapper for Nagios plugins (which I think is the same approach, or possibly code, discussed in this mailing list post).

Anyhow...This is a sort of half-assed attempt in a morning to get something working. Not there yet.

Holy God, It's Done At Last

20 Nov 2012

After a lot of faffing about, I've accomplished the following on the backup server at $WORK:

Broken out /var/lib/mysql to a separate, mirrored, Linux software raid-1 275 GB partition; it's using about 36 GB of that at the moment, which is 15% -- the lowest it's been in a long, LONG-ass time.
Migrated the Bacula catalog db to Innodb.
Shrunk the raid-0 spool partition to about 1.6 TB, down from 2 TB; did this to free up the two disks for the mirrored partition
Ensured that MySQL will use /dev/shm as a temporary area
Sped up the restoration of files (which was mostly because of earlier "analyze" commands on the File table while it was still MyISAM)
innodbfilepertable is on; innodbbufferpoolsize=10G; ``` defaultstorageengine=InnoDB


I encountered the following problems:

   * The stupid raid card in the backup server only supports two RAID drives --

thus, the mirrored drive for /var/lib/mysql is Linux software raid. I'd have preferred to keep things consistent, but it was not to be. ```

The many "analyze" and "repair" steps took HOURS...only to turn out to be deadlocked because it was running out of tmp space.
I had to copy the mysql files to the raid-0 drive to have enough space to do the conversion.
Knock-on effects included lack of sleep and backups not being run last night
Basically, this took a lot of tries to get right, and about all ``` of my time for the last 24 hours.


I learned:

   * The repair of the File table while MyISAM, with tmp files in

/dev/shm, took about 15 minutes. That compares with leaving it overnight and still not having it done. ```

You have to watch the mysql log file for errors about disk space, and/or watch df -h to see /tmp or whatever fill up.
You can interrupt a repair and go back to it afterward if you have to. At least, I was able to...I wouldn't do it on a regular basis, but it gives me cautious optimism that it's not an automatic ticket to backups.
Importing the File.sql file (nominally 18 GB but du shows 5 ``` GB...sparse?), which converted it to InnoDB, took 2.5 hours.


I still have to do these things:

   * Update documentation.
   * Update Bacula checks to include /var/lib/mysql.
   * Perhaps up pool_size to 20 GB.
   * Set up a slave server again.
   * A better way of doing this might've been to set up LVM on md0, then use snapshots for database backup.
   * Test with a reboot!  Right now I need to get some sleep.

Smarty compile path problems

16 Nov 2012

A couple days ago a website stopped working, refusing to show anything below its top menu bar. I knew it was written in PHP, but I hadn't realized it used Smarty, a template engine. I didn't realize it before the outage, but one of Smarty's features is that it "compiles" (generates, really) template files into PHP code on the fly as they're accessed and/or updated. The idea is that if you need to change the template files, you just upload them; the next time that page is accessed, Smarty notices that it needs to recompile the PHP file, does so, and writes out the new file to a directory called "template_c".

This means that it needs to write to that directory (which Smarty docs recommend stay out of the DocumentRoot). But I hadn't turned that on, and in general this sort of thing makes me nervous. (Though I do allow it for other websites, so I'm not being very consistent here.)

The development machine has the files rooted at ~user/public_html; to deploy, we rsync to the server's /var/www/$PROJECT directory. The template_c files have already been compiled in the process of testing, and those compiled files include() other Smarty libraries -- and it specifies the full path to those libraries. Can you see where this is going?

As far as I can tell, what happened was:

The PHP was tested on the dev server; Smarty compiled the templates, and added include() directives or the library files using the full path to the user's home directory.
The PHP was copied into place on the web server and tested.
Smarty looked at the templates, found the compiled versions, and ran them. In the process, it tried to load other Smarty libraries.
It worked, because autofs mounted this user's home directory when requested.
Time passed.
On Wednesday, someone tried using the site. Autofs, for some reason, couldn't mount the developer's home directory, so Smarty couldn't include those files, and we saw nothing after the page's menu bar.

I got around this by using sed on the compiled template files to set the correct path to the library files, which at least got the site up again.

I'm still not sure what the hell went wrong with autofs. On this particular server I used to have a static map for home directories, and it doesn't work with the developer's home. I thought I had replaced that with an LDAP-based map a long time ago...but there it was, and I can't find anything in the Cfengine git repo that shows a change, or that I'd deployed the LDAP map in the first place. I pushed that out, and now this all works the way it should...

...except that it shouldn't work like this. I'm reluctant to let PHP write to the filesystem, but that's what Smarty expects. I think it supports relative paths -- gotta dig up my notes -- which'd be fine. (I saw one suggestion on the Smarty support forums to include "." in your PHP_INCLUDE path using .htaccess, which made me run around the room hollering and cursing) As a workaround, I'm going to move the development files to /var/www/$PROJECT on the dev server, which will match production; I'm unhappy with this because it breaks the real advantages Smarty brings, and makes the deployment process a bit harder...but I'm still a nervous nelly.

Older Newer