Happy birthday, Pre!

So to return the compliment, I should mention that my wife is turning 36 today. She's wise, completely supportive (including giving me a boot to the head when I need one), and helped me get started as a sysadmin. She's let me take time to go to conferences and make beer. She sometimes thinks she's a smurf, but that won't stop her ripping your heart out.

She convinced me that this would be a good idea:

Two damned cute kids

And sometimes she looks like this:

Popotch!

But she never stopped reaching for that rainbow:

No, she was never really a cheerleader

Happy birthday, Pre!

Tags:

NFS dotfiles

Reminder to myself: Got a file called .nfs.*? Here's what's going on:

# These files are created by NFS clients when an open file is
# removed. To preserve some semblance of Unix semantics the client
# renames the file to a unique name so that the file appears to have
# been removed from the directory, but is still usable by the process
# that has the file open.

That quote is from /usr/lib/fs/nfs/nfsfind, a shell script on Solaris 10 that's run once a week from root's crontab. Some references:
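
For my own future reference, here's a rough Python sketch of how you might list leftover .nfs* files that haven't been touched in a while. This is not the nfsfind script itself; the function name, the seven-day threshold and the decision to only report (not delete) are all my assumptions.

```python
import os
import time

def find_stale_nfs_files(root, max_age_days=7, now=None):
    """Return sorted paths under root whose names start with .nfs and whose
    modification time is more than max_age_days old.

    Hypothetical helper: it only reports files, it never removes them.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.startswith(".nfs"):
                continue
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
            if mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

Something like find_stale_nfs_files("/export/home") would then give a list to eyeball before deciding what to do; the path is only an example.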

Tags: toptip unix networking opensolaris solaris

Jumbo frames again

Arghh...I just spent 24 hours trying to figure out why shadow migration was causing our new 7310 to hang. The answer? Because jumbo frames were not enabled on the switch the 7310 was on, and they were on the machine we're migrating from. Arghh, I say!
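
One way to catch this sort of mismatch sooner is to send a don't-fragment ping sized to just fit the jumbo MTU and see whether it gets through. The arithmetic for the payload size is sketched below; the 9000-byte MTU is an assumption (a common jumbo setting, not necessarily what was configured here), and the function name is made up for illustration.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header

def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError("MTU too small to hold IPv4 and ICMP headers")
    return payload

# Assumed jumbo MTU of 9000 versus the standard Ethernet MTU of 1500.
print(max_ping_payload(9000))  # 8972
print(max_ping_payload(1500))  # 1472
```

If a don't-fragment ping with the larger payload fails while the smaller one works, something in the path is not passing jumbo frames.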

Tags: jumboframes debugging networking

Reading for the end of January

Once upon a time, Apple made the machines that made me who I am. I
became who I am by tinkering. Now it seems they’re doing
everything in their power to stop my kids from finding that sense
of wonder.

allow Authoritative Nameservers to return varying replies based
upon the network address of the client that initiated the query
rather than of the client's Recursive Resolver.

His response?

if we're going to add client identity to the query, can we do so
in a more general way?  i'd like to know lat-long, country, isp,
language, and adult/child.  and the ip address should be
multiprotocol, covering ipv6.

The Internet economy rewards unlimited creativity in the
monetization of human action, and fairly often this takes the form
of some kind of intermediation. For DNS, monetized intermediation
means lying.

Off to go craft a budget for next year.

Tags: linky

Hate hate hate

Here's how to piss me off:

  • Refuse to update your software so that it uses a Makefile. Yes, I know you're not only 'way smarter than I am but you've got a source tree going back to 1977. I don't care; editing six different shell scripts is not the way to do things.

  • Sprinkle those six scripts with assumptions about which software is present and where the source code is being compiled. Document most of them.

  • Run tests but carefully delete all the results; don't include an option to save them. That way I have to edit your scripts to figure out what the hell went wrong.

  • Assume the presence of csh for everything, rather than POSIX-standard sh.

  • Put configurable options inside shell scripts, rather than in a configuration file or allowing them to be set by arguments to those scripts.

  • Include directions like "Perhaps change foo, bar and baz", without explaining why or what they're set to. When tests later fail because you didn't properly set foo, bar and baz, don't explain where these are set or how they affect the tests.

  • Set a hard-coded location for temporary output. Die silently when those locations aren't present, rather than explaining why or offering to create them or using /tmp. Refuse to overwrite already-present files, but don't explain this anywhere; instead, say that they might be useful next time.

  • Have important variables, like the hard-coded location for temporary output, set in two or more different places. Suggest editing some of them.

  • Have test failure indicated by "!!FAILED", ensuring a moment's confusion about whether that means "FAILED!", "NOT FAILED!" or "NOT NOT FAILED".

Update, April 20th 2010:

  • Be so glad that you're done installing this software that you never come back to document how you did it in the first place, leaving no clues but a short rant on your blog. Sigh.

Tags: rant software

Cassini and Saturn

I came across these pictures taken by the Cassini probe, while looking for pictures of Saturn to show my oldest son. They're beautiful, and I'm heartsick that I'll never get to see these views first-hand.

Tags: wow

Checks

  • This Nagios check looks for extra Wordpress admins. $ARG1$ should include "-w min:max -c min:max", giving the acceptable ranges; in my case, I know I should have 3 and only 3 admins, so I have "-w 3:3 -c 3:3".
    define command{
    command_name check_wp_admins
    command_line $USER1$/check_mysql_query -q 'SELECT COUNT(wp_users.user_login) AS "Admins"
                           FROM wp_users, wp_usermeta
                           WHERE wp_usermeta.meta_value LIKE "%administrator%" AND
                           wp_usermeta.user_id=wp_users.ID' -H $HOSTADDRESS$ $ARG1$
    }



  • This one looks for nasty Wordpress posts. Note the dependency on MySQL's REGEXP operator. In my case, I know that I do not have any posts with these words, so in $ARG1$ I have "-w 0:0 -c 0:0".
    define command{
    command_name check_wp_nasty_posts
    command_line $USER1$/check_mysql_query -q 'SELECT COUNT(*)
                           FROM wp_posts
                           WHERE post_content REGEXP "iframe|noscript|display"' -H $HOSTADDRESS$ $ARG1$
    }

  • And a Python script that picks out a random client, job and file from Bacula's database and tries to retrieve it. It's not ideal -- checking for sanity is left as a DARPA Grand Challenge -- but at least it's one way of exercising backups. I anticipate running this often. If anyone's interested, let me know.
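
For the two Nagios checks above, the "min:max" thresholds are the part that's easy to get backwards, so here's a small Python sketch of how I understand them: a value outside the range triggers the alert. It only handles the plain "min:max" form used here, not the full Nagios range syntax, and the function names are made up for illustration.

```python
def outside_range(value, spec):
    """True if value falls outside a simple "min:max" range (inclusive)."""
    low_text, high_text = spec.split(":", 1)
    low, high = float(low_text), float(high_text)
    return value < low or value > high

def check_state(value, warning, critical):
    """Return the state a check would report for value; critical wins."""
    if outside_range(value, critical):
        return "CRITICAL"
    if outside_range(value, warning):
        return "WARNING"
    return "OK"

print(check_state(3, "3:3", "3:3"))  # OK: exactly three admins
print(check_state(4, "3:3", "3:3"))  # CRITICAL: an extra admin
print(check_state(0, "0:0", "0:0"))  # OK: no nasty posts
```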

The more I work with Python, the more I don't just like it but admire it.

Ugh...not much more right now. I've got a blocked eustachian tube that I'm self-medicating with a Python script^W^Wcold medicine, and the acetaminophen in it is making me hazy.

Tags: monitoring security mysql

Powerpoint of the damned

From (I think) a fellow Canuck:

A few weeks ago, I was sent a power point presentation on the "Dynamic Planning for COIN in Afghanistan". I looked at it briefly, but thought that it was some kind of joke; so, I flushed it immediately. However, I received it from another source. So, it appears the joke is on me.

A quick look at this bird's nest of a concept, would seem to suggest that Dilbert or some escapee from the Project Management Institute has taken over planning for COIN operations in Afghanistan. What I see is yet another attempt to take a complex human activity and turn it into an MBA project management flowchart. I can see the thinking, “Now that we have the power point correct, we are sure to win the war in Afghanistan!” In fact, I'm sure that, if we showed this power point to the insurgents, they would throw in the towel, convinced that our superior power point skills indicate that we cannot be defeated. Really, I don't know how we fought wars before power point.

Original post here. Found via WarHistorian.org (well recommended).

Tags: wtf

Must change title

Happy 2010 everyone! Now that it seems to be well and truly under way, I feel I can say that safely.

It's been busy so far. All the stuff I didn't do in 2009 is still on my plate...which is obvious, right? but it still caught me by surprise after the 3 days doing Xmas maintenance on my own. It was easy to forget that there are, you know, people waiting to show up and do work.

Like the new students we've got for one of the faculty members. I'd upgraded OpenSuSE on their new workstations over the holidays, then when they came in yesterday the carefully-tweaked dual monitor displays weren't working. Arghh.

Or the guy who's let me know that he wants to get moving on the MySQL/PHP website he's building...which reminds me that I've still got to move the website to a virtualized machine. I'm tempted to do that RIGHT NOW and put his site in there, but I don't think that'll be the best way to do it.

Or the new project my boss is part of, which involves researchers from across Canada. For me, it's a new website, hardware recommendation and purchases, maybe a new LDAP server. I could add a new root suffix to the existing LDAP server, but

  1. we don't need it yet
  2. that seems like it'll make it more difficult to move later
  3. while I can create one in the existing LDAP server (Fedora/389/CentOS DS), the cn=config tree seems suspiciously empty of any entries related to the new root...so I'm leery of trusting it.

I still haven't sat down yet and tried to plan my year. Partly I've been busy, partly my planning tools are a bit of a mess (daytimer + orgmode + RT). But at some point I need to get my priorities straight and oh, how I long to have them straight. I feel a bit like I'm spinning my wheels right now.

Ah well. In other news, Xmas was good; my kids got two guitars (one acoustic with an Elmo sticker, one fake double-neck electric) which makes four guitars they have now. Since they no longer have that to fight over, they've taken to fighting over a microphone (cardboard tube stuck in a toy that acts like a stand). But damnit, they're still cute.

Family

Finally: Just for fun right now I did a word count of all my blog entries. I've been blogging since 2004, and I've got something like 158,000 words. Amazing. And there are still some entries I've got to grab from my old Slashdot journal.

Tags: work geekdad

Well, that'll teach me

While trying to figure out why Nagios was suddenly unable to check up on our databases, I suddenly realized that the permissions on /dev/null were wrong: 0600 instead of 0666. What the hell? I've had this problem before, and I was in the middle of something, so I set them back and went on with my life. Then it happened again, not half an hour later. I was in the same shell, so I figured it had to have been a command I'd run that had inadvertently done this.

Yep: don't run the MySQL client as root. Yes yes yes, it's bad anyway, I'll go to sysadmin hell, but this is an interesting bug. The environment variable MYSQL_HISTFILE is set to /dev/null for root...and when you exit the client, it sets the permissions for the history file to 0600. So, you know, don't do that then. (Still no fix committed, btw...)
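
Since this has bitten me more than once, a check along these lines might be worth wiring into monitoring. It's a minimal Python sketch, assuming that 0666 on a character device is what /dev/null should look like; the function name and message wording are mine.

```python
import os
import stat

def null_device_problems(path="/dev/null", expected_mode=0o666):
    """Return a list of problems found with the null device (empty if fine)."""
    st = os.stat(path)
    problems = []
    if not stat.S_ISCHR(st.st_mode):
        problems.append("not a character device")
    mode = stat.S_IMODE(st.st_mode)  # permission bits only
    if mode != expected_mode:
        problems.append("mode is %04o, expected %04o" % (mode, expected_mode))
    return problems
```

An empty list means all is well; anything else is worth an alert before Nagios (or anything else not running as root) starts failing in confusing ways.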

Tags: bug mysql

Xmas maintenance

A nice thing about working at a university is that you get all this time off at Xmas; however, it's also the best possible time to do all the stuff you've been saving up. Last year my time was split between this job and my last; now, the time's all mine, baby.

Today will be my last of three days in a row where the machines have been all mine to play with^W^Wupgrade. I've been able to twiddle the firewall's NIC settings, upgrade CentOS using Cfengine, and set up a new LDAP server using Cobbler and CentOS Directory Server. I've tested our UPS' ATS, but discovered that NUT is different from APCUPSD in one important way: it doesn't easily allow you to say "shut down now, even though there's 95% battery left". I may have to leave testing of that for another day.

It hasn't all gone smoothly, but I've accomplished almost all the important things. This is a nice surprise; I'm always hesitant when I estimate how long something will take, because I feel like I have no way of knowing in advance (interruptions, unexpected obstacles...you know the drill). In this case, the time estimates for individual tasks were, in fact, 'way paranoid, but that gave me the buffer that I needed.

One example: after upgrading CentOS, two of our three servers attached to StorageTek 2500 disk arrays reported problems with the disks. Upon closer inspection, they were reporting problems with half of the LUNs that the array was presenting to them -- and they were reporting them in different ways. It had been a year or longer since I'd set them up, and my documentation was pretty damn slim, so it took me a while to figure it out. (Had to sleep on it, even.)

The servers have dual paths to the arrays. In Linux, the multipath drivers don't work so well with these, so we used the Sun drivers instead. But:

  1. You have to rebuild the drivers after a kernel change.
  2. This only showed up on two servers because the third server had not upgraded its kernel (or indeed, any of its packages). Why? cfservd had refused its connection because I had the MaxConnections parameter too low.
  3. And of the two that did upgrade, the one machine we'd tested the Linux drivers on still had an old multipath.conf file in /etc, which, even though the multipathd service wasn't starting up, was enough to get drivers loaded. This took a while to figure out because I'd completely forgotten how to tell which driver was in use.

I got it fixed in the end, and I expanded the documentation considerably. (49,000 words and counting in the wiki. Damn right I'm bragging!)

Putting off 'til next time, tempted though I am: reinstalling CentOS on the monitoring machine, which due to a mix of EPEL and Dag repos and operator error appears to be stuck in a corner, unable to upgrade without ripping out (say) Cacti. I moved the web server to a backup machine on Tuesday, and I'll be moving it back today; this is not the time to fiddle with the thing that's going to tell me I've moved everything back correctly.

(Incidentally, thanks to Matt for the rubber duck, who successfully talked me down off the roof when I was mulling this over. Man, that duck is so wise...)

Last day today. (Like, ever!) If I remember correctly I'm going to test the water leak detector...and I forget the rest; it's all in my daytimer and I'm too lazy to get up and look right now. Wish me luck.

And best of 2010 to all of you!

Tags: work centos cfengine monitoring packagemanagement serverroom upgrades

Things I love:

  • Waiting while my wife furiously edits a blog entry, then finding she's posted it and getting to be the first to read it. And not just because she mentions me every now and then. Seriously, she rocks. Also? She got me the XKCD book for Xmas. Like I need more reasons to love her.

Tags:

IPv6 at home

After coming back from LISA I've been wanting to try IPv6 at home; I've dabbled with it on and off for the last few years, but haven't made a serious go of it.

Originally I had a tunnel with SixXS.net, but:

  1. I was having problems with uptime that, in the end, turned out to be my own silly firewall problems

  2. The points system they use just seems needlessly complicated

  3. Hurricane Electric was just 'way easier to set up, including getting a /64 right away

  4. There are a number of complaints about SixXS.net arbitrarily (so it's claimed) closing accounts.

But enough gossip! Now you can visit http://ipv6.saintaardvarkthecarpeted.com. Soon as I get a chance I'll set it up so it doesn't require the separate hostname.

Tags: ipv6

Editing Foswiki files from the command line

Hah! Thanks to Teridon, I can now edit Foswiki files from the command line:

rcs -l TextFileName.txt
ci -mnone -t-none -wusername -u TextFileName.txt

Sweet! Now to automate it in Emacs...
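
Outside Emacs, the same two commands could be wrapped in a small Python script. This is only a sketch: the function names are made up, and I'm assuming the edit happens between taking the lock and checking in. The rcs and ci arguments are copied from the commands above.

```python
import subprocess

def rcs_commands(filename, username):
    """Build the lock and check-in commands quoted above as argument lists."""
    lock = ["rcs", "-l", filename]
    checkin = ["ci", "-mnone", "-t-none", "-w" + username, "-u", filename]
    return [lock, checkin]

def edit_topic(filename, username, edit):
    """Lock the topic file, call edit(filename), then check it back in.

    edit is any callable that modifies the file in place (hypothetical hook).
    """
    lock, checkin = rcs_commands(filename, username)
    subprocess.run(lock, check=True)
    edit(filename)
    subprocess.run(checkin, check=True)
```

Keeping the command building separate from the running makes it easy to print the commands first and confirm they match what you'd type by hand.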

Tags: emacs foswiki

Five years and still going

At the risk of tempting fate, I just realized that my web server is five years old (and a bit). Happy birthday, Thornhill!

Tags: hardware

Catchup

Here we go:

And that's all for now.

Tags: virtualization

chkconfig woes

Irritating: chkconfig on RHEL/CentOS returns non-zero if a service isn't configured for a runlevel. IOW, you can do:

chkconfig --level 3 foo

and have 0 returned if it's on, 1 if it's not.

But not SuSE; nope, it just returns 0 whether or not it's enabled, or even if the service itself doesn't exist. Because, you know, grep doesn't get used enough.

I'm doing this because I'm trying to use cfengine 2 to manage services. This works well in CentOS, where you can add something like:

service_foo_on = (ReturnsZero("/sbin/chkconfig --level 3 foo"))

and it'll work. ("service_foo_on" is a bit of a misnomer, because I'm checking runlevels, not whether it's actually running.)

Update: Nope, I'm wrong. chkconfig --check does exactly what I want. Many thanks to yaloki on #openSUSE-server for the help.
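
For anyone not using cfengine, the "did this command exit 0?" test is easy to reproduce. Here's a minimal Python sketch of the idea behind ReturnsZero; the function name is mine, and "foo" is the same placeholder service as above.

```python
import subprocess

def returns_zero(argv):
    """True if the command given as an argument list exits with status 0."""
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,  # only the exit status matters here
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

# Example call (needs chkconfig installed, so it is left as a comment):
# service_foo_on = returns_zero(["/sbin/chkconfig", "--level", "3", "foo"])
```

This only works as a test if the command's exit status actually reflects the service state, which is exactly what the SuSE behaviour without --check didn't give me.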

Tags: packagemanagement opensuse cfengine

Zounds

Busy day:

  • Wake up at 5am because the youngest son's teething and he's going to be up at 5.20am

  • Clean up ZFS snapshots yet again; repeat "I must schedule this in cron" for the nth time

  • Put in request to maintenance to look at server room humidifier; current block o' cold weather == 10% RH in there

  • Find out why Mailman stopped working (probably permission problems on the logs), and how to monitor this (settle for web interface for now, since that wasn't working either; will probably need to make my peace with user accounts for Nagios on machines I'm monitoring)

  • Figure out why Drupal is shitting cron.php files all over the place (still no idea)

  • Fill out performance review for self

  • Back up Windows 2003 server before tossing it over the fence to software installer

  • Start writing article for SysAdvent at last on OCSNG/GLPI

  • Struggle with cfengine tidy stanza that doesn't work; repeat "I must upgrade to cfengine 3 or puppet" for the nth time

Tonight, bed at 8.30pm. And there's no shame in that.

Tags: geekdad work

(my) First Sysadvent Calendar entry up!

Woohoo! My first entry for the Sysadvent Calendar, on Development for Sysadmins, has been posted! Thanks to Jordan for tidying it up and adding integration testing, which I'd missed when writing the article. There will be another article from me coming up soon-ish on OCSNG and GLPI.

As of December 1st, Jordan and Matt were still accepting entries -- so head on over if you've got something to say.

Tags: documentation geekpride programming reading

sesearch

Need to figure out what bit of selinux policy is forbidding something? sesearch is what you want.

Tags: selinux