12 Feb 2010
So to return the compliment, I should mention that my wife
is turning 36 today. She's wise, completely supportive (including
giving me a boot to the head when I need one), and helped me get
started as a sysadmin. She's let me take time to go to conferences
and make beer. She sometimes thinks she's a smurf, but that
won't stop her ripping your heart out.
She convinced me that this would be a good idea:

And sometimes she looks like this:

But she never stopped reaching for that rainbow:

Happy birthday, Pre!
Tags:
05 Feb 2010
Reminder to myself: Got a file called .nfs.*? Here's what's going on:
# These files are created by NFS clients when an open file is
# removed. To preserve some semblance of Unix semantics the client
# renames the file to a unique name so that the file appears to have
# been removed from the directory, but is still usable by the process
# that has the file open.
That quote is from /usr/lib/fs/nfs/nfsfind, a shell script on
Solaris 10 that's run once a week from root's crontab. Some
references:
Tags:
toptip
unix
networking
opensolaris
solaris
03 Feb 2010
Arghh...I just spent 24 hours trying to figure out why shadow
migration was causing our new 7310 to hang. The answer? Because
jumbo frames were not enabled on the switch the 7310 was on, and they
were on the machine we're migrating from. Arghh, I say!
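For next time, one quick way to catch an MTU mismatch like this is to send a full-size packet with fragmentation forbidden and see whether it gets through. A minimal sketch, assuming plain IPv4, a 9000-byte MTU and a Linux box with iputils ping (where "-M do" sets the don't-fragment bit); the host name is a placeholder, and other platforms spell the flags differently:

```python
# Sketch: work out the largest ICMP payload that fits an MTU, and build
# a don't-fragment ping to test the path. Assumes Linux iputils ping.
IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def jumbo_ping_command(host, mtu=9000, count=3):
    """Build (but don't run) a ping that only succeeds if the path carries `mtu`."""
    return ["ping", "-M", "do", "-c", str(count),
            "-s", str(max_ping_payload(mtu)), host]


if __name__ == "__main__":
    # "filer.example.com" is a placeholder host name.
    print(" ".join(jumbo_ping_command("filer.example.com")))
```

If that ping errors out or just times out while a normal-sized one works, something in the path is not passing jumbo frames.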
Tags:
jumboframes
debugging
networking
29 Jan 2010
Once upon a time, Apple made the machines that made me who I am. I
became who I am by tinkering. Now it seems they’re doing
everything in their power to stop my kids from finding that sense
of wonder.
allow Authoritative Nameservers to return varying replies based
upon the network address of the client that initiated the query
rather than of the client's Recursive Resolver.
His response?
if we're going to add client identity to the query, can we do so
in a more general way? i'd like to know lat-long, country, isp,
language, and adult/child. and the ip address should be
multiprotocol, covering ipv6.
The Internet economy rewards unlimited creativity in the
monetization of human action, and fairly often this takes the form
of some kind of intermediation. For DNS, monetized intermediation
means lying.
Off to go craft a budget for next year.
Tags:
linky
29 Jan 2010
Here's how to piss me off:
Refuse to update your software so that it uses a Makefile. Yes, I
know you're not only 'way smarter than I am but you've got a source
tree going back to 1977. I don't care; editing six different shell
scripts is not the way to do things.
Sprinkle those six scripts with assumptions about which software is
present and where the source code is being compiled. Document most
of them.
Run tests but carefully delete all the results; don't include an
option to save them. That way I have to edit your scripts to figure
out what the hell went wrong.
Assume the presence of csh for everything, rather than
POSIX-standard sh.
Put configurable options inside shell scripts, rather than in a
configuration file or allowing them to be set by arguments to those
scripts.
Include directions like "Perhaps change foo, bar and baz", without
explaining why or what they're set to. When tests later fail
because you didn't properly set foo, bar and baz, don't explain
where these are set or how they affect the tests.
Set a hard-coded location for temporary output. Die silently when
those locations aren't present, rather than explaining why or
offering to create them or using /tmp. Refuse to overwrite
already-present files, but don't explain this anywhere; instead, say
that they might be useful next time.
Have important variables, like the hard-coded location for temporary
output, set in two or more different places. Suggest editing some
of them.
Have test failure indicated by "!!FAILED", ensuring a moment's
confusion about whether that means "FAILED!", "NOT FAILED!" or
"NOT NOT FAILED".
Update, April 20th 2010:
- Be so glad that you're done installing this software that you never
come back to document how you did it in the first place, leaving no
clues but a short rant on your blog. Sigh.
Tags:
rant
software
26 Jan 2010
I came across these pictures taken by the Cassini probe,
while looking for pictures of Saturn to show my oldest son. They're
beautiful, and I'm heartsick that I'll never get to see these views
first-hand.
Tags:
wow
20 Jan 2010
- This Nagios check looks for extra Wordpress admins. $ARG1$ should
include "-w min:max -c min:max", giving the acceptable ranges; in
my case, I know I should have 3 and only 3 admins, so I have "-w 3:3
-c 3:3".
command_name check_wp_admins
command_line $USER1$/check_mysql_query -q 'SELECT COUNT(wp_users.user_login) AS "Admins"
FROM wp_users, wp_usermeta
WHERE wp_usermeta.meta_value LIKE "%administrator%" AND
wp_usermeta.user_id=wp_users.ID' -H $HOSTADDRESS$ $ARG1$
- This one looks for nasty Wordpress posts. Note the dependency on
MySQL's REGEXP operator. In my case, I know that I do not have any
posts with these words, so in $ARG1$ I have "-w 0:0 -c 0:0".
command_name check_wp_nasty_posts
command_line $USER1$/check_mysql_query -q 'SELECT COUNT(*)
FROM wp_posts
WHERE post_content REGEXP "iframe|noscript|display"' -H $HOSTADDRESS$ $ARG1$
- And a Python script that picks out a random client, job and file
from Bacula's database and tries to retrieve it. It's not ideal --
checking for sanity is left as a DARPA Grand Challenge -- but at
least it's one way of exercising backups. I anticipate running this
often. If anyone's interested, let me know.
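The "min:max" ranges in the two checks above follow the usual Nagios plugin threshold format: a value outside the range raises the alert, so "3:3" means "anything other than exactly 3". A rough sketch of that logic, covering only the common forms; the function names are mine, not anything from check_mysql_query:

```python
# Sketch of how a Nagios plugin interprets a threshold range.
# Covers the plain forms only: "10", "10:", "~:10", "10:20", "@10:20".
def parse_range(spec):
    """Return (low, high, inside) for a Nagios threshold string."""
    inside = spec.startswith("@")      # "@" inverts: alert when inside
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_s, high_s = spec.split(":", 1)
    else:
        low_s, high_s = "0", spec      # bare "10" means 0..10
    low = float("-inf") if low_s == "~" else float(low_s or 0)
    high = float("inf") if high_s == "" else float(high_s)
    return low, high, inside


def alerts(value, spec):
    """True if `value` should raise an alert for threshold `spec`."""
    low, high, inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range


def status(value, warn, crit):
    """Map a value to a Nagios state name, critical taking priority."""
    if alerts(value, crit):
        return "CRITICAL"
    if alerts(value, warn):
        return "WARNING"
    return "OK"
```

So status(3, "3:3", "3:3") is OK, and a fourth admin turning up goes straight to CRITICAL.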
The more I work with Python, the more I don't just like it but
admire it.
Ugh...not much more right now. I've got a blocked eustachian tube
that I'm self-medicating with a Python script^W^Wcold medicine, and
the acetaminophen in it is making me hazy.
Tags:
monitoring
security
mysql
16 Jan 2010
From (I think) a fellow Canuck:
A few weeks ago, I was sent a power point presentation on the
"Dynamic Planning for COIN in Afghanistan". I looked at
it briefly, but thought that it was some kind of joke; so, I
flushed it immediately. However, I received it from another
source. So, it appears the joke is on me.
A quick look at this bird's nest of a concept, would seem to suggest
that Dilbert or some escapee from the Project Management Institute has
taken over planning for COIN operations in Afghanistan. What I see is
yet another attempt to take a complex human activity and turn it into
an MBA project management flowchart. I can see the thinking, “Now that
we have the power point correct, we are sure to win the war in
Afghanistan!” In fact, I'm sure that, if we showed this power point to
the insurgents, they would throw in the towel, convinced that our
superior power point skills indicate that we cannot be
defeated. Really, I don't know how we fought wars before power point.
Original post here. Found via WarHistorian.org (well recommended).
Tags:
wtf
12 Jan 2010
Happy 2010 everyone! Now that it seems to be well and truly under
way, I feel I can say that safely.
It's been busy so far. All the stuff I didn't do in 2009 is still on
my plate...which is obvious, right? but it still caught me by surprise
after the 3 days doing Xmas maintenance on my own. It was easy to
forget that there are, you know, people waiting to show up and do
work.
Like the new students we've got for one of the faculty members. I'd
upgraded OpenSuSE on their new workstations over the holidays, then
when they came in yesterday the carefully-tweaked dual monitor displays
weren't working. Arghh.
Or the guy who's let me know that he wants to get moving on the
MySQL/PHP website he's building...which reminds me that I've still got
to move the website to a virtualized machine. I'm tempted to do that
RIGHT NOW and put his site in there, but I don't think that'll be the
best way to do it.
Or the new project my boss is part of, which involves researchers from
across Canada. For me, it's a new website, hardware recommendation
and purchases, maybe a new LDAP server. I could add a new root
suffix to the existing LDAP server, but
a. we don't need it yet
b. that seems like it'll make it more difficult to move later
c. while I can create one in the existing LDAP server
(Fedora/389/CentOS DS), the cn=config tree seems suspiciously empty
of any entries related to the new root...so I'm leery of trusting it.
I still haven't sat down yet and tried to plan my year. Partly I've
been busy, partly my planning tools are a bit of a mess (daytimer +
orgmode + RT). But at some point I need to get my priorities straight
and oh, how I long to have them straight. I feel a bit like I'm
spinning my wheels right now.
Ah well. In other news, Xmas was good; my kids got two guitars (one
acoustic with an Elmo sticker, one fake double-neck electric) which
makes four guitars they have now. Since they no longer have that to
fight over, they've taken to fighting over a microphone (cardboard
tube stuck in a toy that acts like a stand). But damnit, they're
still cute.

Finally: Just for fun right now I did a word count of all my blog
entries. I've been blogging since 2004, and I've got something like
158,000 words. Amazing. And there are still some entries I've got to
grab from my old Slashdot journal.
Tags:
work
geekdad
31 Dec 2009
While trying to figure out why Nagios was suddenly unable to check up
on our databases, I suddenly realized that the permissions on
/dev/null were wrong: 0600 instead of 0666. What the hell? I've had
this problem before, and I was in the middle of something, so I set
them back and went on with my life. Then it happened again, not
half an hour later. I was in the same shell, so I figured it had to
have been a command I'd run that had inadvertently done this.
Yep: don't run the MySQL client as root. Yes yes yes, it's bad
anyway, I'll go to sysadmin hell, but this is an interesting bug. The
environment variable MYSQL_HISTFILE is set to /dev/null for
root...and when you exit the client, it sets the permissions for the
history file to 0600. So, you know, don't do that then. (Still no
fix committed, btw...)
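Since this has now bitten twice, it's probably worth checking for directly. A minimal sketch of a mode check; the function name and message format are made up for illustration, and hooking it into Nagios is left out:

```python
# Sketch: report when a file's permission bits differ from what's expected.
import os
import stat


def mode_problem(path, expected=0o666):
    """Return a description of what's wrong with path's mode, or None."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != expected:
        return "%s is mode %04o, expected %04o" % (path, mode, expected)
    return None


if __name__ == "__main__":
    problem = mode_problem("/dev/null")
    print(problem or "/dev/null looks fine")
```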
Tags:
bug
mysql
31 Dec 2009
A nice thing about working at a university is that you get all this
time off at Xmas, which is really nice; however, it's also the best
possible time to do all the stuff you've been saving up. Last year my
time was split between this job and my last; now, the time's all mine,
baby.
Today will be my last of three days in a row where the machines have
been all mine to play with^W^Wupgrade. I've been able to twiddle the
firewall's NIC settings, upgrade CentOS using Cfengine, and set up a
new LDAP server using Cobbler and CentOS Directory Server.
I've tested our UPS' ATS, but discovered that NUT is
different from APCUPSD in one important way: it doesn't easily
allow you to say "shut down now, even though there's 95% battery
left". I may have to leave testing of that for another day.
It hasn't all gone smoothly, but I've accomplished almost all the
important things. This is a nice surprise; I'm always hesitant when
I estimate how long something will take, because I feel like I have no
way of knowing in advance (interruptions, unexpected obstacles...you
know the drill). In this case, the time estimates for individual
tasks were, in fact, 'way paranoid, but that gave me the buffer that I
needed.
One example: after upgrading CentOS, two of our three servers attached
to StorageTek 2500 disk arrays reported problems with the disks. Upon
closer inspection, they were reporting problems with half of the LUNs
that the array was presenting to them -- and they were reporting them
in different ways. It had been a year or longer since I'd set them
up, and my documentation was pretty damn slim, so it took me a while
to figure it out. (Had to sleep on it, even.)
The servers have dual paths to the arrays. In Linux, the multipath
drivers don't work so well with these, so we used the Sun drivers
instead. But:
- You have to rebuild the drivers after a kernel change.
- This only showed up on two servers because the third server had not
upgraded its kernel (or indeed, any of its packages). Why? cfservd
had refused its connection because I had the MaxConnections
parameter too low.
- And of the two that did upgrade, the one machine we'd tested the
Linux drivers on still had an old multipath.conf file in /etc, which,
even though the multipathd service wasn't starting up, was enough
to get drivers loaded. This took a while to figure out because I'd
completely forgotten how to tell which driver was in use.
I got it fixed in the end, and I expanded the documentation
considerably. (49,000 words and counting in the wiki. Damn right I'm
bragging!)
Putting off 'til next time, tempted though I am: reinstalling CentOS
on the monitoring machine, which due to a mix of EPEL and
Dag repos and operator error appears to be stuck in a
corner, unable to upgrade without ripping out (say) Cacti. I moved
the web server to a backup machine on Tuesday, and I'll be moving it
back today; this is not the time to fiddle with the thing that's
going to tell me I've moved everything back correctly.
(Incidentally, thanks to Matt for the rubber duck, who
successfully talked me down off the roof when I was mulling this
over. Man, that duck is so wise...)
Last day today. (Like, ever!) If I remember correctly I'm going to
test the water leak detector...and I forget the rest; it's all in my
daytimer and I'm too lazy to get up and look right now. Wish me luck.
And best of 2010 to all of you!
Tags:
work
centos
cfengine
monitoring
packagemanagement
serverroom
upgrades
27 Dec 2009
- Waiting while my wife furiously edits a blog entry, then finding
she's posted it and getting to be the first to read it. And not
just because she mentions me every now and then. Seriously, she
rocks. Also? She got me the XKCD book for Xmas. Like I need
more reasons to love her.
Tags:
26 Dec 2009
After coming back from LISA I've been wanting to try IPv6 at home;
I've dabbled with it on and off for the last few years, but haven't
made a serious go of it.
Originally I had a tunnel with SixXS.net, but:
I was having problems with uptime that, in the end, turned out to
be my own silly firewall problems
The points system they use just seems needlessly complicated
Hurricane Electric was just 'way easier to set up, including
getting a /64 right away
There are a number of complaints about SixXS.net
arbitrarily (so it's claimed) closing accounts.
But enough gossip! Now you can visit
http://ipv6.saintaardvarkthecarpeted.com. Soon as I get a chance
I'll set it up so it doesn't require the separate hostname.
Tags:
ipv6
22 Dec 2009
Hah! Thanks to Teridon, I can now edit Foswiki files from the
command line:
rcs -l TextFileName.txt
ci -mnone -t-none -wusername -u TextFileName.txt
Sweet! Now to automate it in Emacs...
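Until the Emacs version exists, here's a rough sketch of the same two steps wrapped in Python. It assumes the rcs and ci binaries are on the PATH; the function names are made up, and the flags are just the ones from the commands above:

```python
# Sketch: lock a Foswiki topic file in RCS, then check the edit back in.
import subprocess


def rcs_checkin_commands(path, username):
    """Return the two RCS commands that lock a topic file and check it in."""
    return [
        # Take the lock on the file.
        ["rcs", "-l", path],
        # Check in as `username`, leaving an unlocked working copy (-u).
        ["ci", "-mnone", "-t-none", "-w" + username, "-u", path],
    ]


def checkin(path, username):
    """Run both commands in order, stopping at the first failure."""
    for cmd in rcs_checkin_commands(path, username):
        subprocess.run(cmd, check=True)
```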
Tags:
emacs
foswiki
22 Dec 2009
At the risk of tempting fate, I just realized that my web server is
five years old (and a bit). Happy birthday, Thornhill!
Tags:
hardware
22 Dec 2009
Here we go:
And that's all for now.
Tags:
virtualization
14 Dec 2009
Irritating: chkconfig on RHEL/CentOS returns non-zero if a service
isn't configured for a runlevel. IOW, you can do:
and have 0 returned if it's on, 1 if it's not.
But not SuSE; nope, it just returns 0 whether or not it's enabled, or
even if the service itself doesn't exist. Because, you know, grep
doesn't get used enough.
I'm doing this because I'm trying to use cfengine 2 to manage
services. This works well in CentOS, where you can add something
like:
service_foo_on = (ReturnsZero("/sbin/chkconfig --level 3 foo"))
and it'll work. ("service_foo_on" is a bit of a misnomer, because I'm
checking runlevels, not whether it's actually running.)
Update: Nope, I'm wrong. chkconfig --check does exactly what I
want. Many thanks to yaloki on #openSUSE-server for the help.
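The whole thing hinges on the exit status, which is easy to poke at outside of cfengine. A small sketch of the same "returns zero" test in Python; the helper name is mine, "foo" is a stand-in service, and the exact argument order for the SuSE --check form is an assumption:

```python
# Sketch: mimic cfengine 2's ReturnsZero() by testing a command's exit status.
import subprocess


def returns_zero(cmd):
    """True if the command runs and exits 0; False otherwise."""
    try:
        rc = subprocess.call(cmd, stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
    except OSError:
        # Command not found or not executable: treat as "not zero".
        return False
    return rc == 0


if __name__ == "__main__":
    # RHEL/CentOS form, as in the cfengine class above.
    print(returns_zero(["/sbin/chkconfig", "--level", "3", "foo"]))
    # SuSE form; argument order is assumed, not confirmed.
    print(returns_zero(["/sbin/chkconfig", "--check", "foo"]))
```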
Tags:
packagemanagement
opensuse
cfengine
07 Dec 2009
Busy day:
Wake up at 5am because the youngest son's teething and he's going to
be up at 5.20am
Clean up ZFS snapshots yet again; repeat "I must schedule this in
cron" for the nth time
Put in request to maintenance to look at server room humidifier;
current block o' cold weather == 10% RH in there
Find out why Mailman stopped working (probably permission problems
on the logs), and how to monitor this (settle for web interface for
now, since that wasn't working either; will probably need to make my
peace with user accounts for Nagios on machines I'm monitoring)
Figure out why Drupal is shitting cron.php files all over the place
(still no idea)
Fill out performance review for self
Back up Windows 2003 server before tossing it over the fence to
software installer
Start writing article for SysAdvent at last on OCSNG/GLPI
Struggle with cfengine tidy stanza that doesn't work; repeat "I must
upgrade to cfengine 3 or puppet" for the nth time
Tonight, bed at 8.30pm. And there's no shame in that.
Tags:
geekdad
work
03 Dec 2009
Woohoo! My first entry for the Sysadvent Calendar, on
Development for Sysadmins, has been posted! Thanks to
Jordan for tidying it up and adding integration testing, which
I'd missed when writing the article. There will be another article
from me coming up soon-ish on OCSNG and GLPI.
As of December 1st, Jordan and Matt were still accepting
entries -- so head on over if you've got something to say.
Tags:
documentation
geekpride
programming
reading
30 Nov 2009
Need to figure out what bit of selinux policy is forbidding something?
sesearch is what you want.
Tags:
selinux