Cfengine 3 has a lot of things going for it. But its syntax is not
one of them.
Consider this situation: you have CentOS machines, SuSE machines and
Solaris machines. All of them should run, say, SSH, NTP and Apache --
why not? The files are slightly different between them, and so is the
method of starting/stopping/enabling services, but mostly we're doing
the same thing.
I've got a bundle in Cfengine that looks like this:
...and so on. We're basically setting up four hashes -- daemon,
start, enable and cfg_file -- and populating them with the appropriate
entries for Red Hat/CentOS ssh and Apache configs; you can imagine
slightly different entries for Solaris and SuSE. The
cfg_file_prefix allows me to put CentOS' config files in a separate
directory from those of other OSes.
Then there's this bundle:
bundle agent fix_service(service) {

files:

    "$(services.cfg_file[$(service)])"
        copy_from => secure_cp("$(g.masterfiles)/$(services.cfg_file_prefix)/$(services.cfg_file[$(service)])", "$(g.masterserver)"),
        classes   => if_repaired("$(service)_restart"),
        comment   => "Copy a stock configuration file template from repository";

processes:

    "$(services.daemon[$(service)])"
        comment       => "Check that the server process is running, and start if necessary",
        restart_class => canonify("$(service)_restart"),
        ifvarclass    => canonify("$(services.daemon[$(service)])");

commands:

    "$(services.start[$(service)])"
        comment    => "Method for starting this service",
        ifvarclass => canonify("$(service)_restart");

    "$(services.enable[$(service)])"
        comment    => "Method for enabling this service",
        ifvarclass => canonify("$(service)_restart");
}
This bundle takes a service name as an argument, and assigns it to the
local variable "service". It copies the OS-and-service-appropriate
config file into place if it needs to, and enables/starts the service
if it needs to. How does it know if it needs to? By setting the
class "$(service)_restart" if the service isn't running, or if the
config file had to be copied.
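For comparison, here's roughly what that flow looks like in Perl -- purely a
sketch, with made-up paths and commands, not anything I actually run: copy the
config into place if it differs from the master copy, then start/enable the
service only if something changed or the daemon isn't running.

#!/usr/bin/perl
# Rough, hypothetical Perl equivalent of fix_service(); the paths and
# commands below are illustrative only.
use strict;
use warnings;
use File::Compare qw(compare);
use File::Copy qw(copy);

my %services = (
    ssh => {
        daemon   => 'sshd',
        cfg_file => 'sshd_config',
        start    => '/sbin/service sshd start',
        enable   => '/sbin/chkconfig sshd on',
    },
);

sub fix_service {
    my ($service) = @_;
    my $s       = $services{$service};
    my $master  = "/var/cfengine/masterfiles/centos/$s->{cfg_file}";  # stand-in for $(g.masterfiles)/prefix
    my $target  = "/etc/ssh/$s->{cfg_file}";                          # illustrative target path
    my $restart = 0;

    # Copy the stock config into place if it differs from the master copy.
    if (compare($master, $target) != 0) {
        copy($master, $target) or die "copy failed: $!";
        $restart = 1;
    }

    # If the daemon isn't running, flag it for a (re)start too.
    $restart = 1 if system("pgrep -x $s->{daemon} >/dev/null") != 0;

    if ($restart) {
        system($s->{start});
        system($s->{enable});
    }
}

fix_service('ssh');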
So far, so good. Well, except for the mess of brackets. All those
hashes are in the services bundle, so you need to be explicit about
the scope. (There are provisions for global variables, but I've kept
my use of 'em to a minimum.) And so what in Perl would be, say:
$services->{start}{$service}
becomes
"$(services.start[$(service)])"
Square brackets for the hash, round brackets for the string (and to
indicate that you're using a variable -- IOW, it's "$(variable)", not
"$variable" like you're used to), and dots to indicate scope
("services.start" == the start variable in the services bundle).
It's...well, it's an ugly mess o' brackets. But I can deal with
that. And this arrangement/pattern, which came from the Cfengine
documentation itself, has been pretty helpful to me for dealing with
single config file services.
But what about the case where a service has more than one config file?
Like autofs: you gotta copy around a map file, but in SuSE you also
need /etc/sysconfig/autofs to set the LDAP variables.
Again, in Perl this would be an anonymous array on top of a hash --
something like:
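# Illustrative sketch only -- the real thing has more services, plus the
# OS-specific prefixes from above.
my $cfg_file = {
    ssh    => [ 'sshd_config' ],
    autofs => [ 'auto.master', '/etc/sysconfig/autofs' ],
};

for my $file (@{ $cfg_file->{autofs} }) {
    print "would copy $file\n";
}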
In Cfengine, the equivalent structure uses an slist -- a list of
strings. All right, fine; different layout, same idea: stick it in the
services bundle and away we go. But there's a catch: remote scalars can
be referenced from another bundle; remote lists cannot, at least not
without gymnastics.
From the docs:
During list expansion, only local lists can be expanded, thus global
list references have to be mapped into a local context if you want to
use them for iteration. Instead of doing this in some arbitrary way,
with possibility of name collisions, cfengine asks you to make this
explicit. There are two possible approaches.
The first of those two approaches is, I think, passing the list as a
parameter, whereupon it just works? maybe? (It's a not-so-minor
nitpick that there are lots of examples in the Cf3 handbook that are
not explained and don't make much sense. They apparently work, but
how is not at all clear, or discernible.) I think it's meant to be
like Perl's let's-flatten-everything-into-a-list approach to passing
variables.
The second is to just go ahead and redeclare the remote slist (array)
as a local one that's set to the remote value -- the docs show exactly
that pattern. Which makes this prelude to all of that handwaving even
more irritating:
Instead of doing this in some arbitrary way, with possibility of
name collisions...
...
...I mean...
...I mean, what is the point of requiring explicit paths to
variables in other scopes if you're just going to insert random
speedbumps to assuage needless worries about name collisions? What
the hell is with this let's-redeclare-it-AGAIN approach?
In Cfengine 3, I had been setting up printers for people using lpadmin
commands. Among other things, it used a particular PPD file for the
local HP printer. It turns out that in Oneiric, those files are no
longer present, or even available; judging by what I found on my
laptop, the PPD file is (I think) generated automagically by
/usr/share/cups/ppd-updaters/hplip-cups.
It's possible that I could figure this out for my new workstation. But
right now, I don't think I can be bothered. I'm going to just set this
up by hand, and hope that either I'll get a print server or I'll
figure it out.
I've added a bundle that enables/disables SELinux booleans and have
used it on one machine; this is pretty trivial.
File contexts and restorecon appear to be mainly controlled by
plain old files in /etc/selinux/targeted/contexts/files, but there are
stern warnings about letting libselinux manage them. However,
this thread on the SELinux mailing list seems to say it's okay to copy
them around.
Puppet appears to be further ahead in this. This guy
compiles policy files locally using Puppet; this other dude has a
couple of posts on this. There are yet other folks
using Puppet to do this, and it would be worth checking them out as a
source of ideas.
I need to improve my collection of collective pronouns.
At $WORK I'm moving a web server to a new machine. Natch, I'm getting
the new one ready, testing as I go, and when things are good I'll
change DNS to point to the new machine. I found myself testing a lot
of Apache Alias directives -- we've accumulated rather a lot -- and my
usual routine:
Fire up my laptop
Edit /etc/hosts and add a record for the website pointing at the new server
...was getting damned tiresome. Perl to the rescue! WWW::Mechanize, Test::More and
Apache::Admin::Config are damned useful, and when the authors weren't
looking I bent them to my will.
So: thornhill, a Small but Useful(tm) script to check URLs
mentioned in Apache config files. Here's what it does:
Sucks in an httpd.conf file
Looks for all the Alias directives in the VirtualHost section
Verifies that the content of those URLs on the old and new server is identical
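The guts boil down to something like this -- the hostnames and paths here are
placeholders, and in the real script the list of paths comes from digging the
Alias lines out of the VirtualHost section with Apache::Admin::Config; this
just shows the comparison loop:

#!/usr/bin/perl
# Simplified sketch of the comparison step; hostnames and URL paths
# below are placeholders.
use strict;
use warnings;
use WWW::Mechanize;
use Test::More;

my $old_server = 'www.example.com';       # where DNS points today
my $new_server = 'newbox.example.com';    # the replacement machine
my @paths      = ('/alias1/', '/alias2/page.html');   # normally parsed from httpd.conf

my $mech = WWW::Mechanize->new( autocheck => 0 );

for my $path (@paths) {
    $mech->get("http://$old_server$path");
    my $old_content = $mech->content;

    $mech->get("http://$new_server$path");
    my $new_content = $mech->content;

    is( $new_content, $old_content, "$path matches on old and new servers" );
}

done_testing();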
We'll call this release 0.1 ("works for me!"). You can download it
here; be sure to note the bugs and assumptions.
I tripped across this error today with Cfengine 3:
cf3:./inputs/promises.cf:1,22: Redefinition of body "control" for "common" is a broken promise, near token '{'
The weird thing was that this was a stripped-down promises.cf, and I
could not figure out why it was complaining about redefinitions. I
finally found the error:
This site has gone dark today in protest of the U.S. Stop Online
Piracy Act (SOPA) and PROTECT-IP Act (PIPA). The U.S. Congress is
about to censor the Internet, even though the vast majority of
Americans are opposed. We need to kill these bills to protect our
rights to free speech, privacy, and prosperity. Learn more at
AmericanCensorship.org!
I think I just tripped across this bug, though on Ubuntu rather
than Red Hat: disabling the public SNMP community on an HP printer
means that CUPS will no longer print to it. The CUPS error_log shows
this:
prnt/backend/hp.c 745: ERROR: open device failed stat=12: hp:/net/HP_Color_LaserJet_4700?ip=1.2.3.4
Re-enabling the public community got CUPS working again, but made Baby
Tom Limoncelli cry.
One of his points was that in this brave new world, we need to let go
of serialism ("A follows B follows C, and that's Just The Way It
Is(tm)"). That's the old way of thinking, he said, the Industrial
way; we can do much more in parallel than we ever could in serial.
It occurs to me that it might be better to say that needless
serialism can be let go of. Like a Makefile: the final executable
depends on all the object files; without them, there's no sense trying
to create it. But the object files typically depend on a file or two
each (a .c and .h file, say), and there's no reason they can't be
compiled in parallel ("make -j9"). Dependencies are there for a
reason, and it is no bad thing to hold on to them.
(Kinda like the misquoting of Emerson. Often, you hear "Consistency is
the hobgoblin of little minds." But the quote actually begins "A
foolish consistency..." And now, having demonstrated my superiority
by quoting Wikipedia, I will disappear up my own ass.)
"The night of the fight, you may feel a slight sting. That's pride
fucking with you. Fuck pride. Pride only hurts, it never helps."
-- Pulp Fiction
Last week my wife's laptop started acting up -- unaccountably slow at
random intervals, programs crashing for no good reason, etc. Turned
out the hard drive was failing, so one of my jobs today is to pick up
a new one.
She mentioned later on that, when I told her, her first reaction was
"Okay, he'll just grab another hard drive from his pile." Because I
used to have a big pile of hard drives...and random memory, and
motherboards, and power supplies, and the odd case or two. These
things are pretty much mandatory if you're any kinda geek, and I was.
Still am, but now one without the space for all this crap.
More importantly, without the time. A while back I broke a hinge on
my laptop screen. I looked at the instructions for replacing it, and
passed; I took it to a repair shop and paid them a stupid amount of
money when I coulda bought it myself for $20 on eBay. It hurt, at
first...but I couldn't stop thinking that, realistically, it was going
to take me 6-8 hours to replace that hinge: figuring out how to take
it apart, a couple of false starts, finding the tools, actually doing
it, then reversing the process.
And 6-8 hours? That's a LUXURY. Family, other hobbies...these are the
fun things I do now. The commute I use to do other fun things. Keeping
hard drives with uncertain histories and remaining life spans is
something I've left behind. Swapping out random components to see if a
machine can limp along for another three months is no longer fun. I've
learned enough about economics to see what's happening here: my time
has become more valuable than money; I'll pay to have something Just
Work.
It's been strange to let go of computers as The One Way to Have Fun.
I think back to a coworker at my first tech job -- he must have been
about as old then as I am now -- and how astounded I was when he told
me he no longer played with computers at home. "That's never gonna
happen to ME!" I thought. Heh.
It astounds me how much we can know about an exoplanet's atmosphere.
Like:
It's got sodium, hydrogen, oxygen and silicon
It hasn't got water
Estimates of its pressure and temperature
Detection of its thermosphere (the ISS rides around in Earth's
thermosphere; who knew?)
It has a relatively clear upper atmosphere
Its atmosphere might be being blown away by its host star
Constraints on the strength of its magnetic field
Many, many details at the Exoclimes blog; pretty picture and more
user-friendly summary of one set of observations by the Spitzer Space
Telescope here.
I've got a new workstation at $WORK. (Well, where else would it be?)
It's pretty sweet: i7 quad-core processor, clock speed > 3GHz
(honestly, I barely keep track anymore), and 8GB of RAM. 8GB!
Insane.
When I arrived in 2008, I used a -- not cast-off, but unused -- P4 with
4 GB of RAM. I didn't want to make a big fuss about it; I saved
the fuss, instead, for a nice business laptop from Dell that worked
well with Linux. Since 90% of my work is Firefox + Emacs + XTerms,
and my WM of choice at the moment is Awesome, speed was not a
problem and the memory was fine.
Lately, though, I've discovered Vagrant. It looks pretty sweet,
but my current machine is sloooow when I try to run a couple of VMs.
(So's my laptop, despite a better processor; I suspect the 5400RPM
drive.) I'm hoping that the new machine will make a big difference.
Just gotta install Ubuntu and move stuff over. Fortunately I've been
pretty good about keeping my machine config in Cfengine, so that'll
help. And then build some VMs. I'm always surprised at people who
feel comfortable downloading random VM images from the Internet.
Yeah, it's probably okay...but how do you know?
One thing that Vagrant is missing is integration with Cfengine.
Fortunately, the documentation for extending it seems pretty good
(plus, I can always kick things off with a shell script). This
might be an excuse to learn Ruby.
A long-standing project at $WORK is to move the website to a new
server. I'm also using it as a chance to get our website working
under SELinux, rather than just automatically turning it off.
There's already one site on this server, running Wordpress, and I
decided to get serious about migrating the other website, which runs
Drupal.
As documented here, SELinux labels TCP ports by name ("these are the
mysql ports, these are the SMTP ports..."), and the name_connect
permission controls which domains are allowed to connect to which
labels. Okay, so now we get to the problem that prevented Drupal from
working: SELinux was denying httpd access to the mysqld TCP port.
What surprised me is that the Wordpress site did not seem to be
encountering this error. The relevant difference turned out to be in
the two database configs: Wordpress' pointed at "localhost", while
Drupal's used "127.0.0.1". From the PHP documentation for mysqli:
host: Can be either a host name or an IP address. Passing the NULL
value or the string "localhost" to this parameter, the local host is
assumed. When possible, pipes will be used instead of the TCP/IP
protocol.
See the difference? Without looking up the code for mysqli, I think
that an IP address -- even 127.0.0.1 -- makes mysqli just try TCP
connections; using "localhost" makes it try a named pipe first. Since
TCP connections to the MySQL port apparently aren't allowed by default
CentOS SELinux policy, the former fails.
Solution: make it "localhost" in both, and remember not to make
assumptions.
Trying to take care of the HP RFU vulnerability. Miss the bit
that says my printer doesn't have the ability to disable this built
into the web interface. Decide I need to download HP Web Jetadmin.
Forced to register for an "HP Passport Account". Fill in country of
origin, among other details. Click to go back to download page, get
"Sorry, we can't do that" message. Navigate back to download page.
Fill in country of origin again. Fill in name of company. Download
-- 300 MB. Go to download documentation; I see "installation
instructions", "terms of use" and "post-sales support." What a
crock.
-- Oh, and now I discover that it's going to install Microsoft SQL
Server. Fucking hell. And that's not even including the rat's nest
of menus.
Don't get me wrong: I can see how this would be immensely useful for
a large number of printers. (And I strongly suspect that "large"
means "greater than one".) But for one printer, it's an amazing
overhead for such a small thing. Worse, I'm willing to bet that my
whole task could be reduced to a single SNMP set command. But I'm too
lazy to install Wireshark and figure out what that would be.
A while back I was looking for a script that would email warnings if a
user was over their disk quota. Surprisingly, I couldn't find one, so
I wrote one.
Here's quota_check.pl, for what it's worth: A Small but
Useful(tm) utility to check quotas and send emails periodically.
Depends on Perl's Quota module. Meant to be called from cron
like so:
for i in /filesystem/* ; do [ -d "$i" ] && quota_check.pl -u "$(basename "$i")" -f "$(dirname "$i")" ; done
It will check whether the user is at or above the warning level
(default: 80%) of their quota for inodes or blocks. If so, it will see
if a warning has been sent recently (default: 7 days) by looking for
the file ".quotawarningsent" in their home directory. This file
is maintained by the script, and holds the time (in seconds since the
epoch) the last warning was sent. If it's time to send another
warning, or if one was never sent in the first place, it'll send it
and update this file.
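Here's a stripped-down sketch of the block check -- the real script takes the
-u/-f options shown above, checks inodes as well as blocks, and sends mail
rather than printing; this version just takes the user and filesystem as plain
arguments:

#!/usr/bin/perl
# Sketch: check one user's block quota on one filesystem and decide
# whether a warning is due. Illustrative only; the real script emails.
use strict;
use warnings;
use Quota;

my $warn_level = 0.80;          # warn at 80% of the soft limit
my $interval   = 7 * 86400;     # don't nag more than once a week

die "usage: $0 user filesystem\n" unless @ARGV == 2;
my ($user, $fs) = @ARGV;

my $uid = getpwnam($user);
defined $uid or die "no such user: $user\n";
my $dev = Quota::getqcarg($fs) or die "can't find quota device for $fs\n";

# Blocks used, soft/hard block limits and timelimit, then the same for inodes.
my ($bcurr, $bsoft, $bhard, $btime, $icurr, $isoft, $ihard, $itime) =
    Quota::query($dev, $uid);

exit 0 unless $bsoft;                            # no block quota set
exit 0 unless $bcurr >= $warn_level * $bsoft;    # under the warning level

# Only warn again if the last warning is older than $interval.
my $stamp = "$fs/$user/.quotawarningsent";
if (-f $stamp) {
    open my $in, '<', $stamp or die "$stamp: $!";
    chomp(my $last = <$in> // 0);
    exit 0 if time() - $last < $interval;
}

print "warning: $user is at ", int(100 * $bcurr / $bsoft),
      "% of block quota on $fs\n";

open my $out, '>', $stamp or die "$stamp: $!";
print $out time(), "\n";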
When compiling CHARMM, I'll sometimes encounter errors like this:
charmm/lib/gnu/iniall.o: In function `stopch_':
iniall.f:(.text+0x1404): relocation truncated to fit: R_X86_64_PC32 against symbol `ldbia_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x14af): relocation truncated to fit: R_X86_64_PC32 against symbol `seldat_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x14d7): relocation truncated to fit: R_X86_64_32S against symbol `seldat_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x151b): relocation truncated to fit: R_X86_64_32S against symbol `seldat_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x1545): relocation truncated to fit: R_X86_64_PC32 against symbol `shakeq_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x1551): relocation truncated to fit: R_X86_64_PC32 against symbol `shakeq_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x1560): relocation truncated to fit: R_X86_64_PC32 against symbol `kspveci_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x156e): relocation truncated to fit: R_X86_64_PC32 against symbol `kspveci_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x16df): relocation truncated to fit: R_X86_64_PC32 against symbol `shpdat_' defined in COMMON section in charmm/lib/gnu/iniall.o
charmm/lib/gnu/iniall.o: In function `iniall_':
iniall.f:(.text+0x1cae): relocation truncated to fit: R_X86_64_PC32 against symbol `cluslo_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x1cb8): additional relocation overflows omitted from the output
collect2: ld returned 1 exit status
What this means is that the full 64-bit address of the offending
symbol, which now lives somewhere above 5 gigabytes, can't be
represented within the 32-bit space allocated for it.
The reason for this error is the size of data that you are using. This
is seen to happen when your program needs more than 2GB of data. Well,
who needs such big data at compile time? I do for one and there are
other people in the HPC world who do that too. For all of them the
life saver or may be a day saver is the compiler option -mcmodel.
This is a known problem with CHARMM's "huge" keyword.
This page, from the University of Alberta, also has excellent
background information. (Oh, and also? They have a YouTube channel
on using Linux clusters.)
Yesterday I spent a couple hours helping one of the students. (I think
he's a grad student now, so maybe it's not kosher to still call him a
student.) He was trying to compile an add-on for CHARMM,
and was running into lots of problems. Some were because of poor
instructions from the add-on, and some were due to the (ahem)
Byzantine build process that CHARMM uses. (In the words of Bryan
Cantrill: "Oh, there are things I'm NOT saying! Believe me, I'm
holding back!")
I ended up talking to him for close to two hours -- not just solving
his problems, but trying to explain to him what I was doing and why.
This happened to be compilation problems, so we covered object files,
dynamic versus static libraries, compiling and linking -- but also
patching and why you should script things whenever possible. Odds are
he'll have more questions today when he comes across something else.
I'm happy to spend this time with him. He's interested (or at least
doing a damned good impression of being interested), and I want him
to understand what's going on. It'll serve him well to have the
background information needed to (say) understand what a compiler
error looks like and how to find out how to fix it. He's in
bioinformatics (or rather, he's doing computer simulations of protein
folding, so if he's not in bioinformatics precisely, he might as well
be) and odds are, he'll need to do this sort of thing himself someday.
Not everyone's interested in this level of detail, of course. But
even in my small department, there are usually one or two people each
year that want to learn. Sometimes it's the obvious geeks; sometimes
it's people I never would have guessed would care, or have such an
aptitude. There are a few that I think would make good sysadmins, and
I make a point of mentioning this to them.
This is one of the most enjoyable parts of my job. I love seeing the
light dawn in people's eyes when they realize what Unix is capable of,
or how packets flow from here to there, or how to write 50 lines in
Perl that save them 500 in C++. I've had compliments on my teaching
ability before, and my wife has suggested more than once that I
consider doing it formally, so I'm cautiously optimistic that, while
I'm probably ignorant of the huge amount of work it'd be, I wouldn't
entirely suck.
I've got friends (Hi Andy! Hi Victor!) who teach at universities --
system administration, programming -- and at some point I'd like to
follow in their footsteps. (And this is the point where my inner
Limoncelli says, "Great! Now write down the steps needed so it'll get
done!")
Inna meantime, it's informal: no marks, no tests, no structure...just,
"Hey, lookit this. Isn't that neat?"
She goes from expressing sympathy that your music player had its cover
come off to nodding knowingly when you say it gives you a chance to
resolder the headphone connections.
Last year (hah! last year!), by which I mean Xmas 2011, just two weeks
ago, I did all my updates and disruptive work in the week BEFORE Xmas,
rather than after. It's one of the perks of working for a university
that I get a week off between Xmas and the new year, and I decided to
actually take advantage of it this year.
(I could make that paragraph better, but it's beyond me right now.)
(The other advantage, of course, is free honey from honeybee
researchers.)
I allowed myself three days, but was actually done in two. That's
considerably better than last year, and in large part that's
because I learned a (not the) right way to uninstall/reinstall
proprietary ATI drivers. Unlike last year, in fact, it was
practically painless.
This might explain why it was not terribly surprising to come back to
a problem related to the upgrade: a user telling me that PyMOL
no longer worked. And sure enough, when I tried running it over SSH,
it crashed:
...which wasn't exactly what she was seeing, but I was pretty sure
that it was only the tip of the iceberg.
A backtrace (caught by the wonderful catchsegv, which I only just
found out about) showed that it was failing at XF86DRIQueryVersion,
which turns up in approximately 38% of all web pages. They're all
related to problems w/the proprietary ATI driver, how it eats ponies,
and how everything was sunshine and lollipops once they rolled back to
the Mesa-provided copy of libGL.
We are running the proprietary ATI driver -- we need the 3D
performance -- so this made sense. And after last year's fiasco I was
quite prepared to believe that ATI has a nasty appetite for BBQ. But
much searching showed that before the Xmas upgrade, everyone'd been
using the ATI-supplied libGL w/, presumably, no problems. I decided to
prove it by reinstalling Mesa on a test machine. Yep, now it works
fine. ATI hates the world!
...but I'd forgotten that I was running this over SSH. With X
forwarding. And this made a difference. The Truth Hammer smacked me
when I tried PyMOL on another workstation, from an actual
sit-at-the-keyboard-and-use-the-coffee-holder X session, and it worked
fine. I SSH'd to the original user's machine, and that worked fine.
I checked the version of libGL on my machine, and sure enough it was
different: 7.7.1 versus 7.8.2. My suspicion is that either the
XF86DRIQueryVersion routine has changed enough that this causes
problems, or there was some other difference (32-bit vs 64-bit? could
be...) between my machine and theirs (mine runs a different distro, so
there's lots of chances for interesting differences).
I simply did not expect there to be any problem debugging X programs
over SSH; probably naive, but what the hell. Now I know.
Oh, and the user's problems? Wack PYTHONHOME. Unset that and all
is well.