Cfengine 3 syntax

Cfengine 3 has a lot of things going for it. But its syntax is not one of them.

Consider this situation: you have CentOS machines, SuSE machines and Solaris machines. All of them should run, say, SSH, NTP and Apache -- why not? The files are slightly different between them, and so is the method of starting/stopping/enabling services, but mostly we're doing the same thing.

I've got a bundle in Cfengine that looks like this:

bundle common services {
  vars:
    redhat|centos::
      "cfg_file_prefix"     string => "centos/5";

      "cfg_file[httpd]"     string => "/etc/httpd/conf/httpd.conf";
      "daemon[httpd]"       string => "httpd";
      "start[httpd]"        string => "/sbin/service httpd start";
      "enable[httpd]"       string => "/sbin/chkconfig httpd on";

      "cfg_file[ssh]"       string => "/etc/ssh/sshd_config";
      "daemon[ssh]"         string => "sshd";
      "start[ssh]"          string => "/sbin/service sshd restart";
      "enable[ssh]"         string => "/sbin/chkconfig sshd on";

...and so on. We're basically setting up four hashes -- daemon, start, enable and cfg_file -- and populating them with the appropriate entries for the Red Hat/CentOS SSH and Apache configs; you can imagine slightly different entries for Solaris and SuSE. The cfg_file_prefix variable allows me to keep CentOS' config files in a separate directory from other OSes'.

Then there's this bundle:

bundle agent fix_service(service) {
  files:
    "$(services.cfg_file[$(service)])"
      copy_from     => secure_cp("$(g.masterfiles)/$(services.cfg_file_prefix)/$(services.cfg_file[$(service)])", "$(g.masterserver)"),
      classes       => if_repaired("$(service)_restart"),
      comment       => "Copy a stock configuration file template from repository";

  processes:
    "$(services.daemon[$(service)])"
      comment       => "Check that the server process is running, and start if necessary",
      restart_class => canonify("$(service)_restart"),
      ifvarclass    => canonify("$(services.daemon[$(service)])");

  commands:
    "$(services.start[$(service)])"
      comment       => "Method for starting this service",
      ifvarclass    => canonify("$(service)_restart");

    "$(services.enable[$(service)])"
      comment       => "Method for enabling this service",
      ifvarclass    => canonify("$(service)_restart");
}

This bundle takes a service name as an argument, and assigns it to the local variable "service". It copies the OS-and-service-appropriate config file into place if it needs to, and enables/starts the service if it needs to. How does it know if it needs to? By setting the class "$(service)_restart" if the service isn't running, or if the config file had to be copied.

So far, so good. Well, except for the mess of brackets. All those hashes are in the services bundle, so you need to be explicit about the scope. (There are provisions for global variables, but I've kept my use of 'em to a minimum.) And so what in Perl would be, say:

$services->{start}{$service}

becomes

"$(services.start[$(service)])"


Square brackets for the hash, round brackets for the string (and to indicate that you're using a variable -- IOW, it's "$(variable)", not "$variable" like you're used to), and dots to indicate scope ("services.start" == the start variable in the services bundle).

It's...well, it's an ugly mess o' brackets. But I can deal with that. And this arrangement/pattern, which came from the Cfengine documentation itself, has been pretty helpful to me for dealing with single config file services.

But what about the case where a service has more than one config file? Like autofs: you gotta copy around a map file, but on SuSE you also need /etc/sysconfig/autofs to set the LDAP variables.

Again, in Perl this would be an anonymous array on top of a hash -- something like:

$services->{cfg_file}{autofs}[0] = "/etc/auto.master";
$services->{cfg_file}{autofs}[1] = "/etc/sysconfig/autofs";

and you'd walk it like so:

foreach my $i (@{ $services->{cfg_file}{autofs} }) { # something with $i }

or even:

for (@{ $services->{cfg_file}{autofs} }) { # something with $_ }

(I think...I'm embarrassed sometimes at how rusty my Perl is.)

In Cfengine, you pile an anonymous array on top of a hash like so:

  "cfg_file[autofs]" slist => { "/etc/auto.master", "/etc/sysconfig/autofs" };

An slist is a list of strings. All right, fine; different layout, same idea, stick it in the services bundle and away we go. But: remote scalars can be referenced; remote lists cannot without gymnastics. From the docs:

During list expansion, only local lists can be expanded, thus global list references have to be mapped into a local context if you want to use them for iteration. Instead of doing this in some arbitrary way, with possibility of name collisions, cfengine asks you to make this explicit. There are two possible approaches.

The first of those two approaches is, I think, passing the list as a parameter, whereupon it just works. Maybe. (It's a not-so-minor nitpick that the Cf3 handbook is full of examples that are never explained and don't make much sense; they apparently work, but how is not at all clear, or discernible.) I think it's meant to be like Perl's let's-flatten-everything-into-a-list approach to passing arguments.
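As far as I can tell, it looks something like this -- an untested sketch, mind you: show_dirs is a name I made up, and I'm borrowing va.tmpdirs from the docs example quoted below:

bundle agent example {
  methods:
    # Quoting "@(va.tmpdirs)" hands the whole remote list to the
    # bundle, where it becomes the local list "dirs".
    "any" usebundle => show_dirs("@(va.tmpdirs)");
}

bundle agent show_dirs(dirs) {
  reports:
    cfengine_3::
      # $(dirs) iterates over the list, one report per element
      "Do $(dirs)";
}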

The second is to just go ahead and redeclare the remote slist (array) as a local one that's set to the remote value. Again, from the docs:

bundle common va {
  vars:
   "tmpdirs"  slist => { "/tmp", "/var/tmp", "/usr/tmp"  };
}

bundle agent hardening {
  classes:
    "ok" expression => "any";

  vars:
   "other"    slist => { "/tmp", "/var/tmp" };
   "x"        slist => { @(va.tmpdirs) };

  reports:
    ok::
      "Do $(x)";
      "Other: $(other)";
}

which makes this prelude to all of that handwaving even more irritating:

Instead of doing this in some arbitrary way, with possibility of name collisions...

...

...I mean...

...I mean, what is the point of requiring explicit paths to variables in other scopes if you're just going to insert random speedbumps to assuage needless worries about name collisions? What the hell is with this let's-redeclare-it-AGAIN approach?

The rage, it fills me.

Did you just tell me to go fuck myself?

Tags: cfengine didyoujusttellmetogofuckmyself

PPD changes in Oneiric

In Cfengine 3, I had been setting up printers for people using lpadmin commands. Among other things, it used a particular PPD file for the local HP printer. It turns out that in Oneiric, those files are no longer present, or even available; judging by what I found on my laptop, the PPD file is (I think) generated automagically by /usr/share/cups/ppd-updaters/hplip-cups.
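For the record, the incantation was along these lines (reconstructed, not verbatim -- the printer name, URI and PPD path here are stand-ins):

/usr/sbin/lpadmin -p hp4700 -E \
    -v socket://printer.example.com:9100 \
    -P /usr/share/ppd/HP/hp-color_laserjet_4700-ps.ppd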

It's possible that I could figure this out for my new workstation. But right now, I don't think I can be bothered. I'm going to just set this up by hand, and hope that either I'll get a print server or I'll figure it out.

Tags: cfengine ubuntu

Well, that didn't take long

Megaupload is back.

Tags: politics

Cfengine 3 and SELinux

  • No native support in Cf3 for SELinux.

  • I've added a bundle that enables/disables booleans and have used it on one machine; this is pretty trivial (a sketch follows this list).

  • File contexts and restorecon appear to be mainly controlled by plain old files in /etc/selinux/targeted/contexts/files, but there are stern warnings about letting libselinux manage them. However, this thread on the SELinux mailing list seems to say it's okay to copy them around.

  • Puppet appears to be further ahead in this. This guy compiles policy files locally using Puppet; this other dude has a couple of posts on this. There are yet other other folks using Puppet to do this, and it would be worth checking them out as a source of ideas.

  • I need to improve my collection of collective pronouns.
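About that boolean bundle: it amounts to something like this. This is a from-memory sketch rather than the exact code, and the usage line is invented:

bundle agent selinux_boolean(name, value) {
  commands:
    # setsebool -P persists across reboots; running it unconditionally
    # is crude, but it's idempotent, so no harm done
    "/usr/sbin/setsebool -P $(name) $(value)"
      comment => "set SELinux boolean $(name) to $(value)";
}

# called like so:
#   "any" usebundle => selinux_boolean("httpd_can_network_connect_db", "on");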

Tags: cfengine selinux

Introducing Thornhill

At $WORK I'm moving a web server to a new machine. Natch, I'm getting the new one ready, testing as I go, and when things are good I'll change DNS to point to the new machine. I found myself testing a lot of Apache Alias directives -- we've accumulated rather a lot -- my usual routine:

  • Fire up my laptop
  • Edit /etc/hosts and add a record for the website pointing at the new server
  • Fire up the browser on the laptop
  • Fire up the browser on my desktop
  • Look up the Alias record in httpd.conf
  • Type it into both
  • vgrep for differences

...was getting damned tiresome. Perl to the rescue! WWW::Mechanize, Test::More and Apache::Admin::Config are damned useful, and when the authors weren't looking I bent them to my will.

So: thornhill, a Small but Useful(tm) script to check URLs mentioned in Apache config files. Here's what it does:

  • Sucks in an httpd.conf file
  • Looks for all the Alias directives in the VirtualHost section
  • Verifies that the content of those URLs on the old and new server is identical
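In skeleton form, it's roughly this (a simplified sketch, not the real thornhill: the real script uses Apache::Admin::Config to parse the config properly, where here I cheat with a regex, and the hostnames are placeholders):

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use Test::More;

my ($old, $new) = ('oldserver.example.com', 'newserver.example.com');
my $mech = WWW::Mechanize->new(autocheck => 0);

open my $conf, '<', 'httpd.conf' or die "can't read httpd.conf: $!";
while (my $line = <$conf>) {
    # crude Alias parsing; Apache::Admin::Config does this properly
    next unless $line =~ /^\s*Alias\s+(\S+)/;
    my $url_path = $1;

    $mech->get("http://$old$url_path");
    my $want = $mech->content;
    $mech->get("http://$new$url_path");
    is($mech->content, $want, "content matches for $url_path");
}
close $conf;
done_testing();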

We'll call this release 0.1 ("works for me!"). You can download it here; be sure to note the bugs and assumptions.

Share and enjoy!

Update: Problems have arisen.

Tags: software

Cfengine 3 error: Redefinition of body "control" for "common" is a broken promise, near token '{'

I tripped across this error today with Cfengine 3:

cf3:./inputs/promises.cf:1,22: Redefinition of body "control" for "common" is a broken promise, near token '{'

The weird thing was this was a stripped down promises.cf, and I could not figure out why it was complaining about redefinitions. I finally found the error:

body common control {
  bundlesequence => { "test" };
  inputs => { "promises.cf", "cfengine_stdlib.cf" };
}

Yep, including the promises.cf file itself in the inputs section borked everything; removing it fixed things right away.
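For the record, the working version is just:

body common control {
  bundlesequence => { "test" };
  inputs => { "cfengine_stdlib.cf" };
}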

Tags: cfengine

Blacked out for SOPA

This site has gone dark today in protest of the U.S. Stop Online Piracy Act (SOPA) and PROTECT-IP Act (PIPA). The U.S. Congress is about to censor the Internet, even though the vast majority of Americans are opposed. We need to kill these bills to protect our rights to free speech, privacy, and prosperity. Learn more at AmericanCensorship.org!

Tags: politics

HP printing won't work with SNMP public community disabled

I think I just tripped across this bug, though on Ubuntu rather than Red Hat: disabling the public SNMP community on an HP printer means that CUPS will no longer print to it. The CUPS error_log shows this:

prnt/backend/hp.c 745: ERROR: open device failed stat=12: hp:/net/HP_Color_LaserJet_4700?ip=1.2.3.4

Re-enabling the public community got CUPS working again, but made Baby Tom Limoncelli cry.

Tags:

Mark Burgess' talk at LISA 11

I've been catching up on the talks at LISA last year, and one of them was Mark Burgess' talk "3 Myths and 3 Challenges to Bring System Administration out of the Dark Ages". (Anyone else reminded of "7 things about lawyers the occult can't explain?") If I'd been there, I'd've made this comment; as it is, I'll leave it here.

One of his points was that in this brave new world, we need to let go of serialism ("A follows B follows C, and that's Just The Way It Is(tm)"). That's the old way of thinking, he said, the Industrial way; we can do much more in parallel than we ever could in serial.

It occurs to me that it might be better to say that needless serialism can be let go of. Like a Makefile: the final executable depends on all the object files; without them, there's no sense trying to create it. But the object files typically depend on a file or two each (a .c and .h file, say), and there's no reason they can't be compiled in parallel ("make -j9"). Dependencies are there for a reason, and it is no bad thing to hold on to them.

(Kinda like the misquoting of Emerson. Often, you hear "Consistency is the hobgoblin of little minds." But the quote actually begins "A foolish consistency..." And now, having demonstrated my superiority by quoting Wikipedia, I will disappear up my own ass.)

Tags: lisa

I'll take failing hard drives for a thousand, Alex

"The night of the fight, you may feel a slight sting. That's pride
fucking with you. Fuck pride. Pride only hurts, it never helps."
-- Pulp Fiction

Last week my wife's laptop started acting up -- unaccountably slow at random intervals, programs crashing for no good reason, etc. Turned out the hard drive was failing, so one of my jobs today is to pick up a new one.

She mentioned later on that, when I told her, her first reaction was "Okay, he'll just grab another hard drive from his pile." Because I used to have a big pile of hard drives...and random memory, and motherboards, and power supplies, and the odd case or two. These things are pretty much mandatory if you're any kinda geek, and I was. Still am, but now one without the space for all this crap.

More importantly, without the time. A while back I broke a hinge on my laptop screen. I looked at the instructions for replacing it, and passed; I took it to a repair shop and paid them a stupid amount of money when I coulda bought it myself for $20 on eBay. It hurt, at first...but I couldn't stop thinking that, realistically, it was going to take me 6-8 hours to replace that hinge: figuring out how to take it apart, a couple of false starts, finding the tools, actually doing it, then reversing the process.

And 6-8 hours? is a LUXURY. Family, other hobbies...these are the fun things I do now. The commute I spend on other fun things. Hanging on to hard drives with uncertain histories and remaining lifespans? That I've left behind. Swapping out random components to see if a machine can limp along for another three months is no longer fun. I've learned enough about economics to see what's happening here: my time has become more valuable than money; I'll pay to have something Just Work.

It's been strange to let go of computers as The One Way to Have Fun. I think back to a coworker at my first tech job -- he must have been about as old then as I am now -- and how astounded I was when he told me he no longer played with computers at home. "That's never gonna happen to ME!" I thought. Heh.

Tags:

Exoplanet atmospheric research

It astounds me how much we can know about an exoplanet's atmosphere. Like:

  • It's got sodium, hydrogen, oxygen and silicon
  • It hasn't got water
  • Estimates of its pressure and temperature
  • Detection of its thermosphere (the ISS rides around in Earth's thermosphere; who knew?)
  • It has a relatively clear upper atmosphere
  • Its atmosphere might be being blown away by its host star
  • Constraints on the strength of its magnetic field

Many, many details at the Exoclimes blog; pretty picture and more user-friendly summary of one set of observations by the Spitzer Space Telescope here.

Tags: astronomy

New workstation

I've got a new workstation at $WORK. (Well, where else would it be?) It's pretty sweet: i7 quad-core processor, clock speed > 3GHz (honestly, I barely keep track anymore), and 8GB of RAM. 8GB! Insane.

When I arrived in 2008, I used a -- not cast-off, but unused P4 with 4 GB of RAM. I didn't want to make a big fuss about it; I saved the fuss, instead, for a nice business laptop from Dell that worked well with Linux. Since 90% of my work is Firefox + Emacs + XTerms, and my WM of choice at the moment is Awesome, speed was not a problem and the memory was fine.

Lately, though, I've discovered Vagrant. It looks pretty sweet, but my current machine is sloooow when I try to run a couple of VMs. (So's my laptop, despite a better processor; I suspect the 5400RPM drive.) I'm hoping that the new machine will make a big difference.

Just gotta install Ubuntu and move stuff over. Fortunately I've been pretty good about keeping my machine config in Cfengine, so that'll help. And then build some VMs. I'm always surprised at people who feel comfortable downloading random VM images from the Internet. Yeah, it's probably okay...but how do you know?

One thing that Vagrant is missing is integration with Cfengine. Fortunately, the documentation for extending it seems pretty good (plus, I can always kick things off with a shell script). This might be an excuse to learn Ruby.
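If I go the shell-script route, I imagine the provisioner looks something like this (the package name and paths are guesses for an Ubuntu box, and bootstrap.sh is hypothetical):

#!/bin/sh
# bootstrap.sh -- install Cfengine inside the VM, then run the policy
# that lives in Vagrant's shared folder
set -e
apt-get update
apt-get install -y cfengine3            # Ubuntu package name, I believe
cf-agent -KI -f /vagrant/promises.cf    # -K: ignore locks, -I: report changes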

Tags: virtualization hardware cfengine

It is *so* *weird*...

...to see the LinkedIn page for one of your rock heroes.

Tags:

SELinux and Apache - MySQL connections

A long-standing project at $WORK is to move the website to a new server. I'm also using it as a chance to get our website working under SELinux, rather than just automatically turning it off. There's already one site on this server, running Wordpress, and I decided to get serious about migrating the other website, which runs Drupal.

First time I fired up Drupal, I got this error:

avc:  denied  { name_connect } for  pid=30789 comm="httpd" dest=3306
scontext=system_u:system_r:httpd_t:s0
tcontext=system_u:object_r:mysqld_port_t:s0 tclass=tcp_socket

As documented here, the name_connect permission allows you to name sockets ("these are the mysql sockets, these are the SMTP sockets...") and set permissions that way. Okay, so now we know what prevented Drupal from working: SELinux denied httpd access to the mysqld TCP port.

What surprised me is that the Wordpress site did not seem to be encountering this error. The two relevant parts of the config files are:

  • Drupal:
$db_url = 'mysqli://user:password@127.0.0.1/database';

  • Wordpress:
define('DB_NAME', 'wp_db');
define('DB_USER', 'wp_db_user');
define('DB_PASSWORD', 'password');
define('DB_HOST', 'localhost');

Hm, the only difference is that localhost-vs-127.0.0.1 thing...

After some digging, it appears to be PHP's mysqli at work. From the documentation:

host: Can be either a host name or an IP address. Passing the NULL value or the string "localhost" to this parameter, the local host is assumed. When possible, pipes will be used instead of the TCP/IP protocol.

See the difference? Without looking up the code for mysqli, I think that an IP address -- even 127.0.0.1 -- makes mysqli just try TCP connections; using "localhost" makes it try a named pipe (on Linux, a Unix socket) first. Since TCP connections to the MySQL port aren't allowed by the default CentOS SELinux policy, the former fails.

Solution: make it "localhost" in both, and remember not to make assumptions.
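(If you genuinely need the TCP route, I believe the sanctioned fix is flipping the relevant boolean rather than turning SELinux off -- something like:

# -P makes the change persistent across reboots
setsebool -P httpd_can_network_connect_db 1

...though the socket route means one less policy exception to carry around.)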

Tags: selinux mysql

Yes, more steps are definitely needed.

Trying to take care of the HP RFU vulnerability. Miss the bit that says my printer doesn't have the ability to disable this built into the web interface. Decide I need to download HP Web Jetadmin. Forced to register for an "HP Passport Account". Fill in country of origin, among other details. Click to go back to download page, get "Sorry, we can't do that" message. Navigate back to download page. Fill in country of origin again. Fill in name of company. Download -- 300 MB. Go to download documentation; I see "installation instructions", "terms of use" and "post-sales support." What a crock.

-- Oh, and now I discover that it's going to install Microsoft SQL Server. Fucking hell. And that's not even including the rat's nest of menus.

Don't get me wrong: I can see how this would be immensely useful for a large number of printers. (And I strongly suspect that "large" means "greater than one".) But for one printer, it's an amazing overhead for such a small thing. Worse, I'm willing to bet that my whole task could be reduced to a single SNMP set command. But I'm too lazy to install Wireshark and figure out what that would be.

Tags: security rant

quota_check.pl

A while back I was looking for a script that would email warnings if a user was over their disk quota. Surprisingly, I couldn't find one, so I wrote one.

Here's quota_check.pl, for what it's worth: A Small but Useful(tm) utility to check quotas and send emails periodically. Depends on Perl's Quota module. Meant to be called from cron like so:

for i in /filesystem/* ; do [ -d "$i" ] && quota_check.pl -u "$(basename "$i")" -f "$(dirname "$i")" ; done

It will check whether the user is at or above the warning level (default: 80%) of their quota for inodes or blocks. If so, it will see if a warning has been sent recently (default: within 7 days) by looking for the file ".quotawarningsent" in their home directory. This file is maintained by the script, and holds the time (in seconds since the epoch) the last warning was sent. If it's time to send another warning, or if one was never sent in the first place, it'll send it and update this file.
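The guts of the throttling are about what you'd expect -- a sketch of the idea, not the actual script (send_warning is a stub, and the arguments are simplified):

#!/usr/bin/perl
use strict;
use warnings;

my ($user, $homedir) = @ARGV;
my $stampfile = "$homedir/.quotawarningsent";
my $interval  = 7 * 24 * 60 * 60;    # default: warn at most every 7 days

# The stamp file holds the epoch time of the last warning sent.
my $last = 0;
if (open my $fh, '<', $stampfile) {
    chomp($last = <$fh> // 0);
    close $fh;
}

if (time() - $last >= $interval) {
    send_warning($user);
    open my $out, '>', $stampfile or die "can't write $stampfile: $!";
    print $out time(), "\n";
    close $out;
}

sub send_warning {
    my ($who) = @_;
    print "would mail a quota warning to $who here\n";    # stub
}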

Released under the GPLv3. Share and enjoy!

Tags: software

CHARMM and "Relocation truncated" errors

When compiling CHARMM, I'll sometimes encounter errors like this:

charmm/lib/gnu/iniall.o: In function `stopch_':
iniall.f:(.text+0x1404): relocation truncated to fit: R_X86_64_PC32 against symbol `ldbia_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x14af): relocation truncated to fit: R_X86_64_PC32 against symbol `seldat_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x14d7): relocation truncated to fit: R_X86_64_32S against symbol `seldat_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x151b): relocation truncated to fit: R_X86_64_32S against symbol `seldat_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x1545): relocation truncated to fit: R_X86_64_PC32 against symbol `shakeq_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x1551): relocation truncated to fit: R_X86_64_PC32 against symbol `shakeq_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x1560): relocation truncated to fit: R_X86_64_PC32 against symbol `kspveci_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x156e): relocation truncated to fit: R_X86_64_PC32 against symbol `kspveci_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x16df): relocation truncated to fit: R_X86_64_PC32 against symbol `shpdat_' defined in COMMON section in charmm/lib/gnu/iniall.o
charmm/lib/gnu/iniall.o: In function `iniall_':
iniall.f:(.text+0x1cae): relocation truncated to fit: R_X86_64_PC32 against symbol `cluslo_' defined in COMMON section in charmm/lib/gnu/iniall.o
iniall.f:(.text+0x1cb8): additional relocation overflows omitted from the output
collect2: ld returned 1 exit status

The problem is that the linker is running out of room:

What this means is that the full 64-bit address of foovar, which now lives somewhere above 5 gigabytes, can't be represented within the 32-bit space allocated for it.

Another explanation:

The reason for this error is the size of data that you are using. This is seen to happen when your program needs more than 2GB of data. Well, who needs such big data at compile time? I do for one and there are other people in the HPC world who do that too. For all of them the life saver or may be a day saver is the compiler option -mcmodel.

This is a known problem with CHARMM's "huge" keyword.

There are a couple of solutions:
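The big one, per the quote above, is -mcmodel: compiling with -mcmodel=medium tells gcc/gfortran not to assume static data fits in 32-bit relocations. Something like this (the flag is straight from the gcc manual; wiring it into CHARMM's install script is left as an exercise):

# medium code model: COMMON blocks over 2GB get 64-bit addressing
gfortran -mcmodel=medium -c iniall.f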

This page, from the University of Alberta, also has excellent background information. (Oh, and also? They have a YouTube channel on using Linux clusters.)

Tags: toptip charmm cluster

Teaching

Yesterday I spent a couple hours helping one of the students. (I think he's a grad student now, so maybe it's not kosher to still call him a student.) He was trying to compile an add-on for CHARMM, and was running into lots of problems. Some were because of poor instructions from the add-on, and some were due to the (ahem) Byzantine build process that CHARMM uses. (In the words of Bryan Cantrill: "Oh, there are things I'm NOT saying! Believe me, I'm holding back!")

I ended up talking to him for close to two hours -- not just solving his problems, but trying to explain to him what I was doing and why. This happened to be compilation problems, so we covered object files, dynamic versus static libraries, compiling and linking -- but also patching and why you should script things whenever possible. Odds are he'll have more questions today when he comes across something else.

I'm happy to spend this time with him. He's interested (or at least doing a damned good impression of being interested), and I want him to understand what's going on. It'll serve him well to have the background information needed to (say) understand what a compiler error looks like and how to find out how to fix it. He's in bioinformatics (or rather, he's doing computer simulations of protein folding, so if he's not in bioinformatics precisely, he might as well be) and odds are, he'll need to do this sort of thing himself someday.

Not everyone's interested in this level of detail, of course. But even in my small department, there are usually one or two people each year that want to learn. Sometimes it's the obvious geeks; sometimes it's people I never would have guessed would care, or have such an aptitude. There are a few that I think would make good sysadmins, and I make a point of mentioning this to them.

This is one of the most enjoyable parts of my job. I love seeing the light dawn in people's eyes when they realize what Unix is capable of, or how packets flow from here to there, or how to write 50 lines in Perl that save them 500 in C++. I've had compliments on my teaching ability before, and my wife has suggested more than once that I consider doing it formally, so I'm cautiously optimistic that, while I'm probably ignorant of the huge amount of work it'd be, I wouldn't entirely suck.

I've got friends (Hi Andy! Hi Victor!) who teach at universities -- system administration, programming -- and at some point I'd like to follow in their footsteps. (And this is the point where my inner Limoncelli says, "Great! Now write down the steps needed so it'll get done!")

Inna meantime, it's informal: no marks, no tests, no structure...just, "Hey, lookit this. Isn't that neat?"

Tags:

You know you've got a good partner when...

She goes from expressing sympathy that your music player had its cover come off to nodding knowingly when you say it gives you a chance to resolder the headphone connections.

Tags:

Why ATI drivers did not eat my pony after all (this time)

Last year (hah! last year!), by which I mean Xmas 2011, just two weeks ago, I did all my updates and disruptive work in the week BEFORE Xmas, rather than after. It's one of the perks of working for a university that I get a week off between Xmas and the new year, and I decided to actually take advantage of it this year.

(I could make that paragraph better, but it's beyond me right now.)

(The other advantage, of course, is free honey from honeybee researchers.)

I allowed myself three days, but was actually done in two. That's considerably better than last year, and in large part that's because I learned a (not the) right way to uninstall/reinstall proprietary ATI drivers. Unlike last year, in fact, it was practically painless.

This might explain why it was not terribly surprising to come back to a problem related to the upgrade: a user telling me that PyMOL no longer worked. And sure enough, when I tried running it over SSH, it crashed:

$ pymol
/usr/bin/pymol: line 2: 17723 Segmentation fault      /usr/bin/python //usr/lib64/python2.6/site-packages/pymol/__init__.py "$@"

...which wasn't exactly what she was seeing, but I was pretty sure that it was only the tip of the iceberg.

A backtrace (caught by the wonderful catchsegv, which I only just found out about) showed that it was failing at XF86DRIQueryVersion, which turns up in approximately 38% of all web pages. They're all related to problems w/the proprietary ATI driver, how it eats ponies, and how everything was sunshine and lollipops once they rolled back to the Mesa-provided copy of libGL.

We are running the proprietary ATI driver -- we need the 3D performance -- so this made sense. And after last year's fiasco I was quite prepared to believe that ATI has a nasty appetite for BBQ. But much searching showed that before the Xmas upgrade, everyone'd been using the ATI-supplied libGL w/, presumably, no problems. I decided to prove it by reinstalling Mesa on a test machine. Yep, now it works fine. ATI hates the world!

...but I'd forgotten that I was running this over SSH. With X forwarding. And this made a difference. The Truth Hammer smacked me when I tried PyMOL on another workstation, from an actual sit-at-the-keyboard-and-use-the-coffee-holder X session, and it worked fine. I SSH'd to the original user's machine, and that worked fine.

I checked the version of libGL on my machine, and sure enough it was different: 7.7.1 versus 7.8.2. My suspicion is that either the XF86DRIQueryVersion routine has changed enough to cause problems, or there was some other difference (32-bit vs. 64-bit? could be...) between my machine and theirs (mine runs a different distro, so there are lots of chances for interesting differences).

I simply did not expect there to be any problem debugging X programs over SSH; probably naive, but what the hell. Now I know.

Oh, and the user's problems? Wack PYTHONHOME. Unset that and all is well.

Happy new year, everyone!

Tags: beer sysadmin