Zypper_problems

02 Sep 2009

title: zypper problems date: Wed Sep 2 10:21:53 PDT 2009 tags: opensuse, packagemanagement

I recently ran into problems with a home-made repo for OpenSuSE. (Weirdly enough, this seems to have cropped up after the repo was already in use.) When I tried to install a package from the repo, I got this error:

Retrieving repository 'foo' metadata [error]
Repository 'foo' is invalid.
File /var/tmp/TmpFile.0aLr5H doesn't contain public key data
Please check if the URIs defined for this repository are pointing to a
valid repository.
Warning: Disabling repository 'foo' because of the above error.

There wasn't much to find about this problem; even re-installing the key didn't help. Finally I thought to look in the webserver logs, where I found this:

[Wed Sep 02 09:59:59 2009] [error] [client 10.0.0.1] File
does not exist:
/var/www/repo/opensuse/11.1/x86_64/repodata/repomd.xml.key

That led to this article, and the solution was easy:

gpg -a --export "Repository Key" > /var/www/repo/opensuse/11.1/x86_64/repodata/repomd.xml.key

Sweet!

In other news: my tortilla filled with fried rice is falling apart. Film at 11.

Migratin_

21 Aug 2009

title: Migratin' date: Fri Aug 21 13:58:30 PDT 2009 tags: meta, lisa

Heyo...I've finally migrated to Chronicle and switched the website to ikiwki. Things should be working, aside from a few links I'll be cleaning up as time goes on...however, if you notice anything truly wrong please drop a line. (The comment system is no longer email-based, btw.)

And in the interest of keeping this on-topic...looks like work may be sending me to LISA! Here's hoping...

Emacs_tip_o_the_day

12 Aug 2009

title: Emacs tip o' the day date: Wed Aug 12 09:42:15 PDT 2009 tags: emacs

Thanks to Planet Emacs, I came across this tip about tramp: turns out there's a sudo method for tramp. C-c C-r will now re-open a file using sudo. Sweet!

Registration_for_lisa_09_now_open

07 Aug 2009

title: Registration for LISA '09 now open! date: Fri Aug 7 12:44:34 PDT 2009 tags: lisa

The details on LISA '09 are finally up, and it looks good. Let's hope I can convince $WORK to send me there...

Waste_2_of_n

06 Aug 2009

title: Waste (2 of n) date: Thu Aug 6 21:05:14 PDT 2009 tags: secondthoughts

Tuesday I wrote (and yesterday I posted):

It's a nice machine, if a bit large for my tastes and a bit cheap-looking. But for the price I can't complain.

which segues nicely into my recent feelings of guilt about my choice of profession. This was prompted by two things:

Tim O'Reilly's mention of Fake Steve Jobs post entitled "I'm really thinking maybe I shouldn't have yelled at that Chinese guy so much" which is a righteous, needed kick in the ass. You should read it.
Meeting with a vendor about a possible purchase of a bunch of blades for a small cluster. When I asked him about the possibility of upgrading in a couple years, he shook his head and waved his hand dismissively, saying, "In two years it's toxic waste anyway." This was from a vendor whose website trumpets the fact that their blade chassis still fits almost every blade they've ever made. But he's right.

Right now it seems to me that in a very important sense, the best thing you can say about what I do for a living is that, in geological time scales, it really doesn't matter.

I don't know what the answer is. I'm not entirely sure what the question is. But with the new laptop, and mine back (it had been doing double-duty for the two of us), this is the first chance I've had in a while to spend my commute writing. I missed it.

Lame_joke_of_the_day

06 Aug 2009

title: Lame joke of the day date: Thu Aug 6 12:15:34 PDT 2009

"You're using the time machine for backups?"

from "A Sysadmin At CERN"

Catchup_1_of_n

05 Aug 2009

title: Catchup (1 of n) date: Wed Aug 5 20:34:13 PDT 2009 tags: ubuntu, dell

My wife used to use a Mac G4 iBook; we bought it about five years ago. It's been through two hard drive replacements, one it-might-catch-fire-so-it's-free battery replacement, one it-might-catch-fire-so-suck-it power adapter upgrade, and one OS-and-app upgrade. (This is the first time I've paid for an OS where it hurt. We lost the install disk for Panther or Mud Leopard or whatever it was, and had to buy a replacement plus a copy of iLife.) (I've also bought Slackware 96, as part of the Slackware Bible; Slackware 7, when I was amazed to see it at Chapters and figured I should support their sudden smart thinking; and OpenBSD.)

Finally, the hard drive (I think) started failing again, and we'd had enough. I'm not sure what The Right Thing To Do(tm) is for figuring out when to replace vs. when to invest in upgrades, but I'm starting to think that half its replacement cost is about right. And that's what we were up to, not least because it's a damn Mac and if you were meant to open up the case your name would be Cthulhu Morlock instead of John Doe Eloi and it turns out the Morlocks charge a lot (deservedly so, what with avoiding the sunlight and all) to do things like spend twelve hours with a team of four opening up your iBook to replace a hard drive.

So we got a Dell. My parents visited recently with their new Inspiron 13, and I installed Ubuntu on it and was surprised that a) everything worked except maybe suspend and b) holy CRAP it's easy to install Ubuntu beside Windows these days. The form factor was nice, the specs are wonderful (thank you, Vista, for making notebook specs so nice, as someone else said), so that was that.

I settled on an Inspiron 14 as it was slightly cheaper and almost the same size and seemed like it would do the trick. I came home early from work yesterday, picked it up and while Clara took the kids out shopping I wiped Vista and threw on Ubuntu (Jaunty). The hardest part was when I insisted on setting up partitions (I can see the reasons for One Bigass Partition but I'll be damned if I'll like it); that GUI is just awkward. But it was only once and it all worked.

After that, I installed Cheese for the webcam, Thunderbird for email (damn Evolution! damn it to hell, I say!), flash, set up an account for myself, ran updates and...that was it. Even suspend seems to work. Hell, at this point I can't even remember who made the wireless card; it was probably Broadcom but I didn't notice any restricted driver warning so maybe not.

It's a nice machine, if a bit large for my tastes and a bit cheap-looking. But for the price I can't complain.

In honour of the day...

31 Jul 2009

I am a) going for beer and b) actually blogging. Yay me!

Two x \"aha!\" re: Bacula

21 Jul 2009

First, it occurred to me today that the problems I've been having with bacula-sd dying or becoming unresponsive may be because of the way Nagios has monitored it. I've been using the check_tcp plugin, and when I looked on the backup machine there were, at one point, 21 connections to the sd port. Half were from the monitoring machine and were in the CLOSE_WAIT state. The max concurrent jobs for -sd is set to 20. I've turned off Nagios monitoring for now; we'll see how that does.

Second -- edit: sorry, stupid error. I withdraw the point.

Mailman_nameerror_global_name_dumperswitchboard_is_not_defined

20 Jul 2009

title: Mailman: NameError: global name 'DumperSwitchboard' is not defined date: Mon Jul 20 14:15:12 PDT 2009

I came across a problem today trying to recover a subscribers list from an old version of mailman, using a new-ish (2.1.9) version. I dug up the config.db file for the list, then ran dumpdb on it:

$ /usr/lib/mailman/bin/dumpdb -n config.db
Traceback (most recent call last):
  File "/usr/lib/mailman/bin/dumpdb", line 159, in ?
```
msg = main()
```

  File "/usr/lib/mailman/bin/dumpdb", line 126, in main
```
d = DumperSwitchboard().read(filename)
```

NameError: global name 'DumperSwitchboard' is not defined

After a bit of digging, I found this mailing list post that gave the solution:

--- bin/dumpdb  2007-06-18 08:35:57.000000000 -0700
+++ bin/dumpdb  2007-08-02 17:45:42.187500000 -0700
@@ -49,6 +49,7 @@
 import sys
 import getopt
 import pprint
+import marshal
 from cPickle import load
 from types import StringType

@@ -121,9 +122,7 @@
```
 # Handle dbs
 pp = pprint.PrettyPrinter(indent=4)
 if filetype == 1:
```

-        # BAW: this probably doesn't work if there are mixed types of .db
-        # files (i.e. some marshals, some bdbs).
-        d = DumperSwitchboard().read(filename)
+        d = marshal.load(open(filename))
```
     if doprint:
         pp.pprint(d)
     return d

```

I copied dumpdb to my home directory, patched it, then ran it like so:

PYTHONPATH=/usr/lib/mailman/bin/ ./dumpdb config.db

Bingo!

Cacti debugging

16 Jul 2009

The saga of the UPS continues. Yesterday I got the SNMP card set up and working. I also found this Cacti template, which promised lots of pretty graphs. But there were a few bumps along the way.

First, Cacti was convinced that the UPS was down. Actually, it took me a while to figure this out because the logs didn't say anything abou this host at all. Eventually I tracked it down to Cacti using SNMP queries to see if it was up; turns out that this machine doesn't like being queried at the OID 0.1, and just doesn't respond. Changing the upness-detecting algorithm (heh) to TCP ping did the trick nicely.

Next, the graphs for the UPS were still not being produced, even though the RRDs were now being updated. I got the debug info for a graph and ran the rrdtool command by hand. The RRD does not contain an RRA matching the chosen CF was the response.

This thread showed a lot of people having the same problem. Since some of these problems were fixed by an upgrade, I did so; there were a few CentOS updates waiting for that machine anyhow. That made it worse: no graphs were being shown now. rrdtool said that there were no fonts present, so maybe fontconfig was out of order. Installing dejavu-lgc-fonts did the trick nicely, and I got my graphs back.

Well, all except the UPS ones I was after in the first place. I was still getting the error about not containing the chosen CF. Well, when all else fails keep reading the forum, right?

The rrdtool command used the LAST function; this was the culprit. If I ran s/LAST/AVERAGE/g on the command, it worked a treat. Thus, one option would have been to edit the template. However, I decided on an alternate approach, suggested in the forum: I removed the UPS RRDs, went to Data Sources -> RRAs in the Console menu, selected each RRA in turn and added LAST to the consolidation function.

Finally! Whee! Except for one: the graph of voltage vs input frequency. I still don't know what this means to me, but I wasn't about to give up now.

Again, rrdtool provided the error: "For a logarithmic yaxis you must specify a lower limit > 0". Bug reports to the rescue: Console -> Graph Templates -> Voltage/Freq, and set Lower Limit to 0.1.

All that and I'm still the only one looking at these graphs. Man, I should frame them.

Zombie bacula-sd and open port

06 Jul 2009

Weird...Just ran into a problem with restarting bacula-sd. For some reason, the previous instance had died badly and left a zombie process. I restarted bacula-sd but was left with an open port:

# sudo netstat -tupan | grep 9103
tcp        0      0 0.0.0.0:9103                0.0.0.0:* LISTEN      -

which meant that bconsole hung every time it tried to get the status of bacula-sd. Unsure what to do, I tried telnetting to it for fun and then quit; after that the port was freed up and grabbed by the already-running storage daemon:

tcp        0      0 0.0.0.0:9103                0.0.0.0:* LISTEN      16254/bacula-sd

and bconsole was able to see it just fine:

Connecting to Storage daemon tape at bacula.example.com:9103

example-sd Version: 3.0.1 (30 April 2009) x86_64-example-linux-gnu example
Daemon started 06-Jul-09 10:18, 0 Jobs run since started.
 Heap: heap=180,224 smbytes=25,009 max_bytes=122,270 bufs=94 max_bufs=96
Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8

GPT and MBR

03 Jul 2009

I've run into an interesting problem with the new backup machine.

It's a Sun X4240 with 10 x 15k disks in it: 2 x 73GB (mirrored for the OS) and 8 x, um, a bunch (250GB?), RAID0 for Bacula spooling. (I want fast disk access, so RAID0 it is.) RAID is taken care of by an onboard RAID card, so these look like regular disks to Linux.

Now the spool disk works out to about 2.2TB or so — which is big enough to make baby fdisk cry:

WARNING: The size of this disk is 2.4 TB (2391994793984 bytes).
DOS partition table format can not be used on drives for volumes
larger than 2.2 TB (2199023255040 bytes). Use parted(1) and GUID
partition table format (GPT).

Well, okay, haven't used parted before but that's no reason to hold back. I follow directions and eventually figure out that mkpart gpt ext3 0 2392G will do what I want. GPT? Piece of cake! And then I rebooted, and I couldn't boot up again. Blank screen after the POST. Crap!

The first time this happened, the reboot also coincided with some additional problems during the POST where too many cards were trying to shove their ROM into the BIOS memory (or some such); I thought the two were connected. But then I did it again today, and I finally started digging.

The problem is that parted overwrites the MBR when setting up a GPT disklabel. This has been noted and argued over. My understanding of the two sides of the debate is:

the MBR is not part of the EFI standard, so it's entirely rational that it should be erased;
but very few x86 machines are EFI-only;
and traditional disklabels don't support partitions over 2TB, so what's a brother gonna do?;
and an MBR-GPT hybrid seems a nice way out of this.

Meanwhile, the parted camp has a number of bugs dealing with this very issue, two opened a year ago, and none have any response in them.

This enterprising soul submitted a patch back in December 2008, which appears to have fallen to the floor.

As for me, I was able to convince the BIOS to boot from the smaller disk, and then get a rescue CentOS image going via PXE booting, and then reinstall grub on the smaller disk. Sorted. All I had to do was change root (hd1,0) to `root (hd0,0) in grub.conf.

A touch anti-climactic after all that, perhaps. But it was interesting a) to learn about all this (I hadn't really thought about successors to the DOS partition format before), and b) to see what a slender thread we (okay, I) hang our hopes on sometimes. It's a necessary, sobering thing to realize how much of what I use, depend on, believe in is created by volunteers who are smart, hard-working people — they argue and and focus and forget just like real people, not inhabitants of some shining city on a hill I sometimes take them for ("Next beer in Jerusalem!").

Bacula, gossip, advice

02 Jul 2009

Bacula config coming along; figured out today that /dev/nst0 corresponds to what mtx sees as Data Transfer Element 1 (as opposed to DTE 0), which explains why previous attempts to run label barcode just failed miserably. (Neat command that.) And I had thought that DTE meant the arm, but no: upon reflection, it's a subtle/obtuse (not the right word, but oh well) way of referring to the tape drive itself.
Rather interesting comment, if you like that sort of thing, from Mark Burgess (originator of Cfengine on Puppet and Luke Kanies. I know, I should remain above, but it is weirdly fascinating.
And to go out on a high note, some excellent advice from Tom Limoncelli on setting priorities as a sysadmin:

This sounds like when I was at my previous employer and they asked if I could develop a web-based system to take surveys. I nearly said, "yes" because, well, I know perl, I know CGI, and I could do it. However, I was smart enough to say "no, but surveymonkey.com will do it for cheap." Best of all it was self-service and the HR person was able to do it entirely without me. If I had said I could write such a program, it would have been days of back-and-forth changes which would have driven me crazy. Instead, she was happy to be empowered to do it herself. In fact, doing it herself without any help became a feather in her cap.

The lesson I learned is that "can I do it?" includes "do I want to do it?". If I can do something but don't want to, the answer is, "No, I don't know how" not "I know how but don't want to". The first makes you look like you know your limits. The latter sounds like you are just being difficult.

1246317421 seconds since the epoch...

29 Jun 2009

I'm back at work after a week off. The UPS control panel continues to work (!), but there is no word back from the manufacturer (says the contractor who installed the thing and filed the ticket). I find this troubling; either the manufacturer really hasn't got back to us yet (bad), or I should have insisted on being a contact for the ticket. I'll have tos ort this out tomorrow.

Spent much of my day tearing my hair out over mod_proxy_html. Turns out that, by default, it strips the DTD from the HTML it proxies; this is a problem for one app that we're proxying. Not only that, the DTDs it does support are HTML, XHTML, and either with a "Transitional"/Legacy flag — but no URI to a DTD, like the one pointing to the Loose DTD that our app uses and the damned thing threw to the floor. (Sorry, brain cells on strike today and my ability to write clearly is going downhill.)

You can specify your own DTD, including a URI (undocumented feature, whee!), and thus put back in the original — but it doesn't append a newline, there's no way to append a newline that I could figure out, and so it mushes the DTD together with the first html opening tag and makes baby Firefox cry and render the page badly.

My rule of thumb for a long time was that if I start lppooking at source code, I'm in over my head. I'm starting to think that may not be entirely true anymore, that I've advanced to the point where I can read C (say) and generally understand what's going on. But when I start looking for API documentation for Apache 2.2 (surprisingly hard to find) to find out if, say, ap_fputs or apr_pstrdup chomp newlines or something (near as I can tell, they don't), or just what AP_INIT_TAKE12 takes as arguments…well, then I am in over my head. If nothing else, I don't want to make some silly error because I don't know what the hell I'm doing. (That's not a slam against the Debian folks; I just mean that I felt shivers when I read about that, because I dread making the same sort of highly-visible, catastrophic error) (unlike the rest of the planet, you understand).

Busyness

18 Jun 2009

Full day:

Prepare new network map
Take stand-in techie around server room and explain new network setup
Check UPS; still not crashed
New Sun 4240 server unable to get past POST after hooking up fibre cable yesterday to SL-500 library. Try various things, no luck. Fortunately installers coming back next week to finish the job.
Over to server room w/boss to take pictures for website
Get programmer familiar w/the server she'll be using, how to set up services, etc. Arguably my job, but a) she'll want to learn and b) I'm off on vacation next week.
Unless of course the UPS folks need to schedule downtime to make it work. But then I'll just use it as an excuse to show my dad and kids around the server room.
Gotta pick out an IPA recipe to brew with my dad. Leaning toward the Cream IPA from Radical Brewing. May need to get a cooler to use as a lauter tun, since I think it's around 13 pounds of grain — more than I can comfortably do in my paint bag strainer setup.
Still got out to walk around at lunch time, which was nice; I have a bad habit of skipping that.

Now \that's\ irritating...

16 Jun 2009

Just discovered, while trying to test the mail server at $WORK, that my ISP filters outgoing port 25. I'd give them a call but I can't dig up my account info at the moment.

Once more, with feeling:

15 Jun 2009

Dress rehearsal includes checking to see if you can, in fact, unrack something. I was uanble to move a switch this morning because it was stuck behind a PDU. Arghh.

The saga of our crashing UPS continues. The techs came out to visit this morning, which meant I needed to schedule downtime so they could bypass the UPS manually. They were unable to find any smoking gun (or capacitors), and need to confer with HQ again. Best case: the UPS control panel continues to work, and they can do the next round of work w/o a manual bypass. Worst case: the control panel crashes again, and we schedule another round of downtime.

Rack tip

12 Jun 2009

If you have space for two PDUs and you put one on each side of the rack, you will have no separate space for network cables and you'll get interference. If you put those two PDUs on one side of the rack, you'll put it on the wrong side and your power cords will interfere with your network cables. If you put those two PDUs on the correct side of the rack, you'll find that racking new items is a pain because the cords block the post holes on that side.

Tour, FC

11 Jun 2009

Gave a tour of the new server room today to about 30-odd people in the department. Ended on a bit of a low note ("and that's the end! Any questions?") but other than that it went well. Even got an ounce of champagne at the end of it.

Oh, and yesterday I found out that our SL-500 has three fibre channel interfaces, compared to the one interface in the server we bought. I think the sales folks assumed we had a fibre switch, and I didn't realize it all (data + control) wouldn't go over one cable. Arghh.

Just saw a character named Terence on "Entourage" who was not Terrance Stamp. Now I want to see "Bowfinger" and "The Limey", in that order.