07 Sep 2009
Just spent the better part of five hours cleaning up four old,
out-of-date Wordpress installations after they got infected with this
worm. I host nine sites on my home server for friends and
family; I'm cutting that down to three (just family), and maybe
looking at mu-wordpress, as of Real Soon Now.
Happy Labour Day, everyone!
Update: I meant to add in here a few things I looked for, because
this info was hard to track down.
I found extra admin-level users in the wp_users table; some had
their email address set to "www@www.com", some had random made-up or
possibly real addresses, and some had the same email address as
already-existing users.
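If you need to hunt for these yourself, something along these lines (assuming the default wp_ table prefix) lists every account with administrator capabilities, which you can then compare against the accounts you actually created:
SELECT u.ID, u.user_login, u.user_email
FROM wp_users u
JOIN wp_usermeta m ON m.user_id = u.ID
WHERE m.meta_key = 'wp_capabilities'
AND m.meta_value LIKE '%administrator%';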
On one blog (possibly infected much earlier) I found 42,000 (!!)
approved, spammy comments.
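A quick count of approved comments makes that kind of infestation obvious; in the standard schema, approved comments have comment_approved set to '1':
SELECT COUNT(*) FROM wp_comments WHERE comment_approved = '1';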
I searched for infected posts using a query from here:
SELECT * FROM wp_posts WHERE post_content LIKE '%iframe%'
UNION
SELECT * FROM wp_posts WHERE post_content LIKE '%noscript%'
UNION
SELECT * FROM wp_posts WHERE post_content LIKE '%display:%'
Tags:
bug
crackers
spam
04 Sep 2009
title: mmm_mysql
date: Fri Sep 4 15:09:07 PDT 2009
tags: bugs, mysql
I've spent many hours today at $WORK banging my head against the
keyboard, trying to figure out why MMM-MySQL didn't work. MMM is
meant to switch write roles, or master-slave roles, among different
database servers for failover and such.
While the task as a whole is complex, the steps are simple enough: the
monitor daemon accepts commands from a client, then forwards those
commands to agents on the different MySQL servers. At its heart it's
a bunch of Perl scripts that do the things this task entails:
switching IP addresses, sending arp packets, toggling write-only
status on the databases, and so on.
The problem came when, for example, the monitor would tell everyone to
change their IP addresses and report success -- only I could see that
it wasn't working. Or the agent would run the command to make the
database writable and report success, yet I could see that it wasn't
working.
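For the record, "I could see it wasn't working" just means checking by hand on the machines themselves, roughly like this (the hostname and interface name here are made up):
mysql -h db1.example.com -u root -p -e 'SELECT @@global.read_only'
ip addr show dev eth0
and finding read_only still on, or the virtual IP missing.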
There were two factors at work here.
In the latter example, the agent would run the command
bin/mysql_allow_write. Here's the relevant bit of code, edited for clarity:
# Read config file and status
our $config = ReadConfig("mmm_agent.conf");
print MySqlAllowWrite();
exit(0);

sub MySqlAllowWrite($) {
    [snip]
    # connect to server
    my $dsn = "DBI:mysql:host=$host;port=$port";
    my $dbh = DBI->connect($dsn, $user, $pass, { PrintError => 0 });
    return "ERROR: Can't connect to MySQL (host = $host:$port, user = $user)!" unless ($dbh);

    # set read_only to OFF
    (my $read_only) = $dbh->selectrow_array(q{select @@read_only});
    return "ERROR: SQL Query Error: " . $dbh->errstr unless (defined $read_only);
    return "OK" unless ($read_only);

    my $sth = $dbh->prepare("set global read_only=0");
    my $res = $sth->execute;
    return "ERROR: SQL Query Error: " . $dbh->errstr unless ($res);
    $sth->finish;

    $dbh->disconnect();
    $dbh = undef;
    return "OK";
}
The subroutine is reporting errors but nothing watches for them. The
code that calls the script itself just uses backticks and does no checking:
sub ExecuteBin {
    my $command = shift;
    my $params = shift;
    my $return_all = shift;

    my $path = "$config->{bin_path}/$command";
    return undef unless (-x $path);

    LogDebug("Core: Execute_bin('$path $params')");
    my $res = `$path $params`;
    unless ($return_all) {
        my @lines = split /\n/, $res;
        return pop(@lines);
    }
    return $res;
}
The code to change IP address is much the same:
sub AddInterfaceIP($$) {
    my $if = shift;
    my $ip = shift;

    if ($^O eq 'linux') {
        `/sbin/ip addr add $ip/32 dev $if`;
    } elsif ($^O eq 'solaris') {
        `/usr/sbin/ifconfig $if addif $ip`;
        my $logical_if = FindSolarisIF($ip);
        unless ($logical_if) {
            print "ERROR: Can't find logical interface with IP = $ip\n";
            exit(1);
        }
        `/usr/sbin/ifconfig $logical_if up`;
    } else {
        print "ERROR: Unsupported platform!\n";
        exit(1);
    }
}
Needless to say I'll be filing bug reports.
The other factor that was going on was my ignorance about the tools I
was using. I couldn't figure out why the ip addr add and ip addr del
commands weren't working. The agent would report success adding
addresses, yet ifconfig didn't show them. What I didn't realize was
that ip can manipulate addresses that ifconfig doesn't seem to see.
With ifconfig, you add an additional address to an interface like so:
ifconfig eth0:0 10.0.0.2
and you see a new device called eth0:0. But with ip, you do that
like so:
ip addr add 10.0.0.2/32 dev eth0
and you don't see additional devices, and ifconfig doesn't see the
additional address. I wasn't thinking hard enough about what I meant
by "I can see that it doesn't work" -- something I'm all too prone to
take other people to task for (or at least act smugly about).
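In case anyone else lands here with the same confusion: the way to actually see (and remove) those extra addresses is with ip itself, for example:
ip addr show dev eth0
ip addr del 10.0.0.2/32 dev eth0
The first lists every address on the interface, including the ones ifconfig won't show; the second removes one explicitly.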
Ah well...the good news is that I learned something. The other good
news is that, since at least a couple of these errors are in the
latest versions of mmm_control, I should be able to spend some time at
work improving them. Hasta la source, baby! (Or something like that...)
Tags:
02 Sep 2009
title: zypper problems
date: Wed Sep 2 10:21:53 PDT 2009
tags: opensuse, packagemanagement
I recently ran into problems with a home-made repo for OpenSuSE.
(Weirdly enough, this seems to have cropped up after the repo was
already in use.) When I tried to install a package from the repo, I
got this error:
Retrieving repository 'foo' metadata [error]
Repository 'foo' is invalid.
File /var/tmp/TmpFile.0aLr5H doesn't contain public key data
Please check if the URIs defined for this repository are pointing to a
valid repository.
Warning: Disabling repository 'foo' because of the above error.
There wasn't much to find about this problem; even re-installing
the key didn't help. Finally I thought to look in the webserver logs,
where I found this:
[Wed Sep 02 09:59:59 2009] [error] [client 10.0.0.1] File
does not exist:
/var/www/repo/opensuse/11.1/x86_64/repodata/repomd.xml.key
That led to this article, and the solution was easy:
gpg -a --export "Repository Key" > /var/www/repo/opensuse/11.1/x86_64/repodata/repomd.xml.key
Sweet!
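Since zypper had disabled the repo when it hit the error, it also needs re-enabling and a refresh afterwards; something like this, using the repo alias from the error above, should do it:
zypper modifyrepo --enable foo
zypper refresh foo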
In other news: my tortilla filled with fried rice is falling apart.
Film at 11.
Tags:
21 Aug 2009
title: Migratin'
date: Fri Aug 21 13:58:30 PDT 2009
tags: meta, lisa
Heyo...I've finally migrated to Chronicle and switched the
website to ikiwiki. Things should be working, aside from a few
links I'll be cleaning up as time goes on...however, if you notice
anything truly wrong please drop a line. (The comment system is no
longer email-based, btw.)
And in the interest of keeping this on-topic...looks like work may be
sending me to LISA! Here's hoping...
Tags:
12 Aug 2009
title: Emacs tip o' the day
date: Wed Aug 12 09:42:15 PDT 2009
tags: emacs
Thanks to Planet Emacs, I came across this tip about tramp: turns out
there's a sudo method for tramp. C-c C-r will now re-open a file
using sudo. Sweet!
Tags:
07 Aug 2009
title: Registration for LISA '09 now open!
date: Fri Aug 7 12:44:34 PDT 2009
tags: lisa
The details on LISA '09 are finally up, and it looks good. Let's
hope I can convince $WORK to send me there...
Tags:
06 Aug 2009
title: Waste (2 of n)
date: Thu Aug 6 21:05:14 PDT 2009
tags: secondthoughts
Tuesday I wrote (and yesterday I posted):
It's a nice machine, if a bit large for my tastes and a bit
cheap-looking. But for the price I can't complain.
which segues nicely into my recent feelings of guilt about my
choice of profession. This was prompted by two things:
- Tim O'Reilly's mention of the Fake Steve Jobs post entitled "I'm really thinking maybe I shouldn't have yelled at that Chinese guy so much", which is a righteous, needed kick in the ass. You should read it.
- Meeting with a vendor about a possible purchase of a bunch of blades for a small cluster. When I asked him about the possibility of upgrading in a couple years, he shook his head and waved his hand dismissively, saying, "In two years it's toxic waste anyway." This was from a vendor whose website trumpets the fact that their blade chassis still fits almost every blade they've ever made. But he's right.
Right now it seems to me that in a very important sense, the best
thing you can say about what I do for a living is that, in geological
time scales, it really doesn't matter.
I don't know what the answer is. I'm not entirely sure what the
question is. But with the new laptop, and mine back (it had been
doing double-duty for the two of us), this is the first chance I've
had in a while to spend my commute writing. I missed it.
Tags:
06 Aug 2009
title: Lame joke of the day
date: Thu Aug 6 12:15:34 PDT 2009
"You're using the time machine for backups?"
from "A Sysadmin At CERN"
Tags:
05 Aug 2009
title: Catchup (1 of n)
date: Wed Aug 5 20:34:13 PDT 2009
tags: ubuntu, dell
My wife used to use a Mac G4 iBook; we bought it about five years ago.
It's been through two hard drive replacements, one
it-might-catch-fire-so-it's-free battery replacement, one
it-might-catch-fire-so-suck-it power adapter upgrade, and one
OS-and-app upgrade. (This is the first time I've paid for an OS where
it hurt. We lost the install disk for Panther or Mud Leopard or
whatever it was, and had to buy a replacement plus a copy of iLife.)
(I've also bought Slackware 96, as part of the Slackware Bible;
Slackware 7, when I was amazed to see it at Chapters and figured I
should support their sudden smart thinking; and OpenBSD.)
Finally, the hard drive (I think) started failing again, and we'd had
enough. I'm not sure what The Right Thing To Do(tm) is for figuring
out when to replace vs. when to invest in upgrades, but I'm starting
to think that half its replacement cost is about right. And that's
what we were up to, not least because it's a damn Mac and if you were
meant to open up the case your name would be Cthulhu Morlock instead
of John Doe Eloi and it turns out the Morlocks charge a lot
(deservedly so, what with avoiding the sunlight and all) to do things
like spend twelve hours with a team of four opening up your iBook to
replace a hard drive.
So we got a Dell. My parents visited recently with their new Inspiron
13, and I installed Ubuntu on it and was surprised that a) everything
worked except maybe suspend and b) holy CRAP it's easy to install
Ubuntu beside Windows these days. The form factor was nice, the specs
are wonderful (thank you, Vista, for making notebook specs so nice, as
someone else said), so that was that.
I settled on an Inspiron 14 as it was slightly cheaper and almost the
same size and seemed like it would do the trick. I came home early
from work yesterday, picked it up and while Clara took the kids out
shopping I wiped Vista and threw on Ubuntu (Jaunty). The hardest part
was when I insisted on setting up partitions (I can see the reasons
for One Bigass Partition but I'll be damned if I'll like it); that
GUI is just awkward. But it was only once and it all worked.
After that, I installed Cheese for the webcam, Thunderbird for email
(damn Evolution! damn it to hell, I say!), flash, set up an account
for myself, ran updates and...that was it. Even suspend seems to
work. Hell, at this point I can't even remember who made the wireless
card; it was probably Broadcom but I didn't notice any restricted
driver warning so maybe not.
It's a nice machine, if a bit large for my tastes and a bit
cheap-looking. But for the price I can't complain.
Tags:
31 Jul 2009
I am a) going for beer and b) actually blogging. Yay me!
Tags:
meta
21 Jul 2009
First, it occurred to me today that the problems I've been having with bacula-sd dying or becoming unresponsive may be because of the way Nagios has monitored it. I've been using the check_tcp plugin, and when I looked on the backup machine there were, at one point, 21 connections to the sd port. Half were from the monitoring machine and were in the CLOSE_WAIT state. The max concurrent jobs for -sd is set to 20. I've turned off Nagios monitoring for now; we'll see how that does.
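For anyone wanting to check for the same thing, a rough count of leftover monitoring connections looks something like this (9103 being the storage daemon's port here):
netstat -tan | grep :9103 | grep CLOSE_WAIT | wc -l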
Second -- edit: sorry, stupid error. I withdraw the point.
Tags:
backups
20 Jul 2009
title: Mailman: NameError: global name 'DumperSwitchboard' is not defined
date: Mon Jul 20 14:15:12 PDT 2009
I came across a problem today trying to recover a subscribers list from an old version of mailman, using a new-ish (2.1.9) version. I dug up the config.db file for the list, then ran dumpdb on it:
$ /usr/lib/mailman/bin/dumpdb -n config.db
Traceback (most recent call last):
  File "/usr/lib/mailman/bin/dumpdb", line 159, in ?
    msg = main()
  File "/usr/lib/mailman/bin/dumpdb", line 126, in main
    d = DumperSwitchboard().read(filename)
NameError: global name 'DumperSwitchboard' is not defined
After a bit of digging, I found this mailing list post that gave the solution:
--- bin/dumpdb 2007-06-18 08:35:57.000000000 -0700
+++ bin/dumpdb 2007-08-02 17:45:42.187500000 -0700
@@ -49,6 +49,7 @@
 import sys
 import getopt
 import pprint
+import marshal
 from cPickle import load
 from types import StringType
@@ -121,9 +122,7 @@
     # Handle dbs
     pp = pprint.PrettyPrinter(indent=4)
     if filetype == 1:
-        # BAW: this probably doesn't work if there are mixed types of .db
-        # files (i.e. some marshals, some bdbs).
-        d = DumperSwitchboard().read(filename)
+        d = marshal.load(open(filename))
         if doprint:
             pp.pprint(d)
         return d
I copied dumpdb to my home directory, patched it, then ran it like so:
PYTHONPATH=/usr/lib/mailman/bin/ ./dumpdb config.db
Bingo!
Tags:
16 Jul 2009
The saga of the UPS continues. Yesterday I got the SNMP card set up
and working. I also found this Cacti template, which promised
lots of pretty graphs. But there were a few bumps along the way.
First, Cacti was convinced that the UPS was down. Actually, it took me
a while to figure this out because the logs didn't say anything abou
this host at all. Eventually I tracked it down to Cacti using SNMP
queries to see if it was up; turns out that this machine doesn't like
being queried at the OID 0.1, and just doesn't respond. Changing the
upness-detecting algorithm (heh) to TCP ping did the trick
nicely.
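If you want to sanity-check that a device answers plain SNMP at all, as opposed to whatever probe Cacti uses for up/down detection, a one-off query like this does it (community string and hostname made up; the numeric OID is sysUpTime):
snmpget -v1 -c public ups.example.com .1.3.6.1.2.1.1.3.0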
Next, the graphs for the UPS were still not being produced, even
though the RRDs were now being updated. I got the debug info for a
graph and ran the rrdtool command by hand. "The RRD does not contain
an RRA matching the chosen CF" was the response.
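A handy way to see which consolidation functions an RRD actually contains, before fighting with the graphs, is rrdtool info (filename made up here):
rrdtool info ups_voltage.rrd | grep '\.cf'
Any CF a graph definition asks for has to appear in that list.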
This thread showed a lot of people having the same problem. Since
some of these problems were fixed by an upgrade, I did so; there were
a few CentOS updates waiting for that machine anyhow. That made it
worse: no graphs were being shown now. rrdtool said that there were
no fonts present, so maybe fontconfig was out of order. Installing
dejavu-lgc-fonts did the trick nicely, and I got my graphs back.
Well, all except the UPS ones I was after in the first place. I was
still getting the error about not containing the chosen CF. Well, when
all else fails keep reading the forum, right?
The rrdtool command used the LAST function; this was the culprit. If
I ran s/LAST/AVERAGE/g on the command, it worked a treat. Thus, one
option would have been to edit the template. However, I decided on an
alternate approach, suggested in the forum: I removed the UPS RRDs,
went to Data Sources -> RRAs in the Console menu, selected each RRA
in turn and added LAST to the consolidation function.
Finally! Whee! Except for one: the graph of voltage vs input
frequency. I still don't know what this means to me, but I wasn't
about to give up now.
Again, rrdtool provided the error: "For a logarithmic yaxis you must
specify a lower limit > 0". Bug reports to the rescue: Console
-> Graph Templates -> Voltage/Freq, and set Lower Limit to 0.1.
All that and I'm still the only one looking at these graphs. Man, I
should frame them.
Tags:
web
monitoring
06 Jul 2009
Weird...Just ran into a problem with restarting bacula-sd. For some reason, the previous instance had died badly and left a zombie process. I restarted bacula-sd but was left with an open port:
# sudo netstat -tupan | grep 9103
tcp 0 0 0.0.0.0:9103 0.0.0.0:* LISTEN -
which meant that bconsole hung every time it tried to get the status of bacula-sd. Unsure what to do, I tried telnetting to it for fun and then quit; after that the port was freed up and grabbed by the already-running storage daemon:
tcp 0 0 0.0.0.0:9103 0.0.0.0:* LISTEN 16254/bacula-sd
and bconsole was able to see it just fine:
Connecting to Storage daemon tape at bacula.example.com:9103
example-sd Version: 3.0.1 (30 April 2009) x86_64-example-linux-gnu example
Daemon started 06-Jul-09 10:18, 0 Jobs run since started.
Heap: heap=180,224 smbytes=25,009 max_bytes=122,270 bufs=94 max_bufs=96
Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8
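In retrospect, lsof would probably have told me who, if anyone, was actually holding the port, given that netstat was showing a dash where the owning PID should have been:
lsof -i TCP:9103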
Tags:
networking
backups
03 Jul 2009
I've run into an interesting problem with the new backup machine.
It's a Sun X4240 with 10 x 15k disks in it: 2 x 73GB (mirrored for the
OS) and 8 x, um, a bunch (250GB?), RAID0 for Bacula spooling. (I want
fast disk access, so RAID0 it is.) RAID is taken care of by an onboard
RAID card, so these look like regular disks to Linux.
Now the spool disk works out to about 2.2TB or so — which is big
enough to make baby fdisk cry:
WARNING: The size of this disk is 2.4 TB (2391994793984 bytes).
DOS partition table format can not be used on drives for volumes
larger than 2.2 TB (2199023255040 bytes). Use parted(1) and GUID
partition table format (GPT).
Well, okay, haven't used parted before but that's no reason to hold
back. I follow directions and eventually figure out that mkpart gpt
ext3 0 2392G will do what I want. GPT? Piece of cake! And then I
rebooted, and I couldn't boot up again. Blank screen after the
POST. Crap!
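For anyone following along, the parted session amounted to roughly this (device name made up); mklabel is the step that rewrites the MBR, which turns out to matter below:
parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart gpt ext3 0 2392G
(parted) quit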
The first time this happened, the reboot also coincided with some
additional problems during the POST where too many cards were
trying to shove their ROM into the BIOS memory (or some such); I
thought the two were connected. But then I did it again today, and I
finally started digging.
The problem is that parted overwrites the MBR when setting up a GPT
disklabel. This has been noted and argued over. My
understanding of the two sides of the debate is:
- the MBR is not part of the EFI standard, so it's entirely rational that it should be erased;
- but very few x86 machines are EFI-only;
- and traditional disklabels don't support partitions over 2TB, so what's a brother gonna do?;
- and an MBR-GPT hybrid seems a nice way out of this.
Meanwhile, the parted camp has a number of bugs
dealing with this very issue, two opened a year ago, and none
have any response in them.
This enterprising soul submitted a patch back in December 2008,
which appears to have fallen to the floor.
As for me, I was able to convince the BIOS to boot from the smaller
disk, and then get a rescue CentOS image going via PXE booting, and
then reinstall grub on the smaller disk. Sorted. All I had to do was
change root (hd1,0) to root (hd0,0) in grub.conf.
A touch anti-climactic after all that, perhaps. But it was interesting
a) to learn about all this (I hadn't really thought about successors
to the DOS partition format before), and b) to see what a slender
thread we (okay, I) hang our hopes on sometimes. It's a necessary,
sobering thing to realize how much of what I use, depend on, believe
in is created by volunteers who are smart, hard-working people —
they argue and focus and forget just like real people, not
inhabitants of some shining city on a hill I sometimes take them for
("Next beer in Jerusalem!").
Tags:
backups
hardware
linux
02 Jul 2009
- Bacula config coming along; figured out today that /dev/nst0 corresponds to what mtx sees as Data Transfer Element 1 (as opposed to DTE 0), which explains why previous attempts to run label barcode just failed miserably. (Neat command that.) And I had thought that DTE meant the arm, but no: upon reflection, it's a subtle/obtuse (not the right word, but oh well) way of referring to the tape drive itself.
- Rather interesting comment, if you like that sort of thing, from Mark Burgess (originator of Cfengine) on Puppet and Luke Kanies. I know, I should remain above, but it is weirdly fascinating.
- And to go out on a high note, some excellent advice from Tom Limoncelli on setting priorities as a sysadmin:
This sounds like when I was at my previous employer and they asked if
I could develop a web-based system to take surveys. I nearly said,
"yes" because, well, I know perl, I know CGI, and I could do it.
However, I was smart enough to say "no, but surveymonkey.com will do
it for cheap." Best of all it was self-service and the HR person was
able to do it entirely without me. If I had said I could write such a
program, it would have been days of back-and-forth changes which would
have driven me crazy. Instead, she was happy to be empowered to do it
herself. In fact, doing it herself without any help became a feather
in her cap.
The lesson I learned is that "can I do it?" includes "do I want to do
it?". If I can do something but don't want to, the answer is, "No, I
don't know how" not "I know how but don't want to". The first makes
you look like you know your limits. The latter sounds like you are
just being difficult.
Tags:
backups
reading
cfengine
29 Jun 2009
I'm back at work after a week off. The UPS control panel continues to
work (!), but there is no word back from the manufacturer (says the
contractor who installed the thing and filed the ticket). I find this
troubling; either the manufacturer really hasn't got back to us yet
(bad), or I should have insisted on being a contact for the
ticket. I'll have to sort this out tomorrow.
Spent much of my day tearing my hair out over
mod_proxy_html. Turns out that, by default, it strips the DTD
from the HTML it proxies; this is a problem for one app that we're
proxying. Not only that, the DTDs it does support are HTML and XHTML,
either one optionally with a "Transitional"/Legacy flag — but no URI to a DTD,
like the one pointing to the Loose DTD that our app uses and the
damned thing threw to the floor. (Sorry, brain cells on strike today
and my ability to write clearly is going downhill.)
You can specify your own DTD, including a URI (undocumented
feature, whee!), and thus put back in the original — but it doesn't
append a newline, there's no way to append a newline that I could
figure out, and so it mushes the DTD together with the first html
opening tag and makes baby Firefox cry and render the page badly.
My rule of thumb for a long time was that if I start looking at
source code, I'm in over my head. I'm starting to think that may not
be entirely true anymore, that I've advanced to the point where I can
read C (say) and generally understand what's going on. But when I
start looking for API documentation for Apache 2.2 (surprisingly hard
to find) to find out if, say, ap_fputs
or apr_pstrdup
chomp
newlines or something (near as I can tell, they don't), or just what
AP_INIT_TAKE12
takes as arguments…well, then I am in over my
head. If nothing else, I don't want to make some silly error
because I don't know what the hell I'm doing. (That's not a slam
against the Debian folks; I just mean that I felt shivers when I read
about that, because I dread making the same sort of highly-visible,
catastrophic error) (unlike the rest of the planet, you understand).
Tags:
hardware
web
programming
18 Jun 2009
Full day:
- Prepare new network map
- Take stand-in techie around server room and explain new network setup
- Check UPS; still not crashed
- New Sun 4240 server unable to get past POST after hooking up fibre cable yesterday to SL-500 library. Try various things, no luck. Fortunately installers coming back next week to finish the job.
- Over to server room w/boss to take pictures for website
- Get programmer familiar w/the server she'll be using, how to set up services, etc. Arguably my job, but a) she'll want to learn and b) I'm off on vacation next week.
- Unless of course the UPS folks need to schedule downtime to make it work. But then I'll just use it as an excuse to show my dad and kids around the server room.
- Gotta pick out an IPA recipe to brew with my dad. Leaning toward the Cream IPA from Radical Brewing. May need to get a cooler to use as a lauter tun, since I think it's around 13 pounds of grain — more than I can comfortably do in my paint bag strainer setup.
- Still got out to walk around at lunch time, which was nice; I have a bad habit of skipping that.
Tags:
networking
hardware
beer
16 Jun 2009
Just discovered, while trying to test the mail server at $WORK, that
my ISP filters outgoing port 25. I'd give them a call but I can't dig
up my account info at the moment.
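A quick way to confirm that sort of block (hostname made up): try the same connection from home and from somewhere else, e.g.
telnet mail.example.com 25
If it times out from home but connects elsewhere, the ISP filter is the likely culprit.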
Tags:
networking
15 Jun 2009
Dress rehearsal includes checking to see if you can, in fact, unrack
something. I was unable to move a switch this morning because it was
stuck behind a PDU. Arghh.
The saga of our crashing UPS continues. The techs came out to visit
this morning, which meant I needed to schedule downtime so they could
bypass the UPS manually. They were unable to find any smoking gun (or
capacitors), and need to confer with HQ again. Best case: the UPS
control panel continues to work, and they can do the next round of
work w/o a manual bypass. Worst case: the control panel crashes again,
and we schedule another round of downtime.
Tags:
hardware
serverroom