After a lot of faffing about, I've accomplished the following on the backup server at $WORK:
* Broken out /var/lib/mysql to a separate, mirrored, Linux software raid-1 275 GB partition; it's using about 36 GB of that at the moment, which is 15% -- the lowest it's been in a long, LONG-ass time.
* Migrated the Bacula catalog db to InnoDB.
* Shrunk the raid-0 spool partition to about 1.6 TB, down from 2 TB; did this to free up the two disks for the mirrored partition.
* Ensured that MySQL will use /dev/shm as a temporary area.
* Sped up the restoration of files (which was mostly because of earlier "analyze" commands on the File table while it was still MyISAM).
* innodb_file_per_table is on; innodb_buffer_pool_size=10G; default_storage_engine=InnoDB (sketched in my.cnf form below).
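Here's roughly what that looks like in my.cnf form -- the paths and the 10G figure are from the notes above; the rest is a sketch of the syntax, not a paste of the actual file:

[mysqld]
datadir                 = /var/lib/mysql
tmpdir                  = /dev/shm
default_storage_engine  = InnoDB
innodb_file_per_table   = 1
innodb_buffer_pool_size = 10G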
I encountered the following problems:
* The stupid raid card in the backup server only supports two RAID drives -- thus, the mirrored drive for /var/lib/mysql is Linux software raid. I'd have preferred to keep things consistent, but it was not to be.
* The many "analyze" and "repair" steps took HOURS...only to turn out to be deadlocked because it was running out of tmp space.
* I had to copy the mysql files to the raid-0 drive to have enough space to do the conversion.
* Knock-on effects included lack of sleep and backups not being run last night.
* Basically, this took a lot of tries to get right, and about all of my time for the last 24 hours.
I learned:
* The repair of the File table while MyISAM, with tmp files in /dev/shm, took about 15 minutes. That compares with leaving it overnight and still not having it done.
* You have to watch the mysql log file for errors about disk space, and/or watch df -h to see /tmp or whatever fill up.
* You can interrupt a repair and go back to it afterward if you have to. At least, I was able to...I wouldn't do it on a regular basis, but it gives me cautious optimism that it's not an automatic ticket to backups.
* Importing the File.sql file (nominally 18 GB but du shows 5 GB...sparse?), which converted it to InnoDB, took 2.5 hours.
I still have to do these things:
* Update documentation.
* Update Bacula checks to include /var/lib/mysql.
* Perhaps up innodb_buffer_pool_size to 20 GB.
* Set up a slave server again.
* A better way of doing this might've been to set up LVM on md0, then use snapshots for database backup.
* Test with a reboot! Right now I need to get some sleep.
A while back I upgraded the MySQL for Bacula at $WORK. I tested it afterward to make sure it worked well, but evidently not thoroughly enough: I discovered today that the query needed to restore a file was taking forever. I lost patience at 30 minutes and killed the query; fortunately, I was able to find another way to get the file from Bacula. This is a big slowdown from previous experience. Time for some benchmarking and exploratory tweaking...
Incidentally, the faster way to get the file was to select "Enter a list of files to restore", rather than let it build the directory tree for the whole thing. The fileset in question is not that big, so I think I must be doing something pretty damned wrong to get MySQL so slow.
On Tuesday I attempted migrating the Bacula database at work from MyISAM to InnoDB. In the process, I was also hoping to get the disk space down on the /var partition where the tables resided; I was running out of room. My basic plan was:
Here was the shell script I used to split the dump file, change the engine, and reload the tables:
# Split the dump into one file per table, at each DROP TABLE statement
csplit -ftable /net/conodonta-private/export/bulk/bacula/bacula.sql '/DROP TABLE/' {*}
# Switch the engine in each piece
sed -i 's/ENGINE=MyISAM/ENGINE=InnoDB/' table*
# Rename each piece after the table it contains (pulled from the DROP TABLE line)
for i in table* ; do mv $i $(head -1 $i | awk '{print $NF}' | tr -d '`' | sed -e's/;/.sql/') ; done
# Reload the tables, smallest first
for i in $(du *sql | sort -n | awk '{print $NF}') ; do echo $i; mysql -u bacula -ppassword bacula < $i ; done
(This actually took a while to settle on, and I should have done this part of things well in advance.)
Out of 31 tables, all but three were trivially small; the big ones are Path, Filename and File. File in turn is huge compared with the others.
I had neglected to turn off binary logging, so the partition kept filling up with binary logs...which took me more than a few runs through to figure out. Eventually, late in the day, I switched the engines back to MyISAM and reloaded. That brought disk space down to 64% (from 85%). This was okay, but it was a stressful day, and one I'd brought on myself by not preparing well.
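Next time, the binary-log blowup could probably be avoided by turning binlogging off for the reload session -- assuming the reload runs as a user with SUPER, and that nothing downstream needs those binlogs:

# reload each table without writing binary logs for this session
for i in $(du *sql | sort -n | awk '{print $NF}') ; do
    echo $i
    mysql --init-command='SET SQL_LOG_BIN=0' -u root -ppassword bacula < $i
done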
When next I do this, I will follow this sequence:
How to find out which MySQL engine you're using for a particular database or table? Run this query:
SELECT table_schema, table_name, engine FROM INFORMATION_SCHEMA.TABLES;
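To narrow it to a single database (assuming the catalog lives in a database named bacula), add a WHERE clause:

SELECT table_name, engine FROM INFORMATION_SCHEMA.TABLES WHERE table_schema = 'bacula';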
Thanks to Electric Toolbox for the answer.
A long-standing project at $WORK is to move the website to a new server. I'm also using it as a chance to get our website working under SELinux, rather than just automatically turning it off. There's already one site on this server, running Wordpress, and I decided to get serious about migrating the other website, which runs Drupal.
First time I fired up Drupal, I got this error:
avc: denied { name_connect } for pid=30789 comm="httpd" dest=3306
scontext=system_u:system_r:httpd_t:s0
tcontext=system_u:object_r:mysqld_port_t:s0 tclass=tcp_socket
As documented here, the name_connect permission allows you to name sockets ("these are the mysql sockets, these are the SMTP sockets...") and set permissions that way. Okay, so now we know what prevented Drupal from working: SELinux denied httpd access to the mysqld TCP port.
What surprised me is that the Wordpress site did not seem to be encountering this error. The two relevant parts of the config files are:
// Drupal (settings.php):
$db_url = 'mysqli://user:password@127.0.0.1/database';

// WordPress (wp-config.php):
define('DB_NAME', 'wp_db');
define('DB_USER', 'wp_db_user');
define('DB_PASSWORD', 'password');
define('DB_HOST', 'localhost');
Hm, the only difference is that localhost-vs-127.0.0.1 thing...
After some digging, it appears to be PHP's mysqli at work. From the documentation:
host: Can be either a host name or an IP address. Passing the NULL value or the string "localhost" to this parameter, the local host is assumed. When possible, pipes will be used instead of the TCP/IP protocol.
See the difference? Without looking up the code for mysqli, I think that an IP address -- even 127.0.0.1 -- makes mysqli just try TCP connections; using "localhost" makes it try a named pipe first. Since TCP connections to the MySQL port apparently aren't allowed by default CentOS SELinux policy, the former fails.
Solution: make it "localhost" in both, and remember not to make assumptions.
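(For the record, if you really do want httpd making TCP connections to MySQL -- say, to a database on another box -- CentOS has a boolean for that. I haven't needed it here, so treat this as untested:)

setsebool -P httpd_can_network_connect_db 1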
Just compacted the Bacula catalog, which we keep in MySQL, as the partition it was on was starting to run out of space. (Seriously, 40 GB isn't enough?)
First thing I tried was running "optimize table" on the File table; that saved me 3 GB and took about 15 minutes. After that, I ran mysqldump and reloaded the db; that saved me another 300 MB and took closer to 30 minutes. Lesson: "optimize table" does just fine.
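For next time, the two approaches boil down to something like this (database and user names assumed from the standard Bacula setup):

# cheap: rebuild just the big table in place
mysql -u bacula -p bacula -e 'OPTIMIZE TABLE File;'
# expensive: dump and reload the whole catalog
mysqldump -u bacula -p --opt bacula > bacula.sql
mysql -u bacula -p bacula < bacula.sql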
Thursday morning was the keynote from Dr. Irving Wladawsky-Berger at IBM. His memories of Linux ascendancy were interesting...possibly because of the cheerleading/"We would simply prevail" feeling I felt. But his speculation on what would come was fuzzy and handwavy...slides with things like "Smart retail / Smart traffic/ Smart cities / Smart regions / Smart planet / Intelligent oil field technology" (wait, what happened to smart?) and graphs of Efficiency vs. Transformation, with a handy downward-sloping line delineating "Reinventing Business" from "Rethinking IT", just made THE RAGE come on.
The HP speech that came after wasn't much better, so I ducked out after five minutes...perhaps a mistake, in retrospect. I will say, though, that it amazes me that multitasking, in 2011, is something to brag about.
Next up was the presentation from IBM on "Improving Storage in KVM-based clouds". Despite the buzzwords, it boiled down to an interesting war story about debugging crappy FS performance, from verifying ("Yes, the users are right when they say it sucks") to fixes ("This long-term kernel project will add the feature we need to stop sucking!"). If I can find the slides, I highly recommend reading them...there's a lot of practical advice in there.
Next up was a presentation by the mysteriously-employed Christoph on Linux in the world of finance. It was a short presentation -- a lot of presentations at LinuxCon have been short -- but he made up for it with a lively Q&A afterward. (To be fair, he explained at the beginning that he was used to a much more hostile/loud audience and a much more interactive presentation style, and actively solicited questions.)
Right, so: Linux is used in finance a lot, because it's fast and very, very tweakable. He described this as "Linux hotrodding", which seems to capture the attitude very well. Sadly, a lot of this stays in-house because these tweaks are considered part of the "secret sauce" that makes them money.
I asked if the traders were involved in the technical side of things, or if it was more like "Let me know when my brilliant algorithm is sufficiently fast." Answer: no, traders are very, very technical (some give keynotes at tech conferences), and there is very tight integration between the two. I asked if the culture was as loud, macho and aggressive as the stereotype. Answer: yes. Someone asked why Solaris usage had declined. Answer: neither traders ("You got bought! You're a loser!") nor techies ("Oracle kills MySQL and puppies!") liked Oracle buying Sun.
And now for an opposing view.
I spoke after the talk to three sysadmins from the same trading company, and they disputed some of Christoph's points. First, their company contributes back to open source/Free software; their CTO says it's a moral imperative. They've open-sourced their own trading software, though not the algorithms ("algos" if you're a trader type) that make them money. They admit that this makes their company unusual; in their industry, secrecy is the rule.
Second, they said the culture varies from company to company, and that anyhow it's very different now that MIT PhDs and such are being hired. It's not all "Wall Street".
And one bit they confirmed: hotrodding. Things like overclocking their chips -- but to the degree that the vendors phone them up to say "You'll burn out your CPU in a week!" Response: "Okay." Because it'll make more money in the first hour it's running than the CPU costs.
I had lunch with Chris, who I used to work with, and caught up on everything. Then I hung out in the vendor area a bit. The PandaBoard was neat: Ubuntu 10.10, playing a 1080p movie trailer and drawing less than two watts. Incredible.
I buttonholed the FreeIPA guy; complimented him on the talk, and asked some questions. Master-slave in FreeIPA LDAP server? No, multi-master only. Doesn't that make you nervous? No. Doesn't keeping config information for the LDAP server in LDAP, rather than a plain text file, make you nervous? Shrug; if you can't read LDAP, you're probably hosed anyway. Oh, and btrfs is coming to Fedora 17, probably RHEL 7. Doesn't that make you nervous? No. (Conclusion for the home listeners: I am a misinformed worrywart.)
And Rik van Riel was there, but I forgot to hug him.
In the afternoon I went to a two-hour introduction to KVM-based virtualization. This was excellent; while I'm using KVM at the moment, I'm not familiar with the tools available. (Which probably means I shouldn't be using it....) He covered tools like virt-p2v, KSM, and how to monitor performance of VMs from the host, even if you don't have root privileges. Good stuff.
Xmas vacation is when I get to do big, disruptive maintenance with a fairly free hand. Here's some of what I did and what I learned this year.
I made the mistake of rebooting one machine first: the one that held the local CentOS mirror. I did this thinking that it would be a good guinea pig, but then other machines weren't able to fetch updates from it; I had to edit their repo files. Worse, there was no remote console on it, and no time (I thought) to take a look.
Last year I tried getting machines to upgrade using Cfengine like so:
centos.some_group_of_servers.Hr14.Day29.December.Yr2009::
"/usr/bin/yum -q -y clean all"
"/usr/bin/yum -q -y upgrade"
"/usr/bin/reboot"
This didn't work well: I hadn't pushed out the changes in advance, because I was paranoid that I'd miss something. When I did push it out, all the machines hit on the cfserver at the same time (more or less) and didn't get the updated files because the server was refusing connections. I ended up doing it by hand.
This year I pushed out the changes in advance, but it still didn't work because of the problems with the repo. I ran cssh, edited the repos file and updated by hand.
This worked okay, but I had to do the machines in separate batches -- some needed to have their firewall tweaked to let them reach a mirror in the first place, some I wanted to watch more carefully, and so on. That meant going through a list of machines, trying to figure out if I'd missed any, adding them by hand to cssh sessions, and so on.
I may need to give in and look at RHEL, or perhaps func or better Cfengine tweaking will do the job.
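One obvious bit of Cfengine tweaking: SplayTime, so the clients don't all pound the cfserver in the same minute. A sketch for cfagent.conf -- the ten minutes is an arbitrary number, not something I've tested:

control:
    SplayTime = ( 10 )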
Quick and dirty way to make sure you don't overload your PDUs:
sleep $(expr $RANDOM / 200) && reboot    # $RANDOM tops out at 32767, so this staggers reboots by 0-163 seconds
Rebooting one server took a long time because the ILOM was not working well, and had to be rebooted itself.
Upgrading the database servers w/the 3 TB arrays took a long time: stock MySQL packages conflicted with the official MySQL rpms, and fscking the arrays takes maybe an hour -- and there's no sign of life on the console while you're doing it. Problems with one machine's ILOM meant I couldn't even get a console for it.
Holy mother of god, what an awful time this was. I spent eight hours on upgrades for just nine desktop machines. Sadly, most of it was my fault, or at least bad configuration:
Graphics drivers: awful. Four different versions, and I'd used the local install scripts rather than creating an RPM and installing that. (Though to be fair, that would just rebuild the driver from scratch when it was installed, rather than do something sane like build a set of modules for a particular kernel.) And I didn't figure out where the uninstall script was 'til 7pm, meaning lots of fun trying to figure out why the hell one machine wouldn't start X.
Lesson: This really needs to be automated.
Lesson: The ATI uninstall script is at /usr/share/ati/fglrx-uninstall.sh. Use it.
Lesson: Next time, uninstall the driver and build a goddamn RPM.
Lesson: A better way of managing xorg.conf would be nice.
Lesson: Look for prefetch options for zypper. And start a local mirror.
Lesson: Pick a working version of the driver, and commit that fucker to Subversion.
These machines run some scientific software: one master, three slaves. When the master starts up at boot time, it tries to SSH to the slaves to copy over the binary. There appears to be no, or poor, rate throttling; if the slaves are not available when the master comes up, you end up with the following symptoms:
The problem is that umpty scp processes on the slave are holding open the binary, and the kernel gets confused trying to run it.
I also ran into problems with a duff cable on the master; confusingly, both the kernel and the switch said it was still up. This took a while to track down.
It turned out that a couple of my kvm-based VMs did not have jumbo frames turned on. I had to use virt-manager to shut down the machines, switch their network devices to virtio drivers, then boot them again. However, kudzu on the VMs then saw these as new interfaces and did not configure them correctly. This caused problems because the machines were LDAP clients and hung when the network was unavailable.
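For reference, the change amounts to making the guest's NIC a virtio device in the domain XML (virt-manager does the same thing through its GUI; the bridge name below is made up), plus setting MTU=9000 in the guest's ifcfg file:

<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>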
Growing up was wall-to-wall excitement, but I don't recall
Another who could understand at all...
-- Sloan
Monday: day two of tutorials. I found Beth Lynn in the lobby and congratulated her on being very close to winning her bet; she's a great deal closer than I would have guessed. She convinced me to show up at the Fedora 14 BoF tomorrow.
First tutorial was "NASes for the Masses" with Lee Damon, which was all about how to do cheap NASes that are "mostly reliable" -- which can be damn good if your requirements are lower, or your budget smaller. You can build a multi-TB RAID array for about $8000 these days, which is not that bad at all. He figures these will top out at around 100 users...200-300 users and you want to spend the money on better stuff.
The tutorial was good, and a lot of it was stuff I'd have liked to know about five years ago when I had no budget. (Of course, the disk prices weren't nearly so good back then...) At the moment I've got a good-ish budget -- though, like Damon, Oracle's ending of their education discount has definitely cut off a preferred supplier -- so it's not immediately relevant for me.
QOTD:
Damon: People load up their file servers with too much. Why would you put MSSQL on your file server?
Me: NFS over SQL.
Matt: I think I'm going to be sick.
Damon also told us about his experience with Linux as an NFS server: two identical machines, two identical jobs run, but one ran with the data mounted from Linux and the other with the data mounted from FreeBSD. The FreeBSD server gave a 40% speed increase. "I will never use Linux as an NFS server again."
Oh, and a suggestion from the audience: smallnetbuilder.com for benchmarks and reviews of small NASes. Must check it out.
During the break I talked to someone from a movie studio who talked about the legal hurdles he had to jump in his work. F'r example: waiting eight weeks to get legal approval to host a local copy of a CSS file (with an open-source license) that added mouseover effects, as opposed to just referring to the source on its original host.
Or getting approval for showing 4 seconds of one of their movies in a presentation he made. Legal came back with questions: "How big will the screen be? How many people will be there? What quality will you be showing it at?" "It's a conference! There's going to be a big screen! Lots of people! Why?" "Oh, so it's not going to be 20 people huddled around a laptop? Why didn't you say so?" Copyright concerns? No: they wanted to make sure that the clip would be shown at a suitably high quality, showing off their film to the best effect. "I could get in a lot of trouble for showing a clip at YouTube quality," he said.
The afternoon was "Recovering from Linux Hard Drive Disasters" with Ted T'so, and this was pretty amazing. He covered a lot of material, starting with how filesystems worked and ending with deep juju using debugfs. If you ever get the chance to take this course, I highly recommend it. It is choice.
Bits:
ReiserFS: designed to be very, very good at handling lots of little files, because of Reiser's belief that the line between databases and filesystems should be erased (or at least a lot thinner than it is). "Thus, ReiserFS is the perfect filesystem if you want to store a Windows registry."
Fsck for ReiserFS works pretty well most of the time; it scans the partition looking for btree nodes (is that the right term?) (ReiserFS uses btrees throughout the filesystem) and then reconstructs the btree (ie, your filesystem) with whatever it finds. Where that falls down is if you've got VM images which themselves have ReiserFS filesystems...everything gets glommed together and it is a big, big mess.
BtrFS and ZFS both very cool, and nearly feature-identical though they take very different paths to get there. Both complex enough that you almost can't think of them as a filesystem, but need to think of them in software engineering terms.
ZFS was the cure for the "filesystems are done" syndrome. But it took many, many years of hard work to get it fast and stable. BtrFS is coming up from behind, and still struggles with slow reads and slowness in an aged FS.
Copy-on-write FS like ZFS and BtrFS struggle with aged filesystems and fragmentation; benchmarking should be done on aged FS to get an accurate idea of how it'll work for you.
Live demos with debugfs: Wow.
I got to ask him about fsync() O_PONIES; he basically said if you run bleeding edge distros on your laptop with closed-source graphics drivers, don't come crying to him when you lose data. (He said it much, much nicer than that.) This happens because ext4 assumes a stable system -- one that's not crashing every few minutes -- and so it can optimize for speed (which means, say, delaying sync()s for a bit). If you are running bleeding edge stuff, then you need to optimize for conservative approaches to data preservation and you lose speed. (That's an awkward sentence, I realize.)
I also got to ask him about RAID partitions for databases. At $WORK we've got a 3TB disk array that I made into one partition, slapped ext3 on, and put MySQL there. One of the things he mentioned during his tutorial made me wonder if that was necessary, so I asked him what the advantages/disadvantages were.
Answer: it's a tradeoff, and it depends on what you want to do. DB vendors benchmark on raw devices because it gets a lot of kernel stuff out of the way (volume management, filesystems). And if you've got a SAN where you can a) say "Gimme a 2.25TB LUN" without problems, and b) expand it on the fly because you bought an expensive SAN (is there any other kind?), then you've got both speed and flexibility.
OTOH, maybe you've got a direct-attached array like us and you can't just tell the array to double the LUN size. So what you do is hand the raw device to LVM and let it take care of resizing and such -- maybe with a filesystem, maybe not. You get flexibility, but you have to give up a bit of speed because of the extra layers (vol mgt, filesystem).
Or maybe you just say "Screw it" like we have, and put a partition and filesystem on like any other disk. It's simple, it's quick, it's obvious that there's something important there, and it works if you don't really need the flexibility. (We don't; we fill up 3TB and we're going to need something new anyhow.)
And that was that. I called home and talked to the wife and kids, grabbed a bite to eat, then headed to the OpenDNS BoF. David Ulevitch did a live demo of how anycast works for them, taking down one of their servers to show the routing tables adjust. (If your DNS lookup took an extra few seconds in Amsterdam, that's why.) It was a little unsettling to see the log of queries flash across the screen, but it was quick and I didn't see anything too interesting.
After that, it was off to the Gordon Biersch pub just down the street. The food was good, the beer was free (though the Marzen tasted different than at the Fairmont...weird), and the conversation was good. Matt and Claudio tried to set me straight on US voter registration (that is, registering as a Democrat/Republican/Independent); I think I understand now, but it still seems very strange to me.
define command{
command_name check_wp_admins
command_line $USER1$/check_mysql_query -q 'SELECT COUNT(wp_users.user_login) AS "Admins"
FROM wp_users, wp_usermeta
WHERE wp_usermeta.meta_value LIKE "%administrator%" AND
wp_usermeta.user_id=wp_users.ID' -H $HOSTADDRESS$ $ARG1$
}
define command{
command_name check_wp_nasty_posts
command_line $USER1$/check_mysql_query -q 'SELECT COUNT(*)
FROM wp_posts
WHERE post_content REGEXP "iframe|noscript|display"' -H $HOSTADDRESS$ $ARG1$
}
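And a sketch of a matching service definition -- the host name, credentials and thresholds here are invented, with check_mysql_query's -u/-p/-d credentials and -w/-c ranges passed through $ARG1$:

define service{
        command_name            not-needed-here
        use                     generic-service
        host_name               wp-server
        service_description     WordPress admin count
        check_command           check_wp_admins!-u nagios -pnagiospass -d wordpress -w 3 -c 5
        }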
The more I work with Python, the more I don't just like it but admire it.
Ugh...not much more right now. I've got a blocked Eustachian tube that I'm self-medicating with a Python script^W^Wcold medicine, and the acetaminophen in it is making me hazy.
While trying to figure out why Nagios was suddenly unable to check up on our databases, I realized that the permissions on /dev/null were wrong: 0600 instead of 0666. What the hell? I've had this problem before, and I was in the middle of something, so I set them back and went on with my life. Then it happened again, not half an hour later. I was in the same shell, so I figured it had to have been a command I'd run that had inadvertently done this.
Yep: don't run the MySQL client as root. Yes yes yes, it's bad anyway, I'll go to sysadmin hell, but this is an interesting bug. The environment variable MYSQL_HISTFILE is set to /dev/null for root...and when you exit the client, it sets the permissions for the history file to 0600. So, you know, don't do that then. (Still no fix committed, btw...)
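The sequence, roughly, for anyone who wants to see it happen (run on a test box, obviously):

# as root, where MYSQL_HISTFILE=/dev/null is set in the environment
ls -l /dev/null        # crw-rw-rw-  1 root root
mysql -u root -p       # do something, then quit the interactive session
ls -l /dev/null        # crw-------  1 root root -- the client "protected" its history file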
I can get really, really focussed sometimes. Every now and then that happens with Nagios.
Yesterday I had some time to kill before I went home, so I looked over my tickets in RT. (I work in a small shop, so a lot of the time the tickets in RT are a way of adding things to my to-do list.) There was one that said to watch for changes in our web site's main page; I'd added that one after MySQL'd had problems one time -- ran out of connections, I think -- and Mambo had displayed a nice "Whoops! Can someone please tell the sysadmin?" page (a nice change from the usual cryptic error when there's no database connection). Someone did, but it would've been nice to be paged about it.
At home I use WebSec to keep track of some pages that don't change very often (worse luck…), and I thought of using that. It sends you the new web page with the different bits highlighted, which is a nice touch. But I wanted something tied in with Nagios, rather than another separate and special system.
So I started looking at the Nagios plugins I had, and I was surprised to find that check_http has a raft of different options, including the ability to check for regexes in the content. Sweet! I added a couple of strings that'll almost certainly be there until The Next Big Redesign(tm), and done.
I started looking at the other plugins, and noticed check_hpjd. A few minutes later I was checking our printers for errors...just in time to notice a weird error that someone had emailed me about 30 seconds before. Nice!
This morning (I work from home on Saturdays in return for getting Wednesdays off to take care of Arlo) I was checking Cacti (which rocks even if they do call it a solution). /home/visitors with no free space? Wha'? Someone had run a job that'd managed to fill the whole damned partition.
Well, there's check_disk, but that's only for mounted disks — and I don't want the monitoring machine freezing if there's a problem with NFS. SNMP should do this, right? Right — the net-snmp project has the ability to throw errors if there's less than a certain amount of free space on a disk. For some reason I'd never set that up before, nor got Nagios to monitor for it. A few minutes later and check_snmp was looking for non-empty error messages:
$USER1$/check_snmp -H $HOSTADDRESS$ -o UCD-SNMP-MIB::dskErrorMsg.$ARG1$ -s ""
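For that check to report anything, snmpd.conf on the client needs disk entries saying what to watch; the paths and thresholds below are examples, not what's actually deployed:

disk /               5%
disk /home/visitors  5%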
I looked ahead in snmpd.conf and noticed the process section. Well, hell! It's all very good to check that the web server is running, but what if there are too many Apache processes? Or too few of MySQL? Or no Postfix? Can't believe I never set this up before…
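The process checks work the same way: proc lines in snmpd.conf, then check_snmp against prErrMessage. The names and limits below are guesses at sensible values rather than a copy of my config:

proc httpd 150 5
proc mysqld 1 1
proc master 1 1
$USER1$/check_snmp -H $HOSTADDRESS$ -o UCD-SNMP-MIB::prErrMessage.$ARG1$ -s ""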
I've finally come up for breath. This wasn't what I planned on doing this morning, but I love it when a plan will come together next time.
Reminder for myself.
So you've got some backed-up MySQL table files (if that's the right term), rather than a proper dump. Untar them somewhere, and note the path to the data files -- say, /home/foo/mysql_recovery/data. Copy /etc/my.cnf to your home directory. Edit it and change the port to something different -- say, 3307. Run:
/usr/local/mysql/bin/mysqld --defaults-file=/home/foo/my.cnf --datadir=/home/foo/mysql_recovery/data --socket=/tmp/mysql_recovery.sock   # separate socket so it doesn't fight with the live server
Then run:
mysqldump -h 127.0.0.1 -P 3307 --opt -u foo -p database > recovery.sql   # -h with an IP forces TCP, so -P 3307 actually gets used
Of course, all this could be prevented if you were running mysqldump nightly instead of just copying the data directories...
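Something like this in root's crontab would do it -- path, schedule and naming all invented for the sake of illustration:

0 3 * * * /usr/local/mysql/bin/mysqldump --opt -u foo -pPASSWORD database | gzip > /backup/database.$(date +\%a).sql.gz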
A while back I set up greylisting on Postfix for my home server. It works well, but I have the same concerns now that I did then. The script (smtpd-policy.pl, from the examples section of Postfix' source) feels like a bit of a crock; yes, it's just the example script, but I don't like the Berkeley DB files, and comments in the code like "DO NOT create the greylist database in a file system that can run out of space" make me nervous. It hasn't been a problem -- in, oh, six months of running, the file is only up to about 5.5 MB. But still: there's no provision for removing old entries, which means an awful soul-searching battle with the database if you ever need to trim it.
I had a brief look at the script tonight, hoping to find a way to maybe hack in MySQL support, but decided to check with Saint Google first. Sure enough, there's gps, the Greylist Policy Service for Postfix. Uses C++ for speed and MySQL/PostgreSQL for the backend, which is nice. I should be able to hack up a migration script for the old entries (just as soon as I hack up a migration script for all the old journal entries...), and all should be good.
One thing I'm noticing with greylisting, though, is just how many attempts are being made from multiple IP addresses within a short time; one attempt, today, had attempts from four different IP addresses within five minutes, all from the same made-up email address. The original Perl script has the advantage that I can change it easily -- I know Perl, and I'd be pretty much starting from scratch with C++ -- and maybe add the ability to track this sort of thing. It'd be nice to be able to tarpit attempts to do this, say on the third attempt.
Tarpitting...another problem with Linux. The TARPIT module for netfilter has yet to be updated to work with the 2.6 kernel, and I really don't want to switch back to 2.4 just for this. LaBrea is nice, and I'm running a lashed-together natd configuration on my FreeBSD firewall box in conjunction with LaBrea running on my desktop on a second interface. It works, but it doesn't work in the case of a Linux webserver running on its own, outside the main firewall. I'm even less a kernel hacker than I am a C++ programmer, and figuring out the compiling problems and changed skbuff route structures (say) is beyond me. It's things like this that make me want to move to OpenBSD. Yeah, rebuilding a server and learning a new firewall language is a pain in the ass, but at least it's one I can handle.
So I had a bit of a brainstorm the other day. I've got two servers: Here and There. There's some stuff Here that needs to move There. The problem is that the server Here is in use a fair bit, and part of that use involves INSERTing things in MySQL and then SELECTing them back again. It's a pain to shut down things Here altogether in preparation for moving There, particularly as the move is liable to take, oh, twenty-four hours or so. The database needs to be consistent between the two, but the length of the move makes that impractical unless Special Measures are taken.
Dark server room. Midnight. We see THE SUPERVISOR talking to THE SYSADMIN.
SUPERVISOR: That database needs to be consistent, dammit!
SYSADMIN: (tightly) I can't do that without taking...special measures.
SUPERVISOR grimaces.
SUPERVISOR: Whatever it takes, dammit. I don't want to know.
SYSADMIN: All right, then. I'll do your dirty work.
SYSADMIN turns slowly and walks out the door.
SUPERVISOR: Dammit!
I will concede that's a little dramatic. But what else would you call MILITARY-GRADE ENCRYPTION, i.e. SSH tunnels from Here to There? (It must be military grade; it's developed in Canada.) Okay, so it's not that big a deal for you people what think all the time. But it was pretty clever, I thought, and would ensure that everything was, like, cool and stuff because -- this is the good part, see -- we would tunnel the MySQL connection from Here to There over SSH! Brilliant! It only needs a short break in the service from Here, then all the database updates that might come from Here go There! Yeah! So I began trying that out today. It was a bit of a pain to set up. I had to do some funky firewall-fu There to get SSH in in the first place. Then I had to figure out the right syntax for netmasks for hosts.allow (for the record, it's 255.255.255.0, not /24). Then I had to figure out how to get the MySQL client to connect to an arbitrary port. That took a while. I offer you this hard-won piece of knowledge in the spirit of Free Knowledge:
When using the MySQL client, do not confuse the -H option (output in HTML, please) with the -h option (connect to the specified host, please). That's a silly mistake to make.
However, what's not a silly mistake is expecting -h localhost to do the right thing and connect. This is either an omission in the otherwise-excellent MySQL, or else a case of our nameserver not having a record for localhost. I strongly suspect the latter.
That said, it appears to be working: I can now be refused a connection to the MySQL server There from Here. Truly, I am a golden god.
Except maybe when it comes to backups or SCSI or something. I ran into some problems with AMANDA's backups last night. I saw these rather frightening messages this morning in dmesg. After sticking my tongue cutely out the side of my mouth to indicate fierce concentration and colouring in some printed log files in different fluorescent colours, I was left with this series of messages:
Aug 23 23:46:57 localhost /kernel: (sa0:ahc0:0:3:0): SCB 0xe - timed out
Aug 23 23:46:57 localhost /kernel: >>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Aug 23 23:46:57 localhost /kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>
Aug 23 23:46:58 localhost /kernel: (sa0:ahc0:0:3:0): Queuing a BDR SCB
Aug 23 23:46:58 localhost /kernel: (sa0:ahc0:0:3:0): Bus Device Reset Message Sent
Aug 23 23:46:59 localhost /kernel: (sa0:ahc0:0:3:0): SCB 0xe - timed out
Aug 23 23:46:59 localhost /kernel: >>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Aug 23 23:46:59 localhost /kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>
Aug 23 23:46:59 localhost /kernel: (sa0:ahc0:0:3:0): no longer in timeout, status = 34b
Aug 23 23:46:59 localhost /kernel: ahc0: Issued Channel A Bus Reset. 1 SCBs aborted
Aug 23 23:46:59 localhost /kernel: (sa0:ahc0:0:3:0): failed to write terminating filemark(s)
Aug 23 23:47:59 localhost /kernel: (sa0:ahc0:0:3:0): SCB 0xe - timed out
Aug 23 23:47:59 localhost /kernel: >>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
...and on it goes.
Saint Google asserts that this is probably a case of SCSI cables not being terminated properly, or getting too close to the power supply. Sure enough, the latter may be a problem. I made what adjustments I could without taking down the server, and we'll see what happens tomorrow. Weird. I am having the strangest sense of deja vu right now looking at that log entry in vi. Huh.
What else? I'm typing this right now at a local coffee shop where I was able to pick up wireless service; unfortunately, the cheap bastards want money. I tried pinging various addresses for a while, thinking about setting up an IP-over-ICMP-or-possibly-over-DNS proxy from my home network, then gave up and turned off the wireless card. It's good to know that it works, and it's good to know that there are places left where you can hear both Lisa Stansfield and Rick Astley in the space of five minutes. And there was much rejoicing.
Cool bit of the day from the PHP docs:
<directory /var/www/html/mydatabase>
php_value mysql.default_user fred
php_value mysql.default_password secret
php_value mysql.default_host server.example.com
</directory>
Graham Rule at ed dot ac dot uk, you rule.
When you have:
you DO NOT need me to install phpMyAdmin in order to manipulate tables. Nor do you get bonus points for asking me how to connect to MySQL without phpMyAdmin. No, thank you.
Getting closer to getting MySQL working. I came across this post today which seemed to be nearly identical to what was happening to me. I followed the suggestion and took out the --enable-static option I'd been putting into configure. Result: much happier, with hardly any crashing at all. Now if I can just get it to find the user.frm table, I'll be a happy monkey. All this to pick up a copy of libmysqlclient.so. I must be on crack.