So a while ago, I wrote about the li'l ol' laptop under the TV; an old, old Dell with a P3 processor that was finally coming to an end. Oliver Hookins, bless his heart, recommended the Zotac ZBox; after a bit of research, I agreed and bought the CI-320. (Sorry, Wout, but I wanted something with a bit more horsepower than a Banana Pi.) I bought 4GB of RAM for it, and I already had a 64GB SSD lying around. Debian installed on it w/o any problems whatsoever, and I migrated everything over a couple of weeks ago.
It's pretty wonderful, not least because it's completely silent. It's passively cooled, and with the SSD there are no moving parts. I've got an external HD attached to it via USB (though this thing has also got eSATA, GigE, wireless, HDMI...), and it does backup for the house. I finally got rid of my crappy, crappy-ass rsync wrapper and set up rsnapshot; I've been told to check out elkarbackup, a nice-looking web interface for it. (Now if I can only get off my butt and set up duply and offsite encrypted storage...)
And the name? Zombie.saintaardvarkthecarpeted.com. Zbox...what're you gonna do?
I'm frugal (which sounds nicer than "cheap"). My laptop is a refurbished Chromebook I got from US Walmart for $350. It's not bad at all -- runs Linux, 4GB RAM, decent processor and turns out the 16GB SSD hasn't been a limit so far. But it's a Yugo, all right; sound is fussy, the screen is small (which I haven't valued as much as I thought I would), and the trackpad is for shit. But it was cheap!
My former laptop was a monster of a Dell; it had been my wife's before the wireless crapped out and we (sort of) upgraded her. I stuck in a USB wireless stick that mostly works, and went with that 'til the monitor started flashing angry red unless you tilted it just right. Now it belongs to the kids, and they use it to send email and play Minecraft.
And the laptop before that? Refurbished Dell (there's a pattern here) I bought off eBay for the 12" screen. It's a P3 (remember those?) with 500 MB of RAM. The battery doesn't work any more, of course, but it still serves my website, catches my mail and runs Emacs & Mutt. And until today, it held music and backups too, on an external hard drive my dad sent me long ago: an old 500 GB PATA drive in a Ferrari case that came from his town library. It died today, but I had been in the habit of rsyncing it to a couple other external drives (newer vintage), plus the kids' laptop, so it was easy enough to recover things.
It turns out it was a good match for this old laptop: 500GB is small enough that it didn't strain its tiny little brain trying to count too high. Hooking up the 3 TB hard drive just caused all sorts of problems. I might have a smaller hard drive that'll work...but of course, the laptop was old when I bought it in 2007. The one USB port on it is 1.0, for heaven's sake. It's amazing that it's kept running this long.
So what now? I'm not sure. I hate dealing with hardware anymore -- who has the time? -- but I also don't want to host my website & mail externally. I also still need some kinda server to play music and such. I want something silent, Linux/BSD compatible, already assembled, and durable -- I don't want to do this again for another 5 years. And did I mention I'm cheap^Wfrugal?
Intel NUC might be an idea. I've got a Raspberry Pi that has been nothing but trouble because of USB problems; ditto a PogoPlug before that. I love the idea of low-power boxen, but they're turning out to be a PITA. I could go to the local FreeGeek and try another laptop -- $100 would probably get me something decent and considerably more recent.
Sigh, dunno...snappy conclusion goes here.
After some upgrades (kernel and otherwise) to an Ubuntu 12 workstation, a user reported one of their monitors insisted on displaying at low resolution (800x600, instead of the 1920x1024 it had previously). I eventually figured out that X and/or the driver (both Radeon and proprietary ATI) could not get EDID info from the monitor anymore. This led down a few rabbit holes, including a bug in Intel's driver and reflashing EDID info on the affected monitor.
In the end, though? Replacing the goram cable (analog, if that makes a difference) did the trick. I now have the cable, cut in half, hanging over my desk as a trophy.
Last night I noticed that my MythTV box wasn't up, so I turned it on. I decided it would be a good time to do some work on it that I've been putting off for a while. Ten minutes later, I rebooted to test it...and spent the next ten minutes listening to whiiiir whiiiir beep click click click, over and over again.
Digging through this blog, turns out I actually bought this thing in April 2005...so it has lasted a good long time. (Cue guilt about burning through natural resources and other people's pain.) I'll have to see if I can get it going again.
I've got a new workstation at $WORK. (Well, where else would it be?) It's pretty sweet: i7 quad-core processor, clock speed > 3GHz (honestly, I barely keep track anymore), and 8GB of RAM. 8GB! Insane.
When I arrived in 2008, I used a -- not cast-off, but unused P4 with 4 GB of RAM. I didn't want to make a big fuss about it; I saved the fuss, instead, for a nice business laptop from Dell that worked well with Linux. Since 90% of my work is Firefox + Emacs + XTerms, and my WM of choice at the moment is Awesome, speed was not a problem and the memory was fine.
Lately, though, I've discovered Vagrant. It looks pretty sweet, but my current machine is sloooow when I try to run a couple of VMs. (So's my laptop, despite a better processor; I suspect the 5400RPM drive.) I'm hoping that the new machine will make a big difference.
Just gotta install Ubuntu and move stuff over. Fortunately I've been pretty good about keeping my machine config in Cfengine, so that'll help. And then build some VMs. I'm always surprised at people who feel comfortable downloading random VM images from the Internet. Yeah, it's probably okay...but how do you know?
One thing that Vagrant is missing is integration with Cfengine. Fortunately, the documentation for extending it seems pretty good (plus, I can always kick things off with a shell script). This might be an excuse to learn Ruby.
Been busy lately:
3 new workstations with OpenSuSE. Can't figure out the autoinstall, so it's checklist time, baby.
Software upgrade for a fairly important server + 3 slave nodes. Natch, after rebooting, one of the ILOMs for the servers just...went away. Can't ping it from the network. Works fine with an interactive ilom shell from Linux. Sometimes I really hate Dell software.
Got a call from the reseller for a major hardware vendor who just got taken over by a major database vendor; said db vendor has just turned off educational discounts we'd spent THREE MONTHS negotiating/waiting to have approved. I am unimpressed. Strongly tempted to call up random hardware vendors and throw money at them 'til they give us stuff.
Finally got leak detection working in the server room. Stupidly long time, it took.
Working on a "Lessons Learned" presentation for LISA that'll include mention of the leak detection (among other things). Not sure how it'll be received, but I figure it's their job to tell me it sucks, not mine.
New term coming, so about six new people coming. But at least I know about them in advance.
And this...and this...just amuse me. (Warning: Flash eats babies and sells them to Chinese hackers.)
Taxes.
But hey! Turns out we live in a constitutional democracy after all. There was some debate about this at 24 Sussex Drive, I understand. Score one for the good guys.
The bge driver for OpenBSD says that the Broadcom BCM5700 series of interfaces has two MIPS R4000 cpus. And you can run Linux on an R4000, or NetBSD.
Must...stop...recursion...
At the risk of tempting fate, I just realized that my web server is five years old (and a bit). Happy birthday, Thornhill!
This is irritating...
We've got four new Dell R410 servers at work. Natch, I want 'em working with serial consoles so I don't have to sit in the server room. Three of them worked; the fourth did not, despite having identical BIOS/Grub settings.
The symptom was quite maddening: After getting past the various BIOS checks, the Grub menu would not appear unless you sat there and typed something. After that, you'd get the usual Grub entries and could boot as usual. If you did not hit a key, the machine would just hang -- no response to keypresses at all, and you'd have to power cycle.
I spent a stupid amount of time comparing BIOS and Grub settings but was unable to find anything different. Finally today I typed "grub console timeout serial dell" into Google and found this bug in Launchpad, with this comment as the last one:
Having the same hanging issue at the Grub 1.5 stage on brand new R200 Dell servers running OpenSuse 10.3. The terminal timeout is set to 10 and we get 10 press any key to continue messages and then a full system hang requiring a hard reboot.
If we do press any key on a connected console (using Dell's Serial Over Lan) or locally before then end of the timeout then it boots fine so seems to be a bug in continuing at the end of the wait time.
Removing the terminal line from /boot/grub/menu.1st seems to fix the issue on our servers. The console in this case is sent by BMC to both the local screen and the remote console with no timeout so works a treat. This may only work with Dell's BMC/SOL but thought I'd mention it in case anyone else has spent a day getting frustrated with this like we have.
This worked a treat, with the added bit of weirdness that I had two "terminal" lines:
terminal --timeout=2 serial console
serial --unit=0 --speed=9600
default=0
timeout=5
serial --unit=1 --speed=115200
terminal --timeout=5 serial console
and now I have one:
terminal --timeout=2 serial console
serial --unit=0 --speed=9600
default=0
timeout=5
serial --unit=1 --speed=115200
# terminal --timeout=5 serial console
Yes, I know that's redundant, but again: it worked on the other three machines.
I don't know if this is a problem with Grub, with Dell's firmware or something else, but Gott im Himmel I hate bugs like this.
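A check like the following would have flagged the doubled-up directive a lot faster than eyeballing four config files. It's only a minimal sketch in Python: the function name active_directives is made up for illustration, and the sample text is just the menu.lst snippet from above, not a full config.

```python
def active_directives(menu_lst_text, name):
    """Return (line_number, line) pairs for uncommented lines whose first word is `name`."""
    hits = []
    for number, raw in enumerate(menu_lst_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out directives
        if line.split()[0] == name:
            hits.append((number, line))
    return hits


# The snippet from the broken machine, as shown above.
BEFORE = """\
terminal --timeout=2 serial console
serial --unit=0 --speed=9600
default=0
timeout=5
serial --unit=1 --speed=115200
terminal --timeout=5 serial console
"""

# The same snippet with the second "terminal" line commented out.
AFTER = BEFORE.replace(
    "terminal --timeout=5 serial console",
    "# terminal --timeout=5 serial console",
)

if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        hits = active_directives(text, "terminal")
        print(label, len(hits), hits)
```

Run against the "before" text it reports two active terminal lines; against the "after" text, one. It says nothing about why the second line hangs this one machine, only that it's there.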
As recycled by Bradley M. Kuhn on identi.ca, here's another tool for recovering a dead hard drive: a toaster oven.
I'm not sure exactly where I saw that DRAC6 Express does not do console redirection -- it was on a mailing list somewhere -- but that turns out to be just wrong:
(For the record, it was the "External Serial Connector" in BIOS that got me; it should be "serial device 1", not "Remote Access Device".)
I can now SSH to the DRAC and get a console just fine. I wish to apologize to Dell, the people of Monaco and the constellation Sagittarius.
Ran into a little problem this week when I tried to do a restore from a backup at work. Bacula loaded the tape, then said it couldn't read the label. Wha?
After much investigation, during which I completely neglected to cut-n-paste the error messages, I think I've figured out what happened:
Ack. Needless to say, this was not good. Fortunately, the file in question was not a terribly important one; unfortunately, that's about the last 2 weeks of incrementals gone. Lesson learned: don't assume your backup program knows what's going on when hardware reboots from under it.
In other news: on Thursday I got 5 new Dell servers. Woot! One of 'em will be our new LDAP/web/email/FTP server (Xen ftw!); the rest are going to be running protein search engines for various researchers across BC. They're racked and I'm stoked, except that it turns out the difference between the DRAC6 Express and Enterprise, besides a few hundred dollars, is that the Enterprise does console redirection and the Express doesn't. Dammit.
I'm going to see if there's any trickery that can be done, but I'm not holding out hope. I have got a 32-port console server, but it's two racks away...might have to run a small batch o' cables up and over to make this work.
I've run into an interesting problem with the new backup machine.
It's a Sun X4240 with 10 x 15k disks in it: 2 x 73GB (mirrored for the OS) and 8 x, um, a bunch (250GB?), RAID0 for Bacula spooling. (I want fast disk access, so RAID0 it is.) RAID is taken care of by an onboard RAID card, so these look like regular disks to Linux.
Now the spool disk works out to about 2.2TB or so -- which is big enough to make baby fdisk cry:
WARNING: The size of this disk is 2.4 TB (2391994793984 bytes). DOS partition table format can not be used on drives for volumes larger than 2.2 TB (2199023255040 bytes). Use parted(1) and GUID partition table format (GPT).
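That 2.2 TB figure isn't arbitrary: a DOS partition table stores sector numbers and counts in 32-bit fields. Here's a minimal sketch of the arithmetic in Python, assuming classic 512-byte sectors; the constant and function names are just for illustration.

```python
SECTOR_BYTES = 512          # assumption: classic 512-byte sectors
MAX_SECTORS = 2**32 - 1     # largest value a 32-bit sector field can hold

DOS_LIMIT_BYTES = MAX_SECTORS * SECTOR_BYTES
DISK_BYTES = 2391994793984  # disk size reported in the fdisk warning


def fits_dos_label(size_bytes):
    """True if a volume this size can be described with 32-bit sector counts."""
    return size_bytes <= DOS_LIMIT_BYTES


if __name__ == "__main__":
    print(DOS_LIMIT_BYTES)                # the limit quoted in the warning
    print(fits_dos_label(DISK_BYTES))     # the spool disk is over the limit
    print(DISK_BYTES - DOS_LIMIT_BYTES)   # bytes that would be unaddressable
```

The computed limit comes out to 2199023255040 bytes, the same number fdisk prints.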
Well, okay, haven't used parted before but that's no reason to hold back. I follow directions and eventually figure out that mkpart gpt ext3 0 2392G will do what I want. GPT? Piece of cake! And then I rebooted, and I couldn't boot up again. Blank screen after the POST. Crap!
The first time this happened, the reboot also coincided with some additional problems during the POST where too many cards were trying to shove their ROM into the BIOS memory (or some such); I thought the two were connected. But then I did it again today, and I finally started digging.
The problem is that parted overwrites the MBR when setting up a GPT disklabel. This has been noted and argued over. My understanding of the two sides of the debate is:
Meanwhile, the parted camp has a number of bugs dealing with this very issue, two opened a year ago, and none have any response in them.
This enterprising soul submitted a patch back in December 2008, which appears to have fallen to the floor.
As for me, I was able to convince the BIOS to boot from the smaller disk, and then get a rescue CentOS image going via PXE booting, and then reinstall grub on the smaller disk. Sorted. All I had to do was change root (hd1,0) to root (hd0,0) in grub.conf.
A touch anti-climactic after all that, perhaps. But it was interesting a) to learn about all this (I hadn't really thought about successors to the DOS partition format before), and b) to see what a slender thread we (okay, I) hang our hopes on sometimes. It's a necessary, sobering thing to realize how much of what I use, depend on, believe in is created by volunteers who are smart, hard-working people -- they argue and focus and forget just like real people, not inhabitants of some shining city on a hill I sometimes take them for ("Next beer in Jerusalem!").
I'm back at work after a week off. The UPS control panel continues to work (!), but there is no word back from the manufacturer (says the contractor who installed the thing and filed the ticket). I find this troubling; either the manufacturer really hasn't got back to us yet (bad), or I should have insisted on being a contact for the ticket. I'll have to sort this out tomorrow.
Spent much of my day tearing my hair out over mod_proxy_html. Turns out that, by default, it strips the DTD from the HTML it proxies; this is a problem for one app that we're proxying. Not only that, the DTDs it does support are HTML, XHTML, and either with a "Transitional"/Legacy flag — but no URI to a DTD, like the one pointing to the Loose DTD that our app uses and the damned thing threw to the floor. (Sorry, brain cells on strike today and my ability to write clearly is going downhill.)
You can specify your own DTD, including a URI (undocumented feature, whee!), and thus put back in the original — but it doesn't append a newline, there's no way to append a newline that I could figure out, and so it mushes the DTD together with the first html opening tag and makes baby Firefox cry and render the page badly.
My rule of thumb for a long time was that if I start looking at source code, I'm in over my head. I'm starting to think that may not be entirely true anymore, that I've advanced to the point where I can read C (say) and generally understand what's going on. But when I start looking for API documentation for Apache 2.2 (surprisingly hard to find) to find out if, say, ap_fputs or apr_pstrdup chomp newlines or something (near as I can tell, they don't), or just what AP_INIT_TAKE12 takes as arguments…well, then I am in over my head. If nothing else, I don't want to make some silly error because I don't know what the hell I'm doing. (That's not a slam against the Debian folks; I just mean that I felt shivers when I read about that, because I dread making the same sort of highly-visible, catastrophic error) (unlike the rest of the planet, you understand).
Full day:
Dress rehearsal includes checking to see if you can, in fact, unrack something. I was unable to move a switch this morning because it was stuck behind a PDU. Arghh.
The saga of our crashing UPS continues. The techs came out to visit this morning, which meant I needed to schedule downtime so they could bypass the UPS manually. They were unable to find any smoking gun (or capacitors), and need to confer with HQ again. Best case: the UPS control panel continues to work, and they can do the next round of work w/o a manual bypass. Worst case: the control panel crashes again, and we schedule another round of downtime.
Gave a tour of the new server room today to about 30-odd people in the department. Ended on a bit of a low note ("and that's the end! Any questions?") but other than that it went well. Even got an ounce of champagne at the end of it.
Oh, and yesterday I found out that our SL-500 has three fibre channel interfaces, compared to the one interface in the server we bought. I think the sales folks assumed we had a fibre switch, and I didn't realize it all (data + control) wouldn't go over one cable. Arghh.
Just saw a character named Terence on "Entourage" who was not Terence Stamp. Now I want to see "Bowfinger" and "The Limey", in that order.
Given the recent hoo-ha about abandoned blogs, and my own tendency to lose interest in writing about something the longer I put it off (I haven't graphed it, but I suspect it's a nice exponential decay), I figured I should finally write up what I've been doing the last week: the move at $WORK to our new server room.
So: construction finally got finished on our new server room. Our UPS was installed, our racks set up, and the keys handed over (though they were to be changed again twice). Our new netblock was assigned, the Internet access at the new location was in place, and movers were booked.
Things I did in advance which helped immensely:
Last Thursday morning, it all started. I got the machines shut down (thank you, SSH and ubiquitous wireless access at UBC) before the two volunteers who were helping me showed up. We started getting machines unracked; since it was only about 20 machines, I figured it wouldn't take too long. While that was true, I had not counted on the rat's nest of power cables (our power requirements were such that we had to connect machines to PDUs in adjacent racks), or the fact that we wouldn't be able to disassemble that 'til we'd got the machines out.
There was one heartstopping moment: a 1U server, while extended on its rails, came off one of the rails while no one was supporting it. Amazingly the other rail held on while it rotated quickly through 90 degrees to bang loudly against the rack. "You swear quickly," the movers remarked. (Doubly amazingly, the machine seems to be fine, though the rails for the thing are shot.)
The movers were big and burly, which was wonderful when it came to moving the Thumper. I weigh more than it does, but not by much, and I'd had the bad fortune to screw up my back a week before the move. It was tricky trying to figure out how to remove it from the rails, but the movers' trick of supporting it with a couple of big blankets, while fully extended from the rack, made such considerations less urgent. Eventually we got it figured out. I don't know how that could have gone smoother, since we'd got Sun to rack the thing and, frankly, it's not like you spend a lot of time un- and re-racking something like that. Anyhow, a minor point.
The new location was right around the corner, which was handy. The movers had put the servers in these big laundry-like carts on wheels; in the end, we only had four of 'em. We got the machines unloaded, racked the Thumper with the movers' help, signed the paper, then went off for lunch where we picked up two more volunteers.
After that, we started racking servers. Having only one sysadmin around (me) proved to be a bottleneck; the volunteers had not worked with rackmounted machines before, and I kept having to stop what I was doing to explain something to them. It would have been a great help to have another admin around; in fact, I think this is the biggest move I'd want to make without some other admin around.
Problems we ran into:
Things that went well:
I'm going to post this now because if I don't, it'll never get done. I may come back and revise it later, but better this than nothing at all.
This has been one of those days where all I've done is stare at monitors too closely.
I know, I'm a sysadmin, what do I expect? But some days I get up, move around; I'm sedentary (and introverted) by nature but I try to talk to people, stare off into the distance, get away from my desk. Going to the server room is always a good break.
Not today, though. My carefully-chosen ATI video card (the Radeon 4550) is giving me headaches, metaphorical and real:
Dual monitors is important. My own damn fault for not getting something old enough...
NetSNMP uses 32-bit counters for disk sizes. Guess what happens when you've got one of these?
Due to be fixed in the next release, so at least that's something.
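For illustration only, here's a minimal Python sketch of what wrapping at 32 bits looks like. The signed counter, the 1 KB units and the 3 TB disk are all assumptions picked to show the effect, not details taken from NetSNMP itself, and as_int32 is a made-up helper name.

```python
def as_int32(value):
    """Reinterpret an integer as a signed 32-bit value, wrapping on overflow."""
    value &= 0xFFFFFFFF                      # keep only the low 32 bits
    return value - 2**32 if value >= 2**31 else value


# Hypothetical disk: 3 TB, reported in 1 KB (1024-byte) units.
DISK_BYTES = 3 * 10**12
SIZE_KB = DISK_BYTES // 1024

if __name__ == "__main__":
    print(SIZE_KB)             # the real size in KB
    print(as_int32(SIZE_KB))   # what a signed 32-bit counter would hold instead
```

Anything up to 2**31 - 1 survives untouched; one past that and the number goes negative, which is about as useful for graphing disk usage as you'd expect.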
Actually for a whole office. Excellent reading. Wish I'd known about this at $JOB-2...
With the move to the server room coming up in a couple months, I've been spending some time trying to lay out the racks we'll have there. My current layout is in an OpenOffice spreadsheet; I thought I'd try some other tools and see how they shape up.
Still sticking with a spreadsheet for now; it's not the best, but it is flexible and quick. Any other tools I missed?
We've got a new server room being built right now; it should be done in about six weeks, so I'm putting together an order for bits and pieces that I'll need.
I've mentioned before that cable management is one thing I get obsessed about, so this site is like porn for me. I'm not shilling for them; haven't ordered from them, no idea if they kill puppies in their spare time or what, but holy CRAP this is all the stuff I've ever wanted: RipWrap (so that's what it's called!), label printers, 87 varieties of zap straps, and I don't know what all.
Wow. Just wow.
Edit: Okay, seriously. There's some really good stuff in here among the advertisements.
Okay, I feel like a bit of a tool for never realizing how cool suspend-to-ram is in a laptop. My new laptop for work is a Dell D630, which I'd got 'cos its hardware is pretty much completely compatible w/Linux. However, I've also figured out that a) Ubuntu does suspend-to-ram quite nicely (aside from a couple times when the keyboard doesn't work, but closing/reopening the lid makes it work), and b) it just sips -- sips, I tell you! -- from the battery.
Now to try and make it work on my own laptop, which is currently sitting at the shop waiting for me to pick it up.
Today's agenda:
See? I am still a sysadmin.
Last week was reading week here at UBC. Monday I was off sick. Tuesday we got an email from the folks at the building where we've got guest access to one of their server rooms: the cooling was being shut down from 7am on Wednesday to 3pm on Thursday, so we'd have to turn off our servers. We're guests, so it's not like we've got a lot of say in the matter.
Natch, Thursday 3pm came and went. We got an email at 3:45pm from a manager there, saying that unexpected problems had arisen; they were hoping to have things back up by the weekend. That night I pointed our website at a backup server; it was not serving my boss' big web app, as there was no way to make that tiny little box serve a nearly 1TB database.
Friday I obsessed over the ambient temperature on our firewall (which I'd left turned on); it hovered around 35C. Around 10am we were told that they were hoping to have it on later that day, but that another shutdown might need to be scheduled for the next week (this week). At noon we were told that things were looking hopeful, but they couldn't guarantee cooling over the weekend.
At 2pm I found a local A/C rental agency who told us they'd be out to look at the room on Monday. 4pm I emailed my contact at the other department, plus his manager, to ask for updates and whether any further shutdowns could be scheduled after we'd arranged for cooling.
Over the weekend I obsessed over the temperature some more; it had dropped to 21C and stayed there, but without feedback from the facilities people I was reluctant to trust it.
Monday (yesterday; wow, time flies) we were told that the cooling system should perform well; however, a part still needed to be replaced. It was on order and would be coming in late this week or early next, and would require a four-hour outage.
This morning the cooling guy visited (he was at a funeral yesterday, so fair enough) and said that, yep, we could get a nice portable unit in for around $400 for a week.
I'm not writing this down because I'm proud of how I handled this. I'm writing this down so that someone else can maybe learn the things I should've known:
I have a habit of thinking "There's not much that can be done about that." Actually, it goes even further than that; it doesn't occur to me sometimes to think about what can be done. I'm not sure if this is lack of confidence, or trying too hard to get along, or just sheer laziness, but I'm trying hard to stop doing that.
"Phycicists are fun to be around. I was watching TV with one, and a commercial came on for OxyClean. The announcer's voice comes in, strong and deep, and says, What's the most powerful force in the universe? The guy I'm with starts pumping his fist and chanting, Strong nuclear force! Strong nuclear force! The announcer comes back and says, That's right, oxygen! Poor bastard looked like someone just shat in his ear."
(Conversation with a friend just now.)
Two things that didn't work:
Explanation: there's ou=Smith and ou=Jones, both of which are under ou=People,dc=example,dc=org. Smith wants to offer Jones the use of a few of his machines, which means setting up accounts for Jones and a few of his folks (cn=Alice, cn=Bob, and cn=Charlie). Obviously, these should be in ou=Jones, right? But if Smith's machines, through the wonders of pam_ldap, are set to check ou=Smith, how do Jones' folks log in?
(Digression: actually, Smith's machines right now check under ou=People — not ou=Smith,ou=People. Smith is the first one to use LDAP, so I stuck with that. I was going to change that at some point anyway, and I thought this might be a good chance to do just that.)
I thought I could try adding an alias, under ou=Smith, that'd point to cn=Alice,ou=Jones. And if I told LDAP that it was a posixAccount as well, then I could look at the account details with id and getent. But the logs showed that it just didn't work:
pam_ldap: error trying to bind as user "uid=Alice,ou=Jones,ou=People,dc=example,dc=org" (Inappropriate authentication)
Couldn't track down the error quickly, so went to plan B: stick with the current setup (machines checking ou=People) and put 'em under ou=Jones. I can always add host restrictions later on.
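The underlying containment problem is easy to see if you treat DNs as paths. This is a minimal sketch in Python, not how pam_ldap actually does its lookup: it only models "is this entry somewhere under that search base", it ignores escaped commas in DNs, and the helper names rdns and is_under are made up for illustration.

```python
def rdns(dn):
    """Split a DN into lowercased, trimmed RDNs (escaped commas are not handled)."""
    return [part.strip().lower() for part in dn.split(",")]


def is_under(dn, base):
    """True if `dn` equals `base` or sits somewhere beneath it in the tree."""
    dn_parts, base_parts = rdns(dn), rdns(base)
    if len(dn_parts) < len(base_parts):
        return False
    return dn_parts[-len(base_parts):] == base_parts


ALICE = "uid=Alice,ou=Jones,ou=People,dc=example,dc=org"
PEOPLE = "ou=People,dc=example,dc=org"
SMITH = "ou=Smith,ou=People,dc=example,dc=org"

if __name__ == "__main__":
    print(is_under(ALICE, PEOPLE))  # machines checking ou=People can see Alice
    print(is_under(ALICE, SMITH))   # machines checking ou=Smith cannot
```

Which is plan B in a nutshell: leave the machines pointed at ou=People and Jones' folks are in scope wherever they live under it.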
Explanation: Smith had a bunch of these machines at another location before getting server room space at UBC (and new servers). My access to them previously was via SSH only — there was no console access at all (sigh). Now they're at UBC, and one of 'em's gonna be my monitoring machine/second LDAP server ("The new server room: now with redundancy!") But while it was simple to turn on console redirection and choose PXE booting from the comfort of my office, I ended up borking the kickstart process and having to come back here anyway to set up the install. There's the BMC, which apparently I can access via the serial console if I so choose, but I'm still trying to figure out what that'll get me — ie, I can't find a manual in 11 seconds, so I'm putting that off for now.
Oh, and my new (work) laptop is in. Yay! It's a Dell D630, and aside from its obscene footprint compared to my (ailing) C400, it's great. Ubuntu (Hardy for compatibility with the desktops here) is on so far, and CentOS (server work) and OpenBSD (instant firewall) aren't far behind.
I can't believe it...my youngest son, after nearly three weeks of being up four or five times each night, slept nearly all the way through without a break: he only woke up at 1am and 5:15am, which is close enough to my usual wakeup time as makes no difference. It was wonderful to have a bit of sleep.
This comes after staying up late (11pm!) on Sunday bottling the latest batch of beer, a Grapefruit Bitter recipe from the local homebrew shop. You know, it really does taste like grapefruit, and even this early I'm really looking forward to this beer.
My laptop has a broken hinge, dammit. I carry it around in my backpack without any padding, so I guess I'm lucky it's lasted this long. Fortunately the monitor still works and mostly stays upright. I've had a look at some directions on how to replace it; it looks fiddly, but spending $20 on a new set of hinges from eBay is a lot more attractive than spending $100. Of course, the other consideration is whether I can get three hours to work on it….But in the meantime, I've got it on the SkyTrain for the first time in a week; it's been hard to want to do anything but sleep lately.
Work is still busy:
I'm trying to get tinyMCE and img_assist to work with Drupal
Contacting vendors to look at backup hardware. So far we're looking at the Dell ML6010 and the Sun SL500. They're both modular, which is nice; we've got (low) tens of TB now but that'll ramp up quickly. The SL500 seems to have some weird things; according to this post, it takes up to 30 minutes to boot (!) and you can't change its IP address without a visit from the service engineer (!!). Those posts are two years old, so perhaps things have changed.
Trying to figure out what we want for backup software, too. I'm used to Bacula (which works well with the ML6010) and Amanda, but I've been working a little bit with Tivoli lately. One of the advantages of Tivoli is the ease of restoring it gives to the users…very nice. I'm reading Backup and Recovery again, trying to get a sense of what we want, and reviewing Preston's presentation at LISA06 called "Seriously, tape-only backup systems are dead". So what do we put in front of this thing? Not sure yet…
Speaking of Tivoli, it's suddenly stopped working for us: it backed up filesystems on our Thumper just fine (though we had to point it at individual ZFS filesystems, rather than telling it to just go), then stopped; it hangs on files over a certain size (somewhere around 500kb or so) and just sits there, trying to renew the connection over and over again. I've been suspecting firewall problems, but I haven't changed anything and I can't see any logged blocked packets. Weird.
Update: turned out to be an MTU problem:
I had no idea there were GigE NICs that did not support Jumbo frames. Though maybe that's just the OpenBSD driver for it. Hm.
I knew I didn't like Vaios very much, but I had no idea they were so awful -- to the point of requiring hacking on your goddamn BIOS to enable VMX.
The flash demo for Dell's ML6000 tape library boasts that it's "completely self-aware". Not sure I want SkyNet running my backups…
O'Reilly has an upcoming webcast on -- deep breath -- "Advanced Twitter for Business". (At least they didn't call it a webinar.) When I told my wife about this, she said "So...you and O'Reilly break up yet?"
Obviously not, because I've just ordered Backup and Recovery and Linux Clusters with Oscar, Rocks, OpenMosix and MPI. I had purchased B&R at my last job, but this is for me.
And did I mention the dream I had a while back about a Sun laptop that looked like an X4200 server folded in half? In the dream it ran nearly perfectly, except when you tried to go to a web page with flash; then it would crash, and a movie of Matt Stone would play, apologizing on behalf of Jonathan Schwartz and everyone else at Sun.
I'm playing with the CVS version of Emacs after reading about some of the new features in what will become Emacs 23. It's nice, but the daemon mode isn't quite multi-tty — you can run Emacs server, detached from any TTY, but if you try connecting to it with multiple emacsclient instances, the first one is where all the TTY action goes. Not sure what I'm missing.
I'm in the process of setting up a bunch of new servers for $job_2. All but one are CentOS 5.2, kickstart installed and managed with cfengine. This is the third time I've gone through a cfengine setup, and it always feels like starting from scratch each time. It seems -- and I'm not at all sure this is fair or accurate -- that each time I set up one of these systems, there's a lot that I've lost from the last time and have to relearn. I'm fortunate this time that I can refer to $job_1's setup to see how I did things last time, but if I didn't have that I'd be significantly further behind than I am.
I'm not sure what the solution is. Part of me thinks I should just be more aggressive about taking notes, or committing stuff to a private repository, or writing it down here more; part of me thinks that this might be a clue that cfengine is too low-level for my head. It feels like when I was trying to learn C, and couldn't believe that I had to remember all this stuff just to print something, or read a file, or connect to another machine over the Internet. By contrast, Perl (or any other scripted language) was such a relief...just print, or open, or use the Net::Telnet module, or whatever. The details are there and they are important, sometimes very much so; that doesn't mean I want to learn more metallurgy every time I need a fork. (No, I don't think that metaphor's tortured; why do you ask?)
Another thing is that I'm trying to get multipath connections working for the first time. We've got two database servers, each of which is connected via dual SAS HBAs to outboard disk arrays. (I don't think anyone else calls them "outboard", but I like the sound of it. See this hard drive? It's outboard, baby!) The arrays are from Sun and come with drivers, but the documentation is confusing: it says it's available for RHEL 5 (aka CentOS 5), but the actual download says it's only for RHEL 4.
As a temporary respite, I'm trying to see if I can get these working using Linux's own multipath daemon, and it's also confusing. The documentation for it is tough to track down, and I just don't understand the different device names: am I meant to put /dev/dm-2 in fstab, or /dev/mpath/mpath2p1? If the latter, why does the name sometimes change to the WWID (/dev/mpath/$(cat /dev/random)) when I restart multipathd? (user_friendly_names is uncommented in the config file.) If the whole point of multipath is failover, why does this sequence:
(where /mnt is where I've got this array mounted, obvs) sometimes work, and sometimes end with "I/O error" being logged, and the filesystem being read-only? Is this the sort of thing that the Sun driver will fix? I can't find anything about this.
And I mentioned electrical problems. When we got our servers installed, the Sun guys told us they'd tripped breakers on the PDU and/or breakers in the room's electrical cabinet. Since it had a sign on it saying "100A", I figured we might be running up against power limits -- either in the room as a whole, if my figures were 'way out, or on individual PDUs. Turns out I was probably wrong: I missed the bit on the sign that said 3-phase, which means (deep breath) we probably have 3 x 100A power available (I think).
It's more complicated than that, because some of it is in 120V, some of it is in twist-lock 220V 30A circuits, and so on. But I should've checked before emailing the faculty member who, in a year or two, will be going into this room (we're there as guests of the department) and happens to sit on the facilities committee. He had asked how we were doing, so I sent him an email -- nice, polite, and including a bit about how grateful we were for the room and the help of the local sysadmins (all of which is true).
I was under the impression that he was asking for info now, so that he could bring it up for action in a few months when we were out. Instead, two hours later when I'm swearing at multipath, in come the facilities manager and one of the sysadmins I was dealing with, looking to find out just how much power we were using anyhow. I apologized profusely, and they were very cool about it. But when the committee guy asks questions, people jump. I had not anticipated this. Welcome to University Politics 101. I emailed again and explained my mistake.
There are lots of remedial courses I could take. However, today I would most like to take "Electricity and wiring for sysadmins".
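As a first homework assignment for that course, here is the "3 x 100A" arithmetic from above as a minimal sketch. It assumes a 120/208V wye service (120V from each phase to neutral, 208V between phases), which is a guess on my part; the sign only said "100A" and "3-phase". The function names are invented for the example.

```python
import math

# Sketch only: the 120/208V wye service is an assumption, not something
# read off the panel. Results are apparent power in volt-amps (VA).


def three_phase_va(line_to_neutral_volts, amps_per_phase):
    """Total VA as three single-phase legs added together."""
    return 3 * line_to_neutral_volts * amps_per_phase


def three_phase_va_from_line_voltage(line_to_line_volts, amps_per_phase):
    """Same quantity from the phase-to-phase voltage: sqrt(3) * V * I."""
    return math.sqrt(3) * line_to_line_volts * amps_per_phase


if __name__ == "__main__":
    print(three_phase_va(120, 100))
    print(round(three_phase_va_from_line_voltage(208, 100)))
```

Both routes land at roughly 36 kVA for the whole panel, before any derating or the split between 120V and 220V circuits is taken into account.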
And on another note: Ack! My laptop's home partition is 93% full! How the hell did that happen?
And again: How did I not know about apt-file? This is perfect!
(Touch o' the hat to Tears For Fears and Steve Kemp; I'm moving closer every day to switching to Chronicle.)
The last few weeks, I've been setting up a small (5 racks) server room with the purchases that $OTHER_JOB recently made: 10 Sun X4140s, 2 — wait, 4 — X4240s, and one Thumper.
It's occupied a lot of my time, and before I lose the impulse, or fall asleep on my feet (second kid up at 4:30am for the last week or so; simultaneous discovery that at 4:30am I have a hard time getting back to sleep), I want to put down the things I learned.
ldap_cachemgr does not like being told to connect to an IP address (via an entry in ou=Profiles) via SSL, and have the CN be a hostname instead. This took me a while to figure out.
But...my first batch of homebrew beer has been bottled, and a second brew day is coming up on Saturday. And apparently I'm not the only sysadmin who brews...though I'm not nearly ready to do all-grain just yet.
Seen while applying software updates to a new Mac at $WORK:
The Aluminum Keyboard Firmware Update will update the keyboard firmware on your aluminum Apple Keyboard. Important: Do not interupt the update, your keyboard will not function while it is being updated.
I guess a mouse crashing is not entirely out of the question...
The good thing about being up at 3am is that, with a laptop, you can keep yourself entertained by whipping up a quick spreadsheet of the rack, switch and console server layout for the new server room.
The bad thing is that you may not trip over Sun's handy-dandy power calculators (like for the X4140 or the X4440) until the next day, leaving you twelve hours to wonder blearily if you've blown your server room's power budget all in one go.
Work...hell, life is busy these days.
At work, our (only) tape drive failed a couple of weeks ago; Bacula asked for a new tape, I put it in, and suddenly the "Drive Error" LED started blinking and the drive would not eject the tape. No combination of power cycling, paperclips or pleading would help. Fortunately, $UNIVERSITY_VENDOR had an external HP Ultrium 960 tape drive + 24 tapes in a local warehouse. Hurray for expedited shipping from Richmond!
Not only that, the Ultrium 3 drive can still read/write our Ultrium 2 media. By this I mean that a) I'd forgotten that the LTO standard calls for R/W for the last generation, not R/O, and b) the few tests I've been able to do with reading random old backups and reading/writing random new backups seem to go just fine.
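The compatibility rule I'd half-forgotten can be written down as a small sketch. It encodes the usual LTO rule for generations 1 through 7 (write one generation back, read two generations back); later generations changed the rule, so the function refuses them. The function name and return strings are made up for the example.

```python
# Sketch only: covers LTO-1 through LTO-7, where a drive writes tapes of
# its own and the previous generation and reads two generations back.


def lto_access(drive_gen, tape_gen):
    """Return 'read/write', 'read-only' or 'none' for a drive/tape pair."""
    if not (1 <= drive_gen <= 7) or tape_gen < 1:
        raise ValueError("this sketch only covers LTO generations 1 to 7")
    if tape_gen > drive_gen:
        return "none"          # a drive cannot use newer media
    gap = drive_gen - tape_gen
    if gap <= 1:
        return "read/write"
    if gap == 2:
        return "read-only"
    return "none"


if __name__ == "__main__":
    print(lto_access(3, 2))   # the new Ultrium 3 drive with our Ultrium 2 tapes
    print(lto_access(3, 1))
```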
Question for the peanut gallery: Has anyone had an Ultrium tape written by one drive that couldn't be read by another? I've read about tapes not being readable by drives other than the one that wrote it, but haven't heard any accounts first-hand for modern stuff.
Another question for the peanut gallery: I ended up finding instructions from HP that showed how to take apart a tape drive and manually eject a stuck tape. I did it for the old Ultrium 2. (No, it wasn't an HP drive, but they're all made in Hungary...so how many companies can be making these things, really?) The question is, do I trust this thing or not? My instinct is "not as far as I can throw it", but the instructions didn't mention anything one way or the other.
In other news, $NEW_ASSIGNMENT is looking to build a machine room in the basement of a building across the way, and I'm (natch) involved in that. Unfortunately, I've never been involved in one before. Fortunately, I got training on this when I went to LISA in 2006, and there's also Limoncelli, Hogan and Chalup to help out. (That link sends the author a few pennies, BTW; if you haven't bought it yet, get your boss to buy it for you.)
As part of the movement of servers from one data centre across town to new, temporary space here (in advance of this new machine room), another chunk of $UNIVERSITY has volunteered to help out with backups by sucking data over the ether with Tivoli. Nice, neighbourly thing of them to do!
I met with the two sysadmins today and got a tour of their server room. (Not strictly necessary when arranging for backups, but was I gonna turn down the chance to tour a 1500-node cluster? No, I was not.) And oh, it was nice. Proper cable management...I just about cried. :-) Big racks full of blades, batteries, fibre everywhere, and a big-ass robotic Ultrium 2 tape cabinet. (I was surprised that it was 2, and not U3 or U4, but they pointed out that this had all been bought about four or five years ago…and like I've heard about other government-funded efforts, there's millions for capital and little for maintenance or upgrades.)
They told me about assembling most of it from scratch...partly for the experience, partly because they weren't happy with the way the vendor was doing it ("learning as they went along" was how they described it). I urged them to think about presenting at LISA, and was surprised that they hadn't heard of the conference or considered writing up their efforts.
Similarly, I was arranging for MX service for the new place with the university IT department, and the guy I was speaking to mentioned using Postfix. That surprised me, as I'd been under the impression that they used Sendmail, and I said so. He said that they had, but they switched to Postfix a year ago and were quite happy with it: excellent performance as an MTA (I think he said millions of emails per day, which I think is higher than my entire career total :-) and much better Milter performance than Sendmail. I told him he should make a presentation to the university sysadmin group, and he said he'd never considered it.
Oh, and I've completely passed over the A/C leak in my main job's server room…or the buttload of new servers we're gonna be getting at the new job…or adding the Sieve plugin for Dovecot on a CentOS box...or OpenBSD on a Dell R300 (completely fine; the only thing I've got to figure out is how it'll handle the onboard RAID if a drive fails). I've just been busy busy busy: two work places, still a 90-minute commute by transit, and two kids, one of whom is about to wake up right now.
Not that I'm complaining. Things are going great, and they're only getting better.
Last note: I'm seriously considering moving to Steve Kemp's Chronicle engine. Chris Siebenmann's note about the attraction of file-based systems for techies is quite true, as is his note about it being hard to do well. I haven't done it well, and I don't think I've got the time to make it good. Chronicle looks damn nice, even if it does mean opening up comments via the web again…which might mean actually getting comments every now and then. Anyhow, another project for the pile.
Just had a repeat of the weird mouse-X disconnect I've encountered before. This time though, I'm running Debian Etch — so no more blaming the problem on SuSE (as I secretly always did :-).
One noticeable problem this time was that the middle button did not work, making click-to-paste impossible; I even ran xev and saw no events for middle-clicking. (This in addition to clicking being inconsistent, the client receiving the click being inconsistent, etc). Running cat /dev/input/mouse0 did not work. What did work was disconnecting the mouse (a USB 3-button optical jobbie), then plugging it back in. Sure, coulda been the mouse driver, or X, or something, but I wonder if the hardware itself -- whatever little controller chip is in there -- maybe got wedged. Interesting to think about…
Yesterday I spent the day at work testing our installation of APCUPSd and tidying up the goram rat's nest of network and electrical cables my predecessor left me.
APCUPSd worked with only a few hitches:
/etc/nologin, stuck there from the shutdown. This prevented a login prompt from coming up even in the console, without any sort of warning. Arghh.
As for the cleanup: satisfying. I'm no longer quite so ashamed of the server room.
New Dell 2950 server. 2 x quad-core Xeons, 2 x 6MB cache on each die, 16GB RAM, 6 x 300GB SAS 10K SCSI drives in a RAID-6 array using the PERC/6 controller.
/usr/src/linux-source-2.6.18# time make -j 9 bzImage
[snip]
Root device is (8, 3)
Boot sector 512 bytes.
Setup is 7295 bytes.
System is 1222 kB
Kernel: arch/i386/boot/bzImage is ready (#1)
real 0m22.668s
user 2m20.425s
sys 0m14.537s
That's just insane.
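To put a number on "insane": dividing CPU time (user + sys) by wall-clock time (real) gives a rough count of how many cores were kept busy. A minimal sketch, with helper names invented for the example, that parses the time strings as printed above:

```python
import re


def parse_time(value):
    """Convert a bash `time` string such as '2m20.425s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError("unrecognised time string: %r" % value)
    return int(match.group(1)) * 60 + float(match.group(2))


def busy_cores(real, user, sys):
    """Average number of cores in use: (user + sys) / real."""
    return (parse_time(user) + parse_time(sys)) / parse_time(real)


if __name__ == "__main__":
    print(round(busy_cores("0m22.668s", "2m20.425s", "0m14.537s"), 1))
```

That works out to about 6.8 cores busy on average, out of the eight in this box.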
I agree completely with Chris Siebenmann's entry on the utility of keeping a notebook. I've done this almost as long as I've been working in IT, and it's saved my ass repeatedly. Also, the way I keep my journal — random notes at the front working toward the back, daily summary at the back working toward the front — means that it's fairly simple to search for my notes on a particular task, or explain to management just what I do with my time.
I love paper. I tried a PDA for a while; hated it, didn't trust it, and gave it up promptly. Scribbling with a pen is faster, more satisfying, and doesn't make me wait for something to reboot or awaken, or force me to learn a different way to scribble. At the best of times, it forces me to think a bit about what I'm doing or seeing, rather than just typing blindly at the problem. (What, you never do that?)
But while a paper notebook is wonderful, it's not perfect. Here's what would be perfect:
Let me paste screen captures right into my notebook. (I'm talking both screenshots and the log files from GNU screen.)
Let me paste sections of my .history file into my notebook complete with timestamps.
Let me cut-and-paste from my notebook to Emacs (or vi, you heathens), and vice-versa.
Let everything I write or paste be timestamped automagically.
Let everything I write or paste be sync'd automagically to some plain text-like format, suitable for grepping, munging, merging into a database, pushing to syslogd, or what have you.
Matthew Garrett's presentation on Suspend-to-Disk makes fun reading.
Arlo's sick with flu or something; I was up 'til 1am last night rocking him to sleep. Haven't done that in a while…
Telling detail: I'm about to blow away Debian testing on my desktop machine and install Ubuntu's Gutsy Gibbon. Partly it's because I'm tired of installing 80MB worth of updates every two weeks, and partly it's because it'll make setting up the printer a breeze.
I'll probably leave half the drive aside for good ol' Debian stable, but Ubuntu'll stay there for experimenting and so my parents, on their next visit, will not have to bring out their 4-tonne laptop.
I'll be reinstalling Ubuntu on my laptop as well; due to a stupid error, I installed Dapper, not Gutsy. I tried updating in one fell swoop, and after three days of apt-get -f install I finally got things working…except for the boot artwork, and GDM doesn't start one time out of three. Interesting experiment, but I think I'll take a do-over.
I may even install it twice, so that I can try out The Depenguinator, which appears to be a lot easier than trying to figure out PXE booting for FreeBSD. Unlike OpenBSD, there's no readily apparent "official way" of doing it, and the handful of HOWTOs I've found have contradicted each other. At this point I'm just too lazy to keep trying and seeing what I'm doing wrong.
My workplace just got me a new cell phone: the Sony Ericsson W200a Sony Walkman Phone. The provider is Rogers; minus two points for not letting me make an MP3 into a ring tone, but plus three for letting MidpSSH work. It was a lark to be able to check mail on my firewall box; Mutt was surprisingly useful. No idea how much data costs on the plan I've got, and I don't plan on actually SSHing around very much, if at all…but still, fun. And, as mentioned elsewhere, kudos for including a USB cable and making it show up as an ordinary mass storage device.
My laptop hard drive started giving scary errors a couple days ago on the way to work (I've got a 90-minute commute by public transit [uck] so I fill the time by reading, listening to podcasts, or working on Project U-13). Fortunately, working at a university means that there are two computer stores on campus. I ran out at lunch, picked up a 100GB drive, and had things back to normal by the next morning.
Well, normal modulo one false start with Debian; I decided to try encrypted filesystems just for fun. But then I suspended, came back with a newer kernel, and it could not read the encrypted LVM group anymore. Whoops.
Still lots of free space on this thing, and I'm thinking of installing Ubuntu, FreeBSD and maybe NetBSD just for fun. Of course, I've got to do it all via PXE since this thing doesn't have any CDROM drive, but that just adds to the geek points.
Project U-13 is coming up on 0.0.3, btw; Andy suggested adding Rackmonkey, which looks quite cool. There's no package for it, so I'm having to do some rather ugly scripted installation…but I can stand it for now. And I've got the barest skeleton of a cfengine file in there too. Watch the skies!
E280R takes different SCSI drives than the E220R. Serial ports and SCSI connectors: A Study in Nemesisssysadminss. Discuss.
At work, our mail server is an aging E220R. While underpowered for all it does, it has behaved well, more or less, until recently.
A couple of months ago it power cycled itself for no apparent reason. This weekend, it did the same thing. This is exactly the same behaviour I saw from another E220R at $other_university, and in that case it got progressively worse. Another sysadmin here says he's seen the same behaviour with two in his care. I'm preparing for the worst.
Part of that has meant preparing to move its functionality to another machine; this has been an excellent chance to delve into the bowels of our mail and list system. I've been steadily improving (read: creating) this for some time now, but this points out some bits I hadn't. So that's good.
Plan C is a loaner E280R from the other sysadmin (op cit.). I ran into trouble getting it working, though. First, I couldn't get a serial console working. (Getting a serial port working always seems to be a pain for me, no matter what the machine.) It has two of the old DB-25 ports; no problem, since I had a splitter and had got that working on the E220R. Except that it didn't work: no matter which port I hooked it up to, I couldn't see any output. I tried flipping the key around to diagnostic mode, but I still didn't see anything. (The manual said that you should be able to force output to ttyA by power-cycling the machine and hitting the power button twice when the amber service LED started blinking…but I never saw the blinking.)
This was especially weird to me because I had been able to get output from the RSC card using the same setup: OpenBSD laptop -> usb serial adapter -> DB-9 to RJ-45 adapter -> Cat 5 cable -> RJ-45 on RSC card. (The only difference was that, with the DB-25 port, the Cat5 cable had fit into the back of the DB-25 splitter.) But I couldn't log into the RSC card, and a quick Google turned up no easy way of resetting its password. (Putting it into the other E280 I have, which runs our database and website, was not an option.)
Out of desperation I finally hooked up the Cat5 to the DB-25 splitter on one side, and the console server on the other…and that worked. Damned if I know what was going on.
But then I had another problem: when it booted, I kept seeing line after line of I2C reset error; after a while, it would power-cycle itself and the pattern would start again. I remembered that op cit. had slotted the second CPU for me, so what the hell: I reseated it, and that did the trick.
Next up is detaching $failing_machine's second hard drive from the mirror and seeing if I can get it to boot in the 280. Let's hope.
In other news, LinuxFest Northwest is calling for papers. Were that not right around the due date of Project U-14, I might try submitting something and see what happens. Oh well...next beer in Jerusalem!
And there's the laptop battery...shoulda charged it at work.
We had a power outage today at work. The good news is, the UPS' worked. The bad news is, the servers were not set to shut themselves down automatically, and the UPS' ran out literally two minutes before the power came back on. Arghh.
Having a flashlight in the server room is a good thing. So is making sure that your servers are all connected to switches powered by the UPS. So is making sure that you have a laptop with a charged battery and a ready-to-use serial cable connected to your otherwise-accessible-through-SSH console server. So is Sun making an x86-based OS that doesn't hang every time it reboots badly.
In other news: as mentioned on the Dragonfly BSD digest, ICANN blogs (!). They've taken this moment to let us know that the address of L.ROOT-SERVERS.NET has changed. Now you know.
Dude, my laptop screen just turned blue. I'd booted into OpenBSD (4.2) and was trying to figure out how to turn off the audible bell. I'd gone from X to a virtual console to see if the problem happened there (it did), then tried ctrl-alt-f5 to get back to X.
My laptop screen turned from black with white text to grey with grey text to light blue with dark blue text, over the course of a minute or so. I thought I'd suddenly borked the LCD screen, but when I rebooted to Debian it was all fine. Just tried switching to a console, then back to X (also in Debian), and that's fine too. Bizarre.
Just checked the logs in OpenBSD and found a series of entries like this:
Nov 1 16:47:17 laptop /bsd: agp_release_helper: mem 0 is bound Nov 1 16:47:17 laptop /bsd: agp_release_helper: mem 1 is bound Nov 1 16:47:17 laptop /bsd: agp_release_helper: mem 2 is bound Nov 1 16:47:17 laptop /bsd: agp_release_helper: mem 3 is bound Nov 1 16:47:17 laptop /bsd: agp_release_helper: mem 4 is bound Nov 1 16:47:24 laptop /bsd: agp_release_helper: mem 5 is bound Nov 1 16:47:24 laptop /bsd: agp_release_helper: mem 6 is bound Nov 1 16:47:24 laptop /bsd: agp_release_helper: mem 7 is bound Nov 1 16:47:24 laptop /bsd: agp_release_helper: mem 8 is bound Nov 1 16:47:24 laptop /bsd: agp_release_helper: mem 9 is bound Nov 1 16:47:31 laptop /bsd: agp_release_helper: mem 10 is bound Nov 1 16:47:31 laptop /bsd: agp_release_helper: mem 11 is bound Nov 1 16:47:31 laptop /bsd: agp_release_helper: mem 12 is bound Nov 1 16:47:31 laptop /bsd: agp_release_helper: mem 13 is bound Nov 1 16:47:31 laptop /bsd: agp_release_helper: mem 14 is bound Nov 1 16:47:38 laptop /bsd: agp_release_helper: mem 15 is bound Nov 1 16:47:38 laptop /bsd: agp_release_helper: mem 16 is bound Nov 1 16:47:38 laptop /bsd: agp_release_helper: mem 17 is bound Nov 1 16:47:38 laptop /bsd: agp_release_helper: mem 18 is bound Nov 1 16:47:38 laptop /bsd: agp_release_helper: mem 19 is bound
Very weird. On the bus, so Googling that'll have to wait. Although I do have the code on that partition…here we go: says it's the AGPIOC_RELEASE ioctl for agp. Aha! Maybe I'll explain money laundering while I'm at it.
And btw, here's a memo for the world: if you're on the toilet, don't take a phone call. It's really not that important.
Update, October 15 2008: Still happening with OpenBSD 4.3. And for the record, this is a Dell C400 laptop.
Turns out you can get the built-in Broadcom wireless card in my laptop (Dell C400) to work, but it did take me a bit of effort.
First off, I'd been looking at the wrong web page for the BCM43XX project — the right one, as Prakash pointed out, is much more up-to-date.
Second, again at Prakash's suggestion (thanks for that!), I downloaded the drivers for the Dell 1370. Running the .exe in Wine extracted the .sys file successfully. However, when I pointed fwcutter at them I got this message:
Sorry, the input file is either wrong or not supported by b43-fwcutter. This file has an unknown MD5sum 8d49f11238815a320880fee9f98b2c92.
So that .sys file was one not supported…at least, not for a while now. That commit message was one of the few I could find that mentioned this number. So I checked out revision 396 from the Subversion repo, compiled it and pointed at the sys file…success! Extraction!
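The error message suggests fwcutter recognises driver files by their MD5 sum and gives up on anything it hasn't seen. A minimal Python sketch of that kind of check follows; the function names and the idea of passing in a set of known sums are my own illustration, not how b43-fwcutter is actually written.

```python
import hashlib

# Sketch only: helper names and the known_sums set are hypothetical.


def md5_of_bytes(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path, chunk_size=65536):
    """Hex MD5 digest of a file, read in chunks so big files are fine."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_supported(checksum, known_sums):
    """True if the checksum is in the set of sums the tool knows about."""
    return checksum.lower() in known_sums


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(md5_of_file(name), name)
```

Running it over the extracted .sys file gives the sum to search for in old commit messages, which is more or less what I ended up doing by hand.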
Except that it still didn't work:
bcm43xx: Error: Microcode "bcm43xx_microcode5.fw" not available or load failed.
Turns out it had extracted all the files to /lib/firmware/bcm430x_*, rather than /lib/firmware/bcm43xx_*. Quick little shell-fu:
for i in bcm430x_* ; do j=$(echo "$i" | sed -e 's/bcm430x/bcm43xx/') ; sudo ln -s "$i" "$j" ; done
and it worked when next I inserted the module…working right now, in fact, despite lots of error messages like:
bcm43xx: WARNING: Writing invalid LOpair (low: 0, high: -115, index: 120)
[<d0ba6ebb>] bcm43xx_phy_lo_adjust+0x1e6/0x223 [bcm43xx]
[<d0ba7d04>] bcm43xx_phy_lo_g_measure+0x915/0xaeb [bcm43xx]
[<c01eb6db>] bit_cursor+0x479/0x48e
[<c02a4416>] __sched_text_start+0x686/0x73b
[<d0b9dde4>] bcm43xx_periodic_work_handler+0x15c/0x407 [bcm43xx]
[<d0b9dc88>] bcm43xx_periodic_work_handler+0x0/0x407 [bcm43xx]
[<c0130260>] run_workqueue+0x7d/0x109
[<c0133308>] prepare_to_wait+0x12/0x49
[<c0130a5d>] worker_thread+0x0/0xc7
[<c0130b17>] worker_thread+0xba/0xc7
[<c01331f5>] autoremove_wake_function+0x0/0x35
[<c013312e>] kthread+0x38/0x5e
[<c01330f6>] kthread+0x0/0x5e
[<c01049c3>] kernel_thread_helper+0x7/0x10
in the kernel log.
No idea why I had to go through so much rigamarole, but hopefully this will save time for someone else. Oh, and for the record: this is with Debian Etch, 2.6.22 kernel from backports.org.
I ordered the 4.2 CD set of OpenBSD at work, in another optimistic step toward reorganizing the firewall there. In order to (ahem) road-test it, I installed it on my new laptop (which, you'll recall, is running Debian Stable) in a 5GB partition I'd left for just this purpose.
Onboard wireless, like with Debian, did not work, and I didn't expect it to; fuck you too, Broadcom. But my dad offered to send out a couple of wireless cards he couldn't use, and I figured one of 'em would have to work.
One was a Broadcom (op cit.), so that was out. The other, a DWL-650 (which appears to have umpty different versions over the years with not one change in model number) looked promising: a Realtek chipset, so should be good, right?
Well, it worked on OpenBSD -- but not in Linux. There's no driver in the tree for it, and the outside project to make drivers for it had its last official release in 2005. What's more, the CVS version, for some reason, removes all of its source files when I compile it, then complains that there are no files left to compile. To be fair, I think this is because of a makefile included from /lib/modules/2.6.22-2-686/build rather than the code itself.
Update: Just read Tourrilhes' page on the RealTek driver, and learned something: there's a fork/resurrection of the project I'd looked at, and it appears to be relatively current. I'll have to take a look. SooperUpdate: the new project fixes the let's-delete-all-the-files problem. Score!
What OpenBSD does not do on this laptop is suspend -- or more accurately, come back from suspension. This works reasonably well under Debian, which means that I still have one rose to give away to The Next Laptop OS for Saint Aardvark.
The laptop I bought off eBay arrived at work on Wednesday...which is my day at home with Arlo. Thursday I was off sick with flu. Yesterday I was back at work and slashing open the box it came in, eager to see what I'd got.
Well, I already knew: it's a Dell C400. 12" screen, 1.2GHz P3 (but running at 800MHz with SpeedStep and all), 256MB RAM and a 30GB drive. Not a whole lot of memory, and a bigger hard drive would always be nice, but I can always upgrade. There's no CD drive in this thing, and I hadn't plumped for the docking station, so I set up PXE booting to install Debian. It was a trifle slow, but it worked! (Especially the second time, after I'd accidentally overwritten Debian trying to install OpenBSD on another partition. :-)
I'm surprised at how much Just Works in this thing: X.org (no configuration needed, just start up XDM...mann, that's nice), suspend-to-disk, ethernet (well, it's a 3c905; what do you expect?). Even the battery, which I'd written off in advance, appears to hold a decent charge -- about four hours so far. The one thing that's dicey is the onboard wireless, a Dell 1370 from everybody's favourite company. But again, I'd written that off in advance.
Next up: I've ordered the OpenBSD 4.2 CD set, so I'll be installing that once it arrives. And Noah has shown the way to longer battery life; I'm getting my 2.6.22 kernel now from Backports. (Oh, the shame of not compiling my own kernel...)
On another note, I think someone had one too many Dilbert moments:
$ dig newcastle.edu.au mx
; <<>> DiG 8.3 <<>> newcastle.edu.au mx
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->> HEADER <<- opcode: QUERY, status: NOERROR, id: 2
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 4, ADDITIONAL: 4
;; QUERY SECTION:
;; newcastle.edu.au, type = MX, class = IN
;; ANSWER SECTION:
newcastle.edu.au. 11h59m12s IN MX 10 proactive.newcastle.edu.au.
newcastle.edu.au. 11h59m12s IN MX 10 synergy.newcastle.edu.au.
Perhaps they got the names from /dev/bollocks.
Just updated my resume for the first time since starting my current job. It's nice to look back at what you've done and realize that, hey, there's been a lot.
In other news, I finally gave in to lust the other day and bought a Dell C400 on eBay. Nothing too special — 1.2GHz, 256MB, 30GB hard drive — but I was mainly after the 12" screen, so that I'd be able to (say) debug raw ethernet frames on my daily commute. About $280 when all was said and done; the strong Canuckistan peso was part of the incentive to buy now. Should be at the office in a week or so, and I can't wait.
It amazed me to see how many off-lease laptops were available, and just how cheap you could pick them up. A while back my boss got a D420; with extra memory and a few other things, it came in at about $1700 or so Canadian. But if you look around, there are plenty of D400s and D410s around for less than $500 -- even less than $400 if you look hard. Add another $100 (say) for a working battery, and you're in pretty good shape.
Virtualbox has made it to Debian testing — hurrah! Only it won't run (Open)?Solaris. Dang.
On Tuesday, I'm giving a short presentation on my work's subnet at SNAG, the UBC System and Network Administrator's Group. I found Bruce in OpenBSD's ports tree on my laptop; the documentation is (ahem) thin, but it works. Wish me luck.
And there's Arlo up. Time to go get him.
...it's another. Busted CPU on a Sun 440 at the university across town meant I spent a bigger part of my day on the bus than usual. Remove the CPU card/assembly/whatever (god, they're mother huge) and we're back in business.
Incidentally, it amazes me that you can turn up fully spec'd V440s on Ebay for, like, $8000 US. 4 x 1GHz CPUs, 16 GB of RAM, 4 x 72GB drives...what's not to like?
Just when I was about to sign off for the day, suddenly the mail server's down. No response to pings, no response on the console server even. It's an old E220R, and while it's underpowered for all we're asking from it, I haven't had problems with it before. (Well, except for the CDROM drive not powering up. But I can live with that.)
So drive into work with the wife and kid, on the off chance that it'll all be fine quickly. No such luck. It hadn't walked away, the cables were all still in place, and I had to power cycle it to get it to come back up. A lot of fscking later, and I'm waiting for it to finish booting. I can't remember what it was like the last time I rebooted it, but this time it seems rather ridiculous (20 minutes). More stuff to add to the documentation once I'm done…
And once more: sysadmin documentation MUST NOT depend on external services. (The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.)
Time for pizza.
My wife was using her iBook tonight when alla sudden Apple Mail
said the Inbox was read-only. Wha'? Couldn't remove or create files
from the Terminal, and /var/log/system.log
showed this message:
kernel: disk0s3: I/O error
kernel: jnl: do_jnl_io: strategy err 0x5
A lot of scary messages turned up in the search results about
replacing hard drives, memory and mainboards, but I decided to try a
fsck
for the fun of it. Splat-s sent the Apple into single-user
mode, and then fsck -f -y
said the volume had been repaired
successfully. Reboot and things look good: I can create and remove
files, and Apple Mail is fine. Interestingly, the disk said it had an
extra GB free compared to before the reboot.
The drive is old, and may still need replacing. Thankfully, I've set up a cron job on this thing to rsync the home directory daily to another machine.
This morning I noticed these entries in the logs of my monitoring machine at work:
hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hda: drive_cmd: error=0x04 { DriveStatusError }
ide: failed opcode was: 0xef
hda: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
hda: task_no_data_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: 0xef
After a lot of Googling, I managed to find a few things that explained it:
hdparm -K 1 /dev/hda. Sure enough, running this produced the error in the logs, and this one on the screen:
setting drive keep features to 1 (on)
HDIO_DRIVE_CMD(keepsettings) failed: Input/output error
This is a completely benign error, of course…I really don't care if we have to run hdparm with every boot. I had also tested the drive by booting into Knoppix and md5summing every file on the drive — no errors produced at all.
Don't know what's worse — wasting two hours on this, or not noticing it before now. At any rate, this failed opcode appears to be completely harmless.
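For the record, the "md5sum every file" test above can be sketched in a few lines of Python. This is a rough equivalent, not what was actually run from Knoppix; the function names are made up for illustration. The idea is simply to force a read of every block of every file and note any I/O errors.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Hash every regular file under root (hypothetical helper).

    Returns (digests, errors): digests maps path -> hex digest, and
    errors maps path -> the OSError raised while reading that file.
    A drive with bad sectors would be expected to show up in errors.
    """
    digests, errors = {}, {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes
            try:
                digests[path] = md5_of_file(path)
            except OSError as exc:
                errors[path] = exc
    return digests, errors
```

Calling `checksum_tree("/mnt/hda1")` (the mount point is an assumption) and finding `errors` empty is roughly the "no errors produced at all" result described above.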
I bought a T60 for my boss a while back, and have just finished putting in another memory module. Man, I knew this was the lower end of their laptops, but I had no idea it would feel so cheap.
To get at the memory, you take out a few screws on the back, then lift off the palm guard below the keyboard. It's flimsy plastic, and it's hard to get back in the right place - doubly so, since it feels like instead of clicking into place it's going to break. And you need to remove the ribbon that connects the touch pad and fingerprint reader in order to fully remove it; when putting it back in, it looks like it's going to get crimped. That can't be right.
I had been considering getting one of these, despite having fallen in love with my other boss' Dell D420. But this just makes me think that the extra money for the D420 would be worth it. Of course, I haven't had to crack that one open yet…
While doing some work on one of my WRT54-GL routers last night, I managed to bork OpenWRT: after a reboot, the power LED just kept flashing, and there was no response at its usual IP address. I could ping it on 192.168.1.1 (though, weirdly, I'd only get 3 responses every 30 seconds or so), but neither telnet nor SSH was working.
Some folks suggested getting out the serial cable, or shorting pins on the flash chip, but a simple TFTP did the job.
Now to get OpenVPN going again, and this time without breaking the damn thing!
Thursday: Go to The Other University to do some prep for the move coming up next week. Check in with their computer store (where you pretty much have to buy things) to see how the order on the console server is going. The guy behind the counter looks up the order, frowns, and tells me that it seems their supplier does not have one in any of their three Canadian warehouses. Okay, so how long will it take to get one in? He looks at me earnestly and says that, sometimes, they never come in. I ask at what point I can count on the supplier a) giving up and b) informing me of that fact. He frowns again, and suggests that I check back in a couple weeks (four weeks after I've placed the order) just to be safe.
Friday: Get email from contractor/university liaison for new building to say that network and electrical connections will not be ready in time because the requests were received so very late. While The Other Guy was supposed to get them in long ago, I should've been on top of this.
Monday, a stat in Canada: Go to the old building to do a serverectomy on a soon-to-be-formerly shared rack. The Other Guy mentions that the new server room has water on the floor. I go over to look, and it's a rapidly evaporating puddle, irregular in shape and maybe two metres across at its widest. I can't figure out where it's coming from. Turns out there's some other stuff that should become formerly shared as well, so I spend time poring over Sun Enterprise 1 workstations (which I like) and old inkjet cartridges for printers that may no longer be around (which I don't like). Ask The Other Guy, who's been involved with the move a lot longer than I have, what electrical connections he's asked for him and for me (long story) in the new building. He says that he gave them the model number of the Sun rack he's got (which has built-in, and very nice, PDUs) and asked them to figure out what he needs.
Tuesday: Moving day. As expected, network and electrical are not present; we've got 2 x 15A 120V circuits. Also, the leak is back, and we can see that it's coming from a small leak in the concrete roof. I move my rack into another room; The Other Guy spreads a blanket over his rack. The liaison promises us that the contractors are on the job to fix the roof. The network connections (two fiber, two Cat5) get terminated, so I call the local network folks to get that taken care of. The university wireless network is not present in the new building.
Wednesday: The contractors show up to start fixing the leak. The network connections have been set up. The contractors have put in a big tube of plastic sheeting, taped to the roof at one end and a 40-gallon recycling barrel at the other. The Other Guy decides things are good enough and starts setting up his rack; I elect to hold off another day.
Thursday: The contractors say the roof is fixed, so I move the rack in and start hooking things up. The new OpenBSD firewall comes up nicely -- thank you, pf developers -- as does the main Sun server. Next up is the SunRays in the lab, only they're not. I take my laptop in and try to verify connectivity. I can't. The Other Guy suggests that the VLANs on my new switch are the problem and suggests just simplifying things. I do and keep testing. Traffic from the laptop's RFC 1918 address just never makes it to the server. In a fit of desperation I try using an address in our routable subnet, and it works. This takes me until 8pm to figure out. I email various bosses explaining how far I've got, and the campus network folks to ask if they're filtering this subnet in some way. (This isn't completely out of the question; this place has a reputation for a pretty locked-down network.)
Friday: I buttonhole the guy at the campus network office and ask him about this. He considers this and realizes that while he's forgotten to unblock DHCP (told you it was pretty locked down), the other behaviour I'm seeing can be explained if I've somehow got my interfaces crossed. I'm doubtful but give it a try, which is a good thing because suddenly everything works. I don't understand it or what I did wrong, but assume that I was simply too tired the previous night and thank him profusely for taking the time to talk to me. I am now where I should have been twenty hours before. Mighty battles emerge with Sun's DHCP and Sunray servers. In the end, I have to delete the Sunray configuration, delete all DHCP configurations, and then add the Sunray configuration back. This works, which annoys me; why are there all these opaque configurations around? Not a single plain-text file in sight. I manage to get a printer working, then another. DHCP is modified so that laptops work as well. I call it a night and head home.
Checked my email this morning and saw that backups of my wife's computer had timed out. Weird, I thought, but didn't look into it further. Then my wife comes out and says, "Hey, my computer's having a stroke." Uh-oh.
So I have a look and it's constantly, randomly, power cycling. It will get to the Ubuntu splash page then shut off, then get halfway through the BIOS check and shut off, then get halfway through boot and shut off, then stay off for two minutes, then turn on again. WTF?
First thought is cooling, of course. But the power supply feels cool to the touch, and when I get to the BIOS temperature page it says the CPU is at 51C -- eminently reasonable. (Then it shut itself off.) Okay, flaky RAM? Wonky graphics card? Dying, though not from lack of cooling, power supply?
Then it makes it all the way to Ubuntu's login page. I switch to a
console and start looking at logs. This thing has been rebooting all
night -- as in log messages about how shutdown has been invoked. And
then I check /var/log/acpid
and I see lots and lots and lots of
entries saying that event POWERBTN (or some such) had been received,
so Ubuntu was executing /etc/acpi/powerbtn.sh and shutting down
nicely. And then I saw a broadcast message from root saying that the
system was going down for reboot NOW!
Tempted to just try booting w/o ACPI, but I think that would just mask the issue. Back to Google...
(Note: this was actually written back in May.)
Top Tip: Filenames with a tilde in them can confuse Samba.
Case in point: last week a user was having problems loading his profile: W2K kept choking and saying that the file Local Data\Applications\foo\backup\~AvariciousMonkeys.c was in use. Naturally, lsof on the Samba server turned up nothing, and I couldn't see any obvious problem. On a hunch, I tried renaming the file to AvariciousMonkeys.c~, and hey presto! goodness all over.
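If more than one of these turns up, the rename could be scripted. Here's a minimal sketch in Python, assuming the only problem is a leading tilde and that moving it to the end of the name (as in the fix above) is acceptable; the function names are invented, and it defaults to a dry run so nothing gets renamed by accident.

```python
import os


def tilde_renames(root):
    """Yield (old_path, new_path) for files whose names start with '~'.

    The new name moves the tilde to the end, mirroring the manual fix:
    ~AvariciousMonkeys.c -> AvariciousMonkeys.c~
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("~") and len(name) > 1:
                new_name = name[1:] + "~"
                yield (os.path.join(dirpath, name),
                       os.path.join(dirpath, new_name))


def apply_renames(root, dry_run=True):
    """Rename leading-tilde files under root; only report when dry_run."""
    done = []
    # Build the full list first so renaming never races the directory walk.
    for old, new in list(tilde_renames(root)):
        if os.path.exists(new):
            continue  # never clobber an existing file
        if not dry_run:
            os.rename(old, new)
        done.append((old, new))
    return done
```

Run it once with `dry_run=True` against the profile share to see what it would touch, then again with `dry_run=False`.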
This week I'm trying to get FAI going in seriousness. I've worked on it before, but now I've got three developers who want to switch to Linux. The last thing I want is another series of one-offs, so I'm taking the time to do it right. Now there's a CD version in beta, and so far it's working well. Cf. the usual way of doing it, which is to do PXE booting and grab everything off the network. I'm not opposed to that, but one of the things I wanted out of FAI before was the ability to do CD-based, kickstart-like Debian installs; looks like it's finally going to work.
Looks like we're having a problem with a Maxtor PCI IDE controller and the Intel mobo in our backup server. It's been mysteriously crashing in the middle of the night w/no log messages. Some checking in the BIOS turned up another problem: going to the hardware monitoring page to look at the CPU temperature made the damn thing freeze. WTF? Sure seems like the symptom we were seeing, and backups running at night make big use of the Vinum array that uses drives attached to the IDE adapter...long story short, taking out the card stopped the BIOS freezing. It remains to be seen if it'll work for the random midnight freezes, but it's good to have something to try. I'm hopeful that FreeBSD will be able to handle SATA drives attached to this thing...we'll have to see.
Which brings me to the next bit: fleshing out plans for server upgrades. As I mentioned, last week we had a power supply fail on our Very Important Server, and I want to try and keep that from happening again. Of course, adding umpty thousand dollars worth of hardware to your budget four months before the end of fiscal doesn't really work too well, so as much as possible I need to do this w/o new hardware. Ha! But I'll give it a try.
First off is setting up OpenLDAP and importing Samba's information into it. That'll be neat, since I've never worked w/LDAP before. Second is to set up some BDCs using OpenLDAP to query the master. (Or do they just suck over the whole database? Hm. Either way.) Third is to set up some Linux machines. Why? Two reasons:
LinuxHA and DRBD seem fantastic, and there just doesn't seem to be anything comparable on the FreeBSD side. As for the hardware...well, my first impression of server hardware from IBM, HP and the like (no, don't talk to me about Dell) is that I'm going to need a newer version of FreeBSD than we currently use in order to run SATA drives. (I know SCSI is the way to go, but I was quoted two thousand dollars for two IBM 73GB 15k drives! I know: 15k, IBM, etc, but even halving that means two -- two! -- 73GB drives for a thousand bucks, a/o/t two 200GB drives for, what, four hundred. Heh.)
We're using an older version of the 4-series FreeBSD here. I've already set up one server using a newer 4-series release, and it's a pain: too many differences, one more thing to keep in mind when making changes, and so on. I haven't worked with the 5-series yet, and I don't want to start now...not entirely sure that it'd work for us. Plus, we'll probably migrate to Linux anyway, so I don't mind doing it for a server.
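The drive-price grumbling above is easier to see as dollars per gigabyte. A quick sketch using only the figures quoted in the post (the four-hundred-dollar figure for the 200GB drives is a guess there, so it's a guess here too):

```python
def dollars_per_gb(total_price, drive_count, gb_per_drive):
    """Price per gigabyte of raw capacity for a set of identical drives."""
    return total_price / (drive_count * gb_per_drive)


# Figures from the post; the last one is the post's own rough guess.
scsi_quoted = dollars_per_gb(2000, 2, 73)    # two IBM 73GB 15k drives
scsi_halved = dollars_per_gb(1000, 2, 73)    # same drives at half the quote
big_cheap = dollars_per_gb(400, 2, 200)      # two 200GB drives

print(f"SCSI as quoted: ${scsi_quoted:.2f}/GB")   # $13.70/GB
print(f"SCSI at half:   ${scsi_halved:.2f}/GB")   # $6.85/GB
print(f"200GB drives:   ${big_cheap:.2f}/GB")     # $1.00/GB
```

Even at half the quote, the SCSI drives cost nearly seven times as much per gigabyte.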
Anyhow! Get a Real Server and throw Linux on it. Hook it up to our drive array and start migrating home directories to ReiserFS from UFS/FreeBSD. Not trivial, but doable. Add more Linux servers as budget allows.
I got the iBook, I got the Slashdot t-shirt, I got the beard...but do you think I can get a wireless signal? Oh no. Thanks, Broadcom. But hey, enough complaining. Time for an update.
The wireless ISP is gonna do a point-to-point link between windows of our old and new temporary offices. Should give us 100Mb/s access or so. Which is good, because for a while I thought I'd have to walk down to London Drugs, grab some Linksys routers, and install my own firmware to do it. Which would have been a lot of fun...but would have been a fuck of a lot to get ready in, like, three days. Now I just have to get OpenVPN talking at either end, get Shaw installed, and set up a firewall. Oboy.
And then there's the troubles I've been having with our backup server. A while back I decided to start racking all the boxes we've been using as servers -- transfer the hard drives to proper servers, then use the old shell as a desktop for a new hire. Welp, the backup server was the first to go, and man it's been a headache.
First off, I didn't take care of cooling properly, and the tape drive (HP Ultrium 215, for those paying attention) suffered a nice little nervous breakdown and kept spitting out the tape. I tried downloading the HP diagnostic tool, but it only runs on Linux and the server runs FreeBSD -- neither Linux compatibility mode (not surprising) nor a Knoppix disk (kept hanging) allowed it to work. So I had no real idea what was going on other than the drive was too hot for my liking.
But HP, bless their souls, came to the rescue. Once I made it through their speech recognition voicemail tree hell, they just sent out another one -- they didn't even bitch about not being able to run the diagnostic tool. Not only that, it came the next day, and we don't even have any special contract with them -- that's just warranty. Thumbs up for them.
But now I've got different problems: the damn machine keeps seizing up
on me. See, I've got this 500GB concatenated Vinum array of three
disks that I use as a copy of yesterday's home directory for people,
and I'm trying to move it to a four-disk RAID5 drive on the Promise
array. I tried using rsync, and it just froze...but eventually. I
thought maybe rsync was spending too much CPU time figuring out what
to transfer, so today I tried using dump | restore
-- and sure
enough, it froze again.
I plugged in a monitor, hoping for a panic or something, but nope -- just unresponsive. I've found some mention in the FreeBSD mailing lists about possible problems with write caching and the Adaptec 3960D SCSI controller (which I thought was a 39160 SCSI controller, but I guess not). I'll have to see if that does the trick or not -- but in the meantime I'm wondering how I'm gonna get yesterday on the Promise. Of course, figuring out why it's crashing in the first place would be even better...
But it's not all bad news: earlier this week, the support manager at Promise that I've been dealing with called to tell me that the word had come down from on high. Yep...Promise is going to follow the GPL and properly release the Linux and Busybox source code for the firmware that goes into the VTrak 15100. Hurray! I'll have to watch, of course, and make sure it shows up...but it sounds good. "Let's put it this way," said the manager. "It's on my desk for me to do. And I don't want it there for long." To the home front, now.
As if I didn't have enough on the go, I've blown my tax return on the makings of a MythTV backend: 2.4GHz P4, umpty-GB hard drive, the PCHDTV-300 (get it while you can!), generic 128MB Nvidia (no onboard video on this mobo, or I would've stuck with that), a Hauppauge PVR-500MCE, and a nice Asus mobo in an Antec case to tie it all together. Random notes:
And now for something completely different: new mottoes for Harley Davidson:
"Harley-Davidson: Because social contracts are for weak pussy-ass losers with small dicks."
"Harley-Davidson: Because those other people aren't really human. Not like you and me."
"Harley-Davidson: You deserve it. So do they."
"Harley-Davidson: Because if you pissed in their faces, you'd be arrested."
"Harley-Davidson: Because 'Fuck you!' is just too damned hard to remember."
"Harley-Davidson: Because 'Fuck you!' is just too damned eloquent."
Less and less impressed with Promise. Here's what I had done: while doing some copying onto a logical drive, I yanked one out. I wanted to see what would happen, what would need to be done, and so on -- I don't want to be figuring this out for the first time when it happens. Well, it started beeping, and the event log said that the logical drive was critical. Start rebuilding, right? Wrong: policy for that drive was set to non-auto-rebuilding. Try turning that on, and it doesn't work: keeps saying it's non-auto-rebuilding. Manual for the VTrak:
if your fault-tolerant logical drive goes offline, go to the Promise website (www.promise.com) and download a document called Array Recovery Procedure.
Damn good thing I'm not doing this for real. Go to www.promise.com and type "array recovery procedure" into the search bar. The result:
Microsoft OLE DB Provider for ODBC Drivers error '80040e14' [Microsoft][ODBC SQL Server Driver][SQL Server]Cannot use a CONTAINS or FREETEXT predicate on table 'product' because it is not full-text indexed. /search_insert_eng.asp, line 34
Fuck me. Use Google to find the document, which has instructions for the UltraTrak, the predecessor to the one I've got. Hope it still applies and read on. It sez to reboot the array (!) in order to trigger the rebuild (!!). Sure enough, it works. Oh, and have I mentioned they still haven't sent me the SNMP OIDs/MIBs after six weeks of calling their technical support manager? FUCK ME.
Welp, the Promise array is here at last. I don't have any disks yet -- they're coming in next week -- but I've had a chance to play around with the firmware. First off, it's running Linux, just like JWSmythe said. The firmware that came with the box said "Now uncompressing Linux..." at boot time; it may be indicative of something that the newer firmware says "Now uncompressing kernel..." Promise doesn't mention anywhere on their website that the 15100 uses Linux, which surprises me a little. They also don't offer the source code anywhere. I've sent 'em an email asking about that; their autoresponder said I should hear about that today.
Second, I've yet to figure out how to enable SSH on the thing, and I'm increasingly lacking confidence that it even offers this, even after the firmware upgrade. Naturally, this is in strict contrast to what's listed on the website. I've sent them an email about this.
Third, I've yet to figure out how to monitor the thing by SNMP. I can run snmpwalk, sure, and I get info back, but I don't see anything like network traffic or disk stats or anything. (Compare and contrast with the PDU from APC, which included the SNMP schema [if that's the right word] on the CD.) Then again, this may be because I haven't got any disks in there. We'll see.
Fourth, it looks like there was corruption of the firmware. Got it in yesterday, booted fine, upgraded firmware by TFTP, all good, turned it off before going home (and not for the first time that day, either). This morning I booted it, and things were just wrong: the network address was obviously bogus and couldn't be changed, various menu entries were showing garbage instead of "Promise VTrak 15100" or whatever, and so on. I called tech support, who told me the secret:
Note: if you fry your array by following this advice, you're on your own. But it worked for me. Of course, this doesn't explain why it happened in the first place. I'm going to be watching it carefully.
Funny moment: While waiting for me to figure out how to reboot the array [which took a few minutes because of the menu corruption I called to complain about], the techie I was talking to was having a conversation with someone else. "Are you reading? [pause] Okay, are you working on projects? [pause] It's okay if you're using the web to work on projects. [pause] But if you're just surfing the web looking for a job, that's not working on projects."
Second funny moment: The warranty registration page on the Promise website asks for suggestions and comments to "help us imporve in the future." Third funny moment: When registering the extended support, the page that asked for the value of the product purchased barfed with "Internal Error" when I put a dollar sign in the amount. (Okay, so I'm just easily amused.)
Finally, it's just plain odd to be asked for your bona fides by your power bar:
- Access: Enabled
- Protocol Mode: SSH Version 2 only
- Telnet Port: 23
- SSH Port: 22
- Advanced SSH Configuration
- Accept Changes : Pending
?- Help, esc- Cancel Changes, enter- Refresh, ctrl -L- Event Log
> 6
LICENSE AGREEMENT
By enabling this security feature, you are agreeing to the following statements:
A. This Product includes cryptographic software subject to export controls under the U.S. Export Administration Regulations. You agree to cooperate with American Power Conversion Corporation as reasonably necessary to ensure compliance with the laws and regulations of the United States and all other relevant countries, relating to exports and re-exports ("Export Laws"). You shall not import, export, re-export or transfer, directly or indirectly, including via remote access, any part of the Products into or to any country (or its nationals or permanent residents) or to any end user or end use for which prior written governmental authorization is required under applicable Export Laws, without first obtaining such authorization. By ACCEPTING THESE TERMS, you are representing and warranting that neither your use nor your receipt of any part of the Products requires prior written authorization under any Export Laws. You are responsible for complying with any local laws in your jurisdiction which may impact your right to access or use this product.
B. By ACCEPTING THESE TERMS, you are representing and warranting that (1) you are not located in or a national of any U.S.-sanctioned or terrorist-supporting countries, (2) identified on the U.S. Treasury Department's List of Specially Designated Nationals, the U.S. Commerce Department's Entity List, or the U.S. Commerce Department's Denied Parties List; or (3) engaged in any proliferation-based or terrorist-supporting activities.
Do you accept the terms of this license agreement?
Enter 'YES' to continue or ENTER to cancel :
I was shopping for a new rack and the necessary accessories, when I came across the power bar you can SSH to. That's right: not only does it have a digital readout on the thing that lets you know how much power/current you're drawing (and oh man, does that ever make this thing worth it; I'm scared to plug in new machines right now for fear I'm gonna trip a breaker), but you can ssh to the damn thing. There's even a "how to recover a lost password" procedure.
Well, I did the right thing today -- twice. Damn right I'm bragging.
First off, it turns out that the FreeBSD Foundation has run into a (good!) problem: its donations have been too big. In order to keep its US charitable status, it needs to have two-thirds of its donations be relatively small. Due to a couple of big donations, this ratio is a little out of whack at the moment, and they need a bunch of small donations.
Welp, I've been administering FreeBSD systems for a living for...well, I was gonna say four years, but it's more like two and a half or three. I've been working on them for four, though; my rent and food has been paid in large part because of the generosity of the people who put together FreeBSD. A donation went off in short order.
Then I remembered that I've been meaning to join the Free Software Foundation for a while now. The motivation is the same: I've been paying my bills for a long time now (and enjoying myself immensely in the process) because of the generosity of Free-as-in-Freedom software people: Stallman, Torvalds, Wall, and a zillion others. I have a hard time imagining what I'd be doing now without Free software; I suspect that, if I was lucky, I'd be working as a grocery store manager right now. So: off to the FSF website to sign up for an associate membership.
And what did I find but two, count 'em TWO cool things:
If you refer three people to the FSF for associate memberships, RMS or Eben Moglen will record a message for you, suitable for voicemail, Hallowe'en or impressing the ladies. I did a quick search on Google, but couldn't find anyone with the link...damn shame. Better than a free iPod, cooler than a CmdrTaco TiVo -- join the FSF and get RMS to say "All Hail Liddy!"
The FSF is looking for a senior sysadmin. God, that'd be cool. Decent enough pay (no, it's not the sort of job you take because of the money, but it's nice to think about), all the Free software you can handle, and an IBM Thinkpad to run it on. Of course, I think I'd have some 'plainin' to do about the laptop I'm writing this on...and, of course, it would mean living in the US. Frankly, that scares the crap out of me these days. Goddamned PATRIOT Act...
In other news, work continues apace. We're losing two coop students and gaining one, gaining another full-time person, and I'm still trying to get my RAID array -- credit app is with the boss, and after that's done the order'll finally go in.
Rough guess (wild hope) at this point is that it'll be in my hands in mid-January, which won't be a moment too soon. There's a new Linux server I'm setting up that I'm desperately hoping won't have problems due to proprietary kernel modules in the software I'm installing. (I'm just writing myself further and further out of that job, aren't I?)
And I'm wondering if the simplest way to get Nagios to make sure the
right machines are exporting the right filesystems is to check if amd
is mounting them correctly. (No matter whether the machine or amd
fails, something needs to be fixed.) Or maybe I just need to figure
out the right wrapper for showmount -e.
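A wrapper around showmount -e along those lines might look like the sketch below. It assumes the usual output format (an "Export list for ..." header followed by one exported path per line) and the standard Nagios plugin exit codes; the function names, host name and paths are hypothetical, and the `run` parameter exists only so the logic can be exercised without a real NFS server.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_exports(showmount_output):
    """Return the set of exported paths from `showmount -e` output."""
    exports = set()
    for line in showmount_output.splitlines():
        line = line.strip()
        if not line or line.lower().startswith("export list for"):
            continue  # skip blank lines and the header
        exports.add(line.split()[0])  # first column is the exported path
    return exports


def check_exports(host, expected, run=subprocess.run):
    """Compare a host's exports with the expected set.

    Returns (exit_code, message) in the usual Nagios plugin style.
    """
    try:
        result = run(["showmount", "-e", host], capture_output=True,
                     text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, f"UNKNOWN: could not run showmount: {exc}"
    if result.returncode != 0:
        return CRITICAL, f"CRITICAL: showmount failed: {result.stderr.strip()}"
    missing = set(expected) - parse_exports(result.stdout)
    if missing:
        return CRITICAL, "CRITICAL: not exported: " + ", ".join(sorted(missing))
    return OK, f"OK: {host} exports all {len(expected)} expected filesystems"
```

To use it as an actual plugin, a few more lines would print the message and call `sys.exit()` with the code. Note that this only checks the server side; it says nothing about whether amd is mounting things correctly on the clients.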
On the spam front: good god, what a smoking hole Movable Type is turning out to be. First there were the license changes, then the comment spammers (who seem to be posting a lot more aggressively to MT than to WordPress)...Of course, comment spam affects all blogs, not just MT. Still, this whole idea of rebuilding static pages every time the stars move seems to be causing them a lot of trouble. (Yep, that last sentence was pure FUD. Or bullshit.) And okay, no, I don't use MT, so what precisely is my beef?
As I'm not going to put up, I should shut up. I still have to upgrade WP -- though according to this posting, there are still lots of XSS issues left unfixed. I'm also upgrading PHP, and I should probably use ApacheToolbox to do that automagically, rather than periodically editing my own Makefile.
The release party for Where Are They Coming From? came off JUST FINE, thank you. EVERYONE was there. Top Stars include Topo, Phil Knight and Mos Def, fresh from the set of HHGTTG. Uh huh.
Further thoughts on the MySQL + GPhoto2 thing: gphoto2 does have the ability to pipe to STDOUT, which I don't think I knew...maybe it won't be as much work to insert directly into a database as I thought. Might even be able to do it as a Perl script.
Finally: what a gorgeous day. It's downtown Vancouver on the back steps of the Art Gallery, it's sunny (in December, too) and just cold enough to make you go "brr". The skater kids are practicing their synchronised jumping -- just in time for the Olympics, I'm sure. A far-too-generous co-worker has handed out chocolate, another has handed out home-made rum and brandy balls, and I'm taking off early to go drinking with a third. Feeling pretty damned good right now.
Update: Too bad Topo's not so great -- fever of 102.8F, as of a couple minutes ago. (Still haven't figured out what that is in Celsius; bad Canuckistanian!) It's down a bit from earlier this afternoon, though, so I'm thinking good things. And these pages say to not worry if it's less than a couple days, so I'm not worrying. Nope.
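For the Celsius-impaired, the conversion is just subtract 32 and multiply by 5/9. A tiny sketch (the function name is made up):

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


print(round(fahrenheit_to_celsius(102.8), 1))  # 39.3
```

So 102.8F works out to about 39.3C.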
After a lot of consideration, and some reassurance from JWSmythe, I'm going with the Promise VTrak 15100 array for work. It has almost everything I want: serial ATA, dual SCSI adapters, and an ethernet interface. The downside is that Promise doesn't have an office in Canada, so there's the possibility that getting parts across the border could be a problem. However, there's a local company that'll do service, so that makes me feel better.
The other options just weren't as good: one was parallel ATA and had no ethernet interface. The other was the Fastora DAS-315, which certainly looked good -- but the local resellers couldn't be bothered to give me the time of day, let alone answer the questions I had. Best bit: when I asked for a copy of the service level agreement, the sales guy replied that he'd "have to see" if he could release it.
And at home, I've been running into problems with bridging, the 2.6.9 kernel and the 8139too driver. I thought I would enable bridging on Thornhill for some User-mode Linux fun, so I enabled it as a module, then rebuilt and reinstalled the modules. However, when I tried inserting it, I got unknown symbol: br_handle_frame_hook. Okay, what about rebuilding the kernel and including bridging within it? Tried that; when I booted, the kernel panicked as soon as it came time for the onboard 8139 interface to grab an address by DHCP.
It was similar to the earlier problems I had with the Shuttle, in that if I took out the ethernet cable everything was fine -- it was only when the response came in that the kernel panicked. And keep in mind this was without setting up a bridge at boot time, or anything like that. I had to go to the backup 2.6.7 kernel in order to calm things down.
I found this thread on LKML, and it seems to match pretty closely what I saw: the stack trace looks the same, as far as I could tell, since I wasn't able to see the whole message before it scrolled off the screen. However, I'm reluctant to try this patch; I spent a whole evening rebooting (Sorry, Aaron) and trying different things before I finally confirmed that having bridging in the kernel was just a bad thing.
Interesting bit: I didn't realize that Linux does not have panic core dumping built into the kernel, as FreeBSD does; it's only available as a separate patch. Minus one for Linux.
Finally, it's the day after the office Xmas party, and what am I doing? Heading into work to unplug everything. The power is being shut off in our building (thirty-floor or so high-rise) while upgrades are done, so I'm shutting everything down and disconnecting it just in case. Tomorrow I go back in to reverse the process. Whee!
Here's a few more details on the problem with the new Shuttle. First, the card is a DLink DFE-530TX; the Shuttle is an SK43G. If the DLink is connected to my internal network switch, and from there to the gateway box, this sequence will make it freeze:
Interestingly, if the network cable is unplugged, the problem doesn't show up...so it appears that something about the response to the three-way handshake is what's causing the problems.
I managed to find some reports of wireless cards locking up hard with the VIA KM400 chipset, including cards from DLink. I tried setting all the IRQs to "Reserved" in the BIOS, and that didn't work; however, the card was grabbing IRQ 17, and the BIOS wouldn't let me reserve that one. I also tried upgrading the BIOS, and that didn't work either.
I'd love to pursue it further, but it's now officially the new webserver; I wanted to get it installed while I had a day to fool around with it and get everything working. So far there don't appear to be any problems.
And now, of course, I've got what used to be Thornhill as my desktop machine: P3 500MHz, 640MB, and a new 160GB Seagate Barracuda. Once again, I'm going with Debian, God's own distro. Still gotta come up with a name for it.
I'm currently trying out KDE and Konqueror -- usually I use IceWM and Firefox, but I thought I'd give something fancier a try now that I've got a slightly hibbier machine. It's not bad so far, although having to set up all the keyboard shortcuts that come with Ice is a little annoying. We'll see how long it lasts.
SK43G, Sempron 2200. eth0: Via Rhine driver -- DLink 350TX? I'll have to look it up. eth1: RealTek 8139 onboard.

    ifconfig eth0 192.168.0.1 netmask 255.255.255.0
    route add default gw 192.168.0.254
    (log in as self)
    ssh 192.168.0.254

BAM -- freezes hard, and even the Magic SysRq key does nothing.

Reboot...

    ifconfig eth1 192.168.0.1 netmask 255.255.255.0
    route add default gw 192.168.0.254
    (log in as self)
    ssh 192.168.0.254
    Password:

BAM! (the good BAM, this time)

Yay! No BIOS upgrade required maybe! (UPDATE: Spelled out which one was eth1 [the onboard Realtek]. What a maroon!)
The sumbitches are at it agin', mother. Comment spam is infecting both my blog and my wife's. So far a relatively small number of keywords -- poker, Texas, debt -- is sufficient to keep 'em away from where Google can see 'em. Well, that and OCD-like running of SELECT statements in MySQL. But the fuckers are gonna be the death of me, or at least blog comments. Although maybe some sort of SURBL plugin for URLs in the post...that'd be cool. Someone must have something like that already.
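The keyword approach is easy enough to sketch. This is not the actual filter on either blog, just a minimal illustration: the three keywords come straight from the paragraph above, while the function and variable names are made up for the example. Matching whole words only keeps something like "indebted" from tripping the "debt" rule.

```python
import re

# Hypothetical blocklist; these are the three keywords named in the post.
SPAM_KEYWORDS = ("poker", "texas", "debt")

# One case-insensitive pattern that matches any keyword as a whole word.
_SPAM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in SPAM_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def looks_like_spam(comment: str) -> bool:
    """Return True if the comment contains any blocklisted keyword."""
    return _SPAM_RE.search(comment) is not None


print(looks_like_spam("Play online POKER now!"))        # True
print(looks_like_spam("Nice writeup on the 8139too."))  # False
```

A comment that trips the check could be held for moderation rather than deleted outright, which would cut down on the hand-run SELECT statements.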
Not that I notice a whole lot of comments, anyhow, at least away from the Slashdot side of things...although I do notice that I've made it onto somebody's blogroll. How'd that happen?
In other news: I finally decided what to do about new computers: buy a new Shuttle SK43G with a Sempron processor, and make that my web server; then, make my current webserver (older Compaq P3-500 desktop machine) my desktop and firewall: lots of room for ethernet cards, tape drives and whatnot.
I agree, it's a little silly that the more powerful box becomes the horribly underutilized server, but such is life. If there was a comparably cheap Shuttle that came with two onboard ethernet interfaces, I'd be buying that instead.
So dive right in, right? I got the new box home last night, assembled it and booted w/o problems. It took little effort to move the hard drive from the web server and put it in the new, tiny box; sure, I had to recompile the kernel (8 minutes! eat that, P90!) to get the right drivers in, but nothing big. Until, that is, it froze. Hard. And only a few minutes after booting. If I ran top and set it to update continuously, I could get it to freeze within seconds.
Some fiddling with Grub (boot loader of the GODS, man) showed that the problem seemed to go away if I went with the original Slackware stock 2.4.20 kernel instead of the 2.6.7 kernel I'd last compiled. (I'm a packrat, and that includes keeping every kernel compiled on this damned thing, Just In Case, because You Never Know.) We've got one of these boxes at work with an Athlon XP and it works fine; admittedly, it's not doing much, but neither is my web server. (Ba-zing!)
God only knows what's going on there, but it didn't last: I left it on overnight to see if it'd keep going, and sure enough it froze again around 10pm. I put the HD back in the P3 and left it. I'm going to see Wilco tonight (Whoo! WilCO! WHOO!), so this'll take a back seat to some serious RAWK. Except I'll probably be speculating about crappy memory or badly applied heatsink paste the whole time. No. No, I won't. It's Wilco.
Actually, I'm thinking I may have to upgrade the BIOS in order to get it to work properly with the Sempron; originally it was detected as a 900MHz Athlon, and I had to tweak the bus speed and whatnot to get it to run at 1.5GHz. (Interestingly, this seemed to have no effect whatsoever on how quickly it would crash, compared to the difference the different kernel version made.) (God, that's an awful sentence. I'm sorry, everyone.)
Anyhow, there's probably lots wrong with the settings; I never really wanted to learn about memory spacings and CPU voltages and I don't know what-all.
In other other news, I mentioned that I moved last week, but I didn't mention that I came back to two, count 'em TWO dead computers. (Before you ask: Support contracts are for the weak, and I suspect I'm about to get very weak.) One was a Linux box whose hard drive gave up the ghost. Stupid IDE hard drives in a dusty, hot environment anyway! But the other was an old Duron whose motherboard's capacitors yearned to be one with the cosmos (ie, they blew up real good). That was running Windows, so the whole let's-just-throw-the-hard-drive-into-another-box-and-see-if-it-boots thing was good for a very, very bitter laugh but little else.
Instead, I reinstalled not only Windows but Cygwin, too. That proved to be harder; we use Cygwin to compile very particular things that depend on version 2.2 of Python. Version 2.3 makes things cry. And no matter how much you tell the Cygwin installer that you don't want to upgrade Python, it goes ahead and does so anyway like some hyperactive sugar-fueled kid who's certain he knows how to fix things.
After far too much experimentation, I did what I should have done in the first place: I found an old archive of Cygwin, with the right version of Python, and I mirrored it. One gigantic, nine-hour long sucking sound later, and I had a local copy to point the Cygwin installer at. Thank god.
Finally, just got in the first 19" LCD monitor at work. This was, of course, two weeks after assuring someone that they were too expensive to get past the boss. My bad. I'm going to get a lot of mean looks, I think. But then, if I was a people person, why would I have become a sysadmin?
Recommendation of the Day: Vicious Battle Rap, by DJ Format and Abdominal. Bow down, baby.
I decided this week to get Amanda working properly at home. I've got an old DDS3 tapedrive in Francisco, my FreeBSD firewall box, but all I've been doing so far is tarring to it once a week.
Setting up Amanda wasn't much of a problem, but I kept getting short write errors -- the damn thing was giving up and saying the tape was full after only about 3GB. I decided to run amtapetype, which takes about two hours per run with my hardware, in order to figure out exactly how much space I had. The first time, it said 2GB. WTF? The second time, the drive crapped out with errors about how a power reset had been detected. I decided to shut down Francisco and reseat the cables just in case. No problem, right?
Wrong! When I brought up Francisco again, it refused to boot -- lots of scary errors about how the hard drive couldn't be read, or found, and maybe the LIES about having a hard drive present should just stop now, huh? Francisco is old: it's an old P90 scrounged from an old job, stuck in this black case with non-working LEDs and a Punisher logo someone poked out in toothpick-sized holes on the front. No cooling fan, four ISA slots and three PCI, and I had to jiggle the BIOS so that it would boot from a 100MB partition at the beginning of an 80GB hard drive. Seems like as good a time as any to simply replace the damned thing...
...but first, a firewall. I tried booting it from an old laptop hard drive I had around, but that didn't work. I tried getting it to boot from a Slackware Live cd, but the whole concept of booting from a CD just made Francisco huddle in the corner in the fetal position.
Nothing else for it: it was time to do The Bad Thing. I grabbed one of the ethernet cards from Francisco, shut down Thornhill (P3, 500MHz, web and DNS server, Slackware and 2.6.7 kernel) and threw it in. A quick module recompile for tulip^Wvia-rhine and that was up; some judicious editing of the firewall set it up for NAT. Ph35r m3!
(Side note: Man, it's been far too long since I set up NAT on Linux; I still don't really understand what I've done. I've worked with FreeBSD for firewalls almost exclusively over the last four years, and I have some serious catching up to do.)
So now the question is: what do I do to replace Francisco? I know, finding a Pentium similar to Francisco is not that hard at all. But dammit, I'm tired of big, noisy boxes that are just waiting to die. I want something small, quiet, and reasonably new; I don't want to be fiddling with it, or worrying about it running out of memory (I tend to run far too much on a firewall, and 92MB of RAM just aggravates the problem).
It's complicated a bit by the recent heat-death of Hardesty, a 300MHz Celeron that had, 'til recently, been my desktop machine. I'd been hoping to replace or upgrade that, too; I've gotten quite used to a fast processor and lots of memory at work, and 15 seconds to render Slashdot's front page seems less like acceptable and more like a sign that civilization is in decline.
So...one option is a VIA EPIA CL6000. Dual ethernet, fanless goodness. That, and a case -- unless I decide to build my own Bubba can computer -- and some memory, and maybe a hard drive or maybe PXE booting. Whee! That'd make a pretty decent firewall and fileserver, no question.
But another option would be to let Thornhill keep doing the firewall thing, even though it's a webserver and should, like, rilly be outside the firewall, or at least in a DMZ. I could do something really funky like run Apache inside User-Mode Linux. Or maybe my own stuff, although I'm sure X would be a bear to get working.
A third option would be to keep using Francisco, but w/o a hard drive: let it PXE boot and do all the firewall stuff that way, totally stateless (well, hard drive-less). That could be interesting: almost no moving parts at that point. That would let me get a Mini-ITX something-or-other to use as a desktop machine. They're not the most powerful processors around, but when you can compile a kernel in 6 minutes, who the hell cares? Or maybe a Shuttle, so I could keep using my video card. Hm...
Well, enough of that for now; my cat needs chasing. And anyhow, King of the Hill season premiere tonight! Woo!
Network problems again last week. Cheap switches will be the death of me, I swear, unless cable management gets me first. (Actually, it was both this time...cable looped back on itself + cheap switch == lots of embarrassing explanations.)
But there are bright spots in this morass -- 48 of them, to be precise, in the form of 2 x HP 2626 Procurve Managed Switches. SSH login, VLANs up the wazoo, and much muchness. The only thing I'm not sure about is whether or not it does port mirroring (which I can live without, but it'd be nice). (UPDATE: Yes it does. Weeoo!) If these work out, then I think it'll be 2 x 2650s to replace the DLink unmanaged ones that keep crashing. The Ciscos seem nice and all, but the cost...oh my. And the respondents to the recent Ask Slashdot seemed to like HP a lot. Plus, we used to use 'em at my old job, and everyone was pretty happy. We'll see how it goes.
Just bought Neal Stephenson's The System Of The World at Big Hair Bookstore. Twenty-two pages and I love it already. God, the man can write.
My wife and I kinda made an impulse purchase on the weekend: a new 12" iBook G4. It was weird: I made a joke about buying a laptop. Then I explained that I was only joking, but if we were going to buy one it should be an iBook since I kept hearing how sweet they were. Then we were going to go to Stanley Park, hang out at the beach, but maybe go to London Drugs (I don't know about you Americans, but in Canada we go to the drugstore for everything...car insurance, furniture, computers, you name it. Oh, and occasionally prescriptions) to see what prices were like. Then we were buying one. It all happened so fast.
So far, it's pretty damned impressive. After all the trouble I had to go to get gphoto to work with our digital camera, my wife just plugged it in here and it worked with iPhoto right away. Not only that, but we were looking at a slideshow of the crack-induced photos we'd taken while Fur Elise played in the background. Fucking unreal, man.
It's weird: I do feel a bit like I've made a deal with the devil. I've come to agree more and more with RMS about Free-as-in-Freedom, and here I am with a closed-source OS. Yada-yada-Darwin, what about Aqua? But it's sooooo nice...well, mostly, anyway.
I'm trying to use MacStumbler at the moment to find a wireless network to hook up to, but no luck: it just sits there, looking like it's scanning but with no more feedback than a scrolling bar. Dammit, I thought W2K was the only culprit there...and dammit, if I can't blog from the steps of the Vancouver Art Gallery, this thing is going back to the store. I suspect a problem with MacStumbler, but it's hard to be sure; I managed to find five or six access points at the office with Knoppix and the work laptop, and (apparently) wasn't able to find a thing with MS. I need to find a command-line version.
So far, though, that's my only complaint. Pretty fucking sweet, if you ask me.
Had a problem at work with Debian and VNC: the alt keys wouldn't work, for some reason. This was pretty annoying for the developer who really, really wanted to use Emacs. It took me about an hour of poring through Google -- Jesus Christ, the number of complaints about ALT keys disappearing, and Good God the long uber-thread about the change in keyboard behaviour between Debian versions -- to find the solution: vncserver --compatiblekbd
A-ha!
Back to work and still no wireless access. Carousel is a LIE!!!
UPDATE: The VNC trick doesn't work. Details: The developer is running VNCViewer under VNC to connect to an X desktop on a Debian machine. On that machine, he's opening up an xterm and running User-Mode Linux. Alt-equals-meta works for Emacs when run on the Debian machine, but not for Emacs when run in the User-Mode Linux xterm. Fuck. UPDATE: Buddy found the trick: shift-left-click in the xterm to get the menu, then click "Meta sends escape". Double fuck!
Jesus Christ. Every time I mess around with hardware or upgrades, I swear I'll never do it again. Then I forget.
My first computer, bought eight years ago now, was a 486 w/16MB of RAM and some amount of HD space. I installed Slackware on it, got a 33.6 modem, and had email and net access. Then a roommate sold me his old P90. It crashed constantly until I figured out I had set the CPU voltage wrong. It took me a long time to figure that out, and I was nearly ready to hurl the thing out the window.
A few years later I upgraded to my current desktop machine, a 333 Celeron overclocked to 450 MHz. The machine is fine unless I open up the case to add/remove/shift something in it; then it will, for a day, spontaneously reboot. I've checked it for shorts and can't find any. I don't know what I'm missing, but I'm sure it would be obvious to someone else.
And now the latest. My wife bought an iMac from her old work a few years ago, and has had problems w/it since. It just crashes for no good reason. It'll work fine for two weeks, then she can't keep it running for more than an hour. So last week I went out and bought her a fairly skookum machine: Athlon 2600 (I think...details to follow), ECS K7S5A mobo, 60GB HD and 256 MB RAM.
I got it all home and assembled it. The mobo and Red Hat 9 (not my favourite, but great for my wife) called the CPU a 2000 (1.6GHz instead of 2.0), so I looked around and decided a BIOS upgrade would be in order. Did that and promptly lost the back USB -- bad, since her keyboard and mouse are USB. The front ones, hooked up to the pins on the motherboard, still worked. Tried rolling the BIOS back, but nothing: the back, onboard USB just didn't go. Fuck.
So I went out and got some additional USB risers a few days later. I added them; no problem. Then I had to add a connector from the CDROM's audio to the motherboard. I made the mistake of removing one of the USB connectors while the power was still on. Didn't even think; just did it. Now the BIOS freezes at "Checking NVRAM...". Flashed the CMOS half a dozen times, left it off most of the night while we went to see Finding Nemo (not as good as Monsters, Inc., but still well worth it), and no change.
Today I'm going to stop by my new hardware supplier of choice (http://www.ntcw.com/) to pick up a Gigabyte 7VAX. We'll see if I got ripped off on the CPU or what.
Mostly, though, I am not going to fuck with this computer again. I mean it this time.
So here it is, 8.30pm, and I'm restoring a Cobalt Raq 4 to something approaching virginity. It belongs to a colo'd customer, and it got cracked; we offered, for a modest cost, to restore it, and here I am.
It's Linux under the hood of course -- Red Hat, or at least they use RPM -- and it's interesting to see what's been done with it. The management page is pretty slick, though it always leaves me wanting to log on. To do that, I need to telnet -- shudder -- and of course the cust. hasn't got SSH on it. (Confirmation that we had a cracker was nmap showing lots of open ports that responded with an SSH banner. Seems weird to me that a cracker would install ssh, but oh well.) But all the web functionality seems to be there, and it seems pretty and easy to use.
The cust. kept up to date with the patches from Sun (part of what I'm reinstalling right now), but I think there's still a few holes; I'm pretty sure there's an old version of Apache, for instance. And would it kill them to have OpenSSH? Or firewalling tools?
Anyhow, it's the first time I've worked with an automatic patch installer that wasn't Windows, and I must admit I'm impressed. Download the patch -- which is a tarball of script + rpms + patches -- clicky-click install on the web interface, and away you go. I'm sure it's not news for most of you, but it's neat for me. The only thing is that it reboots between a lot of them -- c'mon guys, I thought this was Linux! :-)
Random idea for a program: I'm hooked up to this thing by a crossover cable to another Linux box, just to keep it off the 'net while it's having everything reinstalled. I telnet in occasionally to make sure things are working, but the damn prompt always takes so long to come up. It's the Raq doing a reverse DNS lookup on my address, of course, but because it's just on an Xover cable it sits there until the queries time out. We're talking a minute or so to time out, which is unacceptable. I'm an important man, after all.
So my idea is to have a program listening for queries like that and answering them, masquerading as whatever DNS server the query was directed at. Basically, just fake 'em out with whatever info they want. In cases like this (which I can see coming up, oh, at least once a year), it'd speed things up immensely. Anyone heard of anything like this, or is it just full of Crak(tm)?
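Here's a rough sketch of what that could look like, not a finished tool. It listens on UDP, answers every PTR (reverse lookup) query with the same made-up hostname, and sends an empty answer for anything else, so nothing is left waiting on a timeout. Assumptions, all mine: the box running it already holds the IP address the other machine uses as its DNS server (say, as an alias on the crossover interface), queries carry a single uncompressed question, and the names `FAKE_HOSTNAME`, `build_reply` and `serve` are invented for the example.

```python
import socket
import struct

FAKE_HOSTNAME = "crossover.invalid"  # hypothetical name returned for every PTR query
TYPE_PTR, CLASS_IN = 12, 1


def encode_name(name: str) -> bytes:
    """Encode a dotted hostname as DNS labels (length byte + text, zero-terminated)."""
    out = b""
    for label in name.strip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def question_end(packet: bytes) -> int:
    """Return the offset just past the first question (QNAME + QTYPE + QCLASS)."""
    pos = 12  # the DNS header is always 12 bytes
    while packet[pos] != 0:
        pos += 1 + packet[pos]
    return pos + 1 + 4


def build_reply(query: bytes, hostname: str = FAKE_HOSTNAME) -> bytes:
    """Build an immediate reply: a fake PTR answer, or an empty answer otherwise."""
    query_id, flags = struct.unpack("!HH", query[:4])
    question = query[12:question_end(query)]
    qtype, qclass = struct.unpack("!HH", question[-4:])
    answer = b""
    if qtype == TYPE_PTR and qclass == CLASS_IN:
        rdata = encode_name(hostname)
        # 0xC00C is a compression pointer back to the question name at offset 12.
        answer = struct.pack("!HHHIH", 0xC00C, TYPE_PTR, CLASS_IN, 60, len(rdata)) + rdata
    # Response + authoritative + recursion available; echo the query's RD bit.
    reply_flags = 0x8480 | (flags & 0x0100)
    header = struct.pack("!HHHHHH", query_id, reply_flags, 1, 1 if answer else 0, 0, 0)
    return header + question + answer


def serve(bind_addr: str = "0.0.0.0", port: int = 53) -> None:
    """Answer every UDP query that arrives, forever. Port 53 normally needs root."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            query, client = sock.recvfrom(512)
            try:
                sock.sendto(build_reply(query), client)
            except (IndexError, struct.error):
                pass  # ignore malformed packets
```

Calling `serve()` as root on the Linux end of the crossover cable would be the whole deployment. `build_reply` is kept separate from the socket loop so it can be poked at without a network.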
...urghh. Just rebooted for a patch that alleges fixing Apache and OpenSSL problems. Why the hell does this need a reboot?
Update time.
I got into work today and found that the mail server had just come up after *half a fucking hour* of being down because of the insane load placed on it by spam -- just spam -- coming in. The owner of the company couldn't send email. I started setting up the new mail server.
And it was nice. I got to go away, away from the help desk, sit down and figure out how to make it work. FreeBSD's vinum + Promise raid controller == kernel panic (details later on). Finally got vinum figured out -- I've only worked w/it once before -- and before I was grabbed back to help desk had the disk setup about 80% done.
So some more details: there's 4 x 40GB Maxtor IDE drives. (Yeah yeah yeah SCSI.) We've got an onboard Promise controller chip; I'll put in the mobo tomorrow and make this all seamless. First it turns out we've got the Promise Lite (Less Filling!) BIOS, which means we can only have one (1) array of two disks; the other two disks can be single arrays on their own, which is useful in some alternate universe I'm sure. So okay, try setting up one mirrored (Raid 1? 0? I can't keep 'em straight) array, and we'll use vinum to tie it together with the other single drives...
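Since I can't keep the levels straight, a cheat sheet: RAID 0 is striping (no redundancy), RAID 1 is mirroring, and RAID 5 is striping with one disk's worth of parity. The sketch below just does the usable-capacity arithmetic for the general definitions; it says nothing about what the Promise BIOS or vinum will actually allow, and the function name is made up.

```python
def usable_gb(level: int, disks: int, disk_gb: float) -> float:
    """Usable capacity of an array of identical disks at a given RAID level."""
    if level == 0:
        # Striping: data is spread across all disks, nothing is duplicated.
        return disks * disk_gb
    if level == 1:
        # Mirroring: every disk holds a full copy of the same data.
        return disk_gb
    if level == 5:
        # Striping with parity: one disk's worth of space goes to parity.
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disks - 1) * disk_gb
    raise ValueError(f"unsupported RAID level: {level}")


# Raw numbers for 40GB drives, before partitioning overhead.
print(usable_gb(1, 2, 40))  # 40  (the two-disk mirror)
print(usable_gb(0, 4, 40))  # 160 (all four striped)
print(usable_gb(5, 4, 40))  # 120 (four disks, one lost to parity)
```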
Only as soon as I try using vinum to do _anything_ with the Promise'd arrays, BANG: kernel panic. This is 4.6, not the latest (4.7RC1 as I type), but still. Arghh. Doesn't matter whether vinum tries raid 0, 1 or 5 -- just panics right away. If I had more time and a box of my own to fool around with, I'd work through Michael Lucas' Big Scary Daemons column (http://www.oreillynet.com/pub/a/bsd/2002/03/21/Big_Scary_Daemons.html) (Buy his book!) and contribute something useful to the FreeBSD folk. Alas, it's not my box or my time, and if I were to post this message to freebsd-hackers-important-vinum-people tomorrow I'd (deservedly) get laughed at so hard I'd feel it over the ether.
Anyway. Point is I can't get vinum to play nice w/the Promise'd chip even as an IDE controller. The BIOS of the box allows you to turn the Promise chip on, off, or to ATA/IDE; but even set to the latter, it panics once vinum touches /dev/ar*. You have been warned.
So get vinum using the four drives on the first two IDE channels, and that works fine once I learn the intricacies of disklabel (set type to vinum, kids!) and vinum init (and that takes a long time w/3*35GB partitions^H^H^H^H^H^H^H^H^H^Hsubsooperplexen). 1 5m 5o 133t!
OT: One of my side notes was going to be about how I'm posting this w/Lynx 'cos Mozilla won't let me use vi, editor of the Elder Gods, as an editor. Then I realized I could have just fired up a shell and used vi in there. Sigh. Rumours of my cleverness have been exaggerated.
So I just moved into a new place with my wife: main floor suite of a house, tons more space than the one bedroom apartment we had. Went to Ikea today and got a new desk: The Jerker (no, really). And is this baby ever sweet!
It's $144 (Canadian), which was one of the cheapest desks around, and it's absolutely perfect for my needs. For a start, it's rock fucking solid. Even putting it together, when I only had the uprights and one crosspiece bolted together, it wasn't wobbly in the least. For another, it's got a huge expanse of desk area, both wide and deep; this is nice, since I've got a big-assed 21" monitor (free, but another story). Third, it's got a shelf above for books and dippin'. Fourth, it's all adjustable: you bolt the shelf and desk plank (what the hell's the right word? Top, I suppose) into holes in the uprights, spaced at 1" intervals.
The only thing this is missing is a hole for cables, but that's a minor complaint. Also, there's no drawers or cd holders included, but that's all good for me; I hate 'em.