If you don't follow his blog, it's worth it. Two snippets from his post on the 2.6.32 kernel. First:
We all drifted back to our companies, and planted the seeds that maybe something like the 2.6.32 kernel would be a nice one to do our product on. This planting worked so well, I had to refrain from fits of laughter in one meeting where a project manager got up and said, "We decided that the 2.6.32 kernel would be the best for our product, what does engineering think about this?"
This successfully cumulated in the release of SLE11 SP1, Debian "Squeeze", RHEL 6, Oracle Linux 6, and Ubuntu 10.4 LTS, all based on the 2.6.32 kernel.
Hacking the business models of these different and competing groups, to coordinate on this specific kernel, was one of the (previously) unsung successes of how the community really can achieve remarkable things if they decide to do it.
Two:
I would personally like to thank the Debian kernel developers, specifically Ben Hutchings, Maximilian Attems, Dann Frazier, Bastian Blank, and Moritz Muehlenhoff. They went above and beyond what any "normal" developer would have done, ferreting patches out of the kernel.org releases and the different vendor kernels and bug tracking systems, backporting them to the 2.6.32 kernel, testing, and then forwarding them on to me. Their dedication to their user community is amazing for such a "volunteer" group of developers.
I firmly believe that without their help, the 2.6.32 kernel would not have been the success that it was. The users of Red Hat and SuSE products owe them a great debt.
Buy them a beer the next time you see them, they more than deserve it.
Today I gave some impromptu training at $WORK; the approximate topic was "Saving State in Linux". I've been meaning to do something like this for a while, but it was prompted by a conversation yesterday with one of the researchers who kept losing work state when shit happened -- Emacs window arrangements, SSH sessions to other machines, and so on. I found myself mentioning things like tmux, workgroups, and Emacs daemon mode...and after a while, I said "Let me talk to you about this tomorrow."
So today I found half an hour, decided to mention this to everyone in the lab, crowded into a meeting room, set up my laptop and the projector, and away I went. For a fly-by-the-seat-of-my-pants first attempt, I think it went relatively well. Best idea: asking people for questions. It hadn't occurred to me that people would want to know more basic stuff like "How do I split windows in Emacs?". I'm never sure what people already know, so I don't want to bore them...
Next time:
In other news: finally converted my SVN repos to Git yesterday in a fit of pique. The big three -- my org-mode stuff, and the two Cfengine repos (Cf2 and -3) -- are already in use, as in that's where I'm checking stuff into. The rest (Nagios configs, for example) are being done as I get to them. It's really, really wonderful.
Family: holy house o' plague, Batman!
Gah. We're getting the house boiled next week. (Update, March 13: too late; I puked on Friday night and spent Saturday moaning in bed; my wife did the same thing Saturday night/Sunday. FUCK.)
Also? There's a Planet Lisp. Who knew?
With Grub 2, you can change the default menu entry without changing the order by editing /boot/grub/grub.cfg. Edit the "set default" line to be one of the titles in /boot/grub/grub.cfg.
But in Ubuntu/Debian, you want to edit /etc/default/grub and change the appropriate line there, then run "update-grub".
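A minimal sketch of the Debian/Ubuntu route described above, assuming the stock GRUB_DEFAULT variable in /etc/default/grub. The helper name set_grub_default and the scratch-file workflow are illustrative assumptions, not something from the original notes; it edits a copy, so nothing on the live system changes until you put the file in place and run update-grub yourself.

```shell
# set_grub_default FILE ENTRY
# Rewrite the GRUB_DEFAULT line of FILE (meant to be a copy of /etc/default/grub).
# ENTRY is a zero-based menu index or a menu entry title.
# Hypothetical helper: ENTRY must not contain '|', '&' or backslashes,
# because it is pasted straight into a sed replacement.
set_grub_default() {
    file=$1
    entry=$2
    tmp=$(mktemp) || return 1
    sed "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT=\"$entry\"|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Example run against a scratch file rather than the real config.
scratch=$(mktemp)
printf 'GRUB_DEFAULT=0\nGRUB_TIMEOUT=5\n' > "$scratch"
set_grub_default "$scratch" 2
cat "$scratch"
rm -f "$scratch"
```

Once the edited copy is back at /etc/default/grub, running update-grub regenerates /boot/grub/grub.cfg with the new default.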
In Unity, there is no easy, risk-free way of changing Unity settings. You can install Compiz Settings Manager but that will make baby Jesus cry:
I am an experienced Linux user, I've contributed to kernel and work on the Canonical OEM team; I only mention these facts to show my context, which is -- the other day, I did a fresh install of 11.10 on my laptop, and wanted to customize something (turning on focus-follows-mouse). I poked around in gnome-control-center for about 30 minutes before giving up and discovering the only way to do this was using ccsm.
After installing ccsm, I configured ffm, and then -- accidentally! -- my mouse cursor passed over the preferences button and the touchpad on my laptop registered a click.
Boom!
Unity session dead.
Holy crap, what a crock.
Okay, did you know that the Ohio LinuxFest has put up audio from their sessions at archive.org? I didn't, but I'm downloading it all now. (Along with a couple of NYLUG presentations on Rocks and Cobbler and Kexec/Kdump.)
Kudos to the organizers for such a great idea!
"I say we take off and nuke the entire site from orbit. It's the only way to be sure."
Saturday afternoon my home web server got cracked. I found out because Google started refusing my searches, asking me to fill out a CAPTCHA form (incidentally, I hate the word CAPTCHA, and even typing it gives me hives) to prove I was human. What the hell?
So I checked on the server, which is also our firewall, which isn't good but frankly I was tired of maintaining a complex network at home, and sure enough there was some perl script running as user www-data (which Debian uses to run the webserver), sending off tons of Google queries and taking commands on IRC the way I keep hearing nobody does anymore. Crap.
Fortunately I've been running Bacula for a while now, backing up to an external hard drive, and so I figured that even though it probably would go away when I rebooted, I'd Do The Right Thing(tm) and rebuild from scratch.
This had to wait 'til the evening, so I shut down the webserver, ran backups a bunch more times, got more info, and moved the machine (a tiny li'l Shuttle box) from my youngest son's bedroom (apparently the only room in the house w/a phone outlet not covered by an ADSL filter) to our bedroom upstairs, running the network cable up the stairs.
In the end, it all went pretty smoothly. I was able to get all my packages back and restore from backup; the only thing I messed up was getting the ownership wrong on my restored crontab. (Debian uses a pool of UIDs for daemons, so you're not guaranteed to get the same UIDs if you reinstall.)
As a bandaid, I've firewalled off www-data from initiating connections out. I should have done this long before. Now I'm starting to think about the next step -- Xen, maybe, or SELinux. (I did briefly consider other distros, or even a BSD: CentOS for SELinux, FreeBSD for pf and jails. But I decided that one problem at a time was quite enough, thanks.)
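One way the bandaid above could be expressed, assuming an iptables-based firewall with the owner and conntrack match modules available (the post does not say which tool was actually used). The helper name block_user_outbound is made up, and it only prints the rule instead of applying it, so it is safe to run without root.

```shell
# block_user_outbound USER
# Print (do not apply) an iptables rule that rejects NEW outbound
# connections from processes owned by USER. Replies on connections that
# clients opened to the web server are not NEW, so they still get out.
# Assumption: iptables with the "owner" and "conntrack" matches.
block_user_outbound() {
    printf 'iptables -A OUTPUT -m owner --uid-owner %s -m conntrack --ctstate NEW -j REJECT\n' "$1"
}

block_user_outbound www-data
```

Review the printed rule, then run it as root if it suits your setup; it will not survive a reboot unless you save it the way your distro expects.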
I've run into an interesting problem with the new backup machine.
It's a Sun X4240 with 10 x 15k disks in it: 2 x 73GB (mirrored for the OS) and 8 x, um, a bunch (250GB?), RAID0 for Bacula spooling. (I want fast disk access, so RAID0 it is.) RAID is taken care of by an onboard RAID card, so these look like regular disks to Linux.
Now the spool disk works out to about 2.2TB or so -- which is big enough to make baby fdisk cry:
WARNING: The size of this disk is 2.4 TB (2391994793984 bytes). DOS partition table format can not be used on drives for volumes larger than 2.2 TB (2199023255040 bytes). Use parted(1) and GUID partition table format (GPT).
Well, okay, haven't used parted before but that's no reason to hold back. I follow directions and eventually figure out that mkpart gpt ext3 0 2392G will do what I want. GPT? Piece of cake! And then I rebooted, and I couldn't boot up again. Blank screen after the POST. Crap!
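For the curious, the 2.2 TB figure in the fdisk warning falls out of simple arithmetic: a DOS partition table stores sector counts in 32 bits, and with 512-byte sectors that caps out at (2^32 - 1) * 512 bytes. A quick sketch, where the function names are mine and the 512-byte sector size is an assumption that matches the number fdisk printed:

```shell
# Largest size a DOS (MBR) partition table can describe, assuming
# 512-byte sectors and 32-bit sector counts.
mbr_limit_bytes() {
    echo $(( 4294967295 * 512 ))    # (2^32 - 1) sectors * 512 bytes
}

# needs_gpt BYTES: succeed if a disk of BYTES is past the MBR limit.
needs_gpt() {
    [ "$1" -gt "$(mbr_limit_bytes)" ]
}

mbr_limit_bytes
if needs_gpt 2391994793984; then
    echo "GPT required"
fi
```

The first line printed is the same 2199023255040 that fdisk quotes, and the spool disk's 2391994793984 bytes is comfortably over it.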
The first time this happened, the reboot also coincided with some additional problems during the POST where too many cards were trying to shove their ROM into the BIOS memory (or some such); I thought the two were connected. But then I did it again today, and I finally started digging.
The problem is that parted overwrites the MBR when setting up a GPT disklabel. This has been noted and argued over. My understanding of the two sides of the debate is:
Meanwhile, the parted camp has a number of bugs dealing with this very issue, two opened a year ago, and none have any response in them.
This enterprising soul submitted a patch back in December 2008, which appears to have fallen to the floor.
As for me, I was able to convince the BIOS to boot from the smaller disk, and then get a rescue CentOS image going via PXE booting, and then reinstall grub on the smaller disk. Sorted. All I had to do was change root (hd1,0) to root (hd0,0) in grub.conf.
A touch anti-climactic after all that, perhaps. But it was interesting a) to learn about all this (I hadn't really thought about successors to the DOS partition format before), and b) to see what a slender thread we (okay, I) hang our hopes on sometimes. It's a necessary, sobering thing to realize how much of what I use, depend on, believe in is created by volunteers who are smart, hard-working people -- they argue and focus and forget just like real people, not inhabitants of some shining city on a hill I sometimes take them for ("Next beer in Jerusalem!").
This has been one of those days where all I've done is stare at monitors too closely.
I know, I'm a sysadmin, what do I expect? But some days I get up, move around; I'm sedentary (and introverted) by nature but I try to talk to people, stare off into the distance, get away from my desk. Going to the server room is always a good break.
Not today, though. My carefully-chosen ATI video card (the Radeon 4550) is giving me headaches, metaphorical and real:
Dual monitors are important. My own damn fault for not getting something old enough...
...that TCP Offload Engines (TOE) were so detested by Linux kernel folks. The arguments here make interesting reading and seem convincing to me.
(From Andy Grover's blog.)
Okay, I feel like a bit of a tool for never realizing how cool suspend-to-ram is in a laptop. My new laptop for work is a Dell D630, which I'd got 'cos its hardware is pretty much completely compatible w/Linux. However, I've also figured out that a) Ubuntu does suspend-to-ram quite nicely (aside from a couple times when the keyboard doesn't work, but closing/reopening the lid makes it work), and b) it just sips -- sips, I tell you! -- from the battery.
Now to try and make it work on my own laptop, which is currently sitting at the shop waiting for me to pick it up.
Today's agenda:
See? I am still a sysadmin.
Just upgraded my laptop to Debian Lenny with only minor hiccups (my own fault). Not only have I got the latest version of Iceweasel/Firefox without any GTK version nonsense, but I've got wicd working, including my Broadcom wireless and WPA2! (I never could figure out the settings to get encryption working with the various /etc/network/ files...) I'm happy…
A few quick notes about building Fedora Directory Server RPMs for CentOS:
$instance_dir points to /etc/dirsrv, not /etc/fedora-ds.
(Partly a memo to myself, and partly to help anyone in the same boat; edits have been disabled in the FDS wiki, so I can't add this right now.)
The Internet Storm Center writes about a new variant on malware that messes with your DNS: it installs a rogue DHCP server.
While not too sophisticated, the whole attack is very interesting. First, it's about a race between the rogue DHCP server and the legitimate one. Second, once a machine has been poisoned it is impossible to detect how it actually got poisoned in the first place - you will have to analyze network traffic to see the MAC address of those DHCP Offer packets to find out where the infected machine actually is.
In other news...all $job_2's new machines are set up and running. Kickstart is very nice…I really wish Debian had something similar; FAI is lovely, but Kickstart has the lovely feature of taking a hand-done installation you've just finished and turning that into a config file for a hands-off version. That saves a huge amount of time.
Next up: turn nscd back on (forgot I'd left it off for debugging LDAP 'til a simple find -exec chown was taking 10 minutes to finish); relabel the machines with their new names; commit the documentation I've been piecing together on my laptop; open up to others in the group; look at either moving the LDAP server over to the server room, or setting up a slave over there.
I just spent the weekend (well, like an hour a day...kids, life, you know how it is) trying to track down why a bunch of new CentOS 5.2 installs at $job_2 couldn't pipe:
$ ls foo
foo
$ ls | grep foo
$ echo $?
141
(Actually, I didn't think to look at the error code 'til someone else pointed it out…141 turns out to be SIGPIPE.) In the end, it would have been quicker if I'd simply searched for the first thing I saw when logging in:
-bash: [: =: unary operator expected
-bash: [: -le: unary operator expected
This was particularly aggravating to track down because not every machine was doing this, and no matter what I thought to look at (/etc contents, /tmp permissions (those have a habit of going wonky on me for some reason), SELinux) I couldn't figure out what was different.
Turned out to be an upstream bug in nss_ldap. (The Bugzilla entry makes for some interesting reading, to be sure…) And I didn't see it on each machine because I hadn't upgraded after installation on all machines. (They're not yet in production, and I'm working on getting my kickstart straight.)
Man, it was gratifying to upgrade nss_ldap and see the problem go away…
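As a footnote to the 141 above: shells report "killed by signal N" as exit status 128 + N, and on Linux signal 13 is SIGPIPE. A small sketch of the decoding; status_to_signal is a name I made up for illustration, and the number-to-name mapping printed by kill -l is platform-specific.

```shell
# status_to_signal STATUS
# Shells report "killed by signal N" as exit status 128 + N.
# Prints N, or 0 when STATUS is an ordinary exit code.
status_to_signal() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    else
        echo 0
    fi
}

sig=$(status_to_signal 141)
echo "signal number: $sig"
kill -l "$sig"    # prints the signal's name for this platform
```

So the next time a pipeline dies with a status over 128, subtracting 128 is a quicker first step than a weekend of poking at /etc.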
I've since found a great deal more about multipath in Linux:
The trick was to search for "multipath" and "fstab".
Also, I contacted the installer from Sun who worked on our new machines, and he told me that the multipath driver download was lost during an upgrade of the download page; they're working on it, but in the meantime he's sent me a copy of the driver. Sweet!
I'm in the process of setting up a bunch of new servers for $job_2. All but one are CentOS 5.2, kickstart installed and managed with cfengine. This is the third time I've gone through a cfengine setup, and it always feels like starting from scratch each time. It seems -- and I'm not at all sure this is fair or accurate -- that each time I set up one of these systems, there's a lot that I've lost from the last time and have to relearn. I'm fortunate this time that I can refer to $job_1's setup to see how I did things last time, but if I didn't have that I'd be significantly further behind than I am.
I'm not sure what the solution is. Part of me thinks I should just be more aggressive about taking notes, or committing stuff to a private repository, or writing it down here more; part of me thinks that this might be a clue that cfengine is too low-level for my head. It feels like when I was trying to learn C, and couldn't believe that I had to remember all this stuff just to print something, or read a file, or connect to another machine over the Internet. By contrast, Perl (or any other scripted language) was such a relief...just print, or open, or use the Net::Telnet module, or whatever. The details are there and they are important, sometimes very much so; that doesn't mean I want to learn more metallurgy every time I need a fork. (No, I don't think that metaphor's tortured; why do you ask?)
Another thing is that I'm trying to get multipath connections working for the first time. We've got two database servers, each of which is connected via dual SAS HBAs to outboard disk arrays. (I don't think anyone else calls them "outboard", but I like the sound of it. See this hard drive? It's outboard, baby!) The arrays are from Sun and come with drivers, but the documentation is confusing: it says it's available for RHEL 5 (aka CentOS 5), but the actual download says it's only for RHEL 4.
As a temporary respite, I'm trying to see if I can get these working using Linux's own multipath daemon, and it's also confusing. The documentation for it is tough to track down, and I just don't understand the different device names: am I meant to put /dev/dm-2 in fstab, or /dev/mpath/mpath2p1? If the latter, why does the name sometimes change to the WWID (/dev/mpath/$(cat /dev/random)) when I restart multipathd? (user_friendly_names is uncommented in the config file.) If the whole point of multipath is failover, why does this sequence:
(where /mnt is where I've got this array mounted, obvs) sometimes work, and sometimes end with "I/O error" being logged, and the filesystem being read-only? Is this the sort of thing that the Sun driver will fix? I can't find anything about this.
And I mentioned electrical problems. When we got our servers installed, the Sun guys told us they'd tripped breakers on the PDU and/or breakers in the room's electrical cabinet. Since it had a sign on it saying "100A", I figured we might be running up against power limits -- either in the room as a whole, if my figures were 'way out, or on individual PDUs. Turns out I was probably wrong: I missed the bit on the sign that said 3-phase, which means (deep breath) we probably have 3 x 100A power available (I think).
It's more complicated than that, because some of it is in 120V, some of it is in twist-lock 220V 30A circuits, and so on. But I should've checked before emailing the faculty member who, in a year or two, will be going into this room (we're there as guests of the department) and happens to sit on the facilities committee. He had asked how we were doing, so I sent him an email -- nice, polite, and including a bit about how grateful we were for the room and the help of the local sysadmins (all of which is true).
I was under the impression that he was asking for info now, so that he could bring it up for action in a few months when we were out. Instead, two hours later when I'm swearing at multipath, in come the facilities manager and one of the sysadmins I was dealing with, looking to find out just how much power we were using anyhow. I apologized profusely, and they were very cool about it. But when the committee guy asks questions, people jump. I had not anticipated this. Welcome to University Politics 101. I emailed again and explained my mistake.
There are lots of remedial courses I could take. However, today I would most like to take "Electricity and wiring for sysadmins".
And on another note: Ack! My laptop's home partition is 93% full! How the hell did that happen?
And again: How did I not know about apt-file? This is perfect!
(Touch o' the hat to Tears For Fears and Steve Kemp; I'm moving closer every day to switching to Chronicle.)
New Dell 2950 server. 2 x quad-core Xeons, 2 x 6MB cache on each die, 16GB RAM, 6 x 300GB SAS 10K SCSI drives in a RAID-6 array using the PERC/6 controller.
/usr/src/linux-source-2.6.18# time make -j 9 bzImage
[snip]
Root device is (8, 3)
Boot sector 512 bytes.
Setup is 7295 bytes.
System is 1222 kB
Kernel: arch/i386/boot/bzImage is ready (#1)
real 0m22.668s
user 2m20.425s
sys 0m14.537s
That's just insane.
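To put a number on "insane": dividing CPU time (user + sys) by wall-clock time gives a rough count of how many cores were kept busy on average. A sketch using the figures above; the helper names are mine, and it assumes the XmY.YYYs format that bash's time prints, with awk doing the floating-point work.

```shell
# to_seconds "2m20.425s": convert the XmY.YYYs format printed by
# bash's time into plain seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# cpu_ratio REAL USER SYS: (user + sys) / real, roughly the average
# number of cores kept busy during the build.
cpu_ratio() {
    LC_ALL=C awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" \
        -v s="$(to_seconds "$3")" 'BEGIN { printf "%.2f\n", (u + s) / r }'
}

cpu_ratio 0m22.668s 2m20.425s 0m14.537s
```

For this build the ratio comes out a little under 7, which is about what you would hope for from make -j 9 on eight cores once the serial parts of the build are accounted for.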
Matthew Garrett's presentation on Suspend-to-Disk makes for fun reading.
Arlo's sick with flu or something; I was up 'til 1am last night rocking him to sleep. Haven't done that in a while…
Telling detail: I'm about to blow away Debian testing on my desktop machine and install Ubuntu's Gutsy Gibbon. Partly it's because I'm tired of installing 80MB worth of updates every two weeks, and partly it's because it'll make setting up the printer a breeze.
I'll probably leave half the drive aside for good ol' Debian stable, but Ubuntu'll stay there for experimenting and so my parents, on their next visit, will not have to bring out their 4-tonne laptop.
I'll be reinstalling Ubuntu on my laptop as well; due to a stupid error, I installed Dapper, not Gutsy. I tried updating in one fell swoop, and after three days of apt-get -f install I finally got things working…except for the boot artwork, and GDM doesn't start one time out of three. Interesting experiment, but I think I'll take a do-over.
I may even install it twice, so that I can try out The Depenguinator, which appears to be a lot easier than trying to figure out PXE booting for FreeBSD. Unlike OpenBSD, there's no readily apparent "official way" of doing it, and the handful of HOWTOs I've found have contradicted each other. At this point I'm just too lazy to keep trying and seeing what I'm doing wrong.
(Note: this was actually written back in May.)
Top Tip: Filenames with a tilde in them can confuse Samba.
Case in point: last week a user was having problems loading his profile: W2K kept choking and saying that the file Local Data\Applications\foo\backup\~AvariciousMonkeys.c was in use. Naturally, lsof on the Samba server turned up nothing, and I couldn't see any obvious problem. On a hunch, I tried renaming the file to AvariciousMonkeys.c~, and hey presto! goodness all over.
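If you want to hunt for other files like this before they bite, something along these lines should do it. This is a sketch, not what I ran at the time: find_tilde_files is a made-up name, and the demo works in a throwaway directory rather than a real profile share.

```shell
# find_tilde_files DIR
# List regular files under DIR whose names start with a tilde,
# the pattern that tripped up the profile load described above.
find_tilde_files() {
    find "$1" -type f -name '~*'
}

# Demo in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/~AvariciousMonkeys.c" "$demo/ordinary.c"
find_tilde_files "$demo"
rm -rf "$demo"
```

Point it at the directory holding the roaming profiles on the Samba server; files with a trailing tilde (editor backups) are deliberately not matched, since those loaded fine.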
This week I'm trying to get FAI going in seriousness. I've worked on it before, but now I've got three developers who want to switch to Linux. The last thing I want is another series of one-offs, so I'm taking the time to do it right. Now there's a CD version in beta, and so far it's working well. Cf. the usual way of doing it, which is to do PXE booting and grab everything off the network. I'm not opposed to that, but one of the things I wanted out of FAI before was the ability to do CD-based, kickstart-like Debian installs; looks like it's finally going to work.
Looks like we're having a problem with a Maxtor PCI IDE controller and the Intel mobo in our backup server. It's been mysteriously crashing in the middle of the night w/no log messages. Some checking in the BIOS turned up another problem: going to the hardware monitoring page to look at the CPU temperature made the damn thing freeze. WTF? Sure seems like the symptom we were seeing, and backups running at night make big use of the Vinum array that uses drives attached to the IDE adapter...long story short, taking out the card stopped the BIOS freezing. It remains to be seen if it'll work for the random midnight freezes, but it's good to have something to try. I'm hopeful that FreeBSD will be able to handle SATA drives attached to this thing...we'll have to see.
Which brings me to the next bit: fleshing out plans for server upgrades. As I mentioned, last week we had a power supply fail on our Very Important Server, and I want to try and keep that from happening again. Of course, adding umpty thousand dollars worth of hardware to your budget four months before the end of fiscal doesn't really work too well, so as much as possible I need to do this w/o new hardware. Ha! But I'll give it a try.
First off is setting up OpenLDAP and importing Samba's information into it. That'll be neat, since I've never worked w/LDAP before. Second is to set up some BDCs using OpenLDAP to query the master. (Or do they just suck over the whole database? Hm. Either way.) Third is to set up some Linux machines. Why? Two reasons:
LinuxHA and DRBD seem fantastic, and there just doesn't seem to be anything comparable on the FreeBSD side. As for the hardware...well, my first impression of server hardware from IBM, HP and the like (no, don't talk to me about Dell) is that I'm going to need a newer version of FreeBSD than we currently use in order to run SATA drives. (I know SCSI is the way to go, but I was quoted two thousand dollars for two IBM 73GB 15k drives! I know: 15k, IBM, etc, but even halving that means two -- two! -- 73GB drives for a thousand bucks, a/o/t two 200GB drives for, what, four hundred. Heh.)
We're using an older version of the 4-series FreeBSD here. I've already set up one server using a newer 4-series release, and it's a pain: too many differences, one more thing to keep in mind when making changes, and so on. I haven't worked with the 5-series yet, and I don't want to start now...not entirely sure that it'd work for us. Plus, we'll probably migrate to Linux anyway, so I don't mind doing it for a server.
Anyhow! Get a Real Server and throw Linux on it. Hook it up to our drive array and start migrating home directories to ReiserFS from UFS/FreeBSD. Not trivial, but doable. Add more Linux servers as budget allows.
Welp, the Promise array is here at last. I don't have any disks yet -- they're coming in next week -- but I've had a chance to play around with the firmware. First off, it's running Linux, just like JWSmythe said. The firmware that came with the box said "Now uncompressing Linux..." at boot time; it may be indicative of something that the newer firmware says "Now uncompressing kernel..." Promise doesn't mention anywhere on their website that the 15100 uses Linux, which surprises me a little. They also don't offer the source code anywhere. I've sent 'em an email asking about that; their autoresponder said I should hear about that today.
Second, I've yet to figure out how to enable SSH on the thing, and I'm increasingly lacking confidence that it even offers this, even after the firmware upgrade. Naturally, this is in stark contrast to what's listed on the website. I've sent them an email about this.
Third, I've yet to figure out how to monitor the thing by SNMP. I can run snmpwalk, sure, and I get info back, but I don't see anything like network traffic or disk stats or anything. (Compare and contrast with the PDU from APC, which included the SNMP schema [if that's the right word] on the CD.) Then again, this may be because I haven't got any disks in there. We'll see.
Fourth, it looks like there was corruption of the firmware. Got it in yesterday, booted fine, upgraded firmware by TFTP, all good, turned it off before going home (and not for the first time that day, either). This morning I booted it, and things were just wrong: the network address was obviously bogus and couldn't be changed, various menu entries were showing garbage instead of "Promise VTrak 15100" or whatever, and so on. I called tech support, who told me the secret:
Note: if you fry your array by following this advice, you're on your own. But it worked for me. Of course, this doesn't explain why it happened in the first place. I'm going to be watching it carefully.
Funny moment: While waiting for me to figure out how to reboot the array [which took a few minutes because of the menu corruption I called to complain about], the techie I was talking to was having a conversation with someone else. "Are you reading? [pause] Okay, are you working on projects? [pause] It's okay if you're using the web to work on projects. [pause] But if you're just surfing the web looking for a job, that's not working on projects."
Second funny moment: The warranty registration page on the Promise website asks for suggestions and comments to "help us imporve in the future." Third funny moment: When registering the extended support, the page that asked for the value of the product purchased barfed with "Internal Error" when I put a dollar sign in the amount. (Okay, so I'm just easily amused.)
Finally, it's just plain odd to be asked for your bona fides by your power bar:
- Access: Enabled
- Protocol Mode: SSH Version 2 only
- Telnet Port: 23
- SSH Port: 22
- Advanced SSH Configuration
- Accept Changes : Pending
?- Help, esc- Cancel Changes, enter- Refresh, ctrl-L- Event Log
> 6
LICENSE AGREEMENT
By enabling this security feature, you are agreeing to the following statements:
A. This Product includes cryptographic software subject to export controls under the U.S. Export Administration Regulations. You agree to cooperate with American Power Conversion Corporation as reasonably necessary to ensure compliance with the laws and regulations of the United States and all other relevant countries, relating to exports and re-exports ("Export Laws"). You shall not import, export, re-export or transfer, directly or indirectly, including via remote access, any part of the Products into or to any country (or its nationals or permanent residents) or to any end user or end use for which prior written governmental authorization is required under applicable Export Laws, without first obtaining such authorization. By ACCEPTING THESE TERMS, you are representing and warranting that neither your use nor your receipt of any part of the Products requires prior written authorization under any Export Laws. You are responsible for complying with any local laws in your jurisdiction which may impact your right to access or use this product.
B. By ACCEPTING THESE TERMS, you are representing and warranting that (1) you are not located in or a national of any U.S.-sanctioned or terrorist-supporting countries, (2) identified on the U.S. Treasury Department's List of Specially Designated Nationals, the U.S. Commerce Department's Entity List, or the U.S. Commerce Department's Denied Parties List; or (3) engaged in any proliferation-based or terrorist-supporting activities.
Do you accept the terms of this license agreement?
Enter 'YES' to continue or ENTER to cancel :
I decided this week to get Amanda working properly at home. I've got an old DDS3 tapedrive in Francisco, my FreeBSD firewall box, but all I've been doing so far is tarring to it once a week.
Setting up Amanda wasn't much of a problem, but I kept getting short write errors -- the damn thing was giving up and saying the tape was full after only about 3GB. I decided to run amtapetype, which takes about two hours per run with my hardware, in order to figure out exactly how much space I had. The first time, it said 2GB. WTF? The second time, the drive crapped out with errors about how a power reset had been detected. I decided to shut down Francisco and reseat the cables just in case. No problem, right?
Wrong! When I brought up Francisco again, it refused to boot -- lots of scary errors about how the hard drive couldn't be read, or found, and maybe the LIES about having a hard drive present should just stop now, huh? Francisco is old: it's an old P90 scrounged from an old job, stuck in this black case with non-working LEDs and a Punisher logo someone poked out in toothpick-sized holes on the front. No cooling fan, four ISA slots and three PCI, and I had to jiggle the BIOS so that it would boot from a 100MB partition at the beginning of an 80GB hard drive. Seems like as good a time as any to simply replace the damned thing...
...but first, a firewall. I tried booting it from an old laptop hard drive I had around, but that didn't work. I tried getting it to boot from a Slackware Live cd, but the whole concept of booting from a CD just made Francisco huddle in the corner in the fetal position.
Nothing else for it: it was time to do The Bad Thing. I grabbed one of the ethernet cards from Francisco, shut down Thornhill (P3, 500MHz, web and DNS server, Slackware and 2.6.7 kernel) and threw it in. A quick module recompile for tulip^Wvia-rhine and that was up; some judicious editing of the firewall set it up for NAT. Ph35r m3!
(Side note: Man, it's been far too long since I set up NAT on Linux; I still don't really understand what I've done. I've worked with FreeBSD for firewalls almost exclusively over the last four years, and I have some serious catching up to do.)
So now the question is: what do I do to replace Francisco? I know, finding a Pentium similar to Francisco is not that hard at all. But dammit, I'm tired of big, noisy boxes that are just waiting to die. I want something small, quiet, and reasonably new; I don't want to be fiddling with it, or worrying about it running out of memory (I tend to run far too much on a firewall, and 92MB of RAM just aggravates the problem).
It's complicated a bit by the recent heat-death of Hardesty, a 300MHz Celeron that had, 'til recently, been my desktop machine. I'd been hoping to replace or upgrade that, too; I've gotten quite used to a fast processor and lots of memory at work, and 15 seconds to render Slashdot's front page seems less like acceptable and more like a sign that civilization is in decline.
So...one option is a VIA Epia Cl6000. Dual ethernet, fanless goodness. That, and a case -- unless I decide to build my own Bubba can computer -- and some memory, and maybe a hard drive or maybe PXE booting. Whee! That'd make a pretty decent firewall and fileserver, no question.
But another option would be to let Thornhill keep doing the firewall thing, even though it's a webserver and should, like, rilly be outside the firewall, or at least in a DMZ. I could do something really funky like run Apache inside User-Mode Linux. Or maybe my own stuff, although I'm sure X would be a bear to get working.
A third option would be to keep using Francisco, but w/o a hard drive: let it PXE boot and do all the firewall stuff that way, totally stateless (well, hard drive-less). That could be interesting: almost no moving parts at that point. That would let me get a Mini-ITX something-or-other to use as a desktop machine. They're not the most powerful processors around, but when you can compile a kernel in 6 minutes, who the hell cares? Or maybe a Shuttle, so I could keep using my video card. Hm...
Well, enough of that for now; my cat needs chasing. And anyhow, King of the Hill season premiere tonight! Woo!
Jesus Christ. Every time I mess around with hardware or upgrades, I swear I'll never do it again. Then I forget.
My first computer, bought eight years ago now, was a 486 w/16MB of RAM and some amount of HD space. I installed Slackware on it, got a 33.6 modem, and had email and net access. Then a roommate sold me his old P90. It crashed constantly until I figured out I had set the CPU voltage wrong. It took me a long time to figure that out, and I was nearly ready to hurl the thing out the window.
A few years later I upgraded to my current desktop machine, a 333 Celeron overclocked to 450 MHz. The machine is fine unless I open up the case to add/remove/shift something in it; then it will, for a day, spontaneously reboot. I've checked it for shorts and can't find any. I don't know what I'm missing, but I'm sure it would be obvious to someone else.
And now the latest. My wife bought an iMac from her old work a few years ago, and has had problems w/it since. It just crashes for no good reason. It'll work fine for two weeks, then she can't keep it running for more than an hour. So last week I went out and bought her a fairly skookum machine: Athlon 2600 (I think...details to follow), ECS K7S5A mobo, 60GB HD and 256 MB RAM.
I got it all home and assembled it. The mobo and Red Hat 9 (not my favourite, but great for my wife) called the CPU a 2000 (1.6GHz instead of 2.0), so I looked around and decided a BIOS upgrade would be in order. Did that and promptly lost the back USB -- bad, since her keyboard and mouse are USB. The front ones, hooked up to the pins on the motherboard, still worked. Tried rolling the BIOS back, but nothing: the back, onboard USB just didn't go. Fuck.
So I went out and got some additional USB risers a few days later. I added them; no problem. Then I had to add a connector from the CDROM's audio to the motherboard. I made the mistake of removing one of the USB connectors while the power was still on. Didn't even think; just did it. Now the BIOS freezes at "Checking NVRAM...". Flashed the CMOS half a dozen times, left it off most of the night while we went to see Finding Nemo (not as good as Monsters, Inc., but still well worth it), and no change.
Today I'm going to stop by my new hardware supplier of choice (http://www.ntcw.com/) to pick up a Gigabyte 7VAX. We'll see if I got ripped off on the CPU or what.
Mostly, though, I am not going to fuck with this computer again. I mean it this time.
So here it is, 8.30pm, and I'm restoring a Cobalt Raq 4 to something approaching virginity. It belongs to a colo'd customer, and it got cracked; we offered, for a modest cost, to restore it, and here I am.
It's Linux under the hood of course -- Red Hat, or at least they use RPM -- and it's interesting to see what's been done with it. The management page is pretty slick, though it always leaves me wanting to log on. To do that, I need to telnet -- shudder -- and of course the cust. hasn't got SSH on it. (Confirmation that we had a cracker was nmap showing lots of open ports that responded with an SSH banner. Seems weird to me that a cracker would install ssh, but oh well.) But all the web functionality seems to be there, and it seems pretty and easy to use.
The cust. kept up to date with the patches from Sun (part of what I'm reinstalling right now), but I think there are still a few holes; I'm pretty sure there's an old version of Apache, for instance. And would it kill them to have OpenSSH? Or firewalling tools?
Anyhow, it's the first time I've worked with an automatic patch installer that wasn't Windows, and I must admit I'm impressed. Download the patch -- which is a tarball of script + rpms + patches -- clicky-click install on the web interface, and away you go. I'm sure it's not news for most of you, but it's neat for me. The only thing is that it reboots between a lot of them -- c'mon guys, I thought this was Linux! :-)
Random idea for a program: I'm hooked up to this thing by a crossover cable to another Linux box, just to keep it off the 'net while it's having everything reinstalled. I telnet in occasionally to make sure things are working, but the damn prompt always takes so long to come up. It's the Raq doing a reverse DNS lookup on my address, of course, but because it's just on an Xover cable it sits there until the queries time out. We're talking a minute or so to time out, which is unacceptable. I'm an important man, after all.
So my idea is to have a program listening for queries like that and answering them, masquerading as whatever DNS server the query was directed at. Basically, just fake 'em out with whatever info they want. In cases like this (which I can see coming up, oh, at least once a year), it'd speed things up immensely. Anyone heard of anything like this, or is it just full of Crak(tm)?
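For what it's worth, a minimal sketch of that idea in Python might look like the following. It is only an illustration under some loud assumptions: the names `build_fake_response`, `encode_name`, and `serve`, the hostname `fake.invalid`, and the address `192.0.2.1` are all made up for the example, and it only handles plain UDP queries for A and PTR records. It also assumes the listening box already receives the queries, say by having the DNS server's address configured as an alias on the crossover interface; the sketch does nothing to arrange that.

```python
import socket
import struct

TYPE_A = 1     # IPv4 address record
TYPE_PTR = 12  # reverse-lookup record


def encode_name(name):
    """Encode a dotted hostname as DNS labels (length byte + text, zero-terminated)."""
    labels = [label for label in name.split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def build_fake_response(query, fake_name="fake.invalid", fake_ip="192.0.2.1"):
    """Build a reply to a single-question DNS query using made-up data.

    fake_name and fake_ip are hypothetical placeholders, not real hosts.
    """
    query_id, flags = struct.unpack("!HH", query[:4])

    # Walk the question name: a run of length-prefixed labels ending in a zero byte.
    pos = 12
    while query[pos] != 0:
        pos += query[pos] + 1
    pos += 1
    qtype, qclass = struct.unpack("!HH", query[pos:pos + 4])
    question = query[12:pos + 4]

    if qtype == TYPE_PTR:
        rdata = encode_name(fake_name)
    elif qtype == TYPE_A:
        rdata = socket.inet_aton(fake_ip)
    else:
        rdata = None  # answer "no data" right away so the client stops waiting

    # QR=1 (response), copy the opcode and RD bits, set AA and RA, rcode 0.
    resp_flags = 0x8000 | (flags & 0x7800) | 0x0400 | (flags & 0x0100) | 0x0080
    ancount = 0 if rdata is None else 1
    reply = struct.pack("!HHHHHH", query_id, resp_flags, 1, ancount, 0, 0) + question

    if rdata is not None:
        # 0xC00C is a compression pointer back to the name in the question.
        reply += struct.pack("!HHHIH", 0xC00C, qtype, qclass, 60, len(rdata)) + rdata
    return reply


def serve(host="0.0.0.0", port=53):
    """Answer every UDP query that arrives. Binding to port 53 normally needs root."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, addr = sock.recvfrom(512)
        try:
            sock.sendto(build_fake_response(data), addr)
        except (struct.error, IndexError):
            pass  # ignore anything too mangled to parse
```

Calling `serve()` as root on the box at the other end of the cable would be the whole deployment. Even answering the unknown query types with an empty reply should be enough to kill the timeout, since the client gets an answer instead of silence.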
...urghh. Just rebooted for a patch that allegedly fixes Apache and OpenSSL problems. Why the hell does this need a reboot?