The Life of a Sysadmin

Carousel is a lie!

Entries from June 2003.

I found it. Do I get a prize?
6:30 Tuesday 03 June 2003

So I was trying to set up a diskless boot system on FreeBSD at work last week for The Thing. I'd found this article on booting FreeBSD with PXE, and it was nearly all I needed...except that this was about installation, and what I needed was a working system.

So I started fooling with it and trying to figure out how to add things like individual swap files and configuration information -- The Thing is going to have up to ten or so computers booting disklessly -- and then I came across a passing reference to /etc/rc.diskless1 and /etc/rc.diskless2. Bless their pants, the good folks at FreeBSD had already come up with a way of doing all this, and pretty much all I had to do was read those two scripts and /usr/share/examples/diskless/clone_root.

I got the hang of it pretty quickly and (mostly -- I have a habit of not following each and every instruction every single time, which is why Automation Rox) got it working...sort of. Something was going wrong and I couldn't figure out what it was.

See, what happens is you create, on the server, /diskless_root, which has a copy of almost everything the diskless unit is meant to use for a filesystem. /var is a memory filesystem (MFS) populated on boot with canonical files (see /etc/mtree), and /tmp is a symlink to /var/tmp. /etc, in turn, is filled with files specified in /conf on the server: you can set a default, then add stuff specific to particular hosts specified by IP address. And if a file called diskless_remount is around in the right place, /etc will be filled first with the base files you'd expect, then overlaid with the stuff in the default section, then the host-specific stuff.
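The layering order is easier to see in miniature. This is only a sketch in a scratch directory, not what rc.diskless1 actually runs: the layer names (base, default, 10.0.0.5) and the file names are assumptions made up for illustration.

```sh
#!/bin/sh
# Sketch only: mimics the base -> default -> host-specific overlay order
# in a throwaway directory. Nothing here touches the real /etc or /conf.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

conf="$work/conf"     # stand-in for /conf on the server
etc="$work/etc"       # stand-in for the client's memory-backed /etc
host_ip="10.0.0.5"    # hypothetical diskless client address

mkdir -p "$conf/base" "$conf/default" "$conf/$host_ip" "$etc"
echo "base"    > "$conf/base/rc.conf"
echo "base"    > "$conf/base/hosts"
echo "base"    > "$conf/base/motd"
echo "default" > "$conf/default/rc.conf"
echo "default" > "$conf/default/hosts"
echo "host"    > "$conf/$host_ip/rc.conf"

# Copy each layer in turn; a later layer overwrites files from an earlier one.
for layer in base default "$host_ip"; do
    if [ -d "$conf/$layer" ]; then
        cp -R "$conf/$layer/." "$etc/"
    fi
done

# Show which layer each file ended up coming from.
for f in rc.conf hosts motd; do
    printf '%s: %s\n' "$f" "$(cat "$etc/$f")"
done
```

The host-specific rc.conf wins, hosts comes from the default layer, and motd is left as the base copy, since nothing later replaced it.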

Only that last part wasn't happening. Somehow it'd get to the part where it was meant to copy the base files, then all these errors would pop up. I was convinced I was doing something wrong, but I couldn't figure out what. So, single-user then sh -x /etc/rc.diskless1 2>&1 | less. (God, I love the -x flag for sh/bash. Lovelovelove.) I print out the scripts so I can follow along. I follow along.

And I found a bug!

At one point it checks to see if the diskless_remount file is around; if it is, it creates the MFS, then proceeds to fill it. It uses eval a couple times in its checks, which always throws me; I've never been clear on what eval is for. But I muddled through it, then realized there needed to be a second slash on one line. Without it the MFS creation subroutine doesn't get the proper arguments.
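For anyone else thrown by eval: it makes the shell parse a string a second time, which is handy when a variable name has to be built on the fly. A minimal sketch, not the code from rc.diskless1; the variable names are made up for illustration.

```sh
#!/bin/sh
# Hypothetical setting whose name is only known at run time.
md_size_var=4096
name="var"

# First pass: the shell expands ${name}, and the backslash keeps the
# leading $ literal, so eval receives the string: size=$md_size_var
# Second pass: eval parses that string and performs the assignment.
eval "size=\$md_size_${name}"

echo "size is $size"
```

Without eval, the shell would have no way to treat the assembled text as a variable reference; it would just be a string.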

Okay, no big deal; it's not like I found a ticking time bomb in the centre of the earth and had to alert everyone within 24 hours. But I was happy I was able to recognise it. I was all prepared to send in a PR when I found this one. Welp, there goes fame and fortune. :-)

In other news, it looks like FreeBSD does not like USB keyboards; we've got one we're using for The Thing, where the PS/2 socket is inaccessible, but FreeBSD immediately panics and dumps registers when it finds it or when we plug it in. I'm going to see about maybe getting a core dump from it using Michael Lucas' excellent instructions.

Original entry.

Tags: slashdot.

Super NIS weirdness
13:13 Friday 06 June 2003

So I'm working on The Thing today, and it's decided that the automount daemon needs to be set up on The Inside Things. (The Inside Things are in a separate network, with The Outside Thing acting as a gateway between them and the rest of our internal network. And so everyone knows, The Thing is running FreeBSD.) It's not going to be left like this when The Thing is deployed, but it's handy for right now.

Only The Inside Things are on a separate network -- 10.0.0/24, as opposed to 192.168.0/24 for the rest of our network -- so ypbind isn't working. I'm not too familiar with NIS/NFS, so this is taking me a while to figure out.

Eventually I decide that I need to enable NIS for amd to work, and to do NIS I need to bind to the right server. Well, in the man page for ypbind I see the -S option: bind to a particular server. Should work, right?

So I boot The Inside Thing, and do these commands:

domainname thing
ypbind
ypset -h localhost -d thing 192.168.0.1

At the same time I'm running tcpdump on The Outside Thing to watch what happens, because these commands aren't working. And I see the weirdest thing: packets going to another, completely foreign IP address, port 111: RPC.

I scratch my head, try again: same thing. Reboot The Inside Thing, try again: same thing. The Inside Thing is running nothing more than NFS and SSH, I'm the only one on it, and still it keeps going to this IP.

I look up the IP address and it belongs to the Washington State Department of Transportation. WTF?

Try it on The Outside Thing -- unnecessary, since it's running amd quite happily, but I want to see what happens. Same thing.

I check the source code for ypset on the off chance that Theo de Raadt (he wrote it) put in some kind of trojan to...I don't know, ask for his driveway in Seattle to be plowed. Nothing -- but then, my rule of thumb has always been "If you're looking at source code, you're in over your head." (True for me, and if the source code is written in anything other than Perl or Bash. Still learning.)

I have no idea what the hell was going on. Anyone?

Flash! Just tried it at home on my FreeBSD gateway: same results. Jesus.

Original entry.

Tags: slashdot.

What's wrong with this code?
8:41 Saturday 14 June 2003

In the grand tradition of ryanr's journal, let's see who figures out what's wrong with this Bourne shell script. First prize is a Cadillac Eldorado. Second prize is a set of steak knives. Third prize is Cowboy Neal fires you.

Background: yesterday at work, my home-grown backup system choked when it tried to burn a 740MB ISO to CD. (Still waiting for a tape drive.) I decided to finally implement the long-delayed exclusion of certain files (.core, .o, etc.). I know, I know, should've put it in from the start, but this was the first time it became an issue.

Anyhow, I was testing to see if my changes worked, and they didn't: the files were not being excluded. What was doubly weird was that I could run, on its own, the command the script was running, and it would work fine: everything that should have been excluded was. I finally boiled it down to this script:

    #!/bin/sh

    TAR_EXCLUDE="--exclude='*.core' --exclude='*.a'
    --exclude='*.o' --ignore-case --exclude='*cache*'"

    # This command works:

    /usr/bin/tar cvj --exclude='*.core' --exclude='*.a' \
        --exclude='*.o' --ignore-case --exclude='*cache*' -f \
        backup.tar /home/foo

    # And this command doesn't:

    /usr/bin/tar cvj $TAR_EXCLUDE -f backup.tar /home/foo

I tried putting echo behind each command, and I tried putting -x in the shebang; both showed the same output for both commands. (That make sense?)

What did I do wrong?
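One likely answer, sketched below: quote characters stored inside a variable are just characters. The shell only strips quotes that it sees when it first parses the command line, so the expanded variable hands tar a pattern with literal single quotes in it, and that pattern matches nothing. The show_args helper is hypothetical and stands in for tar so the arguments can be inspected.

```sh
#!/bin/sh
# show_args prints each argument it receives on its own line, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Typed directly: the shell removes the single quotes before the command runs.
direct=$(show_args --exclude='*.core')

# Via a variable: the quotes are part of the value and survive expansion.
TAR_EXCLUDE="--exclude='*.core'"
via_var=$(show_args $TAR_EXCLUDE)

# Via eval: the expanded text is parsed again, so the quotes are removed.
via_eval=$(eval "show_args $TAR_EXCLUDE")

echo "direct:   $direct"
echo "via var:  $via_var"
echo "via eval: $via_eval"
```

Some shells print the -x trace without re-quoting each word, which may be why the two traces looked identical. Using eval gets the quotes re-parsed, though it re-parses everything else in the string too, so it is worth keeping the variable's contents under tight control.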

Original entry.

Tags: slashdot.

/dev/console and the MIBs of Heaven
19:18 Thursday 26 June 2003

Interesting problem with The Inside Thing at work today. The Inside Thing runs headless -- no video card, no serial port (yet...sigh), connections only over IP. SSH to The Inside Thing has been no problem, and I never thought it would be.

One of the developers today loaded a debugging version of the FreeBSD kernel module he's working on, and found that it really slowed things down: a test script that would complete in a second, using the non-debugging module, would take a minute or more to run; in addition, the whole system would slow down to the point of near-unusability. WTF?

The debugging version does a lot of kernel printfs (I'm not a developer, so forgive me for any imprecision in language here). Logging is done to two places: /var/log/whatever, and over UDP to The Outside Thing, which has its syslog daemon listening on the syslog port (grep syslog /etc/services). /var, on The Inside Thing, is just this big (32 MB) memory filesystem, so that shouldn't be a problem. And the network connection is gigabit ethernet, so that shouldn't be an issue.
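If grepping /etc/services by eye gets old, the lookup can be wrapped up in a few lines. A small sketch; the lookup_port helper is hypothetical, and it assumes the usual "name port/proto" layout of the services file.

```sh
#!/bin/sh
# lookup_port NAME PROTO [FILE]: print the port number for a service entry.
# FILE defaults to /etc/services.
lookup_port() {
    awk -v name="$1" -v proto="$2" '
        $1 == name {
            split($2, parts, "/")      # field 2 looks like "port/proto"
            if (parts[2] == proto) { print parts[1]; exit }
        }
    ' "${3:-/etc/services}"
}

port=$(lookup_port syslog udp 2>/dev/null)
echo "syslog/udp port: ${port:-unknown}"
```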

I ran fstat while the program was running; it showed nothing unexpected: files open in /usr (where the program lives), the developer's home directory (NFS), /dev/insidething and /var/log/whatever. But run systat -vm, and hey, what's this: tons of interrupts on sio0.

This didn't work:

rm /dev/cuaa
mknod /dev/cuaa0 c 2 2

So on to less drastic measures.

I tried upping the serial port speed (we'd turned it on, but still haven't got a socket we can hook to yet) from 9600 to 115200 in /etc/ttys, and HUPping init; no change. (Incidentally, to get the serial port working on another FreeBSD machine over a null modem cable, I had to set it to 9600.3wire; strange. Or perhaps not.)

My boss came by at that point, and told me that the kernel printfs were not affected by stuff like getty and init; instead, there was a kernel option or possibly a sysctl that set that. Sure enough, look around and there's machdep.conspeed: 9600. Set it to 115200 and whee, look at things go! The debugging program ran in 30 seconds, which by this point seemed like a definite improvement.

I experimented a bit and found the highest value machdep.conspeed could be set to was something like 118900. Like before, this was better but by no means great. Then my boss came in again and announced a new sysctl MIB, greater than all the rest. This one was The Light, and the other one only came to announce The Light to the world:

kern.consmute

Set to 1, and all those kernel printfs still get logged to syslog, but never slow down The Inside Thing. I'm assuming that all this was trying to go out over the serial port after FreeBSD detected no video card...but I'm the first to admit that's probably a crack-addled dream.
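For reference, setting the two MIBs from the shell might look something like the sketch below. The set_if_present helper is hypothetical, added so the script does nothing on a machine whose kernel lacks these MIBs; the values are the ones mentioned above, and actually changing them needs root.

```sh
#!/bin/sh
# set_if_present MIB VALUE: set a sysctl only when the running kernel has it.
set_if_present() {
    if sysctl -n "$1" >/dev/null 2>&1; then
        sysctl "$1=$2"
    else
        echo "skipping $1: not present on this system"
    fi
}

# Faster console speed helped a little...
set_if_present machdep.conspeed 115200
# ...but muting the console is what made the slowdown go away.
set_if_present kern.consmute 1
```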

Original entry.

Tags: slashdot.

Slashdotting your P200 HOWTO
19:08 Friday 27 June 2003

So I'm the Geek You Know for about six friends, and one of them needs a website. I volunteer my li'l server, no problem. But she wants something she can manage herself, and as she's not a geek that means something easy on the eyes and easy to use.

So I start looking at (can't use this phrase w/o gritting my teeth) content management systems, c/o Freshmeat and OpenSourceCMS.com (this site rocks). I've tried out two or three so far, and all have had the same results: my li'l server is dreadfully overworked.

It's a P200, 48MB of RAM, and it's fine at serving up static content: most of my site and my wife's site is just that, so it's not a problem most of the time. But start throwing some MySQL into the picture, and things slow down fast.

I'd settled, sort of, on Back-End as a likely contender; I liked its management pages, can't beat the street cred when CUPE uses it, there's an integrated gallery, and the installation went well. But when I tried it out...holy crap, it was slow: 10 seconds to throw up a page. Admittedly, better than the 30 seconds with some other packages, but still.

I figured it was time to move the database to the faster computer. I've got a Celeron, 450MHz, 384MB RAM, that I use for my desktop. Wasn't doing much besides 87 xterms and setiathome, so I figured out how to move it over there. Still slow. Well, decided to try some benchmarks. That's what separates us from the animals.

ab, against my own (static) site, shows 1000 requests being served, none dropped, concurrency 5, in 26 seconds. Against the Back-End demo site I set up, it timed out. I upped the timeout; same thing. In frustration I set concurrency to 500, set up iostat and top to watch on both the database and the web server (fancy!), and waited.

And waited.

After five minutes, the SSH session showing the stats on the P200 stopped. (Load on the Celeron and its disks barely registered, BTW.) I logged in via the KVM to see what was happening, and the answer is: not much. "eth0: card reports no resources", whole lotta processes being killed due to lack of memory, and fifteen minutes later it is still chugging its way back to freedom. (Don't want to reboot and ruin that 60-day uptime...)
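To put the static-site run mentioned above in perspective, here is the arithmetic as a small sketch. The figures are the ones from that run; the URL in the comment is a placeholder, not the real site.

```sh
#!/bin/sh
# Figures from the static-site benchmark.
requests=1000
concurrency=5
seconds=26

# The run itself would look something like this (placeholder URL):
#   ab -n "$requests" -c "$concurrency" http://localhost/

# Overall throughput: requests divided by wall-clock time.
rate=$(awk -v n="$requests" -v s="$seconds" \
    'BEGIN { printf "%.1f", n / s }')

# Rough mean time each request spent in flight, in milliseconds:
# with c requests outstanding at once, that is c * time / requests.
per_request_ms=$(awk -v n="$requests" -v s="$seconds" -v c="$concurrency" \
    'BEGIN { printf "%.0f", c * s * 1000 / n }')

echo "static site: about $rate requests/second"
echo "static site: about $per_request_ms ms per request"
```

That works out to a bit under 40 requests a second and roughly an eighth of a second per request, against the 10 seconds per page the CMS was taking.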

So: My questions to all of you are:

  1. If it's not the CPU dedicated to the database that's the bottleneck, what is? Is it the PHP processing? Should I be moving the web server to the faster processor?
  2. Anyone know a lightweight CMS? Don't need a lot of bells and whistles; mostly this'll be a way of editing a mostly static site. Comments, polls, user journals...eh.

My thanks in advance for whatever help you can provide.

Original entry.

Tags: slashdot.

Hardware demons
30 June 2003 12:00:00 PST

Jesus Christ. Every time I mess around with hardware or upgrades, I swear I'll never do it again. Then I forget.

My first computer, bought eight years ago now, was a 486 w/16MB of RAM and some amount of HD space. I installed Slackware on it, got a 33.6 modem, and had email and net access. Then a roommate sold me his old P90. It crashed constantly until I figured out I had set the CPU voltage wrong. It took me a long time to figure that out, and I was nearly ready to hurl the thing out the window.

A few years later I upgraded to my current desktop machine, a 333 Celeron overclocked to 450 MHz. The machine is fine unless I open up the case to add/remove/shift something in it; then it will, for a day, spontaneously reboot. I've checked it for shorts and can't find any. I don't know what I'm missing, but I'm sure it would be obvious to someone else.

And now the latest. My wife bought an iMac from her old work a few years ago, and has had problems w/it since. It just crashes for no good reason. It'll work fine for two weeks, then she can't keep it running for more than an hour. So last week I went out and bought her a fairly skookum machine: Athlon 2600 (I think...details to follow), ECS K7S5A mobo, 60GB HD and 256 MB RAM.

I got it all home and assembled it. The mobo and Red Hat 9 (not my favourite, but great for my wife) called the CPU a 2000 (1.6GHz instead of 2.0), so I looked around and decided a BIOS upgrade would be in order. Did that and promptly lost the back USB -- bad, since her keyboard and mouse are USB. The front ones, hooked up to the pins on the motherboard, still worked. Tried rolling the BIOS back, but nothing: the back, onboard USB just didn't go. Fuck.

So I went out and got some additional USB risers a few days later. I added them; no problem. Then I had to add a connector from the CDROM's audio to the motherboard. I made the mistake of removing one of the USB connectors while the power was still on. Didn't even think; just did it. Now the BIOS freezes at "Checking NVRAM...". Flashed the CMOS half a dozen times, left it off most of the night while we went to see Finding Nemo (not as good as Monsters, Inc., but still well worth it), and no change.

Today I'm going to stop by my new hardware supplier of choice (http://www.ntcw.com/; well recommended if you're in Vancouver; prices nearly as good as Atic, but much better service) to pick up a Gigabyte 7VAX. We'll see if I got ripped off on the CPU or what.

Mostly, though, I am not going to fuck with this computer again. I mean it this time.

Original entry

Tags: hardware, linux, upgrades.
