The Life of a Sysadmin

Carousel is a lie!

Entries from August 2005.

New firewall machine soonish
2005-08-03 18:49:34

A while back a friend of mine moved out of town, and gave me some of his old computers. He's a Mac guy, so they were a bit different from all the x86 stuff I've got around me. One of them was a Umax c500/180, one of the few (so I understand) Apple clones. It's got a 180MHz 603e processor, I think 92 MB of RAM, 2 PCI slots, and a 2GB hard drive -- which is currently split between OS8 and an old version of Yellow Dog Linux. Being the perverse bastard I am, I've decided to try and get Gentoo going on it (from Stage 1, baby! yeah!) and turn it into a firewall machine.

I'm picking up the Apple terminology slowly. It's taken a good month to bang into my head that this is an Old World machine -- which means you can't just throw in a CD and boot from it. Oh no. You need to use BootX, an Apple extension that, early in the boot process, asks if you want to keep running the Mac OS or switch to Linux. To add a new kernel -- like, say, your install CD kernel -- you need to copy important files into the right folder in Mac OS. Not only that, but because the machine is old and funky enough, you need to download certain non-schedule kernels to get it to work...but then it works great. My kernel args were a little different:

cdroot root=/dev/ram0 rw init=linuxrc loop=livecd.squashfs looptype=squashfs console=tty0 nodevfs udev video=imsttfb

But hey! Worked! At least, until I started getting SIG11 errors...time to rip out some memory, I guess. Update: Huh...reseating the RAM and the CPU seems to have done the trick. Currently compiling binutils with no problem.

No tags
Just when you're being too snotty
2005-08-03 18:51:43

/dev/da0s1e: phase 2: dir 700529 of 808563 (86%)

  1. I hate fscks that don't provide a progress report.
  2. When you're fscking, don't fucking touch the keyboard. Especially don't hit the up arrow or the enter key so that the forty-minute fsck is repeated; you'll have to kill it and the disk will be marked dirty.
  3. Make sure power cords can't be removed from servers.
  4. "Server" means, among other things, "redundant power supplies". Yes, it does.
  5. Non-journaled filesystems are for the fucking birds.
  6. Make sure that your SCSI cables are firmly fastened.
No tags
God, I hate Outlook
2005-08-04 17:20:32

Problem: Outlook 2003 User gets a message from System Administrator saying that his message to a coworker is undeliverable -- something about relaying denied -- and asks me why this happens. Pretty simple, right? Just get him to forward the message and then check the logs. Only no, it's not simple: despite twiddling all the bits you're supposed to, I keep getting the message attachments in MS' TNEF format. I use Mutt. I give up and decide to look at Outlook itself. (Yes, I know about the decoder scripts you can get, but I was being bullheaded.)

Now we have the problem of getting the proper Internet headers in the email. (I've given up trying to persuade Outlook that I am ritually pure enough to look upon the shining glory that is The Message Source without melting like some kind of Nazi-collaborating French archaeologist; it doesn't work.) A quick Google turns up three suggestions: a $24.95 VB plugin, giving up entirely, or right-clicking and selecting Options, then looking at the box that sez Internet Headers. I'm game, so I try right-clicking and choosing Options. There's the Internet Headers box, all right, but it's empty. WTF? I look around, but there's nowhere else to clickyclick. I try right-clicking on another message, and sure enough it shows the headers.

I try selecting the From address in the problem message (helpfully labelled as "System Administrator", which I'm pretty sure is a bald-faced lie), then Properties. It just says it's from System Administrator, and shows no actual, real email address. You remember...email, one of the things Outlook is meant to do?

Then it dawns on me: the mentioned-in-passing comment from the user that the message is probably from Outlook itself is true. I'd thought this was just a friendly gloss on an unfriendly message, but it wasn't. This fucking message is from Outlook.

And it's not until I skip ahead and tell you the exciting conclusion -- it was our mail server refusing to relay and saying so, something never not once mentioned in the offending message -- that you're going to realize the full horror of the situation. For Outlook does not only mangle email, and hide attachments in weird files called "winmail.dat", and shake its baloney all over the place like a drugged-out Hula girl in the "Before" picture in all those rehab clinic advertisements. No. That is not all.

Outlook -- the mail client -- also takes error messages from mail servers and disguises them as email messages that have just arrived, rather than showing the user the fucking error as an error when and as it occurs! It hides the origin of the error by pretending to be some nonexistent sysadmin when it sends this message! And it does nothing to indicate that this false email is any different from the other messages from Bill and Bob and Ted littering your inbox about horizontal opportunity mission statements, complete with an animated surfing guy for Bob shouting "Whoah!" to differentiate his mangling of the English language from Ted's, leaving me to wonder what the fuck kind of congenital brain damage must've been at work to make this seem like a good idea to anyone. Fuck me, I hate Outlook.

No tags
NWR04B: Toolchain Problems Redux
2005-08-08 20:17:27

July 9th: arm-elf-tools-20030314.sh from uClinux.org. Busybox fails when run as init with "undefined instruction" and "unknown data abort code."

July 10th: toolchain from hri.sourceforge.net. BB fails with "undefined instruction".

July 11th: uClibc buildroot script. No copy of elf2flt. Tried latest snapshot, which does have elf2flt, but it failed to install.

July 24th: uClinux toolchain again. BB fails with "bad data abort", "unknown data abort code" and "obsolete system call". Possibly including different versions of unistd.h?

July 26th: ptxdist. uClibc appears to be built for a CPU with MMU/FPU, ignoring values set in original menuconfig.

July 30th: 3.4 toolchain from uClinux site (hidden!). Tried compiling the kernel with this toolchain, but far too many errors relating to change in behaviour from 2.95.3. BB failed with "relocation outside program". STL failed to build.

God, this is pissing me off.

No tags
/etc/ld.so.nohwcap
2005-08-14 12:35:39

Now this is interesting. Guy comes up to me at work and sez, "Hey, that new Linux machine is really slow." How can that be? It's an umpty-GHz processor with a gig of RAM, a nice hard drive and the same 100Mb/s connection to the network that the FreeBSD machine beside it has. "It's just slow." Slow how? Doing what? "It's just slow -- all the time."

Eventually we got it down to a working demonstration: log in. The developers've got a fairly intricate set of .cshrc files, so they echo some progress reports: Setting FOO...setting BAR...setting LD_LIBRARY_PATH...only it's taking for-freaking-ever -- well, relatively speaking: 8 seconds sez /usr/bin/time. cf. ~2 s. on the (if anything, slower) FreeBSD machine right beside it. WTF?

At first I started looking at the rc scripts. By deking out various bits, I could see where the 8 seconds was coming from -- half a second there, two and a half seconds there...but then I came to my senses and realized that looping over half a dozen items should not be causing this kind of nonsense. I checked DMA on the hard drives. Aha, they're off! But all the home directory access is over NFS, so it's probably not much of an issue. And hdparm -d1 /dev/hda just came back with permission denied (even as root...I seem to remember something about newer Intel chipsets having DMA built in), so I left that out.

Out of desperation more than anything else, I tried running strace /bin/csh /bin/echo foo -- and hot damn if we're not trying to open 209 different directories to find libncurses! Holy hell! And what is this happy crappy? It's checking out a crapload of directories we haven't even told it about. For example, check out what it does for this one element in LD_LIBRARY_PATH, /home/foo/lib/bling:

open("/home/foo/lib/bling/tls/i686/mmv/cmov/libncurses.so.5", O_RDONLY) = -1 ENOENT (no such file or directory)
open("/home/foo/lib/bling/tls/i686/mmv/libncurses.so.5", O_RDONLY) = -1 ENOENT (no such file or directory)
open("/home/foo/lib/bling/tls/i686/libncurses.so.5", O_RDONLY) = -1 ENOENT (no such file or directory)
open("/home/foo/lib/bling/tls/libncurses.so.5", O_RDONLY) = -1 ENOENT (no such file or directory)
open("/home/foo/lib/bling", O_RDONLY) = -1 ENOENT (no such file or directory)
open("/home/foo/lib/bling/i686/mmv/cmov/libncurses.so.5", O_RDONLY) = -1 ENOENT (no such file or directory)
open("/home/foo/lib/bling/i686/mmv/libncurses.so.5", O_RDONLY) = -1 ENOENT (no such file or directory)
open("/home/foo/lib/bling/i686/libncurses.so.5", O_RDONLY) = -1 ENOENT (no such file or directory)
open("/home/foo/lib/bling/mmv/cmov/libncurses.so.5", O_RDONLY) = -1 ENOENT (no such file or directory)
open("/home/foo/lib/bling/mmv/libncurses.so.5", O_RDONLY) = -1 ENOENT (no such file or directory)
open("/home/foo/lib/bling/cmov/libncurses.so.5", O_RDONLY) = -1 ENOENT (no such file or directory)

That's right, folks, the dynamic loader has taken it upon itself to do some kinda combinatoric search for hardware-optimized libraries, and as a result a measly thirteen entries in LD_LIBRARY_PATH turn into two hundred and nine. Add to that the aggravation of all these entries being in people's home directories which are NFS-mounted from elsewhere, and it's not too hard to see why the hell it's slow.

To be fair, this is meant to do something good: you can have libraries compiled for different hardware capabilities (HWCAP is the term to search for), which I imagine would be handy if you want one disk image for a bunch of different hardware. The trouble, of course, is that you get into these ridiculously long lists of directories that might exist...at least, if you're not using Alpha CPUs.

Fortunately, the folks at Debian have anticipated my whining and have done something about it. Unfortunately, it's SOOPERSEKRIT. Fortunately, I dug it out of a very cranky email to debian-devel:

# touch /etc/ld.so.nohwcap

With that in place, the formerly-plodding test that took 8 seconds to finish now runs in 1.5 seconds. And that is one hell of a performance gain from just touching one file.
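
To see roughly where the two hundred and nine comes from, here's a minimal sketch of the expansion. It assumes the four subdirectory names from the trace above (spelled exactly as they appear there) are each optional and always kept in the same order. The function name hwcap_candidates is mine, the order it prints is only illustrative, and none of this is the loader's actual code.

```python
from itertools import combinations


def hwcap_candidates(base, components, libname):
    """List the paths probed under one LD_LIBRARY_PATH entry.

    Assumption: every subset of `components` (kept in the given order)
    becomes one subdirectory to try, ending with the bare directory.
    """
    paths = []
    # Longest combinations first; the empty combination (bare dir) comes last.
    for size in range(len(components), -1, -1):
        for combo in combinations(components, size):
            paths.append("/".join([base, *combo, libname]))
    return paths


if __name__ == "__main__":
    tried = hwcap_candidates(
        "/home/foo/lib/bling",
        ["tls", "i686", "mmv", "cmov"],  # names copied from the trace
        "libncurses.so.5",
    )
    for path in tried:
        print(path)
    print(len(tried), "candidates for this one entry")
```

Four optional components give 2^4 = 16 candidates per entry, so thirteen entries come to 208 lookups, which is within one of the 209 I counted. I haven't chased down the odd one out.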

1 comments. No tags
dmesg -n 1
2005-08-16 10:58:30

From dmesg(8):

 -nlevel
              Set the level at which logging of messages is done to the
              console. For example, -n 1 prevents all messages, except panic
              messages, from appearing on the console. All levels of messages
              are still written to /proc/kmsg, so syslogd(8) can still be used
              to control exactly where kernel messages appear. When the -n
              option is used, dmesg will not print or clear the kernel ring
              buffer.
No tags
error loading ucdata (error -127)
2005-08-17 12:57:35

When installing openldap22-server on FreeBSD from ports, I got this error when starting slapd:

error loading ucdata (error -127)

This is a permission problem with the directory /usr/local/share/openldap/ucdata; changing the group ownership cleared it up.

No tags
1 != 2
2005-08-26 12:00:36

I love cfengine. If you haven't checked it out yet, do so. You can do really neat stuff like this:

editfiles::
        { /etc/Xprint/C/print/attributes/document
                BeginGroupIfNoLineMatching "^\*default-printer-resolution: 300"
                        CommentLinesMatching "^\*default-printer-resolution: 600"
                        LocateLineMatching "^# \*default-printer-resolution: 600"
                        InsertLine "*default-printer-resolution: 300"
                        DefineInGroup restart_xprint
                EndGroup
        }

shell::
        debian.restart_xprint::
                "/etc/init.d/xprint restart"

(Which, by the way, totally fixes the problem of Debian printing 'way huge stuff. Bug number 262958. You should totally look it up.)

Look at that. It's lovely. It's obvious what it's looking for, what it'll do if it can't find it, and what'll happen after that. And it does it automagically. At night. From cron. The way God intended all system administration to be done.

However -- and I cannot emphasize enough how important it is to keep this in mind -- it is absolutely NFG reading the documentation for an hour trying to figure out why the DefineInGroup statement just does not work if:

  1. you're reading the docs for cfengine v2, and
  2. you're working with cfengine v1.

It's my own fault for printing out v2 docs and not thinking much about it. However, in my own defense it would be nice if cfengine would complain about something it appears not to recognize. Not even with -d2 (which produces output along the lines of CheckingDateForSolarEclipseToday [no]) did it whisper a word about this.
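
For anyone who doesn't read cfengine, here's a rough Python emulation of what I understand the editfiles stanza above to do. It's a sketch, not cfengine's implementation: the function name fix_resolution is made up, the second line in the sample input is a placeholder, and the behaviour when there's no 600 line at all (leave the file alone) is my assumption.

```python
import re

ACTIVE_300 = re.compile(r"^\*default-printer-resolution: 300")
ACTIVE_600 = re.compile(r"^\*default-printer-resolution: 600")
COMMENTED_600 = re.compile(r"^# \*default-printer-resolution: 600")
WANTED = "*default-printer-resolution: 300"


def fix_resolution(lines):
    """Return (new_lines, restart_xprint) for the lines of the attributes file."""
    # BeginGroupIfNoLineMatching: skip the whole group if 300 is already set.
    if any(ACTIVE_300.match(line) for line in lines):
        return list(lines), False

    # CommentLinesMatching: comment out every active 600 line.
    out = ["# " + line if ACTIVE_600.match(line) else line for line in lines]

    # LocateLineMatching + InsertLine: put the 300 line right after the
    # first commented-out 600 line.
    for index, line in enumerate(out):
        if COMMENTED_600.match(line):
            out.insert(index + 1, WANTED)
            return out, True  # DefineInGroup restart_xprint

    # Assumption: with no 600 line to anchor on, nothing changes.
    return out, False


if __name__ == "__main__":
    # The second line is a hypothetical placeholder, not a real attribute.
    before = ["*default-printer-resolution: 600", "*some-other-attribute: value"]
    after, restart = fix_resolution(before)
    print("\n".join(after))
    print("restart xprint:", restart)
```

Run it a second time on its own output and nothing changes, which is the whole point of doing this from cron every night.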

Tags: cfengine.
NWR04B: Oh so very close
2005-08-27 00:37:59

Okay, so the solution appears to have been there all along:

CFLAGS_PIC="-msingle-pic-base -fpic"
CFLAGS="-O2 ${CFLAGS_PIC}"
LDFLAGS="-Wl,-elf2flt ${CFLAGS_PIC}"

Compile busybox with that and the HRI/Snapgear toolchain, and I get this when it boots:

IP-Config: Complete:
      device=eth0, addr=192.168.23.12, mask=255.255.255.0, gw=255.255.255.255,
     host=test, domain=, nis-domain=(none),
     bootserver=192.168.23.254, rootserver=192.168.23.254, rootpath=
NetWinder Floating Point Emulator V0.95 (c) 1998-1999 Rebel.com
Looking up port of RPC 100003/2 on 192.168.23.254
Looking up port of RPC 100005/1 on 192.168.23.254
VFS: Mounted root (nfs filesystem).
Freeing init memory: 48K
Using fallback suid method
init started:  BusyBox v1.00 (2005.08.27-07:03+0000) multi-call binary

...and there it hangs -- gotta get the console right. Still, this is a fuck of a lot better than the various errors I was getting. Can't believe I missed what was right in front of my face. This is with, notice, no softfloat options for GCC. The trick appears to be using -msingle-pic-base -fpic in LDFLAGS -- I'd had those earlier in CFLAGS alone, and it wasn't working then.

No tags
NWR04B: SHELL!
2005-08-27 13:10:49

Just under eight months:

IP-Config: Guessing netmask 255.255.255.0
IP-Config: Complete:
      device=eth0, addr=192.168.23.12, mask=255.255.255.0, gw=255.255.255.255,
     host=test, domain=, nis-domain=(none),
     bootserver=192.168.23.254, rootserver=192.168.23.254, rootpath=
Looking up port of RPC 100003/2 on 192.168.23.254
Looking up port of RPC 100005/1 on 192.168.23.254
VFS: Mounted root (nfs filesystem).
Freeing init memory: 52K
Using fallback suid method
init: Bummer, can't write to log on /dev/tty5!
console=/dev/console
init started:  BusyBox v1.00 (2005.08.27-16:28+0000) multi-call binary
command='-/bin/sh' action='4' terminal='/dev/console'

init: Bummer, can't write to log on /dev/tty5!
Starting pid 9, console /dev/console: '/bin/sh'
Using fallback suid method

BusyBox v1.00 (2005.08.27-16:28+0000) Built-in shell (lash)
Enter 'help' for a list of built-in commands.

/ # busybox
NFS: giant filename in readdir (len 0x80000001)!
Using fallback suid method
BusyBox v1.00 (2005.08.27-16:28+0000) multi-call binary

Usage: busybox [function] [arguments]...
   or: [function] [arguments]...

        BusyBox is a multi-call binary that combines many common Unix
        utilities into a single executable.  Most people will create a
        link to busybox for each function they wish to use, and BusyBox
        will act like whatever it was invoked as.

Currently defined functions:
        [, busybox, cat, cp, date, dd, dmesg, du, echo, egrep, false,
        fgrep, find, free, getty, grep, hexdump, ifconfig, init, kill,
        lash, linuxrc, login, ls, mkdir, mknod, more, mount, mv, netstat,
        passwd, ping, ps, pwd, rm, rmdir, route, sed, sh, strings, stty,
        su, tail, tee, test, time, top, touch, true, tty, umount, uname,
        uptime, vi, whoami, xargs

/ # ping -c 5 192.168.23.254
Using fallback suid method
PING 192.168.23.254 (192.168.23.254): 56 data bytes
64 bytes from 192.168.23.254: icmp_seq=0 ttl=64 time=0.0 ms
64 bytes from 192.168.23.254: icmp_seq=1 ttl=64 time=0.0 ms
64 bytes from 192.168.23.254: icmp_seq=2 ttl=64 time=0.0 ms
64 bytes from 192.168.23.254: icmp_seq=3 ttl=64 time=0.0 ms
64 bytes from 192.168.23.254: icmp_seq=4 ttl=64 time=0.0 ms
--- 192.168.23.254 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms
/ # ls -l
Using fallback suid method
drwxr-xr-x    2 1000     1000         1304 Aug 27  2005 bin
drwxr-xr-x    2 1000     1000          312 Aug 27  2005 dev
drwxr-xr-x    3 1000     1000          176 Aug 27  2005 etc
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 lib
lrwxrwxrwx    1 1000     1000           11 Aug 27  2005 linuxrc -> bin/busybox
drwx------    2 1000     1000           48 Jul  5  2005 lost+found
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 mnt
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 proc
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 root
drwxr-xr-x    2 1000     1000          344 Aug 27  2005 sbin
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 tmp
drwxr-xr-x    4 1000     1000           96 Jul  9  2005 usr
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 var

I am the greatest man IN THE ENTIRE WORLD.

No tags
NWR04B: How'd that happen?
2005-08-28 09:21:48

After getting the CFLAGS fixed up, the last step before a working shell was getting init going -- either just saying something like init=/bin/sh in the kernel command line, or else getting init proper working. The first didn't work, so on to the second. First I looked at /etc/inittab on the filesystem. This message suggested that a very simple inittab should work just fine:

::askfirst:-/bin/sh

However, it wasn't working: the last message I got during the boot process was the BusyBox banner, and then nothing. I could ping it, but it wasn't responding to anything on the keyboard.

I turned on debugging in init/init.c (#define DEBUG_INIT 1 up at the top), then started throwing in messageD(LOG|CONSOLE, "FIXME: Made it here") at various spots. I could see that init was running, and it was parsing /etc/inittab -- good. (Oh, should also mention that since the router is currently mounting its filesystem by NFS, running tcpdump host [ip address] | awk '/"/ {print $NF}' showed me the files it was trying to get -- which also showed inittab.)

Okay, so move on to actually running the damn programs. That takes us through init_main() and run_actions()...yep, messageD shows we're getting there, too. From run_actions() we go to run()...and here's where we run into problems. run() basically blocks signals then runs fork() like so:

        if ((pid = fork()) == 0) {
               /* run the damn program */
        }
        return pid;

A few more messageDs showed we were reaching the other side of the if block w/o any problems, but didn't seem to be actually going inside. init kept trying, about once a second, to start up the programs in inittab but it was failing each time. And then I remembered: uClibc on an MMU-less chip like this one does not implement fork(); instead, you use vfork(), which blocks the parent until the child exits or calls execve(). (Here's a good explanation.) So what if we do:

        if ((pid = vfork()) == 0) {

Well, hot damn -- it works!
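
Python only wraps fork(), not vfork(), so the following can't reproduce the uClibc behaviour. It's just a minimal sketch of the fork-then-exec shape that run() has, written for an ordinary Unix box with an MMU. The names run and run_and_wait here are mine, not BusyBox's.

```python
import os
import sys


def run(argv):
    """Fork, exec argv in the child, and return the child's pid to the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: replace ourselves with the program. If the exec fails,
        # leave immediately without running the parent's cleanup handlers.
        try:
            os.execv(argv[0], argv)
        except OSError:
            os._exit(127)
    return pid


def run_and_wait(argv):
    """Start argv via run() and return its exit status."""
    pid = run(argv)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Killed by a signal: report it as a negative number.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    # Run a throwaway child that exits with a known status.
    print(run_and_wait([sys.executable, "-c", "raise SystemExit(3)"]))
```

The child branch does nothing but exec or bail out. That's the same discipline vfork() demands, since there the child is borrowing the parent's memory until it calls execve() or _exit().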

No tags
NWR04B: What next?
2005-08-28 13:18:02

I've had a few questions about what I actually plan to do with this thing now that I can get a shell running. I've been thinking about this for a while, and here, in no particular order, is what I want to do:

Network access: It'd be cool if SSH (preferred) or telnet would work on this. And hey, it'd be handy (okay, for certain values of "handy") to have a web server you could just plug in anywhere, especially if it was combined with a USB flash drive (see below).

Get a filesystem embedded in flash: I went with NFS mounting the filesystem because I had no idea how to embed one in flash. I still don't, plus I'm unsure how this all interacts with the bootloader that's on there...my memory is a bit hazy at this point, but I think that the BL crapped out with a sufficiently large kernel at one point (maybe one I'd compiled with debugging?). The image I'm uploading now is 513KB, which leaves about 1.5MB of flash left for the filesystem. Since NFS works right now and has tons of room, I figure I can experiment 'til I figure out what I'm doing, then make an entirely self-contained image.

Provide an image for other people to upload to their own routers: 'cos what good is doing all this if I can't share? There are a few other people with this router who have (I think) been following this, so there ought to be some small amount of interest in this. I'd like to do the same with the devel stuff -- at least have a nice tarball for Busybox, the kernel and an NFS filesystem, say. (If you're impatient, email me.)

Make the world's first Beowulf cluster of wireless routers: 'cos I'd like my 15 minutes of fame, please. Slashdot, here I come!

Turn it into a firewall/wireless access point: This thing is small, doesn't consume much power, and is silent. It's got five ethernet ports and a wireless card with a GPL'd driver. How cool is that? A Linux firewall'd be nice and flexible, and it'd be nice to (say) only allow SSH/SSL on the wireless card. I'm curious to see how much additional memory firewall rules will take up, and if I can get something like tarpitting working on it without sucking up all the RAM.

More hacks: This chip has 2 UARTs and USB. It'd be cool to, say, add a USB flash drive to this thing; I've got a 64MB one lying around that I used to get my XBox running Linux, and a 64MB filesystem would be huge compared to what I can fit in 1.5MB. What about breaking out the second UART to a serial port? Can we add more RAM? And the CPU can run at different clock speeds -- what happens if we play around with that?

No tags
NWR04B: First release
2005-08-30 17:44:29

I've put together a couple downloads for the NWR04B. The first is the whole tarball of source code -- BusyBox, uClinux, plus some glue. You can use this to compile your own images. It's about 29MB. The second is just the bare minimum: the firmware image, the root directory for the router, and some instructions. It's about 685KB. You can find both of them here: http://saintaardvarkthecarpeted.com/nwr04b/download The downloads have been signed with key 0x4705C9C7 and checksummed with SHA1.
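
If you want to check a download against its SHA1 without hunting for the right command-line tool, a minimal sketch along these lines should do. The function names are mine, and the file name and checksum are whatever you pass on the command line; nothing here knows what the tarballs are actually called.

```python
import hashlib
import sys


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a file's SHA1 with a published hex string, ignoring case."""
    return sha1_of_file(path) == published_hex.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_sha1.py <downloaded file> <published sha1>
    print("OK" if matches_published(sys.argv[1], sys.argv[2]) else "MISMATCH")
```

This only covers the checksum; checking the signature is a job for gpg.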

No tags
