The Life of a Sysadmin

Carousel is a lie!

Entries from October 2005.

NWR04B: My descent into little-endian binary arithmetic hell
2005-10-01 13:54:53

Currently writing this entry in emacs. Once upon a time, I stopped using emacs for fear of what loading a 20MB editor would do to the mail server I was working on, and learned to love vi. Prompted by ESR's Art of Unix Programming, I've decided to try picking up emacs again. It's interesting....

Anyhow: Right now I'm trying to figure out why the hell writing to flash on the NWR04B is not working. First off, I've edited the map file for the flash devices (drivers/mtd/maps/cx84200-flash.c for those of you playing the home game) so that I've got two partitions declared:

static struct mtd_partition cx84200_partitions[] = {
        {
                name:           "bootloader",
                size:           0x00020000,
                offset:         0x00000000,
                mask_flags:     MTD_WRITEABLE, /* force read-only */
        }, {
                name:           "root_fs",
                // Codeman's original:
                // size:        0x000fa000,
                // My efforts at making a root partition:
                size:           0x00040000,
                offset:         0x000f0000,
        }
};

The first I'm not really doing anything with, but it could (as the title suggests) be turned into a bootloader partition someday. The second is where I'm concentrating my efforts. The read-only flag that was originally in there was removed once I figured out it might help matters. :-) Okay, so now what? Well, got a jffs2 image that I created, so let's try the obvious:

# cat test.jffs2 > /dev/mtd1

...and it just hangs. (I still haven't bothered to figure out how to make CTRL-C interrupt a process yet...something to do with the terminal, I think.) Up the debugging output and you see MTD_open, and then nothing.

I had a look at the part of the driver (drivers/mtd/chips/sst39vf080.c) to see what's going on here, and I managed to figure it out a bit. The write operation tries to write one byte at a time, then reads it back to make sure it got written. If so, move on to the next byte; if not, try 256 more times (I guess waiting to see if it just needs a moment) and see if that works. If yes, next byte; if not, give up on the write entirely. I threw in some messages to track that, and one that shows what value it's reading back from flash after the write.

After throwing in ridiculous amounts of debugging info to track this, it seems that the write of the first byte is simply not working. The write fails, and cat just keeps on trying (or something). A bunch of looking around finally turned up the MTD-JFFS-HOWTO from (I think) the guy who wrote the MTD driver. 'S full of all sorts of helpful hints, like:

Well, fuck. So I follow the directions for the 2.4 kernel support, and figure out how to compile the flash_eraseall utility. Wonderful! Ready to go! Just gotta erase this here partition, and... Only no, that doesn't work: I get the same error re: the byte not being written as before.

I'm currently throwing in even more unholy amounts of debugging than before, and teaching myself the simplest bits of binary arithmetic you can imagine, in order to confirm that, yes, write protection is being turned off...I think. This little-endian thing still confuses the hell out of me. The datasheet sez that, at the address the enable_write() operation is accessing, there are 32 bits set aside for controlling the first bank of flash (which is what we're after here). The 26th bit is write-protect (1 for on, 0 for off). enable_write() reads all 32 bits at that address, &'s it with 0x04000000, and then WP should be off. So the unholy debugging shows that the long int being read:

Okay, so that works. Maybe I'll give the flashcp utility (part of the MTD tree) a try and see how that goes.
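
For anyone following along, here is a minimal sketch of the write-and-verify loop described above. It is not the code from sst39vf080.c: every name in it is made up for illustration, and it assumes the "256 more times" means re-reading the byte rather than re-issuing the write. The two toy devices at the bottom stand in for flash that accepts writes and flash that silently ignores them, which is what the symptoms here look like.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this mirrors the behaviour described in
 * the entry, not the real driver source. */
#define VERIFY_RETRIES 256

typedef void    (*flash_write_fn)(uint32_t addr, uint8_t value);
typedef uint8_t (*flash_read_fn)(uint32_t addr);

/* Write len bytes one at a time, reading each back to confirm it stuck.
 * Returns the number of bytes verified; a short count means the write
 * was abandoned at that byte. */
size_t write_verified(flash_write_fn wr, flash_read_fn rd,
                      uint32_t addr, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int tries = 0;

        wr(addr + (uint32_t)i, buf[i]);
        /* One read, then up to VERIFY_RETRIES more before giving up. */
        while (rd(addr + (uint32_t)i) != buf[i]) {
            if (++tries > VERIFY_RETRIES)
                return i;   /* give up on the write entirely */
        }
    }
    return len;
}

/* Toy devices for trying the loop out (assumptions, not real hardware). */
uint8_t toy_mem[16];
void    toy_write(uint32_t a, uint8_t v)         { toy_mem[a] = v; }
void    toy_write_ignored(uint32_t a, uint8_t v) { (void)a; (void)v; }
uint8_t toy_read(uint32_t a)                     { return toy_mem[a]; }
```

With toy_write the function reports every byte as verified; with toy_write_ignored it stops at the first byte, which matches the "write of the first byte is simply not working" behaviour.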

Tags: emacs.
Stupid Debian syslog.conf
2005-10-03 19:33:22

Debian. I love the Debian. But the logs in Debian annoy me.

  1. You can't read 'em unless you're part of the adm group, or root. Not right.
  2. Iptables denied packets get logged to kernel.log, messages.log and syslog.
  3. What, precisely, is the difference between kernel.log and syslog? Between daemon.log and debug.log? Why is exim4/mainlog not symlinked to mail.log?
  4. There's the annoying habit of printing far too much to the console. My time and my screen's real estate are precious -- doubly so when I'm in single-user mode (another rant) trying to fix something. The last thing I want is to have precious, precious vi sessions drowned out with kernel: IN=eth2 SRC=24.82.14.99 LEN=64 TOS=0x00 PREC=0x00 TTL=45 ID=40654 DF PROTO=TCP SPT=2678 DPT=445 WINDOW=53760 RES=0x00 SYN URGP=0. Fuck that noise!

FreeBSD, by contrast, has it fucking down. There's messages.log for everything you're likely to need as a normal user -- except mail messages, which are conveniently located at mail.log. For the sysadmin, you've got security.log (firewall stuff), auth.log (login stuff) and all.log (everything). It's simple, easy to understand and you can bloody well read what you need to without becoming root. Sigh. In other news, thanks to Mr. Dickens I came across BeleniX today, a live OpenSolaris CD. I'm currently scrounging around for a spare machine to boot from, as QEMU seems to confuse it.

1 comments. No tags
NWR04B: What the hell am I missing?
2005-10-04 20:01:31

I must really be missing something here, because I am unable to get this thing to write to flash at all. Here's what's going on in the kernel:

  1. Turn off write protection; working. By that I mean that the kernel is successfully able to change a value in memory; the driver for this chip agrees with the datasheet from the HRI project that this is the bit that twiddles write protection.
  2. The kernel tries to write the following mysterious values: 0xaa to 0x5555, 0x55 to 0x2aaa, and 0xa0 to 0x5555. The destination addresses (0x5555 and 0x2aaa) get mapped to the right area of memory: 0x20000000 plus the offset for mtd1 I've set up. Checking these writes shows that they fail.
  3. The kernel tries to write the first byte of data copied from the user request. Again, the address gets changed properly (0x20000000 + mtd1 offset). Again, the write fails.
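
The address arithmetic in steps 2 and 3 can be sketched as below. The base (0x20000000) and the mtd1 offset (0x000f0000, from the partition table in the October 1 entry) come from these posts; the macro and function names are invented for the example, and this says nothing about whether the chip actually sees the writes.

```c
#include <stdint.h>

#define FLASH_BASE  0x20000000u  /* where the driver expects flash to be mapped */
#define MTD1_OFFSET 0x000f0000u  /* root_fs partition offset from the map file */

/* Bus address for an address relative to the start of mtd1.
 * Hypothetical helper, not part of cx84200-flash.c. */
uint32_t mtd1_bus_addr(uint32_t rel_addr)
{
    return FLASH_BASE + MTD1_OFFSET + rel_addr;
}
```

Under those assumptions the three magic writes land at 0x200f5555, 0x200f2aaa and 0x200f5555 again, and the first data byte at 0x200f0000.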

(All this is in cx84200-flash.c, BTW.) I can think of two things...wait, three things...that might be happening:

  1. There's a big change in memory mapping that happens some time after boot. Before The Change, flash begins at 0x0; after The Change, it starts at 0x20000000. I've been assuming, without much evidence, that the onboard bootloader does this flip before loading and running Linux. As ryanr suggested, it may not. In this case, I'd either need to make The Change myself, or else change the memory mappings.
  2. There's some weirdness with little-endianness going on. Datasheet sez it's the 26th bit at 0x4000000 that twiddles write protection; this address is not affected by The Change. Maybe I'm simply counting bits from the wrong end...or something...arghh, this makes my head hurt. I think it's unlikely, though, that the developers would not have accounted for this.
  3. Datasheet's wrong, or the chip's not the same. Which'd suck.
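
On the bit-counting worry in item 2, a small sketch may help. It takes the 0x04000000 mask quoted in the October 1 entry at face value; the function names are hypothetical. Two things it shows: that mask is bit 26 when counting from zero at the least significant end, and clearing a bit is normally done with AND-NOT (ANDing with the mask itself would keep only that bit). Byte order does not change which bit a 32-bit mask selects when the register is read as one 32-bit word; it only decides which byte holds that bit if you go poking at individual bytes.

```c
#include <stdint.h>

#define WP_MASK 0x04000000u  /* mask quoted in the October 1 entry */

/* Index of the lowest set bit, counting from zero at the least
 * significant end; returns 32 if no bit is set. */
unsigned lowest_set_bit(uint32_t v)
{
    unsigned n = 0;

    if (v == 0)
        return 32;
    while ((v & 1u) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Usual idioms for testing, clearing and setting one flag in a register. */
int      wp_is_on(uint32_t reg) { return (reg & WP_MASK) != 0; }
uint32_t wp_clear(uint32_t reg) { return reg & ~WP_MASK; }
uint32_t wp_set(uint32_t reg)   { return reg | WP_MASK; }

/* On a little-endian machine, the byte offset within the word that
 * holds a given bit. */
unsigned le_byte_offset(unsigned bit) { return bit / 8; }
```

So lowest_set_bit(WP_MASK) comes out as 26, and on a little-endian CPU that bit sits in the byte at offset 3 of the word.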

Any other ideas, please let me know.

No tags
Blood alone moves the wheels of history
2005-10-04 20:06:21

From the ever-excellent GrigorPDX:

Blood alone moves the wheels of history

The truth is that men are tired of liberty

Don't mess with Texas

The images are originally from this site, an online collection/store of Soviet and Communist propaganda posters. The original images are hypnotizing, especially when (like me) you're fascinated by right-wing politics (and its fixation on left-wing/Communist conspiracies), melodrama and the paranoid style; never let it be said that the only place to find the three together is at Alex Jones' website or Sisters of Mercy lyrics.

These images are not profound observations on Bush2's presidency. It's not fair to compare Bush2 to, say, Stalin. But that doesn't stop them from being very, very powerful.

Tags: politics.
First SuSE!
2005-10-05 20:17:57

Got my first SuSE machine at work (well, not mine, but I'm setting it up), and I'm running into a weird problem with ypbind. If I call ypbind on its own -- no arguments -- it'll work. man page sez it's parsing /etc/yp.conf, which has the line "domain foo broadcast", and sure enough it broadcasts on a nice privileged port and binds to the server for domain foo. If I call ypbind with the -d argument, it stays in foreground, prints debugging messages and fails like so:

do_broadcast() for domain 'foo' is called
broadcast: RPC: Can't encode arguments.
leave do_broadcast() for domain 'foo'
Signal (2) for quitting program arrived.

Well, crap. That's weird. After some searching, found Debian bug #231593, which sounds pretty similar. They're blaming it (tentatively, but) on libc. And but so there's these other bugs from, you know, Novell/SuSE, which also sound similar. And holy crap, where the hell have I been that I haven't heard of:

echo 65535 > /proc/sys/sunrpc/rpc_debug
echo 65535 > /proc/sys/sunrpc/nfs_debug

And other interesting behaviour from that second bug:

The problem in that bug was that immediately following a reboot, the NFS client will end up opening the same TCP port it used before, so it tries to establish a TCP connection from client:1234 -> server:2049. The server still has a TCP control block for this, and replies with a single ACK containing what it thinks are the right sequence numbers. That ACK is eaten by the conntrack module because the connection isn't yet in state ESTABLISHED.

Okay but back to our original bug: which appears to have been fixed now by adding one line to /etc/yp.conf:

broadcast
domain foo broadcast

Why the fuck that should work is beyond me...

No tags
cfengine classes and shellcommands
2005-10-06 18:16:56

cfengine is great, it really is. But there are some things that tripped me up. Often you want to set up a daemon to run The Right Way, which involves changing its config file. After that, of course, you want to restart it. What to do? The naive way (ie, the first way I tried) of doing things is:

control::
        sequence = ( editfiles shellcommands )

editfiles::
        debian:
                { /etc/foo.conf
                        BeginGroupIfNoLineMatching "bar"
                                AddLine "bar"
                                Define restart_foo
                        EndGroup
                }

        freebsd:

                { /usr/local/etc/foo.conf
                        BeginGroupIfNoLineMatching "bar"
                                AddLine "bar"
                                Define restart_foo
                        EndGroup
                }

shellcommands::
        debian.restart_foo:
                "/etc/init.d/foo restart"

        freebsd.restart_foo:
                "/usr/local/etc/rc.d/foo restart"

However, the correct way of doing this is:

control::
        sequence = ( editfiles shellcommands )
        AddInstallable = ( restart_foo )

editfiles::
        debian:
                { /etc/foo.conf
                        BeginGroupIfNoLineMatching "bar"
                                AddLine "bar"
                                DefineInGroup "restart_foo"
                        EndGroup
                }

        freebsd:
                { /usr/local/etc/foo.conf
                        BeginGroupIfNoLineMatching "bar"
                                AddLine "bar"
                                DefineInGroup "restart_foo"
                        EndGroup
                }

shellcommands::
        debian.restart_foo:
                "/etc/init.d/foo restart"

        freebsd.restart_foo:
                "/usr/local/etc/rc.d/foo restart"

Without both the enumeration of all your made-up classes in AddInstallable and the enclosing of that class in quotes, cfengine will fail to do what you want -- and will do so quietly and with no clue about why. God, that took me a long time to find.

Tags: cfengine.
NWR04B: Look at the board, dumbass
2005-10-07 06:15:49

Okay, so if you look at the goddamned chip on the NWR04B, you see it's a Hynix HY29LV160-BT, which is not nearly the same as an SST39VF080. I've got the datasheet, at least, so I can look and see if there's maybe some simple change to the driver I'm using to make it work.

That's ugly, though (but no uglier than my debugging code...ugh), and I need to make that better. The MTD folks are no longer supporting the 2.4 kernel; however, looks like the uClinux folks have backported the MTD stuff. Which means I might try upgrading to the latest uClinux version and see if I can port my changes over...although frankly, I'm scared that I'll just be back at square one with this project and trying to figure out why the hell I can't print to the screen.

Yeah, it's an irrational fear, but if I can just break this out into a separate driver I'll be happy. Anyhow, it doesn't look like this particular chip is supported yet by the MTD people, so that's less of an incentive to move up. Or more of an incentive...maybe the closest I'll come to getting a patch into the Linux kernel tree. :-)

No tags
SparcStationLX: NetBSD a go!
2005-10-10 10:21:33

Picked up a 25-to-9 pin adapter yesterday, and in combination with a 9-pin null modem cable I finally managed to get at the Sun firmware prompt, and thus to install NetBSD over the network. Very nice, very simple install; the only problem I had was this one: the option root-path "/home/aardvark/netbsd-sparc/nfsroot" was too long, and the DHCP server just did not hand out that option -- no complaints or anything. Not good. Just starting up SSH for the first time right now, and holy crap it's taking a while to generate the host keys. --Only 24MB of RAM...huh, thought for some reason these things came with 96MB. Should probably just give up and generate host keys for the thing.

No tags
Holy crap.
2005-10-11 18:35:40

From the ever-excellent Secrecy News comes this. I am agog.

No tags
NWR04B: Secret Knocks
2005-10-11 19:39:55

So I came to the realization that I've been including the driver for the wrong damn flash chip. This came straight from Codeman's tree, which in turn is based on (I think) the HRI tree. Codeman's .config file for uClinux included drivers for the SST39V flash chip, which just plain isn't right for this router. It's possible that he had a different revision of the board or some such, but I suspect that since he wrote to the flash using the JTAG interface, the issue just never came up.

I grabbed the datasheet for the Hynix chip, and it's not that different from what's in the SST driver...but it's just different enough that it causes problems. First of all, you've got to give the secret knock before writing a byte to flash -- apparently to keep electrical noise (or some such) from accidentally erasing important data. In the SST driver, it looks like this:

map->write8(map, 0xaa, 0x5555);
map->write8(map, 0x55, 0x2aaa);
map->write8(map, 0xa0, 0x5555);

But according to the Hynix datasheet, it should look like this:

map->write8(map, 0xaa, 0xaaa);
map->write8(map, 0x55, 0x555);
map->write8(map, 0xa0, 0xaaa);
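
One way to keep the two knocks side by side is to treat them as data, as in this sketch. The values and addresses are exactly the ones listed above; the struct, the array names, send_knock() and the logging callback are all invented for the example, and the callback's argument order just copies the map->write8(map, value, address) calls shown here.

```c
#include <stddef.h>
#include <stdint.h>

struct knock_step {
    uint8_t  value;
    uint32_t addr;
};

/* Program-byte knock as found in the SST driver. */
const struct knock_step sst_program_knock[] = {
    { 0xaa, 0x5555 }, { 0x55, 0x2aaa }, { 0xa0, 0x5555 },
};

/* Program-byte knock as read from the Hynix datasheet. */
const struct knock_step hynix_program_knock[] = {
    { 0xaa, 0xaaa }, { 0x55, 0x555 }, { 0xa0, 0xaaa },
};

typedef void (*write8_fn)(void *ctx, uint8_t value, uint32_t addr);

/* Issue each step of a knock, in order, through the given writer. */
void send_knock(write8_fn write8, void *ctx,
                const struct knock_step *seq, size_t n)
{
    for (size_t i = 0; i < n; i++)
        write8(ctx, seq[i].value, seq[i].addr);
}

/* Stand-in writer that records what would have gone to the chip. */
struct knock_log {
    struct knock_step seen[8];
    size_t count;
};

void log_write8(void *ctx, uint8_t value, uint32_t addr)
{
    struct knock_log *log = ctx;

    if (log->count < 8) {
        log->seen[log->count].value = value;
        log->seen[log->count].addr = addr;
        log->count++;
    }
}
```

Swapping chips then means swapping which table gets passed in, and the logging writer makes it easy to check the sequence without touching hardware.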

Okay, easy enough to change. Still didn't work, though, when I tried to copy the jffs2 image to /dev/mtd1; the writes just keep on failing. But then I remembered that only an erase can turn on a particular bit -- ordinary writes can only flip 'em off. Just for fun, I tried copying an image where, compared to what was in flash already, bits would only have to be turned off -- and sure enough, that worked. Didn't survive a reboot, though...weird.

On, then, to the bit of the datasheet that deals with erasing. There's the secret knock for erasing, but that was easy enough to fix. The last part of the secret knock tells the chip which 0x1000-byte sector to erase. With the SST driver, it looks like you just use the beginning of the 0x1000 byte sector you want to erase, making sure that it's on an erase boundary (ie, some multiple of 0x1000).

The Hynix, though...I'm having trouble figuring it out. The sector I'm trying to erase starts at 0xf0000, so I'll use that as an example. The datasheet has a table listing what address to write the final command, and it says that the address should be binary 01111??? -- the last three bits don't matter. But this table also seems to say these should be bits 19 through 12 (counting from zero). If that's the case, then we're just shifting the address right by one bit, which means writing the final command to 0x78000. But that doesn't seem to work. In another part of the datasheet, it seems to imply that the sector address is just 8 bits long -- in which case, we're shifting the address right by 13 bits. That seems like a very strange number. It works out to a write to 0x78, and that doesn't work, either.

The only thing that I can think of is that flash memory is supposed to be mapped to 0x20000000, so maybe it's 0x200f0000 that should be shifted as necessary. But that doesn't make any sense to me. And the fact that the bits I managed to flip don't survive a reboot makes me suspicious -- am I trying to write to RAM or some such rather than flash? If anyone out there knows this sort of thing, I'd be grateful if you could take a look at the datasheet and see if you can figure out what I'm doing wrong.
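
Two pieces of arithmetic from this entry, written out as a sketch. The first models the "writes can only turn bits off" rule as a bitwise AND, which is the usual way to think about NOR flash programming. The second just reproduces the two shifts tried above for the sector at 0xf0000. All function names are made up, and none of this settles which reading of the datasheet is the right one.

```c
#include <stdint.h>

/* Programming can only clear bits, so the cell ends up as old AND wanted. */
uint8_t nor_program(uint8_t old, uint8_t wanted)
{
    return old & wanted;
}

/* True when wanted can be written over old without an erase first,
 * i.e. no 0 bit would have to go back to 1. */
int programmable_without_erase(uint8_t old, uint8_t wanted)
{
    return (old & wanted) == wanted;
}

/* The two candidate sector-address calculations from the entry. */
uint32_t shift_by_one(uint32_t byte_addr)      { return byte_addr >> 1; }
uint32_t shift_by_thirteen(uint32_t byte_addr) { return byte_addr >> 13; }

/* Bits 19 through 12 of an address, counting from zero. */
uint32_t bits_19_to_12(uint32_t addr)
{
    return (addr >> 12) & 0xffu;
}
```

For 0xf0000 the first shift gives 0x78000 and the second gives 0x78. Bits 19 through 12 of 0x78000 are also 0x78, which is binary 01111000, so the two readings at least agree on the 01111??? pattern in the table.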

No tags
Stick that in your .bashrc and smoke it.
2005-10-20 18:11:44
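
function tv () {
        # List a tarball, picking the decompression flag from the last
        # three characters of the file name.
        i=$(echo "${@}" | sed -e 's/.*\(...\)$/\1/')
        case $i in
                bz2) tar tvjf "${@}" ;;
                *gz) tar tvzf "${@}" ;;
                *)   tar tvf "${@}" ;;
        esac
}

function xv () {
        # Same idea, but extract instead of list.
        i=$(echo "${@}" | sed -e 's/.*\(...\)$/\1/')
        case $i in
                bz2) tar xvjf "${@}" ;;
                *gz) tar xvzf "${@}" ;;
                *)   tar xvf "${@}" ;;
        esac
}
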
No tags
NWR04B: Update
2005-10-30 09:11:18

It's been a while since I posted, so it's time to catch up on what I've been doing. I got frustrated with not being able to write to flash memory, possibly because I was unable to figure out what the datasheet was telling me. I decided to have a look at the firmware itself (the part that prints the initial menu, gives you a chance to load your own firmware, then boots the thing) and see if that would tell me what was going on. Seemed like a good thing to try -- after all, it writes to flash when you upload a new image, so it's got to have the secret knocks in there somewhere, right? Well, damned if I could find it. There are parts in the firmware where it wants to print a message to the screen, and the way it does it is by:

  1. Loading a memory address
  2. that contains a pointer to a text string
  3. into a register
  4. and then calling the print routine

So I was looking for something similar in a let's-write-to-flash routine:

  1. Loading a memory address
  2. that contains a pointer to a secret number
  3. into a register, or possibly a memory address
  4. that unlocks or erases flash memory

But I simply couldn't find it, nor could I find any obvious constants in the firmware itself. The datasheet gives the steps and numbers needed -- 0x00000555, 0x00000aaa, and so on -- and I just could not see them anywhere. So that left trying to track execution of the firmware itself. I was able to track down the place where the firmware printed the menu that allowed you to upload more firmware. From there, I could see the jump-off points to receive the firmware, then the checksum, then -- aha! -- erasing and writing to the flash memory. But keeping track of what was going on after that was just too much; I'm not used to thinking in assembly, there were lots of jumps that depended upon the state of registers (which I couldn't figure out), and again there was nothing obvious that showed me what was being written where. I may try again with the second-stage firmware -- the original, visible firmware from the manufacturers that I'm replacing with Linux -- but I'm not much more optimistic.

Okay, so what next? Well, the MTD subsystem has been rewritten extensively since the kernel version that I've got, and they're no longer supporting version 2.4 of the kernel. But the uClinux people have backported their work to the 2.4 version of uClinux, and maybe that'll work. Jumping to the 2.6 kernel seems a little too scary, but upgrading to the latest 2.4 shouldn't be too bad, right? Heh. It may not be too bad if you know what you're doing, but for me it's a little more challenging. I've spent a week on this so far, and I'm finally at the point where it fails at the final link. Since I've done little more than copy files over wholesale, this is indeed progress.

Of course, there's probably going to be a lot to go through once I get the final linking done; it took a while for me to get Linux printing to the screen, let alone successfully booting. And after that, I don't think there will be a driver for the flash -- at least, I can't see one in the current tree. What I'm hoping is that the updated MTD tree will allow for better probing of the flash's abilities using the Common Flash Interface, or at least I'll be able to ask for help without being ignored.

No tags
