Posts tagged “nwr04b”

April 09, 2006 Damage
It has been a busy ten days, no lie. My wife and I moved into our new place with only minor difficulties. Tuesday, though, I came down with flu and spent the next three days lying on the couch, dozing through CNN Headline News (if you're gonna rot your brain, you gotta do it right) and gobbling Tylenol by the handful. So much for my plan to spend the week assembling all the Ikea furniture in the world...

The computers made it through okay, except for the XBox (used as a MythTV frontend). First the hard drive crapped out -- all sorts of hard errors during fscking, followed by scary looking errors about how it couldn't find init. Fortunately I had a spare 80GB Seagate Barracuda, so I installed Xebian v1.1.4. Worked well, and in fact it fixed the display-offset-an-inch-to-the-left-and-up problem I'd been unable to fix before, so that was good.

And then last night, it started behaving weirdly. First, it wouldn't play a program in MythTV -- it just sat there beeping whenever it accessed the drive. I tried power cycling, but then it just sat and beeped at the Cromwell BIOS screen without going any further. I tried searching for beeping hard drives/XBoxes, but turned up next to nothing. This morning it's fine, so I suspect overheating -- it's a little more enclosed here than it was at our old place. We'll see if it happens again.

The new place has imposed some network changes, too. Our last place was an apartment -- all one floor, so it wasn't hard to snake cables around to hook up the XBox, my wife's laptop and so on. Now, though, my computers and the cable modem are on the second floor, and the laptop/XBox are on the first. And while this house is only about four years old, it doesn't have built-in CAT5. :-(

I had brief fantasies of just poking holes in MY drywall (pride of house ownership picks weird times to pop into my head) and snaking a cable down to the TV (which is almost directly underneath me right now), but gave up. Then I ran a cable from the second floor to the first, thinking I could just run it along the edge of the walls and hide it nicely. It was worth trying, but it really didn't work and even I thought it just looked ugly. Only one way to go: wireless.

Since the NWR04B's on hold, I decided to pick up a couple Linksys WRT54GLs and run OpenWRT on them. They arrived on Thursday, and w/in 15 minutes I'd voided the warranty by installing White Russian on them. :-)

Man, OpenWRT is nice -- it's exactly what I've been hoping to achieve on the NWR. I'm still working on the configuration for these things, but the basic idea is to have one on the second floor, running as an AP that the laptop can connect to, and one on the first floor running in client mode. The XBox will be connected to the client, and Hunsacker (MythTV backend) will be connected to the AP. The laptop will connect to the internal network using OpenVPN; probably the MythTV boxen will use OpenVPN as well. The AP will be sitting inside my internal network, rather than outside, but will be firewalled by itself and my gateway to only allow OpenVPN and SSH connections in or out. It's a bit more complicated than I've set up for a home network before, but it's starting to come together in my head.

Of course, I could just run the AP as the firewall itself -- Lord knows the thing could do it just fine. But I just ordered OpenBSD 3.9 a couple weeks ago (plus sent 'em a nice donation), and it'd be a shame to waste that.
March 22, 2006 NWR04B Update and the Overland LoaderXpress
As I haven't written about the NWR04B in a while, I thought I'd mention that it's because I haven't done anything with it in a while. Part of it has just been buying a house, getting ready to move, pregnant wife, and so on. But I've also just put it aside for a while, as I wasn't making much progress on either writing to flash or getting all the ethernet ports working. I may take it up again later this year, but I suspect that the new kid and my wandering mind means it'll be a while, if ever, before I return to it.

In other news: Just got a new Overland LoaderXpress at work yesterday, and it's...interesting. Very simple machine once you take the cover off: a plastic tape magazine in the middle, a robot arm along the left, a double-height Ultrium 960 drive from HP at the back on the right, a power supply in the middle on the right, and the control board along the right-hand edge. That's pretty much it. (I may take pictures, 'cos I'm just that big a geek.)

There were a couple little blemishes: the cover had half-fallen off the tape drive and was lying at an angle; I had to push it back on. And the two screws that were holding the tape drive in place were loose and had to be tightened. I realize this is a budget jukebox, but it's still $8000 list. Oh, and their sales guy doesn't return calls. Weird.

Once I got the cover back on and put it in the server room, it wasn't too hard to get it hooked up. I made the mistake of not attaching a terminator before hooking up the SCSI cable -- don't do that! And I had to recompile the kernel (the backup box runs FreeBSD) to add the ch device. But once I got that figured out, getting Amanda to see it was as simple as telling it the changer device (/dev/ch0) and the changer script (chg-chio). Perfect!

One slight hiccup: Ultrium 3 drives will read Ultrium 1 tapes (of which we have, oh, 50), but will only write Ultrium 2 and 3 tapes. I didn't find this out 'til after I placed the order...my bad. This'll change my backup plans a bit, but it shouldn't be a big problem.
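The rule behind this is simple enough to write down. For the first several LTO generations (through LTO-7), a drive reads its own generation and the two before it, and writes its own generation and the one before it. A minimal sketch of that rule; the function names are hypothetical, and it doesn't hold for LTO-8 and later, which changed the compatibility story.

```c
#include <stdbool.h>

/* Hypothetical helpers encoding the classic LTO rule (LTO-1 through LTO-7):
 * a generation-N drive reads generations N-2..N and writes N-1..N. */
static bool lto_can_read(int drive_gen, int tape_gen)
{
    return tape_gen >= drive_gen - 2 && tape_gen <= drive_gen;
}

static bool lto_can_write(int drive_gen, int tape_gen)
{
    return tape_gen >= drive_gen - 1 && tape_gen <= drive_gen;
}
```

So for an Ultrium 3 drive, `lto_can_read(3, 1)` is true but `lto_can_write(3, 1)` is false: the old Ultrium 1 tapes are good for restores, not for new backups.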
February 04, 2006 One week 'til I'm 34!
It feels like I've been slacking with my entries lately, so it's time to do some catchup.

First, the NWR04B: I've not been very active on this lately, but there has been a little progress. When last I wrote I was trying to figure out why the kernel was hanging at rtnl_lock when I used the ADM5120 driver for the switch. It turned out that I was calling register_netdev, which in turn calls rtnl_lock, from within a routine that register_netdev itself calls, so the lock was already held and the kernel just sat there waiting on itself. That's a problem right there. I fixed this (it was due to some blind cut-n-paste from the old driver), and now it's getting further: it initializes eth0 through eth6...though it still doesn't actually send or receive traffic, near as I can tell. I need to spend some time sprinkling more printks throughout the code to figure out where it's failing.
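The shape of that bug, as I understand the 2.4 code: register_netdev() takes rtnl_lock() (a plain semaphore there, not a recursive lock) and then, via register_netdevice(), calls the device's init hook. If that hook calls register_netdev() again, the second lock attempt waits forever on the first. A toy model, with every name hypothetical; the real semaphore would just hang, so this version reports the re-entry instead.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a non-recursive lock like rtnl_lock(). A real semaphore
 * would block forever on the second acquire; this version reports it. */
static bool toy_rtnl_held = false;

static bool toy_rtnl_lock(void)
{
    if (toy_rtnl_held) {
        fprintf(stderr, "deadlock: lock already held by this caller\n");
        return false;
    }
    toy_rtnl_held = true;
    return true;
}

static void toy_rtnl_unlock(void)
{
    toy_rtnl_held = false;
}

typedef bool (*init_fn)(void);

/* Hypothetical register_netdev(): take the lock, run the init hook, release. */
static bool toy_register_netdev(init_fn init)
{
    if (!toy_rtnl_lock())
        return false;
    bool ok = init ? init() : true;
    toy_rtnl_unlock();
    return ok;
}

/* Buggy init hook: registers another device from inside registration. */
static bool buggy_init(void)
{
    return toy_register_netdev(NULL);
}

/* Fixed init hook: only sets the device up; the caller does the registering. */
static bool fixed_init(void)
{
    return true;
}
```

`toy_register_netdev(buggy_init)` fails on the nested lock, while `toy_register_netdev(fixed_init)` goes through cleanly.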

Next, I'm doing some work on Thornhill, my web server. Amanda has been installed; I want to back up stuff a little more intelligently than I'm doing now (tar up everything and dump it on my desktop, which gets backed up by Amanda running on my desktop). Running into a few firewall problems, but nothing unexpected or too difficult.

I'm also trying out Xen again, with an eye to upgrading Thornhill. A while back Alioth answered some questions I had about Xen and servers, and it seemed worth trying. So I've got VMware Player running on the fastest machine I have (Hunsacker, a 2.4GHz P4 MythTV backend) while I practice getting things right. I've put Gentoo both in dom0 and a guest domain (FristDomain (I kill myself)), and I'm populating FristDomain with the usual LAMP environment. This is all pretty preliminary; I'm pretty much just trying to get familiar with how it all fits together.

I'm considering moving to NetBSD for dom0...stateful IPv6 filtering (though Linux has that now), pf, and just the chance to try something new. For the web server OS, though, I think I'll stick with Linux, and probably with Gentoo. I want something easily upgradeable, and for that it's Gentoo or Debian. I think Gentoo will be a little more up-to-date than Debian, and I want to give portage a try...Hunsacker runs Gentoo, but I rarely touch it.

At work, we had a problem last week with the Subversion repository when, against my advice, someone acting under their manager's direction tried checking in the contents of a SuSE DVD. They weren't trying to check in the ISO itself, at least, but rather, all the contents: whole lotta binary RPMs, mostly. This borked the repository, probably because of a default 2GB limit for Apache. The user saw this error:
```
svn: MERGE request failed on '/svn'
svn:  Revision file lacks trailing newline
```
So did everyone else who tried to work with the repository afterward.

I tried svnadmin recover like the good book says, but ze goggles, zey did nossing! Well, crap. We were running hotbackup.py every night, and a quick look showed that last night's copy had everything up to revision 1538 -- 14 revisions ago. (It was revision 1553 that failed.) So I could try moving that in place and losing a bunch of work, or look for something else.

In the end, I was able to get things working by taking a copy of the hotbackup, dumping everything since then, and then applying that dump to the backup. To wit:
```
$ cp -R /path/to/hotbackup /path/to/recovered_repository
$ svnadmin dump /path/to/repository --revision 1539:1552 --incremental > dumpfile
$ svnadmin load /path/to/recovered_repository < dumpfile
$ svnadmin verify /path/to/recovered_repository
$ mv /path/to/repository /path/to/repository.broken
$ mv /path/to/recovered_repository /path/to/repository
```
I may up the limit for Apache, but I'm very much inclined not to do so. I really don't think we'll need to check in 2GB at one time, and I still think checking in a DVD is a stupid thing to do.
January 01, 2006 NWR04B: Turning to the switch
I'm not making much headway erasing flash, so I'm taking a bit of a break from that by turning my attention to the network interfaces.

There are five ethernet interfaces on the NWR04B, plus the wireless card. Only one of the ethernet interfaces comes up enabled in Linux, so I've been trying to track down how it's all initialized and how to change it. The problem is, once again, that I really don't know what I'm doing, and I'm having to teach myself as I go along.

For example: the driver and the datasheet for the CPU talk about the CPU port for the switch. What the hell is that? Originally I thought that might be a special ethernet interface -- you know, like the one that's enabled in Linux. But the ADM5120 info I turned up says no, it's essentially a logical interface that may or may not be connected to a particular ethernet interface.

Yeah, the info is for the ADM5120 switch (which in turn is based on a MIPS chip) a/o/t the ADM5106 (which is ARM-based) I'm working with, but I think the principle should be the same. It certainly seems to match the sort of stuff I'm seeing in the driver code. I'm hopeful, too, that the configuration tool for the 5120 will be, broadly speaking, applicable to the 5106...they talk about ioctls in the driver being used for this sort of thing; not sure if they're in the 5106 driver (I suspect not), but the source code available for 5120-based routers might have enough info to let me cut-and-paste^Wport something over.

Update: So the config tool for the 5120 uses a couple ioctls, SIOCSMATRIX and SIOCGMATRIX, to control which interfaces are on which VLAN. SIOCSMATRIX is defined in the config tool's code as SIOCDEVPRIVATE, which means the driver has to handle it itself. It wasn't there in the source code I have, but a quick search for SIOCSMATRIX turned up a few diffs against the main kernel tree for the 5120. Looks like the switch driver may be a bit more full-featured than what I've currently got for the 5106, and I think it may be more or less a simple cut-n-paste to get it working for the NWR04B. Here's hoping.
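The way I read the VLAN matrix idea: each VLAN gets a bitmask of member ports, and the CPU port is just one more bit in that mask. Splitting the switch into separate interfaces then means giving each VLAN one physical port plus the CPU port. A sketch of that bookkeeping only; the bit numbering (ports 0 through 5, CPU port as bit 6) and all the names are hypothetical, and the real register layout is whatever the ADM5120 datasheet says.

```c
#include <stdint.h>

#define CPU_PORT   6   /* hypothetical: the CPU port is bit 6 */
#define NUM_VLANS  6   /* hypothetical: one VLAN per physical port 0..5 */

/* Membership mask for a VLAN holding one physical port plus the CPU port. */
static uint8_t vlan_mask(int port)
{
    return (uint8_t)((1u << port) | (1u << CPU_PORT));
}

/* One VLAN per physical port, so each could show up as its own ethX. */
static void build_matrix(uint8_t matrix[NUM_VLANS])
{
    for (int v = 0; v < NUM_VLANS; v++)
        matrix[v] = vlan_mask(v);
}

/* Does VLAN v include port p? */
static int vlan_has_port(const uint8_t matrix[NUM_VLANS], int v, int p)
{
    return (matrix[v] >> p) & 1;
}
```

A table like this is presumably what SIOCSMATRIX hands to the driver, with SIOCGMATRIX reading it back.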
December 29, 2005 NWR04B: Another year older, and what have I done?
Gaw'bless you, Matt Johnson.

A year ago today I mentioned, almost in passing, that I had picked up a cheap wireless router and hoped to get Linux running on it shortly. Since then, I've learned an incredible amount about electronics, reverse-engineering, assembly language, compilers, the Linux kernel, and programming as I moved further up the abstraction ladder. I'm still no expert at any of this, but it astounds me how far I've managed to get.

Currently I'm stuck at getting flash memory to work -- specifically, being able to erase and then program a chunk of flash memory. The trouble is that the magic numbers that the Linux drivers and the datasheet say are needed don't seem to be working. Previously, I was having the same sort of problem getting the kernel to detect the flash in the first place; the trick was figuring out that GPIO was involved in all this. But I'm doing that same trick now, and it's still not working. As always, I'm not sure what I'm doing wrong.

Still, though, I think I'm going to keep poking at it -- for a while, anyway. My interest is beginning to wane a bit (I flit a lot; a year is a long time to me), plus I got a kid on the way (ack!). I may move on to trying to make all the ethernet interfaces work, not to mention the wireless card, as a way of taking a bit of a break. And of course, I'm still aiming at making the world's first Beowulf cluster of wireless routers.

On another note: today's entry is brought to you by the fine, fine folks at the Free Software Foundation, to whom I've just paid my membership dues for another year. I owe these people a huge amount: not only do I get to use a staggering amount of world-class software, written by their members and with their support, for free (I'm writing this on Emacs right now), not only have I been able to earn a fucking living from what I've taught myself using GPL'd and BSD'd software, RMS has also given us the language to, I dunno, frame the whole question of why this is important: by starting the FSF, by naming the Free Software movement, by giving us the GPL. There are those who disagree, while still cherishing the freedom the FSF seeks -- but I think you'd be hard-pressed to deny the power that one pissed-off geek gained when he got pissed off about some closed-source printer drivers.

(Yes, that may be a big myth -- but that is not the same as being a lie, and the providing^Hsynthesis of motivating myths is important too.)

From their website:

Please support the work of the FSF by making a donation, joining as an associate member, ordering books and merchandise, or signing your organization up as a corporate patron.

Hate RMS? Fine by me. Give to others:
- The FreeBSD Foundation
- The OpenBSD folks (Don't use SSH? Liar.)
- NetBSD
- KDE
- Gnome
- Mozilla
- Gentoo
- Software in the Public Interest (Debian, among others)
- Slackware
Do it. We owe them.
December 20, 2005 NWR04B: Rebooting, set_vpp
Up way too late for this sort of thing, but I can't sleep.

Managed to get rebooting working. I'd had a jerry-rigged workaround: use devmem2 to read the magic location, 0x88000004. The weird thing was, that was already in the kernel, at include/asm-armnommu/mach-cx84200/system.h:
```
extern inline void arch_reset(char mode)
{
    /* REVISIT --gmcnutt */
#define CX84200_RESET                                  0x88000004
    int data;
        data = *(__u8 *)(CX84200_RESET);
}
```
So what the hell? Just for fun, I tried sticking this in at the end of the function:
```
    printk (KERN_EMERG "Did that work? data = 0x%08x\n", data);
```
And sure enough, bam! There I am rebooting. I'm guessing the compiler was optimizing away the read, since it was never used or returned...but that seems like an obvious thing to overlook. Hm. Can't deny that it's working, though. Interestingly, a simple:
```
       return data;
```
does not seem to work. The plot thickens.

Also managed to find another chip driver that has to twiddle GPIO in order to write to flash, and it looks like there's a standard place to put this: the set_vpp member (part?) of the map_info structure that is deep, deep at the heart of the MTD driver system. Along with the usual stuff you might expect to find there -- how to read 8 bits, how to write 16 bits, and so on -- there is this bit that the Dilnet PC board uses to twiddle GPIO in what looks like a most familiar way. I may manage to soothe baby Linus before long.
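The set_vpp idea, as I read it: the board's map driver supplies a callback, and the generic chip code calls it with 1 before erasing or programming and with 0 afterward. A toy sketch, modeled loosely on map_info rather than copied from it; the struct, the fake GPIO register, the bit number and `nwr04b_set_vpp` are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped GPIO data register. */
static volatile uint32_t fake_gpio_data;
#define FLASH_WRITE_GPIO_BIT 3   /* hypothetical: the line that enables writes */

/* Cut-down, hypothetical cousin of struct map_info: just the hook we need. */
struct toy_map_info {
    const char *name;
    void (*set_vpp)(struct toy_map_info *map, int on);
};

/* Board-specific hook: raise or drop the GPIO line that allows flash writes. */
static void nwr04b_set_vpp(struct toy_map_info *map, int on)
{
    (void)map;
    if (on)
        fake_gpio_data |= (1u << FLASH_WRITE_GPIO_BIT);
    else
        fake_gpio_data &= ~(1u << FLASH_WRITE_GPIO_BIT);
}

/* Generic chip code: bracket the erase with set_vpp(1)/set_vpp(0) if the
 * board provides a hook. Returns the GPIO state seen mid-erase for checking. */
static uint32_t toy_erase_sector(struct toy_map_info *map)
{
    uint32_t during;

    if (map->set_vpp)
        map->set_vpp(map, 1);
    during = fake_gpio_data;     /* ...the real erase commands would go here... */
    if (map->set_vpp)
        map->set_vpp(map, 0);
    return during;
}
```

The appeal is that the crowbarred GPIO code could move out of the chip driver and into the board's map definition, where the generic code calls it at the right moments.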

Update: Ashtead provided the answer: declare data as volatile, so GCC doesn't optimize away the read.
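Here's that fix in isolation. A sketch: the helper name `mmio_read8` is mine, and the cast through `uintptr_t` assumes a flat 32-bit address space like this board's.

```c
#include <stdint.h>

#define CX84200_RESET 0x88000004UL

/* Read one byte from a memory-mapped register. The volatile qualifier tells
 * GCC the load has side effects, so it can't drop it even when the caller
 * ignores the result. */
static inline uint8_t mmio_read8(uintptr_t addr)
{
    return *(volatile const uint8_t *)addr;
}

static inline void arch_reset(char mode)
{
    (void)mode;
    /* On this board, merely reading the reset location reboots the CPU. */
    (void)mmio_read8(CX84200_RESET);
}
```

This also explains the printk trick: once the value is actually used, the compiler has no choice but to perform the read.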
December 20, 2005 NWR04B: Version 0.3 released, or, "The madness continues"
Welp, six weeks after upgrading to 2.4.31-uc0, four months after the first release, and nearly a year after getting the goddamned thing in the first place, I'm finally releasing version 0.3 of Linux for the NWR04B. Share and enjoy!

The big change from 0.1 (version 0.2 was never released to my adoring public) is that I've upgraded to the 2.4.31 uClinux kernel, copying over the necessary bits from Codeman's original kernel. This was mainly done in hopes of getting access to the onboard flash memory through the more up-to-date MTD code tree. After a lot of work disassembling the factory firmware for this thing, I finally figured out that one of the GPIO lines? leads? values? needs to be twiddled in order to write to flash successfully. Thus, the onboard flash is being recognized as an AMD-compatible device, which should allow me to erase it and write my own FS there.

Note that I say should. Right now the necessary twiddling is only done at detection time in the kernel, and the code to do so has been rammed in with a crowbar; there is absolutely no grace to this at all. (cfi_probe_chip() in uClinux-2.4.31-uc0/drivers/mtd/chips/cfi_probe.c makes Baby Linus cry.) And when I try changing the necessary bits using devmem2 (included in the tarballs, which I forgot to document) and then erase, the kernel panics. But hey! It's all progress, far as I'm concerned. :-)
December 18, 2005 NWR04B: GPIO and Flash
Okay, so I've finally got the onboard flash chip detected on this thing. It took a few things to get this working:
1. I had to apply this patch to the MTD code, which apparently never made it to their main tree.
2. I had to twiddle the GPIO on this chip just so, which I figured out by disassembling the stock bootloader that comes with this thing.
3. I had to take out a bit of code that I'd put in 'cos I thought I was smarter than the kernel folks. That took a while to recognize.
Still can't actually erase the flash, even with the GPIO twiddle, without causing a kernel panic. And the code that does the twiddling itself is ugly, and pretty much stuck in a random place. But progress!
December 02, 2005 NWR04B: Module works
Welp, I've got my module working...at least, in the sense that it makes Baby Linus cry. It's pretty ugly, but it loads and unloads and PEEKs and POKEs memory the way I want it to, which is all I need. (Used to have a VIC-20 when I was a kid, where you'd have to POKE different bits of memory to make it play a tune. Never thought I'd be duplicating that 20 -- no, closer to 30 -- years later.)

Of course, I'm still having no luck at all actually probing for flash. What I'm doing should match up with both the datasheets and what the firmware does, but it's just not working; instead of getting 0xAD back (the manufacturer code) I just see the bit of memory that's actually there.
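For reference, the standard AMD/JEDEC autoselect sequence goes: 0xAA to 0x555, 0x55 to 0x2AA, 0x90 to 0x555, then read the manufacturer code at offset 0 and the device code at offset 1, and write 0xF0 to return to read mode. Since I can't paste a real chip into a blog post, here's a toy simulated chip so the sequence can run end to end. The simulation, its names and the device code are hypothetical, and addresses are in chip words, before any bus-width scaling.

```c
#include <stdint.h>

#define HYNIX_MFR_ID   0xAD
#define TOY_DEVICE_ID  0x22   /* hypothetical device code */

/* Toy flash chip: plain reads return array contents; after the unlock
 * cycles plus 0x90, reads at offsets 0 and 1 return the ID codes. */
struct toy_flash {
    uint16_t mem[0x800];
    int unlock_step;          /* unlock cycles matched so far */
    int autoselect;           /* 1 while in ID mode */
};

static void flash_write(struct toy_flash *f, uint32_t addr, uint16_t val)
{
    if (val == 0xF0) {                        /* reset to read-array mode */
        f->unlock_step = 0;
        f->autoselect = 0;
    } else if (f->unlock_step == 0 && addr == 0x555 && val == 0xAA) {
        f->unlock_step = 1;
    } else if (f->unlock_step == 1 && addr == 0x2AA && val == 0x55) {
        f->unlock_step = 2;
    } else if (f->unlock_step == 2 && addr == 0x555 && val == 0x90) {
        f->unlock_step = 0;
        f->autoselect = 1;
    } else {
        f->unlock_step = 0;                   /* a wrong cycle aborts the sequence */
    }
}

static uint16_t flash_read(const struct toy_flash *f, uint32_t addr)
{
    if (f->autoselect && addr == 0)
        return HYNIX_MFR_ID;
    if (f->autoselect && addr == 1)
        return TOY_DEVICE_ID;
    return f->mem[addr];
}

/* The probe: unlock, enter autoselect, read both IDs, then reset. */
static void probe_ids(struct toy_flash *f, uint16_t *mfr, uint16_t *dev)
{
    flash_write(f, 0x555, 0xAA);
    flash_write(f, 0x2AA, 0x55);
    flash_write(f, 0x555, 0x90);
    *mfr = flash_read(f, 0);
    *dev = flash_read(f, 1);
    flash_write(f, 0, 0xF0);
}
```

If the unlock cycles land at the wrong addresses, the toy chip never leaves read-array mode and the "ID" reads just return whatever is stored there, which looks a lot like the symptom above.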

I've come up with a few reasons this might be happening:
1. The flash datasheet lies. Unlikely, since the firmware seems to follow the same sequence I'm doing.
2. The CPU datasheet lies about where the flash memory is. Possible, but it doesn't strike me as all that likely either; what I see at that location (0x2000.0000 for those of you playing the home game) is firmware that does exactly what you'd expect booting firmware to do: set up memory, clear some registers, print a menu, and generally get the thing ready to go.
3. The CPU datasheet lies about what's needed to program the flash. Possible. I know there's at least one write-protect bit; unless I turn it off, any attempt to write to flash results in a protection fault. It's possible there's more.
4. Timing problems. Sounds reasonable.
5. The process is different once you've booted. Okay, maybe. I haven't looked at the original firmware from the manufacturer (the stuff that comes after the bootloader, I mean); it's possible there's a different sequence that needs to happen.
6. I'm not doing what I think I'm doing. I'm thinking about endianness here, which always confuses me. Dunno about this. I've tried disassembling my module, and though I don't understand a lot of what's going on, it still looks pretty familiar and pretty much what I'd expect.
7. I've missed something. Bingo, because I definitely don't know what I'm doing.
For example: the flash datasheet says that one of the locations you POKE to is 0x555 -- yet both the firmware and the AMD-compatible flash driver in the kernel POKE to 0x554. Why? Obviously it's a 4-aligned (word-aligned?) address, but why does that still work? (For the firmware, I mean, which can obviously still write to flash.)

Or there's the whole question of bus or map width, which is important to the Linux drivers. Take this bit, again from the AMD-compatible driver:
```
static inline void send_unlock(struct map_info *map, unsigned long base)
{
    wide_write(map, (CMD_UNLOCK_DATA_1 << 16) | CMD_UNLOCK_DATA_1,
           base + (map->buswidth * ADDR_UNLOCK_1));
    wide_write(map, (CMD_UNLOCK_DATA_2 << 16) | CMD_UNLOCK_DATA_2,
           base + (map->buswidth * ADDR_UNLOCK_2));
}
```
Dead simple routine -- but why is ADDR_UNLOCK being multiplied by the buswidth? According to the (suspect) driver in Codeman's kernel for the CX84200 flash, the buswidth was 2 -- which means that instead of writing to (say) 0x554, it would write to 0xAA8. Wha'?
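My best guess at the answer, offered as a hypothesis rather than fact: on a bus N bytes wide, the flash chip's A0 pin is usually wired to a higher CPU address line, because the CPU's lowest address bits select bytes within a bus word and the chip never sees them. So a command the datasheet places at chip address 0x555 lives at CPU byte offset 0x555 × N, which is exactly what the multiply computes. The function name below is hypothetical.

```c
#include <stdint.h>

/* Convert a chip-relative command address (in chip words, as the datasheet
 * gives it) to a CPU byte address for flash on a bus buswidth bytes wide. */
static uint32_t flash_cmd_addr(uint32_t base, unsigned buswidth, uint32_t chip_addr)
{
    return base + buswidth * chip_addr;
}
```

Worked through for a 16-bit bus (buswidth 2) at 0x2000.0000: 0x555 becomes 0x2000.0AAA and 0x2AA becomes 0x2000.0554, so the mysterious 0x554 may simply be the scaled second unlock address.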

I may have to break down and set up a JTAG interface on this thing. I've been avoiding it 'til now out of stubbornness and lack of soldering skill, but it may be the only way to figure out what the hell is going on.
November 29, 2005 NWR04B: Digging through firmware
Over the last few weeks, I've been slowly picking away at the original firmware for the NWR04B. This is the stuff that lets me upload Linux to the router, and then run it (as opposed to the various firmwares from different companies that do useful stuff like firewalling, web control panels and so on). This is the second time I've taken on this beast, and like the first time it's because I can't get the damned thing to write to flash from Linux.

Writing to flash should be simple. I've got a datasheet for the chip that's on the router, and it lays out pretty explicitly exactly what you need to do to erase a sector, program a random location, find out what model chip you have, and so on. I keep watching what the kernel does by throwing printfs everywhere, but I still get no response from the flash. So I decided to see how the firmware does it.

The first time I looked at it I was just confused. There were at least half a dozen places in the firmware where I could find the magic numbers used to send the flash into command mode, where you could query or program it. The two numbers were right next to each other, and it was obviously no coincidence. But trying to read what the damned thing was doing just made my head hurt. All these registers, changing all the time, and no hint of what they were when you got here....

I tried, once, following from the very beginning, and keeping track of what register held what value. That didn't last very long. Then I thought of whipping up a quick Perl script to do the same...you know, writing an ARM simulator. That didn't last very long either (though for the sort of very basic functionality I was after, it might not have been that hard).

After that, I gave up and upgraded the kernel. At the very least, I figured the new MTD framework would probably make the problem pretty generic; if I was very lucky, it might even have drivers already in place. Instead, though, I kept having the same problem: the numbers looked good, but it just didn't respond the way I thought it should. The framework was there to query the thing in three different ways, but none of them did what I wanted.

So...back to the firmware, and this time I'm beginning to understand it a little better. For example: there's a big series of tables at the end of the program. I only noticed 'em this time, but they're pretty fundamental to what I'm after. There's a list of sectors you can erase, and then addresses for routines to erase a sector, fill a region with bytes, program it, and check some details on the chip itself.

And the half-dozen places where you can find the command numbers are there because there are half a dozen erase-a-sector routines -- which, in turn, is because this firmware supports six chips from four manufacturers: SST, AMD, Atmel and Hynix. It's the Hynix chip that I've got, so that means I can focus on that.
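Those tables are a hand-rolled version of the ops structure you'd write in C: pick an entry by the ID the chip reports, then call through function pointers. A sketch of the idea only. The JEDEC manufacturer codes (SST 0xBF, AMD 0x01, Atmel 0x1F, Hynix 0xAD) are real; the struct, the stub routines and the one-entry-per-vendor layout are hypothetical, since the actual firmware has six chip entries.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per supported chip, like the firmware's tables of routine addresses. */
struct flash_ops {
    uint8_t     mfr_id;                        /* JEDEC manufacturer code */
    const char *vendor;
    int       (*erase_sector)(uint32_t addr);
    int       (*program)(uint32_t addr, const uint8_t *buf, size_t len);
};

/* Hypothetical stubs; real routines would issue each chip's command sequence. */
static int generic_erase(uint32_t addr) { (void)addr; return 0; }
static int generic_program(uint32_t addr, const uint8_t *buf, size_t len)
{
    (void)addr; (void)buf; (void)len;
    return 0;
}

static const struct flash_ops chip_table[] = {
    { 0xBF, "SST",   generic_erase, generic_program },
    { 0x01, "AMD",   generic_erase, generic_program },
    { 0x1F, "Atmel", generic_erase, generic_program },
    { 0xAD, "Hynix", generic_erase, generic_program },
};

/* Look up the ops for the manufacturer code read back from the chip. */
static const struct flash_ops *find_chip(uint8_t mfr_id)
{
    for (size_t i = 0; i < sizeof chip_table / sizeof chip_table[0]; i++)
        if (chip_table[i].mfr_id == mfr_id)
            return &chip_table[i];
    return NULL;
}
```

Reading the firmware with this shape in mind, each routine address in the table marks where a subroutine starts, which is what makes the rest of the disassembly tractable.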

Since I can see the addresses of all these routines, I'm confident now that I can pick out where a subroutine starts and ends, and how to pick out the registers used to carry arguments. With that down, I can pick out other routines and see what calls them, and I can pay less attention to keeping track of all the registers at all times.

Now I need to look at the driver in the kernel, and compare it to what I can see in the original firmware. The firmware's check for the chip I've got seems to match what the kernel does, so at least that part's good.

Part of the problem is having to recompile and reload the kernel each time I want to check the driver, and the fact that the probing is not under my control; I'm still trying to wrap my head around the driver initialization sequence in the kernel. I'm thinking of writing a toy kernel module to do the writes I want, since it looks like the MTD drivers can't be compiled as modules for this setup. This would save me having to reboot with a new image all the time.

Ah, well...at least this is all fun. I'm still having the time of my life here. :-)
November 13, 2005 NWR04B: Waiting for /dev/mtd0
It's been a couple weeks now since I upgraded to the 2.4.31 kernel, and I'm still trying to get write access to the flash memory from Linux. This is turning out to be a real pain. The whole point of upgrading kernels was so that I could use the up-to-date version of the MTD drivers, backported by the uClinux folks. This has helped, but I'm not there yet.

The MTD drivers attempt to probe the chip to see what it is, who made it and what it can do. To do so, they write a few values to special locations, then read back from another special location. There are a couple of standards for this sort of thing (CFI, JEDEC), and the datasheet I've got for this flash chip says it supports both.
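The CFI half of that probe is short: write 0x98 to chip address 0x55, then look for the ASCII letters 'Q', 'R', 'Y' at chip addresses 0x10 through 0x12. A toy version against a simulated chip; the simulation and its names are hypothetical, and addresses are in chip words, before the bus-width scaling the MTD code applies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy chip: after 0x98 is written to address 0x55 it answers CFI queries. */
struct toy_cfi_chip {
    bool query_mode;
    uint16_t array[0x100];    /* what a plain read returns otherwise */
};

static void chip_write(struct toy_cfi_chip *c, uint32_t addr, uint16_t val)
{
    if (addr == 0x55 && val == 0x98)
        c->query_mode = true;
    else if (val == 0xF0)                     /* reset back to read-array mode */
        c->query_mode = false;
}

static uint16_t chip_read(const struct toy_cfi_chip *c, uint32_t addr)
{
    static const char qry[] = "QRY";
    if (c->query_mode && addr >= 0x10 && addr <= 0x12)
        return (uint16_t)qry[addr - 0x10];
    return c->array[addr];
}

/* The probe: enter query mode at query_addr, look for "QRY", leave query mode. */
static bool cfi_detect(struct toy_cfi_chip *c, uint32_t query_addr)
{
    bool found;

    chip_write(c, query_addr, 0x98);
    found = chip_read(c, 0x10) == 'Q' &&
            chip_read(c, 0x11) == 'R' &&
            chip_read(c, 0x12) == 'Y';
    chip_write(c, 0, 0xF0);
    return found;
}
```

Probing with the wrong query address leaves the toy chip in read-array mode, and the probe sees plain memory instead of "QRY".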

It also turns out that this is what the MTD drivers call an AMD-compatible flash device -- the commands to unlock a sector, say, or to spit out a device number, match those from AMD for some of their flash chips. So that's at least three different sorts of drivers to use, and three different ways of saying "Are you this kind of device?"

According to the datasheet I've got, all of this should work: the CFI probes, the JEDEC probes, the AMD stuff. The CPU datasheet says the flash is mapped to 0x2000.0000 after boot, and all the debugging whatnot I've thrown into the drivers says that's where they're writing to. Yet all I get back is raw memory. It matches what I read from the flash memory under the old kernel, and it matches what you'd expect from the bootloader on this thing -- set up some registers, reset devices, print the menu, then jump to a loaded flash image.

I'm unsure what's going on here. The location of flash doesn't seem to be wrong. If the flash chip datasheet is wrong, I've got some fairly big problems, I think. But I can't figure out why I can't get the answers I'm expecting.

November 06, 2005 NWR04B: 2.4.31-uc0

Took a while, but I managed to get uClinux version 2.4.31 compiled and working on the router. I may release another firmware package, or I may wait until the mtd stuff is working. Looks like the newer drivers may handle the flash chip without needing a special driver... Here's where I am now:

```
Linux version 2.4.31-uc0 (aardvark@rearden) (gcc version 2.95.3 20010315 (release)) #13 Sun Nov 6 13:10:29 PST 2005
Processor: Conexant CX84200 revision 1
Architecture: cx84200
Reserving page zero for vector table
hm, page 00000000 reserved twice.
hm, page 00001000 reserved twice.
hm, page 00002000 reserved twice.
hm, page 00003000 reserved twice.
hm, page 00004000 reserved twice.
hm, page 00005000 reserved twice.
hm, page 00006000 reserved twice.
hm, page 00007000 reserved twice.
On node 0 totalpages: 2048
zone(0): 0 pages.
zone(1): 2048 pages.
zone(2): 0 pages.
Kernel command line: root=/dev/nfs nfsroot=192.168.23.254:/home/aardvark/nwr04b/nfsroot ip=192.168.23.12:192.168.23.254:::testf
Calibrating delay loop... 6.32 BogoMIPS
Memory: 8MB = 8MB total
Memory: 6480KB available (1108K code, 193K data, 52K init)
Dentry cache hash table entries: 1024 (order: 1, 8192 bytes)
Inode cache hash table entries: 512 (order: 0, 4096 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 2048 (order: 1, 8192 bytes)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
JFFS2 version 2.1. (C) 2001 Red Hat, Inc., designed by Axis Communications AB.
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
eth0: a0:98:76:54:32:10
physmap flash device: 8000000 at 20000000
FIXME:  Made it here: 140 in physmap.c
FIXME:  Made it here: 55 in cfi_probe.c
Unhandled fault: vector exception (0) at 0x1
fault-common.c(97): start_code=0x1198, start_stack=0xe3a02322)
Internal error: Oops: 0
```

I'm tracking down the source of that error, which happens when physmap_write16 is called from cfi_probe_chip. I'm wondering right now if it's because I said in the config file that the bus width is 16 bits; based on some earlier notes, I think it might be 32 bits. We'll have to see.

November 02, 2005 Upgrades^3
Upgrading SpamAssassin at work; we're using 2.63, and they're up to, what, 3.1.0 now? The upgrade itself was relatively painless, but for complicated reasons it was integrated with MIMEDefang, and I didn't like that. MDF is great, but:
1. It takes out the SA score header. This can be corrected, but
2. it turns the SA score into a number, rather than a series of asterisks, which makes it difficult to filter with a regex, or with Outlook. (I have SA set conservatively, but the header makes it easy to filter more aggressively if that's what you want.)
3. Finally, MDF puts the SA report into the message as an attachment. Admittedly, it's a plain-text attachment, but that doesn't console the Outlook users who are worried (and rightly so) about clicking on attachments.
Hm. Will have to figure out a way around that; maybe just run spamc/spamd like I currently do.
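For the record, the asterisks are why regex filtering is so easy: SpamAssassin's X-Spam-Level header carries one '*' per whole point of score, so "at least N points" is just "contains N asterisks". An illustrative sketch of that conversion in C; SpamAssassin itself does this in Perl, and the function name is mine.

```c
#include <stddef.h>

/* Fill buf with one '*' per whole point of score (negative scores give ""),
 * truncated to fit a buffer of size n. Returns the number of stars written. */
static size_t spam_level_stars(double score, char *buf, size_t n)
{
    size_t stars = 0;

    if (n == 0)
        return 0;
    while (score >= 1.0 && stars + 1 < n) {
        buf[stars++] = '*';
        score -= 1.0;
    }
    buf[stars] = '\0';
    return stars;
}
```

A score of 5.7 becomes `*****`, so a more aggressive personal filter is a plain substring or regex match on that header, no arithmetic required.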

I've also got word that, due to some old prototype equipment no longer being needed, I will have three new boxes to play with. Woohoo! I'm already planning the DRBD fileserver.

Finally, I managed to get the new version of uClinux to compile and run on the NWR04B. Sweet...except that I didn't check out a particular tag, and I'm having to guess at the date when I did check out my tree, which makes it difficult to say exactly what I've got. Currently checking out with the date set to when I think I grabbed it, then may upgrade/downgrade to the latest tag (currently 2.4.31).
October 30, 2005 NWR04B: Update
It's been a while since I posted, so it's time to catch up on what I've been doing.

I got frustrated with not being able to write to flash memory, possibly because I was unable to figure out what the datasheet was telling me. I decided to have a look at the firmware itself (the part that prints the initial menu, gives you a chance to load your own firmware, then boots the thing) and see if that would tell me what was going on. Seemed like a good thing to try -- after all, it writes to flash when you upload a new image, so it's got to have the secret knocks in there somewhere, right?

Well, damned if I could find it. There are parts in the firmware where it wants to print a message to the screen, and the way it does it is by:
1. Loading a memory address
2. that contains a pointer to a text string
3. into a register
4. and then calling the print routine
So I was looking for something similar in a let's-write-to-flash routine:
1. Loading a memory address
2. that contains a pointer to a secret number
3. into a register, or possibly a memory address
4. that unlocks or erases flash memory
But I simply couldn't find it, nor could I find any obvious constants in the firmware itself. The datasheet gives the steps and numbers needed -- 0x00000555, 0x00000aaa, and so on -- and I just could not see them anywhere. So that left trying to track execution of the firmware itself.
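A straight search might come up empty for a specific reason. On ARM, 0x555 and 0xaaa can't be encoded as single data-processing immediates, because those are 8-bit values rotated by an even amount. The firmware would therefore have to load them as full 32-bit words from a literal pool, or build them from two instructions. A scanner like the sketch below only catches the literal-pool case. It assumes a little-endian image (the toolchain here builds with -mlittle-endian), and `find_le32` is a hypothetical helper, not part of any tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset of the first occurrence of `value`, stored as
 * a little-endian 32-bit word, in `buf`; -1 if absent. Steps one byte at a
 * time, so unaligned matches are found too. */
long find_le32(const unsigned char *buf, size_t len, uint32_t value)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        uint32_t w = (uint32_t)buf[i]
                   | ((uint32_t)buf[i + 1] << 8)
                   | ((uint32_t)buf[i + 2] << 16)
                   | ((uint32_t)buf[i + 3] << 24);
        if (w == value)
            return (long)i;
    }
    return -1;
}
```

Running it over a dump for 0x00000555 and 0x00000aaa would at least rule the literal-pool case in or out. The firmware might also store already-mapped addresses, so 0x20000555 and 0x20000aaa are worth searching for too.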

I was able to track down the place where the firmware printed the menu that allowed you to upload more firmware. From there, I could see the jump-off points to receive the firmware, then the checksum, then -- aha! -- erasing and writing to the flash memory. But keeping track of what was going on after that was just too much; I'm not used to thinking in assembly, there were lots of jumps that depended upon the state of registers (which I couldn't figure out), and again there was nothing obvious that showed me what was being written where. I may try again with the second-stage firmware -- the original, visible firmware from the manufacturers that I'm replacing with Linux -- but I'm not much more optimistic.

Okay, so what next? Well, the MTD subsystem has been rewritten extensively since the kernel version that I've got, and they're no longer supporting version 2.4 of the kernel. But the uClinux people have backported their work to the 2.4 version of uClinux, and maybe that'll work. Jumping to the 2.6 kernel seems a little too scary, but upgrading to the latest 2.4 shouldn't be too bad, right? Heh. It may not be too bad if you know what you're doing, but for me it's a little more challenging. I've spent a week on this so far, and I'm finally at the point where it fails at the final link. Since I've done little more than copy files over wholesale, this is indeed progress.

Of course, there's probably going to be a lot to go through once I get the final linking done; it took a while for me to get Linux printing to the screen, let alone successfully booting. And after that, I don't think there will be a driver for the flash -- at least, I can't see one in the current tree. What I'm hoping is that the updated MTD tree will allow for better probing of the flash's abilities using the Common Flash Interface, or at least I'll be able to ask for help without being ignored.
October 11, 2005 NWR04B: Secret Knocks
So I came to the realization that I've been including the driver for the wrong damn flash chip. This came straight from Codeman's tree, which in turn is based on (I think) the HRI tree. Codeman's .config file for uClinux included drivers for the SST39V flash chip, which just plain isn't right for this router. It's possible that he had a different revision of the board or some such, but I suspect that since he wrote to the flash using the JTAG interface, the issue just never came up.

I grabbed the datasheet for the Hynix chip, and it's not that different from what's in the SST driver...but it's just different enough that it causes problems. First of all, you've got to give the secret knock before writing a byte to flash -- apparently to keep electrical noise (or some such) from accidentally erasing important data. In the SST driver, it looks like this:
```
/* SST driver's program-byte knock: write8(map, value, address) */
map->write8(map, 0xaa, 0x5555);
map->write8(map, 0x55, 0x2aaa);
map->write8(map, 0xa0, 0x5555);
```
But according to the Hynix datasheet, it should look like this:
```
map->write8(map, 0xaa, 0xaaa);
map->write8(map, 0x55, 0x555);
map->write8(map, 0xa0, 0xaaa);
```
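The two sequences differ only in their addresses, so one way to keep a single driver for both chips is to make the knock a table. This is a sketch under stated assumptions. `fake_bus` and `program_byte` are hypothetical stand-ins for `map->write8()`, using the same argument order as the snippets above (value, then address). They let you check the sequence without hardware.

```c
#include <assert.h>
#include <stdint.h>

/* One write of an unlock ("secret knock") sequence: `data` at `offset`. */
struct knock_step {
    uint8_t  data;
    uint32_t offset;
};

/* The program-byte knocks quoted above: same data, different addresses. */
const struct knock_step sst_program_knock[3]   = { {0xaa, 0x5555}, {0x55, 0x2aaa}, {0xa0, 0x5555} };
const struct knock_step hynix_program_knock[3] = { {0xaa, 0x0aaa}, {0x55, 0x0555}, {0xa0, 0x0aaa} };

/* Hypothetical stand-in for map->write8(): records writes instead of
 * touching hardware, so the sequence can be checked on a desktop. */
#define FAKE_BUS_MAX 16
struct fake_bus {
    uint32_t addr[FAKE_BUS_MAX];
    uint8_t  data[FAKE_BUS_MAX];
    int      n;
};

static void fake_write8(struct fake_bus *bus, uint8_t data, uint32_t addr)
{
    if (bus->n < FAKE_BUS_MAX) {
        bus->addr[bus->n] = addr;
        bus->data[bus->n] = data;
        bus->n++;
    }
}

/* Knock, then write the byte itself. Swapping chips means swapping tables. */
void program_byte(struct fake_bus *bus, const struct knock_step knock[3],
                  uint32_t addr, uint8_t value)
{
    for (int i = 0; i < 3; i++)
        fake_write8(bus, knock[i].data, knock[i].offset);
    fake_write8(bus, value, addr);
}
```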
Okay, easy enough to change. Still didn't work, though, when I tried to copy the jffs2 image to /dev/mtd1; the writes just keep on failing. But then I remembered that only an erase can turn on a particular bit -- ordinary writes can only flip 'em off.

Just for fun, I tried copying an image where, compared to what was in flash already, bits would only have to be turned off -- and sure enough, that worked. Didn't survive a reboot, though...weird.
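The "only an erase can turn a bit on" rule is easy to check before writing anything. In effect, a program operation behaves like a bitwise AND of the old and new contents. An image can therefore go on without an erase only if it never needs a 0 bit turned back into a 1. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Programming NOR flash can only turn 1 bits into 0 bits; only an erase
 * turns them back into 1s. So a cell holding `have` can be programmed to
 * `want` without an erase only if every 1 bit in `want` is already 1. */
int can_program_without_erase(uint8_t have, uint8_t want)
{
    return (uint8_t)(have & want) == want;
}

/* What a program attempt would actually leave behind: bits can only drop. */
uint8_t result_of_program(uint8_t have, uint8_t want)
{
    return (uint8_t)(have & want);
}

/* Whole-image version: would writing `img` over `flash` need an erase? */
int image_needs_erase(const uint8_t *flash, const uint8_t *img, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!can_program_without_erase(flash[i], img[i]))
            return 1;
    return 0;
}
```

The experiment in the last paragraph corresponds to `image_needs_erase()` returning 0. A jffs2 image copied on top of arbitrary old data is very likely the other case.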

On, then, to the bit of the datasheet that deals with erasing. There's the secret knock for erasing, but that was easy enough to fix. The last part of the secret knock tells the chip which 0x1000-byte sector to erase. With the SST driver, it looks like you just use the beginning of the 0x1000-byte sector you want to erase, making sure that it's on an erase boundary (i.e., some multiple of 0x1000).

The Hynix, though...I'm having trouble figuring it out. The sector I'm trying to erase starts at 0xf0000, so I'll use that as an example. The datasheet has a table listing what address to write the final command to, and it says that the address should be binary 01111??? -- the last three bits don't matter. But this table also seems to say these should be bits 19 through 12 (counting from zero). If that's the case, then we're just shifting the address over by one, which means writing the final command to 0x78000. But that doesn't seem to work.

In another part of the datasheet, it seems to imply that the sector address is just 8 bits long -- in which case, we're shifting the address right by 13 bits. That seems like a very strange number. It works out to a write to 0x78, and that doesn't work, either. The only thing that I can think of is that flash memory is supposed to be mapped to 0x20000000, so maybe it's 0x2f000000 that should be shifted as necessary. But that doesn't make any sense to me.
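Here is the arithmetic from the last two paragraphs as a sketch. It rests on one assumption that the post doesn't establish: that the chip's address bit 0 corresponds to CPU byte-address bit 1. That would be the case with 16-bit wiring, or in byte mode with A-1 as the extra low bit. Under that assumption, the datasheet's bits 19 through 12 land on CPU bits 20 through 13. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION (unverified on the NWR04B): chip address bit 0 is CPU byte
 * address bit 1, so chip bits 19..12 sit at CPU byte-address bits 20..13. */

/* Hypothetical: extract the 8-bit sector-address field from a byte offset. */
uint8_t sector_field(uint32_t byte_offset)
{
    return (uint8_t)((byte_offset >> 13) & 0xff);
}

/* Hypothetical: put an 8-bit sector field back onto the CPU bus. */
uint32_t bus_address_for_field(uint8_t field)
{
    return (uint32_t)field << 13;
}

/* Match a field against a datasheet pattern written MSB first, such as
 * "01111???" where '?' means "don't care". */
int matches_pattern(uint8_t field, const char *pattern)
{
    for (int bit = 7; bit >= 0; bit--, pattern++) {
        if (*pattern == '\0')
            return 0;               /* pattern shorter than 8 bits */
        if (*pattern == '?')
            continue;
        if ((*pattern - '0') != ((field >> bit) & 1))
            return 0;
    }
    return 1;
}
```

Under that assumption, the two readings agree rather than compete. Shifting 0xf0000 right by 13 gives the 8-bit field 0x78 (binary 01111000, which matches 01111???), and the bus address that carries that field is 0xf0000 itself. Fields 0x78 through 0x7f all match the pattern. Together they would span byte offsets 0xf0000 to 0xfffff, a 64KB sector rather than a 0x1000-byte one. Whether the board is actually wired this way is something only the schematic or an experiment can confirm.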

And the fact that the bits I managed to flip don't survive a reboot makes me suspicious -- am I trying to write to RAM or some such rather than flash? If anyone out there knows this sort of thing, I'd be grateful if you could take a look at the datasheet and see if you can figure out what I'm doing wrong.
October 07, 2005 NWR04B: Look at the board, dumbass
Okay, so if you look at the goddamned chip on the NWR04B, you see it's a Hynix HY29LV160-BT, which is not nearly the same as an SST39VF08. I've got the datasheet, at least, so I can look and see if there's maybe some simple change to the driver I'm using to make it work.

That's ugly, though (but no uglier than my debugging code...ugh), and I need to make that better. The MTD folks are no longer supporting the 2.4 kernel; however, looks like the uClinux folks have backported the MTD stuff. Which means I might try upgrading to the latest uClinux version and see if I can port my changes over...although frankly, I'm scared that I'll just be back at square one with this project and trying to figure out why the hell I can't print to the screen.

Yeah, it's an irrational fear, but if I can just break this out into a separate driver I'll be happy. Anyhow, it doesn't look like this particular chip is supported yet by the MTD people, so that's less of an incentive to move up. Or more of an incentive...maybe the closest I'll come to getting a patch into the Linux kernel tree. :-)
October 04, 2005 NWR04B: What the hell am I missing?
I must really be missing something here, because I am unable to get this thing to write to flash at all. Here's what's going on in the kernel:
1. Turn off write protection; working. By that I mean that the kernel is successfully able to change a value in memory; the driver for this chip agrees with the datasheet from the HRI project that this is the bit that twiddles write protection.
2. The kernel tries to write the following mysterious values: 0xaa to 0x5555, 0x55 to 0x2aaa, and 0xa0 to 0x5555. The destination addresses (0x5555 and 0x2aaa) get mapped to the right area of memory: 0x20000000 plus the offset for mtd1 I've set up. Checking these writes shows that they fail.
3. The kernel tries to write the first byte of data copied from the user request. Again, the address gets changed properly (0x20000000 + mtd1 offset). Again, the write fails.
(All this is in cx84200-flash.c, BTW.) I can think of two things...wait, three things...that might be happening:
1. There's a big change in memory mapping that happens some time after boot. Before The Change, flash begins at 0x0; after The Change, it starts at 0x20000000. I've been assuming, without much evidence, that the onboard bootloader does this flip before loading and running Linux. As ryanr suggested, it may not. In this case, I'd either need to make The Change myself, or else change the memory mappings.
2. There's some weirdness with little-endianness going on. Datasheet sez it's the 26th bit at 0x4000000 that twiddles write protection; this address is not affected by The Change. Maybe I'm simply counting bits from the wrong end...or something...arghh, this makes my head hurt. I think it's unlikely, though, that the developers would not have accounted for this.
3. Datasheet's wrong, or the chip's not the same. Which'd suck.
Any other ideas, please let me know.
October 01, 2005 NWR04B: My descent into little-endian binary arithmetic hell
Currently writing this entry in emacs. Once upon a time, I stopped using emacs for fear of what loading a 20MB editor would do to the mail server I was working on, and learned to love vi. Prompted by ESR's Art of Unix Programming, I've decided to try to pick up emacs again. It's interesting...

Anyhow: right now I'm trying to figure out why the hell writing to flash on the NWR04B is not working. First off, I've edited the map file for the flash devices (drivers/mtd/maps/cx84200-flash.c for those of you playing the home game) so that I've got two partitions declared:
```
static struct mtd_partition cx84200_partitions[] = {
        {
                name:           "bootloader",
                size:           0x00020000,
                offset:         0x00000000,
                mask_flags:     MTD_WRITEABLE, /* force read-only */
        }, {
                name:           "root_fs",
                // Codeman's original:
                // size:        0x000fa000,
                // My efforts at making a root partition:
                size:           0x00040000,
                offset:         0x000f0000,
        }
};
```
The first I'm not really doing anything with, but it could (as the name suggests) be turned into a bootloader partition someday. The second is where I'm concentrating my efforts. The read-only flag that was originally in there was removed once I figured out it might help matters. :-) Okay, so now what? Well, I've got a jffs2 image that I created, so let's try the obvious:
```
# cat test.jffs2 > /dev/mtd1
```
...and it just hangs. (I still haven't bothered to figure out how to make CTRL-C interrupt a process yet...something to do with the terminal, I think.) Up the debugging output and you see MTD_open, and then nothing.

I had a look at the driver (drivers/mtd/chips/sst39vf080.c) to see what's going on here, and I managed to figure it out a bit. The write operation tries to write one byte at a time, then reads it back to make sure it got written. If so, it moves on to the next byte; if not, it tries 256 more times (I guess waiting to see if it just needs a moment) and sees if that works. If yes, next byte; if not, it gives up on the write entirely.

I threw in some messages to track that, including one that shows what value it's reading back from flash after the write. After adding ridiculous amounts of debugging info, it seems that the write of the first byte is simply not working. The write fails, and cat just keeps on trying (or something).

A bunch of looking around finally turned up the MTD-JFFS-HOWTO from (I think) the guy who wrote the MTD driver. 'S full of all sorts of helpful hints, like:
- you can do cp test.jffs2 /dev/mtd1 to copy stuff to flash (but I got the same result as with cat),
- you can mount an erased block device, then just copy files to it w/o formatting it (mount /dev/mtdblock1 /mnt && cp foo /mnt), and
- you need to erase a flash partition if it has anything on it before just blindly copying files over.
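The write-and-verify loop described above can be modelled on a desktop. This is a sketch of the behaviour as described, not the code in sst39vf080.c. `flash_dev`, `write_verify` and the RAM-backed `fake_nor` are all hypothetical, and the fake applies NOR's rule that programming can only clear bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WRITE_RETRIES 256   /* extra read-backs before giving up */

/* Hypothetical byte-wide device interface standing in for the MTD map
 * accessors; not the real 2.4 MTD API. */
struct flash_dev {
    void    (*write8)(struct flash_dev *dev, uint8_t value, uint32_t addr);
    uint8_t (*read8)(struct flash_dev *dev, uint32_t addr);
};

/* Write one byte at a time and read it back; on a mismatch, poll up to
 * WRITE_RETRIES more times, then abandon the whole write.
 * Returns the number of bytes that verified. */
size_t write_verify(struct flash_dev *dev, uint32_t addr,
                    const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t a = addr + (uint32_t)i;
        dev->write8(dev, buf[i], a);
        int tries = 0;
        while (dev->read8(dev, a) != buf[i]) {
            if (++tries > WRITE_RETRIES)
                return i;           /* byte i never read back correctly */
        }
    }
    return len;
}

/* Hypothetical RAM-backed fake with NOR semantics: programming ANDs bits in. */
struct fake_nor {
    struct flash_dev dev;           /* must be first: the casts below rely on it */
    uint8_t cells[64];
};

static void nor_write8(struct flash_dev *dev, uint8_t value, uint32_t addr)
{
    struct fake_nor *f = (struct fake_nor *)dev;
    f->cells[addr % sizeof f->cells] &= value;
}

static uint8_t nor_read8(struct flash_dev *dev, uint32_t addr)
{
    struct fake_nor *f = (struct fake_nor *)dev;
    return f->cells[addr % sizeof f->cells];
}

/* fill = 0xff models an erased part; anything else models old data. */
void fake_nor_init(struct fake_nor *f, uint8_t fill)
{
    f->dev.write8 = nor_write8;
    f->dev.read8 = nor_read8;
    for (size_t i = 0; i < sizeof f->cells; i++)
        f->cells[i] = fill;
}
```

With an erased fake (all 0xff), every byte verifies. With stale contents, the first byte that needs a 0 turned back into a 1 spins through its retries and the write stops there. That is the "erase it first" case the HOWTO warns about. The fake doesn't model writes that never reach the chip at all.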
Well, fuck. So I follow the directions for the 2.4 kernel support, and figure out how to compile the flash_eraseall utility. Wonderful! Ready to go! Just gotta erase this here partition, and... Only no, that doesn't work: I get the same error re: the byte not being written as before.

I'm currently throwing in even more unholy amounts of debugging than before, and teaching myself the simplest bits of binary arithmetic you can imagine, in order to confirm that, yes, write protection is being turned off...I think. This little-endian thing still confuses the hell out of me. The datasheet sez that, at the address the enable_write() operation is accessing, there are 32 bits set aside for controlling the first bank of flash (which is what we're after here). The 26th bit (counting from zero) is write-protect (1 for on, 0 for off). enable_write() reads all 32 bits at that address, clears that bit (ANDs it with ~0x04000000), and then WP should be off. So the unholy debugging shows that the long int being read:
- before turning off write-protect: 0x1400ffef
- what it wants to write to turn off WP: 0x1000ffef
- what it reads back after, checking how it went: 0x1000ffef
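Those three values check out by hand. Endianness doesn't change which bit a mask selects: once the 32-bit word is in a register, `1 << 26` (0x04000000) names the same bit either way. Byte order only decides which byte sits at the lowest address in memory. A datasheet that numbers bits from the most significant end would change things, but that's a documentation convention, not byte order. Here is a sketch with hypothetical names, not the code from cx84200-flash.c:

```c
#include <assert.h>
#include <stdint.h>

#define WP_BIT  26u                 /* bit 26, counting from zero */
#define WP_MASK (1u << WP_BIT)      /* == 0x04000000 */

/* Turning write protection off means clearing the bit: AND with the
 * complement of the mask, leaving every other bit alone. */
uint32_t clear_write_protect(uint32_t reg)
{
    return reg & ~WP_MASK;
}

int write_protected(uint32_t reg)
{
    return (reg & WP_MASK) != 0;
}

/* Byte order only matters when the word is viewed as bytes in memory:
 * on a little-endian CPU the least significant byte comes first. */
uint32_t load_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```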
Okay, so that works. Maybe I'll give the flashcp utility (part of the MTD tree) a try and see how that goes.
September 04, 2005 NWR04B: Flash, telnetd, serial port...plus Ted Leo
Things are coming along on this router, and I've managed to make some progress on a couple of fronts.

First off, I've managed to get access to the flash memory on this thing. It's a little embarrassing, because I went through a lot of code in the mtd section of the kernel before realizing that I had simply not included the driver in the configuration file. Managed to learn a bit about how it fits together, though, so it wasn't wasted effort.

I've been able to get the contents of the flash out -- at least, the bit that's covered by the memory map that Codeman put together in his driver, which is about 1MB of the 2MB on board. Still, that appears to include the bootloader menu on this thing, which is good; with luck I'll be able to figure out the checksum for that, and maybe upload armboot or something. Of course, I could always just overwrite the flash directly...but I'm a little scared of that. We'll see.

The other thing that this'll lead to, of course, is including a filesystem in the flash memory itself. Right now I'm mounting everything by NFS, which is very flexible but not terribly self-contained. With something like JFFS2 and a separate partition for the kernel, I should be able to have something pretty skookum.

I ran into some weirdness with Minicom and the serial port: at random times, for reasons I couldn't figure out, the display from the router would get all scrambled. Letters and newlines would be dropped, or transposed, or just garbled out of recognition entirely. I tried everything I could think of: power-cycling the router, letting it cool (it doesn't take long for it to heat up, and things tend to go south pretty quickly when it does...must do something about that), swapping cables, swapping serial ports, exiting minicom, trying other serial port terminal programs (and let me tell you, there aren't many for Linux). Eventually I gave up and ran:
```
cat /dev/ttyS0 & cat > /dev/ttyS0
```
which worked perfectly: I could watch it boot, run commands, all that stuff. I could even see that the shell was using colours for ls, which made me wonder if maybe that was a problem.

Finally, though, it came time to try uploading another kernel image. I tried fooling around with sb, but while I could get it to upload to the router w/o problems, it was difficult to get the timing right when it ended, and the image didn't seem to load properly. All right, I thought, I'll use Minicom just for uploading. But check it out: when I ran Minicom again, it was perfect -- no display problems at all. Still don't know what changed, but I'm glad it's working again.

This led me to try getting the telnet daemon from BusyBox working...if I can't use a serial port, why not just use the network? But getting it going is going to take some work. With uClibc, there is neither a fork() nor a daemon() routine, both of which are used by telnetd. Instead, you get vfork(), which lets a child run but blocks the parent until the child calls either _exit() or exec(). So, as uCdot points out, the trick is to exec() the same program, but with a command-line option that tells the application that it's a child, and should be treated accordingly. Good trick.
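Here is the re-exec trick as a minimal sketch that runs on an ordinary Linux box. The `--child` option and both function names are hypothetical, not what BusyBox's telnetd actually uses. The one hard rule it respects is that after vfork() the child may only call an exec function or _exit().

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_FLAG "--child"        /* hypothetical option name */

/* True if this process was started as the re-exec'd child. */
int is_child(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], CHILD_FLAG) == 0)
            return 1;
    return 0;
}

/* Parent side: vfork(), then have the child immediately exec `self` again
 * with CHILD_FLAG added. The parent resumes once the exec has happened.
 * Returns the child's pid, or -1 if vfork() failed. */
pid_t respawn_as_child(const char *self)
{
    char *const args[] = { (char *)self, (char *)CHILD_FLAG, NULL };
    pid_t pid = vfork();
    if (pid == 0) {
        execv(self, args);
        _exit(127);                 /* only reached if exec failed */
    }
    return pid;
}
```

In the program's own main(), `is_child()` picks the branch. The parent calls `respawn_as_child()` with a full path to its own binary (execv() does not search $PATH) and carries on. The re-exec'd copy sees the flag and does the daemon work.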

By the time I realized that, though, it was midnight, and I figured I'd be too tired to do it coherently. And then I got the flash memory working, so I was distracted. Coming soon, though...

On another note: on Friday my wife and I went with the famous Victor Scott to see Ted Leo and the Pharmacists. Holy fuck can that man play. And his drummer! His drummer has the beard I want plus all the drumming chops in the entire world. The last drummer I saw who was anywhere close to him played for Wilco; before that, Lotion. Absolutely fucking amazing, and a must-see if you ever get the chance.

Hon. mention to opening band The Parallels, for whom I can't find a website. Great 60s-mod outfits and music, and a fun show.
August 30, 2005 NWR04B: First release
I've put together a couple of downloads for the NWR04B. The first is the whole tarball of source code -- BusyBox, uClinux, plus some glue. You can use this to compile your own images. It's about 29MB. The second is just the bare minimum: the firmware image, the root directory for the router, and some instructions. It's about 685KB. You can find both of them here: http://saintaardvarkthecarpeted.com/nwr04b/download The downloads have been signed with key 0x4705C9C7 and checksummed with SHA1.
August 28, 2005 NWR04B: What next?
I've had a few questions about what I actually plan to do with this thing now that I can get a shell running. I've been thinking about this for a while, and here, in no particular order, is what I want to do:
- Network access: It'd be cool if SSH (preferred) or telnet would work on this. And hey, it'd be handy (okay, for certain values of "handy") to have a web server you could just plug in anywhere, especially if it was combined with a USB flash drive (see below).
- Get a filesystem embedded in flash: I went with NFS mounting the filesystem because I had no idea how to embed one in flash. I still don't, plus I'm unsure how this all interacts with the bootloader that's on there...my memory is a bit hazy at this point, but I think that the BL crapped out with a sufficiently large kernel at one point (maybe one I'd compiled with debugging?). The image I'm uploading now is 513KB, which leaves about 1.5MB of flash for the filesystem. Since NFS works right now and has tons of room, I figure I can experiment 'til I figure out what I'm doing, then make an entirely self-contained image.
- Provide an image for other people to upload to their own routers: 'cos what good is doing all this if I can't share? There are a few other people with this router who have (I think) been following this, so there ought to be some small amount of interest in this. I'd like to do the same with the devel stuff -- at least have a nice tarball for Busybox, the kernel and an NFS filesystem, say. (If you're impatient, email me.)
- Make the world's first Beowulf cluster of wireless routers: 'cos I'd like my 15 minutes of fame, please. Slashdot, here I come!
- Turn it into a firewall/wireless access point: This thing is small, doesn't consume much power, and is silent. It's got five ethernet ports and a wireless card with a GPL'd driver. How cool is that? A Linux firewall'd be nice and flexible, and it'd be nice to (say) only allow SSH/SSL on the wireless card. I'm curious to see how much additional memory firewall rules will take up, and if I can get something like tarpitting working on it without sucking up all the RAM.
- More hacks: This chip has 2 UARTs and USB. It'd be cool to, say, add a USB flash drive to this thing; I've got a 64MB one lying around that I used to get my XBox running Linux, and a 64MB filesystem would be huge compared to what I can fit in 1.5MB. What about breaking out the second UART to a serial port? Can we add more RAM? And the CPU can run at different clock speeds -- what happens if we play around with that?
August 28, 2005 NWR04B: How'd that happen?
After getting the CFLAGS fixed up, the last step before a working shell was getting init going -- either just saying something like init=/bin/sh in the kernel command line, or else getting init proper working. The first didn't work, so on to the second. First I looked at /etc/inittab on the filesystem. This message suggested that a very simple inittab should work just fine:
```
::askfirst:-/bin/sh
```
However, it wasn't working: the last message I got during the boot process was the BusyBox banner, and then nothing. I could ping it, but it wasn't responding to anything on the keyboard. I turned on debugging in init/init.c (#define DEBUG_INIT 1 up at the top), then started throwing in messageD(LOG|CONSOLE, "FIXME: Made it here") at various spots. I could see that init was running, and it was parsing /etc/inittab -- good. (Oh, I should also mention that since the router is currently mounting its filesystem by NFS, running tcpdump host [ip address] | awk '/"/ {print $NF}' showed me the files it was trying to get -- which also showed inittab.)

Okay, so move on to actually running the damn programs. That takes us through init_main() and run_actions()...yep, messageD shows we're getting there, too. From run_actions() we go to run()...and here's where we run into problems. run() basically blocks signals, then calls fork() like so:
```
        if ((pid = fork()) == 0) {
               /* run the damn program */
        }
        return pid;
```
A few more messageDs showed we were reaching the other side of the if block w/o any problems, but didn't seem to be actually going inside. init kept trying, about once a second, to start up the programs in inittab but it was failing each time. And then I remembered: uClibc does not implement fork(); instead, it uses vfork(), which blocks the parent until the child exits, or calls execve(). (Here's a good explanation.) So what if we do:
```
        if ((pid = vfork()) == 0) {
```
Well, hot damn -- it works!

August 27, 2005 NWR04B: SHELL!

Just under eight months:

```
IP-Config: Guessing netmask 255.255.255.0
IP-Config: Complete:
      device=eth0, addr=192.168.23.12, mask=255.255.255.0, gw=255.255.255.255,
     host=test, domain=, nis-domain=(none),
     bootserver=192.168.23.254, rootserver=192.168.23.254, rootpath=
Looking up port of RPC 100003/2 on 192.168.23.254
Looking up port of RPC 100005/1 on 192.168.23.254
VFS: Mounted root (nfs filesystem).
Freeing init memory: 52K
Using fallback suid method
init: Bummer, can't write to log on /dev/tty5!
console=/dev/console
init started:  BusyBox v1.00 (2005.08.27-16:28+0000) multi-call binary
command='-/bin/sh' action='4' terminal='/dev/console'

init: Bummer, can't write to log on /dev/tty5!
Starting pid 9, console /dev/console: '/bin/sh'
Using fallback suid method

BusyBox v1.00 (2005.08.27-16:28+0000) Built-in shell (lash)
Enter 'help' for a list of built-in commands.

/ # busybox
NFS: giant filename in readdir (len 0x80000001)!
Using fallback suid method
BusyBox v1.00 (2005.08.27-16:28+0000) multi-call binary

Usage: busybox [function] [arguments]...
   or: [function] [arguments]...

        BusyBox is a multi-call binary that combines many common Unix
        utilities into a single executable.  Most people will create a
        link to busybox for each function they wish to use, and BusyBox
        will act like whatever it was invoked as.

Currently defined functions:
        [, busybox, cat, cp, date, dd, dmesg, du, echo, egrep, false,
        fgrep, find, free, getty, grep, hexdump, ifconfig, init, kill,
        lash, linuxrc, login, ls, mkdir, mknod, more, mount, mv, netstat,
        passwd, ping, ps, pwd, rm, rmdir, route, sed, sh, strings, stty,
        su, tail, tee, test, time, top, touch, true, tty, umount, uname,
        uptime, vi, whoami, xargs

/ # ping -c 5 192.168.23.254
Using fallback suid method
PING 192.168.23.254 (192.168.23.254): 56 data bytes
64 bytes from 192.168.23.254: icmp_seq=0 ttl=64 time=0.0 ms
64 bytes from 192.168.23.254: icmp_seq=1 ttl=64 time=0.0 ms
64 bytes from 192.168.23.254: icmp_seq=2 ttl=64 time=0.0 ms
64 bytes from 192.168.23.254: icmp_seq=3 ttl=64 time=0.0 ms
64 bytes from 192.168.23.254: icmp_seq=4 ttl=64 time=0.0 ms
--- 192.168.23.254 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms
/ # ls -l
Using fallback suid method
drwxr-xr-x    2 1000     1000         1304 Aug 27  2005 bin
drwxr-xr-x    2 1000     1000          312 Aug 27  2005 dev
drwxr-xr-x    3 1000     1000          176 Aug 27  2005 etc
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 lib
lrwxrwxrwx    1 1000     1000           11 Aug 27  2005 linuxrc -> bin/busybox
drwx------    2 1000     1000           48 Jul  5  2005 lost+found
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 mnt
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 proc
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 root
drwxr-xr-x    2 1000     1000          344 Aug 27  2005 sbin
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 tmp
drwxr-xr-x    4 1000     1000           96 Jul  9  2005 usr
drwxr-xr-x    2 1000     1000           48 Jul  5  2005 var
```

I am the greatest man IN THE ENTIRE WORLD.

August 08, 2005 NWR04B: Toolchain Problems Redux
- July 9th: arm-elf-tools-20030314.sh from uClinux.org. Busybox fails when run as init with "undefined instruction" and "unknown data abort code."
- July 10th: toolchain from hri.sourceforge.net. BB fails with "undefined instruction".
- July 11: uClibc buildroot script. No copy of elf2flt. Tried latest snapshot, which does have elf2flt, but it failed to install.
- July 24th: uClinux toolchain again. BB fails with "bad data abort", "unknown data abort code" and "obsolete system call". Possibly including different versions of unistd.h?
- July 26th: ptxdist. uClibc appears to be built for a CPU with an MMU/FPU, ignoring the values set in the original menuconfig.
- July 30th: 3.4 toolchain from uClinux site (hidden!). Tried compiling the kernel with this toolchain, but far too many errors relating to change in behaviour from 2.95.3. BB failed with "relocation outside program". STL failed to build. God, this is pissing me off.

July 30, 2005 NWR04B: Toolchain problems

I thought I'd give PTXDist (latest version: 0.7.5) a try in my continuing attempts to get a proper toolchain going. I'm running into problems, though, when it comes time to compile uClibc:

```
make[5]: Entering directory `/home/aardvark/bin/ptxdist-0.7.5/build/crosstool-0.32/build/arm-uclibc-linux-gnu/gcc-3.4.2-uClibc-0.9.27/build-libc/libc/sysdeps/linux/arm'
arm-uclibc-linux-gnu-gcc  -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing  -fstrict-aliasing -Os -funit-at-a-time -mlittle-endian  -fno-builtin -nostdinc -D_LIBC -I../../../../include -I. -isystem /home/aardvark/bin/ptxdist-0.7.5/local/arm-uclibc-linux-gnu/gcc-3.4.2-uClibc-0.9.27/lib/gcc/arm-uclibc-linux-gnu/3.4.2/include -DNDEBUG -fPIC -c __longjmp.S -o __longjmp.o
__longjmp.S: Assembler messages:
__longjmp.S:36: Error: selected processor does not support `lfmfd f4,4,[ip]!'
make[5]: *** [__longjmp.o] Error 1
make[5]: Leaving directory `/home/aardvark/bin/ptxdist-0.7.5/build/crosstool-0.32/build/arm-uclibc-linux-gnu/gcc-3.4.2-uClibc-0.9.27/build-libc/libc/sysdeps/linux/arm'
make[4]: *** [arm] Error 2
make[4]: Leaving directory `/home/aardvark/bin/ptxdist-0.7.5/build/crosstool-0.32/build/arm-uclibc-linux-gnu/gcc-3.4.2-uClibc-0.9.27/build-libc/libc/sysdeps/linux'
make[3]: *** [_dir_linux] Error 2
make[3]: Leaving directory `/home/aardvark/bin/ptxdist-0.7.5/build/crosstool-0.32/build/arm-uclibc-linux-gnu/gcc-3.4.2-uClibc-0.9.27/build-libc/libc/sysdeps'
make[2]: *** [_dir_sysdeps] Error 2
make[2]: Leaving directory `/home/aardvark/bin/ptxdist-0.7.5/build/crosstool-0.32/build/arm-uclibc-linux-gnu/gcc-3.4.2-uClibc-0.9.27/build-libc/libc'
make[1]: *** [_dir_libc] Error 2
make[1]: Leaving directory `/home/aardvark/bin/ptxdist-0.7.5/build/crosstool-0.32/build/arm-uclibc-linux-gnu/gcc-3.4.2-uClibc-0.9.27/build-libc'
make: *** [/home/aardvark/bin/ptxdist-0.7.5/state/crosstool.install] Error 2
```

Googling didn't turn up much, but what I found suggested that uClibc was being compiled for the wrong processor, and/or for a processor that had an FPU -- which I don't believe the ADM5106 in this thing does. When configuring PTXDist at the beginning, I'd certainly told it to use softfloat, and GCC appeared to be compiled with that in mind (thus the error). Sure enough, when I took a look at the .config file in the uClibc build directory, it was pretty wrong:

```
ARCH_LITTLE_ENDIAN=y
# ARCH_BIG_ENDIAN is not set
# ARCH_HAS_NO_MMU is not set
ARCH_HAS_MMU=y
UCLIBC_HAS_FLOATS=y
HAS_FPU=y
```

I'm not sure why the configuration details make it to GCC and not uClibc; I imagine it's a bug, though it could also be that I'm pointing PTXDist at my uClinux source tree...I wouldn't think that'd be a problem, but it's not supported. I'm not sure I have the patience to follow this through, though, so I may just leave it rather than file a bug (I'm a bad person, I know).

One annoying thing about PTXDist is that make world, the do-it-all command, cleans everything out before starting, and there does not appear to be a make continue target to just keep going; it makes debugging things very difficult. I've been looking around at other scripts, and it's hard to find one that supports uClinux explicitly. I'm probably being superstitious, but I've had enough problems with toolchains that I'm reluctant to assume it'll work if I just drop in the uClinux sources. I may end up compiling my own ('cos that'll be easier...).

July 10, 2005 NWR04B: Further up the abstraction ladder

Well, I'm getting further along.

First off, I've managed to get the kernel mounting its root directory from my desktop machine. The trick to this was turning off the initrd option in the kernel config; if you don't, it doesn't matter what options you put in the kernel command line -- it'll try to read the ramdisk and then fail because it's not in JFFS2 format (though I'm sure that error could be got around somehow; I'm just not bothering right now 'cos NFS is more flexible).

So now this kernel command line works:

```
root=/dev/nfs nfsroot=192.168.23.254:/home/aardvark/nwr04b/nfsroot ip=192.168.23.12:192.168.23.254:::test:eth0:off
```

...and I can ping the thing, which is good. Now I just need to populate it, which means just compiling busybox. Easy, right?

Ha! Another big-ass set of problems is what it is. First, I tried a copy of Busybox I had lying around that I think I'd compiled as part of a previous toolchain attempt. Yeah, I know -- "Let's throw in random binaries and they'll work!" -- but I figured it was worth a try. file seemed hopeful:

```
ELF 32-bit LSB executable, ARM, version 1 (ARM), for GNU/Linux 2.0.0, statically linked
```

but when I tried it I got this error:

```
BINFMT_FLAT: Bad magic/rev (0x1010161, need 0x4)
```

God bless Free software; here's the comment from fs/binfmt_flat.c:

```
because a lot of people do not manage to produce good flat
binaries, we leave this printk to help them realise the problem.
We only print the error if it's not a script file
```

Flat binary? Wha? And then came this FAQ from the excellent uCdot:

What causes 'BINFMT_FLAT: bad magic/rev (0xZZ, need 0xYY)' errors?

A lot of people encounter this error the first time they try to run a
program on a uClinux system. Usually this is caused by trying to run an
ELF or COFF executable rather than a "flat" executable. uClinux does not
support anything but the "flat" executable format.  ELF/COFF programs
are converted to "flat" format using elf2flt/coff2flt respectively.

To fix this problem with the ELF toolchain add -Wl,-elf2flt to the final
link line of your build and it will create a flat executable.

And why do we need that? Well, because this CPU has no MMU; the ELF format for executables won't work because (and I'm fuzzy on the details here) this means that the binary has to deal with being run from any memory address, rather than being lied to and told that it's at 0x0. Thus the special arguments to the compiler and linker, and the invocation of elf2flt afterward. So: to compile busybox I had to change the CFLAGS argument in make menuconfig to -D__PIC__ -fpic -msingle-pic-base, then run:

```
make dep
LDFLAGS=-Wl,-elf2flt make
```

I still got this error:

```
arm-elf-strip --remove-section=.note --remove-section=.comment busybox
arm-elf-strip: busybox: File format not recognized
make: *** [busybox] Error 1
```

but the strip command is the very last step in building the binary, so I figured the error was harmless, and file busybox gave "busybox: BFLT executable - version 4 gotpic". I copied it into place, booted and got:

```
IP-Config: Guessing netmask 255.255.255.0
IP-Config: Complete:
      device=eth0, addr=192.168.23.12, mask=255.255.255.0, gw=255.255.255.255,
     host=test, domain=, nis-domain=(none),
     bootserver=192.168.23.254, rootserver=192.168.23.254, rootpath=
Looking up port of RPC 100003/2 on 192.168.23.254
Looking up port of RPC 100005/1 on 192.168.23.254
VFS: Mounted root (nfs filesystem).
Freeing init memory: 52K
Unhandled fault: external abort on linefetch (F4) at 0x00000001
fault-common.c(97): start_code=0x700040, start_stack=0x67ffbc)
[1] sh: bad data abort: code 33554432 instr 0x32005500
Code: 495f5fdf 0e97f38e (c9a303b1) 3ea3a3ad b294aa7c
fault-common.c(97): start_code=0x700040, start_stack=0x67ffbc)
Internal error: unknown data abort code: 32005500
CPU: 0
pc : [<0000ffff>]    lr : [<0000ffff>]    Not tainted
sp : 0000ffff  ip : 0000ffff  fp : 0000ffff
r10: 0004d04c  r9 : 00050e40  r8 : 001fa000
r7 : 00000000  r6 : 0000005b  r5 : 00169884  r4 : 0019a904
r3 : 001722c0  r2 : ffffffff  r1 : 20000010  r0 : 20010016
Flags: nzcv  IRQs off  FIQs off  Mode SYS_32  Segment kernel
Control: 0
Process sh (pid: 1, stackpage=001f9000)
Stack:
Backtrace: frame pointer underflow
Function entered at [<b1c9a2f3>] from [<8e0e97f3>]
Unhandled fault: alignment exception (93) at 0x00000001
fault-common.c(97): start_code=0x700040, start_stack=0x67ffbc)
Internal error: Oops: 0
CPU: 0
pc : [<0012cccc>]    lr : [<00058850>]    Not tainted
sp : 001f9e94  ip : 001f9e40  fp : 001f9ec8
r10: 00640004  r9 : 00000000  r8 : 00000010
r7 : 00000000  r6 : b1c9a2f3  r5 : 9c63fdbd  r4 : 0000ffff
r3 : 0014b478  r2 : 00000001  r1 : 00000001  r0 : 0000ffef
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  Segment kernel
Control: 0
Process sh (pid: 1, stackpage=001f9000)
Stack:
001f9e80:          00058850 0012cccc 60000093  ffffffff 0000ffff 001f8000 00000001
001f9ea0: 001f9fd4 0067ffc8 00053a08 001f8000  001f9fd4 0000ffff 32005500 001f9ee0
001f9ec0: 001f9ecc 00053b00 00053984 001f8000  001f9fd4 001f9ef0 001f9ee4 00053b50
001f9ee0: 00053a60 001f9f94 001f9ef4 00054080  00053b44 32005500 00000004 00000000
001f9f00: 00030001 0000ffff 00050cc8 001f9f2c  001f9f1c 000567a8 000551b4 00000000
001f9f20: 001f9f7c 001f9f30 000576f8 0005675c  0068a000 001f9f98 00000000 001f9f98
001f9f40: 0067ff78 20000013 000480a0 001f8000  0068a000 0014b1f8 00148000 0014a040
001f9f60: 00170a40 001f9f94 00055710 00000000  00000000 0067ffd0 32005500 00000000
001f9f80: 0067ffd0 00000001 00000000 001f9f98  00054ed8 00053ff8 00000009 00000000
001f9fa0: 00000009 0004d03c 00000000 00000000  0067ffd0 00000001 0067ffc8 00000000
001f9fc0: 00640004 00000000 00000000 0067ff88  0004d020 20010016 20000010 ffffffff
001f9fe0: 001722c0 0019a904 00169884 0000005b  00000000 001fa000 00050e40 0004d04c
Backtrace:
Function entered at [<00053974>] from [<00053b00>]
 r7 = 32005500  r6 = 0000FFFF  r5 = 001F9FD4  r4 = 001F8000
Function entered at [<00053a50>] from [<00053b50>]
 r5 = 001F9FD4  r4 = 001F8000
Function entered at [<00053b34>] from [<00054080>]
Function entered at [<00053fe8>] from [<00054ed8>]
 r7 = 00000001  r6 = 0067FFD0  r5 = 00000000  r4 = 32005500
Code: ebfcae37 e2440010 (e5961004) e1a03521 e59f20cc
Kernel panic: Attempted to kill init!
```

The punchline is that that's the best result I've got in a lot of experimentation I'm not writing down here. The one common thread, once I got the binary format figured out, is this:

```
Unhandled fault: external abort on linefetch (F4) at 0x00000001
fault-common.c(97): start_code=0x700040, start_stack=0x67ffbc)
```

The 0x00000001 is the same throughout. I tried this suggestion and ran flthdr -s 65535 busybox to increase the stack size from 0x1000 to 0xffff -- same result. Then I came across this message, which says that F4 is the value read from the CPU's fault status register, so something is definitely wrong and I need to figure out what. However, I've also come across (and lost the links to) another post which suggested it was a problem with a particular version of uClibc. So that means paying proper attention to the toolchain, which I'd skipped over earlier. I'm currently trying to get the HRI toolchain going, so we'll see how that turns out.

July 04, 2005 NWR04B: KERNEL PANIC!

HO yeah!

```
boot no options
Linux version 2.4.19-uc1-cx84200-4 (aardvark@rearden.saintaardvarkthecarpeted.com) (gcc version 2.95.3 20010315 (release)) #955
Processor: Conexant CX84200 revision 1
Architecture: cx84200
On node 0 totalpages: 2048
zone(0): 0 pages.
zone(1): 2048 pages.
zone(2): 0 pages.
Kernel command line: root=/dev/mtdblock1 ro console=ttyS0
Calibrating delay loop... 32.15 BogoMIPS
Memory: 8MB = 8MB total
Memory: 6668KB available (806K code, 324K data, 36K init)
Dentry cache hash table entries: 1024 (order: 1, 8192 bytes)
Inode cache hash table entries: 512 (order: 0, 4096 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 2048 (order: 1, 8192 bytes)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
JFFS2 version 2.1. (C) 2001 Red Hat, Inc., designed by Axis Communications AB.
eth0: a0:98:76:54:32:10
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 512 bind 512)
*******
VFS: test name =
VFS: fs_name = <jffs2>
VFS: root name <1f:01>
*******
VFS: tried fs_name = <jffs2> err= -19
VFS: Cannot open root device "mtdblock1" or 1f:01
Please append a correct "root=" boot option
Kernel panic: VFS: Unable to mount root fs on 1f:01
```

I am currently giggling like an idiot and scaring my wife.

July 01, 2005 NWR04B: Progress!

```
EMXIFFIXMUncompressing Linux...FIXME

                                   FIXMran out of input data
                                                            FIXME

                                                                 -- System haltedFIXM
```

I set ZTEXTADDR in arch/armnommu/boot/Makefile to 0x1000, and now I get this. Sweet!

```
00000000:00304C0C;0000106C=00000000
00303A68-00304C0C>00000000
EMXIFÃ Uncompressing Linux...Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã .Ã  done, booting the kernel.
                                                                                                                     Ã 
00000000:00000000;0017F4C8=00000000
0008BC8C-0017F48C>00008000
0017F48C
00008000: E1A0C000 E3A0105B E28F5028 E8952360  E3A04000 E1550008 34854004 3AFFFFFC
00008020: E59F2024 E5862000 E3A0205B E5892000  E3A0B000 EA0000F9 001337E0 00134184
00008040: 00163A8C 00134180 00116000 84200001  E1A0C00D E92DD800 E24CB004 E24B1014
00008060: E24DD008 E50B0010 E24B0010 EB0309CF  E3500000 159F200C 151B3014 15823000
00008080: E3A00001 EA000000 0013B720 E91BA800  E1A0C00D E92DD870 E24CB004 E1A06000
000080A0: E59F5050 E5950000 EB030401 E1A04000  E1A00006 E5951000 E1A02004 EB0303E7
000080C0: E3500000 1A000005 E0860004 E1A0E00F  E595F004 E3500000 13A00001 191BA870
000080E0: E59F3014 E2855008 E1550003 3AFFFFEC  E3A00000 E91BA870 00048880 00048930

00136E70: 00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000
00136E90: 00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000
00136EB0: 00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000
00136ED0: 00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000
00136EF0: 00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000
00136F10: 00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000
00136F30: 00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000
00136F50: 00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000
EMXIFÃ Uncompressing Linux...Ã
```

May 14, 2005 NWR04B: Adapter fubar'd
Arghh...the crappy little RS232 adapter I hacked together for the NWR04B got dropped the other day, and now I just see garbage. I spent two hours this afternoon re-soldering various connections, then gave up and ordered two of these (the 233 adapters, 3V version, in the DB9 shell). Even ordered 'em assembled. Work on that'll be on hold, though I may have some notes to put up. In the meantime, I'll be trying to get my PVR-500MCE working. Whee!
May 14, 2005 Must...not...purchase...
My uncle forwarded me an email from Tiger Direct, and I was sorely tempted to purchase this. It's the Asante FR1104-G 802.11G router, and it's only 27 Canuckistan pesos. What's more, there was a rebate listed on Asante's website -- $20 US! -- which would almost have made me a profit on the damned thing. Then I realized the rebate program had ended in April (damn!). And I couldn't find anyone else who'd tried to hack the thing, or run Linux on it. (Why else would I buy it?) The datasheet mentioned a 32-bit RISC microprocessor, but nothing more...almost certainly an ARM. And I remembered that I still have to get things working with the wireless router I do have. (Dropped my crappy serial adapter the other day, so now I have to fix it.) Still, good deal if anyone wants one...
May 09, 2005 NWR04B: So that's what PC is set to
Finally figured out two things:
1. The macros debug_reloc_start and debug_reloc_end in head.S are not called by default -- which is why I haven't been seeing any output from them. Duh.
2. If I put them in and close a comment properly, I can print out pc -- which as near as I can figure is set to 0x1008 more than it should be if the image was being run from the beginning of memory.
Currently trying a truly horrible hack (.rept 4112 instead of .rept 8 at the beginning of the image) to see what happens.
May 07, 2005 NWR04B: ZTEXTADDR vs. The World
I followed Cyberdyne's suggestion and looked at the link options for the kernel I'm making for the NWR04B. So far, it looks promising, though I'm not that much better off. The problem was that the argument for puts, which should've been the address of some text to print to the screen, was 'way off and as a result I was seeing garbage. A closer look (with some paying attention this time) showed that, instead of being passed the address 0x28a0 (where you'd see EMXIF -- FIXME backwards), it was being passed the argument 0x428a0. And sure enough, in arch/armnommu/boot/Makefile, whaddawe see but this:
```
ifeq ($(CONFIG_CX84200_SMC),y)
#ZRELADDR        = 0x00040000
#ZTEXTADDR       = 0x00000000
ZRELADDR        =0x00008000
ZTEXTADDR       =0x00040000
INITRD_PHYS     =0x00700000
endif
```
This page told me that ZTEXTADDR is, basically, the address in memory where the kernel should expect to start -- or in this case, where the decompressor (I'm doing make zImage here) should expect to start. That sounds like something that would affect where things get put, all right, so I tried changing ZTEXTADDR to just 0x0 -- and sure enough, the argument passed to puts has the right address this time. But still no joy: when I load the image, I still don't see that EMXIF, just a single character (which is better than the 416 characters of crap I was seeing previously) of uncertain ancestry (for some reason, capturing the serial port output to a file I could run hexdump on wasn't working). And furtherly furthermorish, the 416 characters of crap I was talking about were found in the original image starting at 0x418a0 -- an offset of 0x3F000, or off by 0x1000 from what I would have expected. So, like, what, memory is starting at -0x1000? Arghhh.

April 21, 2005 NWR04B: Why won't it puts?

Still trying to figure out what the hell is going on here, and why it won't print the messages I expect it to (while still printing the FIXME I stuck in at the end of the puts routine w/o any problems). I've been looking at the disassembled zImage, and I'm scratching my head. Here's the deal: shortly after power-on, the decompress_kernel routine is run:

```
000074 EB000915 BL      &000024D0
```

In C, decompress_kernel looks like this:

```
ulg
decompress_kernel(ulg output_start, ulg free_mem_ptr_p, ulg free_mem_ptr_end_p,
                  int arch_id)
{
        output_data             = (uch *)output_start;  /* Points to kernel start */
        free_mem_ptr            = free_mem_ptr_p;
        free_mem_ptr_end        = free_mem_ptr_end_p;
        __machine_arch_type     = arch_id;

        puts("EMXIF");
        proc_decomp_setup();
        arch_decomp_setup();

        makecrc();
        puts("Uncompressing Linux...");
        gunzip();
        puts(" done, booting the kernel.\n");
        return output_ptr;
}
```

At 0x24D0, we've got some initialization, some register saving, and then the puts routine is called:

```
0024FC E59F0034 LDR     r0, &00002538
002500 EBFFF769 BL      &000002AC
```

This is the first call to puts: puts("EMXIF"); (which is FIXME backwards; I had it frontwards at first, and wanted to see if the output was any different if I changed the string; it's not). puts looks like this in C:

```
static void cx84200_puts(const char *s)
{
        while(*s != '\0')
                cx84200_putc(*s++);
        cx84200_putc('F');
        cx84200_putc('I');
        cx84200_putc('X');
        cx84200_putc('M');
        cx84200_putc('E');
}
```

(more checking to see what works) and like this in ARM assembly:

```
0002AC E1A0C00D MOV     ip, sp
0002B0 E92DD810 STMFD   sp!, {r4,r11,ip,lr,pc}
```

Save the registers for later...

```
0002B4 E1A04000 MOV     r4, r0
0002B8 E5D43000 LDRB    r3, [r4, #0]
0002BC E24CB004 SUB     r11, ip, #4
0002C0 E3530000 CMP     r3, #0
0002C4 0A000004 BEQ     &000002DC
```

r0 held the argument, and it's moved to r4. Check the first byte to see if it's zero (ie, if we're printing a null string), and jump ahead if it is. Not sure what we're doing with r11 here.

```
0002C8 E5D40000 LDRB    r0, [r4, #0]
0002CC EBFFFFEB BL      &00000280
0002D0 E5F43001 LDRB    r3, [r4, #1]!
0002D4 E3530000 CMP     r3, #0
0002D8 1AFFFFFA BNE     &000002C8
```

Load the first byte again into r0, then go to 0x280 (putc) with it. Increment r4 and see if it now points to a zero. If not, go through the routine again.

```
0002DC E3A00046 MOV     r0, #70
0002E0 EBFFFFE6 BL      &00000280
0002E4 E3A00049 MOV     r0, #73
0002E8 EBFFFFE4 BL      &00000280
0002EC E3A00058 MOV     r0, #88
0002F0 EBFFFFE2 BL      &00000280
0002F4 E3A0004D MOV     r0, #77
0002F8 EBFFFFE0 BL      &00000280
0002FC E3A00045 MOV     r0, #69
000300 EBFFFFDE BL      &00000280
```

This is the printing of FIXME at the end of puts.

```
000304 E59F0008 LDR     r0, &00000314
000308 E20000FF AND     r0, r0, #&FF
00030C EBFFFFDB BL      &00000280
```

Put 0x314 into r0, AND with 0xFF, then call putc again.

```
000310 E91BA810 LDMDB   r11, {r4,r11,sp,pc}
000314 00042548 ANDEQ   r2, r4, r8, ASR #10
```

And I think this is where we fall off the end of puts. Finally, a quick look at putc, first in C:

```
static int cx84200_putc(char c) {
        int i;
        int j = 10;

        CSR_WRITE(UART0_BASE, c);

        for (i= 0; i < 60000; i++)
                ;
}
```

(j left over from another bit of debugging) and assembly:

```
000280 E1A0C00D MOV     ip, sp
000284 E92DD800 STMFD   sp!, {r11,ip,lr,pc}
000288 E24CB004 SUB     r11, ip, #4
00028C E3A02CEA MOV     r2, #&EA00
000290 E2822060 ADD     r2, r2, #96
000294 E20000FF AND     r0, r0, #&FF
000298 E3A03209 MOV     r3, #&90000000
00029C E5830000 STR     r0, [r3, #0]
0002A0 E2522001 SUBS    r2, r2, #1
0002A4 1AFFFFFD BNE     &000002A0
0002A8 E91BA800 LDMDB   r11, {r11,sp,pc}
```

So here are my many bits of confusion:

1. That first call to puts should have r0 pointing to EMXIF, right? Only it doesn't: 0x28A0 is where you can find this string, and r0 points to 0x2538. There's no ASCII there, and certainly no copy of the string I want. If I change the instruction so that r0 points to 0x28A0, the thing crashes badly -- just spews out hexdumps of something (presumably memory).
2. That last call to putc from puts, where r0 points to 0x314. WTF? It explains why I'm seeing H% at the end of the strings, but as far as I can tell it certainly shouldn't be doing that. Again, there's nothing around 0x314 that would explain why we're trying to print it.
3. And in putc, what is with AND r0, r0, #&FF? As far as I can tell, this has absolutely no effect on r0: it's a NOP.

I can only think of three things:

- I'm wrong.
- The disassembler I'm using has a bug.
- GCC has a bug.

If anyone has any insight to share, please let me know. This is really bugging me.

April 17, 2005 NWR04B: It's crashing!
I'm starting to make a bit of progress on the NWR04B -- almost to the point where I could comfortably say it's crashing. Yay! The problem so far has been that I wasn't getting any output on the serial port when (presumably) Linux was booting. Some judicious insertion of ARM assembly code into the kernel image showed me that it was at least running, but I had no clue what it was doing after that. I managed to follow the path of the assembly code through head.S, which is the very, very first bit of anything that gets run when you've got a compressed kernel. This is so early that it takes a while even to get around to decompressing the kernel. After that, I was able to follow it through to misc.c, where decompress_kernel gets run. There are some puts statements in there (like puts ("Uncompressing Linux...");), so how come I wasn't seeing anything? Well, because puts didn't actually do anything. puts is defined in head.S as some assembly code to write a string to the serial port, but that wasn't referenced by misc.c; attempts to #include it failed horribly. There's an #ifdef STANDALONE_DEBUG that defines puts as printf, but attempts to build a kernel using that didn't work either. But follow the bouncing ball:
- arch/armnommu/boot/compressed/misc.c #includes asm/arch/uncompress.h
- which is actually include/asm/arch/uncompress.h
- which is, once you follow the symlinks, include/asm-armnommu/arch-cx84200/uncompress.h
- which defines puts as cx84200_puts
- and uncompress.c defines that as being an empty function
- but the equivalent definition in arch-samsung defines it as a bunch of calls to putc
- and that is defined as CSR_WRITE(DEBUG_TX_BUFF_BASE, c)
- so the equivalent for arch-cx84200 would be CSR_WRITE(UART0_BASE, c)
- because UART0_BASE is defined in hardware.h as 0x90000000
- which I knew from reading the datasheet is the chunk of memory where you drop stuff when you want it written to UART0, aka serial port 1.
Whew! So I tried that, and hey -- I got garbage on the screen! But not very much: I was expecting something like "Uncompressing Linux..." and instead got maybe 20 characters of garbage. I took another look at the assembly version of puts, and one of the things it does is wait for 20,000 clock cycles -- which at 60MHz is a third of a millisecond. I tried adding a loop that counts down from 60,000 (what the hell), and hey! more garbage -- though still a finite amount, and it stopped after maybe 5 seconds or so, and still nothing intelligible. What the hell's going on? I tried this:
```
static void cx84200_puts(const char *s)
{
        while(*s != '\0')
                cx84200_putc(*s++);
        cx84200_putc('F');
        cx84200_putc('I');
        cx84200_putc('X');
        cx84200_putc('M');
        cx84200_putc('E');
        cx84200_putc("\n");
}
```
and now I started to get garbage that had FIXME tossed in for good measure. A quick cat /dev/ttyS0 > logfile (have I mentioned that I love Unix?), and whaddaya know: I'm getting precisely four FIXMEs. If I threw an extra puts ("FIXME"); into misc.c, I got five, not six -- the string going to puts doesn't come out, but the extra putcs I put in do work. For the moment, this is where I'm stuck. The garbage/extra characters I'm seeing don't seem to have any relation to messages I might expect ("FOO", "Uncompressing Linux...", etc.). There are places for the kernel to crap out after that, but I don't think there are any before. And why the hell am I seeing all the garbage, then a perfectly intelligible "FIXME" after that? What is happening to the strings to get them all fucked up?
March 29, 2005 NWR04B: Documentation, Take 2
I've started to post information on the Network Everywhere NWR04B recovered from the wiki here. Pretty rough at the moment, but I'm working on it. If you've contributed something to the wiki and would like your name in credits in the revived pages, please let me know.
March 28, 2005 NWR04B: Still trying to get Linux booting
I'm finally working again on the NWR04B. Right now my focus is trying to get a kernel booting, but I'll be satisfied with any kind of response from the damn thing. Right now, this is as far as I get: Verifying product code......PASS Boot Product Code!!! And there it sits until I power cycle the thing. Crap. I've got a pretty steep learning curve here. First off, I haven't worked with the ARM architecture before. Second, I haven't ported Linux (or anything) to another architecture before. (I'm not really porting stuff here -- the hard work was already done by the HRI and Codeman. But the experience would definitely help.) Third, I know very little about assembly; I've got a copy of a really good ARM assembly guide, but I'm just not used to thinking at such a low level. Fourth, I still have not disassembled the bootloader that comes with the existing, vendor-supplied firmware, so I really don't know what state everything's in when the kernel comes up. Fifth, I don't have a JTAG adapter on this thing. As a result, things are going slowly. I started by assuming this sequence of events:
1. The bootloader sets up the serial port, and decompresses application.bin.
2. Application.bin is copied to RAM.
3. The memory map is flipped. (This is in the datasheet. Before, flash memory starts at 0x00000000 and RAM starts at 0x20000000; afterward, it's the other way around.)
4. The CPU jumps to 0x0, and execution continues from there; this is the Linux kernel initialization and decompression routine.
By disassembling the compressed Linux kernel, I can see that it should work -- ie, there's no need to (say) jump to some random address within the kernel to start working. (It's good to confirm these things...) But the lack of any response at boot time, even with verbose kernel debugging messages turned on, is disheartening. I had a look at the uClinux file arch/armnommu/boot/compressed/head.S, and realized that it might be missing some definitions for putc; this is architecture-dependent, and everything's wrapped in #if 0. I tried putting in this:
```
#elif 1 /* my attempt at cx84200 serial debugging -- assuming that the address for mov is uart0*/
                .macro  loadsp, rb
                mov     \rb, #0x90000000
                .endm
                .macro  writeb, rb
                strb    \rb, [r3, #0]
                .endm
```
According to the datasheet, the byte at 0x90000000 is where "UARTDR, data read or written from the interface" goes. I'm assuming that means you put a byte there, then magic happens, then that byte is written to the UART. Still no response. I tried taking out the #if/#endif statements around debugging statements, to make it all as verbose as possible -- still nothing. However, with the judicious use of dd I've been able to cobble together a silly little "Hello, world!" in ARM assembly, and I'm able to get that to boot (well, print). This confirmed I had the basic sequence of events correct. What's more, I was able to insert this little bit into various places at the beginning of the kernel, and confirm how far along things were going. The answer is: not very. I've been following along in head.S, and I can see where the debugging information should be printed -- but it just doesn't. What's strange is that by mistake, I inserted helloworld at a non-four-byte boundary -- at byte 70, not 72 -- and then I got a response from a routine in head.S that prints out the first 256 bytes of the uncompressed kernel...and then nothing after that. So close!
February 28, 2005 NWR04B: Checksum solved!
I finally figured out the last bit (well, at least the last bit that varied significantly) in the checksum for the NWR04B firmware. I've updated the wiki and the checksum program. The program not only lets me duplicate the firmware I've already got (ie, it puts the bits back together so that they match the original), but lets me crash the router in new and interesting ways.

Just for fun, I tried making an image from the original hack's root filesystem. I was able to get the router to apply the upgrade, but (surprise!) nothing happened when it rebooted -- it verified the checksum then did nothing, and I had to upload an old firmware image by Ymodem over the serial cable. But hey! Progress!

February 19, 2005 NWR04B: Checksum closer for new firmware

Ha! In the Runtop firmware, there are the strings "Repotec" and "ip2014". Sure enough, a Google on the latter turns up lots of references to the IP2014 router from Repotec. This version of their firmware has the same structure as the Network Everywhere and Runtop firmware: bootloader + application.bin.gz. However, its bootloader is much more similar to the RT bootloader (the one I haven't figured out the checksum for yet). The length is the same, but the md5sum differs. A quick diff of the hexdump outputs turns up this:

```
diff ../original_runtop/bl.hd bl.hd
1,4c1,4
< 00000000  06 00 00 ea 02 00 00 00  03 00 00 00 03 13 00 02  |................|
< 00000010  02 00 00 00 5f 6c 0a 00  cd 33 6e 05 67 02 00 00  |...._l...3n.g...|
< 00000020  13 00 00 ea 02 00 00 00  03 00 00 00 03 13 00 02  |................|
< 00000030  02 00 00 00 3f 6c 0a 00  4b 30 6e 05 c2 01 00 00  |....?l..K0n.....|
---
> 00000000  06 00 00 ea 02 00 00 00  0a 00 00 00 02 12 00 1b  |................|
> 00000010  02 00 00 00 6c 6b 09 00  26 27 e7 04 55 02 00 00  |....lk..&'..U...|
> 00000020  13 00 00 ea 02 00 00 00  0a 00 00 00 02 12 00 1b  |................|
> 00000030  02 00 00 00 4c 6b 09 00  05 24 e7 04 11 02 00 00  |....Lk...$......|
```

...which means this is where the checksum must be!

February 16, 2005 NWR04B: Checksum for original firmware
Okay, so I think I've figured out the checksum for the original, available-from-ftp.networkeverywhere.com firmware (NWR04Bv1.02D1220.dlf).

First, the file has two parts: there's what I'm calling bootloader (probably a huge misnomer), and then there's a gzip archive file called application.bin.gz. splitgzip.pl will pull out the latter; simple math and dd will extract the former. The length of application.bin.gz is 743898 bytes; in hex, that's 0x000b59da. The sum of all the bytes in application.bin.gz is 0x05fc5b7c.

Both of these numbers can be found (allowing for little-endianness) at 12 bytes and 8 bytes from the end of bootloader, respectively:
```
00004ed0 02 00 00 00 da 59 0b 00 7c 5b fc 05 20 03 00 00 |.....Y..|[.. ...|
```

So this works for the NE firmware. However, loading this has caused problems before, so I'm reluctant to use it as a basis for uploading new firmware. And the pattern does not seem to hold for the Runtop firmware I've used to resuscitate the dead router; I still have to figure out how they're doing it.

Finally, even if I do figure out how to get the checksum working, will this let me boot Linux? Sure, I can upload a new filesystem, but how will I hand control to it? No idea. Still...fun puzzle!
February 12, 2005 NWR04B: Not so easy
The continuing saga of the NWR04b, um, continues.

As I mentioned, I was looking at using the rmem command on the NWR firmware to read out memory and maybe figure out the checksum code. I came up with a small expect script (well, grabbed rddmm.exp and butchered it 'til it did what I wanted) to do just that, but it seems to be a little buggy: after a while, the output freezes. If I fire up minicom, I get a whole crapload of memory output from the serial port -- the stuff the expect script had been reading all along. It continues until I reset the board, but after that the characters from the board are all messed up: you can see where the menu and prompts are, but every other character or so is wrong. If I exit minicom then start again, letting it reset the serial port in the process, everything's fine. This makes me think my expect script is maybe going too fast, or not grabbing the output fast enough, or something else that just messes up the state of the serial port temporarily.

I was going to work on it a bit and give it the option of starting at a particular offset (which would've taken a while, since I'm almost completely new to expect), but got distracted when I found the NWR's SEEKRIT MENU! At power-on, you see this prompt:
```
Got the 6HYNIX_16bits Flash ROM ADM5106
Boot:
```
Welp, turns out that if you hit the space bar three times right then, you get this menu:
```
Loader Menu ================
(a) Download POST ...
(b) Exit
Please enter your key :
```
Woohoo, a quick way to download ARMboot! Or so I hoped. (I did try UP LEFT UP LEFT RIGHT RIGHT DOWN RIGHT UP LEFT to see if that would run Linux automagically, but no.)

First, a cross-compiling toolchain was needed. I found this page, which had both a fully-compiled toolchain ready to download, or a script that would build everything for you and required lots of mysterious patches to be downloaded in advance. Since I'm more manly than smart, I went for the script. (Though obviously I'm not that manly, since I was depending on a script in the first place...) I ran into troubles with uClibc, though -- for some reason, the script would just refuse to build it. Eventually, I just gave up and downloaded the pre-compiled version.

Now, on to the actual compiling of ARMboot. Codeman, the original hacker, posted a bunch of files that included (I think) a modified version of ARMboot for the chip on the NWR. A quick make cx84200_config and make CROSS_COMPILE=/path/to/arm-uclinux-tools/bin/arm-uclinux- all worked, with a couple hiccups along the way. First off, I got this error:
```
cc1: error: invalid option `short-load-bytes'
```
A quick Google turned up this message from the CrossGCC project, saying that this option had been renamed alignment-traps. A bit of script-fu took care of that:
```
find . -type f -exec grep -l short-load-bytes {} \; | xargs perl -i.bak -pe's/-mshort-load-bytes/-malignment-traps/'
```
God, I love Unix.

I tried make again, and came up with this error:
```
flash.c: 181: error: label at end of compound statement
```
The code in question looked like this:
```
default:
    printf("Unknown Chip Type\n");
    goto Done;
    break;
}
/* Some stuff I'm leaving out... */
printf ("\n");
Done:
}
```
I moved the Done: label to before the last printf statement, and everything seemed to work fine: ARMboot compiled, and I had armboot.bin ready to go. Doubtless there's a better way of doing that, but this seemed to work well enough for now.

Now to try uploading:
```
Loader Menu
================
(a) Download POST ...
(b) Exit
Please enter your key : a
Downloading............PASS
Verifying file......file corrupt -- FAIL
```
Well, crap: I was able to upload it by Xmodem, as I suspected I could, but the loader still checksums the image, which means I'm busted again. I'm still not giving up, though. I'm hoping to figure out the checksum, and I found this page, which has a lot of pointers on how to do it. I'll try some of the things he talks about and see what more I can learn about the checksum.
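
Until I know what the loader actually computes, the cheap approach is brute force: calculate a few common checksums over the image (and over obvious sub-ranges, like everything but the last few bytes) and look for the results in the file's header or trailer. The sketch below covers an 8-bit byte sum, a 16-bit little-endian word sum and the standard CRC-32; whether the NWR04B loader uses any of them is pure guesswork on my part, and the function names are mine.

```c
#include <stddef.h>
#include <stdint.h>

/* Additive checksum: sum of all bytes, truncated to 8 bits. */
uint8_t sum8(const uint8_t *buf, size_t len)
{
    uint8_t s = 0;
    for (size_t i = 0; i < len; i++)
        s = (uint8_t)(s + buf[i]);
    return s;
}

/* Sum of 16-bit little-endian words; a trailing odd byte is added alone. */
uint16_t sum16_le(const uint8_t *buf, size_t len)
{
    uint16_t s = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        s = (uint16_t)(s + (buf[i] | (buf[i + 1] << 8)));
    if (i < len)
        s = (uint16_t)(s + buf[i]);
    return s;
}

/* Standard CRC-32 (reflected, polynomial 0xEDB88320), bit-at-a-time. */
uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

As a sanity check, the standard CRC-32 of the ASCII string 123456789 is 0xCBF43926.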
February 05, 2005 NWR04B: Back from the dead!
Welp, thanks to a suggestion from Mike and Varu, I managed to resuscitate the dead NWR04B router. It had gone silent and unresponsive -- no web server, no response to pings -- after I applied the firmware from the Network Everywhere FTP site. (Some upgrade!)

Today, I picked up some header pins at the closest thing to a local electronics store. After a bit of work -- getting the solder out of header pins is tricky -- I got them attached, and sure enough the serial port worked fine. It was stuck at the bootloader menu, with this message:
```
Verifying product code...FAIL
* WARNING *
Need to reprogram the flash.
```
That reminded me of the bit on this page on the Linksys WAP-11. Apparently, firmware for other products using the same hardware would work much better than the Linksys firmware. To prevent this sort of thing, the bootloader was changed to check for a product code, to make sure it wasn't another company's firmware. Almost makes me wonder if that's what happened with the NE firmware. Pretty huge screwup, though...

So I tried uploading the Runtop firmware to the router via Xmodem...and it worked! I got the usual command line back, and everything seemed fine. I didn't try the web pages yet, but I don't expect any surprises there. I've checked the Runtop firmware with splitgzip, and it has the same kind of embedded Gzip archive the NE firmware does. It'll be interesting to compare the rest of it.

I've also tried fooling around with the rmem (read memory) command, and I think this might be promising. You can run "rmem 0 400", and it'll print out 0x400 bytes of memory, nicely formatted, starting at address 0. 0x400 seems to be the biggest chunk it'll print, but you can increment the address and keep going. (Managed to crash it, too, by running "rmem 99900000 400"...the command line was completely unresponsive, and one of the LEDs on the front started flashing rapidly. Fortunately, the reset button set everything right.)

I'm thinking that this might be a way of reading out (what I hope will be) the bootloader code, and thus maybe getting the checksum code out of there somehow. I should be able to hack together an Expect script that'll cycle through the memory, capture the formatted output to a file, then turn that into a copy of the memory suitable for passing to a disassembler. And if that works, maybe we can look at overwriting flash with the wmem command...
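
The Expect side would just send one rmem command per 0x400-byte chunk and log the output; the fiddly part is turning the formatted dump back into raw bytes. I haven't pinned down the exact rmem output format yet, so this C sketch ASSUMES lines look like "00000400: de ad be ef ..." with at most 16 bytes per line (anything after that, like an ASCII column, is ignored). rmem_command and parse_dump_line are hypothetical helper names.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define BYTES_PER_LINE 16   /* ASSUMED number of bytes shown per rmem line */
#define CHUNK 0x400UL       /* largest chunk rmem seemed willing to print */

/* Build the command that dumps chunk number i, counting from base. */
void rmem_command(char *dst, size_t size, unsigned long base, unsigned long i)
{
    snprintf(dst, size, "rmem %lx %lx", base + i * CHUNK, CHUNK);
}

static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Parse one line of the ASSUMED form "ADDR: XX XX XX ...".
 * Returns the number of bytes stored in out, or -1 for a non-dump line. */
int parse_dump_line(const char *line, unsigned long *addr, unsigned char *out)
{
    char *end;
    int n = 0;

    *addr = strtoul(line, &end, 16);
    if (end == line || *end != ':')
        return -1;               /* prompt, blank line, etc. */
    end++;                       /* skip the colon */
    while (n < BYTES_PER_LINE) {
        while (*end == ' ' || *end == '\t')
            end++;
        int hi = hexval((unsigned char)end[0]);
        int lo = hi < 0 ? -1 : hexval((unsigned char)end[1]);
        /* A data byte is exactly two hex digits followed by whitespace or end. */
        if (hi < 0 || lo < 0 ||
            (end[2] != '\0' && !isspace((unsigned char)end[2])))
            break;
        out[n++] = (unsigned char)((hi << 4) | lo);
        end += 2;
    }
    return n;
}
```

Each parsed line can then be written into an image file at offset addr - base (fseek, then fwrite the n bytes), which should give something a disassembler can load.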
February 01, 2005 Network Everywhere NWR04B: Serial port working!
At last! Paul has turned out to be a great help: he successfully hooked up a serial port to his NWR04B today and was able to get a shell on there. And after getting a lot of help from a couple coworkers of mine (thanks, Jim and Wayne!), I was able to duplicate his success! The embarrassing part is that it turns out the main reason I wasn't seeing anything from the serial port is that I wasn't powering the damn chip. For some reason I figured that the 3232 (from the good folks at Sipex Heavy Manufacturing Concern) would draw power from the serial port, or the board itself, or, I don't know, the luminiferous ether that surrounds us all. Jim set me straight on that. Quick transcript:

```
Got the 6HYNIX_16bits Flash ROM
ADM5106 Boot: NetMall System Boot
Copyright 2002 ADMtek, Inc.
CPU: ADM5106 Home Gateway Processor
POST Version: 2.00.0176
Creation Date: 2003.07.10
Press <space> key three times to stop autoboot... 0
Verifying product code......PASS
Boot Product Code!!!
DHCPS:DHCP Server Started.
Enabled NAT mode
======================================================
Mars project:
Command Line Interface. 1.18.0001 v.2003.10.16
======================================================
cmd> update
Entered INIT state.
MAC failed to BOOT...
CardStop is called
Entered WAIT_OFFER state.
Timed out in WAIT_OFFER state.
```

Fascinating, isn't it? :-) So yeah, lots of updates about to hit the wiki page. Next step is maybe to try uploading Armboot, the way CodeMan did, or maybe go for the gusto and try uploading a Linux filesystem image. Of course, there's lots of stuff to be found out just by poking around in the command line, too...
January 30, 2005 Network Everywhere NWR04B: Still no serial port
I'm still having no luck getting a serial port going on this thing. I thought it might be because I was using a MAX 232 chip, instead of a MAX 3232 ("...and an extra 3 cubits for Linus, whose kernel this is...").

I also took the time to try to make a more permanent assembly by doing it up on a bit of perfboard -- so now I've got yellow wires (distinguishable connectors are for the weak!) poking out from perfboard instead of from breadboard. And still, nothing...not a goddamned peep, except for a weird y-plus-umlaut character that pops up every now and then in Minicom and that I'm blaming on either noise or acid flashbacks.

I'm at a loss here. As far as I can tell the connections are good (my three bits of electronics equipment are a soldering iron, a plastic box with many subdivisions, and a multimeter), and the circuit looks more or less like the circuit listed at the HRI site. That leaves connecting at the wrong place on the board, or maybe grounding. Not sure.

But hey! I got an offer to collaborate from pck; his electronic skills would be nice. And I'm going to shoot off an email to the guy who got it running in the first place to see if, a year later, he can help out.
January 25, 2005 Network Everywhere NWR04B: serial port || firmware info
I've put in a few hours tonight working on the Network Everywhere NWR04B, with mixed results. (The NWR04B is the 802.11b router I picked up for $18 on sale; I'm trying to duplicate this guy's luck getting Linux to work on the thing.)

I took the time tonight to get a slightly more permanent version of the RS232 adapter put together. Previously I've been putting stuff together on a breadboard, with wires all over the place; tonight I soldered things together and put wires all over the place. I tried to be careful, and all the connections seemed good, but I still had no luck: I saw absolutely nothing over the serial port at all, and from what I've read it should be pretty damned obvious. I'll have to ask some people at work about this.

One thing I'm still trying to figure out is how to treat all the different ground connections; I'm assuming that they all get connected together, and together with pin 5 on the DB9 connector, but I'm not sure. (If anyone's got any hints, please chip in.) That was about two hours tonight, and if that was it I'd chalk it up to experience and go to bed. But I did manage to find this page, which had a Perl script which extracts GZip archives from files. And guess what? It works on the NWR04B firmware! Woohoo!

It's embarrassing how simple this script is; I've been trying to figure out some way of doing exactly this, once I'd figured out that there was an archive in there. I want to understand how this works, but in the meantime it's exciting (hoo, what a life) to see all the stuff in there. strings | fmt | less shows tons of stuff going on: HTML, a reference to /dev/uart0, clitask (some kind of command-line interface, or just a dirty joke?), an XML UPNP description of the device...all sorts of information. And that's enough for now. I've got just enough energy to eat something, then go to bed.
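
I haven't dissected the Perl script yet, but the basic trick is presumably something like this: every gzip member starts with the bytes 0x1f 0x8b, then a compression-method byte that's 8 (deflate) in practice, then a flags byte whose top three bits are reserved and must be zero (RFC 1952). Scan the firmware for that pattern and you've got candidate offsets to carve out and decompress. A C sketch of just the scanning part (find_gzip_headers is my name for it, not anything from the script):

```c
#include <stddef.h>

/* Find offsets where a gzip member header could start:
 * ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate), FLG with reserved bits 5-7 clear.
 * Stores up to max offsets; returns the total number of candidates found. */
size_t find_gzip_headers(const unsigned char *buf, size_t len,
                         size_t *offsets, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i + 4 <= len; i++) {
        if (buf[i] == 0x1f && buf[i + 1] == 0x8b && buf[i + 2] == 0x08 &&
            (buf[i + 3] & 0xE0) == 0) {
            if (found < max)
                offsets[found] = i;
            found++;
        }
    }
    return found;
}
```

Not every hit will be real, since those bytes can turn up by chance, but a false positive just fails to decompress.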
December 29, 2004 Ports vs NWR04B
Got a bad feeling in the pit of my stomach this morning when I came back to work. I'd deliberately stayed away from the usual non-Slashdot news sources (Internet Storm Center, Bugtraq, Full Disclosure), so there was a lot of catching up to do. Let's see: eighty-four new remote holes in Windows -- always fun -- and it turns out the phpBB worm is no longer a phpBB worm but a PHP worm. Jesus Christ.

I checked the logs on my home server, and sure enough there were tons of the little bastards hitting me. (The server at work was completely clean.) It looked like there was nothing there, but I couldn't be sure without more time spent on it than a few minutes' grepping -- which meant leaving it 'til I got home tonight. (Update: looks like I was fine. I tried the URLs in the logs, and none of them tried to fetch anything. Dodged a bullet there.)

OpenBSD has the right idea when it chroots Apache, but there's also the matter of initiating connections out. And yes, I'm guilty of this: Thornhill + port 80 + tcp syn should be firewalled off, but was not. Changed now, of course. Still, it would be nice to have Thornhill not be locked down entirely. Why not let me initiate a connection out, but prevent Apache from doing the same?

This gets back to What's Wrong With Unix?, and I still say a good part of it is the lack of fine-grained permissions on both ports and files. (That, and my inability to type a good post when I'm in a hurry...God, that was incoherent.) The sheer idiocy of continuing to insist on root permissions to open a port under 1024 is just ridiculous. Why do we do this? In a world of Unix on the desktop, where anyone can get root, what does this mean anymore? Nothing at all: it's a totem, a fetish, and the Unix equivalent of knocking on wood for luck.
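
To see the restriction in action, all it takes is a bind() call. This sketch (try_bind_tcp is a made-up helper) returns 0 or the errno from socket()/bind(); on a traditional Unix setup an ordinary user asking for port 80 gets EACCES, while port 0, which lets the kernel pick an ephemeral port, works for anyone.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to bind a TCP socket to the given port on 127.0.0.1.
 * Returns 0 on success, otherwise the errno from socket() or bind(). */
int try_bind_tcp(unsigned short port)
{
    struct sockaddr_in addr;
    int err = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return errno;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                  /* 0 = kernel picks a port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        err = errno;   /* traditionally EACCES for ports < 1024 as non-root */
    close(fd);
    return err;
}
```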

Worse, by insisting that you need to be root to open port 80, you invite all sorts of security problems. Better hope you drop privileges effectively; better hope no one figures out a way to extract r00t from any lingering privileges; better hope you didn't make one single mistake, or you'll get 0wned. Serving web pages, answering DNS queries or answering QOTD requests (ports 80, 53 and 17, respectively) do not require root permissions. (This is quite a different question from whether or not J. Random User should be able to modify web pages, zone files, or the QOTD database.) qmail, Postfix and others have shown that delivering mail doesn't need root, either. (Other applications can be taken on a port-by-port basis; the full extent of my hand-waving is left as an exercise to the reader.)
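
"Drop privileges effectively" mostly comes down to making a few calls in the right order and checking every one of them. A minimal POSIX sketch, assuming the daemon starts as root, binds its port, then calls this with the uid/gid of an account like www (drop_privileges is a hypothetical helper name):

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Permanently give up root for the given unprivileged uid/gid.
 * Returns 0 on success, -1 if any step failed (caller should exit).
 * A real daemon would normally also clear supplementary groups with
 * setgroups() (a BSD/Linux extension, not POSIX) before calling this. */
int drop_privileges(uid_t uid, gid_t gid)
{
    if (setgid(gid) != 0)       /* group first, while we still can */
        return -1;
    if (setuid(uid) != 0)       /* as root, sets real, effective and saved uid */
        return -1;
    if (uid != 0 && setuid(0) == 0)
        return -1;              /* got root back: the drop didn't stick */
    return 0;
}
```

The setgid() has to come first: once setuid() succeeds, the process no longer has the privilege to change its group.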

So why is there no way to let UID www send a syn+ack, but not a syn? Or to let some range of UIDs do both? Why, Lord, can't I change ownership, groups and permissions on /proc/net/ipv4/tcp/port/80 so that UID www can open this port and nothing else? How long, O Lord, how long?

There is a patch I came across today that supposedly offers this sort of thing, but again: it SHOULD NOT be an option; it SHOULD NOT be a patch; it SHOULD be built-in and used, just like we use UIDs to restrict privileges now. (The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" are to be interpreted as described in RFC 2119.)

Ahem. In other news: At Staples today I picked up a Network Everywhere NWR04B 802.11b wireless router. --I'm sorry, "Network Everywhere"? Looks like Cisco/Linksys in disguise. But it was 18 Soviet Canuckistan pesos! Boxing Day special! How could I possibly resist? Better yet, it turns out that the damn thing can run Linux. It's got 8MB of RAM, 2MB of flash memory, and something like a 60MHz ARM CPU.

The folks over at the Hardware Recycling Initiative are working on getting this and other broadband router boards running Linux. Sweet! Now to figure out how the hell to get it to work on this thing...I can identify a soldering iron six times out of ten, but that's about it.