NWR04B: Further up the abstraction ladder

Well, I'm getting further along.

First off, I've managed to get the kernel mounting its root directory from my desktop machine. The trick to this was turning off the initrd option in the kernel config; if you don't, it doesn't matter what options you put in the kernel command line -- it'll try to read the ramdisk and then fail because it's not in JFFS2 format (though I'm sure that error could be got around somehow; I'm just not bothering right now 'cos NFS is more flexible).

So now this kernel command line works:

root=/dev/nfs nfsroot=192.168.23.254:/home/aardvark/nwr04b/nfsroot ip=192.168.23.12:192.168.23.254:::test:eth0:off
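
(As an aside, here is a minimal sketch of how that ip= string breaks down. It assumes the field order given in the kernel's nfsroot documentation: client, server, gateway, netmask, hostname, device, autoconf. The function and field names are made up for illustration.)

```python
# Field order as described in the kernel's nfsroot documentation.
IP_FIELDS = ["client_ip", "server_ip", "gateway_ip", "netmask",
             "hostname", "device", "autoconf"]

def parse_ip_param(value):
    """Split an ip= kernel parameter into named fields.

    An empty field means the kernel is left to work the value out itself.
    """
    parts = value.split(":")
    # Pad short strings so every field name gets a value.
    parts += [""] * (len(IP_FIELDS) - len(parts))
    return dict(zip(IP_FIELDS, parts))

fields = parse_ip_param("192.168.23.12:192.168.23.254:::test:eth0:off")
for name in IP_FIELDS:
    print(f"{name:10} = {fields[name] or '(empty)'}")
```

The gateway and netmask fields come out empty, which fits the "Guessing netmask" line in the boot log further down.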

...and I can ping the thing, which is good. Now I just need to populate it, which means just compiling busybox. Easy, right?

Ha! Another big-ass set of problems is what it is. First, I tried a copy of Busybox I had lying around that I think I'd compiled as part of a previous toolchain attempt. Yeah, I know -- "Let's throw in random binaries and they'll work!" -- but I figured it was worth a try. file seemed hopeful:

ELF 32-bit LSB executable, ARM, version 1 (ARM), for GNU/Linux 2.0.0, statically linked

but when I tried it I got this error:

BINFMT_FLAT: Bad magic/rev (0x1010161, need 0x4)

God bless Free software; here's the comment from fs/binfmt_flat.c:

because a lot of people do not manage to produce good flat
binaries, we leave this printk to help them realise the problem.
We only print the error if it's not a script file

Flat binary? Wha? And then came this FAQ from the excellent uCdot:

What causes 'BINFMT_FLAT: bad magic/rev (0xZZ, need 0xYY)' errors?

A lot of people encounter this error the first time they try to run a
program on a uClinux system. Usually this is caused by trying to run an
ELF or COFF executable rather than a "flat" executable. uClinux does not
support anything but the "flat" executable format.  ELF/COFF programs
are converted to "flat" format using elf2flt/coff2flt respectively.

To fix this problem with the ELF toolchain add -Wl,-elf2flt to the final
link line of your build and it will create a flat executable.
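
With that in hand, the odd-looking 0x1010161 in the error can be explained. The sketch below assumes the flat header starts with four magic bytes ("bFLT") followed by a big-endian 32-bit revision, which is my reading of flat.h. The ELF bytes are reconstructed from the file output above (32-bit, LSB, version 1, ARM OS/ABI byte 0x61), not dumped from the real binary, and the function name is invented for illustration.

```python
import struct

def read_magic_and_rev(header_bytes):
    """Read the first 8 bytes the way a flat loader would:
    four magic bytes, then a big-endian 32-bit revision."""
    magic, rev = struct.unpack(">4sI", header_bytes[:8])
    return magic, rev

# Start of a 32-bit little-endian ELF v1 file with OS/ABI byte 0x61 (assumed
# from the `file` output, not read from the actual busybox binary).
elf_start = b"\x7fELF" + bytes([0x01, 0x01, 0x01, 0x61])
# Start of a version 4 flat binary.
flat_start = b"bFLT" + (4).to_bytes(4, "big")

for label, data in (("ELF", elf_start), ("flat", flat_start)):
    magic, rev = read_magic_and_rev(data)
    print(f"{label}: magic={magic!r} rev={rev:#x}")
```

Under those assumptions, the ELF identification bytes read as a revision of 0x1010161, and a real flat binary gives the 0x4 the kernel was asking for.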

And why is the flat format needed at all? Well, because this CPU has no MMU. With an MMU, the kernel can lie to every process and tell it that it's loaded at the same fixed address, which is what an ordinary ELF executable expects; without one (and I'm fuzzy on the details here), the binary has to cope with being loaded at whatever address happens to be free. Thus the special arguments to the compiler and linker, and the invocation of elf2flt afterward. So: to compile busybox I had to change the CFLAGS argument in make menuconfig to -D__PIC__ -fpic -msingle-pic-base, then run:

make dep
LDFLAGS=-Wl,-elf2flt make

I still got this error:

arm-elf-strip --remove-section=.note --remove-section=.comment busybox
arm-elf-strip: busybox: File format not recognized
make: *** [busybox] Error 1

but the strip command is the very last one in compiling the binary, and file busybox gave "busybox: BFLT executable - version 4 gotpic". I copied it into place, booted and got:

IP-Config: Guessing netmask 255.255.255.0
IP-Config: Complete:
      device=eth0, addr=192.168.23.12, mask=255.255.255.0, gw=255.255.255.255,
     host=test, domain=, nis-domain=(none),
     bootserver=192.168.23.254, rootserver=192.168.23.254, rootpath=
Looking up port of RPC 100003/2 on 192.168.23.254
Looking up port of RPC 100005/1 on 192.168.23.254
VFS: Mounted root (nfs filesystem).
Freeing init memory: 52K
Unhandled fault: external abort on linefetch (F4) at 0x00000001
fault-common.c(97): start_code=0x700040, start_stack=0x67ffbc)
[1] sh: bad data abort: code 33554432 instr 0x32005500
Code: 495f5fdf 0e97f38e (c9a303b1) 3ea3a3ad b294aa7c
fault-common.c(97): start_code=0x700040, start_stack=0x67ffbc)
Internal error: unknown data abort code: 32005500
CPU: 0
pc : [<0000ffff>]    lr : [<0000ffff>]    Not tainted
sp : 0000ffff  ip : 0000ffff  fp : 0000ffff
r10: 0004d04c  r9 : 00050e40  r8 : 001fa000
r7 : 00000000  r6 : 0000005b  r5 : 00169884  r4 : 0019a904
r3 : 001722c0  r2 : ffffffff  r1 : 20000010  r0 : 20010016
Flags: nzcv  IRQs off  FIQs off  Mode SYS_32  Segment kernel
Control: 0
Process sh (pid: 1, stackpage=001f9000)
Stack:
Backtrace: frame pointer underflow
Function entered at [<b1c9a2f3>] from [<8e0e97f3>]
Unhandled fault: alignment exception (93) at 0x00000001
fault-common.c(97): start_code=0x700040, start_stack=0x67ffbc)
Internal error: Oops: 0
CPU: 0
pc : [<0012cccc>]    lr : [<00058850>]    Not tainted
sp : 001f9e94  ip : 001f9e40  fp : 001f9ec8
r10: 00640004  r9 : 00000000  r8 : 00000010
r7 : 00000000  r6 : b1c9a2f3  r5 : 9c63fdbd  r4 : 0000ffff
r3 : 0014b478  r2 : 00000001  r1 : 00000001  r0 : 0000ffef
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  Segment kernel
Control: 0
Process sh (pid: 1, stackpage=001f9000)
Stack:
001f9e80:          00058850 0012cccc 60000093  ffffffff 0000ffff 001f8000 00000001
001f9ea0: 001f9fd4 0067ffc8 00053a08 001f8000  001f9fd4 0000ffff 32005500 001f9ee0
001f9ec0: 001f9ecc 00053b00 00053984 001f8000  001f9fd4 001f9ef0 001f9ee4 00053b50
001f9ee0: 00053a60 001f9f94 001f9ef4 00054080  00053b44 32005500 00000004 00000000
001f9f00: 00030001 0000ffff 00050cc8 001f9f2c  001f9f1c 000567a8 000551b4 00000000
001f9f20: 001f9f7c 001f9f30 000576f8 0005675c  0068a000 001f9f98 00000000 001f9f98
001f9f40: 0067ff78 20000013 000480a0 001f8000  0068a000 0014b1f8 00148000 0014a040
001f9f60: 00170a40 001f9f94 00055710 00000000  00000000 0067ffd0 32005500 00000000
001f9f80: 0067ffd0 00000001 00000000 001f9f98  00054ed8 00053ff8 00000009 00000000
001f9fa0: 00000009 0004d03c 00000000 00000000  0067ffd0 00000001 0067ffc8 00000000
001f9fc0: 00640004 00000000 00000000 0067ff88  0004d020 20010016 20000010 ffffffff
001f9fe0: 001722c0 0019a904 00169884 0000005b  00000000 001fa000 00050e40 0004d04c
Backtrace:
Function entered at [<00053974>] from [<00053b00>]
 r7 = 32005500  r6 = 0000FFFF  r5 = 001F9FD4  r4 = 001F8000
Function entered at [<00053a50>] from [<00053b50>]
 r5 = 001F9FD4  r4 = 001F8000
Function entered at [<00053b34>] from [<00054080>]
Function entered at [<00053fe8>] from [<00054ed8>]
 r7 = 00000001  r6 = 0067FFD0  r5 = 00000000  r4 = 32005500
Code: ebfcae37 e2440010 (e5961004) e1a03521 e59f20cc
Kernel panic: Attempted to kill init!

The punchline is that that's the best result I've got in a lot of experimentation I'm not writing down here. The one common thread, once I got the binary format figured out, is this:

Unhandled fault: external abort on linefetch (F4) at 0x00000001
fault-common.c(97): start_code=0x700040, start_stack=0x67ffbc)

The 0x00000001 is the same throughout. I tried this suggestion and ran flthdr -s 65535 busybox to increase the stack size from 0x1000 to 0xffff -- same result. Then I came across this message, which says that F4 is the value from the CPU's fault register, and that I need to work out what it means to find out what's actually wrong. However, I've also come across (and lost the link to) another post which suggested it was a problem with a particular version of uClibc. So that means paying proper attention to the toolchain, which I'd skipped over earlier. I'm currently trying to get the HRI toolchain going, so we'll see how that turns out.
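
For reference, here is a rough sketch of what changing the stack size amounts to at the header level. It assumes the 64-byte, big-endian flat header layout as I understand it from flat.h (magic, rev, entry, data_start, data_end, bss_end, stack_size, reloc_start, reloc_count, flags, build_date, then five filler words). It is not taken from flthdr's source, the helper names are hypothetical, and every header value except rev=4 and the 0x1000 stack is made up.

```python
import struct

# Assumed layout: 4 magic bytes, 10 named 32-bit words, 5 filler words = 64 bytes.
FLAT_HDR = struct.Struct(">4s10I5I")
FIELDS = ("rev", "entry", "data_start", "data_end", "bss_end",
          "stack_size", "reloc_start", "reloc_count", "flags", "build_date")

def parse_flat_header(raw):
    """Unpack the header bytes into a dict of named fields."""
    unpacked = FLAT_HDR.unpack(raw[:FLAT_HDR.size])
    header = dict(zip(FIELDS, unpacked[1:11]))
    header["magic"] = unpacked[0]
    return header

def with_stack_size(raw, new_size):
    """Return a copy of the header bytes with only stack_size changed."""
    values = list(FLAT_HDR.unpack(raw[:FLAT_HDR.size]))
    values[1 + FIELDS.index("stack_size")] = new_size
    return FLAT_HDR.pack(*values)

# Synthetic header for illustration only.
original = FLAT_HDR.pack(b"bFLT", 4, 0x40, 0x1000, 0x2000, 0x3000,
                         0x1000, 0x2000, 0, 0, 0, *[0] * 5)
patched = with_stack_size(original, 65535)
print(hex(parse_flat_header(original)["stack_size"]), "->",
      hex(parse_flat_header(patched)["stack_size"]))
```

Nothing else in the header moves, which is consistent with the fault staying exactly the same after the change.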