title: THE WINE OF BOOT
date: 2005-02-04 20:32:18

Came across a couple of interesting problems this week. The first was getting our bran' spankin' new dual G5 Power Mac running Gentoo Linux to boot without a monitor. It turned out to be surprisingly difficult. Of course, I found this out just before I was going to move it into its permanent location; it simply hadn't occurred to me to try beforehand and make sure it wasn't an issue. (Glad I tried this before moving it, though; it's got to be at least 800kg.)

As near as I could tell (no serial port), the thing simply would not boot with the monitor detached. It'd power on, give the little boot chime, and then...nothing. It wouldn't respond to pings, it wouldn't use the monitor once I plugged it back in, and it would go into airplane mode after a few minutes (fans full blast...man, this thing has got some serious cooling), which suggested it wasn't even getting into Linux.

What was truly strange was that it seemed to be sending out multiple DHCP requests, asking for a new address every few minutes. It never responded on those addresses, nor did any other traffic come from them. I asked the Mac guy at work about it. "It shouldn't do that," he said. Well, okay, good to confirm that, but what do we do now? Neither of us knew: we knew of the Xserve machines, and figured headless booting shouldn't be a problem, but couldn't figure out what the next step would be.

There turned out to be a great deal of silence on the issue; neither one of us could find anything remotely like this in Google. Oh sure, there were the Old World Macs where you had to cross-connect a couple of pins on the video cable (or buy a dongle that'd do it for you) because the things depended on a monitor (ugh!), but nothing about New World machines like this. I found out about nvsetenv, the command that shows you the environment variables that are set in the Parameter RAM for Open Firmware, and hey! this looks interesting:

skip-netboot: false

Well, okay then! Set that to true, that'll keep it from trying to do DHCP, right? Skip right on over to the Linux kernel, and that's that! Save, reboot, and -- no. Doing the same goddamn thing. Back to square one.

Finally, I came across a mailing list message from one of the Yaboot developers that gave me a clue. I started looking at ofboot.b, the file used by yaboot to (hope I'm getting this right) feed Forth commands to the Apple Open Firmware in order to get things to boot. And then the Mac guy at work pointed out this ofboot.b file, from a page on Yellow Dog Linux's site about headless booting. Woohoo, off to the races!

...Yeah, right. Have you ever worked with Forth? I certainly haven't. I scratched my head, looked at the two versions of ofboot.b, then decided to sit down at the Open Firmware prompt (splat-alt-o-f) and do a bit of experimenting. It wasn't bad, actually; once I figured out "hello world" and simple addition, I was starting to understand what I'd seen.

I narrowed it down to two problem areas in Gentoo's version. The first was right up at the top:

: .printf fb8-write drop ;

This defines a word (Forth's term for a subroutine) called .printf as two steps (I think): fb8-write and drop. Compare and contrast this with the usual way of printing stuff in Forth ("type"). I tried modifying this to read:

: .printf type ;

Gave it a try...and no, same problem.

Well, what about that dump-stdout bit from YDL? I tried that from the OF prompt and nothing: it complained that it was an unknown keyword. Okay, they're probably defining it somewhere else. Rather than bother looking it up, I moved on to the next bit, also close to the beginning of the file:

" screen" output

When I did this at the OF prompt, I got a seizure-inducing flicker followed by a prompt at a clear screen. Certainly doesn't seem good...so I tried removing that line, and success! I was able to boot without a monitor!

It's still strange to me that I haven't seen this mentioned anywhere. I guess it's possible no one's tried to boot one of these headless before, but I suspect that all my searches for "headless g5" were just too swamped in speculation about the Mac mini. 'Tany rate, I'll submit a bug to Gentoo about this, and maybe send off something to the Yaboot folks as well.

The other problem was with Wine; we use it for a couple of Windows command-line tools as part of our compile. We had a guy whose instance of Wine couldn't find a file in his PWD:

wine -- "c:foo" bar.c
Error: can't find file bar.c

And of course, bar.c was right fucking there, just waiting to be turned into an object file. So WTF, right? WINEPREFIX: yep, set correctly. Other tools able to find the file: yep, no problem there. Move the file to /tmp and try there -- aha, works!

Now I knew what was going on. We've had recurring problems with amd on FreeBSD (and lo, this was most assuredly a FreeBSD workstation): there is some kind of symlink caching going on (and amd is all about the symlinks, baby) in the FreeBSD kernel that amd finds itself unable to cope with. We'd upgraded amd to a later version on our workstations and found that our problems went 'way down -- only it looked like I'd missed one. I upgraded amd on his machine, told him to reboot, and patted myself on the back for a job well done. Only it didn't change anything: Wine was still unable to find the damn file. Well, fuck.

In desperation I tried moving his directory (~jdoe/cvs) to another place (~jdoe/test), on the assumption that a different inode or something would convince amd to just find the damn file. Yeah, I know -- but it worked. I did some more dances, embarrassing to relate, and gradually convinced myself that whatever amd's failings, they were not relevant to this problem. Nothing else for it; time to pull out the big guns:

wine --debugmsg +all -- "c:foo" bar.c

Holy crap, does that ever produce a lot of debugging info: piped to a file, it was around 600KB. A quick look showed that most of it was boring Wine/Windows initialization stuff, but I was able to narrow it down by grepping for "bar.c".

And this is where I found the interesting bit:

DOSFS_FindUnixName 'bar.c' not found in '/home/jdoe/CVS'
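
For anyone facing a similar wall of debug output, here's a minimal Python sketch of pulling out just the failed lookups for one file. The pattern is modeled only on the single message quoted above, so the exact Wine format is an assumption; the function name and the sample lines are made up for illustration.

```python
import re

# Modeled on the one message quoted in this post; the real format may
# differ, so the closing quote is treated as optional (assumption).
NOT_FOUND = re.compile(
    r"DOSFS_FindUnixName\s+'(?P<name>[^'\n]*)' not found in '(?P<dir>[^'\n]*)'?"
)


def failed_lookups(lines, filename):
    """Return (name, directory) pairs for failed lookups of `filename`."""
    hits = []
    for line in lines:
        m = NOT_FOUND.search(line)
        # Compare case-insensitively, since Wine is simulating Windows.
        if m and m.group("name").lower() == filename.lower():
            hits.append((m.group("name"), m.group("dir")))
    return hits


# Hypothetical excerpt; only the middle line follows the quoted format.
sample_log = [
    "some unrelated initialization noise",
    "DOSFS_FindUnixName 'bar.c' not found in '/home/jdoe/CVS'",
    "more noise that happens to mention bar.c",
]

for name, directory in failed_lookups(sample_log, "bar.c"):
    print(f"{name} was looked for in {directory}")
```

The point of printing the directory alongside the file name is that the directory is where the surprise was hiding.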

In order to find a file, Wine needs to know what Windows drive it's on. The usual practice is to set up a drive for your home directory, and that's what we'd done with F:. But what was it doing looking in the wrong directory? This guy doesn't have anything named CVS -- except that he does, and this was confusing Wine: it was running in Unix and was case-sensitive (starting at the directory called "CVS", which comes before "cvs"), but it was simulating Windows and so it was case-insensitive (it couldn't tell the difference between F:CVS and F:cvs). Not only that, but it was stopping at the first failure, giving up on the F: drive, then moving on to other drives, then throwing an error when it couldn't find the file. No wonder it would all work if I renamed the directory!
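
To make the failure mode concrete, here's a toy model in Python. It is not Wine's code, just a sketch of the behaviour described above: a case-insensitive walk that commits to the first matching directory and never backtracks. The function name and the nested-dict "filesystem" are hypothetical, and sorted order stands in for whatever order the directory entries were actually read in.

```python
def resolve_first_match(tree, parts):
    """Case-insensitive walk that takes the first match at each level.

    `tree` is a nested dict (directories) with strings as files.
    Returns the file contents, or None if the walk dead-ends.
    """
    node = tree
    for part in parts:
        if not isinstance(node, dict):
            return None
        # sorted() puts uppercase ASCII first, so "CVS" precedes "cvs".
        match = next(
            (name for name in sorted(node) if name.lower() == part.lower()),
            None,
        )
        if match is None:
            return None  # gives up here; no second try with "cvs"
        node = node[match]
    return node


# Hypothetical home directory with both CVS and cvs in it.
home = {"CVS": {}, "cvs": {"bar.c": "source"}}
print(resolve_first_match(home, ["cvs", "bar.c"]))     # walks into CVS, fails

# Same layout after moving CVS out of the way.
renamed = {"somewhere_else": {}, "cvs": {"bar.c": "source"}}
print(resolve_first_match(renamed, ["cvs", "bar.c"]))  # finds the file
```

A lookup that tried every case-insensitive match before giving up would have found the file in both layouts.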

It took me a day and a half to figure this out (I wish that were a lie), but I got to look like a minor hero when I showed him the solution:

mv CVS somewhere_else

With any luck, this little tale will save someone else a day and a half of their life.
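
If you'd rather check for this before losing the day and a half, a few lines of Python will list names that differ only by case. This is a rough sketch, not anything Wine provides; the function name is made up, and it only looks at one directory rather than recursing.

```python
import os
from collections import defaultdict


def case_collisions(names):
    """Group names that are identical once case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only groups with more than one spelling, in a stable order.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


if __name__ == "__main__":
    # Check the current directory; run it from the directory mapped to a Wine drive.
    for group in case_collisions(os.listdir(".")):
        print(" / ".join(group))
```

Taking a list of names rather than a path keeps the check usable on output from find or ls as well.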

And holy crap: here's a link to a guy who's hacked the freaking BIOS in his IBM ThinkPad T41. Disturbingly, it would not let him use the wireless card he'd bought for it -- he'd gone aftermarket, rather than official IBM -- and it kept saying "Please remove unauthorized card". Given IBM's geek-friendly rep, I'm surprised something like this hasn't got more play. Maybe I'll submit this to Slashdot and see what happens.