The Wine of Boot
04 Feb 2005, 20:32:18
Came across a couple of interesting problems this week. The first was getting our bran' spankin' new dual G5 Power Mac running Gentoo Linux to boot without a monitor. It turned out to be surprisingly difficult. Of course, I found this out just before I was going to move it into its permanent location; it simply hadn't occurred to me to try it beforehand and make sure it wasn't an issue. (Glad I tried this before moving it, though; it's got to be at least 800kg.)
As near as I could tell (no serial port), the thing simply would not boot with the monitor detached. It'd power on, give the little boot chime, and then...nothing. It wouldn't respond to pings, it wouldn't use the monitor once I plugged it back in, and it would go into airplane mode after a few minutes (fans full blast...man, this thing has got some serious cooling), which suggested it wasn't even getting into Linux.
What was truly strange was that it seemed to be sending out multiple DHCP requests, asking for a new address every few minutes. It never responded on those addresses, nor did any other traffic come from them. I asked the Mac guy at work about it. "It shouldn't do that," he said. Well, okay, good to confirm that, but what do we do now? He didn't know either. We knew about the Xserve machines, so we figured headless booting shouldn't be a problem, but neither of us could figure out what the next step would be.
There turned out to be a great deal of silence on the issue; neither one of us could find anything remotely like this in Google. Oh sure, there were the Old World Macs where you had to cross-connect a couple of pins on the video cable (or buy a dongle that'd do it for you) because the things depended on a monitor (ugh!), but nothing about New World machines like this. I found out about nvsetenv, the command that shows you the environment variables that are set in the Parameter RAM for Open Firmware, and hey! this looks interesting:
skip-netboot: false
Well, okay then! Set that to true, that'll keep it from trying to do DHCP, right? Skip right on over to the Linux kernel, and that's that! Save, reboot, and -- no. Doing the same goddamn thing. Back to square one.
Finally, I came across a mailing list message from one of the Yaboot developers that gave me a clue. I started looking at ofboot.b, the file used by yaboot to (hope I'm getting this right) feed Forth commands to the Apple Open Firmware in order to get things to boot. And then the Mac guy at work pointed out this ofboot.b file, from a page on Yellow Dog Linux's site about headless booting. Woohoo, off to the races!
...Yeah, right. Have you ever worked with Forth? I certainly haven't. I scratched my head, looked at the two versions of ofboot.b, then decided to sit down at the Open Firmware prompt (splat-alt-o-f) and do a bit of experimenting. It wasn't bad, actually; once I figured out "hello world" and simple addition, I was starting to understand what I'd seen.
I narrowed it down to two problem areas in Gentoo's version. The first was right up at the top:
: .printf fb8-write drop ;
This was a subroutine called .printf being defined as two steps (I think): fb8-write and drop. Compare and contrast this with the usual way of printing stuff in Forth ("type"). I tried modifying this to read:
: .printf type ;
Gave it a try...and no, same problem.
Well, what about that dump-stdout bit from YDL? I tried that from the OF prompt and nothing: it complained that it was an unknown keyword. Okay, they're probably defining it somewhere else. Rather than bother looking it up, I moved on to the next bit, also close to the beginning of the file:
" screen" output
When I did this at the OF prompt, I got a seizure-inducing flicker followed by a prompt at a clear screen. Certainly doesn't seem good...so I tried removing that line, and success! I was able to boot without a monitor!
It's still strange to me that I haven't seen this mentioned anywhere. I guess it's possible no one's tried to boot one of these headless before, but I suspect that all my searches for "headless g5" were just too swamped in speculation about the Mac mini. 'Tany rate, I'll file a bug with Gentoo about this, and maybe send something off to the Yaboot folks as well.
The other problem was with Wine; we use it for a couple of Windows command-line tools as part of our compile. We had a guy whose instance of wine couldn't find a file in his PWD:
wine -- "c:foo" bar.c
Error: can't find file bar.c
And of course, bar.c was right fucking there, just waiting to be turned into an object file. So WTF, right? WINEPREFIX: yep, set correctly. Other tools able to find the file: yep, no problem there. Move the file to /tmp and try there -- aha, works!
Now I knew what was going on. We've had recurring problems with amd on FreeBSD (and lo, this was most assuredly a FreeBSD workstation): there is some kind of symlink caching going on (and amd is all about the symlinks, baby) in the FreeBSD kernel that amd finds itself unable to cope with. We'd upgraded amd to a later version on our workstations and found that our problems went 'way down -- only it looked like I'd missed one. I set it up, told him to reboot, and patted myself on the back for a job well done. Only it didn't change anything: wine was still unable to find the damn file. Well, fuck.
In desperation I tried moving his directory (~jdoe/cvs) to another place (~jdoe/test), on the assumption that a different inode or something would convince amd to just find the damn file. Yeah, I know -- but it worked. I did some more dances, embarrassing to relate, and gradually convinced myself that whatever amd's failings, they were not relevant to this problem. Nothing else for it; time to pull out the big guns:
wine --debugmsg +all -- "c:foo" bar.c
Holy crap, does that ever produce a lot of debugging info: piped to a file, it was around 600KB. A quick look showed that most of it was boring Wine/Windows initialization stuff, but I was able to narrow it down by grepping for "bar.c".
And this is where I found the interesting bit:
DOSFS_FindUnixName 'bar.c' not found in '/home/jdoe/CVS
In order to find a file, Wine needs to know what Windows drive it's on. The usual practice is to set up a drive for your home directory, and that's what we'd done with F:. But why was it looking in the wrong directory? This guy doesn't have anything named CVS -- except that he does, and this was confusing Wine. It was running on Unix, where names are case-sensitive and the directory called "CVS" comes before "cvs", but it was simulating Windows, so it was case-insensitive (it couldn't tell the difference between F:CVS and F:cvs). Not only that, but it stopped at the first failure, gave up on the F: drive, moved on to the other drives, and then threw an error when it couldn't find the file. No wonder it all worked when I renamed the directory!
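To make that failure mode concrete, here's a toy model in Python. It is not Wine's DOSFS code; it only assumes what the paragraph above describes: names are matched case-insensitively, directory entries are scanned in sorted byte order (so "CVS" comes before "cvs"), and the first match wins with no backtracking if a later path component is missing. The in-memory tree and the resolve_first_match helper are hypothetical, made up for illustration.

```python
# Toy model of a case-insensitive, first-match path lookup on a
# case-sensitive tree. Directories are dicts; files are None.
# This is NOT Wine's DOSFS code, just an illustration of the bug.

def resolve_first_match(tree, dos_path):
    """Map a DOS-style relative path (backslash separators) onto the tree.

    At each level, take the first entry (in sorted byte order) whose name
    matches case-insensitively. If a later component is missing, give up
    instead of backtracking to try the other matching entries.
    """
    node = tree
    parts = []
    for component in dos_path.split("\\"):
        if not isinstance(node, dict):
            return None  # tried to descend into a file
        match = next(
            (name for name in sorted(node) if name.lower() == component.lower()),
            None,
        )
        if match is None:
            return None  # no backtracking: the whole lookup fails here
        parts.append(match)
        node = node[match]
    return "/".join(parts)


# Hypothetical home directory holding both "CVS" and "cvs"; only "cvs"
# contains bar.c. "CVS" sorts first because "C" (0x43) < "c" (0x63).
home = {
    "CVS": {"Entries": None, "Root": None},
    "cvs": {"bar.c": None},
}
print(resolve_first_match(home, "cvs\\bar.c"))  # None: descended into CVS, gave up

# Move CVS out of the way, so only one case-insensitive match is left.
home["somewhere_else"] = home.pop("CVS")
print(resolve_first_match(home, "cvs\\bar.c"))  # cvs/bar.c
```

With two directories that differ only in case, the first-match rule always picks the wrong one here; once only one of them is left, there's nothing to get confused about, which is why renaming the directory made the problem go away.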
It took me a day and a half to figure this out (I wish that were a lie), but I got to look like a minor hero when I showed him the solution:
mv CVS somewhere_else
With any luck, this little tale will save someone else a day and a half of their life.
And holy crap: here's a link to a guy who's hacked the freaking BIOS in his IBM ThinkPad T41. Disturbingly, it wouldn't let him use the wireless card he'd bought for it -- he'd gone aftermarket, rather than official IBM -- and it kept saying "Please remove unauthorized card". Given IBM's geek-friendly rep, I'm surprised something like this hasn't gotten more play. Maybe I'll submit this to Slashdot and see what happens.
3 Comments
From: hanz
06-February-2005-14:03:13
So I was wondering if you could expound on some details, as I'm going through some headaches with a headless Xserve (only it's a G4). Anyway, like you, my system will work fine with the video card in there, but as soon as I take it out, it'll just sit there and do nothing. I've tried what you were talking about and futzed with the Open Firmware, but all I could do is make it seem like it would work, and it didn't. Anyway, I just want to confirm this was a case where you physically removed the card from the system, not a case where you just unplugged the monitor. Also, when skip-netboot? is false, it will pick up the DHCP-assigned address and I can ping it (very annoying), but when it's true (off) there's no ping, though I can see the two hard drive lights blinking back and forth as I think it is trying to find a Mac bootable device. Anyway, any ideas on the matter would be helpful, otherwise I'll just keep hitting the internet.
Thanks for the hope.
-hanz
From: Saint Aardvark
06-February-2005-20:00:53
Sorry -- this was for a G5 with a video card...not technically headless, I guess. My problem was trying to boot the G5 with the video card still installed, but no monitor attached. Sorry I can't be more help.
From: the life of a sysadmin » FYT
10-September-2005-06:58:53
[...] I mentioned two servers, though, so what about the second? Aha, that’s the symlink caching thing. We get around this by running a newer version of amd than is supplied w/FreeBSD; it doesn’t have quite so many problems. But I’d missed the second server, and it didn’t have the pointer to the newer version of amd. Again, my fault — I should’ve caught this a long time ago — but dangit, it shouldn’t be necessary to do this just to restart amd. (I’m setting up cfengine to catch this sort of thing. cfengine rox.) [...]