This is going to be a long story, but I hope it'll be instructive. Bear with me.
Back at my last job, we had a Samba server, running on FreeBSD, acting as a Primary Domain Controller for around 35 W2K machines. The same machine also acted as NIS master for a similar number of FreeBSD machines. It also did printing, mail, DNS, and half a dozen other things. This machine was getting old; its CPU usage was often pegged by a large print job, it was running out of disk space, and I was beginning to be worried about the inevitable day of death. I began planning for the upgrade: a new machine, faster and bigger hard drives, more memory and gigabit ethernet for the day we all moved to GigE. Oh, and rack-mounted...definitely rack-mounted.
I took the opportunity to upgrade much of the software on the machine, including Samba. I decided to move from 2.2 to the 3.0 series; the speed differences seemed pretty impressive. I also wanted to get as many of the big upgrades done at once as possible: the prospect of going through the upgrade repeatedly did not appeal.
Of all the upgrades I was doing, Samba made me the most nervous. I read through the excellent (and Free) Samba HOWTO and made notes: how to move to the tdbsam password database, changes in configuration options, and so on. I had the new server for a while, so I was able to run through many tests: getting a Windows machine to log on, DNS queries, and so on.
Finally, the big day came. I went in on a Saturday and made the move. Most of the rest of the day was spent testing, chasing down the inevitable mistakes, and testing some more. I tested by logging into machines after they'd joined the domain, and making sure that everyone could still log into their workstations. All told, things went pretty damned well, and I congratulated myself on a job well done.
Later, though, a few things began to crop up that I couldn't explain. I could no longer add new domain accounts to SSH under Cygwin. A shared printer wasn't being shared any longer. In fact, shares weren't working at all. I banged my head against this for a while, but since the problems were pretty erratic they tended to fall by the wayside in favour of explaining, one more time, why the words "spare computer" were self-contradictory.
Finally, though, I put some more time into it. And it's a little hairy, especially for this Unix guy, so bear with me.
(Incidentally, I couldn't have figured out half of this without the help of Clarence Lee, a co-op student working with us. Sure, he uses IIS, but he firewalls it with OpenBSD and he got an internship at Microsoft. He's a good guy.)
The shared printer: could not figure out what was going on here. Guy who had it could print to it, no problem. Used to work for everyone, no problem. Now it wouldn't work. Broke the problem down to the point where I was using smbclient on FreeBSD, or net view on W2K, to try and list the shares, and that didn't work. Not any of them -- not IPC$ or anything. I was fairly sure this wasn't supposed to be happening.
There was a machine in limbo (not the same as spare, thenk yew!) while a co-op student became permanent. I got it using the other networked printer, and tried sharing it. Again, command-line utilities would simply not list the shares. What's more, when I tried getting other people to log into the machine (I was fairly irritated at this point, and not at my most rational), they couldn't log in. WTF? I could log in, and there had been no complaints from the person whose machine it had been.
In a moment of irritation, I got the test machine to rejoin the domain...and suddenly, everything was working: I could list shares on it, other people could list shares on it, people could log in, and everything. Yay! It's so simple! Rejoin the domain! Everything will be great!
Ha! It is to laugh. Profiles were not coming in when people logged in. My Documents was empty, they got that stupid, evil, vile "Let's take a tour of Windows! And let me help you set up your network! DO IT!" popup window. I couldn't figure it out.
Clarence and I banged our heads against it some more, and finally came to a conclusion.
When you migrate Samba, you're meant to take the old SID with you using net(8) GETLOCALSID and SETLOCALSID. The SID is meant to be a world-unique string/number that identifies a domain, or an account -- think something like the DN in LDAP, or NIS domainname + UID in Unix. (A user's SID has a part that belongs to the domain, and another, smaller part that is unique to that user.) I didn't do that -- screwup -- and so the Samba server had generated a new SID. As far as Windows is concerned, the identity of your domain is solely determined by the SID; the name is there just for your convenience. (Insert snide remark here about how magic invisible numbers have no business being that important.)
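To make that two-part structure concrete, here is a minimal Python sketch that splits an account SID into its domain part and its per-user part (usually called the relative ID, or RID). The SID in it is made up, and split_sid is a hypothetical helper for illustration, not anything that ships with Samba.

```python
def split_sid(sid: str) -> tuple[str, int]:
    """Split an account SID into (domain SID, RID).

    Assumes the usual textual form "S-1-5-21-<a>-<b>-<c>-<rid>",
    where the last number identifies the account within the domain.
    """
    parts = sid.split("-")
    if len(parts) < 4 or parts[0] != "S":
        raise ValueError(f"not a SID: {sid!r}")
    domain_sid = "-".join(parts[:-1])
    rid = int(parts[-1])  # raises ValueError if the last field is not a number
    return domain_sid, rid


if __name__ == "__main__":
    # Made-up SID, for illustration only.
    domain, rid = split_sid("S-1-5-21-1111111111-2222222222-3333333333-1001")
    print("domain part:", domain)
    print("user part:  ", rid)
```

Two accounts in the same domain share everything but the last field, which is why a freshly generated domain SID orphans every account at once.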
As a result, the machines that were present at the migration didn't know where their Primary Domain Controller (PDC -- the machine officially in charge of the domain) had gone, and were running on cached credentials, profiles and so on. (This is the same thing that allows you to log into a Windows laptop that belongs to a domain, even when you've taken it home and aren't able to reach your PDC any more.) Printing and shared resources from the Samba server continued to run because of open permissions or credentials (i.e., user name and password) that don't depend on SIDs.
This also explained why I could log into the machines without problems: because, as sysadmin, I'd logged into all of them before to do maintenance. My credentials were cached, so the machines were able to authenticate me w/o consulting with their (now missing) PDC. And of course, everyone was able to log into their own workstations for the same reason.
So: machine rejoins the domain and people can log in, because now the machine can find its PDC and verify their passwords. But profiles aren't showing up because the profile's NTUSER.DAT -- the user's hive, loaded into the registry at HKEY_CURRENT_USER when they log in -- belonged to/was marked with/was owned by the account's old SID, and Windows refused to load it and lots of stuff broke or was missing.
After some more searching, I finally figured out the way around this.
First, you need to use the profiles(1) tool in Samba to change the SID on NTUSER.DAT, which'll be wherever Samba keeps profiles. You should check each user's SID in Samba by using pdbedit(8), though odds are the user ID/group ID part will have remained the same.
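Since only the domain part changes, working out each user's new SID is mechanical. Here's a rough Python sketch of that bookkeeping, under some loudly stated assumptions: the SIDs and the profile path are made up, remap_sid and profiles_command are hypothetical helpers, and the -c (SID to change) and -n (new SID) options are what I understand the Samba 3 profiles tool to take, so check profiles(1) for your version before running anything. The sketch only builds the command; it doesn't run it.

```python
def remap_sid(user_sid: str, old_domain_sid: str, new_domain_sid: str) -> str:
    """Return the user's SID under the new domain, keeping the same RID."""
    prefix = old_domain_sid + "-"
    if not user_sid.startswith(prefix):
        raise ValueError(f"{user_sid!r} is not in domain {old_domain_sid!r}")
    rid = user_sid[len(prefix):]
    if not rid.isdigit():
        raise ValueError(f"unexpected RID {rid!r} in {user_sid!r}")
    return f"{new_domain_sid}-{rid}"


def profiles_command(old_sid: str, new_sid: str, ntuser_path: str) -> list[str]:
    """Build (but do not run) a profiles(1) invocation.

    ASSUMPTION: -c names the SID to change and -n the replacement;
    verify against the man page for your Samba version.
    """
    return ["profiles", "-c", old_sid, "-n", new_sid, ntuser_path]


if __name__ == "__main__":
    # All values below are made up for illustration.
    old_domain = "S-1-5-21-1111111111-2222222222-3333333333"
    new_domain = "S-1-5-21-4444444444-5555555555-6666666666"
    old_user = old_domain + "-1001"
    new_user = remap_sid(old_user, old_domain, new_domain)
    cmd = profiles_command(old_user, new_user, "/path/to/profiles/jdoe/NTUSER.DAT")
    print(" ".join(cmd))
```

Work on a copy of NTUSER.DAT first; if the SIDs from pdbedit(8) don't line up the way this assumes, you want to find out before touching the original.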
Second, you need to take care of the profile. There are a few ways of doing this. The easiest way is to copy the modified NTUSER.DAT to their profile directory, then log into the machine as Administrator and join the new domain, then get the user to log in. Their profile will be copied over, just as if they'd logged into a machine for the first time. However, this can cause problems with certain programs that haven't been informed about the change.
To illustrate: if the domain is named EXAMPLE, and the user account is jdoe, then their profile will usually be at C:\Documents and Settings\jdoe (let's just call that D&S\jdoe for short). However, D&S\jdoe will belong, after joining the new domain, to an old account that's no longer around, which means that Windows will put their profile somewhere else -- probably something like D&S\jdoe.EXAMPLE. Odds are, though, that the old path will still be in the registry or other files, which means a lot of cycles of "Why-did-that-break-let-me-fix-it". Another option is simply to move D&S\jdoe out of the way, so that paths can remain the same. Finally, you can also change ownership recursively to the new account once you've joined the domain; this will take a while, but it's probably quicker than copying the profile over wholesale if they've got a lot of files. If you do this, it's best to remove the machine's copy of their NTUSER.DAT file; it'll just be copied over from the server.
This took a lot of work, of course, and usually there were things like Outlook.pst to screw things up further. But after much work, I finally got everyone moved over to the new domain, and things were good again.
Lessons learned:
(Note: this was actually written back in May.)
Top Tip: Filenames with a tilde in them can confuse Samba.
Case in point: last week a user was having problems loading his profile: W2K kept choking and saying that the file Local Data\Applications\foo\backup\~AvariciousMonkeys.c was in use. Naturally, lsof on the Samba server turned up nothing, and I couldn't see any obvious problem. On a hunch, I tried renaming the file to AvariciousMonkeys.c~, and hey presto! goodness all over.
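If you want to go looking for these before they bite, a small Python sketch like the following will list files under a profile tree whose names start with a tilde, and suggest a name with the tilde moved to the end, as in the rename above. The function names are hypothetical, and the whole thing rests on my hunch that the leading tilde was the culprit; it only reports and renames nothing.

```python
import os
import sys


def find_tilde_files(root: str) -> list[str]:
    """Return paths under root whose file name starts with a tilde."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("~"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def suggested_name(path: str) -> str:
    """Suggest a replacement path with the leading tilde moved to the end."""
    directory, name = os.path.split(path)
    return os.path.join(directory, name.lstrip("~") + "~")


if __name__ == "__main__":
    # Pass the profile directory as the first argument; defaults to ".".
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_tilde_files(root):
        print(f"{path} -> {suggested_name(path)}")
```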
This week I'm trying to get FAI going in earnest. I've worked on it before, but now I've got three developers who want to switch to Linux. The last thing I want is another series of one-offs, so I'm taking the time to do it right. Now there's a CD version in beta, and so far it's working well. Cf. the usual way of doing it, which is to do PXE booting and grab everything off the network. I'm not opposed to that, but one of the things I wanted out of FAI before was the ability to do CD-based, kickstart-like Debian installs; looks like it's finally going to work.
Looks like we're having a problem with a Maxtor PCI IDE controller and the Intel mobo in our backup server. It's been mysteriously crashing in the middle of the night w/no log messages. Some checking in the BIOS turned up another problem: going to the hardware monitoring page to look at the CPU temperature made the damn thing freeze. WTF? Sure seems like the symptom we were seeing, and backups running at night make big use of the Vinum array that uses drives attached to the IDE adapter...long story short, taking out the card stopped the BIOS freezing. It remains to be seen if it'll work for the random midnight freezes, but it's good to have something to try. I'm hopeful that FreeBSD will be able to handle SATA drives attached to this thing...we'll have to see.
Which brings me to the next bit: fleshing out plans for server upgrades. As I mentioned, last week we had a power supply fail on our Very Important Server, and I want to try and keep that from happening again. Of course, adding umpty thousand dollars worth of hardware to your budget four months before the end of fiscal doesn't really work too well, so as much as possible I need to do this w/o new hardware. Ha! But I'll give it a try.
First off is setting up OpenLDAP and importing Samba's information into it. That'll be neat, since I've never worked w/LDAP before. Second is to set up some BDCs using OpenLDAP to query the master. (Or do they just suck over the whole database? Hm. Either way.) Third is to set up some Linux machines. Why? Two reasons:
LinuxHA and DRBD seem fantastic, and there just doesn't seem to be anything comparable on the FreeBSD side. As for the hardware...well, my first impression of server hardware from IBM, HP and the like (no, don't talk to me about Dell) is that I'm going to need a newer version of FreeBSD than we currently use in order to run SATA drives. (I know SCSI is the way to go, but I was quoted two thousand dollars for two IBM 73GB 15k drives! I know: 15k, IBM, etc, but even halving that means two -- two! -- 73GB drives for a thousand bucks, a/o/t two 200GB drives for, what, four hundred. Heh.)
We're using an older version of the 4-series FreeBSD here. I've already set up one server using a newer 4-series release, and it's a pain: too many differences, one more thing to keep in mind when making changes, and so on. I haven't worked with the 5-series yet, and I don't want to start now...not entirely sure that it'd work for us. Plus, we'll probably migrate to Linux anyway, so I don't mind doing it for a server.
Anyhow! Get a Real Server and throw Linux on it. Hook it up to our drive array and start migrating home directories to ReiserFS from UFS/FreeBSD. Not trivial, but doable. Add more Linux servers as budget allows.