I'm trying to get Bacula to make a separate copy of monthly full
backups that can be kept off-site. To do this, I'm experimenting with
its "Copy" directive. I was hoping to get a complete set of tapes
ready to keep off-site before I left on vacation, but it was taking much longer
than anticipated (2 days to copy 2 tapes). So I cancelled the jobs,
typed "unmount" at bconsole, and went home thinking Bacula would just
grab the right tape from the autochanger when backups came.
What I should have typed was "release". "release" lets Bacula grab
whatever tape it needs. "unmount" leaves Bacula unwilling to do
anything on its own, and it waits for the operator (i.e., me) to do
something.
Result: 3 weeks of no backups. Welcome back, chump.
There are a number of things I can do to make sure this doesn't happen
again. There's a thread on the Bacula-users mailing list (came up in
my absence, even) detailing how to make sure something's mounted. I
can use "release" the way Kern intended. I can set up a separate
check that goes to my cell phone directly, and not through Nagios. I
can run a small backup job manually on Fridays just to make sure it's
going to work. And on it goes.
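For the "separate check" idea, here's a rough sketch of what I have in mind, not something I've run in production. It pipes "status storage=NAME" into bconsole and looks for lines mentioning a blocked device. The "BLOCKED" marker is my assumption about what the storage daemon prints when a drive is waiting on the operator, so verify it against your own output first. The function names are made up for illustration, and the storage name is whatever yours is called.

```python
import subprocess

# ASSUMPTION: the storage status output contains this substring when a drive
# is waiting on the operator (for example after a manual unmount).
# Check what your Bacula version actually prints and adjust.
BLOCKED_MARKER = "BLOCKED"


def blocked_lines(status_text, marker=BLOCKED_MARKER):
    """Return the stripped lines of status_text that contain marker."""
    return [line.strip() for line in status_text.splitlines() if marker in line]


def storage_status(storage_name, bconsole="bconsole"):
    """Send 'status storage=<name>' to bconsole on stdin and return its output."""
    result = subprocess.run(
        [bconsole],
        input=f"status storage={storage_name}\nquit\n",
        capture_output=True,
        text=True,
        timeout=60,
        check=True,
    )
    return result.stdout


def check_storage(storage_name):
    """Hypothetical entry point: return 2 if any drive looks blocked, else 0."""
    problems = blocked_lines(storage_status(storage_name))
    for line in problems:
        print(f"{storage_name}: {line}")
    return 2 if problems else 0
```

Run check_storage() from cron with your storage name and have a non-zero result trigger mail or an SMS gateway; the point is that the alert takes a different path than Nagios does.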
I knew enough not to make changes as root on Friday before going on vacation. But now I know that includes backups.
Thursday: Go to The Other University to do some prep for the move coming up next week. Check in with their computer store (where you pretty much have to buy things) to see how the order on the console server is going. The guy behind the counter looks up the order, frowns, and tells me that it seems their supplier does not have one in any of their three Canadian warehouses. Okay, so how long will it take to get one in? He looks at me earnestly and says that, sometimes, they never come in. I ask at what point I can count on the supplier a) giving up and b) informing me of that fact. He frowns again, and suggests that I check back in a couple weeks (four weeks after I've placed the order) just to be safe.
Friday: Get email from the contractor/university liaison for the new building to say that network and electrical connections will not be ready in time because the requests were received so very late. While The Other Guy was supposed to get them in long ago, I should've been on top of this.
Monday, a stat in Canada: Go to the old building to do a serverectomy on a soon-to-be-formerly shared rack. The Other Guy mentions that the new server room has water on the floor. I go over to look, and it's a rapidly evaporating puddle, irregular in shape and maybe two metres across at its widest. I can't figure out where it's coming from. Turns out there's some other stuff that should become formerly shared as well, so I spend time poring over Sun Enterprise 1 workstations (which I like) and old inkjet cartridges for printers that may no longer be around (which I don't like). Ask The Other Guy, who's been involved with the move a lot longer than I have, what electrical connections he's requested for himself and for me (long story) in the new building. He says that he gave them the model number of the Sun rack he's got (which has built-in, and very nice, PDUs) and asked them to figure out what he needs.
Tuesday: Moving day. As expected, network and electrical are not present; we've got 2 x 15A 120V circuits. Also, the puddle is back, and we can see that it's coming from a small leak in the concrete roof. I move my rack into another room; The Other Guy spreads a blanket over his rack. The liaison promises us that the contractors are on the job to fix the roof. The network connections (two fiber, two Cat5) get terminated, so I call the local network folks to get that taken care of. The university wireless network is not present in the new building.
Wednesday: The contractors show up to start fixing the leak. The network connections have been set up. The contractors have put in a big tube of plastic sheeting, taped to the roof at one end and a 40-gallon recycling barrel at the other. The Other Guy decides things are good enough and starts setting up his rack; I elect to hold off another day.
Thursday: The contractors say the roof is fixed, so I move the rack in and start hooking things up. The new OpenBSD firewall comes up nicely -- thank you, pf developers -- as does the main Sun server. Next up are the SunRays in the lab, only they're not. I take my laptop in and try to verify connectivity. I can't. The Other Guy suggests that the VLANs on my new switch are the problem and recommends just simplifying things. I do and keep testing. Traffic from the laptop's RFC 1918 address just never makes it to the server. In a fit of desperation I try using an address in our routable subnet, and it works. This takes me until 8pm to figure out. I email various bosses explaining how far I've got, and the campus network folks to ask if they're filtering this subnet in some way. (This isn't completely out of the question; this place has a reputation for a pretty locked-down network.)
Friday: I buttonhole the guy at the campus network office and ask him about this. He considers this and realizes that while he's forgotten to unblock DHCP (told you it was pretty locked down), the other behaviour I'm seeing can be explained if I've somehow got my interfaces crossed. I'm doubtful but give it a try, which is a good thing because suddenly everything works. I don't understand it or what I did wrong, but assume that I was simply too tired the previous night and thank him profusely for taking the time to talk to me. I am now where I should have been twenty hours before. Mighty battles emerge with Sun's DHCP and Sunray servers. In the end, I have to delete the Sunray configuration, delete all DHCP configurations, and then add the Sunray configuration back. This works, which annoys me; why are there all these opaque configurations around? Not a single plain-text file in sight. I manage to get a printer working, then another. DHCP is modified so that laptops work as well. I call it a night and head home.