Debugging Bacula FileSet exclusions -- an example
20 Apr 2012
A user at $WORK was running a series of jobs on the cluster -- dozens at any moment. Other users have a 60 GB quota, but this user didn't (long story). His home directory is at 400 GB, but it was closer to a terabyte not so long ago... right when we had a hard drive and a tape drive fail at the same time on our backup server.
We do backups every night to tape using Bacula. Most backups are incremental (whatever changed since the last backup, usually the day before) and are small: maybe tens of GB per day. But this user's incrementals were close to the size of his entire home directory every day, because his jobs produced a proliferation of log files that were all being updated as each job progressed.
Ordinarily this wouldn't be a problem, but the cluster of hardware failures has really fucked things up; things are better now, but I'm still very slowly playing catch-up with backups. Eating a tape or more every day is not in my budget right this moment.
I asked him if any of the log files could be excluded from backups without any great loss. After talking it over with him, we came to this agreement:
- His home directory would be backed up (obvs)
- but within "projects/output", only files that contained "rep0" somewhere in the filename would be backed up.
This would exclude lots of other files like "1rep2.foo", "8rep9.log", etc, and would cut out about 200 GB of useless churn every day.
Bacula has the ability to do this sort of thing...but I found its methods somewhat counterintuitive, so I want to set down what I did and how I tested it.
First off, the original, let's-include-everything FileSet looked like this:
FileSet {
  Name = "example"
  Include {
    File = /home/example
    Options {
      signature = SHA1
    }
  }
  Exclude {
    File = /proc
    File = /tmp
    File = /.journal
    File = /.fsck
    File = /.zfs
  }
}
We back up everything under /home/example, we keep SHA1 signatures, and we exclude a handful of directories (most of which are boilerplate, applied to every FileSet by default).
In order to get Bacula to change the FileSet definition, you have to get the director to reload its configuration file. But some errors -- not all -- cause a running bacula-dir process to die. So before I started fiddling around, I added a Makefile to the /opt/bacula/etc directory that looked like this:
.PHONY: test reload

test:
	@/opt/bacula/sbin/bacula-dir -t && echo "bacula-dir.conf looks good" || { echo "problem with bacula-dir.conf"; exit 1; }

reload: test
	echo "reload" | /opt/bacula/sbin/bconsole
Whenever I made a change, I'd run "make reload", which would test the configuration first; if it failed, Bacula would not be reloaded. (An "@" at the start of a Makefile recipe line stops make from echoing the command itself before running it; the command's own output is still shown.)
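A note on the test recipe: in a line of the form "check && echo good || echo problem", the exit status is that of whichever echo ran last, so it's 0 even when the check fails, and make would carry on to the reload. Putting an "exit 1" in the failure branch is what actually stops it. Here's a small bash sketch of the difference; fake_check is a hypothetical stand-in for "bacula-dir -t" so that it runs without Bacula installed, and report_only and report_and_fail are names of my own.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "bacula-dir -t": returns the status it's given.
fake_check() { return "$1"; }

# The "&& echo || echo" pattern: the list's status is that of the last
# command run, and echo succeeds, so this returns 0 even if the check fails.
report_only() {
  fake_check "$1" && echo "bacula-dir.conf looks good" || echo "problem with bacula-dir.conf"
}

# Same messages, but the failure branch returns 1 so that a caller
# (such as make) actually sees the failure.
report_and_fail() {
  fake_check "$1" && echo "bacula-dir.conf looks good" \
    || { echo "problem with bacula-dir.conf"; return 1; }
}
```

For example, "report_only 1" prints the problem message and still returns 0, while "report_and_fail 1" prints the same message and returns 1.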
Next, I needed a listing of what we were backing up now, before I started fiddling with things:
echo "estimate job=fileserver-example listing" | bconsole > /tmp/listing-before
The "estimate" command gets Bacula to estimate how big the job is; the "listing" argument tells it to list the files it would back up. By default it gives you the info for a Full backup. (You can also specify a job level, so you can see how big a Differential or Incremental backup would be; I didn't need that here, but it's worth remembering for next time.)
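To grab listings at other levels, a bash helper along these lines would do. It only builds the bconsole command; estimate_cmd is a hypothetical name of my own, and it assumes the "level=" keyword accepted by bconsole's estimate command plus the job name used throughout this post.

```shell
#!/usr/bin/env bash
# Print a bconsole "estimate ... listing" command for a job and level.
# The level defaults to Full, which is also estimate's own default.
estimate_cmd() {
  local job="$1" level="${2:-Full}"
  printf 'estimate job=%s level=%s listing\n' "$job" "$level"
}

# Usage (needs a working bconsole, so not run here):
#   for level in Full Differential Incremental; do
#     estimate_cmd fileserver-example "$level" | bconsole > "/tmp/listing-$level"
#   done
```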
After that, I made another Makefile that looked like this:
.PHONY: test estimate shouldwork shouldfail

test: estimate shouldwork shouldfail

estimate:
	@echo "estimate job=fileserver-example listing" | bconsole > /tmp/listing-after ; wc -l /tmp/listing*

shouldwork: estimate
	grep rep0 /tmp/listing-before | grep projects/output | while read -r i ; do grep -qF -- "$$i" /tmp/listing-after || exit 1 ; done

shouldfail: estimate
	grep rep2 /tmp/listing-before | grep projects/output | while read -r i ; do if grep -qF -- "$$i" /tmp/listing-after ; then exit 1 ; fi ; done
This is a little hackish, so in detail:
The estimate target gets an updated listing of what Bacula will back up; the line count lets me eyeball how it compares to the old, all-inclusive listing. Both check targets depend on it, so the listing is always refreshed first.
The shouldwork target gives me a quick way to make sure that all the files with "rep0" in the name and "projects/output" in the path are still in that updated listing. We grep for each of these files in the new listing (quoted, and with -F so that dots in filenames aren't treated as regex wildcards); the loop either finishes cleanly or exits with status 1, which make will catch and report as an error.
The shouldfail target is the mirror image: it makes sure that files with "rep2" in the name are excluded from the new listing, short-circuiting the loop with "exit 1" as soon as one turns up. The "if" is what makes this test a "MUST NOT". A bare "grep -q ... && exit 1" as the last command in the loop body leaves the loop with a nonzero status whenever the final grep finds nothing, which is exactly the case we want to pass. Tacking "; true" onto the end of the line doesn't fix that, either: GNU make, by default, runs each recipe line with sh -c and takes the status of the last command, so the line would succeed even when the loop had found a file that should have been excluded. With "if", the loop body succeeds when grep finds nothing, and the only way out with a failure is the "exit 1".
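If the one-liners get unwieldy, the same MUST and MUST NOT checks could be pulled out into bash functions in a small script that the Makefile calls. all_present and none_present are hypothetical names; both read candidate lines on stdin and look for each one in the listing file as a fixed string, skipping blank lines (an empty fixed string would match everything).

```shell
#!/usr/bin/env bash
# MUST: every line read from stdin appears somewhere in the listing file.
# Returns 1 at the first missing line, 0 otherwise.
all_present() {
  local listing="$1" line
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    grep -qF -- "$line" "$listing" || return 1
  done
  return 0
}

# MUST NOT: no line read from stdin appears in the listing file.
# Returns 1 at the first line that is found, 0 otherwise.
none_present() {
  local listing="$1" line
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    if grep -qF -- "$line" "$listing"; then
      return 1
    fi
  done
  return 0
}
```

Used the same way as the Makefile targets, e.g. "grep rep2 /tmp/listing-before | grep projects/output | none_present /tmp/listing-after".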
Anyhow: after each change, I'd run "make reload" as root to make sure that the syntax worked. After that, I'd run "make test" as an ordinary user (no need for root privileges) to make sure that I was on the right track. After a while, I got this:
FileSet {
  Name = "example"
  Include {
    File = /home/example
    Options {
      signature = SHA1
      Wilddir = /home/example/projects/output
      Exclude = yes
    }
  }
  Include {
    File = /home/example/projects/output
    Options {
      WildFile = "*rep0*"
      Signature = SHA1
    }
    Options {
      Exclude = yes
      RegexFile = ".*"
    }
  }
  Exclude {
    File = /proc
    File = /tmp
    File = /.journal
    File = /.fsck
    File = /.zfs
  }
}
Again, this is a little counterintuitive to me, so here's how it works out.
The first "Include" stanza is the same, except that in the "Options" section we're excluding "/home/example/projects/output". That's what the "Wilddir" and "Exclude = yes" directives are for.
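The second "Include" stanza puts "/home/example/projects/output" back in, but modified with two "Options" sections: the first matches files with "rep0" in the name (a simple fileglob) and the second excludes everything. Bacula uses the first "Options" section that matches a given file, so the order matters: "1rep0.foo" is picked up by the first section and backed up, while "8rep9.log" falls through to the second and is excluded. So what this stanza ends up including is only the files with "rep0" in their names under "/home/example/projects/output".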
Last, the third stanza is our standard "Exclude" boilerplate.
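As I understand it, Bacula tries the Options sections in a stanza in order and uses the first one whose pattern matches a file, which is why the catch-all exclude has to come after the "*rep0*" include. The bash sketch below is a toy model of that rule for this particular FileSet, not Bacula's actual matching code: it only looks at file basenames, stands in a shell glob for the RegexFile pattern, and ignores directories and whatever Bacula does when nothing matches. classify and rules are hypothetical names.

```shell
#!/usr/bin/env bash
# Toy model (not Bacula's code) of first-match-wins Options evaluation
# for the FileSet above. Only basenames are matched, and the RegexFile
# ".*" catch-all is written as the shell glob "*".

output_dir=/home/example/projects/output

# Options sections of the second Include stanza, in order, as "glob action":
#   WildFile = "*rep0*"              -> include
#   Exclude = yes, RegexFile = ".*"  -> exclude
rules=(
  '*rep0* include'
  '* exclude'
)

classify() {
  local path="$1" name rule pattern action
  case "$path" in
    "$output_dir"/*) ;;                       # second stanza: rules below
    /home/example/*) echo include; return ;;  # first stanza, outside output/
    *) echo not-in-fileset; return ;;
  esac
  name="${path##*/}"
  for rule in "${rules[@]}"; do
    pattern="${rule% *}"   # text before the last space
    action="${rule##* }"   # text after the last space
    # $pattern is unquoted on purpose so its glob characters take effect.
    if [[ $name == $pattern ]]; then
      echo "$action"
      return
    fi
  done
  echo no-match  # not reached here: the last rule matches every name
}
```

In this model, swapping the two entries in rules makes every file under output/ come back as "exclude", which mirrors why the catch-all has to be the second Options section.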
After I was confident that I had the right set of files excluded, I sent the user a list of files to confirm that all was well:
cat /tmp/listing-before | while read -r i ; do grep -qF -- "$i" /tmp/listing-after || echo "$i" ; done > /tmp/excluded
Now, I'm the first to admit that that is ugly: diff would have done the job, it's a useless use of cat... lots of objections to raise. But it's been a long day and I got what I wanted. I pointed the user at it, made sure it was okay, and committed the changes.
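For the record, a less ugly version might look something like this in bash, using sort and comm instead of one grep per line. excluded_lines is a hypothetical name. It compares whole lines exactly and prints them in sorted order, so it assumes a file's entry is identical in both listings.

```shell
#!/usr/bin/env bash
# Lines present in the "before" listing but missing from the "after" one.
# comm needs sorted input; -2 drops lines only in the second file and -3
# drops lines common to both, leaving lines unique to the first.
# LC_ALL=C keeps sort and comm agreeing on the collation order.
excluded_lines() {
  local before="$1" after="$2"
  LC_ALL=C comm -23 <(LC_ALL=C sort "$before") <(LC_ALL=C sort "$after")
}

# Usage, with the file names from this post:
#   excluded_lines /tmp/listing-before /tmp/listing-after > /tmp/excluded
```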
All in all, this gave me a good loop for testing: it caught fatal errors before they happened, it let me be sure I was excluding the right things, and I was able to work in a stepwise fashion to get where I wanted.