Why I'm starting to hate Bacula
11 Apr 2012
This is an attempt to lay out my problems with Bacula, and to be explicit about what I hope to achieve by replacing it (if, in fact, I do go ahead with that). If I'm wrong, correct me.
Too many long jobs monopolize spool space, storage job slots, and generally hold up production.
My largest jobs right now are around 1-2 TB -- and in order to accomplish that, I need to manually split up filesystems using a messy syntax. A job running that long will cycle through
- spool for hours
- despool for hours
many, many times. During spooling, it ties up one of the storage daemon's job slots. During despooling, no other job can despool to that tape drive. Often, this ends up holding up a lot of other jobs. If there's a problem, I'm faced with a choice between killing a job that's been running for days, or letting lots of other stuff go without backups until/unless it finishes.
More generally, I'm faced with a choice between letting everything run forever at the beginning of the month (because it's simplest to schedule fulls for the first Saturday or some such), or juggling schedules manually to stagger things (which I'm doing now, and leads to schedules like FullBackupSecondSundayAfterLent).
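To make the "juggling schedules manually" part concrete, here is a rough sketch of how the staggering could be computed instead of done by hand. It is not anything Bacula provides: the job names, sizes, and slot labels are made up for illustration, and it just hands the largest jobs out first to whichever slot currently has the least data assigned.

```python
import heapq

def stagger_fulls(job_sizes_tb, slots):
    """Spread full backups over schedule slots so each slot carries a similar load.

    job_sizes_tb: dict of job name -> estimated full size in TB (hypothetical data).
    slots: list of distinct schedule slot labels, e.g. one per weekend.
    Greedy heuristic: biggest job first, always into the least-loaded slot.
    """
    # Heap of (load so far, slot index); the least-loaded slot pops first.
    heap = [(0.0, i) for i in range(len(slots))]
    heapq.heapify(heap)
    plan = {name: [] for name in slots}
    # Sort by size descending, then by name so ties are deterministic.
    for job, size in sorted(job_sizes_tb.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        plan[slots[idx]].append(job)
        heapq.heappush(heap, (load + size, idx))
    return plan

if __name__ == "__main__":
    # Hypothetical jobs and slot names, for illustration only.
    jobs = {"home": 2.0, "scratch": 1.5, "mail": 1.0, "www": 0.5}
    for slot, assigned in stagger_fulls(jobs, ["first-sat", "second-sat"]).items():
        print(slot, assigned)
```

The output would still have to be turned into Bacula schedules by hand or by a template; the sketch only does the balancing.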
Possible fixes:
- Disk -> Disk -> Tape
- More breaking up of jobs
- More/faster spool space
- Tweaking max jobs
- Smart ZFS snapshot scripts (so I can restart a job by pointing it at the snapshot; may be yet more baroqueness)
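For the ZFS snapshot idea, a minimal sketch of what such a script might work out before a job runs. The dataset, mountpoint, job name, and the "bacula-" naming scheme are all assumptions of mine; the only things taken as given are the standard `zfs snapshot` and `zfs destroy` commands and the `.zfs/snapshot` directory ZFS exposes under a mountpoint. The function only builds the commands and the path; it does not run anything.

```python
import datetime

def snapshot_plan(dataset, mountpoint, job, when):
    """Work out the snapshot name, the commands to run, and the path to back up.

    The snapshot name depends only on the job and the date, so a restarted
    job on the same day can be pointed at the very same snapshot.
    """
    snap = f"bacula-{job}-{when:%Y%m%d}"  # hypothetical naming scheme
    return {
        # Run before the job starts.
        "create": ["zfs", "snapshot", f"{dataset}@{snap}"],
        # Point the backup job here instead of the live filesystem.
        "backup_path": f"{mountpoint.rstrip('/')}/.zfs/snapshot/{snap}",
        # Run once the job has finished successfully.
        "destroy": ["zfs", "destroy", f"{dataset}@{snap}"],
    }

if __name__ == "__main__":
    # Hypothetical dataset and job name.
    plan = snapshot_plan("tank/home", "/home", "home-full", datetime.date(2012, 4, 11))
    print(" ".join(plan["create"]))
    print(plan["backup_path"])
```

Something still has to call this from the job's before/after hooks and rewrite the file list, which is where the extra baroqueness would come in.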
Bacula seems to get confused easily about what tapes are available for use.
Bacula's storage daemon seems to often hold on to outdated info about what tapes are in what state.
Example: the daily pool is full, so jobs are halted. "status storage" shows it's waiting for a volume to be created for the daily pool. I move a volume from another pool, then have to attempt to mount it manually in the appropriate drive -- the storage daemon doesn't pick up on this change automatically.
Sometimes this works, and sometimes it doesn't. Sometimes both drives are waiting for a tape from the same pool; creating one doesn't let the jobs queued up on the other drive run on that new tape. Instead, you need to create a second new tape and mount it. On top of that, sometimes the jobs hang around on the storage daemon still waiting for a new tape -- or something -- because they don't get out of the way and let other jobs run in their place unless they're cancelled (and sometimes only when bacula-sd is restarted).
This may be fixed with the upgrade to 5.2.6. However....
The new version of Bacula crashes when I run too many jobs at once.
That's 5.2.6, which I upgraded to from 5.0.2 (time got away on me, yes). And by too many I mean, like, 50. That's not too many! I'm not sure what the hell's going on, though at least now I have a backtrace. I'm seriously pissed off about this point. Yes, I'll file a bug, but this is annoying.
All in all, I spend far too much time babysitting Bacula.
It's extremely high maintenance, and that's pissing me off. Understand, this is coming after a long weekend spent babysitting it, trying to make sure some jobs got written. There are other problems at work, yes, but this is not meant to be so hard.