Come from
16 Mar 2010I've been off on vacation for a week while my parents have been visiting. It's been fun, relaxing, and generally a good time. But today it was back to work.
I found out that backups have not been running since the day after I went on vacation. There were something like 650 jobs in Bacula stacked up, waiting to run; the storage daemon was stuck trying to do something, and nothing was getting done.
Near as I can tell, the storage daemon was stuck in a deadlock. I've got some backtraces, I've posted to the devel mailing list, and it looks there's a bug, recently fixed, that addresses the problem. Inna meantime I've put in Nagios monitoring for the size of the run queue, and I'm figuring out how to watch for the extra bacula-sd process that showed up.
Tonight, at least, the same problem happened again. This is good, because now I have a chance to repeat it. Got another backtrace, killed the job, and things are moving on. Once things are finished here, I think I'm going to try running that job again and seeing if I can trigger the problem.
Sigh. This was going to be a good day and a good post. But it's late and I've got a lot to do.
1 Comment
From: Patrick
17 March 2010 15:30:41
Well at least you're on your way to solving the problem.
Add a comment:
Name and email required; email is not displayed.
Related Posts
QRP weekend 08 Oct 2018
Open Source Cubesat Workshop 2018 03 Oct 2018
mpd crash? try removing files in /var/lib/mpd/ 11 Aug 2018