Come from

I've been off on vacation for a week while my parents have been visiting. It's been fun, relaxing, and generally a good time. But today it was back to work.

I found out that backups have not been running since the day after I went on vacation. There were something like 650 jobs in Bacula stacked up, waiting to run; the storage daemon was stuck trying to do something, and nothing was getting done.

Near as I can tell, the storage daemon was stuck in a deadlock. I've got some backtraces, I've posted to the devel mailing list, and it looks there's a bug, recently fixed, that addresses the problem. Inna meantime I've put in Nagios monitoring for the size of the run queue, and I'm figuring out how to watch for the extra bacula-sd process that showed up.

Tonight, at least, the same problem happened again. This is good, because now I have a chance to repeat it. Got another backtrace, killed the job, and things are moving on. Once things are finished here, I think I'm going to try running that job again and seeing if I can trigger the problem.

Sigh. This was going to be a good day and a good post. But it's late and I've got a lot to do.