Carousel is a lie!

Entries from March 2010.

So *that's* what's going on
2nd March 2010

Sometimes random bits of my personality make themselves clear to me. Example: today, a good part of my day was spent trying to figure out exactly what a particular server did. I knew it ran a program, and what the program was called, but that was it; it had been shut down and sent over to me for installation in $WORK's server room. I had to:

All of which reminded me of a quote from the biography of Isaac Newton right now (which I can't track down, worse luck)...something about how, without understanding, someone whose tools break will be unable to figure out what's wrong. Much more succinct than that in the original. And that, in turn, reminded me of something I saw on a mailing list recently (or a website, or...) about how a sysadmin is someone who asks why something works, not just how to make it so.

Example: I came across this article, shortly after "Aurora Attack -- Resistance Is Futile, Pretty Much" on the howling void, and that made me realize that I'd been thinking in the panicky sort of way outlined. And that reminded me of the time I learned about The Rapture as a kid and went home and told everyone about it. I was told something, it was compelling, and I took it in wholesale, like snorting cocaine, and ran to tell everyone about it. And that made me realize that I've done no checking at all on that Wired article, none, but it's compelling and here I am telling everyone about it.

And I was relating this all to my wife and she said yeah, you dive in, but it's a process. You spent the next n years of your life learning about American fundamentalist Christianity and Bible history and you didn't just hide under the bed scared forever.

Random note: Ken MacLeod's The Execution Channel just got scarier. (It's a process, remember?)

'Nother random note: Glad to see Planet Sysadmin is back up; I missed it.

No tags
Postfix cleanup
4th March 2010

Did you know that Postfix's cleanup daemon consolidates messages going to the same envelope recipient? I did not.

Tags: postfix.
Come from
17th March 2010

I've been off on vacation for a week while my parents have been visiting. It's been fun, relaxing, and generally a good time. But today it was back to work.

I found out that backups have not been running since the day after I went on vacation. There were something like 650 jobs in Bacula stacked up, waiting to run; the storage daemon was stuck trying to do something, and nothing was getting done.

Near as I can tell, the storage daemon was stuck in a deadlock. I've got some backtraces, I've posted to the devel mailing list, and it looks there's a bug, recently fixed, that addresses the problem. Inna meantime I've put in Nagios monitoring for the size of the run queue, and I'm figuring out how to watch for the extra bacula-sd process that showed up.

Tonight, at least, the same problem happened again. This is good, because now I have a chance to repeat it. Got another backtrace, killed the job, and things are moving on. Once things are finished here, I think I'm going to try running that job again and seeing if I can trigger the problem.

Sigh. This was going to be a good day and a good post. But it's late and I've got a lot to do.

1 comments. Tags: backups, bacula, debugging.
Not quite me but almost
18th March 2010

This is not quite me but a) I did have a phall, once, long ago, and b) he really does look like me. (Or I look like him...what are the rules of precedence in these things?)

I have been very, very busy over the last couple of days but I am finally starting to see a way through it. I'm hoping that Bacula problems are behind me for now.

Update: Nope, locked up two minutes after I wrote that. But I got the backtrace I needed!

2 comments. Tags: meta.
What I've been banging my head against
23rd March 2010

I think I've finally figured out what's going on with my bacula-sd hangs. At the risk of repeating myself, this post is adapted from the bug report I've just filed.

Here's the situation: with the latest Bacula release (5.0.1), I regularly see bacula-sd hang when running jobs; it happens often and at seemingly random times; and when this happens, I see two bacula processes, a parent and a child. (Since bacula is multi-threaded, I usually just see one storage daemon process.) This came to my attention when I came back from vacation to find a week's worth of backups stalled (sigh).

When bacula-sd hangs, the traceback of the relevant thread in the parent process's looks like this:

Thread 10 (Thread 0x466c0940 (LWP 12926)):
#0  0x00000035aa4c5f3b in read () from /lib64/libc.so.6
#1  0x00000035aa46cc07 in _IO_new_file_underflow (fp=<value optimized out>) at fileops.c:590
#2  0x00000035aa46d5ce in _IO_default_uflow (fp=<value optimized out>) at genops.c:435
#3  0x00000035aa468e8b in _IO_getc (fp=<value optimized out>) at getc.c:41
#4  0x00002b76479565c0 in bfgets (s=0x466bf710 "", size=<value optimized out>, fd=0x60080a0) at bsys.c:617
#5  0x000000000040d432 in release_device (dcr=0x60988b8) at acquire.c:533
[snip]

Here, bacula's storage daemon has just finished running a job, and before it releases the tape drive to someone else runs the "Alert" command. This is specified in the config file for the storage daemon, and is meant to see if the drive has, say, run out of magnetism during the last job. Here's the source code in stored/acquire.c:

  alert = get_pool_memory(PM_FNAME);
  alert = edit_device_codes(dcr, alert, dcr->device->alert_command, "");
  bpipe = open_bpipe(alert, 0, "r");
  if (bpipe) {
     while (fgets(line, sizeof(line), bpipe->rfd)) {  /* AardvarkNote: This is where the parent hangs */
        Jmsg(jcr, M_ALERT, 0, _("Alert: %s"), line);
     }
     status = close_bpipe(bpipe);
  }

Meanwhile, the child process stack looks like this:

Thread 1 (Thread 0x466c0940 (LWP 13000)):
#0  0x00000035aa4df9ee in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00000035aa4d06a5 in _L_lock_1206 () from /lib64/libc.so.6
#2  0x00000035aa4d05be in closelog () at ../misc/syslog.c:419
#3  0x00002b764795cd35 in open_bpipe (prog=<value optimized out>, wait=0, mode=<value optimized out>) at bpipe.c:138
#4  0x000000000040d3f1 in release_device (dcr=0x60988b8) at acquire.c:531
[snip]

open_bpipe() can be found in lib/bpipe.c; it's a routine for forking a child process (FORESHADOWING: not another thread!) and sets up a pipe between parent and child. The relevant bits look like this:

/* Start worker process */
switch (bpipe->worker_pid = fork()) {
[snip]
case 0:                            /* child */
   if (mode_write) {
      close(writep[1]);
      dup2(writep[0], 0);          /* Dup our write to his stdin */
   }
   if (mode_read) {
      close(readp[0]);             /* Close unused child fds */
      dup2(readp[1], 1);           /* dup our read to his stdout */
      dup2(readp[1], 2);           /*   and his stderr */
   }
   /* AardvarkNote: This is where the child hangs: */
   closelog();                     /* close syslog if open */
   for (i=3; i<=32; i++) {         /* close any open file descriptors */
      close(i);
   }
   execvp(bargv[0], bargv);        /* call the program */

closelog() itself is simple: it "closes the current Syslog connection, if there is one." But running strace on the child process just shows a lot of futex calls...nothing very useful there at all. So what the hell is going on, and why it hanging at closelog()?

Some background info on threads: In Linux at least, they're user-land things, and the kernel doesn't know about them. To the kernel, it's just another process with a PID. Implementing threads is left as an exercise to the reader...or in this case, to glibc and NPTL.

Since threads are part of the same process, they share memory. (A fork(), by contrast, copies over parent memory to the child -- and then the child has its own copy of everything, separate from the parent.) glibc/NPTL implements locks for certain things to make sure that one thread doesn't stomp all over another thread's memory willy-nilly. And those locks are done with futexes, which are provided by the kernel...which explains why it would show up in strace, which tracks system calls.

Why is this relevant? Because in the glibc code, closelog() looks like this:

void
closelog ()
{
  /* Protect against multiple users and cancellation.  */
  __libc_cleanup_push (cancel_handler, NULL);
  __libc_lock_lock (syslog_lock);        /* AardvarkNote:  This is where things hang */

  closelog_internal ();
  LogTag = NULL;
  LogType = SOCK_DGRAM; /* this is the default */

  /* Free the lock.  */
  __libc_cleanup_pop (1);
}

That __libc_lock_lock (syslog_lock) call is there to prevent two threads trying to mess with the syslog file handle at one time. Sure enough, the info for frame #2 of the child process shows that the process is trying to get the syslog_lock mutex:

#2  0x00000035aa4d05be in closelog () at ../misc/syslog.c:419
419      __libc_lock_lock (syslog_lock);
ignore1 = <value optimized out>
ignore2 = <value optimized out>
ignore3 = <value optimized out>

As noted in the mailing list discussion, closelog() should be a noop if there's no descriptor open to syslog. However, in my case there is such a file descriptor, because I've got bacula-sd configured to log to syslog.

Well, as the Bible notes, mixing fork() and threading is problematic:

There are at least two serious problems with the semantics of fork() in a multi-threaded program. One problem has to do with state (for example, memory) covered by mutexes. Consider the case where one thread has a mutex locked and the state covered by that mutex is inconsistent while another thread calls fork(). In the child, the mutex is in the locked state (locked by a nonexistent thread and thus can never be unlocked). Having the child simply reinitialize the mutex is unsatisfactory since this approach does not resolve the question about how to correct or otherwise deal with the inconsistent state in the child.

And hey, doesn't that sound familiar?

Now that I had an idea of what was going on, I was able to find a number of similar problems that people have encountered:

So: what seems to be happening is that, when the stars align, the child process is being created at the same moment that the parent process is logging to syslog. The child is created with a locked syslog_lock mutex, but without the thread that had been holding it...and thus, without anything that can release it. The child blocks waiting for the mutex, the parent blocks on the child, and backup jobs halt (well, at least the spooling of jobs to tape) until I kill the child manually.

This was complicated to find for a number of reasons:

So what to do?

For now, I'm removing the log-to-syslog option from bacula-sd.conf. When I do that, I see no problems at all running jobs.

On the programming side -- and keep in mind I'm no programmer -- it looks like there are a number of options here:

  1. Don't call closelog() after fork() in open_bpipe(). This may leave an open descriptor to syslog, which may be a problem. (Or maybe it's just ugly. Don't know.)

  2. Don't fork() a child process when running the alert command, but start another thread instead. I have no idea why a fork() is preferred over a new thread in this scenario, but I presume there's a good reason.

  3. Use pthread_atfork() to set things up for a fork(). That's what The Bible says to do, but I don't know what Bacula would actually need to do with it in order to make this deadlock go away.

Good lord, I'm closing in on 1700 words here. If you've stuck with me this long, I hope it was interesting; once I got over the panic it was a fascinating problem, and it's taught me a lot about processes, threads and deadlocks. And how can that be a bad thing?

Tags: backups, debugging.
Signs o' the times
27th March 2010

Removing the bookmark that shows you how to crimp your own RJ45 cables.

No tags
PDF does what now?
31st March 2010

Holy crap. PDF supports multimedia, javascript, and launching arbitrary programs. Haven't checked the standard (warning: PDF) yet to see if they support the evil bit.

Tags: security.
Embedded embeddedness
31st March 2010

The bge driver for OpenBSD says that the Broadcom BCM5700 series of interfaces has two MIPS R4000 cpus. And you can run Linux on an R4000, or NetBSD.

Must...stop...recursion...

Tags: hardware, networking.
Wikipedia over DNS
31st March 2010

Okay, last micro-update today I swear: you can query Wikipedia over DNS. Here's the description, a short presentation on how it's done, and a bash function. Now to see if I can get something hacked up in Emacs.

Tags: dns, emacs.

RSS Feed