The Life of a Sysadmin

Carousel is a lie!

Entries tagged "debugging".

PHP debugging -- there has to be a better way
Fri Jan 5 11:54:45 PST 2007

I have just spent two and a half hours trying to track down the reason a page on a Joomla site suddenly started saying "You are not authorized to view this resource". In the end it turned out to be a known problem with the OpenSEF plugin, but it took me a stupidly long time to even guess that might be the problem. (Probably shoulda searched for the error message first…)

There has to be a better way to do this. The only way I could figure to trace the problem was by sprinkling lots of print "FIXME: Made it here\n"; throughout the code. I know, it's a terrible way of doing it, but running php index.php didn't seem to work — I couldn't get the code to see the arguments I was trying to pass on. What am I missing?

Tags: debugging.

Jumbo frames again
Wed Feb 3 11:27:19 PST 2010

Arghh...I just spent 24 hours trying to figure out why shadow migration was causing our new 7310 to hang. The answer? Because jumbo frames were not enabled on the switch the 7310 was on, and they were on the machine we're migrating from. Arghh, I say!
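For reference, a quick sanity check for this kind of mismatch is to send a ping that only fits in a jumbo frame and forbid fragmentation. The sketch below just does the arithmetic for the payload size. It assumes IPv4 with no IP options and a jumbo MTU of 9000, which is a common value but not one stated in this post.

```python
# Sketch: largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# Assumptions (not from the post): IPv4, no IP options, jumbo MTU of 9000.
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Return the biggest ping payload that fits in `mtu` without fragmenting."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 9000):  # standard Ethernet vs. an assumed jumbo-frame MTU
    print(f"MTU {mtu}: ping payload up to {max_ping_payload(mtu)} bytes")
```

On a Linux host with iputils ping, something like `ping -M do -s 8972 host` should then fail if any hop in between lacks jumbo frames; other platforms use different flags for the don't-fragment bit.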

1 comment. Tags: debugging, jumboframes, networking.

Fishworks and LDAP
Tue Feb 16 16:02:18 PST 2010

Remember: when adding access to your Fishworks/Unified Storage System 7310, LDAP entries must include objectClass: shadowAccount. That took me a while to track down.
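A rough way to check an exported entry before blaming the appliance: the sketch below scans LDIF text for the objectClass values. The entry shown is trimmed and entirely hypothetical (DN, uid and the other classes are made up), and the parser only handles plain `name: value` lines, not base64 or folded lines.

```python
# Sketch: check whether an LDIF entry lists objectClass: shadowAccount.
# The sample entry is hypothetical and trimmed; only plain "name: value"
# lines are handled (no base64 "::" values, no folded lines).

def object_classes(ldif_entry):
    """Return the lowercased objectClass values found in one LDIF entry."""
    classes = set()
    for line in ldif_entry.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "objectclass":
            classes.add(value.strip().lower())
    return classes


def has_shadow_account(ldif_entry):
    """True if the entry includes the shadowAccount object class."""
    return "shadowaccount" in object_classes(ldif_entry)


EXAMPLE = """\
dn: uid=jdoe,ou=People,dc=example,dc=com
objectClass: posixAccount
objectClass: shadowAccount
uid: jdoe
"""

print(has_shadow_account(EXAMPLE))
```

The comparison is case-insensitive because attribute names and object class names are matched that way in LDAP.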

Tags: debugging.

It's a race to the finish
Thu Feb 25 10:04:52 PST 2010

I mentioned that I've been having problems with Bacula recently. These have been aggravated by the fact that the trigger seems to be a job that takes 53 hours to finish.

Well, I think I've got a handle on one part of the problem. See, when Bacula is doing this big job, other jobs stack up behind it -- despite having two tape drives, and two separate pools of tapes, and concurrent jobs set up, the daily jobs don't finish. The director says this:

9279 Full    BackupCatalog.2010-02-20_21.10.00_10 is waiting for higher priority jobs to finish
9496 Full    BackupCatalog.2010-02-23_21.10.00_13 is waiting execution
9498 Full    bigass_server-d_drive.2010-02-24_03.05.01_15 is running
9520 Increme  little_server-var.2010-02-24_21.05.00_38 is waiting on Storage tape
9521 Increme  little_server-opt.2010-02-24_21.05.00_39 is waiting on max Storage jobs

but storage says this:

Running Jobs:
Writing: Full Backup job bigass_server-d_drive JobId=9498 Volume="000031"
pool="Monthly" device="Drive-0" (/dev/nst1)
spooling=1 despooling=0 despool_wait=0
Files=708,555 Bytes=1,052,080,331,191 Bytes/sec=11,195,559
FDReadSeqNo=22,294,829 in_msg=20170567 out_msg=5 fd=16
Writing: Incremental Backup job little_server-var JobId=9508 Volume="000017"
pool="Daily" device="Drive-1" (/dev/nst0)
spooling=0 despooling=0 despool_wait=1
Files=156 Bytes=3,403,527,093 Bytes/sec=72,415
FDReadSeqNo=53,041 in_msg=52667 out_msg=9 fd=9
Writing: Incremental Backup job little_server-etc JobId=9519 Volume="000017"
pool="Daily" device="Drive-1" (/dev/nst0)
spooling=0 despooling=0 despool_wait=0
Files=9 Bytes=183,606 Bytes/sec=3
FDReadSeqNo=72 in_msg=50 out_msg=9 fd=10
Writing: Incremental Backup job other_little_server-etc JobId=9522 Volume="000017"
pool="Daily" device="Drive-1" (/dev/nst0)
spooling=0 despooling=0 despool_wait=1
Files=5 Bytes=182,029 Bytes/sec=3
FDReadSeqNo=45 in_msg=32 out_msg=9 fd=19
Writing: Incremental Backup job other_little_server-var JobId=9525 Volume="000017"
pool="Daily" device="Drive-1" (/dev/nst0)
spooling=0 despooling=0 despool_wait=0
Files=0 Bytes=0 Bytes/sec=0
FDSocket closed

Out of desperation I tried running "unmount" for the drive holding the daily tape, thinking that might reset things somehow...but the console just sat there, and never returned a prompt or an error message. Meanwhile, storage was logging this:

cbs-01-sd: dircmd.c:218-0 <dird: unmount SL-500 drive=1
cbs-01-sd: dircmd.c:232-0 Do command: unmount
cbs-01-sd: dircmd.c:596-0 Try changer device Drive-0
cbs-01-sd: dircmd.c:617-0 Device SL-500 drive wrong: want=1 got=0 skipping
cbs-01-sd: dircmd.c:596-0 Try changer device Drive-1
cbs-01-sd: dircmd.c:612-0 Found changer device Drive-1
cbs-01-sd: dircmd.c:625-0 Found device Drive-1
cbs-01-sd: block.c:133-0 Returning new block=39cee10
cbs-01-sd: acquire.c:647-0 JobId=0 enter attach_dcr_to_dev

...and then just hung there. "Aha, race condition!" I thought, and sure enough a bit of searching found this commit in November: "Fix SD DCR race condition that causes seg faults". No, I don't have a segfault, but the commit touches the last routine I see logged (along with a buncha others).

This commit is in the 5.0.1 release; I wasn't planning to upgrade to this just yet, but I think I may have to. But I'm going on vacation the week after next, and I'm reluctant to do this right before I'm away for a week. What to do, what to do...

1 comment. Tags: backups, bug, debugging, upgrades.

Late night
Thu Feb 25 21:29:25 PST 2010

Ugh, late night...which, these days, means anything past 9:30pm. Two machines down at work with what I think are unrelated problems.

First one appears to have had OOM-killer run repeatedly and leave the ethernet driver in a bad state; I know, but the OOM-killer kept killing things until we got this bug.

Second one appears to have crashed and/or rebooted, but the hardware clock got reset to December 2001 in the process -- which meant that when it tried to contact the LDAP servers, none of their certificates were valid yet.

Again, ugh. But I did come across this helpful addition to my toolkit:

 openssl s_client -CAfile /path/to/CA_cert -connect host:port

which, I just realized, I've rediscovered, along with having the same fucking problem again.
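To make the clock problem concrete, here is a small sketch of the check a TLS client effectively performs: is "now" inside the certificate's validity window? The certificate dates below are made up for illustration (the real ones aren't in this post); the December 2001 clock is the one from the crashed machine.

```python
# Sketch: why a clock reset to December 2001 makes a certificate "not yet valid".
# NOT_BEFORE and NOT_AFTER are hypothetical dates, not the real certificate's.
import ssl
from datetime import datetime, timezone


def validity_status(not_before, not_after, now_epoch):
    """Classify `now_epoch` against a certificate's validity window."""
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    if now_epoch < start:
        return "not yet valid"
    if now_epoch > end:
        return "expired"
    return "valid"


NOT_BEFORE = "Jun  1 00:00:00 2009 GMT"  # hypothetical
NOT_AFTER = "Jun  1 00:00:00 2011 GMT"   # hypothetical

reset_clock = datetime(2001, 12, 1, tzinfo=timezone.utc).timestamp()
sane_clock = datetime(2010, 2, 25, tzinfo=timezone.utc).timestamp()

print(validity_status(NOT_BEFORE, NOT_AFTER, reset_clock))
print(validity_status(NOT_BEFORE, NOT_AFTER, sane_clock))
```

The date strings use the same format that `openssl x509 -noout -dates` prints, so values from a real certificate can be pasted in.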

And did I mention I'm up at 5am tomorrow to move some equipment around at work? Ah well, I have safety boots now. I'll be suitably rewarded in Valhalla.

2 comments. Tags: debugging.
