Where'd that bridge go? Redux
28 Oct 2009
So this morning, again, I got paged about machines in our server room dropping off the network. And again, the bridge was the problem. This time, though, I think I've figured out what's going on.
The firewall has two interfaces, em0 (on the outside) and em1 (on the inside), which are bridged. em1 has an IP address. I was able to SSH to the machine from the outside and poke around a bit. I still didn't find anything in the logs, but I did notice this (edited for brevity):
$ ifconfig
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 9000
	lladdr 00:15:17:ab:cd:ef
	media: Ethernet autoselect (1000baseT full-duplex)
	status: active
	inet6 fe80::215:17ff:feab:cdef%em0 prefixlen 64 scopeid 0x1
em1: flags=8d43<UP,BROADCAST,RUNNING,PROMISC,OACTIVE,SIMPLEX,MULTICAST> mtu 9000
	lladdr 00:15:17:ab:cd:ee
	groups: egress
	media: Ethernet autoselect (1000baseT full-duplex)
	status: active
	inet 10.0.0.1 netmask 0xffffff80 broadcast 10.0.0.1
	inet6 fe80::215:17ff:feab:cdee%em1 prefixlen 64 scopeid 0x2
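See that? em1 has OACTIVE set. That's the BSD interface flag meaning "output in progress"; if it gets stuck on, the driver stops handing the card new packets to send, which would fit machines vanishing from behind the bridge. A quick search turned up some interesting hits, so for fun I tried resetting the interface: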
$ sudo ifconfig em1 down
$ sudo ifconfig em1 up
And huzzah! It worked.
When I got to work I did some more digging and figured out that this and the earlier outage were almost certainly caused by running a full backup, via Bacula, of the /home partition on the machine. The timing was just about exact. The weird thing, though, is that /home holds less data (11.4G used) than /var (13.6G used), and /var was backed up successfully both times:
$ df -hl
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd0a      509M   42.4M    442M     9%    /
/dev/sd0g      106G   11.4G   89.1G    11%    /home
/dev/sd0d      3.9G    6.0K    3.7G     0%    /tmp
/dev/sd0f     15.7G    2.4G   12.5G    16%    /usr
/dev/sd0e     15.7G   13.6G    1.4G    91%    /var
The Bacula file daemon logged this on the firewall:
Oct 28 02:46:15 bacula-fd: backup-fd JobId 3761: Fatal error: backup.c:892 Network send error to SD. ERR=Broken pipe
Oct 28 02:46:15 bacula-fd: backup-fd JobId 3761: Error: bsock.c:306 Write error sending 36841 bytes to Storage daemon:backup.example.com:9103: ERR=Broken pipe
During the earlier outage the failed write was 65536 bytes, but otherwise the error was the same.
Okay, so the firewall's working again...now what? I'm about to head off to LISA in three days, so I can't very well upgrade to the latest OpenBSD right now. I settled for:
- turning off full backups on the firewall (everything important is kept in Subversion anyhow), and
- running a script from cron every 10 minutes that checks for the OACTIVE flag and, if found, resets the interface.
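Here's a rough sketch of what that cron check could look like. It isn't the actual script from this firewall: the file name, the logger tag, and the default of watching em1 are assumptions for illustration. It looks only at the flag list on the first line of ifconfig's output and, if OACTIVE is there, does the same down/up reset shown earlier.

```shell
#!/bin/sh
# check-oactive.sh (hypothetical name): sketch of the cron job described above.
# Assumptions: OpenBSD-style ifconfig output, run as root, and em1 is the
# interface to watch unless another one is passed as the first argument.

# cron's default PATH may not include /sbin, where ifconfig lives.
PATH=/sbin:/usr/sbin:/bin:/usr/bin
export PATH

IFACE="${1:-em1}"

# Read ifconfig output on stdin; succeed only if OACTIVE appears in the
# <...> flag list on the first line, so other text can't trigger a reset.
has_oactive() {
    head -n 1 | grep -q '[<,]OACTIVE[,>]'
}

# Bounce the interface the same way as by hand, and note it in syslog.
reset_iface() {
    logger -t check-oactive "$1 has OACTIVE set; resetting"
    ifconfig "$1" down
    ifconfig "$1" up
}

if ifconfig "$IFACE" 2>/dev/null | has_oactive; then
    reset_iface "$IFACE"
fi
```

To run it every 10 minutes, an entry along these lines in root's crontab would do (the install path is made up):

*/10 * * * * /usr/local/sbin/check-oactive.sh em1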
Hopefully that'll keep things going 'til I get back.
4 Comments
From: Matt Simmons
29 October 2009 01:37:50
Are you going to be at my Blogger BoF? I can't wait to meet everyone finally :-)
From: Saint Aardvark
29 October 2009 12:37:39
You bet...see you there!
From: Stephen P. Schaefer
29 October 2009 16:12:05
You were ssh'd in from the outside - on a third interface? That's what I infer from the fact that the system didn't become inaccessible after "sudo ifconfig em1 down", which would appear to drop the only IP address you mention. Or is the address not actually associated with a particular device in this config?
From: Saint Aardvark the Carpeted
29 October 2009 17:34:52
Actually, I was connected over a serial port, which was why I could do that sort of thing without losing access.