Posts tagged “openbsd”

November 02, 2009 True Dreams of Wichita
Monday morning:
```
I've seen the rains of the real world come forward on the plains
I've seen the Kansas of your sweet little myth...
I'm half-drunk on babble you transmit
Through your true dreams of Wichita.

"True Dreams of Wichita", Soul Coughing
```
This morning I had the SELinux tutorial, taught by Rik Farrow. I took a moment to shake his hand and tell him that ;login: magazine, like, changed my life, man, you know? If you haven't picked up copies of that magazine/journal, you owe it to yourself to do so. (And if you have and you agree with me, send him an email -- he usually only gets email as editor when there's a problem.)

Matt was there, as was Jay, who I met back in 2006.

The course was quite interesting. Some choice bits:
- "How many of you are using SELinux?" (Two hands) "How many of you have disabled SELinux?" (a hundred hands and six tentacles; yes, even Cthulhu disables SELinux) "See, that's why I came up with this course; I kept seeing instructions that started with 'Disable SELinux' and I wanted to know why."
- Telling Matt about Jay's firewall testing script.
```
  Me:  So how do the big guys test their firewall changes?
  Matt:  I dunno...probably separate routers, duplicate hardware...
  Me:  Probably golden coffee cup holders, too.
  Matt:  Jerks.
```
- You don't write SELinux policy. SELinux policy is hard. It's NP-complete and makes baby Knuth cry. Instead, you use what other people have written, and make use of booleans to toggle different bits of policy.
- However, the size of the SELinux policy is big and is only getting bigger. There are something like 85,000 or more rules in recent versions of RHEL/CentOS. This is very close to RF's rule of thumb that a really, really smart and experienced person, who's been intimately involved in its creation, can only comprehend about 100,000 lines of code. This worries him.
- Also, the problem of using SELinux is complicated by a lack of up-to-date documentation; like everything else it's a fast-moving target, and a book published in 2007 is now half out-of-date.
- But this should not stop you from using SELinux now; it's handy, it's here, get used to it. His example: SELinux stopped ntpd from running /bin/bash, and the SELinux audit file was the only sign.
- "In a multi-level secure system, files tend to migrate to higher security levels, and the system becomes less usable. But that's beyond the scope of this class."
- (On programs with long histories of serious security problems) "Flash is the Sendmail of -- what do we call this decade? the naughts?"
- (On the difficulty of trying to decode SELinux audit logs) "It says the program 'local' had a problem. 'Local'. What the heck is that? Part of Postfix. Oh, good. Thanks for the descriptive name, Wietse."
- Something I hope to quiz him further on: "Most Linux systems have a single filesystem." Really?
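
To hunt for the kind of denial mentioned above (ntpd trying to run /bin/bash), one approach is to filter the audit log for AVC denials by command name. This is only a minimal sketch: it assumes the usual RHEL/CentOS audit log location of /var/log/audit/audit.log and records that carry a comm="..." field, and the function name is made up for illustration.

```shell
# Hypothetical helper: print AVC denial records for one command name.
# Reads audit log text on stdin.
#   $1: command name as it appears in comm="..." (for example ntpd)
avc_denials_for() {
    # Keep only AVC denial records, then only those for the given command.
    grep 'avc: *denied' | grep "comm=\"$1\""
}

# Usage (as root), assuming the default audit log path:
#   avc_denials_for ntpd < /var/log/audit/audit.log
```

If a denial turns out to be covered by one of the policy booleans, `getsebool -a` lists them and `setsebool -P <boolean> on` flips one persistently.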
During the break I met a guy who works with the Norwegian Meteorological service. This was interesting. He's got 250TB in production right now, and increasing CPU power means that their models can increase their spatial resolution, which means increasing (doubling?) their storage requirements. He talked briefly about running into problems with islands of storage, but I got distracted before I could quiz him further...

...by his story of building a new server room where they were capturing the waste heat and using it to heat the building. Interesting; what kind of contribution would it be making to the overall heating budget? Probably not much, but it all just goes on the grid anyhow, like the hot water from the garbage dump. What?

Turns out that there is a city-wide network of hot-water pipes that collects heat from, among other places, water heaters powered by waste methane from rotting garbage. So they don't use the methane to make electricity and dump it in the electrical grid; they use it to heat hot water and dump that in the hot water grid, consisting of insulated water pipes buried in the ground, which places around the city (and beyond!) will use. We've got what you could call a steam grid at UBC and probably other universities, but I'd never thought of doing this city-wide.

Oh, and he signed my LISA card, which was the second time he got asked today; he was wearing a LISA t-shirt and so he was fair game.

At lunch I buttonholed Jay a bit. I asked him about his coworker's firewall unit testing scheme. He said he's no longer working at that place, but it ended up being a lot less useful than they thought it would be. When I asked why, he said that 90% worked but 10% didn't; that 10% was things like network isolation (to avoid problems with using real IP addresses), and the fact that the interface to the three machines was QEMU serial connections...less than ideal.

The conversation shifted to firewalling, and another guy who was there mentioned that he loved OpenBSD's pf, but had to use iptables because of driver problems that prevented getting full performance out of 10GigE NICs with OpenBSD. Jay said they'd looked at the same problem at his place o' work, and in his words "It was cheaper to throw 8 GigE NICs in a box and pay someone to make Linux interface bonding not suck."
October 30, 2009 There it was, gone
Following in Matt's footsteps, I ran into a serious problem just before heading to LISA.

Wednesday afternoon, I'm showing my (sort of) backup how to connect to the console server. Since we're already on the firewall, I get him to SSH to it from there, I show him how to connect to a serial port, and we move on.

About an hour later, I get paged about problems with the database server: SSH and SNMP aren't responding. I try to log in, and sure enough it hangs. I connect to its console and log in as root; it works instantly. Uhoh, I smell LDAP problems...only there's nothing in the logs, and id <uid> works fine. I flip to another terminal and try SSHing to another machine, and that doesn't work either. But already-existing sessions work fine until I try to run sudo or do ls -l. So yeah, that's LDAP.

I try connecting via openssl to the LDAP server (stick alias telnets='openssl s_client -connect' in your .bashrc today!) and get this:
```
CONNECTED(00000003)
```
...and that's all. Wha? I tried connecting to it from the other LDAP server and got the usual (certificate, certificate chain, cipher, driver's license, note from mom, etc). Now that's just weird.
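
The difference between those two results can be checked mechanically. The following is a rough sketch, not what I ran at the time: it classifies captured `openssl s_client` output by whether a server certificate ever showed up. The function names, the labels it prints, and the five-second default are all invented for illustration.

```shell
# Classify captured `openssl s_client` output read from stdin.
# Prints one of: ok, stalled, no-connection (labels are made up here).
classify_handshake() {
    out=$(cat)
    case $out in
        *"BEGIN CERTIFICATE"*) echo ok ;;            # server certificate arrived
        *CONNECTED*)           echo stalled ;;       # TCP is up, but no certificate
        *)                     echo no-connection ;; # never got as far as CONNECTED
    esac
}

# Hypothetical wrapper: give s_client a few seconds, then classify what it said.
#   $1: host:port   $2: seconds to wait (default 5)
probe_tls() {
    tmp=$(mktemp) || return 1
    openssl s_client -connect "$1" </dev/null >"$tmp" 2>&1 &
    pid=$!
    sleep "${2:-5}"
    kill "$pid" 2>/dev/null   # still running means it was hung in the handshake
    wait "$pid" 2>/dev/null
    classify_handshake <"$tmp"
    rm -f "$tmp"
}
```

The time limit matters: in the stalled case s_client just sits there after printing CONNECTED, which is exactly the symptom above.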

After a long and fruitless hour trying to figure out if the LDAP server had suddenly decided that SSL was for suckers and chumps, I finally thought to run tcpdump on the client, the LDAP server and the firewall (which sits between the two). And there it was, plain as day:
- 3-way handshake
- client says "I speak SSL!"
- server says "I speak SSL too! Here you go!"
- but the client never sees that packet
- and neither does the firewall.
Near as I can figure, this was the sequence of events:
- We SSH'd from the firewall, with its two bridged Intel GigE jumbo-enabled NICs
- to the console server, which only does 10/100
- which somehow prompted a renegotiation of the link speed on the firewall's interface
- which settled on 100 MBit, full duplex, but with jumbo frames
- which the switch saw as completely bogus
- which prompted the switch to (silently, natch) drop all jumbo frames directed at the firewall's outside interface
- which, in the context of an LDAP lookup done by a client inside the firewall, meant that the first packet that failed was the "I speak SSL too! Here you go!" packet
- which left the client with an established TCP connection to the LDAP server, waiting for a certificate
- which meant that it never actually failed over to the other LDAP server.
This took me two hours to figure out, and another 90 minutes to fix; setting the link speed manually on the firewall just convinced the nic/driver/kernel that there was no carrier there. In the end the combination that worked was telling the switch it was a gigabit port, but letting it negotiate duplexiciousnessity.
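
In hindsight, the bad state (jumbo MTU on a link that had negotiated down to 100 Mbit) is something a monitoring check could have caught. Here is a minimal sketch, assuming OpenBSD-style `ifconfig` output with an `mtu` value on the interface line and a `media:` line below it. The function name is made up, and treating any media line without "1000base" or "10Gbase" as too slow for jumbo frames is my own simplification.

```shell
# Hypothetical check: read ifconfig output on stdin and print the name of
# every interface that has an MTU above 1500 but a sub-gigabit media line.
jumbo_on_slow_link() {
    awk '
        /^[a-z]+[0-9]+: flags=/ {
            # Interface header line: remember the name and its MTU.
            iface = $1
            sub(/:$/, "", iface)
            mtu = 0
            for (i = 1; i < NF; i++)
                if ($i == "mtu") mtu = $(i + 1) + 0
        }
        /media:/ && mtu > 1500 && !/1000base/ && !/10Gbase/ {
            # Jumbo MTU configured, but the link is not gigabit or faster.
            print iface
        }
    '
}

# Usage:
#   ifconfig | jumbo_on_slow_link
```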

Gah. Just gah.
October 28, 2009 Where'd that bridge go? Redux
So this morning, again, I got paged about machines in our server room dropping off the network. And again, it was the bridge that was the problem. This time, though, I think I've figured out what the problem is.

The firewall has two interfaces, em0 (on the outside) and em1 (on the inside), which are bridged. em1 has an IP address. I was able to SSH to the machine from the outside and poke around a bit. I still didn't find anything in the logs, but I did notice this (edited for brevity):
```
$ ifconfig
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 9000
    lladdr 00:15:17:ab:cd:ef
    media: Ethernet autoselect (1000baseT full-duplex)
    status: active
    inet6 fe80::215:17ff:feab:cdef%em0 prefixlen 64 scopeid 0x1
em1: flags=8d43<UP,BROADCAST,RUNNING,PROMISC,OACTIVE,SIMPLEX,MULTICAST> mtu 9000
    lladdr 00:15:17:ab:cd:ee
    groups: egress
    media: Ethernet autoselect (1000baseT full-duplex)
    status: active
    inet 10.0.0.1 netmask 0xffffff80 broadcast 10.0.0.1
    inet6 fe80::215:17ff:feab:cdee%em1 prefixlen 64 scopeid 0x2
```
See that? em1 has OACTIVE set. A quick search turned up some interesting hits, so for fun I tried resetting the interface:
```
$ sudo ifconfig em1 down
$ sudo ifconfig em1 up
```
and huzzah! it worked.

When I got to work I did some more digging and figured out that this and the earlier outage were almost certainly caused by running a full backup, via Bacula, of the /home partition on the machine. The timing was just about exact. The weird thing, though, is that /home holds less data than /var, which was backed up successfully both times:
```
$ df -hl
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd0a      509M   42.4M    442M     9%    /
/dev/sd0g      106G   11.4G   89.1G    11%    /home
/dev/sd0d      3.9G    6.0K    3.7G     0%    /tmp
/dev/sd0f     15.7G    2.4G   12.5G    16%    /usr
/dev/sd0e     15.7G   13.6G    1.4G    91%    /var
```
The bacula file daemon logged this on the firewall:
```
Oct 28 02:46:15 bacula-fd: backup-fd JobId 3761: Fatal error: backup.c:892 Network send error to SD. ERR=Broken pipe
Oct 28 02:46:15 bacula-fd: backup-fd JobId 3761: Error: bsock.c:306 Write error sending 36841 bytes to Storage daemon:backup.example.com:9103: ERR=Broken pipe
```
With the earlier outage it was 65536 bytes, but otherwise the same error.

Okay, so the firewall's working again...now what? I'm about to head off to LISA in three days, so I can't very well upgrade to the latest OpenBSD right now. I settled for:
- turning off full backups on the firewall (everything important is kept in Subversion anyhow), and
- running a script from cron every 10 minutes that checks for the OACTIVE flag and, if found, resets the interface.
Hopefully that'll keep things going 'til I get back.
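
For the curious, a cron script along those lines could look something like this. It is a minimal sketch rather than the exact script: the function names and the syslog tag are made up, and it assumes it runs as root so that it can bring the interface down and up without sudo.

```shell
#!/bin/sh
# Hypothetical watchdog: bounce an interface whose flags include OACTIVE.

# Succeeds if the ifconfig output on stdin lists OACTIVE among the flags.
has_oactive() {
    grep -q 'flags=[0-9a-f]*<[^>]*OACTIVE[,>]'
}

# Check one interface; reset it (down, then up) only if OACTIVE is set.
check_and_reset() {
    iface=$1
    if ifconfig "$iface" | has_oactive; then
        logger -t oactive-watchdog "OACTIVE set on $iface, resetting"
        ifconfig "$iface" down
        ifconfig "$iface" up
    fi
}

# Do something only when called with an interface name, e.g. from cron.
if [ $# -gt 0 ]; then
    check_and_reset "$1"
fi
```

A matching entry in root's crontab would be something like `*/10 * * * * /usr/local/sbin/oactive-watchdog em1`, where the install path is just an example. Logging each reset through syslog at least leaves a record of how often the problem recurs.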
October 05, 2009 Where'd that bridge go?
Yesterday I got paged by one of my two Nagios boxes (learned that trick the hard way): a bunch of the machines in our server room had dropped off the network. Weirdly, this did not include the other Nagios box that's over there. WTF?

I logged into the server room's Nagios box, and sure enough couldn't ping the servers or the firewall. I could ping the console server...which was also on the Outside VLAN along with the monitoring box, as opposed to the Inside VLAN with the servers, which sat behind our firewall.

I was also able to ping the management cards/ILOMs/SPs/whatever the kids are calling them in the servers. Thankfully they're Sun boxes, so no Vista-like maze of flavours there...they all come with console redirection. I logged in and fired up a console, panicking because I thought that perhaps the newly-installed NUT clients had shut down the machines because I'd overlooked something.

But no...the machines were up, though hung if you tried to do any LDAP lookups. (Through an oversight, the LDAP server was also on the Outside VLAN. I'll be fixing that today.) Modulo that, they seemed fine.

So I logged into the firewall, which runs OpenBSD 4.3 in bridging mode. And this is where the weirdness lay: the bridge, and/or its component cards, was not working. ifconfig and brconfig said they were up and fine, and the ARP table was still populated (not sure what the lifetime of entries is -- isn't it around 20 minutes or so? must check -- but by this time the problem had been going on for about an hour). Yet I couldn't ping the firewall (one of those cards has an address) from either side, and I couldn't ping anything from the firewall.

pfctl -s all didn't show anything suspicious. There were no obvious problems in dmesg or /var/log/messages. I disabled, then re-enabled, the firewall to no effect. I ran /etc/netstart to no effect.

I even checked on the switches to see if the firewall's MAC address was showing up anywhere, and it was not -- not even directly after pinging it (and getting no response).

In the end I rebooted the machine and all was well.

The NIC in question is a dual-port Intel Pro 1000 (MT, I believe) that I've never had problems with. I've never come across problems like this before on OpenBSD (or, I think, anywhere else). The onboard Broadcom (boo, hiss) was acting fine...it was also on the ILOM's VLAN, and could see the other ILOMs just fine. (In fact, I should have just SSH'd to the firewall using that VLAN from the Nagios box, rather than futz around with a 9600 bps console. Next time.)

So...that's my mystery for the weekend.

In other news, my older son (3.25 yrs) has taken to the stage in a big way: he now stands on top of the steps going up from our living room and sings us songs into one of at least two microphones. "Barbara Ann", anything by The Wiggles, and "Yo Gabba Gabba!" songs are prominent. This is after at least three solid weeks of guitar playing, where anything and everything gets strummed while being cradled in his arms while he sings, or maybe makes feedback sounds that'd make Yo La Tengo proud.

Meanwhile, my younger (1.5 yrs) has started saying lots of different phonemes, which is a real contrast to using "Dat!" for monkey, cereal, ball, yes, no, President Barack Obama's attempted health care reforms, and Linux. He has also begun sleeping in 'til 6:30 or 7:00 in the morning, which lets me write things like this. Both are infinitely endearing.

And incidentally, I really need to set up Nagios dependencies. I've had to ACK 27 services in a row (an unrelated, I think, problem with ILOM temperature readings means SNMP checks are timing out). Either that or there's some way that you can select n services in Nagios to ack all at once. Anyone?
September 09, 2009 OpenBSD needs help
I just saw on Undeadly.org that orders for OpenBSD CDs are 'way down this year. Without OpenSSH and pf, I wouldn't be able to do my job nearly as well as I do. I've ordered a set for work (good excuse to upgrade that firewall), and ordered a set for home and tossed 'em $50 as well. I encourage you to do the same.

In the words of the original rant:

Do you use OpenBSD for fun? Contribute. Do you use OpenBSD for work? Contribute. Does OpenBSD allow you to worry about the problem you are trying to solve rather than the tools? Contribute. Do you wish your employer used the OpenBSD quality standard in your work? Contribute. Does your employer use OpenBSD? Ask them to contribute (after you do, of course). Do you bundle OpenBSD or subprojects like OpenSSH into your product? Contribute big! (you won't, you rarely do, but hey, I'll ask anyway) Do you find yourself wondering why so few take computer software quality seriously? Contribute!

January 09, 2007 What have I got myself into?

 Do you have any idea how fucking insane the h.323 protocol is?  Anyone
 who runs a h.323 should get shoved out a window, beaten, flayed,
 spanked, shot, disembowled, hung, and forced to listen to hummpa music.  If
 you  want to firewall h.323, go commit yourself to an asylum with
 straight jackets and with padded walls -- at least you'll be in common
 company with the other linux wacko's.

January 05, 2005 Holy crap, pf rocks
Sat down tonight to create a firewall for a new OpenBSD web server I'm setting up, and holy crap is pf ever good. I got to test the firewall syntax before loading it, and as a result I had a working firewall the first fucking time I loaded it. That's never happened before; I fully expected that this time, as every other time with a new firewall (let alone a new firewall language!), I'd have to reboot or log in with a keyboard or serial cable, or something.

But no: not only did I not lock myself out, not only was this the first time (well, nearly) that I'd read the FAQ, the firewall does everything I wanted it to: no extra packets in, no extra packets out. Wow.
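
The test-before-load step is easy to wrap up so that it can't be skipped. A minimal sketch, assuming the ruleset lives in /etc/pf.conf and that this runs as root; the function name is made up. It relies on `pfctl -n`, which parses a ruleset without loading it.

```shell
# Hypothetical helper: parse-check a pf ruleset, and load it only if the
# check passes.
#   $1: path to the ruleset (defaults to /etc/pf.conf)
load_pf_rules() {
    rules=${1:-/etc/pf.conf}
    if pfctl -nf "$rules"; then    # -n: parse only, do not load
        pfctl -f "$rules"          # -f: load the rules from this file
    else
        echo "syntax check failed; ruleset not loaded" >&2
        return 1
    fi
}

# Usage:
#   load_pf_rules /etc/pf.conf
```

A syntax check only proves the file parses; it says nothing about whether the rules let your own SSH session through, so the serial cable is still worth keeping nearby.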

Alioth was right: pf just rocks.