Rack tip

If you have space for two PDUs and you put one on each side of the rack, you will have no separate space for network cables and you'll get interference. If you put those two PDUs on one side of the rack, you'll put it on the wrong side and your power cords will interfere with your network cables. If you put those two PDUs on the correct side of the rack, you'll find that racking new items is a pain because the cords block the post holes on that side.

Tags: serverroom

Tour, FC

Gave a tour of the new server room today to about 30-odd people in the department. Ended on a bit of a low note ("and that's the end! Any questions?") but other than that it went well. Even got an ounce of champagne at the end of it.

Oh, and yesterday I found out that our SL-500 has three fibre channel interfaces, compared to the one interface in the server we bought. I think the sales folks assumed we had a fibre switch, and I didn't realize it all (data + control) wouldn't go over one cable. Arghh.

Just saw a character named Terence on "Entourage" who was not Terence Stamp. Now I want to see "Bowfinger" and "The Limey", in that order.

Tags: hardware serverroom backups

New server room ours at last

Given the recent hoo-ha about abandoned blogs, and my own tendency to lose interest in writing about something the longer I put it off (I haven't graphed it, but I suspect it's a nice exponential decay), I figured I should finally write up what I've been doing the last week: the move at $WORK to our new server room.

So: construction finally got finished on our new server room. Our UPS was installed, our racks set up, and the keys handed over (though they were to be changed again twice). Our new netblock was assigned, the Internet access at the new location was in place, and movers were booked.

Things I did in advance which helped immensely:

  • Checklist in Org mode, plus printed copies; the ability to constantly edit a nice todo list, complete with checkboxes and statistics, was wonderful.
  • Printed copies of the spreadsheet showing rack assignment, cabling requirements, VLAN changes, etc
  • Tested new firewall with VMs (thus pointing out that "antispoof quick" is not a good thing to do with a bridging OpenBSD firewall)
  • Cardboard for the floor of the new server room to lay the servers on (since we weren't going to be able to rack the machines as quickly as they came from the movers)

Last Thursday morning, it all started. I got the machines shut down (thank you, SSH and ubiquitous wireless access at UBC) before the two volunteers who were helping me showed up. We started getting machines unracked; since it was only about 20 machines, I figured it wouldn't take too long. While that was true, I had not counted on the rat's nest of power cables (our power requirements were such that we had to connect machines to PDUs in adjacent racks), or the fact that we wouldn't be able to disassemble that 'til we'd got the machines out.

There was one heartstopping moment: a 1U server, while extended on its rails, came off one of the rails while no one was supporting it. Amazingly the other rail held on while it rotated quickly through 90 degrees to bang loudly against the rack. "You swear quickly," the movers remarked. (Doubly amazingly, the machine seems to be fine, though the rails for the thing are shot.)

The movers were big and burly, which was wonderful when it came to moving the Thumper. I weigh more than it does, but not by much, and I'd had the bad fortune to screw up my back a week before the move. It was tricky trying to figure out how to remove it from the rails, but the movers' trick of supporting it with a couple of big blankets, while fully extended from the rack, made such considerations less urgent. Eventually we got it figured out. I don't know how that could have gone smoother, since we'd got Sun to rack the thing and, frankly, it's not like you spend a lot of time un- and re-racking something like that. Anyhow, a minor point.

The new location was right around the corner, which was handy. The movers had put the servers in these big laundry-like carts on wheels; in the end, we only had four of 'em. We got the machines unloaded, racked the Thumper with the movers' help, signed the paper, then went off for lunch, where we picked up two more volunteers.

After that, we started racking servers. Having only one sysadmin around (me) proved to be a bottleneck; the volunteers had not worked with rackmounted machines before, and I kept having to stop what I was doing to explain something to them. It would have been a great help to have another admin around; in fact, I think this is the biggest move I'd want to make without some other admin around.

Problems we ran into:

  • Cage nut pullers are small and get lost easily. (Moral: designate one place for tools, just like it sez here)
  • Mounting brackets didn't work. One of 'em, I just figured out today, we had in backwards. The other wasn't threaded for the bolts from APC, and I had only the right bolts — no cage nuts to fit. (Moral: photograph the racks for anything non-standard; if you have to ask, it's non-standard)
  • One of the things we couldn't mount was a Very Important Disk Array. Fortunately it held a database which had been mirrored on another Very Important Disk Array, which also couldn't be mounted in its brackets. Instead, we used a rack shelf I happened to have around, and that worked well... but its advertised capacity wasn't enough to hold all four trays (2 trays per array), so we made do with one. (Moral: have a spare rack shelf or two on hand)
  • The bolts from APC had these enormous heads, which would end up impinging on the rack unit above/below. This got to be a pain. Only today did I discover that there were plenty of bolts and cage nuts provided by the contractor who installed the racks. (Moral: dress rehearsal includes putting cage nuts and bolts in adjacent holes to see how they fit)
  • We had to re-hang the PDUs so they'd reach the power supplies. There were two in each rack, and both were on the right; the power supplies were all on the left, and I'd bought a bunch of 2' power cords to help with cable management. (Moral: Think about cable management for power, not just network)
  • Another thing about the PDUs: The outlets don't stretch throughout the length of the bar, but instead are clustered such that there's a dead space at the bottom/top 8" or so. The power cables had to be chained together sometimes to reach the extremes. (Moral: dress rehearsal includes plugging things in)
  • My plan to mount the switch in the middle of the rack with all the equipment has the advantage of shorter network cables (no running back to front, and no running top to bottom). But I should have noticed the middle empty spot in the PDUs and mounted it there; as it is, there's a block of outlets in the PDUs I can't use because the power cables will get too close to the network cables. (Moral: think about cable management for network, not just power)
  • Underestimated the amount of time it'd take to get things racked. I suppose this can only be bettered with experience.
  • Underestimated the amount of time it'd take to get cables dressed; did not realize how important this was for working with things.
  • Did not bring warm shirt for when the cooling was turned on. Mistake!
  • Did not have lots of water on hand; did not figure out in advance where bathroom was (important in a building where you only have access to one room)
  • Really could have used a phone in the room in advance; cell coverage was spotty
  • Ratchet set very handy when tightening screws in awkward places (i.e., behind power bar); last resort: hold bit in jaws of pliers/Leatherman. (Moral: dress rehearsal includes looking for tight corners and figuring out how you're going to work in them)
  • Preserve all bits and label them; carry masking tape/removable labels and sharpies; label anything and everything you haven't already; use ziplock bags for stuff and tape them to the machines they're associated with
  • Firewall not modified to allow LDAPS to LDAP server from new netblock
  • Monitoring machine came up with no ethernet interfaces; modprobe tg3 gave "probe of 0000:04:04.0 failed with error -22". (Moral: figure out how you're going to get information off a machine with no network)
  • Anyone else notice that C13-C14 power cords are just plain wobbly in the PDU sockets? I had more than one pop out on me while moving cords around. (Moral: Andy Rooney lives!)
  • Coulda used more printouts of the rack assignments.
  • One cable was flaky: it worked for a while, then didn't. This was the cable that connected our firewall to the ILOMs for the servers, which meant I was unable to work from home on getting them up and running. This was probably for the best; I sorely underestimated just how wired I was when I went home. (Moral: you're more tired than you think)
  • One of the racks was designated as the networking rack; however, since we didn't have that many switches to mount, I figured I'd use it for other stuff too. This turned out not to work: the distance between the front and back rails had been shortened to make room for network cables, and that meant the rack rails for the equipment I wanted to mount didn't fit.

Things that went well:

  • Ripwrap is awesome. So are cordless drills that come with two batteries.
  • The rack rails from Sun that just clip in are also awesome. Man, that makes things fast.
  • There was good beer in the fridge when I got home. Thanks, Pre.
  • Frankly, all the prep meant that things went pretty well overall. This was good.

I'm going to post this now because if I don't, it'll never get done. I may come back and revise it later, but better this than nothing at all.

Tags: serverroom emacs work hardware

Squint

This has been one of those days where all I've done is stare at monitors too closely.

I know, I'm a sysadmin, what do I expect? But some days I get up, move around; I'm sedentary (and introverted) by nature but I try to talk to people, stare off into the distance, get away from my desk. Going to the server room is always a good break.

Not today, though. My carefully-chosen ATI video card (the Radeon 4550) is giving me headaches, metaphorical and real:

  • the proprietary fglrx drivers work if you want a cloned display, but enabling Xinerama makes X segfault
  • or, interestingly, the fglrx driver will show the desktop on one monitor, and an "uninitialized" (X checker pattern, chunky X cursor) screen on the other
  • the radeonhd drivers work perfectly for VGA out, but the DVI out is flickery and "noisy"

Dual monitors is important. My own damn fault for not getting something old enough...

Tags: work hardware linux

Bacula over TLS at last!

I'm testing Bacula 3; the new release has just come out, and I'm very much looking forward to rolling it out here.

One of the things I've been doing is trying to get TLS working, which I utterly failed at in my last job. I must've failed to see these pages, which a) point out that the otherwise-excellent Bacula manual is (ahem) sparing when it comes to TLS, and b) you need to put the cert files in places that strike me as unexpected.

Thus, in bacula-dir.conf you put the directives listing the director's cert/key in the client section — IOW, you say "and use this key/cert combo when connecting to client foo." Meanwhile, on client foo, you add the client's cert/key directives in the director section ("and use this key/cert when talking to the director"), along with things like the CA cert and required CNs.

Oh, and did you know that you can debug SSL handshakes with openssl? True story.

Tags: backups toptip

Signs of the times

I really dug Charlie Stross' Halting State (link throws the author a few shekels). But now he's declared it obsolete.

Tags: reading

Grey java windows fix for Awesome

Thanks to Undeadly and ossowicki for the pointer to wmname, which fixes the grey java windows problem when using Awesome or other tiling window managers. No more starting up Gnome or IceWM to use NetID or Strangebrew, hurrah!

Tags:

Happy Document Freedom Day!

Document Freedom Day 2009

Tags: politics

I had no idea...

...that TCP Offload Engines (TOE) were so detested by Linux kernel folks. The arguments here make interesting reading and seem convincing to me.

(From Andy Grover's blog.)

Tags: networking linux reading

Oh, joy

NetSNMP uses 32-bit counters for disk sizes. Guess what happens when you've got one of these?

Due to be fixed in the next release, so at least that's something.
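
To make the failure concrete, here's a minimal sketch of the arithmetic. It assumes the size is reported in kilobytes in a signed 32-bit integer; that's my assumption for illustration, since the exact unit and signedness depend on which MIB object you poll. The function name is made up.

```python
def as_signed_32(value: int) -> int:
    """Truncate an integer to 32 bits and reinterpret it as signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


KB_PER_TB = 1024 ** 3  # kilobytes in one (binary) terabyte

# Assumed unit: kilobytes. Watch what happens at and past 2 TB.
for tb in (1, 2, 3, 4):
    print(tb, "TB ->", as_signed_32(tb * KB_PER_TB), "kB reported")
```

Under those assumptions, anything at 2 TB or above comes back negative or wrapped around, which is the sort of number that makes graphs and thresholds go very strange.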

Tags: networking hardware

Case study for a server room move

Actually for a whole office. Excellent reading. Wish I'd known about this at $JOB-2...

Tags: hardware

Rack design tools

With the move to the server room coming up in a couple months, I've been spending some time trying to lay out the racks we'll have there. My current layout is in an OpenOffice spreadsheet; I thought I'd try some other tools and see how they shape up.

  • APC Configurator - Windows only. I did try this a while ago and found it wasn't bad, but no power calculations — one of the reasons I went to a spreadsheet in the first place.
  • AusrackID - Flash-based, so it works in Firefox on Linux. Very nice, but a limited number of hardware choices, so there are Apple Xserves but no Thumpers. There's no way to add hardware and no real generic choices ("4U server", "12-disk 8U array", etc). Also no power calculations.
  • RackTables - GPL'd LAMP app, so dead easy to install. Not bad at all, but it's an early version (0.16.6), so the interface is a bit clumsy and there are lots of features planned RSN. Aims to be a cross between a server room planner and an asset tracker, so that might not fit in with my planned use of GLPI. No power calculations, though there is a request to add SNMP monitoring of APC PDUs.

Still sticking with a spreadsheet for now; it's not the best, but it is flexible and quick. Any other tools I missed?
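
The power calculation that none of these tools do is simple enough to sketch. This is a minimal version, assuming each device has a worst-case draw in watts and each rack has a budget of volts x amps with some headroom. Every name and number below (the devices, the wattages, the 208V/30A feed, the 80% derating) is a placeholder for illustration, not a figure from our racks.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    rack_units: int
    watts: float  # assumed worst-case draw, hypothetical figures


def rack_summary(devices, rack_height_u=42, pdu_volts=208, pdu_amps=30, derate=0.8):
    """Total up rack units and watts, and compare against a derated power budget."""
    used_u = sum(d.rack_units for d in devices)
    watts = sum(d.watts for d in devices)
    budget = pdu_volts * pdu_amps * derate
    return {
        "free_u": rack_height_u - used_u,
        "watts": watts,
        "budget_watts": budget,
        "over_budget": watts > budget,
    }


# Hypothetical rack contents, in the spirit of the generic choices above.
rack = [
    Device("4U server", 4, 1200),
    Device("1U server", 1, 350),
    Device("48-port switch", 1, 150),
]
print(rack_summary(rack))
```

A spreadsheet does exactly the same sums; the only point of writing it down is that the check for "over budget" is one comparison once the per-device numbers exist.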

Tags: hardware

Cable organization porn

We've got a new server room being built right now; it should be done in about six weeks, so I'm putting together an order for bits and pieces that I'll need.

I've mentioned before that cable management is one thing I get obsessed about, so this site is like porn for me. I'm not shilling for them; haven't ordered from them, no idea if they kill puppies in their spare time or what, but holy CRAP this is all the stuff I've ever wanted: RipWrap (so that's what it's called!), label printers, 87 varieties of zap straps, and I don't know what all.

Wow. Just wow.

Edit: Okay, seriously. There's some really good stuff in here among the advertisements.

Tags: hardware

FSF putting together command-line intro for newbies

I just noticed that the Free Software Foundation is putting together what they call a "book sprint" — kind of like the 3-day novel writing contest — to write an intro to the command line for newbies. They're hoping to get it done by next Monday (!).

I like the idea of this project a lot; if I can get some spare time this weekend, I'll definitely be dropping by.

Tags: reading

Laptop suspend mode

Okay, I feel like a bit of a tool for never realizing how cool suspend-to-ram is in a laptop. My new laptop for work is a Dell D630, which I'd got 'cos its hardware is pretty much completely compatible w/Linux. However, I've also figured out that a) Ubuntu does suspend-to-ram quite nicely (aside from a couple times when the keyboard doesn't work, but closing/reopening the lid makes it work), and b) it just sips -- sips, I tell you! -- from the battery.

Now to try and make it work on my own laptop, which is currently sitting at the shop waiting for me to pick it up.

Today's agenda:

  • Install new 48-port switch in server room
  • Update Fedora Directory Server wiki page on building RPMs for/on CentOS
  • Set up mail server to accept mail for older, semi-deprecated domain
  • Drink coffee, catch up on sleep

See? I am still a sysadmin.

Tags: linux hardware networking ldap

Environmental leakage

Just spent an hour trying to debug why a simple Nagios check script was not working. It basically ran lynx -dump | grep desired string, but for some reason was utterly failing to work.
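
For reference, the shape of such a check is tiny. This is not the script in question; it's a sketch that swaps lynx and grep for Python's urllib, with made-up function names. The only things taken as given are the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL, 3 for UNKNOWN).

```python
import urllib.request

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes


def check_body(body: str, needle: str):
    """Return (exit_code, message) depending on whether needle is in body."""
    if needle in body:
        return OK, f"OK - found {needle!r}"
    return CRITICAL, f"CRITICAL - {needle!r} not found"


def check_url(url: str, needle: str, timeout: float = 10.0):
    """Fetch url and look for needle; fetch failures are reported as UNKNOWN."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return UNKNOWN, f"UNKNOWN - could not fetch {url}: {exc}"
    return check_body(body, needle)
```

A wrapper would call check_url with the page and the string, print the message, and exit with the code.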

Eventually I thought to get the script to print out its environment. It turned out that my own environment variables had leaked to the nagios program itself; as a result, lynx was trying (and failing) to open /home/hugh. /etc/init.d/nagios did not (properly? perhaps) clean the environment as I assumed it had. I changed my Makefile to run env -i /etc/init.d/nagios restart, and now it works just fine.
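
The same idea as env -i, sketched in Python for anyone restarting services from a script rather than a Makefile: hand the child an explicit, minimal environment instead of letting it inherit yours. The helper name, the LEAKY_VAR variable and the decision to keep only PATH are assumptions for the demo.

```python
import os
import subprocess
import sys


def run_clean(cmd, keep=("PATH",)):
    """Run cmd with only the named variables copied from our own environment."""
    env = {k: os.environ[k] for k in keep if k in os.environ}
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Simulate a variable from my shell that should not reach the child.
os.environ["LEAKY_VAR"] = "/home/hugh"

# The child just reports whether it can see the variable.
child = [sys.executable, "-c", "import os; print(os.environ.get('LEAKY_VAR'))"]
print(run_clean(child).stdout.strip())
```

The child should report that the variable is unset, even though it is set in the parent.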

(Incidentally, I love Makefiles as a way of scripting stuff you run over and over and over again. Yeah, they're clumsy and I'm not doing anything I couldn't do with a simple script -- but it's a timesaver to just run "make" and be done with it.)

Tags:

git push and the reasons not to do it

After this entry about the difference between push and pull for Mercurial, and how that doesn't fit with the way I instinctively want to use a repository, it's interesting to read Ted Tso responding to a similar complaint from a git user. Tso explains the discrepancy well:

Part of the problem here is that for most git workflows, most people
don't actually use "git push". ....in most large projects, the number
of people [who] need to use the "scm push" command is a very small
percentage of the developer population, just as very few developers
have commit privileges...

Ah, but in a distributed SCM world, things are more
democratic....While this is true, the number of people who need to be
able to publish their own branch is small....

There is one exception to this, of course, and this is a developer
who wants to get started using git for a new project which he or she
is starting and is the author/maintainer, or someone who is
interested in converting their project to git.

The whole entry, plus the comments, is worth reading.

Tags: revisioncontrol

OpenBSD needs donations

As mentioned on Undeadly.org and openbsd-misc, OpenBSD is asking for donations for BGP routers and a new CVS server. I've donated, since I wouldn't be able to do half my job without them; if you feel the same and can spare some money, I urge you to do the same.

Tags: bsd wontyoupleaselendahand

Cooling

Last week was reading week here at UBC. Monday I was off sick. Tuesday we got an email from the folks at the building where we've got guest access to one of their server rooms: the cooling was being shut down from 7am on Wednesday to 3pm on Thursday, so we'd have to turn off our servers. We're guests, so it's not like we've got a lot of say in the matter.

Natch, Thursday 3pm came and went. We got an email at 3:45pm from a manager there, saying that unexpected problems had arisen; they were hoping to have things back up by the weekend. That night I pointed our website at a backup server; it was not serving my boss' big web app, as there was no way to make that tiny little box serve a nearly 1TB database.

Friday I obsessed over the ambient temperature on our firewall (which I'd left turned on); it hovered around 35C. Around 10am we were told that they were hoping to have it on later that day, but that another shutdown might need to be scheduled for the next week (this week). At noon we were told that things were looking hopeful, but they couldn't guarantee cooling over the weekend.

At 2pm I found a local A/C rental agency who told us they'd be out to look at the room on Monday. 4pm I emailed my contact at the other department, plus his manager, to ask for updates and whether any further shutdowns could be scheduled after we'd arranged for cooling.

Over the weekend I obsessed over the temperature some more; it had dropped to 21C and stayed there, but without feedback from the facilities people I was reluctant to trust it.

Monday (yesterday; wow, time flies) we were told that the cooling system should perform well; however, a part still needed to be replaced. It was on order and would be coming in late this week or early next, and would require a four-hour outage.

This morning the cooling guy visited (he was at a funeral yesterday, so fair enough) and said that, yep, we could get a nice portable unit in for around $400 for a week.

I'm not writing this down because I'm proud of how I handled this. I'm writing this down so that someone else can maybe learn the things I should've known:

  • If the cooling is going to be down, arrange for backup. This can be cheap if it's a small room, and it's a hell of a lot nicer than being at other people's mercy.
  • Outage times are estimates, and you should treat them as such.
  • 4pm on a Friday afternoon is not the time to bring up questions that should have been raised on Tuesday.

I have a habit of thinking "There's not much that can be done about that." Actually, it goes even further than that; it doesn't occur to me sometimes to think about what can be done. I'm not sure if this is lack of confidence, or trying too hard to get along, or just sheer laziness, but I'm trying hard to stop doing that.

Tags: hardware warstory

Mercurial for dotfiles

Nicks' post on customizing your home was interesting. Over the last year or so, I've been slowly improving the way I do this. My results have been mixed, probably because of the way I use Mercurial.

So I've got a repo to keep my dotfiles. There's a truly awful script that will symlink the real files to the repo, and doesn't clobber the originals more than one time out of three. I clone to work, or to a laptop, and start customizing. Overall, I feel like this should work... but it's decidedly awkward.
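
A less awful version of that kind of script might look like the sketch below. It is not my actual script; the function name, the .orig backup suffix and the directory layout are all assumptions made for illustration.

```python
from pathlib import Path


def link_dotfiles(repo: Path, home: Path, names):
    """Symlink home/<name> -> repo/<name>, moving any real file aside first."""
    for name in names:
        source = repo / name
        target = home / name
        if target.is_symlink():
            if target.resolve() == source.resolve():
                continue  # already points at the repo copy; nothing to do
            target.unlink()  # stale link to somewhere else; replace it
        elif target.exists():
            # Keep the original instead of clobbering it. Note that an
            # existing .orig file from an earlier run would be overwritten.
            target.rename(target.with_name(name + ".orig"))
        target.symlink_to(source)
```

Calling it as link_dotfiles(Path.home() / "dotfiles", Path.home(), [".bashrc", ".bashrc_local"]) would link both bash files, assuming the repo is cloned to ~/dotfiles. Running it twice is harmless, since an existing correct link is skipped.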

Let's take the case of bash init files. I've got mine divided into .bashrc and .bashrc_local. The latter, as you'd expect, is machine/situation-specific — ssh aliases, commands for work, etc. .bashrc sets various aliases and functions that are unlikely to change. Just before exporting all the environment variables, .bashrc_local is sourced, which gives me a chance to override anything.

.bashrc should be in the repo -- no question about that. But .bashrc_local should be there too, since I may clone my repo at work (say) to another filesystem. Since Mercurial is distributed, there's no problem with this -- except when it comes to merging things back home. Since I think about home as The One True Repo, I want to keep everything there. But usually I've run hg push ssh://home, which promptly clobbers .bashrc_local there (at least when I do an hg update). Or if I merge from home, I end up creating new heads in my repo, and a multi-headed repo can't be pushed. (I'm fuzzy on the details; usually when this happens I bang away at it randomly until merges happen, and swear until I'm blind.)

As outlined here, the difficulty is probably in the way I use Mercurial and the way I've become used to SVN's (and CVS's) idea of branches that look like directories (and are thus very, very visible and easy for me to think about). xyld says, "I'm fed up with having to do hg merge and not actually merge anything, but just to satisfy the Mercurial internals." That's pretty much how I'm starting to feel. There's the option of doing pull, rather than push, to cherrypick the changes I want, but it's still a bit awkward for me to think about.

I understand SVN; it fits well with my brain, which is not a developer's. I understand hg, and I like the idea of distributed repos for certain things. But xyld's comments about switching to git resonate with me, and I may start trying that out.

Tags: revisioncontrol