The Life of a Sysadmin

Carousel is a lie!

Entries tagged "work".

Sigh
12 February 2003 12:00:00 PST

Maybe someone else can use this:

http://www.redhat.com/about/careers/raleigh/index.html#raleigh2

Original entry

Tags: work.
Holy crap, I made the Globe and Mail!
2006-06-12 08:20:56

Tags: meta, work.
Third install!
2006-06-14 05:40:42

In preparation for my new job, I've installed OpenSolaris on Pouxie, my wife's old desktop machine (a nice 2GHz Athlon). I've used Belenix, a live CD that includes a driver for Pouxie's onboard NForce ethernet interface.

So far I'm having a lot of fun. It took me three hours (spread over four days...damn this commute) to get a static IP address assigned to the thing, and then to get DNS working. But after a reinstall (a newer version of Belenix had come out that included the Sun packaging tools, which should let me use Blastwave to grab Emacs...a good first project, I think), I had it up and running in just a few minutes. Progress!

For those playing the home game, here's what I had to do:

  1. modinfo | grep nfo: yep, the module has been loaded.
  2. ifconfig -a | grep nfo0: Not there.
  3. dladm show-link: But it is here.
  4. echo "192.168.23.40 pouxie-2" >> /etc/inet/hosts
  5. echo "pouxie-2" > /etc/hostname.nfo0 ; echo "netmask 255.255.255.0" >> /etc/hostname.nfo0
  6. echo "192.168.23.254" > /etc/defaultrouter
  7. reboot -- -r: to get Solaris to find the new interface (?)
  8. ifconfig -a: Now it shows up configured.
  9. svcadm disable svc:/network/inetmenu: Otherwise, it interferes with the change to nsswitch.conf I'm going to do up ahead.
  10. svcadm enable svc:/network/dns/client: I long to know what this actually turns on.
  11. cp /etc/nsswitch.dns /etc/nsswitch.conf
  12. echo "nameserver 192.168.23.254" >> /etc/resolv.conf
  13. ping www.saintaardvarkthecarpeted.com: It's alive!
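
For anyone who'd rather script the file edits than type them: here's a rough sketch of steps 4, 5, 6 and 12 as one shell function. It writes under a scratch directory, not the live /etc, so you can look the results over first. The function name write_net_config and its argument order are my own invention, not anything Solaris ships; the reboot, the svcadm calls and the nsswitch.conf copy are left out on purpose.

```shell
#!/bin/sh
# Sketch only: write_net_config is a made-up helper, not a Solaris tool.
# It recreates the files from steps 4, 5, 6 and 12 under ROOT so they can
# be reviewed before anything touches the real /etc.
#
# usage: write_net_config ROOT IFACE HOST IP NETMASK GATEWAY DNS
write_net_config() {
    root=$1
    iface=$2
    host=$3
    ip=$4
    netmask=$5
    gateway=$6
    dns=$7

    mkdir -p "$root/etc/inet" || return 1

    # Step 4: map the address to the hostname.
    printf '%s %s\n' "$ip" "$host" >> "$root/etc/inet/hosts"

    # Step 5: hostname.<iface> holds the hostname, then the netmask line.
    printf '%s\nnetmask %s\n' "$host" "$netmask" > "$root/etc/hostname.$iface"

    # Step 6: default route.
    printf '%s\n' "$gateway" > "$root/etc/defaultrouter"

    # Step 12: resolver.
    printf 'nameserver %s\n' "$dns" >> "$root/etc/resolv.conf"
}
```

With the values from the list, that would be: write_net_config /tmp/pouxie nfo0 pouxie-2 192.168.23.40 255.255.255.0 192.168.23.254 192.168.23.254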

Happy birthday, OpenSolaris!

Tags: solaris, work.
Sigh
Wed Apr 9 11:32:25 PDT 2008

This is one of the few things that would make me consider moving to the US right now.

Tags: work.
Back at work...
Mon May 26 13:47:34 PDT 2008

...after a month off, and almost no emergencies in my absence. Sweet!

Now if only I could catch up on sleep. I remember this from the first kid: you never know just how much you can accomplish on so little sleep.

Tags: geekdad, work.
Memo to myself
Wed Jul 16 08:47:41 PDT 2008

How to quiet noisy cron entries that send far too much to STDERR:

exec 3>&1 ; /path/to/script 2>&1 >&3 3>&- | egrep -v 'useless|junk' ; exec 3>&-
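
The one-liner works by parking stdout on file descriptor 3, sending stderr down the pipe to egrep, and pointing stdout back at fd 3 so it skips the filter. Here's the same trick wrapped in a function, as a sketch. The name quiet_stderr is made up, and I've changed one thing: the surviving stderr lines go back to stderr (the one-liner lets them come out on stdout).

```shell
#!/bin/sh
# quiet_stderr PATTERN COMMAND [ARGS...]
# quiet_stderr is a made-up name for illustration.
# Runs COMMAND, drops stderr lines matching the extended regex PATTERN,
# and leaves stdout alone. The exit status is grep's, not COMMAND's,
# which is the same limitation the one-liner has.
quiet_stderr() {
    pattern=$1
    shift
    {
        # Inside the braces, fd 3 is a copy of the caller's stdout.
        #   2>&1  stderr goes into the pipe that feeds grep
        #   >&3   stdout goes to the caller's stdout, bypassing grep
        #   3>&-  neither the command nor grep needs fd 3 itself
        "$@" 2>&1 >&3 3>&- | grep -Ev "$pattern" >&2 3>&-
    } 3>&1
}
```

In a crontab that would look something like: quiet_stderr 'useless|junk' /path/to/script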

I've been very busy of late, but the biggest news is that I've started a 3-month temporary part-time assignment here. It's a neat place, and feels a lot like a software startup. Even though it's a small group, they've got certain hardware requirements that are a lot bigger than what I've worked with before; it'll be interesting, to say the least.

Tags: toptip, work.
That's a mighty big catchup I got goin' there
Thu Sep 25 06:14:13 PDT 2008

Work...hell, life is busy these days.

At work, our (only) tape drive failed a couple of weeks ago; Bacula asked for a new tape, I put it in, and suddenly the "Drive Error" LED started blinking and the drive would not eject the tape. No combination of power cycling, paperclips or pleading would help. Fortunately, $UNIVERSITY_VENDOR had an external HP Ultrium 960 tape drive + 24 tapes in a local warehouse. Hurray for expedited shipping from Richmond!

Not only that, the Ultrium 3 drive can still read/write our Ultrium 2 media. By this I mean that a) I'd forgotten that the LTO standard calls for R/W for the last generation, not R/O, and b) the few tests I've been able to do with reading random old backups and reading/writing random new backups seem to go just fine.

Question for the peanut gallery: Has anyone had an Ultrium tape written by one drive that couldn't be read by another? I've read about tapes not being readable by drives other than the one that wrote them, but haven't heard any accounts first-hand for modern stuff.

Another question for the peanut gallery: I ended up finding instructions from HP that showed how to take apart a tape drive and manually eject a stuck tape. I did it for the old Ultrium 2. (No, it wasn't an HP drive, but they're all made in Hungary...so how many companies can be making these things, really?) The question is, do I trust this thing or not? My instinct is "not as far as I can throw it", but the instructions didn't mention anything one way or the other.

In other news, $NEW_ASSIGNMENT is looking to build a machine room in the basement of a building across the way, and I'm (natch) involved in that. Unfortunately, I've never been involved in one before. Fortunately, I got training on this when I went to LISA in 2006, and there's also Limoncelli, Hogan and Chalup to help out. (That link sends the author a few pennies, BTW; if you haven't bought it yet, get your boss to buy it for you.)

As part of the movement of servers from one data centre across town to new, temporary space here (in advance of this new machine room), another chunk of $UNIVERSITY has volunteered to help out with backups by sucking data over the ether with Tivoli. Nice, neighbourly thing of them to do!

I met with the two sysadmins today and got a tour of their server room. (Not strictly necessary when arranging for backups, but was I gonna turn down the chance to tour a 1500-node cluster? No, I was not.) And oh, it was nice. Proper cable management...I just about cried. :-) Big racks full of blades, batteries, fibre everywhere, and a big-ass robotic Ultrium 2 tape cabinet. (I was surprised that it was 2, and not U3 or U4, but they pointed out that this had all been bought about four or five years ago…and like I've heard about other government-funded efforts, there's millions for capital and little for maintenance or upgrades.)

They told me about assembling most of it from scratch...partly for the experience, partly because they weren't happy with the way the vendor was doing it ("learning as they went along" was how they described it). I urged them to think about presenting at LISA, and was surprised that they hadn't heard of the conference or considered writing up their efforts.

Similarly, I was arranging for MX service for the new place with the university IT department, and the guy I was speaking to mentioned using Postfix. That surprised me, as I'd been under the impression that they used Sendmail, and I said so. He said that they had, but they switched to Postfix a year ago and were quite happy with it: excellent performance as an MTA (I think he said millions of emails per day, which I think is higher than my entire career total :-) and much better Milter performance than Sendmail. I told him he should make a presentation to the university sysadmin group, and he said he'd never considered it.

Oh, and I've completely passed over the A/C leak in my main job's server room…or the buttload of new servers we're gonna be getting at the new job…or adding the Sieve plugin for Dovecot on a CentOS box...or OpenBSD on a Dell R300 (completely fine; the only thing I've got to figure out is how it'll handle the onboard RAID if a drive fails). I've just been busy busy busy: two work places, still a 90-minute commute by transit, and two kids, one of whom is about to wake up right now.

Not that I'm complaining. Things are going great, and they're only getting better.

Last note: I'm seriously considering moving to Steve Kemp's Chronicle engine. Chris Siebenmann's note about the attraction of file-based systems for techies is quite true, as is his note about it being hard to do well. I haven't done it well, and I don't think I've got the time to make it good. Chronicle looks damn nice, even if it does mean opening up comments via the web again…which might mean actually getting comments every now and then. Anyhow, another project for the pile.

Tags: backups, hardware, lisa, meta, networking, work.
I love working at UBC
Wed Nov 19 09:01:16 PST 2008

Just now from the window, over the sound of a stupid high-pressure washer, I heard a Canada goose fly by, honking its head off.

Tags: work.
s/$job_1/$job_2/g
Tue Jan 13 05:37:52 PST 2009

I've been holding off mentioning this 'til all my ducks were in a row, but at last it's settled. The job I've been working at part-time for the last six months will be my full-time job starting next Wednesday. w00t!

I've been spending my time at $job_1 making sure the documentation is complete, getting a spare workstation set up and ready to go, and dumping my brain into the sysadmin who will be helping fill in 'til a new person is hired (which might take a while).

I'm really excited about this. First off, I'll get my lunch hours back; I've been walking between the two offices (mornings at one, afternoons at the other, back to the first for the last half hour), and it'll be nice to have an hour to myself again. But the new job is exciting for me: nice big servers used for scientific computation, the chance to build an infrastructure from scratch, and some big projects. The people are friendly. The boss is nice. The place has funding for the next five years or so. It's all good. About the only thing missing is a rocket pack so I can cut down on this 90-minute commute.

And on top of all that, they're open to the idea of sending me to LISA this year. Now that would be nice…have to see if it works with the family, but I'm keeping my fingers crossed.

In other news:

Tags: beer, ldap, meta, reading, work.
Only a couple days late
Sat Jan 24 14:34:29 PST 2009

Okay, so the other thing I was going to do was blog regularly. And now it's three days later.

But I've been meaning to mention another aspect of the new job as well. When, previous to working here, I'd thought about what I'd like my next job to be like, it was pretty consistent:

The last point needs a bit of expansion. See, my first job in IT was on the helpdesk of a small ISP. There were three of us on helpdesk, one webmaster, one sysadmin, one database guy, one secretary and one manager; I got some mentoring from the sysadmin (who split his time between us and a sister company), but not lots. My second was at a startup company; the guy who hired me was a good mentor, and a while after he left I got to hire a junior and be a mentor to him. The job I just left was pretty much just me, though I was lucky enough to have other people I could talk to; UBC's a big place, but I was in a small department.

So my next job was going to be bigger (as in a bigger installation — maybe a whole data centre, even) and have more people — because I really, really wanted to hang out with my peers and learn from them. I envied the people I'd met at LISA in 2006 who were part of a team, who had people to teach and people to learn from.

Well, at this job it's...just me. Sort of; the folks I've been working with for the last six months (one lab out of the five that make up the centre) are pretty technical. They know way more about Java and MySQL and web development and how the latest CPUs from Intel compare with AMD than I do. But I'm the sysadmin. There might be another in the future, but there isn't now.

But! But, there are two sysadmins on the floor above me who work in another department. For various reasons, we're going to be working closely for the foreseeable future. On Friday, I went up to talk with them about how that was going to work out.

They knew stuff I didn't know -- no surprise there -- but it turned out I could show them a trick or two as well. We swapped war stories, discussed our very different backgrounds (saved for another entry), and just shot the shit. It was wonderful.

It's weird, because I'm an introvert, and not very socially apt. (Or ept. As in "opposite of inept".) But it's really, really nice to get together with people who like being a sysadmin the way I do.

(This entry brought to you by the number i, the letter Ve, and my youngest son's 90-minute nap.)

Tags: geekdad, work.
Squint
Tue Apr 28 16:34:11 PDT 2009

This has been one of those days where all I've done is stare at monitors too closely.

I know, I'm a sysadmin, what do I expect? But some days I get up, move around; I'm sedentary (and introverted) by nature but I try to talk to people, stare off into the distance, get away from my desk. Going to the server room is always a good break.

Not today, though. My carefully-chosen ATI video card (the Radeon 4550) is giving me headaches, metaphorical and real:

Dual monitors are important. My own damn fault for not getting something old enough...

Tags: hardware, linux, work.
New server room ours at last
Wed Jun 10 21:07:30 PDT 2009

Given the recent hoo-ha about abandoned blogs, and my own tendency to lose interest in writing about something the longer I put it off (I haven't graphed it, but I suspect it's a nice exponential decay), I figured I should finally write up what I've been doing the last week: the move at $WORK to our new server room.

So: construction finally got finished on our new server room. Our UPS was installed, our racks set up, and the keys handed over (though they were to be changed again twice). Our new netblock was assigned, the Internet access at the new location was in place, and movers were booked.

Things I did in advance which helped immensely:

Last Thursday morning, it all started. I got the machines shut down (thank you, SSH and ubiquitous wireless access at UBC) before the two volunteers who were helping me showed up. We started getting machines unracked; since it was only about 20 machines, I figured it wouldn't take too long. While that was true, I had not counted on the rat's nest of power cables (our power requirements were such that we had to connect machines to PDUs in adjacent racks), or the fact that we wouldn't be able to disassemble that 'til we'd got the machines out.

There was one heartstopping moment: a 1U server, while extended on its rails, came off one of the rails while no one was supporting it. Amazingly the other rail held on while it rotated quickly through 90 degrees to bang loudly against the rack. "You swear quickly," the movers remarked. (Doubly amazingly, the machine seems to be fine, though the rails for the thing are shot.)

The movers were big and burly, which was wonderful when it came to moving the Thumper. I weigh more than it does, but not by much, and I'd had the bad fortune to screw up my back a week before the move. It was tricky trying to figure out how to remove it from the rails, but the movers' trick of supporting it with a couple of big blankets, while fully extended from the rack, made such considerations less urgent. Eventually we got it figured out. I don't know how that could have gone smoother, since we'd got Sun to rack the thing and, frankly, it's not like you spend a lot of time un- and re-racking something like that. Anyhow, a minor point.

The new location was right around the corner, which was handy. The movers had put the servers in these big laundry-like carts on wheels; in the end, we only had four of 'em. We got the machines unloaded, racked the Thumper with the movers' help, signed the paper, then went off for lunch where we picked up two more volunteers.

After that, we started racking servers. Having only one sysadmin around (me) proved to be a bottleneck; the volunteers had not worked with rackmounted machines before, and I kept having to stop what I was doing to explain something to them. It would have been a great help to have another admin around; in fact, I think this is the biggest move I'd want to make without some other admin around.

Problems we ran into:

Things that went well:

I'm going to post this now because if I don't, it'll never get done. I may come back and revise it later, but better this than nothing at all.

Tags: emacs, hardware, serverroom, work.
What to ask when taking over external servers?
Mon Sep 21 16:16:42 PDT 2009

At $WORK, I'm going to be taking over the administration of four servers that currently do stuff for a variety of researchers scattered around the province. There are a number of players here:

The owning agency has also ponied up for an upgrade to the four servers; I'll be taking delivery some time next week.

I've got some preliminary information -- what the servers do, how the users use the thing, etc -- but I'm preparing a more detailed plan. In the meantime, I've compiled a list of questions for my local contact.

In the middle of that, it occurred to me that this would be a good discussion topic. Have I missed anything? Let me know!

2 comments. Tags: migration, work.
Zounds
Mon Dec 7 16:19:57 PST 2009

Busy day:

Tonight, bed at 8.30pm. And there's no shame in that.

2 comments. Tags: geekdad, work.
Xmas maintenance
Thu Dec 31 05:57:47 PST 2009

A nice thing about working at a university is that you get all this time off at Xmas; however, it's also the best possible time to do all the stuff you've been saving up. Last year my time was split between this job and my last; now, the time's all mine, baby.

Today will be my last of three days in a row where the machines have been all mine to play with^W^Wupgrade. I've been able to twiddle the firewall's NIC settings, upgrade CentOS using Cfengine, and set up a new LDAP server using Cobbler and CentOS Directory Server. I've tested our UPS' ATS, but discovered that NUT is different from APCUPSD in one important way: it doesn't easily allow you to say "shut down now, even though there's 95% battery left". I may have to leave testing of that for another day.

It hasn't all gone smoothly, but I've accomplished almost all the important things. This is a nice surprise; I'm always hesitant when I estimate how long something will take, because I feel like I have no way of knowing in advance (interruptions, unexpected obstacles...you know the drill). In this case, the time estimates for individual tasks were, in fact, 'way paranoid, but that gave me the buffer that I needed.

One example: after upgrading CentOS, two of our three servers attached to StorageTek 2500 disk arrays reported problems with the disks. Upon closer inspection, they were reporting problems with half of the LUNs that the array was presenting to them -- and they were reporting them in different ways. It had been a year or longer since I'd set them up, and my documentation was pretty damn slim, so it took me a while to figure it out. (Had to sleep on it, even.)

The servers have dual paths to the arrays. In Linux, the multipath drivers don't work so well with these, so we used the Sun drivers instead. But:

  1. You have to rebuild the drivers after a kernel change.
  2. This only showed up on two servers because the third server had not upgraded its kernel (or indeed, any of its packages). Why? cfservd had refused its connection because I had the MaxConnections parameter too low.
  3. And of the two that did upgrade, the one machine we'd tested the Linux drivers on still had an old multipath.conf file in /etc, which, even though the multipathd service wasn't starting up, was enough to get drivers loaded. This took a while to figure out because I'd completely forgotten how to tell which driver was in use.

I got it fixed in the end, and I expanded the documentation considerably. (49,000 words and counting in the wiki. Damn right I'm bragging!)

Putting off 'til next time, tempted though I am: reinstalling CentOS on the monitoring machine, which due to a mix of EPEL and Dag repos and operator error appears to be stuck in a corner, unable to upgrade without ripping out (say) Cacti. I moved the web server to a backup machine on Tuesday, and I'll be moving it back today; this is not the time to fiddle with the thing that's going to tell me I've moved everything back correctly.

(Incidentally, thanks to Matt for the rubber duck, who successfully talked me down off the roof when I was mulling this over. Man, that duck is so wise...)

Last day today. (Like, ever!) If I remember correctly I'm going to test the water leak detector...and I forget the rest; it's all in my daytimer and I'm too lazy to get up and look right now. Wish me luck.

And best of 2010 to all of you!

Tags: centos, cfengine, monitoring, packagemanagement, serverroom, upgrades, work.
Must change title
Tue Jan 12 05:38:13 PST 2010

Happy 2010 everyone! Now that it seems to be well and truly under way, I feel I can say that safely.

It's been busy so far. All the stuff I didn't do in 2009 is still on my plate...which is obvious, right? but it still caught me by surprise after the 3 days doing Xmas maintenance on my own. It was easy to forget that there are, you know, people waiting to show up and do work.

Like the new students we've got for one of the faculty members. I'd upgraded OpenSuSE on their new workstations over the holidays, then when they came in yesterday the carefully-tweaked dual monitor displays weren't working. Arghh.

Or the guy who's let me know that he wants to get moving on the MySQL/PHP website he's building...which reminds me that I've still got to move the website to a virtualized machine. I'm tempted to do that RIGHT NOW and put his site in there, but I don't think that'll be the best way to do it.

Or the new project my boss is part of, which involves researchers from across Canada. For me, it's a new website, hardware recommendation and purchases, maybe a new LDAP server. I could add a new root suffix to the existing LDAP server, but

  a. we don't need it yet
  b. that seems like it'll make it more difficult to move later
  c. while I can create one in the existing LDAP server (Fedora/389/CentOS DS), the cn=config tree seems suspiciously empty of any entries related to the new root...so I'm leery of trusting it.

I still haven't sat down yet and tried to plan my year. Partly I've been busy, partly my planning tools are a bit of a mess (daytimer + orgmode + RT). But at some point I need to get my priorities straight and oh, how I long to have them straight. I feel a bit like I'm spinning my wheels right now.

Ah well. In other news, Xmas was good; my kids got two guitars (one acoustic with an Elmo sticker, one fake double-neck electric) which makes four guitars they have now. Since they no longer have that to fight over, they've taken to fighting over a microphone (cardboard tube stuck in a toy that acts like a stand). But damnit, they're still cute.


Finally: Just for fun right now I did a word count of all my blog entries. I've been blogging since 2004, and I've got something like 158,000 words. Amazing. And there are still some entries I've got to grab from my old Slashdot journal.

Tags: geekdad, work.
Hopping
Tue Apr 27 16:26:40 PDT 2010

Been busy lately:

But hey! Turns out we live in a constitutional democracy after all. There was some debate about this at 24 Sussex Drive, I understand. Score one for the good guys.

Tags: dell, hardware, politics, work.
Rule #3 of sysadmin club
Tue Aug 17 06:16:55 PDT 2010

I'm trying to get Bacula to make a separate copy of monthly full backups that can be kept off-site. To do this, I'm experimenting with its "Copy" directive. I was hoping to get a complete set of tapes ready to keep offsite before I left, but it was taking much longer than anticipated (2 days to copy 2 tapes). So I cancelled the jobs, typed unmount at bconsole, and went home thinking Bacula would just grab the right tape from the autochanger when backups came.

What I should have typed was release. release lets Bacula grab whatever tape it needs. unmount leaves Bacula unwilling to do anything on its own, and it waits for the operator (ie, me) to do something.

Result: 3 weeks of no backups. Welcome back, chump.

There are a number of things I can do to make sure this doesn't happen again. There's a thread on the Bacula-users mailing list (came up in my absence, even) detailing how to make sure something's mounted. I can use release the way Kern intended. I can set up a separate check that goes to my cell phone directly, and not through Nagios. I can run a small backup job manually on Fridays just to make sure it's going to work. And on it goes.
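
The "separate check" could be as dumb as comparing the time of the last good backup against the clock. Here's a minimal sketch of that comparison. The name backup_is_stale is made up, and I'm assuming you already have the end time of the last successful job as seconds since the epoch; how you get that number out of Bacula, and how the alert reaches a phone, is left out. It also assumes your date understands +%s, which the GNU and BSD ones do.

```shell
#!/bin/sh
# backup_is_stale LAST_OK_EPOCH MAX_AGE_DAYS [NOW_EPOCH]
# backup_is_stale is a made-up name for illustration.
# Exits 0 (true) when the last good backup is more than MAX_AGE_DAYS old.
# NOW_EPOCH is optional and defaults to the current time; it is there
# mostly so the check can be tried out with fixed numbers.
backup_is_stale() {
    last_ok=$1
    max_age_days=$2
    now=${3:-$(date +%s)}

    age=$(( now - last_ok ))

    # 86400 seconds in a day.
    [ "$age" -gt $(( max_age_days * 86400 )) ]
}
```

Run from cron, the idea is: if backup_is_stale "$last_ok" 2; then complain by some route that doesn't depend on Nagios; fi. Three weeks of silence would have tripped that on day three.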

I knew enough not to make changes as root on Friday before going on vacation. But now I know that includes backups.

Tags: backups, fail, work.
