Working with Rocks
01 Dec 2010
So we want a cluster at $WORK. I don't know a lot about this, so I figure that something like Rocks or OSCAR is the way to go. OSCAR didn't look like it had been worked on in a while, so Rocks it is. I downloaded the CDs and got ready to install on a handful of old machines.
(Incidentally, I was a bit off-base on OSCAR. It is being worked on, but the last "production-ready" release was version 5.0, in 2006. The newest release is 6.0.5, but as the documentation says:
Note that the OSCAR-6.0.x version is not necessarily suitable for production. OSCAR-6.0.x is actually very similar to KDE-4.0.x: this version is not necessarily "designed" for the users who need all the capabilities traditionally shipped with OSCAR, but this is a good new framework to include and develop new capabilities and move forward. If you are looking for all the capabilities normally supported by OSCAR, we advice you to wait for a later release of OSCAR-6.1.
So yeah, right now it's Rocks.)
Rocks promises to be easy: you install a frontend, and the frontend then installs all your compute nodes. Software comes in rolls (collections of packages), and you pick the ones you want. Everything is easy. Whee!
Only it's not that way, at least not consistently.
I'm reinstalling this time because I neglected to install the Torque roll last time. In theory you can add a roll to an already-existing frontend, but I couldn't get it to work.
A lot of stuff -- no, that's not true. Some stuff in Rocks is just not documented very well, and it's the small but important stuff, the kind that's irritating when it's missing. For example: want the internal compute nodes to be LDAP clients, rather than syncing /etc/passwd all around? That means modifying /var/411/Files.mk to include files like /etc/nsswitch.conf and /etc/ldap.conf. That's documented for the 4.x series, but it has been left out of the 5.x documentation. I can't tell why; the mechanism is still in use.
When you boot from the Rocks install CD, you're told to type "Build" (to build a new frontend) or "Rescue" (to go into rescue mode). What it doesn't tell you is that if you don't type something quickly enough, it boots into a regular-looking but actually non-functional CentOS install. Instead of booting from the hard drive or waiting for you to type something, it runs for a few minutes and then craps out, complaining that it can't find certain files. You have to reboot to get another chance.
Right now I'm reinstalling the frontend for the THIRD TIME in two days. For some reason, the installation keeps crapping out and refusing to store the static IP address for the frontend's outward-facing interface. Reinstalling means sitting in the server room for an hour feeding CDs into old servers (there's no network installation without an already-existing frontend, and these machines have no DVD drives), then waiting another half hour to see what's gone wrong this time.
Sigh.
2 Comments
From: Daniel Howard
2 December 2010 00:33:43
Well-titled. :>
I have no Rocks experience, but mayhaps the installation testing could be done on a modern workstation within a virtualization package, the test servers munching through ISO files. Test cycles in the server room get old quick.
If you can't virtualize, I hope you have some earplugs and/or something like the "Peltor H10A Optime 105 Over-the-Head Earmuff" Amazon.com is selling for $20.
Have fun! Stay safe! Keep it sane!
-danny
From: Paul
2 December 2010 20:08:22
That default CD boot looks for an already configured head node for all its configuration info. It's designed that way so that you can easily add many nodes to the cluster with no need to attach a keyboard, mouse, or even a screen. Yes, it makes it important to be there watching the screen while you first set up the frontend server, but it saves a lot of time when you're adding many (400+ in our setup) nodes to the cluster.
I do agree that the documentation could be improved; I end up searching http://marc.info/?l=npaci-rocks-discussion and Google. At some point I'm going to clean up the notes I've taken over the years and post them somewhere.
To be fair, this isn't a commercial product, so no one's getting paid to write complete documentation. They were granted some funds from the National Science Foundation, and I'm sure they're focusing their attention on improving the software. Maybe there's some company that will sell support for it.
I've looked around at a few clustering alternatives, and Rocks really is the easiest of the bunch. If you had used versions 3 and 4, you'd appreciate the fixes and ease of use of version 5 a lot more.
I think I once had a server with just a CD drive in it... ended up attaching a DVD drive by USB.
Good luck,
-Paul