Rocks Lesson Part 1 -- BLAS

There's a lot to clusters. I'm learning that now.

At $WORK, we're getting a cluster RSN -- rack fulla blades, head node, etc. I haven't worked with a cluster before, so I'm practicing with a test one: three little nodes, dual-core CPUs, 2 GB of memory each, set up as one head node and two compute nodes. I'm using Rocks to manage it.

Here are a few stories about things I've learned along the way.

BLAS

You find a lot of references to BLAS when you start reading software requirements for HPC, and not a lot explaining it.

BLAS stands for "Basic Linear Algebra Subprograms"; the original web page is at netlib (http://www.netlib.org/blas/). Wikipedia calls it "a de facto application programming interface standard for publishing libraries to perform basic linear algebra operations such as vector and matrix multiplication." This is important to realize, because, as the Wikipedia article suggests, common usage of the term refers more to an API than to any particular implementation; there is a reference implementation, but it's not really used much.

As I understand it -- and I invite corrections -- BLAS chugs through linear algebra and comes up with an answer at the end. Brute force is one way to do this sort of thing, but there are ways to speed up the process, and they can make a huge difference in how long a calculation takes. Some are smarter algorithms. Some are ways of compiling or writing the library routines to take advantage of a particular processor's capabilities -- cache sizes, vector instructions and the like -- so that the same operations run faster.
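To make the "one API, many implementations" point concrete, here's a minimal sketch of calling DGEMM, the general matrix-multiply routine, through the C interface (CBLAS). I'm assuming a CBLAS header and library are installed -- both the netlib reference BLAS and ATLAS can provide one -- and the link flags in the comment are typical guesses that vary by system. The same code runs against whichever implementation you link in; only the speed changes.

```c
/* Minimal CBLAS sketch: C = alpha*A*B + beta*C via DGEMM.
 * Assumes a CBLAS header/library is installed; link flags vary by
 * implementation, e.g.:
 *   cc dgemm_demo.c -o dgemm_demo -lcblas -lblas    (reference BLAS)
 *   cc dgemm_demo.c -o dgemm_demo -lcblas -latlas   (ATLAS)
 */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    /* Two 2x2 matrices, stored row-major. */
    double A[4] = { 1.0, 2.0,
                    3.0, 4.0 };
    double B[4] = { 5.0, 6.0,
                    7.0, 8.0 };
    double C[4] = { 0.0, 0.0,
                    0.0, 0.0 };

    /* C = 1.0 * A * B + 0.0 * C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,      /* M, N, K       */
                1.0, A, 2,    /* alpha, A, lda */
                B, 2,         /* B, ldb        */
                0.0, C, 2);   /* beta, C, ldc  */

    /* Expect: 19 22 / 43 50 */
    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```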

There are two major open-source BLAS implementations; the one that matters here is [ATLAS], "Automatically Tuned Linear Algebra Software", which benchmarks candidate routines at compile time and keeps the fastest versions for the machine it's being built on.

As such, compilation of ATLAS is a big deal, and the resulting binaries are tuned to the CPU they were built on. Not only do you need to turn off CPU throttling, but you need to build on the CPU you'll be running on. Pre-built packages are pretty much out.

ATLAS used to be included in the HPC roll of the Rocks 4 series. Despite [irritatingly out-of-date information][13], this has not been the case in a while.

[LAPACK] "is written in Fortran 90 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems." It needs a BLAS library. From the FAQ:

> Why aren't BLAS routines included when I download an LAPACK routine?
>
> It is assumed that you have a machine-specific optimized BLAS library already available on the architecture to which you are installing LAPACK. If this is not the case, you can download a Fortran77 reference implementation of the BLAS from netlib.
>
> Although a model implementation of the BLAS is available from netlib in the blas directory, it is not expected to perform as well as a specially tuned implementation on most high-performance computers -- on some machines it may give much worse performance -- but it allows users to run LAPACK software on machines that do not offer any other implementation of the BLAS.
>
> Alternatively, you can automatically generate an optimized BLAS library for your machine, using ATLAS (http://www.netlib.org/atlas/).

(There is an RPM called "blas-3.0" available for Rocks; given the URL it lists (http://www.netlib.org/lapack/), it appears to be the model implementation described above. It installs to /usr/lib64/libblas.so*, and the library is registered with ldconfig.)
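While we're here, a quick sketch of how LAPACK sits on top of BLAS: DGESV solves a linear system A x = b, and under the hood it does its heavy lifting through BLAS calls, which is why LAPACK insists on having a BLAS library around. Since the blas-3.0 RPM is the Fortran reference implementation, this sketch uses the Fortran calling convention directly: trailing underscore on the symbol, everything passed by pointer, column-major matrices. The -llapack -lblas link flags are my guess at typical package names; yours may differ.

```c
/* Sketch: solve A x = b with LAPACK's DGESV, which calls down into BLAS.
 * Assumes a LAPACK library is installed alongside the BLAS one, e.g.:
 *   cc dgesv_demo.c -o dgesv_demo -llapack -lblas
 */
#include <stdio.h>

/* Fortran symbol: trailing underscore, all arguments by pointer. */
extern void dgesv_(const int *n, const int *nrhs, double *a, const int *lda,
                   int *ipiv, double *b, const int *ldb, int *info);

int main(void)
{
    /* Solve  3x +  y = 9
     *         x + 2y = 8   -- expect x = 2, y = 3. */
    double a[4] = { 3.0, 1.0,    /* first column:  A11, A21 */
                    1.0, 2.0 };  /* second column: A12, A22 (column-major) */
    double b[2] = { 9.0, 8.0 };
    int n = 2, nrhs = 1, ipiv[2], info;

    dgesv_(&n, &nrhs, a, &n, ipiv, b, &n, &info);

    if (info == 0)
        printf("x = %g, y = %g\n", b[0], b[1]);
    else
        printf("dgesv_ failed: info = %d\n", info);
    return 0;
}
```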

Point is, you'll want a BLAS implementation, but you've got two (at least) to choose from -- and if you want the tuned one, you'll need to compile it yourself. I get the impression that the choice of BLAS library is something that can vary with religion, software, environment and so on... which means you'll probably want to look at something like environment modules to manage all this.

Tomorrow: Torque, Maui and OpenMPI.