Packaging science

So the other day I was asked to help get a bioinformatics tool working. The tarball was up on SourceForge, so it shouldn't be a problem, right? Right. Download, skim the instructions, run "make" and we're done. Case closed!

Only I had to look. Which was a mistake. Because inside the tarball was another tarball: GNU coreutils, version 8.22, which was dutifully compiled as part of the build. It had been committed to the repo about 18 months ago, with this explanation:

this will create a new sort that is used by chrysalis to run sort in parallel speedup on hour system running a 13g dataset was from 46min to 6min runtime

That is a significant speedup. Yes. And sure, 8.22 is newer than the version in the latest Ubuntu LTS (8.13), and 'way newer than the version in CentOS 5 (5.97). But that's still a tarball, even if it's only 8 MB, sitting in the Subversion repo of a project that was published in Nature Protocols. Why in hell wasn't it listed as a dependency in the README? So yeah, I got angry: "I think I'm gonna submit a patch with an Ubuntu ISO in it, see if they accept it."

I'm struggling with what to write here. This is bad practice, yes, but what constructive, helpful alternative do I have to offer? The scientists I work with are brilliant people who do amazing research, but their knowledge of proper (add scare quotes if you like) development practice is sorely lacking. It's not their fault, and folks like Software Carpentry are doing the angels' work to get them up to speed. But riddle me this: if you're trying to get a tool into the hands of a fairly new Linux user -- one who's going to base the next 18 months of their work on how well your tool works -- how do you handle this sort of thing?

We have no good alternative to offer. I can be snotty all I want (confession: IT'S SO MUCH FUN), but the truth is that this is a hard problem, and people who just want to get shit done are doing the best they can, because they just want to get shit done. We have been -- still are -- failing them. And I don't know what to do.