Last year I came across the issue of reproducible science:
the question of how best to ensure that published science can be
reproduced by other people, whether because you want to fight
fraud or simply be sure that there's something really real
happening. I'm particularly interested in the segment of this debate
that deals with computer-driven science, its data and its source code.
Partly that's because I very much believe in Free as in Freedom,
and partly it's because of where I work: a lot of the people I
work with are doing computer-driven scientific research, and dealing
with source code (theirs and others') is very much part of my job.
So it was interesting to read a blog post from Iddo Friedberg called
"Can we make accountable research software?" Friedberg is an
actual scientist and everything (and an acquaintance of my
boss, which was a surprise to me). He wrote about two aspects of
scientific programming that I hadn't realized before:
- the prevalence -- no, requirement -- of quick, prototype code, much
  of it to be discarded when the hypothesis it explores doesn't pan
  out
- the inability to get funding for maintaining code, or improving it
  for public use
As a result, code written by scientists (much of it, in his words,
"pipeline hacks") is almost never meant to be robust, or used by
others, or broadly applicable. Preparing that code for a paper is
even more work than writing the paper itself. But just dumping
all the code is not an option, either: "No one has the time or energy
to wade through a lab's paper- and magnetic- history trail. Plus, few
labs will allow it: there is always the next project in the lab's
notebooks and meetings, and no one likes to be scooped." And even if
you get past all that, you're still facing trouble. What if:
- a reviewer can't make the code compile/run and rejects the paper?
- the code really isn't fit for human consumption, and there are lots
  of support demands his lab can't fulfill?
- someone steals an idea latent in his code and his lab gets scooped?
Friedberg's solution: add an incentive for researchers to provide
tested, portable code. The Bioinformatics Testing Consortium, of
which he's a member, will affix gold stars to papers with code that
volunteer reviewers will smoke-test, file bugs against and verify
against a sample dataset. Eventually, the gold star will signify a
paper that's particularly cool, everyone will want one, and we'll have
all the code we need.
However, even then he's not sure that all code needs to be released.
He writes in a follow-up post:
If the Methods section of the paper contains the description and
equations necessary for replication of research, that should be
enough in many cases, perhaps accompanied by code release
post-acceptance. Exceptions do apply. One notable exception would
be if the paper is mostly a methods paper, where the software -- not
just the algorithm -- is key.
Another exception would be the paper Titus Brown
and Jonathan Eisen wrote about: where the software is so
central and novel, that not peer-reviewing it along with the paper
makes the assessment of the paper's findings impossible.
(More on Titus Brown and the paper he references ahead.)
There were a lot of replies, some of which were in the Twitter
conversation that prompted the post in the first place (yes,
these replies TRAVELED THROUGH TIME): things like, "If it's not good
enough to make public, why is it good enough to base publications on?"
and "how many of those [pipeline] hacks have bugs that change
Then there was this comment from someone who goes by "foo":
I'm a vanilla computer scientist by training, and have developed a
passion for bioinformatics and computational biology after I've
already spent over a decade working as a software developer and -- to
make things even worse -- an IT security auditor. Since security and
reliability are two sides of the same coin, I've spent years
learning about all the subtle ways software can fail.
During my time working in computational biology/bioinformatics
groups, I've had a chance to look at some of the code in use there,
and boy, can I confirm what you said about being horrified. Poor
documentation, software behaving erratically (and silently so!)
unless you supply it with exactly the right input, which is of
course also poorly documented, memory corruption bugs that will
crash the program (sucks if the read mapper you're using crashes
after three days of running, so you have to spend time to somehow
identify the bug and do the alignment over, or switch to a different
read mapper in the hope of being luckier with that), or a
Perl/Python-based toolchain that will crash on this one piece of
oddly formatted input, and on and on. Worst of all, I've seen bugs
that are silent, but corrupt parts of the output data, or lead to
invalid results in a non-obvious way.
I was horrified then because I kept thinking "How on earth do people
get reliable and reproducible results working like this?" And now
I'm not sure whether things somehow work out fine (strength in
numbers?) or whether they actually don't, and nobody really notices.
The commenter goes on to explain how one lab he worked at hired a
scientific programmer to take care of this. It might seem
extravagant, but it lets the biologists do biology again. (I'm
reminded of my first sysadmin job, when I was hired by a programmer
who wanted to get back to programming instead of babysitting
machines.) foo writes: "It's also noteworthy that having technical
assistants in a biology lab is fairly common -- which seems to be a
matter of the perception of "best practice" in a certain discipline."
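To make that concrete (a hypothetical sketch of my own, not anything
from foo's actual code): here's the difference between a parser that
silently drops malformed lines and one that fails loudly. The file
format and function names are invented for illustration.

    # Hypothetical example: parsing a tab-separated file of per-gene counts.
    # The "quiet" version silently skips malformed lines, which can corrupt
    # downstream results without anyone noticing; the "loud" version stops
    # immediately and says exactly where and why.

    def parse_counts_quiet(path):
        counts = {}
        for line in open(path):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 2:
                continue  # the malformed line vanishes without a trace
            name, value = fields
            counts[name] = int(value)
        return counts

    def parse_counts_loud(path):
        counts = {}
        with open(path) as handle:
            for lineno, line in enumerate(handle, start=1):
                fields = line.rstrip("\n").split("\t")
                if len(fields) != 2:
                    raise ValueError(
                        "%s, line %d: expected 2 tab-separated fields, got %d"
                        % (path, lineno, len(fields)))
                name, value = fields
                counts[name] = int(value)  # still fails on non-numeric values
        return counts

The quiet version runs to completion either way; if a third of your
input was mangled, you'd never find out unless the final numbers
happened to look odd.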
Deepak Singh had two points:
- "Scientific programmers are either poor programmers or lazy
programmers. That means that a lot of the reasons scientific code is
not robust or maintainable is because they don't know how to write
- "Technical debt" is the name for this situation, and it will bite
you in the ass. "I wonder if people would take such shortcuts with
their lab protocols?"
Meanwhile, Greg Wilson got some great snark in:
I might agree that careful specification isn't needed for research
programming, but error checking and testing definitely are. In fact,
if we've learned anything from the agile movement in the last 15
years, it's that the more improvisatory your development process is,
the more important careful craftsmanship is as well -- unless, of
course, you don't care whether your programs are producing correct
answers or not.
[Rapid prototyping rather than careful, deliberate development] is
equally true of software developed by agile teams. What saves them
from [code that is difficult to distribute or maintain] is
developers' willingness to refactor relentlessly, which depends in
turn on management's willingness to allow time for that. Developers
also have to have some idea of what good software looks like, i.e.,
of what they ought to be refactoring to. Given those things, I think
reusability and reproducibility would be a lot more tractable.
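For what it's worth, the error checking and testing Wilson is talking
about doesn't have to be heavyweight. Here's a minimal sketch of what
I mean: a made-up function from an imaginary pipeline, plus a couple
of sanity tests with known answers. (The function, the file name and
the test values are all invented for illustration.)

    # test_gc.py -- a tiny function from an imaginary pipeline, plus tests
    # with known answers.  Run with:  python -m pytest test_gc.py
    import pytest

    def gc_content(sequence):
        """Return the fraction of G and C bases in a DNA sequence."""
        sequence = sequence.upper()
        if not sequence:
            raise ValueError("empty sequence")
        gc = sum(1 for base in sequence if base in "GC")
        return gc / len(sequence)

    def test_gc_content_known_answers():
        assert gc_content("GGCC") == 1.0
        assert gc_content("atat") == 0.0
        assert abs(gc_content("GATC") - 0.5) < 1e-9

    def test_gc_content_rejects_empty_input():
        with pytest.raises(ValueError):
            gc_content("")

Two known-answer tests and one ValueError won't catch everything, but
they're exactly the kind of cheap check that would have flagged the
silent breakage foo described.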
Kevin Karplus doubted that the Bioinformatics Testing Consortium
would do much:
- getting it ready to test is "90% of the problem" in the first place
- it'll be hard to get volunteers, especially ones good at testing software
(He also writes that the volunteers who are careful software
developers are not the main problem -- which I think misses the point,
since the job of reviewer is not meant to be punishment for causing a
problem.)
He worries that providing the code makes it easy to forget that proper
verification of computational methods comes from an independent
re-implementation of the method:
I fear that the push to have highly polished distributable code for
all publications will result in a lot less scientific validation of
methods by reimplementation, and more "ritual magic" invocation of
code that no one understands. I've seen this already with code like
DSSP, which almost all protein structure people use for identifying
protein secondary structure with almost no understanding of what
DSSP really does nor exactly how it defines H-bonds. It does a good
enough job of identifying secondary structure, so no one thinks
about the problems.
C. Titus Brown jumped in at that point. Using the example of a
software paper published in Science without the code being
released, he pointed out that saying "just re-implement it
independently" glosses over a lot of hard work with little reward:
[...] we'd love to use their approach. But, at least at the moment,
we'd have to reimplement the interesting part of it from scratch,
which will take both a solid reimplementation effort as well as
guesswork, to figure out parameters and resolve unclear algorithmic
choices. If we do reimplement it from scratch, we'll probably find
that it works really well (in which case Iverson et al. get to claim
that they invented the technique and we're derivative) or we'll find
that it works badly (in which case Iverson et al. can claim that we
implemented it badly). It's hard to see this working out well for
us, and it's hard to see it working out poorly for Iverson et al.
But he also insisted that the code matters to science. To quote at
length:
All too often, biologists and bioinformaticians spend time hunting
for the magic combination of parameters that gives them a good
result, where "good result" is defined as "a result that matches
expectations, but with unknown robustness to changes in parameters
and data." (I blame the hypothesis-driven fascista for the
attitude that a result matching expectations is a good thing.) I
hardly need to explain why parameter search is a problem, I hope;
read this fascinating @simplystats blog post for some
interesting ideas on how to deal with the search for parameters that
lead to a "good result". But often the result you achieve are only a
small part of the content of a paper -- methods, computational and
otherwise, are also important. This is in part because people need
to be able to (in theory) reproduce your paper, and also because in
larger part progress in biology is driven by new techniques and
technology. If the methods aren't published in detail, you're
short-changing the future. As noted above, this may be an excellent
strategy for any given lab, but it's hardly conducive to advancing
science. After all, if the methods and technology are both robust
and applicable to more than your system, other people will use them
-- often in ways you never thought of.
What's the bottom line? Publish your methods, which include your
source code and your parameters, and discuss your controls and
evaluation in detail. Otherwise, you're doing anecdotal science.
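One cheap way to take that last point seriously -- my suggestion, not
something from Brown's post -- is to make the parameters part of the
output, so a result can never get separated from the settings and
code version that produced it. A hypothetical sketch (the parameter
names are made up):

    # Hypothetical sketch: write the exact parameters (and code version)
    # alongside every result, so reruns and reviewers can see what made it.
    import datetime
    import json
    import os
    import subprocess

    params = {
        "kmer_size": 31,        # invented parameters, for illustration only
        "min_coverage": 5,
        "random_seed": 42,
    }

    def record_run(outdir, params):
        try:
            commit = subprocess.check_output(
                ["git", "rev-parse", "HEAD"]).decode().strip()
        except Exception:
            commit = "unknown"
        manifest = {
            "parameters": params,
            "code_version": commit,
            "timestamp": datetime.datetime.now().isoformat(),
        }
        os.makedirs(outdir, exist_ok=True)
        with open(os.path.join(outdir, "run_manifest.json"), "w") as handle:
            json.dump(manifest, handle, indent=2)

    record_run("results", params)

Now the manifest travels with the result, and "which parameters gave
us that figure?" stops being an archaeology project.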
I told you that story so I could tell you this one.
I want to point something out: Friedberg et al. are talking past each
other because they're conflating a number of separate questions:
- When do I need to provide code? Should I have to provide code for a
  paper as part of the review process, or is it enough to make it
  freely available after publication, or is it even needed in the
  first place?
- If I provide it for review, how will I ensure that the reviewers
  (pressed for time, unknown expertise, running code on unknown
  platforms) will be able to even compile/satisfy dependencies for
  this code, let alone actually see the results I saw?
- If I make the code available to the public afterward, what
  obligations do I have to clean it up, or to provide support? And
  how will I pay for it?
Let's take those in order, keeping in mind that I'm just a simple
country sysadmin and not a scientist.
When do I need to provide code? At the very least, when the paper's
published. Better yet, provide it for review -- maybe a badge will
sweeten the deal. There are too many examples of code turning out to
matter for catching errors or fraud; let's not start thinking about
how to carve out exceptions to this rule.
I should point out here that my boss (another real actual scientist
and all), when I mentioned this whole discussion in a lab meeting,
took issue with the idea that this was a job for reviewers. He says
the important thing is to have the code available when published, so
that other people can replicate it. He's a lot more likely to know
than I am what the proper role of a reviewer is, so I'll trust him on
that one. But I still think the earlier you provide it, the better.
(Another take entirely: Michael Eisen, one of the co-founders of the
Public Library of Science, says the order is all wrong, and we
should review after publication, not before. He's written this
before, in the wonderfully-titled post "Peer review is f***ed up,
let's fix it".)
How do I make sure the code works for reviewers? Good question, and
a hard one -- but it's one we have answers for.
First, this is the same damn problem that autotools, CPAN,
pip and all the rest have been trying to fix. Yes, there are
lots of shortcomings in these tools and these approaches, but these
are known problems with at least half-working solutions. This is not
a new problem.
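To make that concrete for a Python pipeline: even a minimal packaging
stub means a reviewer can recreate your environment with a single
"pip install .". A hypothetical sketch (the package name, module and
dependencies are all invented):

    # setup.py -- a minimal, hypothetical packaging stub so a reviewer can run
    #   pip install .
    # and get the pipeline plus its dependencies in one go.
    from setuptools import setup, find_packages

    setup(
        name="ourlab-pipeline",        # invented name
        version="0.1.0",
        packages=find_packages(),
        install_requires=[
            "numpy>=1.7",              # pin what the paper actually used
            "biopython>=1.60",
        ],
        entry_points={
            "console_scripts": [
                "run-analysis = ourlab_pipeline.main:main",
            ],
        },
    )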
Second, this is what VMs and appliances are good at. The ENCODE
project used exactly this approach and provided a VM with
all the tools (!) to run the analysis. Yes, it's another layer of
complexity (which platform? which player? how to easily encapsulate a
working pipeline?); no, you don't get a working 5000-node Hadoop
cluster. But it works, and it works on anyone's machine.
What obligation do I have to maintain or improve the code? No more
than you can, or want to, provide.
Look: inherent in the question is the assumption that the authors will
get hordes of people banging on the doors, asking why your software
doesn't compile on Ubuntu Scratchy Coelacanth, or why it crashes when
you send it input from /dev/null, or how come the man page is out of
date. But for most Free Software projects of any description, that
day never comes. They're used by a small handful of people, and a
smaller handful than that actually work on it...until they no longer
want to, and the software dies, is broken down by soil bacteria,
returns to humus and is recycled into water snails. (That's where new
Linux distros come from, btw.)
Some do become widely used. And the people who want, and are able,
to fix these things do so because they need it to be done. (Or the
project gets taken over by The Apache Foundation, which is an
excellent fate.) But these are the exception. To worry about
becoming one of them is like a teenage band being reluctant to play
their first gig because they're worried about losing their privacy
when they become celebrities.
...Fuck it, I hate conclusions (another rant). Just publish the damned
code, and the earlier, the better.