What I've been reading

High-throughput biological assays let us ask very detailed questions about how diseases operate, and promise to let us personalize therapy. Data processing, however, is often not described well enough to allow for reproduction, leading to exercises forensic where raw data and reported results are used to infer what the methods must have been. Unfortunately, poor documentation can shift from an inconvenience to an active danger when it obscures not just methods but errors.

In this talk, we examine several related papers using array-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials were allocated to treatment arms based on these results. However, we show in several case studies that the reported results incorporate several simple errors that could put patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common. We briefly discuss steps we are taking to avoid such errors in our own investigations.

So, I make an open call for these two tasks: a simple tool to pin together slides and audio (and sides and video), and an effort to collate video from scientific conference talks and film them if it t exist, all onto a common distribution platform. S-SPAN could start as raw and underproduced as C-SPAN, but I am sure it would develop from there.

I'm looking at you, YouTube.