This week I've been taking Python 3 training at work: 4 days of staying at home and concentrating on Python. The result? 4 days to work on Python, sharpening my skills, and that's a good thing. The lecture was not that hot, but what was useful was having the exercises in front of me, waiting to be done, and no distractions to keep me from them. And after all that, the biggest difference I notice between Python 2.7 and Python 3 is print "foo" vs print("foo"). (Which shows you how much Python I know. But still.) I finished the exercises a few hours early, so I spent the time trying to solve the coding challenge we give new people at OpenDNS. (I didn't get that one; instead, I got the "this machine is borked in 12 different ways, please solve it" challenge.) This has been a wonderful way to stretch my brain, and work on something very very different from what I do every day. I wish work had the same sort of course for Ruby and Go.
Have I mentioned that I've come to love Bandcamp? Lots of excellent music, and I keep finding lots of excellent music. I mean, really really excellent music.
Like Hairy Hands.
Or Mars, Etc.
Or Snail Mail.
Also on the music front, one really excellent station I've found is Popadelica.
But back to Python: despite the click bait title, O'Reilly's "20 Python Libraries You Aren't Using But Should" is wonderfully informative for this Python n00b.
I loved showing this video to my kids, demonstrating how bacteria evolve.
Set up a Tor node last week for the cause.
Did you know there was a fork of Bacula named Bareos? Not I. Not sure whether to pronounce it "bar-ee-os" or "bear-o-s". Got Kern Sibbald, Bacula's creator, rather upset. He promises to bring over "any worthwhile features"...which is good, because there are a lot.
Post by Matthew Green titled "How does the NSA break SSL?". Should be reading that now but I'm writing this instead.
I have not read The Phoenix Project, which makes me a bad person for reacting so viscerally to things like "A personal reinterpretation of the three ways" and the self-congratulatory air of the headshot gallery. I'm trying to figure out why I react this way, and whether it's justified or just irrational dislike of people I perceive as outsiders. Seriously, though, the Information Technology Process Institute?
Got Netflix at home? Got IPv6? That might be why they think you're in Switzerland and change your shows accordingly. In my case, they thought I was in the US and offered to show "30 Rock" and "Europa Report"...until I tried to actually stream them and they figured out the truth. Damn.
Test-Driven Infrastructure with Chef. Have not used Chef before, but I like the approach the author uses in the first half of the book: here's what you need to accomplish, so go do it. The second half abandons this...sort of unfortunate, but I'm not sure explaining test infrastructure libraries (ServerSpec, etc) would work well in this approach. Another minor nitpick: there's a lot of boilerplate output from Chef et al that could easily be cut. Overall, though, I really, really like this book.
Mencius Moldbug, one of the most...I mean...I don't even. Jaw-droppingly weird. Start with "Noam Chomsky killed Aaron Swartz".
I can't remember where I came across it, but this Bash Pitfalls page is awesome.
First day back at $WORK after the winter break yesterday, and some...interesting...things. Like finding out about the service that didn't come back after a power outage three weeks ago. Fuck. Add the check to Nagios, bring it up; when the light turns green, the trap is clean.
Or when I got a page about a service that I recognized as having, somehow, to do with a webapp we monitor, but no real recollection of what it does or why it's important. Go talk to my boss, find out he's restarted it and it'll be up in a minute, get the 25-word version of what it does, add him to the contact list for that service and add the info to documentation.
I start to think about how to include a link to documentation in Nagios alerts, and a quick search turns up "Default monitoring alerts are awful", a blog post by Jeff Goldschrafe about just this. His approach looks damned cool, and I'm hoping he'll share how he does this. Inna meantime, there are the Nagios config options "notes", "notes_url" and "action_url", which I didn't know about. I'll start adding stuff to the Nagios config. (Which really makes me wish I had a way of generating Nagios config...sigh. Maybe NConf?)
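To make that concrete, here's a rough sketch of the sort of thing I have in mind: spit out service definitions with notes, notes_url and action_url pointing at documentation. The hosts, checks and wiki URLs below are invented for the example, and this obviously isn't NConf, just the general idea in a few lines of Python.

# Sketch: generate Nagios service definitions that link to documentation.
# Hosts, checks and wiki URLs here are made up for illustration.
SERVICES = [
    {"host": "webapp01", "desc": "importantd", "command": "check_importantd",
     "wiki": "https://wiki.example.org/importantd"},
    {"host": "db01", "desc": "postgres", "command": "check_pgsql",
     "wiki": "https://wiki.example.org/postgres"},
]

TEMPLATE = """define service {{
    use                 generic-service
    host_name           {host}
    service_description {desc}
    check_command       {command}
    notes               Check the wiki page before waking anyone else up
    notes_url           {wiki}
    action_url          {wiki}/runbook
}}
"""

with open("generated-services.cfg", "w") as out:
    for svc in SERVICES:
        out.write(TEMPLATE.format(**svc))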
But also on Jeff's blog I found a post about Kaboli, which lets you interact with Nagios/Icinga through email. That's cool. Repo here.
Planning. I want to do something better with planning. I've got RT to catch problems as they emerge, and track them to completion. Combined with orgmode, it's pretty good at giving me a handy reference for what I'm working on (RT #666) and having the whole history available. What it's not good at is big-picture planning...everything is just a big list of stuff to do, not sorted by priority or labelled by project, and it's a big intimidating mess. I heard about Kanban when I was at LISA this year, and I want to give it a try...not sure if it's exactly right, but it seems close.
And then I came across Behaviour-driven infrastructure through Cucumber, a blog post from Lindsay Holmwood. Which is damn cool, and about which I'll write more another time. Which led to the Github repo for a cucumber/nagios plugin, and reading more about Cucumber, and behaviour-driven development versus test-driven development (hint: they're almost exactly the same thing).
My god, it's full of stars.
This is cool: 37signals released sub, a framework for easy subcommands. Think "git [whatever]", "git help [whatever]", etc. Code's up on github.
Last year I came across the issue of reproducible science: the question of how best to ensure that published science can be reproduced by other people, whether because you want to fight fraud or simply be sure that there's something really real happening. I'm particularly interested in the segment of this debate that deals with computer-driven science, its data and its source code. Partly that's because I very much believe in Free as in Freedom, and partly it's because of where I work: a lot of the people I work with are doing computer-driven scientific research, and dealing with source code (theirs and others) is very much part of my job.
So it was interesting to read a blog post from Iddo Friedberg called "Can we make accountable research software?" Friedberg is an actual scientist and everything (and an acquaintance of my boss, which was a surprise to me). He wrote about two aspects of scientific programming that I hadn't realized before:
the prevalence -- no, requirement -- of quick, prototype code, much of it to be discarded when the hypothesis it explores doesn't pan out
the inability to get funding for maintaining code, or improving it for public use
As a result, code written by scientists (much of it, in his words, "pipeline hacks") is almost never meant to be robust, or used by others, or broadly applicable. Preparing that code for a paper is even more work than writing the paper itself. But just dumping all the code is not an option, either: "No one has the time or energy to wade through a lab's paper- and magnetic-history trail. Plus, few labs will allow it: there is always the next project in the lab's notebooks and meetings, and no one likes to be scooped." And even if you get past all that, you're still facing trouble. What if:
a reviewer can't make the code compile/run and rejects the paper?
the code really isn't fit for human consumption, and there are lots of support demands his lab can't fulfill?
someone steals an idea latent in his code and his lab gets scooped?
Friedberg's solution: add an incentive for researchers to provide tested, portable code. The Bioinformatics Testing Consortium, of which he's a member, will affix gold stars to papers with code that volunteer reviewers will smoke-test, file bugs against and verify against a sample dataset. Eventually, the gold star will signify a paper that's particularly cool, everyone will want one, and we'll have all the code we need.
However, even then he's not sure that all code needs to be released. He writes in a follow-up post:
If the Methods section of the paper contains the description and equations necessary for replication of research, that should be enough in many cases, perhaps accompanied by code release post-acceptance. Exceptions do apply. One notable exception would be if the paper is mostly a methods paper, where the software -- not just the algorithm -- is key.
[snip]
Another exception would be the paper Titus Brown and Jonathan Eisen wrote about: where the software is so central and novel, that not peer-reviewing it along with the paper makes the assessment of the paper's findings impossible.
(More on Titus Brown and the paper he references ahead.)
There were a lot of replies, some of which were in the Twitter conversation that prompted the post in the first place (yes, these replies TRAVELED THROUGH TIME): things like, "If it's not good enough to make public, why is it good enough to base publications on?" and "how many of those [pipeline] hacks have bugs that change results?"
Then there was this comment from someone who goes by "foo":
I'm a vanilla computer scientist by training, and have developed a passion for bioinformatics and computational biology after I've already spent over a decade working as a software developer and -- to make things even worse -- an IT security auditor. Since security and reliability are two sides of the same coin, I've spent years learning about all the subtle ways software can fail.
[snip]
During my time working in computational biology/bioinformatics groups, I've had a chance to look at some of the code in use there, and boy, can I confirm what you said about being horrified. Poor documentation, software behaving erratically (and silently so!) unless you supply it with exactly the right input, which is of course also poorly documented, memory corruption bugs that will crash the program (sucks if the read mapper you're using crashes after three days of running, so you have to spend time to somehow identify the bug and do the alignment over, or switch to a different read mapper in the hope of being luckier with that), or a Perl/Python-based toolchain that will crash on this one piece of oddly formatted input, and on and on. Worst of all, I've seen bugs that are silent, but corrupt parts of the output data, or lead to invalid results in a non-obvious way.
I was horrified then because I kept thinking "How on earth do people get reliable and reproducible results working like this?" And now I'm not sure whether things somehow work out fine (strength in numbers?) or whether they actually don't, and nobody really notices.
The commenter goes on to explain how one lab he worked at hired a scientific programmer to take care of this. It might seem extravagant, but it lets the biologists do biology again. (I'm reminded of my first sysadmin job, when I was hired by a programmer who wanted to get back to programming instead of babysitting machines.) foo writes: "It's also noteworthy that having technical assistants in a biology lab is fairly common -- which seems to be a matter of the perception of 'best practice' in a certain discipline." Touché!
Deepak Singh had two points:
Meanwhile, Greg Wilson got some great snark in:
I might agree that careful specification isn't needed for research programming, but error checking and testing definitely are. In fact, if we've learned anything from the agile movement in the last 15 years, it's that the more improvisatory your development process is, the more important careful craftsmanship is as well -- unless, of course, you don't care whether your programs are producing correct answers or not.
[snip]
[Rapid prototyping rather than careful, deliberate development] is equally true of software developed by agile teams. What saves them from [code that is difficult to distribute or maintain] is developers' willingness to refactor relentlessly, which depends in turn on management's willingness to allow time for that. Developers also have to have some idea of what good software looks like, i.e., of what they ought to be refactoring to. Given those things, I think reusability and reproducibility would be a lot more tractable.
Kevin Karplus doubted that the Bioinformatics Testing Consortium would do much:
(He also writes that the volunteers who are careful software developers are not the main problem -- which I think misses the point, since the job of reviewer is not meant to be punishment for causing a segfault.)
He worries that providing the code makes it easy to forget that proper verification of computational methods comes from an independent re-implementation of the method:
I fear that the push to have highly polished distributable code for all publications will result in a lot less scientific validation of methods by reimplementation, and more "ritual magic" invocation of code that no one understands. I've seen this already with code like DSSP, which almost all protein structure people use for identifying protein secondary structure with almost no understanding of what DSSP really does nor exactly how it defines H-bonds. It does a good enough job of identifying secondary structure, so no one thinks about the problems.
C. Titus Brown jumped in at that point. Using the example of a software paper published in Science without the code being released, he pointed out that saying "just re-implement it independently" glosses over a lot of hard work with little reward:
[...] we'd love to use their approach. But, at least at the moment, we'd have to reimplement the interesting part of it from scratch, which will take both a solid reimplementation effort as well as guesswork, to figure out parameters and resolve unclear algorithmic choices. If we do reimplement it from scratch, we'll probably find that it works really well (in which case Iverson et al. get to claim that they invented the technique and we're derivative) or we'll find that it works badly (in which case Iverson et al. can claim that we implemented it badly). It's hard to see this working out well for us, and it's hard to see it working out poorly for Iverson et al.
But he also insisted that the code matters to science. To quote at length:
All too often, biologists and bioinformaticians spend time hunting for the magic combination of parameters that gives them a good result, where "good result" is defined as "a result that matches expectations, but with unknown robustness to changes in parameters and data." (I blame the hypothesis-driven fascista for the attitude that a result matching expectations is a good thing.) I hardly need to explain why parameter search is a problem, I hope; read this fascinating @simplystats blog post for some interesting ideas on how to deal with the search for parameters that lead to a "good result". But often the results you achieve are only a small part of the content of a paper -- methods, computational and otherwise, are also important. This is in part because people need to be able to (in theory) reproduce your paper, and also because in larger part progress in biology is driven by new techniques and technology. If the methods aren't published in detail, you're short-changing the future. As noted above, this may be an excellent strategy for any given lab, but it's hardly conducive to advancing science. After all, if the methods and technology are both robust and applicable to more than your system, other people will use them -- often in ways you never thought of.
[snip]
What's the bottom line? Publish your methods, which include your source code and your parameters, and discuss your controls and evaluation in detail. Otherwise, you're doing anecdotal science.
I told you that story so I could tell you this one.
I want to point something out: Friedberg et al. are talking past each other because they're conflating a number of separate questions:
When do I need to provide code? Should I have to provide code for a paper as part of the review process, or is it enough to make it freely available after publication, or is it even needed in the first place?
If I provide it for review, how will I ensure that the reviewers (pressed for time, unknown expertise, running code on unknown platforms) will be able to even compile/satisfy dependencies for this code, let alone actually see the results I saw?
If I make the code available to the public afterward, what obligations do I have to clean it up, or to provide support? And how will I pay for it?
Let's take those in order, keeping in mind that I'm just a simple country sysadmin and not a scientist.
When do I need to provide code? At the very least, when the paper's published. Better yet, for review, possibly because it gets you a badge. There are too many examples of code being important to picking out errors or fraud; let's not start thinking about how to carve up exceptions to this rule.
I should point out here that my boss (another real actual scientist and all), when I mentioned this whole discussion in a lab meeting, took issue with the idea that this was a job for reviewers. He says the important thing is to have the code available when published, so that other people can replicate it. He's a lot more likely to know than I am what the proper role of a reviewer is, so I'll trust him on that one. But I still think the earlier you provide it, the better.
(Another take entirely: Michael Eisen, one of the co-founders of the Public Library of Science, says the order is all wrong, and we should review after publication, not before. He's written this before, in the wonderfully-titled post "Peer review is f***ed up, let's fix it".)
How do I make sure the code works for reviewers? Good question, and a hard one -- but it's one we have answers for.
First, this is the same damn problem that autotools, CPAN, pip and all the rest have been trying to fix. Yes, there are lots of shortcomings in these tools and these approaches, but these are known problems with at least half-working solutions. This is not new!
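For Python code, even a minimal setup.py gets a reviewer most of the way to a working install with "pip install ."; this is only a sketch, and the project name, version pins, module layout and entry point below are placeholders, not a real pipeline.

# Minimal packaging sketch: just enough metadata that someone else can install
# the pipeline and its dependencies. All names and versions are invented.
from setuptools import setup, find_packages

setup(
    name="our-pipeline",
    version="0.1",
    packages=find_packages(),
    install_requires=[
        "numpy>=1.7",        # pin whatever versions the paper's results used
        "biopython==1.62",
    ],
    entry_points={
        "console_scripts": ["run-pipeline = pipeline.main:main"],
    },
)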
Second, this is what VMs and appliances are good at. The ENCODE project used exactly this approach and provided a VM with all the tools (!) to run the analysis. Yes, it's another layer of complexity (which platform? which player? how to easily encapsulate a working pipeline?); no, you don't get a working 5000-node Hadoop cluster. But it works, and it works on anyone's machine.
What obligation do I have to maintain or improve the code? No more than you can, or want to, provide.
Look: inherent in the question is the assumption that the authors will get hordes of people banging on the doors, asking why your software doesn't compile on Ubuntu Scratchy Coelacanth, or why it crashes when you send it input from /dev/null, or how come the man page is out of date. But for most Free Software projects of any description, that day never comes. They're used by a small handful of people, and a smaller handful than that actually work on it...until they no longer want to, and the software dies, is broken down by soil bacteria, returns to humus and is recycled into water snails. (That's where new Linux distros come from, btw.)
Some do become widely used. And the people who want, and are able, to fix these things do so because they need it to be done. (Or the project gets taken over by The Apache Foundation, which is an excellent fate.) But these are the exception. To worry about becoming one of them is like a teenage band being reluctant to play their first gig because they're worried about losing their privacy when they become celebrities.
In conclusion:
...Fuck it, I hate conclusions (another rant). Just publish the damned code, and the earlier, the better.
Prompted by fierce internecine rivalry with Tampa Bay Breakfasts, I'm finally putting in an update. My supervisor is my four-year-old son, who's busy reading "You are the first kid on Mars" beside me while holding on to Power Ranger and Terl action figures.
Work: I've got a summer student. She'd been at one of the labs I work with for the last 8 months, and showed a real aptitude for computers. My boss agreed to pick up the bill for her salary, so here we are.
It's working out really, really well. She's got a lot to learn (basic networking, for example) but it is SUCH A WONDERFUL THING to have someone to send off on jobs. "Hey, have you got a minute to..." "She'll take care of it." She can help with what she knows, and what she doesn't she takes careful notes on. I've even had a chance to work on other, larger projects for, like, an hour or two at a time. It's great.
I'm going away for three weeks in June/July, and there's a lot to teach her before then. Fortunately, there are a couple other sysadmins who can help out, and a couple of other technical folk in the lab who can take on some duties. But it's been a real wake-up for me, realizing how much of this could be made easier for someone else. It'd be nice, for example, to have something that'd let people reboot machines easily when they get stuck. Right now, I SSH to the ILOM and reset it there; what about a web page? It'd be its own set of problems, of course, and I'm not going to code something up between now and June, but it's something to think about. Or at least coming up with some handy wrapper around the ipmipower/console commands.
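Something like this is about the size of it. A sketch only: the machine names, BMC addresses and password file are made up, and the same idea works with FreeIPMI's ipmipower instead of ipmitool.

#!/usr/bin/env python
# Sketch of a reboot wrapper: map a short machine name to its ILOM/BMC address
# and shell out to ipmitool to power-reset it. Everything here is hypothetical.
import subprocess
import sys

BMCS = {
    "compute01": "compute01-ilom.example.org",
    "compute02": "compute02-ilom.example.org",
}

def reset(name, user="admin", password_file="/etc/ipmi-pass"):
    """Power-reset one machine via its BMC."""
    with open(password_file) as f:
        password = f.read().strip()
    cmd = ["ipmitool", "-I", "lanplus", "-H", BMCS[name],
           "-U", user, "-P", password, "chassis", "power", "reset"]
    subprocess.check_call(cmd)

if __name__ == "__main__":
    if len(sys.argv) != 2 or sys.argv[1] not in BMCS:
        sys.exit("usage: reset-machine <%s>" % "|".join(sorted(BMCS)))
    reset(sys.argv[1])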
Home: The weather is at last, AT LAST becoming sunny and springlike. I took the telescope out on Saturday -- full moon, so I spent most of my time looking at Saturn. And holy crap, was it amazing! I saw the Cassini division for the first time, the C ring (!) and five moons. I'm starting to regret (a little) having sold the 4.3mm eyepiece; the 7.5mm is nice but does badly in the Barlow, which I suspect says more about the Barlow than anything else. (Also that night: tried looking for M65 and M66, just to see if I could find them in the suburbs under a full moon. Negative.)
I'm trying to port an astronomical utility to Rockbox; it will show altitude and azimuth for planets, Messier and NGC objects. My intention is to use it with manual setting circles on my Dob. The interesting part is that Rockbox has no floating point arithmetic, so it's not a straightforward port at all. Thus I've had to learn about fixed point arithmetic, lookup tables and the like. My trig and bitwise arithmetic are, how do you say, weak from underuse, so this is a bit of a slog. But I'm hopeful.
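For my own notes, here's the flavour of the thing in a few lines of Python: a fixed-point sine using a lookup table. The scale factor and table size are arbitrary choices for illustration, not what Rockbox or the original utility uses.

# Fixed-point sine via a lookup table: angles and results are integers scaled
# by 2**14, and values between table entries are linearly interpolated.
import math

FRAC_BITS = 14
ONE = 1 << FRAC_BITS                 # 1.0 in fixed point
TABLE_SIZE = 256                     # entries covering 0..90 degrees

SIN_TABLE = [int(round(math.sin(math.radians(90.0 * i / (TABLE_SIZE - 1))) * ONE))
             for i in range(TABLE_SIZE)]

def fixed_sin(deg_fixed):
    """Sine of an angle given in fixed-point degrees; returns a fixed-point value."""
    deg = deg_fixed % (360 * ONE)               # wrap into 0..360 degrees
    quadrant, deg = divmod(deg, 90 * ONE)       # reduce to one quadrant
    if quadrant in (1, 3):
        deg = 90 * ONE - deg
    pos = deg * (TABLE_SIZE - 1) // 90          # table position, still carrying fraction bits
    idx, frac = divmod(pos, ONE)
    lo = SIN_TABLE[idx]
    hi = SIN_TABLE[min(idx + 1, TABLE_SIZE - 1)]
    val = lo + ((hi - lo) * frac >> FRAC_BITS)  # interpolate between entries
    return -val if quadrant >= 2 else val

print(fixed_sin(30 * ONE) / float(ONE))         # sin(30 degrees) is about 0.5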
And now my other supervisor is coming for a status report. Time to go!
I ran into a problem today trying to compile an old Fortran program. Everything was working until the final link:
gcc -o ./DAlphaBall.f77 -O DAlphaBall.o sos_minor_gmp.o alf_tools_gmp.o adjust.o alfcx.o alfcx_tools.o delcx.o truncate_real.o measure_dvol.o dsurfvol_tools.o vector.o write_simplices.o -lgmp -lg2c -lm
/usr/bin/ld: cannot find -lg2c
collect2: ld returned 1 exit status
The strange thing is that libg2c.so.0 was installed:
$ ls -l /usr/lib64/libg2c*
lrwxrwxrwx 1 root root 15 2010-12-30 12:43 /usr/lib64/libg2c.so.0 -> libg2c.so.0.0.0
-rwxr-xr-x 1 root root 127368 2010-07-05 04:57 /usr/lib64/libg2c.so.0.0.0
After some searching, it turns out that libg2c is the runtime library for g77, the old Fortran compiler from the gcc-3 days; gfortran, its replacement in the gcc-4 series, links against libgfortran instead. My problem was that I was compiling with gfortran rather than g77. On this system, the old compilers were installed as gcc33-*, and changing the Fortran compiler and the CC/LD variables to the appropriate versions worked a treat.
Oh, and here's some good technical background on linkers and names.
For my own reference, here are a couple things I had to find out about Perl today:
Woohoo! My first entry for the Sysadvent Calendar, on Development for Sysadmins, has been posted! Thanks to Jordan for tidying it up and adding integration testing, which I'd missed when writing the article. There will be another article from me coming up soon-ish on OCSNG and GLPI.
As of December 1st, Jordan and Matt were still accepting entries -- so head on over if you've got something to say.
I'm back at work after a week off. The UPS control panel continues to work (!), but there is no word back from the manufacturer (says the contractor who installed the thing and filed the ticket). I find this troubling; either the manufacturer really hasn't got back to us yet (bad), or I should have insisted on being a contact for the ticket. I'll have to sort this out tomorrow.
Spent much of my day tearing my hair out over mod_proxy_html. Turns out that, by default, it strips the DTD from the HTML it proxies; this is a problem for one app that we're proxying. Not only that, but the only DTDs it supports are HTML and XHTML, each with an optional "Transitional"/Legacy flag — and none of them with a URI to a DTD, like the one pointing to the Loose DTD that our app uses and the damned thing threw to the floor. (Sorry, brain cells on strike today and my ability to write clearly is going downhill.)
You can specify your own DTD, including a URI (undocumented feature, whee!), and thus put the original back in — but it doesn't append a newline, there's no way to append a newline that I could figure out, and so it mushes the DTD together with the opening html tag and makes baby Firefox cry and render the page badly.
My rule of thumb for a long time was that if I start looking at source code, I'm in over my head. I'm starting to think that may not be entirely true anymore, that I've advanced to the point where I can read C (say) and generally understand what's going on. But when I start looking for API documentation for Apache 2.2 (surprisingly hard to find) to find out if, say, ap_fputs or apr_pstrdup chomp newlines or something (near as I can tell, they don't), or just what AP_INIT_TAKE12 takes as arguments…well, then I am in over my head. If nothing else, I don't want to make some silly error because I don't know what the hell I'm doing. (That's not a slam against the Debian folks; I just mean that I felt shivers when I read about that, because I dread making the same sort of highly-visible, catastrophic error.) (Unlike the rest of the planet, you understand.)
At last: I'm finally coming to the end of working with the verdammt web registration forms. We're going from our awful hack of a glued-together mess of Mambo and custom PHP to something that'll mainly be Drupal with no custom code. Allegedly it's six weeks 'til launch date; the registration forms in use right now will limp along 'til they're no longer needed (end of the summer).
The registration form I'm working on now is not complicated in the absolute sense, but it's the most complicated one we've got. Last year I was afraid to touch the (old, legacy, ugly) code, and mostly just changed dates. This year I thought "fuck it" and rewrote nearly all of it, using the tools and skills I'd picked up in the meantime. (I'm still not a great programmer, understand, but I have improved some over last year.)
After a full day banging my head against it, I'm finally coming to the point where I'm pretty confident that the code will do what it's supposed to. And that's a relief. Therefore, in the stylee du Chromatic, I give thanks to:
In other news...just downloaded the second dev preview of Indiana, which I'd managed to not hear about at all (the preview releases, that is). I love university bandwidth; 640MB in about 1 minute. Sweet. I'll give it a try at home and see how it feels.
I've just finished reading the summaries of LISA '07 in the latest issue of ;login:. I feel…incredibly left out. I'm starting to think this profession might not be such a simple thing, you know, man? Sir? The presentations on autonomic computing have left me feeling a bit like a buggy whip maker with his nose to the grindstone.
And yes, it's a way off, and yes, small shops and generalists will probably be around for a while to come. But I'm not sure how much I want to keep being at a small shop. Which means learning the big stuff. Which, natch, is hard to do when you're trying to figure out how to properly test registration forms. Sigh.
But: I just stuck my head out a door at work and saw a chickadee. It chirped for a while, sitting on a tree near our building, then flew off. On a rare sunny day in Vancouver in February, after a week of messed-up sleep and feeling like I've been spinning my wheels, this is nice.
This is how I imagine Samuel L. Jackson leading off a conversation with the writers of the PHP language (edited to be less obscene and offensive).
In the name of all that is holy and right, please explain to me why the fuck PHP's preg_replace() takes delimiters for the first argument, but not the second. IOW, Perl's
$foo =~ s/baz/bum/;
becomes
preg_replace('/baz/', 'bum', $foo);
Yes, I should've just RTFM. You're completely right. But this just bit me in the ass, after spending 10 minutes wondering WTF was going wrong, and a little fucking consistency goes a long fucking way.
From Bruce Schneier's newsletter comes this blog entry suggesting that there simply aren't that many serious spammers. Interesting data.
Managed to get the Perl/PHP parser extended so that it would see nested PHP arrays and translate them to the proper hash/array references in Perl. It was good to do that, but then other problems arise — like the fact that, as the parser stands right now, it simply stops parsing if it finds something it doesn't understand. This could be something like a comment in a nested array, or something like if ($debug == 1) { $foo = "bar"; } else { … }.
Again, I'm concluding that this would all be much, much easier if it was in a database…just have PHP and Perl suck out the data and do what they want. Either that, or just start writing everything in Perl…
Update: Also, this is not what I expect to see at the top of Planet Solaris — though maybe this should've prepared me. Rockwood's coworker's post is worth reading too.
Update2: Just for completeness, I'll mention that Ben's updates and comments are also worth reading. That's it from the Obvious Dep't.
One of the things I've been doing at work is writing registration forms for conferences. Natch, each one is slightly different, and I've never been quite sure I've been doing it right. Thus, WWW::Mechanize has been a fucking godsend to me.
But, as each of the forms is slightly different, each script is slightly different as well. If only my test script could parse the form's configuration file. Too bad the config file, like the form itself, is written in PHP.
Or is it? For what to my lumbering (yes, they lumber) eyes should appear, but CPAN's PHP parser and PHP::Include, which I think is more my size. Sweet!
But apparently not: this guy's blog also takes comments by email. (His links for reading or sending a comment are a lot better labelled than mine, so I might as well steal that too… :-)
In other news, I am finally getting close to finishing a new credit card payment page for work. The place that processes our CC payments has a new API, so this has been a good chance to rewrite the current page. I flatter myself that my version (helped out a lot by a simpler API) is much easier to understand than the old, and that's gratifying…but by the beard of Shuttleworth, I'm sick of web work. It feels like that's all I've been doing since January, and I'm really looking forward to being done with it.
Oh, and another thing -- don't take abstracts for mathematical pages in PDF. Everyone uses LaTeX or plain text for a reason, and that reason is that it's easy on the sysadmin. :-)
Some rough notes...
I may add more stuff later on. Basically, this is all the stuff that I've tripped over because I'm not a programmer, yet in a small shop it's often part of the job. I'm sure there's more, and I wouldn't be surprised if I'm all wrong about some of these things (API for DB, for example).
But man, if someone could write a book about these things, I'd be happy to buy it.