A note on comments

About a year ago, I started using a cobbled-together system of Bash and Perl scripts and Makefiles to put together this blog. One of the reasons was my general dislike for PHP; another was my desire to try living (at least in some small way) by Saint Aardvark's Axiom of Information Utility, and try keeping this in plain text. (Another was a desire to use Emacs to write these damn things; I want the control that's thrown out when you start using a GUI to edit.)

But one of the problems that faced me was how to deal with comments, and comment spam. Having a web form that allowed comments made commenting easy, but the downside was that it made spamming easy too. WordPress and others keep this down to a dull roar, but it's not perfect and I've had problems with false positives: people being unable to post comments because their IP address was on some blacklist, and the anti-spam plugin had made no provision for whitelisting.

I decided to lash together something that would use email. For me — a very small, low-traffic website, with a blog devoted to a rather obscure set of concerns and a tech-savvy audience (Hi Dad!) — this seemed like a good choice. Email spam, for me, has been pretty much solved by greylisting and SpamAssassin. (There's the problem of a ten — no, fourteen — year-old email address that I've been meaning to get changed for a while now, but that's another story; they don't seem to do greylisting, and SpamAssassin does catch most of it.) So taking comments by email seemed, you know, righteous, dude.

The system for comments is pretty simple: every post gets an epoch timestamp embedded in it. (I think if you look in the HTML source, you can see it.) I use it for sorting the order of the posts, and I use it to generate email addresses for post-specific comments. The format is simple: comments+(seconds since the epoch)@saintaardvarkthecarpeted.com. The address is included in the post, though I haven't done much to make it obvious. (This blog, and I think this whole website, would make baby Jakob Nielsen cry.)
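To make the address format concrete, here's a minimal sketch in Python. It is not the actual Bash/Perl/Makefile code behind this blog; the function names `comment_address` and `post_epoch` are made up for illustration, and the only things taken from the post are the `comments+<epoch>@domain` format and the domain itself.

```python
import re
from typing import Optional

# Domain as given in the post.
DOMAIN = "saintaardvarkthecarpeted.com"

# Hypothetical pattern for recognizing a per-post comment address.
_COMMENT_ADDRESS_RE = re.compile(r"^comments\+(\d+)@(.+)$")


def comment_address(epoch_seconds: int, domain: str = DOMAIN) -> str:
    """Build the per-post address: comments+<seconds since the epoch>@<domain>."""
    return f"comments+{epoch_seconds}@{domain}"


def post_epoch(address: str) -> Optional[int]:
    """Return the epoch timestamp embedded in a comment address.

    Returns None when the address does not follow the comments+<epoch> format,
    for example when the "comments+" prefix is missing.
    """
    match = _COMMENT_ADDRESS_RE.match(address)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    address = comment_address(1183349604)
    print(address)              # comments+1183349604@saintaardvarkthecarpeted.com
    print(post_epoch(address))  # 1183349604
```

Because the timestamp doubles as the post's sort key, the same number identifies the post and routes the comment, so there is no separate lookup table to maintain.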

My thinking was that, even though I was publishing the addresses, it wouldn't matter: as I mentioned, spam for me has been mainly solved (insert disclaimers here). Between greylisting and SpamAssassin, I figured I pretty much wouldn't see any spam at all.

Turns out there's another benefit: the addresses have been picked up by spam bot crawlers, but they're screwing up the scraping. From 24 days of mail logs, I see a crapload of attempts to deliver to the wrong address:

```
$ perl -ne '/NOQUEUE/ && s{.*to=<(\S+?)>.*}{$1} && print' mail.log* | sort | uniq -c | sort -n
[much snippage]
 36 1181577610@saintaardvarkthecarpeted.com
 36 1182947701@saintaardvarkthecarpeted.com
 37 1181326150@saintaardvarkthecarpeted.com
 37 1183667208@saintaardvarkthecarpeted.com
 38 1182949918@saintaardvarkthecarpeted.com
 40 1183349604@saintaardvarkthecarpeted.com
```

There were more than 2500 of these messages turned away by greylisting. The bots have all stripped off everything up to and including the plus, not realizing (as I didn't until a few years ago) that a plus sign in an email address is valid.
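I don't know what the bots actually run, but one plausible way to end up with these truncated addresses is a scraping regex whose character class for the part before the @ leaves out the plus sign. The sketch below (Python; both patterns are my own guesses, not taken from any real bot) shows the difference.

```python
import re

# Assumed scraper pattern: the local part allows letters, digits, dot,
# underscore and hyphen, but NOT "+". This is a guess at the bots' mistake.
NAIVE_EMAIL_RE = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Same pattern with "+" added to the local part.
PLUS_AWARE_EMAIL_RE = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrape(pattern, text):
    """Return every substring of text that the given address pattern matches."""
    return pattern.findall(text)


if __name__ == "__main__":
    page = "Comments: comments+1183349604@saintaardvarkthecarpeted.com"
    print(scrape(NAIVE_EMAIL_RE, page))
    # ['1183349604@saintaardvarkthecarpeted.com']
    print(scrape(PLUS_AWARE_EMAIL_RE, page))
    # ['comments+1183349604@saintaardvarkthecarpeted.com']
```

The naive pattern can't start a match at "comments" because the "+" blocks the path to the "@", so the first match it finds begins right after the plus. That is exactly the shape of the bogus addresses in the log.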

In fact, the only attempts to deliver to legitimate comment addresses were two actual comments to my blog…which brings up a shortcoming: I never got that many comments with WordPress, but I sure got more than I do now. It's possible my writing has just gone 'way downhill, but I think it's more likely that this system just puts people off, or they're just unable to find it with my current (crappy) design.

(One interesting problem: my wife tried to comment once, using Lotus Notes at her workplace. It converted the plus sign into an underscore. Weird.)

I still regard this setup for comments as an experiment. Its results are definitely mixed; no spam, but fewer comments as well. Given the tiresome mess that comes with the lack of an HTTP equivalent of greylisting, I'm inclined to keep doing it.

Anyhow...that's my interesting research result for the day. You may now talk amongst yourselves.