mod_auth_pam v. NIS

Okay, so as I mentioned I'm trying to get a Subversion repository working in a way that a) keeps the repository safely on an NFS-exported, mirrored set of drives, and b) does not require YAFPF. Today I've been banging my head against Apache2 + mod_auth_pam. The problem is that while passwords are successfully checked (hurray! one less FPF!), group membership is not. This does not work:

AuthPAM_Enabled on
AuthPAM_FallThrough on
AuthGROUP_Enabled on
AuthGROUP_FallThrough on
AuthType Basic
AuthGroupFile /etc/group
AuthName "secure area"
Require group subversion

(For one brief, spastic moment I thought Satisfy any was the missing magic. Then I tried it without typing in a password. Sigh.) We're using FreeBSD and NIS; from what I've been able to find so far, that might be problematic. OTOH, I might have the entirely wrong idea about PAM and its ability to check group membership.

UPDATE: Logical as it seems, AuthGroupFile has no place in the modern kitchen. Removing that directive allowed everything to work. Whee!

Tags: security

Authenticating Subversion

We're going to switch from CVS to Subversion at work. I don't make a whole lot of use of CVS, so the finer points of change management are more academic to me than anything else. But authentication...ah, that's a different story. Right now, Unix clients access the CVS repository by NFS; Windows users use the pserver protocol/authentication. NFS access does cause some problems for CVS, but it's completely out of the question for Subversion if you use their Berkeley DB filesystem. It's okay for read-only access if you use their FSFS (actual real filesystem files filesystem; the equivalent of CVS' bunch of directories and files). This leads to questions about how we'll allow access over the network, and how we'll authenticate users. Here's my thinking so far.

  1. Daemon + Berkeley DB
    • Pro: Can restrict access through file permissions to prevent access by NFS.
    • Con: Plain text password file. YAFPF.
  2. svn+ssh + Berkeley DB
    • Pro: Secure access from home. SSH key-based authentication.
    • Con: The mirrored drive where the repository should be kept is available by NFS and Samba; this can't change. Since file permissions would need to be open to allow read/commit, there's nothing preventing access by NFS and resultant corruption. The other alternative is putting it on a non-mirrored drive, which isn't an option either.
  3. Apache + PAM
    • Pro: Can restrict file permissions to prevent NFS/Samba access. Uses already existing FPF, and since we're not using PAM now we can eliminate AFPF. Prod to switch Samba to PAM, which would be AFPF gone.
    • Con: Haven't worked with A2, DAV or mod_auth_foo before. Since will need to coexist for a while with A1, possibility of calcification.
  4. Apache + LDAP
    • Pro: Full buzzword compliance. One FPF to bind them all. Get ready for the groupware that will someday be coming down the pike. Can restrict file permissions to prevent NFS/Samba access.
    • Con: Haven't worked with LDAP, either. Will need to convert current password file rather than access directly, creating YAFPF (at least in the short term). Much bigger change, so even bigger danger of calcifictation. (Heh...I like that typo.)

I think I can do Daemon + FSFS, but I need to reread the Subversion book (truly excellent, BTW). This might be the best way to get things going quickly. And of course, any insights or hints are welcome.

Tags: revisioncontrol

ProjectHoneypot.org

I found a link on Gecko's blog to Project Honeypot. Turns out it's a project to watch for, and attempt to track, spammer-run robots that scrape pages for email. I was intrigued, but a little put off by the terms of use. I did a bit more digging around, and found I wasn't the only person who thought that way. However, there were some strong rebuttals from the SpamCop forums, discussion on the SURBL mailing list, and from one of the principals (who also replied here).

Reassured, I signed up. It's still in the early stages, so there hasn't been a lot of spam received yet (350-odd pieces, according to the stats page on the site). Still, I'm hopeful it'll be a Good Thing.

Another approach: a Java SMTP honeypot. Huh.

Tags: security email

HTP!

Firewalled off from NTP? HTP to the rescue!

HTP is not really a protocol, but uses a feature from HTTP, aka web traffic. According to the specifications of HTTP (RFC 2616) a web server needs to put a timestamp in a response to a web browser request. In web browsers you don't see the HTTP headers, but these headers contain a timestamp in Greenwich Mean Time (GMT), accurate in seconds.
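
The timestamp in question is the Date: response header. As a rough sketch of the idea (not HTP's actual code, which is Perl or C), here is how that header could be turned into a clock offset in Python. The header value and the local clock reading are made up for illustration, and clock_offset_seconds is my own hypothetical helper name:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_offset_seconds(date_header: str, local_now: datetime) -> float:
    """Return (server time - local time) in seconds, given an HTTP Date header."""
    server_time = parsedate_to_datetime(date_header)  # timezone-aware for "GMT"
    return (server_time - local_now).total_seconds()


if __name__ == "__main__":
    # Hypothetical values: a Date header as a web server might send it,
    # and a local clock that is running five seconds slow.
    header = "Sun, 26 Dec 2004 21:32:50 GMT"
    local = datetime(2004, 12, 26, 21, 32, 45, tzinfo=timezone.utc)
    print(clock_offset_seconds(header, local))  # 5.0
```

Since the header only has one-second resolution, a single reading is only good to about a second either way.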

Available in Perl or C. My compliments to Eddy Vervest.

Tags:

Ports vs NWR04B

Got a bad feeling in the pit of my stomach this morning when I came back to work. I'd deliberately stayed away from the usual non-Slashdot news sources (Internet Storm Center, Bugtraq, Full Disclosure), so there was a lot of catching up to do. Let's see: eighty-four new remote holes in Windows -- always fun -- and it turns out the phpBB worm is no longer a phpBB worm but a PHP worm. Jesus Christ.

I checked the logs on my home server, and sure enough there were tons of the little bastards hitting me. (The server at work was completely clean.) It looked like there was nothing there, but I couldn't be sure without more time spent on it than a few minutes' grepping -- which meant leaving it 'til I got home tonight. (Update: looks like I was fine. I tried the URLs in the logs, and none of them tried to fetch anything. Dodged a bullet there.)

OpenBSD has the right idea when it chroots Apache, but there's also the matter of initiating connections out. And yes, I'm guilty of this: Thornhill + port 80 + tcp syn should be firewalled off, but was not. Changed now, of course. Still, it would be nice to have Thornhill not be locked down entirely. Why not let me initiate a connection out, but prevent Apache from doing the same?

This gets back to What's Wrong With Unix?, and I still say a good part of it is the lack of fine-grained permissions on both ports and files. (That, and my inability to type a good post when I'm in a hurry...God, that was incoherent.) The sheer idiocy of continuing to insist on root permissions to open a port under 1024 is just ridiculous. Why do we do this? In a world of Unix on the desktop, where anyone can get root, what does this mean anymore? Nothing at all: it's a totem, a fetish, and the Unix equivalent of knocking on wood for luck.

Worse, by insisting that you need to be root to open port 80, you invite all sorts of security problems. Better hope you drop privileges effectively; better hope no one figures out a way to extract r00t from any lingering privileges; better hope you didn't make one single mistake, or you'll get 0wned. Serving web pages, answering DNS queries or answering QOTD requests (ports 80, 53 and 17, respectively) do not require root permissions. (This is quite a different question from whether or not J. Random User should be able to modify web pages, zone files, or the QOTD database.) qmail, Postfix and others have shown that delivering mail doesn't need root, either. (Other applications can be taken on a port-by-port basis; the full extent of my hand-waving is left as an exercise to the reader.)

So why is there no way to let UID www send a syn+ack, but not a syn? Or to let some range of UIDs do both? Why, Lord, can't I change ownership, groups and permissions on /proc/net/ipv4/tcp/port/80 so that UID www can open this port and nothing else? How long, O Lord, how long?

There is a patch I came across today that supposedly offers this sort of thing, but again: it SHOULD NOT be an option; it SHOULD NOT be a patch; it SHOULD be built-in and used, just like we use UIDs to restrict privileges now. (The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" are to be interpreted as described in RFC 2119.)

Ahem. In other news: At Staples today I picked up a Network Everywhere NWR04B 802.11b wireless router. --I'm sorry, "Network Everywhere"? Looks like Cisco/Linksys in disguise. But it was 18 Soviet Canuckistan pesos! Boxing Day special! How could I possibly resist? Better yet, it turns out that the damn thing can run Linux. It's got 8MB of RAM, 2MB of flash memory, and something like a 60MHz ARM CPU.

The folks over at the Hardware Recycling Initiative are working on getting this and other broadband router boards running Linux. Sweet! Now to figure out how the hell to get it to work on this thing...I can identify a soldering iron six times out of ten, but that's about it.

Tags: nwr04b

Hm.


title: Hm. date: 2004-12-26 21:32:50

I don't like the WPBlacklist plugin as much as I used to. Reasons why:

  1. Stupid insistence on banning IPs. There are so many zombie PCs out there acting as open proxies that this is a waste of time. I've watched the traffic, and comments don't come from the same IP twice.
  2. Stupid insistence on looking at email addresses. When it's so easy to make up an email address, why bother? Tracking 1600 variations on byob@[some number].com is just filling up the tables.
  3. Problems tonight with Topo's blog.

Topo mentioned tonight that not only had it been a while since a new comment was posted on her blog, but a test comment posted tonight never showed up. Part of the problem turned out to be a stupid PHP syntax error I'd introduced; I'd been editing one of the files in an attempt to force WPBlacklist to stop emailing re: deleted posts. (Yes, we'd turned off all possible email-me-please settings, and it still kept filling up her inbox.)

And then somehow, our IP address and my URL got put into the blacklist tables. There's no note of when an entry is added to the blacklist (FIXME!), so I can't tell when it was added -- but every test comment I made was getting caught by this. Finally, there were at least two blank entries in the tables, and I'm afraid they might have been destroying everything in sight, too.

After a little bit of browsing around, it turns out the let's-insert-a-blank-line problem has been addressed in the 2.8 version of WPBlacklist, available from the new download page. I'll give that a try. Also, it'd be nice to clear up the license -- no mention of how WPBlacklist is available, and if I'm going to work on this (and hopefully improve it), I want to make sure I can distribute any changes. I'll post a question on the forum and see what happens.

Tags:

No_br_for_you!


title: No BR for you! date: 2004-12-23 08:18:06

Thanks to these two posts, I've finally managed to turn off WP's stupid, borked, let's-throw-in-a-<BR>-tag-every-time-a-line-ends-in-the-editing-box behaviour. Since I use Mozex, Firefox plugin of the gods, this was seriously pissing me off. I agree with OtherMichael: this behaviour is a bug, and should be option-controllable.

Tags:

It's_deja_vu_all_over_again


title: It's deja vu all over again date: 2004-12-23 23:39:56

Holy crap:

IP addresses are easy to fake as well. The design principles of TCP/IP allows the sender of a packet to specify its IP address. The message will still be routed to its destination using the fake origin address. Return packets would be mis-routed, however, because TCP/IP would send responses to the true location of the IP address rather than where it actually came from. This means that IP spoofing is ineffective in situations where you need to interact with a remote server, but very effective in a one-way conversation. I can't retrieve a Web page using a spoofed IP address because I need to make the request and then have the server send me the page. But I can send requests all day long if I don't care about the response.

I thought this was just a slight muddying of the waters. But no. The VERY NEXT PARAGRAPH:

Posting a comment (or TrackBack) doesn't require interaction. I can send a comment in a POST or GET message and not worry about the response if I don't care about receiving acknowledgment that it was successful.

...what, has Apache moved to UDP all of a sudden? Sweet Zombie Jesus! (And don't talk to me about guessing TCP sequence numbers; that is not what this idiot is talking about.) (Although to give him his due, he is talking about this in an article explaining why blocking IP addresses from blogs won't work, and he comes up with a great summary: "This [approach] is fundamentally flawed because it assumes IP addresses are both unique and hard to come by.") (But oh, this is a very painful case of bending over backwards to be fair.) And then:

Now spammers have turned their attention to weblogs and comment forms. In order to increase search engine rankings you are posting advertisements to our Web pages. What you failed to understand is that bloggers are smarter, better connected, and more technologically savvy than the average email user. We control the medium that you are now attempting to exploit. You've picked a fight with us and it's a fight you cannot win. Bloggers will track you down and notify your hosting providers about your activities. We will tell your ISPs what you are using their connections for. We will let the makers of the products you are advertising know of your despicable sales methods. We will hit you where it hurts by attacking your source of income. You can move to a new host, find a new ISP, or sign up for a different affiliate plan. The end result will be the same. Each time you rise out of the muck we will strike you down and send you back to the hole you crawled out of.

Do you smell that? That is the sound of sweet, virgin superiority, fresh and naive and unmingled. This is from Dive Into Mark. I quoted it before, but here's a bit more context:

If you want to be an anti-spam advocate, if you want to write software or maintain a list or provide a service that identifies spam or blocks spam or targets spam in any way, you will be attacked. You will be attacked by professionals who have more money than you, more resources than you, better programmers than you, and no scruples at all. They want to make money, this is how they have decided to make money, they really can make a lot of money, and you're getting in their way. This is old hat to anyone who's been involved in anti-spam efforts in other domains (Usenet and email spring to mind), but just like everything else, the weblogging community seems intent on (a) thinking they're special and unique and nobody has ever had their problems before, and proceeding to (b) ignore all the work that has come before and reinventing the wheel. [....] Someone challenged me, "Well, how am I supposed to continue hosting these low-barrier discussions?" I'm sorry, but I don't know. To quote Bruce Schneier, "I feel rather like the physicist who just explained relativity to a group of would-be interstellar travelers, only to be asked, 'How do you expect us to get to the stars, then?' I'm sorry, but I don't know that, either." The low barrier is exactly the problem here. We got away with it (please, come post random links on my site which is well indexed, poorly managed, and open to unlimited anonymous contributions!) because we were collectively very young and naive and thought no one could hurt us. Now it's like we're turning 30 and being told we need to go on a diet and asking, "Well when can I go back to my old eating habits?" Um, you can't. Your old eating habits don't work anymore. Weblogging is growing up. Oh wait, you thought that would be a good thing? You must still be young.

It is still worth reading every single depressing and true sentence in there, if only to keep yourself from being drowned in bullshit, nonsense and fairy tales.

Tags:

Debian_irritants


title: Debian Irritants date: 2004-12-23 08:21:19

Yes, it's trouble in paradise time:

  • WTF does Vim try to connect to X?
  • WTF does grip come with gconf?
  • WTF does gconf refuse to give up information?

ARGHHH.

Tags:

Two good deeds

Well, I did the right thing today -- twice. Damn right I'm bragging.

First off, it turns out that the FreeBSD Foundation has run into a (good!) problem: its donations have been too big. In order to keep its US charitable status, it needs to have two-thirds of its donations be relatively small. Due to a couple of big donations, this ratio is a little out of whack at the moment, and they need a bunch of small donations.

Welp, I've been administering FreeBSD systems for a living for...well, I was gonna say four years, but it's more like two and a half or three. I've been working on them for four, though; my rent and food have been paid in large part because of the generosity of the people who put together FreeBSD. A donation went off in short order.

Then I remembered that I've been meaning to join the Free Software Foundation for a while now. The motivation is the same: I've been paying my bills for a long time now (and enjoying myself immensely in the process) because of the generosity of Free-as-in-Freedom software people: Stallman, Torvalds, Wall, and a zillion others. I have a hard time imagining what I'd be doing now without Free software; I suspect that, if I was lucky, I'd be working as a grocery store manager right now. So: off to the FSF website to sign up for an associate membership.

And what did I find but two, count 'em TWO cool things:

  1. If you refer three people to the FSF for associate memberships, RMS or Eben Moglen will record a message for you, suitable for voicemail, Hallowe'en or impressing the ladies. I did a quick search on Google, but couldn't find anyone with the link...damn shame. Better than a free iPod, cooler than a CmdrTaco TiVo -- join the FSF and get RMS to say "All Hail Liddy!"

  2. The FSF is looking for a senior sysadmin. God, that'd be cool. Decent enough pay (no, it's not the sort of job you take because of the money, but it's nice to think about), all the Free software you can handle, and an IBM Thinkpad to run it on. Of course, I think I'd have some 'plainin' to do about the laptop I'm writing this on...and, of course, it would mean living in the US. Frankly, that scares the crap out of me these days. Goddamned PATRIOT Act...

In other news, work continues apace. We're losing two coop students and gaining one, gaining another full-time person, and I'm still trying to get my RAID array -- credit app is with the boss, and after that's done the order'll finally go in.

Rough guess (wild hope) at this point is that it'll be in my hands in mid-January, which won't be a moment too soon. There's a new Linux server I'm setting up that I'm desperately hoping won't have problems due to proprietary kernel modules in the software I'm installing. (I'm just writing myself further and further out of that job, aren't I?)

And I'm wondering if the simplest way to get Nagios to make sure the right machines are exporting the right filesystems is to check if amd is mounting them correctly. (No matter whether the machine or amd fails, something needs to be fixed.) Or maybe I just need to figure out the right wrapper for showmount -e.
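
A rough sketch of what that showmount -e wrapper might look like as a Nagios-style check, in Python. Everything named here (parse_exports, check, the sample hosts and paths) is hypothetical; the only things taken as given are that showmount -e prints a header line followed by one export per line, and that Nagios plugins exit 0 for OK and 2 for CRITICAL:

```python
import subprocess
import sys


def parse_exports(showmount_output):
    """Pull the exported paths out of `showmount -e` output."""
    exports = set()
    for line in showmount_output.splitlines()[1:]:  # skip "Export list for ..." header
        fields = line.split()
        if fields:
            exports.add(fields[0])
    return exports


def check(expected, showmount_output):
    """Return (nagios_exit_code, message) for a set of expected export paths."""
    missing = set(expected) - parse_exports(showmount_output)
    if missing:
        return 2, "CRITICAL: not exported: " + ", ".join(sorted(missing))
    return 0, "OK: all expected filesystems exported"


# Usage (hypothetical): check_exports.py fileserver /export/home /export/svn
if __name__ == "__main__" and len(sys.argv) >= 3:
    host, expected = sys.argv[1], sys.argv[2:]
    try:
        result = subprocess.run(["showmount", "-e", host], capture_output=True,
                                text=True, check=True, timeout=10)
    except (OSError, subprocess.SubprocessError) as exc:
        print("CRITICAL: showmount failed: %s" % exc)
        sys.exit(2)
    code, message = check(expected, result.stdout)
    print(message)
    sys.exit(code)
```

This only confirms that the server says it's exporting the filesystem; it says nothing about whether amd on the clients can actually mount it, which is the other half of the question.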

On the spam front: good god, what a smoking hole Movable Type is turning out to be. First there were the license changes, then the comment spammers (who seem to be posting a lot more aggressively to MT than to WordPress)...Of course, comment spam affects all blogs, not just MT. Still, this whole idea of rebuilding static pages every time the stars move seems to be causing them a lot of trouble. (Yep, that last sentence was pure FUD. Or bullshit.) And okay, no, I don't use MT, so what precisely is my beef?

As I'm not going to put up, I should shut up. I still have to upgrade WP -- though according to this posting, there are still lots of XSS issues left unfixed. I'm also upgrading PHP, and I should probably use ApacheToolbox to do that automagically, rather than periodically editing my own Makefile.

The release party for Where Are They Coming From? came off JUST FINE, thank you. EVERYONE was there. Top Stars include Topo, Phil Knight and Mos Def, fresh from the set of HHGTTG. Uh huh.

Further thoughts on the MySQL + GPhoto2 thing: gphoto2 does have the ability to pipe to STDOUT, which I don't think I knew...maybe it won't be as much work to insert directly into a database as I thought. Might even be able to do it as a Perl script.

Finally: what a gorgeous day. It's downtown Vancouver on the back steps of the Art Gallery, it's sunny (in December, too) and just cold enough to make you go "brr". The skater kids are practicing their synchronised jumping -- just in time for the Olympics, I'm sure. A far-too-generous co-worker has handed out chocolate, another has handed out home-made rum and brandy balls, and I'm taking off early to go drinking with a third. Feeling pretty damned good right now.

Update: Too bad Topo's not so great -- fever of 102.8F, as of a couple minutes ago. (Still haven't figured out what that is in Celsius; bad Canuckistanian!) It's down a bit from earlier this afternoon, though, so I'm thinking good things. And these pages say to not worry if it's less than a couple days, so I'm not worrying. Nope.
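
For the record, the conversion is C = (F - 32) * 5/9. A two-line sketch in Python (the function name is just mine):

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


print(round(fahrenheit_to_celsius(102.8), 1))  # 39.3
```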

Tags: wontyoupleaselendahand bsd politics meta rant hardware spam

Tcpdrop


title: tcpdrop date: 2004-12-16 16:56:38

tcpdrop looks 'way cool. More and more reasons to make my next server run OpenBSD.

Tags:

Random_reminders_


title: Random reminders date: 2004-12-16 16:53:20

  1. When compiling a Linux kernel, you need to run make config (with your saved .config file, of course!) after running make mrproper.
  2. NFSv3 support is not included in the kernel just because you compiled in NFS support.
  3. In the 2.4.28 kernel, at least, serial ATA drives are available at /dev/sd[abcd], not /dev/hd[efgh] like in Knoppix...at least, when using the libata interface.

Tags:

Fun_with_awk


title: Fun with awk date: 2004-12-15 22:58:52

As I've mentioned before, I've set up Greylisting on my mail server. The basic principle is simple: if you haven't seen an IP and email address combo before, you give them a 450 ("Come back later") error. If they come back later, you let 'em in and whitelist 'em in the future. The theory is that spamming depends on volume, and a spammer bot won't try again. One thing I've been noticing, though, is that spammers are trying again -- but from different IP addresses, which means they still don't get past the Greylisting. How many IP addresses? Looking at my logs over the last week, here's what I see:
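
The basic principle fits in a few lines. This is a toy sketch in Python, not how any real Greylisting daemon is implemented; the Greylist class, the in-memory storage and the five-minute retry delay are all assumptions of mine for illustration:

```python
import time

RETRY_DELAY = 300  # assumed: seconds that must pass before a retry is accepted


class Greylist:
    def __init__(self, retry_delay=RETRY_DELAY):
        self.retry_delay = retry_delay
        self.first_seen = {}    # (ip, sender) -> time of first attempt
        self.whitelist = set()  # combos that have come back later

    def check(self, ip, sender, now=None):
        """Return an SMTP reply code: 250 to accept, 450 for "come back later"."""
        now = time.time() if now is None else now
        key = (ip, sender)
        if key in self.whitelist:
            return 250
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.retry_delay:
            self.whitelist.add(key)  # they came back: let 'em in from now on
            return 250
        return 450
```

Note that the key is the IP and address combo, which is exactly why a retry from a different IP address starts over at 450.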

Number of connections from separate IPs    Number of occurrences
1                                          102
2                                          26
3                                          24
4                                          24
5                                          15
9 (!)                                      1
Total:                                     190

This means that more than half try once, then give up -- but more than 46% try again. It's only because they're trying from different IP addresses that Greylisting still works. What happens when someone decides to make their bot try again from the same proxy? BTW, all this reminds me that, while it's okay doing this with awk and sort, I still need to get msyslog working...this'd be a whole lot easier in SQL.
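
Until msyslog and SQL happen, the same tally can be sketched in Python. This is not the awk I actually ran; it assumes the log has already been boiled down to (sender, ip) pairs for deferred deliveries, and that "separate IPs" means distinct IPs per sender address. The function name and the sample data are made up:

```python
from collections import Counter, defaultdict


def ip_count_histogram(attempts):
    """attempts: iterable of (sender, ip) pairs, one per deferred connection.

    Returns a Counter mapping "number of distinct IPs a sender used"
    to "number of senders that used that many".
    """
    ips_by_sender = defaultdict(set)
    for sender, ip in attempts:
        ips_by_sender[sender].add(ip)
    return Counter(len(ips) for ips in ips_by_sender.values())


if __name__ == "__main__":
    # Made-up sample: one sender retries from the same IP, one hops IPs.
    sample = [
        ("a@example.com", "192.0.2.1"),
        ("a@example.com", "192.0.2.1"),
        ("b@example.com", "192.0.2.2"),
        ("b@example.com", "198.51.100.7"),
    ]
    for n_ips, n_senders in sorted(ip_count_histogram(sample).items()):
        print(n_ips, n_senders)
```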

Tags:

Random updates

After a lot of consideration, and some reassurance from JWSmythe, I'm going with the Promise VTrak 15100 array for work. It has almost everything I want: serial ATA, dual SCSI adapters, and an ethernet interface. The downside is that Promise doesn't have an office in Canada, so there's the possibility that getting parts across the border could be a problem. However, there's a local company that'll do service, so that makes me feel better.

The other options just weren't as good: one was parallel ATA and had no ethernet interface. The other was the Fastora DAS-315, which certainly looked good -- but the local resellers couldn't be bothered to give me the time of day, let alone answer the questions I had. Best bit: when I asked for a copy of the service level agreement, the sales guy replied that he'd "have to see" if he could release it.

And at home, I've been running into problems with bridging, the 2.6.9 kernel and the 8139too driver. I thought I would enable bridging on Thornhill for some User-mode Linux fun, so I enabled it as a module, then rebuilt and reinstalled the modules. However, when I tried inserting it, I got unknown symbol: br_handle_frame_hook. Okay, what about rebuilding the kernel and including bridging within it? Tried that; when I booted, the kernel panicked as soon as it came time for the onboard 8139 interface to grab an address by DHCP.

It was similar to the earlier problems I had with the Shuttle, in that if I took out the ethernet cable everything was fine -- it was only when the response came in that the kernel panicked. And keep in mind this was without setting up a bridge at boot time, or anything like that. I had to go to the backup 2.6.7 kernel in order to calm things down.

I found this thread on LKML, and it seems to match pretty closely what I saw: the part of the stack trace I could read is the same, though I wasn't able to see the whole message, because it would scroll off the screen. However, I'm reluctant to try this patch; I spent a whole evening rebooting (Sorry, Aaron) and trying different things before I finally confirmed that having bridging in the kernel was just a bad thing.

Interesting bit: I didn't realize that Linux does not have panic core dumping built into the kernel, as FreeBSD does; it's only available as a separate patch. Minus one for Linux.

Finally, it's the day after the office Xmas party, and what am I doing? Heading into work to unplug everything. The power is being shut off in our building (thirty-floor or so high-rise) while upgrades are done, so I'm shutting everything down and disconnecting it just in case. Tomorrow I go back in to reverse the process. Whee!

Tags: hardware

Cool_little_utility


title: Cool little utility date: 2004-12-11 10:48:00

SWAKS, a command-line utility for testing SMTP. Available as a Debian package. Sample output:

$ swaks
To: aardvark at thingy saintaardvarkthecarpeted communist
=== Trying thornhill.saintaardvarkthecarpeted.com:25...
=== Connected to thornhill.saintaardvarkthecarpeted.com.
<-  220 thornhill.saintaardvarkthecarpeted.com ESMTP Postfix All Hail Liddy!
 -> EHLO rearden.saintaardvarkthecarpeted.com
<-  250-thornhill.saintaardvarkthecarpeted.com
<-  250-PIPELINING
<-  250-SIZE 52428800
<-  250-ETRN
<-  250 8BITMIME
 -> MAIL FROM:<aardvark at thingy rearden saintaardvarkthecarpeted communist>
<-  250 Ok
 -> RCPT TO:<aardvark at thingy saintaardvarkthecarpeted communist>
<-  250 Ok
 -> DATA
<-  354 End data with <CR><LF>.<CR><LF>
 -> Date: Sat, 11 Dec 2004 09:45:49 -0800
 -> To: aardvark at thingy saintaardvarkthecarpeted communist
 -> From: aardvark at thingy rearden saintaardvarkthecarpeted communist
 -> Subject: test Sat, 11 Dec 2004 09:45:49 -0800
 -> X-Mailer: swaks v20040404.1 jetmore.org/john/code/#swaks
 ->
 -> This is a test mailing
 ->
 -> .
<-  250 Ok: queued as 594AA2CF
 -> QUIT
<-  221 Bye
=== Connection closed by foreign host.


Wish I'd known about this a long time ago.

Tags:

Comment Spam v. SURBL

A quick Google turns up this entry on using SURBL to fight comment spam. More information here. A quick look at the WP-Blacklist plugin shows it shouldn't be that hard to add a quick DNS check...Hm. And the SURBL mailing list has discussed this too:

>The quick and easy answer, which may be wrong, is that they're
>different folks, or at least different domains.
>
>Jeff C.

Oh please don't think that just yet!! Seriously. I'm working with some ninjas and the 6dos data and a new tool to let you look up this info! So far it ROCKS beyond belief! But more coming, and trying to keep data source anonymous of course. Also trying to tie in some other tools that other SURBL submitters have been asking for. Bottom line is that these guys ARE the same people. Data shows it.

Hm. Update, Nov. 30: Double hm
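
As for the quick DNS check mentioned above: a SURBL lookup is just an ordinary A-record query for the domain with the list's zone appended, where an answer in 127.0.0.x means "listed" and NXDOMAIN means "not listed". Here's a minimal sketch in Python (the plugin itself is PHP, so this is illustration only). The function names are mine, and using the URL's hostname as-is is a simplification: SURBL lists registered domains, so a real check would strip subdomains first.

```python
import socket
from urllib.parse import urlparse

SURBL_ZONE = "multi.surbl.org"  # SURBL's combined list


def surbl_query_name(url, zone=SURBL_ZONE):
    """Build the DNS name to look up for the host in a comment's URL."""
    host = urlparse(url).hostname or ""
    return "%s.%s" % (host, zone)


def is_listed(url):
    """True if the URL's host resolves under the SURBL zone (needs network)."""
    if not urlparse(url).hostname:
        return False
    try:
        answer = socket.gethostbyname(surbl_query_name(url))
    except OSError:
        return False  # lookup failed (e.g. NXDOMAIN): treat as not listed
    return answer.startswith("127.")
```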

Tags: spam

My_chance_for_fame_and_fortune_gone


title: My chance for fame and fortune gone... date: 2004-11-30 18:35:02

Dammit! Someone's already integrated SpamAssassin with WordPress! Now I'll have to show my legs to get attention... Actually, something that could be useful here is a blog honeypot in order to figure out how effective different mechanisms are. That could be interesting...

Tags:

WordPress Upgrades Part Two: Comment Spammers

As I mentioned, it's been a busy weekend for Gecko and me. With anything good and joyous on the Internet come spammers. Comment spam has been a minor irritant for a while -- nothing I couldn't handle by logging into MySQL directly and running DELETE statements with extreme prejudice -- but in the last few weeks it's gone off the hook. With dozens a day, it was time to start doing something automatically.

WordPress is pretty good this way -- you can set up your comments so that everything needs to be approved by the admin, or just stuff that matches certain words in the comment or URL fields. That worked for a while -- "poker", "debt" and "cialis" took care of most things. But it isn't a very sophisticated filter, so I started looking around for something else.

I found Fahim Farook's WPBlacklist plugin, and it works pretty damned well. It imports a copy of Jay Allen's blacklist, then holds for approval anything that matches the HOLY CRAP two thousand three hundred forty five lines of regexes (a few) and domains (the bulk of the list). Plus, you can tell it to delete a comment and harvest information from it -- so it knows to watch out for that (domain, email address) in the future. All in all, I was pretty happy.

But then Gecko pointed out this elegant solution. My first name is not so obvious ("Saint? What kinda first name is that? Damn kids..."), so I put in my own simple question.

It's a brilliant idea, really: come up with a question with an answer that's obvious if you a) are at the site and b) are not a spammer's computer. Which makes me wonder what'll happen when/if AI gets a bit more common, or if spammers will start funding natural language parsing research...shudder.

In other comment spammer news, there's a really good article here about what one guy managed to find out about a comment spammer. Finally, turns out that what I was going to say was said a year ago:

...but just like everything else, the weblogging community seems intent on (a) thinking they're special and unique and nobody has ever had their problems before, and proceeding to (b) ignore all the work that has come before and reinventing the wheel. Now, certainly some adaptation of code and algorithms will be necessary. Existing tools probably can't be used as-is. Email spam fighting relies a lot on the structure of an email, the chain of headers that give away so much information to the trained eye, and none of that information is available in weblog spam. But I see from Jay's Comment Spam Clearinghouse that the latest and greatest tool available to us is a master list of domain names and a few regular expressions. No offense to Jay or all the people who have contributed to the list so far, but how quaint! I mean really. Savor this moment, folks. You can tell your children stories of how, back in the early days of weblogging, you could print out the entire spam blacklist on a single sheet of paper. Maybe with two or three columns and a smallish font, but still. Boy, those were the days.

Holy crap. I thought I was cynical. The entire article is highly recommended.

Tags: spam

That_was_quick._


title: That was quick. date: 2004-11-29 21:15:37

http://www.google.ca/search?hl=en&q=%22tcp+over+rss%22&btnG=Google+Search&meta=

Tags:

WordPress Upgrades Part One: RSS with URLs. I mean, "Podcasting"

So Gecko and I have been doing some interesting work with WordPress this weekend. My wife and I visited him and the lovely Arwen on Friday, drank too much wine, and when we woke up in the morning had a lovely Logitech USB headset, originally meant for a Sony PlayStation, sitting in our laps. We also had fuzzy memories of barked instructions to start "podcasting". Wha'?

First, we had to get the headset working. On my box (Debian Testing with a 2.6.9 kernel) it was a simple matter of getting ALSA modules compiled. Since I still hadn't got around to getting my sound card going after the big move, this had the pleasant bonus of being able to listen to music again.

When everything was done, I was able to run:

arecord -D plughw:Headset | oggenc - -o foo.ogg & sleep 60; killall arecord

and have a tasty OGG file at the end of it. Sweet!

But what about Ms Topo's computer? She's running RH9, and I had no interest in picking this weekend to migrate her to something newer. I knew (well, okay, I Googled and found out) that RH didn't do ALSA, so that left me with the fun of trying to bolt it on. I tried following these instructions, and it didn't work: whenever I tried to modprobe the new ALSA modules, I got lots of "unresolved symbol" errors. NFG.

Well, what about a new kernel? Could try upgrading to 2.6.9, right? Nope: RH uses initrd when booting, and I've never wrapped my head around that. But guess what? When booting back into 2.4, kudzu found and configured the USB headset automagically. Teach me to underestimate RH...

Okay, so that part solved. Next part was to figure out what the hell "podcasting" is. And for the love o' Linus, it's just a URI in an RSS 2.0 feed that points to a thing: an image, an MP3 file, whatever. They call it an enclosure, but it's just a fucking link! RSS is the new HTML. Somebody, somewhere, is going to figure out how to do TCP over RSS, and I won't know whether to laugh or cry.
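
To show just how little there is to it, here's a sketch in Python that builds an RSS 2.0 item with an enclosure. The enclosure element and its url, length and type attributes come from the RSS 2.0 spec; the function name, URLs and file size are made up, and this is not what WordPress does internally:

```python
import xml.etree.ElementTree as ET


def podcast_item(title, page_url, media_url, media_bytes, media_type="audio/mpeg"):
    """Build an RSS 2.0 <item> whose <enclosure> points at a media file."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = page_url
    # The whole of "podcasting": a URL, a size in bytes, and a MIME type.
    ET.SubElement(item, "enclosure", url=media_url,
                  length=str(media_bytes), type=media_type)
    return item


if __name__ == "__main__":
    item = podcast_item("Test show", "http://example.com/blog/test-show",
                        "http://example.com/audio/test-show.mp3", 1234567)
    print(ET.tostring(item, encoding="unicode"))
```

A feed reader that understands enclosures fetches whatever the url attribute points to; everything else about the item is plain old RSS.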

(Hey! Google finds no pages with the phrase "TCP over RSS". You heard it here first, kids.)

But back to our story. So how the hell do you get an enclosure in your RSS 2.0 feed? Well, if you're using the ever-lovin' WordPress, you can either get the Alpha nightly releases, or you can make some judicious modifications to a few files. I backed up the originals, copied the others into place, made the right changes to the database, and baaaaaaaaaaaaaam!

Last step: oh yeah, an MP3. (Stupid patented file formats...) Quick look around found Audacity, and holy crap is that cool. The first time I started it, I got a little popup:

There was an error initializing the audio i/o layer. You will not be able to play or record audio. Error: Host error.

but turning off XMMS fixed that right up. I quickly recorded two tracks, exported the mess to MP3, put it up on the server, and hey-ho, let's go! Sir Gecko checked it out, and it worked on his iPod. What is it with Apple people, anyway?

Next step: Topo and Gecko do the ADD show. Watch for updates.

Tags: