Today's title from the subject line of some spam I just got. ("a
spam"? "a spammy email"? just "spam"?)
Mystery flu-like illness continues, or at least its fallout; I've
had lower back pain for the last ~ 4 weeks. Doctor says removing
spine is "not an option" but I've done some Googling and
$WORK continues apace. After taking a week of Python training, we're
using Go for a new tool we're building. Haven't got a good sense
for what it's like just yet, but so far I don't seem to be making a
mess of things.
Tried out drone.io at $WORK yesterday and holy god, is it
good. Auth with our internal Github, then activate repos, and boom!
it runs tests on every new commit on any branch, watches for PRs,
the whole nine yards. When I think of the amount of work we had to
do to get Jenkins to do this, it's insane. Plus the whole
run-as-a-Docker-container,
fire-up-sibling-docker-containers-for-tests thing is very, very
impressive.
Sportsball has started up again with a vengeance: practices on
Monday and Wednesday, games on Fridays and Saturdays. Somebody stop
this merry-go-round!
I've registered for LISA 16, woot! This will be my fifth --
wait, sixth? -- LISA, ten years after my first time attending.
Not sure who's gonna be the theme band this year -- I've done New
Pornographers, Josh Rouse, Soul Coughing and Sloan. And since he's
co-chair this year, it seems like a good time to pull out that
picture of Matt Simmons (@standaloneSA) as a PHP dev:
From: AUW-RSVP <melud.halasa@example.com>
To: undisclosed-recipients: ;
Subject: hi
Organization: Gen. Melud
My name is Gen. Melud Massoud Halasa,I was a Libyan army General in the
military force of Gardaffi in Libya,i have $23 Million Dollars hidden in
Libya,i need your assistance to move this money out of Libya to your
country,i have resigned from the army and i want to go into business in your
country as your partner.If interested REPLY ONLY VIA MY PERSONAL EMAIL
melud.halasa@example.org for more details.
I forwarded it off to a friend of mine and asked him if he knew
anything about it. His reply:
Well, Melud's been trying to move that cash for about six months now but no
one will help him. The thing is, it's actually a physical pallet of $100
bills that we had used to buy his loyalty back during the uprising. That's
why he's starting to reach out farther and farther afield, trying to find
someone who will come to Libya and help him carry the danged pallet to the
local Western Union branch. I'd say it's not worth it ... that Western Union
is at least a mile from where he has the pallet stashed in his mom's house.
I value my lower back more than that.
And, of course, by sending this email back and forth we've guaranteed that
it's being read by some automated spy system that is trying to determine if
this counts as terrorist "chatter." In the interests of being friendly and
polite, I'd like to say hello to that automated system, and perhaps to any
actual human analyst who stumbles across it.
I'd just like to say: that is the funniest thing I hope to read all
week. Maybe all month. And definitely on my top ten for the year.
Just spent the better part of five hours cleaning up four old,
out-of-date Wordpress installations after they got infected with this
worm. I host nine sites on my home server for friends and
family; I'm cutting that down to three (just family), and maybe
looking at mu-wordpress, as of Real Soon Now.
Happy Labour Day, everyone!
Update: I meant to add in here a few things I looked for, because
this info was hard to track down.
I found extra admin-level users in the wp_users table; some had
their email address set to "www@www.com", some had random made-up or
possibly real addresses, and some had the same email address as
already-existing users.
On one blog (possibly infected much earlier) I found 42,000 (!!)
approved, spammy comments.
I searched for infected posts using a query from here:
SELECT * FROM wp_posts WHERE post_content LIKE '%iframe%'
UNION
SELECT * FROM wp_posts WHERE post_content LIKE '%noscript%'
UNION
SELECT * FROM wp_posts WHERE post_content LIKE '%display:%'
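The extra-admin check described above can be roughed out in code. This is only a sketch, assuming the rows have already been pulled out of wp_users as (login, email) pairs; the function name and the sample data are made up for illustration. Note that a shared address flags both the impostor and the legitimate account, so the output is a list to eyeball, not a list to delete.

```python
from collections import Counter

def suspicious_users(rows, placeholder="www@www.com"):
    """Return logins whose email is the placeholder or is shared with another user.

    rows is a list of (login, email) tuples, for example the result of
    SELECT user_login, user_email FROM wp_users.
    """
    # Count addresses case-insensitively so "A@x.com" and "a@x.com" collide.
    counts = Counter(email.lower() for _, email in rows)
    flagged = []
    for login, email in rows:
        key = email.lower()
        if key == placeholder or counts[key] > 1:
            flagged.append(login)
    return flagged
```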
About a year ago, I started using a cobbled-together system of Bash
and Perl scripts and Makefiles to put together this blog. One of the
reasons was my general dislike for PHP; another was my desire to try
living (at least in some small way) by Saint Aardvark's Axiom of
Information Utility, and try keeping this in plain text. (Another
was a desire to use Emacs to write these damn things; I want the
control that's thrown out when you start using a GUI to edit.)
But one of the problems that faced me was how to deal with comments,
and comment spam. Having a web form that allowed comments made
commenting easy, but the downside was that it made spamming easy
too. WP and others keep this down to a dull roar, but it's not perfect
and I've had problems with false positives — people being unable to
post comments because their IP address was on some blacklist, and the
plugin had made no provision for whitelisting.
I decided to lash together something that would use email. For me — a
very small, low-traffic website, with a blog devoted to a rather
obscure set of concerns and a tech-savvy audience (Hi Dad!) — this
seemed like a good choice. Email spam, for me, has been pretty much
solved by greylisting and SpamAssassin. (There's the problem of a ten
— no, fourteen — year-old email address that I've been meaning to
get changed for a while now, but that's another story; they don't seem
to do greylisting, and SpamAssassin does catch most of it.) So taking
comments by email seemed, you know, righteous, dude.
The system for comments is pretty simple: every post gets an epoch
timestamp embedded in it. (I think if you look in the HTML source, you
can see it.) I use it for sorting the order of the posts, and I use it
to generate email addresses for post-specific comments. The format is
simple: comments+(seconds since the
epoch)@saintaardvarkthecarpeted.com. The address is included in the
post, though I haven't done much to make it obvious. (This blog, and I
think this whole website, would make baby Jakob Nielsen cry.)
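The address scheme is easy to sketch. The following is not the Bash/Perl that actually builds the blog, just a small illustration of the format described above; the function names are hypothetical.

```python
import re

DOMAIN = "saintaardvarkthecarpeted.com"  # domain as given in the post
_ADDRESS = re.compile(r"^comments\+(\d+)@" + re.escape(DOMAIN) + r"$")

def comment_address(epoch_seconds):
    """Build the per-post comment address from the post's epoch timestamp."""
    return f"comments+{int(epoch_seconds)}@{DOMAIN}"

def post_timestamp(address):
    """Return the post timestamp encoded in a comment address, or None."""
    match = _ADDRESS.match(address.strip().lower())
    return int(match.group(1)) if match else None
```

Anything that does not carry the full `comments+<digits>` local part simply fails to map back to a post.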
My thinking was that, even though I was publishing the addresses, it
wouldn't matter: as I mentioned, spam for me has been mainly solved
(insert disclaimers here). Between greylisting and SpamAssassin, I
figured I pretty much wouldn't see any spam at all.
Turns out there's another benefit: the addresses have been picked up
by spam bot crawlers, but they're screwing up the scraping. From 24
days of mail logs, I see a crapload of attempts to deliver to the
wrong address:
There were more than 2500 of these messages turned away by
greylisting. They've all stripped off everything up to the plus, not
realizing (as I didn't until a few years ago) that a plus in an email
is valid.
In fact, the only attempts to deliver to legitimate comment
addresses were two actual comments to my blog…which brings up a
shortcoming: I never got that many comments with WordPress, but I
sure got more than I do now. It's possible my writing has just gone
'way downhill, but I think it's more likely that this system just puts
people off, or they're just unable to find it with my current (crappy)
design.
(One interesting problem: my wife tried to comment once, using
Lotus Notes at her workplace. It converted the plus sign into an
underscore. Weird.)
I still regard this setup for comments as an experiment. Its results
are definitely mixed; no spam, but fewer comments as well. Given the
tiresome mess that comes with the lack of an HTTP equivalent of
greylisting, I'm inclined to keep doing it.
Anyhow...that's my interesting research result for the day. You may now
talk amongst yourselves.
I've been listening to the presentations from LISA07, and I have
a few observations.
Trey Darley's presentation reminded me a lot of my last job, but
much more intense: fast growth, no control, and no budget. The
difference is that he had the experience and the chops to deal with it
well. Also, if he can present at LISA, so can I.
Andrew Hume's presentation, "No Terabyte Left Behind", was
interesting, by which I mean frightening. People mostly just trust
that hardware does what it says it does/will do when it comes to
storage. But that doesn't always work: he tells the story of a prof he
worked with who checksummed all his files once a week. When a checksum
changed — and it did about every 6 months — he'd retrieve it from
backup. His rough guess for undetectable errors: 1 per 10
terabyte-years. And we're getting to the point where that's going to
be significant very soon.
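The weekly-checksum habit is simple enough to sketch. This is an assumption-laden illustration, not what the prof in the talk ran: SHA-256 is my choice, the function names are invented, and it only makes sense for files that are not supposed to change (a legitimate edit looks exactly like corruption here). The last function just turns the "1 per 10 terabyte-years" guess into arithmetic.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Checksum one file, reading it in chunks so big files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def scan(root):
    """Map each file under root (by relative path) to its checksum."""
    sums = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            sums[os.path.relpath(path, root)] = sha256_of(path)
    return sums

def changed_files(previous, current):
    """Files present in both scans whose checksum differs."""
    return sorted(p for p in previous if p in current and previous[p] != current[p])

def expected_errors(terabytes, years, tb_years_per_error=10):
    """Expected undetected errors, using the rough 1-per-10-TB-years guess."""
    return terabytes * years / tb_years_per_error
```

By that guess, 100 TB kept for a year works out to about ten silently damaged files.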
Tony Cass' presentation on grid computing for CERN was
fascinating. This is the place I wanted to work (though as a particle
physicist). UBC/TRIUMF is doing some work for this project as
well, which makes me think I should jump over.
David Josephson's presentation was interesting, as much for the
Q&A afterward as for his point. Which was? Glad you asked: that
focussing on IP-based spam filtering (RBLs, greylisting) provides an
incentive to spammers to hijack network prefixes via BGP attacks, and
generally do nasty things to the Internet; please switch to
content-based filtering post-haste. (To clarify, he was talking in
particular about fast naive Bayesian classifiers, not SpamAssassin.)
Since IP-based filtering treats IPs as valuable things — tokens that
demonstrate your email is worth accepting — spammers steal IP
addresses.
I'm not sure how much I buy his argument; he kept promising that the
BGP attacks he described were only part of the problem, but he never
seemed to get beyond that. But during the Q&A Brad Knowles
got up and said (my summary) Content filtering doesn't scale, at
least in his experience (as Senior Internet Mail Systems Administrator
for AOL). At that point, another guy got up and said (again, my
summary) that sort of thing is heard all the time, but with no data
to back it up. The responder had co-authored a paper with Josephson
that got Best Paper award at LISA '04, and they'd made damn sure to
include a ton of footnotes. If their conclusions were wrong, people
were free to challenge them; if Knowles' were wrong, they were
unchallengeable because there was no data to back it up -- it was all
just story that got passed along and became myth.
Knowles' response was "I don't have time to write papers; I'm a
technician, not an academic." Which is true, in lots of ways. And I
don't mean any insult to Knowles; he's done things I will probably
never match, we are all flooded with work, and so on. I'm one guy,
working at a small shop, with none of his experience, or chops, or
rep, or audience.
But there's a reason my .signature says "Because the plural of
Anecdote is Myth": it's to remind me that unless you can back
something up with facts, preferably written down and logged and
repeatable, all you've got is a bunch of stories that become more and
more True the more you repeat them.
It's obnoxious to sneer and say, "Cite, please"; it's worse to be
ignorant.
Lots more listening to do. If you haven't downloaded them yet, you
really should.
Earlier this week the boss forwarded some bounced emails to me and asked me to figure out what had gone wrong. The weird thing was that the email was being greylisted, so it shouldn't have bounced:
This is the Symantec Mail Security program at host
mail.globalsuite.net.
I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.
For further assistance, please send mail to <postmaster>
If you do so, please include this problem report. You can
delete your own text from the attached returned message.
The Symantec Mail Security program
<example@example.com>: host smtpbackup.example.com said: 451
<example@example.com>: Recipient address rejected: Please
try sending again. (in reply to RCPT TO command)
Turns out that Symantec Mail Security is meant to sit in front of an Exchange server, and that Exchange has a bug (or had; I'm unsure if it's been fixed) where it doesn't requeue email that's been greylisted, and later on bounces it back to the sender without ever having retried.
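For reference, the behaviour a sending mail server is supposed to have is easy to state in code. This is a sketch of the general SMTP rule (4xx replies are transient, 5xx are permanent), not of anything Exchange or Symantec actually runs; the function name is made up.

```python
def delivery_action(reply_line):
    """Decide what a sending MTA should do with an SMTP reply line."""
    head = reply_line.strip()[:3]
    if not head.isdigit():
        return "unknown"
    code = int(head)
    if 200 <= code < 300:
        return "delivered"
    if 400 <= code < 500:
        # Transient failure (this is what greylisting sends):
        # keep the message in the queue and try again later.
        return "requeue"
    if 500 <= code < 600:
        # Permanent failure: give up and bounce to the sender.
        return "bounce"
    return "unknown"
```

The 451 in the bounce above falls in the "requeue" branch, which is why it should never have come back to the sender.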
From what I can tell, globalsuite.net is run by guest-tek.com, which provides high-speed access for hotels…so I'm probably not the only one being asked to explain this bug. :-)
Somehow in the move of the websites and files from Linode
back to Thornhill (home server on the other end of DSL; 1.5GHz Sempron
and 1GB of RAM in a nice Shuttle box), I copied ~/.spamassassin to
the wrong directory...and wow, did this ever make a difference to spam
filtering. My mailbox was flooded with stuff coming in to an old
(12 years!) address that I pretty much just use for WHOIS contacts
these days.
I didn't realize what was going on at first, so I tried training it on
my saved spam and ham. 90k messages later, it still didn't do it
properly. I did some digging, then figured out what had happened and
copied the files to the right place. Boom — the sweet, sweet sound of
a nearly-empty inbox.
The user_prefs files were the same each time, so it was just the
Bayes token files that were different. The only thing I can think of
is that the working files were the result of training SA on its
mistakes, rather than on its successes.
Of course, I should probably just get the address cancelled or
changed…the last time I looked, well over 95% of the spam I've got
came to that address. But still, I'm starting to think that I should
be keeping the Bayes files under revision control...
From Bruce Schneier's newsletter comes this blog entry suggesting
that there simply aren't that many serious spammers. Interesting data.
Managed to get the Perl/PHP parser extended so that it would see
nested PHP arrays and translate them to the proper hash/array
references in Perl. It was good to do that, but then other problems
arise — like the fact that, as the parser stands right now, it simply
stops parsing if it finds something it doesn't understand. This could
be something like a comment in a nested array, or something like if
($debug == 1) { $foo = "bar"; } else { … }.
Again, I'm concluding that this would all be much, much easier if it
was in a database…just have PHP and Perl suck out the data and do what
they want. Either that, or just start writing everything in Perl…
The only thing I love more than a printer that does SMTP is a printer
whose default address for an email alert is name@company.com, a
legitimate (though spammy) domain. Did Donald E. Eastlake 3rd and
Aliza R. Panitz die for our sins in vain? Hm?
Upgrading SpamAssassin at work; we're using 2.63, and they're up to,
what, 3.1.0 now? The upgrade itself was relatively painless, but for
complicated reasons it was integrated with Mimedefang, and I didn't
like that. MDF is great, but:
It takes out the SA score header. This can be corrected, but
it turns the SA score into a number, rather than a series of asterisks, which makes it difficult to filter with a regex, or with Outlook. (I have SA set conservatively, but the header makes it easy to filter more aggressively if that's what you want.)
Finally, MDF puts the SA report into the message as an attachment. Admittedly, it's a plain-text attachment, but that doesn't console the Outlook users who are worried (and rightly so) about clicking on attachments.
Hm. Will have to figure out a way around that; maybe just run
spamc/spamd like I currently do.
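Why the asterisks matter: a regex can't compare numbers, but it can count characters. A minimal sketch, assuming the stars arrive in SpamAssassin's usual X-Spam-Level header (one asterisk per point); the function names and the default threshold are arbitrary.

```python
import re

def level_pattern(threshold):
    """Regex matching an X-Spam-Level header with at least `threshold` stars."""
    return re.compile(r"^X-Spam-Level: \*{%d,}" % threshold, re.MULTILINE)

def is_spammy(headers, threshold=8):
    """True if the raw header text scores at or above the threshold."""
    return bool(level_pattern(threshold).search(headers))
```

A user who wants stricter filtering than the server-wide setting just lowers the threshold in their own rule.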
I've also got word that, due to some old prototype equipment no longer
being needed, I will have three new boxes to play with. Woohoo! I'm
already planning the DRBD fileserver.
Finally, I managed to get the new version of uClinux to compile and
run on the NWR04B. Sweet...except that I didn't check out a particular
tag, and I'm having to guess at the date when I did check out my tree,
which makes it difficult to say exactly what I've got. Currently
checking out with the date set to when I think I grabbed it, then may
upgrade/downgrade to the latest tag (currently 2.4.31).
I am fucking pissed off. Over the last few weeks, I've been noticing
attempts to spam the wiki on my website. The spammers would create a
new page similar to one already existing, and fill it full of links to
Russian linkfarms (right term? who cares?). It was annoying, and I
figured it would only get worse, but I didn't get too worried. I
deleted the pages, blocked the IP address (it was all coming from one
open proxy), and watched the changes page for further action. Last
night I checked the changes page again. It was late (well, sort of; it
had been a long day) and I was making one last check before going to
bed. Just to make sure that everything was okay, you know? Every
single fucking goddamned page had been vandalized. Every single
page that I had put up had been replaced with spam, and there were a
dozen new pages with even more spam. Over the course of maybe four
hours, all my work had been removed. My only consolation is that
Google had not visited the wiki since the changes had been made. There
were maybe a hundred pages to revert. And PHPWiki, the software I
was using, sucks ass through straws when it comes to reverting
changes. Check this out, ladies and germs:
There is no easy, documented way to revert to a specific revision of a page using the web interface. The version I was using (1.3.4) forces you to go edit an old version, then save that version. The new version I tried upgrading to (1.3.10) allegedly has "action=revert", but I was unable to get this to work: it appeared to do nothing different from "action=edit". To be fair, this may be because the spammer seemed to edit most pages multiple times, perhaps to get around action=revert. But why couldn't I find any documentation on this? All I could find was this page and the words "See action=revert".
There is no easy way to revert to a specific revision of a page using the database directly. Check it out: The database appears to store metadata in a column dedicated to compressed, cached markup. That's right: instead of breaking out metadata like revision, author IP and so on into a separate table, it's stored in the middle of a big gzipped, serialized PHP object. This means I can't do something like "delete from version where versiondata like '%10.0.0.1%'"; going to the page I've done this on hits an assert in the code that appears to check that the revision listed in the cache column is available in the pagedata table. Whee! Let's get all our programming ideas from MS Office!
As a result, I'm pulling a backup of the database from Friday in order
to get the old pages back. I'm going to dump the pages to HTML, figure
out how to script whatever changes I want to make, then leave PHPWiki
forever the fuck behind me. Shame, really, 'cos I do like the ease of
use of Wikis. But I do not have time for this fucking nonsense. Shame
on me for not remembering these words:
Someone challenged me, Well, how am I supposed to continue hosting
these low-barrier discussions? I'm sorry, but I don't know. To quote
Bruce Schneier, "I feel rather like the physicist who just explained
relativity to a group of would-be interstellar travelers, only to be
asked, 'How do you expect us to get to the stars, then?' I'm sorry,
but I don't know that, either."
Those of you looking for info on the NWR04B, please continue to leave
comments on my blog. I'll get the documentation from the wiki back as
soon as I can.
Well, I did the right thing today -- twice. Damn right I'm
bragging.
First off, it turns out that the FreeBSD Foundation has run into
a (good!) problem: its donations have been too big. In order to keep
its US charitable status, it needs to have two-thirds of its donations
be relatively small. Due to a couple of big donations, this ratio is
a little out of whack at the moment, and they need a bunch of
small donations.
Welp, I've been administering FreeBSD systems for a living
for...well, I was gonna say four years, but it's more like two and a
half or three. I've been working on them for four, though; my rent
and food has been paid in large part because of the generosity of the
people who put together FreeBSD. A donation went off in short
order.
Then I remembered that I've been meaning to join the Free Software
Foundation for a while now. The motivation is the same: I've been
paying my bills for a long time now (and enjoying myself immensely in
the process) because of the generosity of Free-as-in-Freedom
software people: Stallman, Torvalds, Wall, and a
zillion others. I have a hard time imagining what I'd be
doing now without Free software; I suspect that, if I was lucky, I'd
be working as a grocery store manager right now. So: off to the FSF
website to sign up for an associate membership.
And what did I find but two, count 'em TWO cool things:
If you refer three people to the FSF for associate memberships, RMS
or Eben Moglen will record a message for you, suitable for voicemail,
Hallowe'en or impressing the ladies. I did a quick search on Google,
but couldn't find anyone with the link...damn shame. Better than a
free iPod, cooler than a CmdrTaco TiVo -- join the FSF and get
RMS to say "All Hail Liddy!"
The FSF is looking for a senior sysadmin. God, that'd be
cool. Decent enough pay (no, it's not the sort of job you take
because of the money, but it's nice to think about), all the Free
software you can handle, and an IBM Thinkpad to run it on. Of course,
I think I'd have some 'plainin' to do about the laptop I'm writing
this on...and, of course, it would mean living in the US. Frankly,
that scares the crap out of me these days. Goddamned PATRIOT Act...
In other news, work continues apace. We're losing two coop students
and gaining one, gaining another full-time person, and I'm still
trying to get my RAID array -- credit app is with the boss, and
after that's done the order'll finally go in.
Rough guess (wild hope) at this point is that it'll be in my hands in
mid-January, which won't be a moment too soon. There's a new Linux
server I'm setting up that I'm desperately hoping won't have problems
due to proprietary kernel modules in the software I'm installing. (I'm
just writing myself further and further out of that job, aren't I?)
And I'm wondering if the simplest way to get Nagios to make sure the
right machines are exporting the right filesystems is to check if amd
is mounting them correctly. (No matter whether the machine or amd
fails, something needs to be fixed.) Or maybe I just need to figure
out the right wrapper for showmount -e.
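A showmount wrapper might look something like the sketch below. It assumes the usual `showmount -e` output (a header line, then one export path per line followed by its client list) and the standard Nagios plugin exit codes; the function names and the ten-second timeout are my own choices.

```python
import subprocess

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes

def parse_exports(output):
    """Pull the exported paths out of `showmount -e` output."""
    exports = set()
    for line in output.splitlines():
        fields = line.split()
        # Export lines start with an absolute path; the header line does not.
        if fields and fields[0].startswith("/"):
            exports.add(fields[0])
    return exports

def check_exports(output, required):
    """Return (status, message) saying whether every required path is exported."""
    missing = sorted(set(required) - parse_exports(output))
    if missing:
        return CRITICAL, "CRITICAL: not exported: " + ", ".join(missing)
    return OK, "OK: all %d filesystems exported" % len(set(required))

def check_host(host, required):
    """Run showmount against host and check its export list."""
    try:
        result = subprocess.run(["showmount", "-e", host],
                                capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as err:
        return UNKNOWN, "UNKNOWN: could not run showmount: %s" % err
    if result.returncode != 0:
        return CRITICAL, "CRITICAL: showmount failed: " + result.stderr.strip()
    return check_exports(result.stdout, required)
```

A real plugin would print the message and exit with the status. This only proves the server is exporting, not that amd on the clients is mounting, so it complements the amd check and doesn't replace it.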
On the spam front: good god, what a smoking hole Movable Type is
turning out to be. First there were the license changes, then the
comment spammers (who seem to be posting a lot more
aggressively to MT than to WordPress)...Of course, comment
spam affects all blogs, not just MT. Still, this whole idea of
rebuilding static pages every time the stars move seems to be causing
them a lot of trouble. (Yep, that last sentence was pure FUD. Or
bullshit.) And okay, no, I don't use MT, so what precisely is my beef?
As I'm not going to put up, I should shut up. I still have to upgrade
WP -- though according to this posting, there are still lots of
XSS issues left unfixed. I'm also upgrading PHP, and I should
probably use ApacheToolbox to do that automagically, rather than
periodically editing my own Makefile.
The release party for Where Are They Coming From? came off JUST
FINE, thank you. EVERYONE was there. Top Stars include Topo,
Phil Knight and Mos Def, fresh from the set of HHGTTG. Uh huh.
Further thoughts on the MySQL + GPhoto2 thing: gphoto2 does have
the ability to pipe to STDOUT, which I don't think I knew...maybe it
won't be as much work to insert directly into a database as I
thought. Might even be able to do it as a Perl script.
Finally: what a gorgeous day. It's downtown Vancouver on the back
steps of the Art Gallery, it's sunny (in December, too) and just cold
enough to make you go "brr". The skater kids are practicing their
synchronised jumping -- just in time for the Olympics, I'm sure. A
far-too-generous co-worker has handed out chocolate, another has
handed out home-made rum and brandy balls, and I'm taking off early
to go drinking with a third. Feeling pretty damned good right
now.
Update: Too bad Topo's not so great -- fever of 102.8F, as of
a couple minutes ago. (Still haven't figured out what that is in
Celsius; bad Canuckistanian!) It's down a bit from earlier this
afternoon, though, so I'm thinking good things. And these pages say to
not worry if it's less than a couple days, so I'm
not worrying. Nope.
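For the record, the conversion is subtract 32, then multiply by 5/9. A two-line sketch (function name is mine):

```python
def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (degrees_f - 32) * 5 / 9
```

That puts 102.8F at a little over 39C.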
A quick Google turns up this entry on using SURBL to fight
comment spam. More information here. A quick look at the
WP-Blacklist plugin shows it shouldn't be that hard to add a
quick DNS check...Hm. And the SURBL mailing list has discussed
this too:
>The quick and easy answer, which may be wrong, is that they're
>different folks, or at least different domains.
>
>Jeff C.

Oh please don't think that just yet!! Seriously. I'm working
with some ninjas and the 6dos data and a new tool to let you look up
this info! So far it ROCKS beyond belief! But more coming, and
trying to keep data source anonymous of course. Also trying to tie
in some other tools that other SURBL submitters have been asking
for. Bottom line is that these guys ARE the same people. Data shows
it.
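The "quick DNS check" would go roughly like this: take the domain from a URL in the comment, look it up under the SURBL zone, and treat an answer in 127.0.0.0/8 as "listed". This is a sketch under assumptions, not the plugin's code: the function names are invented, the last-two-labels trick is naive (it gets things like co.uk wrong), and the resolver is passed in so the logic can be exercised without touching the network.

```python
import socket
from urllib.parse import urlparse

SURBL_ZONE = "multi.surbl.org"  # SURBL's combined list

def surbl_query_name(url, zone=SURBL_ZONE):
    """Build the DNS name to look up for the domain in url, or None."""
    host = urlparse(url).hostname
    if not host:
        return None
    # Naive guess at the registered domain: keep the last two labels.
    base = ".".join(host.lower().split(".")[-2:])
    return base + "." + zone

def is_listed(url, resolve=socket.gethostbyname):
    """True if the URL's domain resolves under the SURBL zone."""
    name = surbl_query_name(url)
    if name is None:
        return False
    try:
        answer = resolve(name)
    except socket.gaierror:
        return False  # NXDOMAIN (or lookup failure): not listed
    return answer.startswith("127.")
```

A comment with any listed URL would then be held for approval, the same way a keyword match is.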
As I mentioned, it's been a busy weekend for Gecko and
me. With anything good and joyous on the Internet come
spammers. Comment spam has been a minor irritant for a while --
nothing I couldn't handle by logging into MySQL directly and running
DELETE statements with extreme prejudice -- but in the last few weeks
it's gone off the hook. With dozens a day, it was time to start
doing something automatically.
WordPress is pretty good this way -- you can set up your comments so
that everything needs to be approved by the admin, or just stuff
that matches certain words in the comment or URL fields. That worked
for a while -- "poker", "debt" and "cialis" took care of most
things. But it isn't a very sophisticated filter, so I started
looking around for something else.
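To show just how unsophisticated: the keyword approach boils down to something like the sketch below. This is an illustration of the idea, not WordPress's actual code, and the function name is made up.

```python
HOLD_WORDS = ("poker", "debt", "cialis")  # the words mentioned above

def needs_approval(comment_text, author_url="", words=HOLD_WORDS):
    """Hold a comment for moderation if any word appears in its text or URL."""
    haystack = (comment_text + " " + author_url).lower()
    return any(word in haystack for word in words)
```

It's a plain substring match, so it catches "POKER" but also anything that happens to contain one of the words, and it misses every spammer who picks a new word.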
I found Fahim Farook's WPBlacklist plugin, and it works pretty
damned well. It imports a copy of Jay Allen's blacklist, then
holds for approval anything that matches the HOLY CRAP two thousand
three hundred forty five lines of regexes (a few) and domains (the
bulk of the list). Plus, you can tell it to delete a comment and
harvest information from it -- so it knows to watch out for that
(domain, email address) in the future. All in all, I was pretty
happy.
But then Gecko pointed out this elegant solution. My first name
is not so obvious ("Saint? What kinda first name is that? Damn
kids..."), so I put in my own simple question.
It's a brilliant idea, really: come up with a question with an answer
that's obvious if a) you're at the site and b) you're not a spammer's
computer. Which makes me wonder what'll happen when/if AI gets a bit
more common, or if spammers will start funding natural language
parsing research...shudder.
In other comment spammer news, there's a really good article here
about what one guy managed to find out about a comment
spammer. Finally, turns out that what I was going to say was said a
year ago:
...but just like everything else, the weblogging community seems
intent on (a) thinking they're special and unique and nobody has
ever had their problems before, and proceeding to (b) ignore all the
work that has come before and reinventing the wheel. Now, certainly
some adaptation of code and algorithms will be necessary. Existing
tools probably can't be used as-is. Email spam fighting relies a lot
on the structure of an email, the chain of headers that give away so
much information to the trained eye, and none of that information is
available in weblog spam. But I see from Jay's Comment Spam
Clearinghouse that the latest and greatest tool available to us is a
master list of domain names and a few regular expressions. No
offense to Jay or all the people who have contributed to the list so
far, but how quaint! I mean really. Savor this moment, folks. You
can tell your children stories of how, back in the early days of
weblogging, you could print out the entire spam blacklist on a
single sheet of paper. Maybe with two or three columns and a
smallish font, but still. Boy, those were the days.
Holy crap. I thought I was cynical. The entire article is highly
recommended.
The sumbitches are at it agin', mother. Comment spam is infecting both
my blog and my wife's. So far a relatively small number of
keywords -- poker, Texas, debt -- is sufficient to keep 'em away from
where Google can see 'em. Well, that and OCD-like running of SELECT
statements in MySQL. But the fuckers are gonna be the death of me, or
at least blog comments. Although maybe some sort of SURBL plugin
for URLs in the post...that'd be cool. Someone must have something
like that already.
Not that I notice a whole lot of comments, anyhow, at least away
from the Slashdot side of things...although I do notice that
I've made it onto somebody's blogroll. How'd that happen?
In other news: I finally decided what to do about new computers: buy a
new Shuttle Sk43G, Sempron processor, and make that my web server;
then, make my current webserver (older Compaq P3-500 desktop machine)
my desktop and firewall: lots of room for ethernet cards, tape drives
and whatnot.
I agree, it's a little silly that the more powerful box becomes the
horribly underutilized server, but such is life. If there was a
comparably cheap shuttle that came with two onboard ethernet
interfaces, I'd be buying that instead.
So dive right in, right? I got the new box home last night, assembled
it and booted w/o problems. It took little effort to move the hard
drive from the web server and put it in the new, tiny box; sure, I had
to recompile the kernel (8 minutes! eat that, P90!) to get the right
drivers in, but nothing big. Until, that is, it froze. Hard. And only
a few minutes after booting. If I ran top and set it to update
continuously, I could get it to freeze within seconds.
Some fiddling with Grub (boot loader of the GODS, man) showed that the
problem seemed to go away if I went with the original Slackware stock
2.4.20 kernel instead of the 2.6.7 kernel I'd last compiled. (I'm a
packrat, and that includes keeping every kernel compiled on this
damned thing, Just In Case, because You Never Know.) We've got one of
these boxes at work with an Athlon XP and it works fine; admittedly,
it's not doing much, but neither is my web server. (Ba-zing!)
God only knows what's going on there, but it didn't last: I left it on
overnight to see if it'd keep going, and sure enough it froze again
around 10pm. I put the HD back in the P3 and left it. I'm going to see
Wilco tonight (Whoo! WilCO! WHOO!), so this'll take a back seat to
some serious RAWK. Except I'll probably be speculating about crappy
memory or badly applied heatsink paste the whole time. No. No, I
won't. It's Wilco.
Actually, I'm thinking I may have to upgrade the BIOS in order to get
it to work properly with the Sempron; originally it was detected as a
900MHz Athlon, and I had to tweak the bus speed and whatnot to get it
to run at 1.5GHz. (Interestingly, this seemed to have no effect
whatsoever on how quickly it would crash, compared to the difference
the different kernel version made.) (God, that's an awful
sentence. I'm sorry, everyone.)
Anyhow, there's probably lots wrong with the settings; I never really
wanted to learn about memory spacings and CPU voltages and I don't
know what-all.
In other other news, I mentioned that I moved last week, but I
didn't mention that I came back to two, count 'em TWO dead
computers. (Before you ask: Support contracts are for the weak, and I
suspect I'm about to get very weak.) One was a Linux box whose hard
drive gave up the ghost. Stupid IDE hard drives in a dusty, hot
environment anyway! But the other was an old Duron whose
motherboard's capacitors yearned to be one with the cosmos (ie, they
blew up real good). That was running Windows, so the whole
let's-just-throw-the-hard-drive-into-another-box-and-see-if-it-boots
thing was good for a very, very bitter laugh but little else.
Instead, I reinstalled not only Windows but Cygwin, too. That proved
to be harder; we use Cygwin to compile very particular things that
depend on version 2.2 of Python. Version 2.3 makes things cry. And no
matter how much you tell the Cygwin installer that you don't want to
upgrade Python, it goes ahead and does so anyway like some hyperactive
sugar-fueled kid who's certain he knows how to fix things.
After far too much experimentation, I did what I should have done in
the first place: I found an old archive of Cygwin, with the right
version of Python, and I mirrored it. One gigantic, nine-hour long
sucking sound later, and I had a local copy to point the Cygwin
installer at. Thank god.
Finally, just got in the first 19" LCD monitor at work. This was, of
course, two weeks after assuring someone that they were too expensive
to get past the boss. My bad. I'm going to get a lot of mean looks, I
think. But then, if I was a people person, why would I have become a
sysadmin?
Recommendation of the Day: Vicious Battle Rap, by DJ Format and
Abdominal. Bow down, baby.
A while back I set up greylisting on Postfix for my home
server. It works well, but I have the same concerns now that I did
then. The script (smtpd-policy.pl from the examples section of
Postfix' source) feels like a bit of a crock; yes, it's just the
example script, but I don't like the Berkeley DB files, and comments
in the code like "DO NOT create the greylist database in a file system
that can run out of space" make me nervous. It hasn't been a problem
-- in, oh, six months of running, the file is only up to about 5.5
MB. But still: there's no provision for removing old entries, which
means an awful soul-searching battle with the database if you ever
need to trim it.
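To make the missing-cleanup complaint concrete, here is a minimal sketch of greylisting with an expiry pass. It is not the code from smtpd-policy.pl: the class name, the 300-second delay, the 30-day maximum age and the in-memory dict are all assumptions picked for illustration.

```python
import time


class Greylist:
    """Toy in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, delay=300, max_age=30 * 24 * 3600):
        self.delay = delay      # seconds a new triplet has to wait (assumed value)
        self.max_age = max_age  # seconds of silence before an entry is dropped (assumed)
        self.seen = {}          # triplet -> (first_seen, last_seen)

    def check(self, ip, sender, rcpt, now=None):
        """Return 'defer' or 'accept' for one delivery attempt."""
        now = time.time() if now is None else now
        key = (ip, sender.lower(), rcpt.lower())
        first, _ = self.seen.get(key, (now, now))
        self.seen[key] = (first, now)
        return "accept" if now - first >= self.delay else "defer"

    def purge(self, now=None):
        """Drop triplets not seen for max_age seconds; return how many were removed."""
        now = time.time() if now is None else now
        stale = [k for k, (_, last) in self.seen.items() if now - last > self.max_age]
        for k in stale:
            del self.seen[k]
        return len(stale)
```

Running purge() from cron once a day would be the provision for removing old entries; with a real database behind it, the same idea becomes a single DELETE on a last-seen column.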
I had a brief look at the script tonight, hoping to find a way to
maybe hack in MySQL support, but decided to check with Saint Google
first. Sure enough, there's gps, the Greylist Policy Service for
Postfix. Uses C++ for speed and MySQL/PostgreSQL for the backend,
which is nice. I should be able to hack up a migration script for the
old entries (just as soon as I hack up a migration script for all the
old journal entries...), and all should be good.
One thing I'm noticing with greylisting, though, is just how many
attempts are being made from multiple IP addresses within a short
time; one sender today tried from four different IP
addresses within five minutes, all using the same made-up email
address. The original Perl script has the advantage that I can change
it easily -- I know Perl, and I'd be pretty much starting from scratch
with C++ -- and maybe add the ability to track this sort of
thing. It'd be nice to be able to tarpit attempts to do this, say on
the third attempt.
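The tracking idea could look something like the sketch below: remember recent attempts per sender address and flag the sender once enough distinct IP addresses show up inside a short window. The class name, the five-minute window and the limit of three are assumptions, and what "tarpit" actually means is left to the caller.

```python
from collections import defaultdict, deque


class SenderTracker:
    """Flag senders that show up from several IP addresses in a short window."""

    def __init__(self, window=300, limit=3):
        self.window = window                # seconds to look back (assumed: 5 minutes)
        self.limit = limit                  # distinct IPs that trigger a tarpit (assumed)
        self.attempts = defaultdict(deque)  # sender -> deque of (time, ip)

    def record(self, sender, ip, now):
        """Record one attempt; return True if this sender should be tarpitted."""
        q = self.attempts[sender.lower()]
        q.append((now, ip))
        # Forget attempts that have fallen out of the window.
        while q and now - q[0][0] > self.window:
            q.popleft()
        distinct = {addr for _, addr in q}
        return len(distinct) >= self.limit
```

Counting distinct IPs rather than raw attempts matters here: a legitimate server retrying from one address after a greylist deferral should never trip the limit.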
Tarpitting...another problem with Linux. The TARPIT module for
netfilter has yet to be updated to work with the 2.6 kernel, and I
really don't want to switch back to 2.4 just for this. LaBrea is
nice, and I'm running a lashed-together natd configuration on my
FreeBSD firewall box in conjunction with LaBrea running on my desktop
on a second interface. It works, but it doesn't work in the case of a
Linux webserver running on its own, outside the main firewall. I'm
even less a kernel hacker than I am a C++ programmer, and figuring out
the compiling problems and changed skbuff route structures (say) is
beyond me. It's things like this that make me want to move to
OpenBSD. Yeah, rebuilding a server and learning a new firewall
language is a pain in the ass, but at least it's one I can handle.
So a while back, Slashdot posted a story about TheBulkClub.com,
an online forum for heathen cowfucking spammer scum ("Suppose you were
a lying, sociopathic thief. And suppose you were a spammer. But I
repeat myself." -- Mark Twain) that, sadly, left its membership list
and other goodies exposed.
Being the good citizen that I am, I posted a reply that, I
flatter myself, was both informative and helpful: it pointed the way
to several mirrors of the information, including one on my own
site. Well, what do I receive the other day but this charming
email:
Date: Wed, 11 Aug 2004 10:23:03 -0700 (PDT)
From: EmailSupplyNET <emailsupplynet@yahoo.com>
Subject: Question about website
To: aardvark@example.com
Hey, I like (part) of your website, http://saintaardvarkthecarpeted.com
It's informative. There was something on your site about
"thebulkclub.com" Did you create that site for them or something?
I run an email list site and am trying to contact them for advertising
on their forums/boards... Any ideas/help?
Thanks in advance,
Thanks,
www.EmailSupply.net
EmailSupplyNet@Yahoo.Com
877.426.6636
---------------------------------
Do you Yahoo!?
Yahoo! Mail is new and improved - Check it out!
It's quite the site. They offer a sample list -- 4MB of email
addresses, meant to be a sample of the up to 14 million you can
buy. I must warn you, it would be wrong to run this command:
while [ true ] ; do
wget http://www.emailsupply.net/sample.txt -O /dev/null
done
So don't do that. But my question is, what should I do? I'm open to
ideas, suggestions, thoughts, plans and dicta.
I've been trying to come up with a way to tarpit formmail spammer
probes/attacks, and I haven't had much luck yet. This is an outline of
what I've done and what I plan on trying next. If anyone has any
thoughts on this, please let me know. In particular, I'm looking for
any approaches I've overlooked; I'm sure there's a lot.
Background: Matt's old version of Formmail, up to at least version
1.9, had serious terrible bad vulnerabilities that would let a
spammer use it to send any email anywhere, no matter how much you
tried to secure it.
At my last job (ISP helpdesk), I'd get complaints every now and then of
spam coming from our mail server; it was almost always spammers using
Formmail to do their dirty work. I'd have to track down websites where
an old copy of Formmail was lurking, shut it down, and try to clear
the mail queue of as much crap as possible. (This got a lot easier
once I discovered [ngrep|http://www.packetfactory.net/projects/ngrep/]
and had the root password
~SlashdotJournal_29August2003). Eventually I went and replaced
all the copies w/the NMS version of Formmail, which did the trick
wonderfully. I could drop it in to a website under attack and it would
work right away: spam would stop, legitimate requests would still
work.
I still get Formmail probes on my website all the time. A while back,
I decided to send the spammers something more than just a
[404|http://saintaardvarkthecarpeted.com/wheredidthispagego?]
page. Using Apache's ScriptAliasMatch directive, any request that
matches "/cgi-bin/formmail" (case-insensitive) in the URL gets
redirected to my copy of (ta-da!) Formmail Weasel.
Formmail Weasel is a boringly simple Perl script that parses the
request made to it, logs everything to a database, and displays an
innocuous "thanks for the submission" page (not that the robot ever
read it). There's another script that displays the last ten requests
in horrible tables. That's it so far.
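For the curious, the shape of Formmail Weasel is roughly the sketch below. The real thing is a Perl script and its database is not named here; this version swaps in Python and SQLite purely for illustration, and the table layout and function names are made up.

```python
import re
import sqlite3

# Same case-insensitive match the Apache rule is described as doing.
FORMMAIL_RE = re.compile(r"/cgi-bin/formmail", re.IGNORECASE)


def open_log(path=":memory:"):
    """Open (or create) the probe log; the schema is a guess, not the original."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS probes ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "seen_at TEXT, remote_addr TEXT, uri TEXT, params TEXT)"
    )
    return db


def handle_request(db, seen_at, remote_addr, uri, params):
    """Log a Formmail-looking request and return the innocuous reply text."""
    if not FORMMAIL_RE.search(uri):
        return None  # not for us; let the real site handle it
    db.execute(
        "INSERT INTO probes (seen_at, remote_addr, uri, params) VALUES (?, ?, ?, ?)",
        (seen_at, remote_addr, uri, params),
    )
    db.commit()
    return "Thanks for the submission."


def last_ten(db):
    """Newest ten probes, for the horrible tables."""
    rows = db.execute(
        "SELECT seen_at, remote_addr, uri FROM probes ORDER BY id DESC LIMIT 10"
    )
    return rows.fetchall()
```

The placeholders in the INSERT are doing real work: everything that lands in this table was written by a spammer's robot, so none of it should ever be pasted into SQL by hand.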
Once, I got curious and sent off a fake reply to an address mentioned
in one of the probes, making it look like a vulnerable Formmail script
had been found. (Future plans for Formmail Weasel include the ability
to send off these fake replies automatically, and x-ray vision.)
Within a week, there were all kinds of attempts to send spam going on
-- maybe one a minute or so. After a few weeks of this, the spammer
figured out that it wasn't working, and stopped.
That was interesting, and moderately gratifying, but I wanted to cause
pain. I want to imagine spammer wails of dismay. Tarpitting
immediately leaped to mind. But I can't simply tarpit port 80 and be
done with that: I'm still running Apache to serve a few websites, and
I don't want to interfere with that. Besides, Formmail probes go by
website, not IP addresses, so I need to have www.somethingorother
resolve to my server in order to attract scans.
First I decided to try directing Formmail requests to a separate
port. Using Apache's RedirectMatch directive and a separate
VirtualHost thingy, I sent all requests for formmail to port 2348
(aka port random) where Formmail Weasel would be listening and Apache
would be logging. For good measure, I set up tcpdump too.
My first hope was that the probe robots looking for Formmail scripts
would follow the redirect, and I'd be able to capture the traffic on
port 2348 w/tcpdump for analysis. ("Lookee here: spammers use SYN
packets! Guess we know what to look for now, professor!")
My second hope was that I could provoke an attack by sending off a
fake reply, and see whether the attack robots would follow the
redirect. Maybe, if I was extraordinarily lucky, I could just tarpit
port 2348 and be done with it.
I forgot about it for a week after sending off the decoy email. Today
I checked the Apache logs and the tcpdump file: nothing on either
one. But when I checked the main logs for my website, there had been
half a dozen requests for formmail; the robots simply didn't follow
the redirect. I made sure that the redirect was still working, then
cried for a bit.
As I see it, this leaves me with a couple options that don't involve
deep heavy network hacking:
1. Leave Formmail Weasel the way it is: an essentially passive annoyance to spammers.
2. Be a little more crafty.
From the requests I've seen, Formmail probes will look for a few
common variations on the extension (.pl, .cgi) with some
capitalization variations thrown in for good measure (formmail,
Formmail, FormMail). This gives me a way of distinguishing an attack
(an attempt to send spam) from a probe (seeing whether or not there's
a script that can be exploited).
Formmail Weasel could designate one of these (let's say FormMail.cgi)
as one that is the signal of an attack. Probes that came in for other
variations would result in an email being sent off to the spammer, but
with the attack address in it. In other words, any probes for
FormMail.pl, formmail.cgi, or Formmail.cgi would result in an email
back to the spammer indicating that FormMail.cgi was successful. At
that point, the spammer (hopefully) takes the bait and begins the
attack.
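The probe-versus-attack split can be sketched in a few lines. This is only the decision logic, with FormMail.cgi as the bait name as suggested above; the set of known variants and the function names are assumptions, and actually sending the fake reply is left out.

```python
import posixpath

ATTACK_NAME = "FormMail.cgi"  # the one variant advertised to spammers as "working"
KNOWN_NAMES = {"formmail", "formmail.pl", "formmail.cgi"}  # compared lowercased


def classify(path):
    """Return 'attack', 'probe' or 'other' for a requested URL path."""
    name = posixpath.basename(path.split("?", 1)[0])
    if name == ATTACK_NAME:  # exact, case-sensitive match on the bait
        return "attack"
    if name.lower() in KNOWN_NAMES:
        return "probe"
    return "other"


def bait_reply(path):
    """For a probe, return the script path the fake reply should claim worked."""
    if classify(path) != "probe":
        return None
    return "/cgi-bin/" + ATTACK_NAME
```

One wrinkle the sketch does not solve: a first-time probe that happens to ask for FormMail.cgi is indistinguishable from an attack, so the bait name is best chosen as the variant probes use least.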
At this point we can use the [Linux iptables string matching kernel
module|http://www.netfilter.org/documentation/pomlist/pom-extra.html#string]
to look for packets that have the request in them, and tarpit
them. You'd have to be specific about what exactly to look for:
something like "GET ~or /cgi-bin/~FormMail.cgi", plus the
host/site/whatever directive. But this is a small enough part of the
request, and close enough to the beginning, that it should serve as a
way of flagging that address as one that should be tarpitted.
Another option, and probably an easier and more effective one, would
be to have Formmail Weasel set up separate iptables rules to tarpit
the addresses that are part of the attack. You could age them and
phase them out after a short/long while. One possible problem with
this is that I've seen Formmail attacks that have come from many
different IP addresses simultaneously; these usually end up being
open proxies. You'd have to take care not to flood your firewall with
tarpitting rules.
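The bookkeeping for aging rules out, and for not flooding the firewall, might look like this sketch. It only tracks which addresses should currently be tarpitted; turning that into real iptables rules is deliberately left out, and the one-hour lifetime and 1000-entry cap are assumed numbers.

```python
class TarpitTable:
    """Bounded set of tarpitted addresses, each with an expiry time."""

    def __init__(self, ttl=3600, max_entries=1000):
        self.ttl = ttl                  # seconds a rule lives (assumed: one hour)
        self.max_entries = max_entries  # hard cap on rules (assumed)
        self.expires = {}               # ip -> time at which the rule should go

    def expire(self, now):
        """Remove entries whose time is up; return them so rules can be deleted."""
        gone = [ip for ip, t in self.expires.items() if t <= now]
        for ip in gone:
            del self.expires[ip]
        return gone

    def add(self, ip, now):
        """Add or refresh an address; return False if the table is full."""
        self.expire(now)
        if ip not in self.expires and len(self.expires) >= self.max_entries:
            return False  # refuse rather than flood the firewall
        self.expires[ip] = now + self.ttl
        return True
```

Refusing new entries when full is the simplest policy; evicting the oldest entry instead would keep the table useful during a big open-proxy attack, at the cost of letting earlier attackers off early.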
My prayers are answered. Submitted this as a story, but got rejected;
in case it doesn't show up, have a look: SpamAssassin for
Windows, Perl Artistic License, easy to set up. Just trying it out
now. Slow so far, but it's in beta.
Found out about it here. And read this while you're at it.
Didn't think it was ever going to happen, but I finally got spam today
on my [spider-trap address|SlashdotJournal_21November2002]. Helen
Baker, who appears to be pretty active, emailed me today. About
time, too. Can't believe I posted that back in November.
They're located in [San
Jose|http://www.coolstats.com/helpdesk/contactus.html], though their
servers appear to be in China (surprise). Sadly, the California
Attorney General's office is only interested in spam that, among
other things, is received by California residents. Fair enough, I
guess.
Now if only Ms Helen had a Slashdot account and I could mod her
down. Heh...wonder if there are any spammers w/accounts on
Slashdot. That'd make for an interesting time...
Just for fun, a couple days ago I added a link to the index page of
my website to a hidden page. On that page was a mailto: link with a throwaway address for
my domain. I wanted to see how quickly it would get picked up, and how
quickly I would get spam for it.
Well, the first bit has happened. I created the page at 6.41am local
time on November 19; at 2.07pm that same day, it was spidered, then
again at 2.40am this morning (Nov. 21).
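For the record, the gaps between publishing the page and each visit work out as below. The year is assumed to be 2002 (it does not change the arithmetic), and the variable names are just for illustration.

```python
from datetime import datetime

published = datetime(2002, 11, 19, 6, 41)      # page created, 6.41am Nov 19
first_spider = datetime(2002, 11, 19, 14, 7)   # first visit, 2.07pm same day
second_spider = datetime(2002, 11, 21, 2, 40)  # second visit, 2.40am Nov 21


def hours_minutes(delta):
    """Split a timedelta into whole hours and leftover minutes."""
    total_minutes = int(delta.total_seconds()) // 60
    return divmod(total_minutes, 60)


first_gap = hours_minutes(first_spider - published)    # (7, 26)
second_gap = hours_minutes(second_spider - published)  # (43, 59)
print("first visit after %dh%02dm" % first_gap)
print("second visit after %dh%02dm" % second_gap)
```

So the page was found in under seven and a half hours, and the second spider turned up just short of 44 hours after publication.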
The first spidering appears to have been done by
[Thunderstone|http://www.thunderstone.com/], so I don't think there's
too much to worry about there. I'll have to set up a robots.txt file
to keep the nice spiders out. The second, however, is from a NY
ISP, so I'm guessing something will come of that.
It would be interesting to figure out the average time-to-live of a
published email address: how long it can be on a webpage before it
gets spammed (and will therefore be spammed unto the end of time, yea,
and beyond). This would be like Lance Spitzner's research into the TTL
of an unpatched Win98 system on the Internet (Dammit -- all I could
find was [this
link|http://amsterdam.nettime.org/Lists-Archives/nettime-l-0106/msg00126.html],
but I know I've seen the original paper somewhere...), or the idea of
mailpings mentioned in this excellent book (track email delivery
time to a given address to monitor performance/health).
SpamAssassin is set up now on our new front-end mail server, and it
pretty much rox. Got it going this afternoon, and it hasn't fallen
over or anything. We even took the other front-end box out of
round-robin dns, and the new box has held up perfectly well.
For the record, we've got a 1.4GHz Athlon w/512MB RAM doing about 100
messages a minute right now (in + out), and sending 'em all through
SpamAssassin via spamd/spamc. Threshold is set to 15; not as
aggressive as I run it at home (8) or as it runs out of the box (5),
but we have had some false positives in the first little while (only a
few). Load is noticeably up, but not obnoxious by any means.
We've caught about 6500 messages since turning it on at noon, which is
a little -- no, wait, just fired up bc -- a lot better than our
previous average. (Please note that this graph will now be
hopelessly messed up until I get it set up again to monitor
spamcatchin' on the new server.)
I got into work today and found that the mail server had just come up
after *half a fucking hour* of being down because of the insane
load placed on it by spam -- just spam -- coming in. The owner of the
company couldn't send email. I started setting up the new mail server.
And it was nice. I got to go away, away from the help desk, sit down
and figure out how to make it work. FreeBSD's vinum + Promise raid
controller == kernel panic (details later on). Finally got vinum
figured out -- I've only worked w/it once before -- and before I was
grabbed back to help desk had the disk setup about 80% done.
So some more details: there are 4 x 40GB Maxtor IDE drives. (Yeah yeah
yeah SCSI.) We've got an onboard Promise controller chip; I'll put in
the mobo tomorrow and make this all seamless. First it turns out we've
got the Promise Lite (Less Filling!) BIOS, which means we can only
have one (1) array of two disks; the other two disks can be single
arrays on their own, which is useful in some alternate universe I'm
sure. So okay, try setting up one mirrored (Raid 1? 0? I can't keep
'em straight) array, and we'll use vinum to tie it together with the
other single drives...
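Since the RAID levels are easy to mix up: 0 is striping, 1 is mirroring, 5 is striping with parity. A quick sketch of the nominal usable space for each, assuming equal-sized disks and ignoring any filesystem or vinum overhead (the function name is made up):

```python
def usable_gb(level, disks, size_gb):
    """Nominal usable capacity for a few RAID levels, equal-sized disks assumed."""
    if level == 0:
        return disks * size_gb        # striping: all the space, no redundancy
    if level == 1:
        return size_gb                # mirroring: every disk holds the same data
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disks - 1) * size_gb  # one disk's worth goes to parity
    raise ValueError("unsupported RAID level: %r" % level)


for level, disks in ((0, 4), (1, 2), (5, 4)):
    print("RAID %d over %d x 40GB: %dGB" % (level, disks, usable_gb(level, disks, 40)))
```

With these four drives that is 160GB striped, 40GB for a two-disk mirror, and 120GB as RAID 5.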
Only as soon as I try using vinum to do _anything_ with the
Promise'd arrays, BANG: kernel panic. This is 4.6, not the latest
(4.7RC1 as I type), but still. Arghh. Doesn't matter whether vinum
tries raid 0, 1 or 5 -- just panics right away. If I had more time and
a box of my own to fool around with, I'd try [Michael
Lucas'|http://www.oreillynet.com/pub/a/bsd/2002/03/21/Big_Scary_Daemons.html]
SlashdotJournal_25September2002-02 (Buy his book!) and
contribute something useful to the FreeBSD folk. Alas, it's not my box
or my time, and if I were to post this message to
freebsd-hackers-important-vinum-people tomorrow I'd (deservedly) get
laughed at so hard I'd feel it over the ether.
Anyway. Point is I can't get vinum to play nice w/the Promise'd chip
even as an IDE controller. The BIOS of the box allows you to turn the
Promise chip on, off, or to ATA/IDE; but even set to the latter, it
panics once vinum touches /dev/ar*. You have been warned.
So get vinum using the four drives on the first two IDE channels, and
that works fine once I learn the intricacies of disklabel (set type to
vinum, kids!) and vinum init (and that takes a long time w/3*35GB
partitions^H^H^H^H^H^H^H^H^subsooperplexen). 1 5m 5o 133t!
OT: One of my side notes was going to be about how I'm posting this
w/Lynx 'cos Mozilla won't let me use vi, editor of the Elder Gods, as
an editor. Then I realized I could have just fired up a shell and used
vi in there. Sigh. Rumours of my cleverness have been exaggerated.
So I set up a honeypot here at home, to try and learn a bit about
computer security. I don't know a whole lot about security beyond the
obvious (strong passwords, ssh, turn off services, firewall), so I
figured this would be a good way to learn. I took an old Pentium,
installed Red Hat 6.2, and away I went.
Welp, as the good folks at the Honeynet Project suggested, the first while
was spent making mistakes and learning from them. First, I went for
the default workstation install -- which meant no services
running. After a day, I took it down and installed a default server
install. Next, I watched as there were a million probes for NetBIOS or
IIS (there's a guy at work with a Win98 box at home on cable w/no
firewall...I should show him the logs), and then...aha! SunRPC probes!
Whee! ...only I was firewalling the replies. D'oh!
That was last weekend. I didn't want to leave it running w/o me being
around to keep an eye on it, so I left it 'til this weekend to turn it
back on. Friday night I booted and watched.
...and then it happened: inside of *ten seconds* the cracker
detected the ftp server and rooted me. I was agog; all of a sudden I
was watching commands being typed in by the cracker, who had logged in
with the new user ID he'd just added for himself.
Unfortunately, the timing was bad (silly cracker!). My wife's company
was having a [boat cruise|http://www.konawindscharters.com/] that
afternoon, and he got in literally ten minutes before I had to
leave. I watched for a little while, then shut everything down and ran
out the door. (Not that I was sad to go. The boat cruise took us up
Indian Arm and it was absolutely amazing: beautiful weather, free
food and Bheer...a gorgeous day.)
I'll add more on my honeypot later, but it was pretty stock:
RH6.2, firewall, tcpdump, Bash patch to log commands, logging
offsite. The one thing I forgot to do was run tripwire.
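What tripwire would have bought is a baseline to compare against after the break-in. The sketch below shows the bare idea only, hashing every file under a directory and diffing two snapshots; it is not how tripwire itself works or is configured, the function names are made up, and a real baseline has to be stored somewhere the cracker cannot touch.

```python
import hashlib
import os


def snapshot(root):
    """Map each file under root (by relative path) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digests[os.path.relpath(path, root)] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def compare(before, after):
    """Return sorted lists of added, removed and changed paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
    return added, removed, changed
```

Take one snapshot before putting the box on the network, another after pulling the plug, and the changed list is the cracker's to-do list in reverse.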
Music: such a cliched thing to add to something like this (can't even
bring myself to say "weblog" or "journal entry"), but: Harry Belafonte
and Kate Bush. Old Harry Belafonte is so very much fun; Kate Bush's
"Running Up That Hill" is incredible.