Fun_with_awk


title: Fun with awk
date: 2004-12-15 22:58:52

As I've mentioned before, I've set up Greylisting on my mail server. The basic principle is simple: if you haven't seen a given IP and email address combo before, you give the sender a 450 ("Come back later") error. If they come back later, you let 'em in and whitelist 'em for the future. The theory is that spamming depends on volume, so a spammer's bot won't bother to try again.

One thing I've been noticing, though, is that spammers are trying again -- but from different IP addresses, which means they still don't get past the Greylisting. How many IP addresses? Looking at my logs over the last week, here's what I see:

| Number of connections from separate IPs | Number of occurrences |
|---|---|
| 1 | 102 |
| 2 | 26 |
| 3 | 24 |
| 4 | 24 |
| 5 | 15 |
| 9 (!) | 1 |
| Total | 190 |
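
The awk and sort commands behind this table aren't shown here, and neither is the log format, so the following is only a rough sketch of the kind of tally involved, written in Python to keep it self-contained. The `(sender, recipient, ip)` tuples, the sample addresses, and the function name `ip_count_histogram` are all made up for illustration, and grouping by sender/recipient pair is an assumption about what counts as one "occurrence".

```python
from collections import Counter, defaultdict

# Hypothetical, pre-parsed greylist log entries: (sender, recipient, client IP).
# The real log format is not shown in the post, so these tuples are invented.
SAMPLE_LOG = [
    ("a@spam.example", "me@example.org", "192.0.2.1"),
    ("a@spam.example", "me@example.org", "192.0.2.2"),
    ("a@spam.example", "me@example.org", "192.0.2.2"),  # same IP again, counted once
    ("b@spam.example", "me@example.org", "198.51.100.7"),
    ("c@spam.example", "me@example.org", "203.0.113.9"),
]


def ip_count_histogram(entries):
    """Map 'number of distinct IPs' -> 'number of sender/recipient pairs'."""
    ips_by_pair = defaultdict(set)
    for sender, recipient, ip in entries:
        # A set keeps each IP once per pair, however often it retried.
        ips_by_pair[(sender, recipient)].add(ip)
    # Count how many pairs were seen from 1 IP, from 2 IPs, and so on.
    return Counter(len(ips) for ips in ips_by_pair.values())


if __name__ == "__main__":
    histogram = ip_count_histogram(SAMPLE_LOG)
    for n_ips in sorted(histogram):
        print(n_ips, histogram[n_ips])
```

The left column printed is the number of separate IPs and the right column is how many pairs showed that many, which is the same shape as the table.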

Going by the table, more than half try once and then give up -- but more than 46% try again from at least one other IP address. It's only because they're trying from different IP addresses that Greylisting still works. What happens when someone decides to make their bot try again from the same proxy?

BTW, all this reminds me that, while it's okay doing this with awk and sort, I still need to get msyslog working... this'd be a whole lot easier in SQL.
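
On the question of a bot retrying from the same proxy, here is a minimal sketch of the greylisting rule as described at the top of this post. It is not the code of any real greylisting daemon: the `Greylist` class, the 300-second `RETRY_DELAY`, and the choice of the sender's address as the "email address" in the combo are all assumptions made for illustration.

```python
import time

RETRY_DELAY = 300  # seconds a first-time sender must wait; an assumed value


class Greylist:
    """Toy greylist keyed on (client IP, sender address). Illustrative only."""

    def __init__(self, retry_delay=RETRY_DELAY):
        self.retry_delay = retry_delay
        self.first_seen = {}    # (ip, sender) -> time of the first attempt
        self.whitelist = set()  # combos that have already come back once

    def check(self, ip, sender, now=None):
        """Return an SMTP-style reply code: 450 (come back later) or 250 (accept)."""
        now = time.time() if now is None else now
        key = (ip, sender)
        if key in self.whitelist:
            return 250
        first = self.first_seen.get(key)
        if first is None:
            # Never seen this combo: remember it and defer.
            self.first_seen[key] = now
            return 450
        if now - first >= self.retry_delay:
            # Same combo came back after the delay: let it in from now on.
            self.whitelist.add(key)
            return 250
        return 450  # retried too soon


if __name__ == "__main__":
    g = Greylist()
    print(g.check("192.0.2.1", "a@spam.example", now=0))    # first attempt
    print(g.check("192.0.2.2", "a@spam.example", now=600))  # retry, different IP
    print(g.check("192.0.2.1", "a@spam.example", now=600))  # retry, same IP
```

In this sketch a retry from a new IP looks like a brand-new combo and gets deferred again, while a retry from the original IP after the delay is accepted and whitelisted, which is exactly the gap a same-proxy retry would use.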