My Plan For Spam

[Chart of spam levels (increasing trend)]

A couple of months ago I posted an entry describing my spam filtering status. At that point, I was very happy to see that my weekly level had declined from ~4,000 to ~2,000 messages. That decrease coincided with a net-wide downturn in spam levels, and was additionally boosted by some tweaks I had made to my rules.

I'm sorry to say that the respite was purely temporary; as the chart shows, I'm now getting ~16,000 messages a week. Things have gotten to the point where filtering attempts are breaking down. For example, a recent spammer tactic is to stuff messages with random words in an attempt to dilute the overall "spammy-ness." Although, as described by Paul Graham in his FAQ, this won't get around Bayesian filters, it does have another side effect. Once enough of these messages have been received (as in my situation), the filler words themselves begin to acquire a positive correlation with spam. With my corpus thus "poisoned," even completely innocent messages that happen to use those words will be marked as spam.
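To make the poisoning effect concrete, here is a minimal sketch of a per-token spam probability, loosely in the spirit of Graham's approach but simplified (it is not SpamAssassin's actual Bayes implementation). The corpora, the tokenizer, the 0.4 default for unseen tokens, and the function names are all assumptions made for illustration.

```python
from collections import Counter


def train(counts, messages):
    """Count, per token, how many messages contain it (naive whitespace tokenizer)."""
    for message in messages:
        counts.update(set(message.lower().split()))


def spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Simplified estimate of P(spam | token), clamped to [0.01, 0.99]."""
    spam_freq = spam_counts[token] / n_spam if n_spam else 0.0
    ham_freq = ham_counts[token] / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.4  # assumed default for a token never seen in either corpus
    return max(0.01, min(0.99, spam_freq / (spam_freq + ham_freq)))


# Toy corpora, made up for illustration.
ham = ["lunch in the garden tomorrow"] * 5 + ["meeting notes attached"] * 95
spam = ["cheap pills now"] * 100
stuffed_spam = ["cheap pills now garden window river"] * 300

ham_counts, spam_counts = Counter(), Counter()
train(ham_counts, ham)
train(spam_counts, spam)
before = spam_probability("garden", spam_counts, ham_counts, len(spam), len(ham))

# The word-stuffed spam arrives and is learned as spam.
train(spam_counts, stuffed_spam)
after = spam_probability("garden", spam_counts, ham_counts,
                         len(spam) + len(stuffed_spam), len(ham))

print(f"P(spam | 'garden') before: {before:.2f}, after: {after:.2f}")
```

In this toy setup an ordinary word like "garden" starts at the lowest allowed probability and ends up above 0.9 once the stuffed messages have been learned, which is the mechanism by which innocent mail starts getting flagged.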

The net result is that I've been forced to lower the weight of the Bayesian rule within SpamAssassin, so that by itself it is no longer enough to bring a message's score over the spam threshold. I now have to depend more heavily on the other rules that SpamAssassin provides (those that look for broken headers, certain words, etc.). However, since these rules are universal and publicly available, spammers can tune their messages against them to make sure that none are triggered, and they have begun to do so. This explains the decrease in effectiveness in the latter half of May.
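The trade-off can be sketched as a simple additive score. This is only a model of the idea, not SpamAssassin's code; the rule names are ones I believe SpamAssassin uses, but the weights below are illustrative assumptions rather than its shipped values, and the 5.0 threshold is assumed to be the default.

```python
REQUIRED_SCORE = 5.0  # assumed default threshold; adjust to your own setup


def total_score(hits, scores):
    """Sum the weights of every rule that fired on a message."""
    return sum(scores[name] for name in hits)


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """A message is junk once its total score reaches the threshold."""
    return total_score(hits, scores) >= required


# Illustrative weights only.
original = {"BAYES_99": 5.4, "MISSING_SUBJECT": 1.3, "INVALID_DATE": 1.1}
lowered = dict(original, BAYES_99=3.5)

print(is_spam(["BAYES_99"], original))  # Bayes alone crosses the threshold
print(is_spam(["BAYES_99"], lowered))   # Bayes alone no longer suffices
print(is_spam(["BAYES_99", "MISSING_SUBJECT", "INVALID_DATE"], lowered))
```

With the lowered weight, a Bayes hit needs corroboration from the static rules, which is exactly why spam that is tuned to avoid those rules now slips through.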

I have also noticed other tricks, notably a couple involving the "Subject" header. I have set SA to prepend "**JUNK**" to the subject of any message over the threshold, so that by sorting Mail's Junk folder by subject (I'm using Mail's built-in filtering in combination with SA), I can see what was marked by SA. One way to work around SA's marking is to send no Subject header at all, in which case SA appears to do no prepending. Another is to send two Subject headers, in which case SA modifies only one of them, and it just so happens that Mail prefers the other one. The net result in both cases is that these messages are not sorted with the rest of the spam, and I'm forced to check them by hand (since Mail is less forgiving than SA, anything that SA doesn't mark as junk has to be checked by hand for false positives).
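One possible way to close both loopholes is to normalize the header before tagging: collapse any number of Subject headers (including zero) into exactly one tagged header. The sketch below uses Python's standard `email` package and is not how SpamAssassin does its rewriting; the function name `tag_subject` and the choice to keep the first Subject are assumptions.

```python
from email import message_from_string


def tag_subject(raw_message, tag="**JUNK**"):
    """Return the message with exactly one Subject header, prefixed with tag."""
    msg = message_from_string(raw_message)
    subjects = msg.get_all("Subject", [])
    first = subjects[0] if subjects else ""  # arbitrary choice: keep the first one
    del msg["Subject"]  # removes every Subject header; no error if there is none
    msg["Subject"] = f"{tag} {first}".strip()  # re-added at the end of the headers
    return msg.as_string()


doubled = "From: a@example.com\nSubject: one\nSubject: two\n\nbody\n"
missing = "From: a@example.com\n\nbody\n"
print(tag_subject(doubled))
print(tag_subject(missing))
```

Because only one Subject header survives, there is no longer any ambiguity about which one the mail client will display and sort on.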

I could probably make some tweaks to make the situation more bearable (e.g., fix SA to deal properly with the Subject tricks, play around with the rule scoring a bit more, etc.) but that still wouldn't work in the long run, and it still would do nothing about the torrent of spam messages that ends up in my (Junk) mailbox. Therefore I have in mind a more drastic solution:

Ideally, this would involve shutting down email access on my existing domain entirely and moving to another domain (e.g. this one). However, that's not feasible because of the number of copies of Iconographer that are floating around and contain email addresses at that domain (in addition to other places and people that may have an address there). The next best thing is to switch to a challenge-response system, à la Mailblocks. It would probably have to be a home-grown solution, since there are a few tweaks that I'd want to make so that the transition is as painless as possible. First of all, I'd want the entire thing to reside on my servers, since solutions that involve forwarding or redirection would mean an increase in traffic and generally seem rather brittle. I would also have to do extensive whitelisting, supporting not only addresses (ideally I'd upload my "Sent" mbox and extract people's addresses from there) but also keywords (e.g. all emails containing "Iconographer" and similar words). I don't know if there are any open source solutions that I could build on, but if not, it should make a fun summer project.
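As a rough starting point for the whitelisting part, the sketch below harvests recipient addresses from a "Sent" mbox with Python's standard `mailbox` module and lets a message skip the challenge if its sender is known or it mentions a keyword. The function names, the mbox path, and the keyword set are all hypothetical, and the challenge-response machinery itself is not shown.

```python
import mailbox
from email import message_from_string
from email.utils import getaddresses, parseaddr

KEYWORDS = {"iconographer"}  # example keyword; extend as needed


def recipients(messages):
    """Collect lowercased To/Cc addresses from an iterable of email messages."""
    found = set()
    for msg in messages:
        headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        found.update(addr.lower() for _, addr in getaddresses(headers) if addr)
    return found


def whitelist_from_mbox(path):
    """Build the address whitelist from a "Sent" mbox file (path is hypothetical)."""
    # create=False avoids silently creating an empty mbox if the path is wrong.
    return recipients(mailbox.mbox(path, create=False))


def is_whitelisted(msg, whitelist, keywords=KEYWORDS):
    """True if the sender is known or the subject/body mentions a keyword."""
    _, sender = parseaddr(msg.get("From", ""))
    if sender.lower() in whitelist:
        return True
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""  # multipart bodies are skipped
    text = f"{msg.get('Subject', '')} {body}".lower()
    return any(word in text for word in keywords)


probe = message_from_string(
    "From: new@example.net\nSubject: Iconographer question\n\nHi\n"
)
print(is_whitelisted(probe, whitelist=set()))
```

Anything that fails `is_whitelisted` would then be held and answered with a challenge. Matching keywords by plain substring is deliberately crude, and a spammer who learns the keywords can get through, so it is best treated as a convenience for legitimate first-time senders rather than a security measure.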
