Out with SpamAssassin; In with a multi-pronged “total annihilation” of spam
For several years now I’ve been using SpamAssassin as my primary tool to fight the spam arriving at my mail server. The performance of SpamAssassin has never been great, and using spamd/spamc is not a real fix: it merely avoids the startup overhead and does nothing for the speed of the classification engine. So about six months ago, to lower the load on my machine (a User-mode Linux host from Bytemark with a mere 64 MB of RAM), I moved the RBL checks from SpamAssassin into Exim. This was a big help, but detecting and filtering spam from blacklisted domains only stops approximately 30-40% of spam. The rest still goes through the heavyweight SpamAssassin.
To compound problems, spammers have increasingly been finding ways to get around the Bayesian DB I’ve trained with SpamAssassin, which means I’ve spent a large amount of time re-training SpamAssassin on the mails it got wrong. Well, SpamAssassin is a really slow learner, and today I’d had enough: I threw it away and decided to pursue a new strategy of “total annihilation”. What this means in practice is that I’m now using three Bayesian classifiers at once (BMF, QSF and Bogofilter), all of which are written in C and so are crazy fast: training on a few thousand mails takes seconds instead of minutes. Every incoming message is fed through all three of these engines, and if two or more of them agree that it is spam, it gets sent to a spam folder. If one of them disagrees but the other two gave the message a 100% spam rating, the dissenting one is automatically trained on that message as spam. In addition, I’ve added Rik’s PSBL blacklist to my Exim config, to back up my existing use of Spamhaus.
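For the curious, the voting rule boils down to something like the following Python sketch. This is not the real setup (that is all procmail recipes); it only illustrates the logic. It assumes each classifier's result has already been reduced to a spam probability between 0 and 1, which is a simplification, and the names `Verdict`, `classify`, `SPAM_THRESHOLD` and `CERTAIN` are made up for the example.

```python
from dataclasses import dataclass

# Assumed cut-offs, chosen for illustration only.
SPAM_THRESHOLD = 0.5   # at or above this, a classifier "votes" spam
CERTAIN = 1.0          # a "100% spam" rating


@dataclass
class Verdict:
    name: str      # which classifier produced this result
    score: float   # spam probability in [0, 1] (an assumed representation)

    @property
    def is_spam(self) -> bool:
        return self.score >= SPAM_THRESHOLD


def classify(verdicts):
    """Return (is_spam, names_of_classifiers_to_retrain_as_spam)."""
    spam_votes = [v for v in verdicts if v.is_spam]

    # Majority rule: two or more of the three must call it spam.
    is_spam = len(spam_votes) >= 2

    # Auto-train the lone dissenter, but only when both of the
    # others were completely sure the message is spam.
    retrain = []
    if len(spam_votes) == 2 and all(v.score >= CERTAIN for v in spam_votes):
        retrain = [v.name for v in verdicts if not v.is_spam]

    return is_spam, retrain


if __name__ == "__main__":
    result = classify([
        Verdict("bmf", 1.0),
        Verdict("qsf", 1.0),
        Verdict("bogofilter", 0.1),
    ])
    print(result)
```

Requiring the two agreeing classifiers to be completely certain before training the third keeps a borderline two-to-one vote from teaching the dissenter something that might be wrong.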
NB. I can’t really take credit for having come up with this plan: I’ve used the procmail recipes from acme.com’s Bayesian mail filtering pages with minimal changes. Hopefully this will prove to be a winning formula. Early indications are certainly very promising, but of course only time will tell.