Email spam is bad, but blog spam is even worse. When a junk message arrives in your inbox, only you see it; when a junk message gets posted to your blog, anyone who visits can see it. Spammers use automated programs to post comments on blogs, partly to attract people to their sites, but mostly to improve their rankings in search engines like Google. Comment spam was so frustrating, and stopping it so elusive, that I turned off comments for a year. When a new version of Movable Type came out that offered better spam protection, I turned comments back on. The filtering worked, and it was good.
Nothing good lasts forever. A few months ago, bunches of spam comments started getting through the filters. Spammers are like viruses: they adapt. Before long, I had comments all over my site that I was embarrassed to read, much less have other people reading. To give you a sense of the volume that we’re talking about, in the last three years I have received 191 legitimate comments. In the last seven days I have received 1,141 spam comments. Here’s what this last week looks like:
And if anything, the spammers are speeding up. In three years, I’ve received 17,431 spam comments, which is about 110 a week on average. In the last seven days there were over a thousand.
What to do? First, I resolved not to turn off comments again. I have slowly but surely begun attracting feedback; I don’t want to squash what little progress I have made by eliminating feedback completely. There are services that make readers sign up for an account to comment, but the associated overhead for my readers is enough to discourage activity. My default spam filters were catching 97% of junk comments, which is pretty good, but the remaining 3% meant that five or six comments showed up on my site each day, most of which were X-rated. I don’t have daily access to the Internet, so comments often accumulated and remained on the site for days. While 97% may be an A in school, it’s not good enough in this case. I thought I could add a few rules based on my own observations to get my spam filters back on track for near-100% accuracy. I went wading in the spam pool to see what I could find in the comments that were slipping through. Here are some samples.
Mr. Soma Sonic writes:
Good site. Thank you!!!
Mr. Buy Viagra Online Plz writes:
Fine and pretty site! Very good owner!
Mr. Online Phentermine Safe Trust writes:
Cool site. Thanks!!!
and Kathy writes:
Thanks for your great site! Please visit my homepage too:
I noticed one common thread was some compliment, e.g. “awesome site!!”. Spammers, you flatter me. I take pride in my site, but it’s a little minnow in the giant ocean of the Internet. Awesome? You have me confused with FlashEarth. Cool? Maybe you meant Google’s GapMinder. Perfect? I can’t even comprehend the metaphysical implications of a perfect website. The truth is that I am normal, and I added a spam filter to take advantage of being average. Now every comment that contains “perfect,” “nice,” “good,” “great,” “cool,” or “awesome” followed by “site” goes right into the trash. Amazingly enough, that rule covered 252 of the 1,141 spam comments from the last week. These spammers praise early and praise often. No more; now the praise falls on deaf ears. I also blacklisted a handful of recurring words, some of which I don’t care to write here, and others like “slot machine,” “credit card,” “ambien,” “cialis,” and “hardcore.”
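For the curious, here is a minimal sketch of what a rule like that might look like in Python. It is not the actual filter syntax my blogging software uses; the names PRAISE_RULE, BLACKLIST_RULE, and is_junk are made up for illustration, and I'm assuming the rule only looks at the comment text, ignores case, and wants the praise word immediately before “site.”

```python
import re

# Praise words from the rule described above; a comment is flagged when one
# of them is immediately followed by the word "site".
PRAISE_WORDS = ["perfect", "nice", "good", "great", "cool", "awesome"]
PRAISE_RULE = re.compile(
    r"\b(?:" + "|".join(PRAISE_WORDS) + r")\s+site\b",
    re.IGNORECASE,
)

# Only the printable part of the blacklist; the real list is longer.
BLACKLIST = ["slot machine", "credit card", "ambien", "cialis", "hardcore"]
BLACKLIST_RULE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in BLACKLIST) + r")\b",
    re.IGNORECASE,
)


def is_junk(comment: str) -> bool:
    """Return True if the comment trips the praise rule or the blacklist."""
    return bool(PRAISE_RULE.search(comment) or BLACKLIST_RULE.search(comment))


if __name__ == "__main__":
    samples = [
        "Good site. Thank you!!!",
        "Fine and pretty site! Very good owner!",
        "Cool site. Thanks!!!",
        "Thanks for your great site! Please visit my homepage too:",
    ]
    for text in samples:
        print(is_junk(text), text)
```

Run against the four samples above, this sketch catches three of them. “Fine and pretty site! Very good owner!” gets past it, because “pretty” isn't on the list and “good” is followed by “owner” rather than “site.”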
This was a step in the right direction, but junk comments kept trickling in even with my additions. Spam protection is an ongoing battle: once you’ve developed something to counter spammers’ tactics, the spammers change tactics, and the race starts anew. I doubt that any spammer will read my blog and stop leaving “Great site!” comments, but certainly someday the content will change. Plus it’s a waste of my time to have to read through junk comments, trying to identify the newest common characteristics. This is ultimately why hand-tuned spam filters and blacklists aren’t the ideal approach. Paul Graham, a well-known programmer, explains the reasons in his seminal essay on spam protection and points to the new direction, which is spam protection based on Bayesian filtering. This means that an automated system continually re-evaluates which message characteristics correlate with being spam. Once a lot of people flag messages with the phrase “buy cialis” and many outgoing links, the system will learn that messages like these are likely to be spam. I signed up with Akismet, a free service that evaluates all my comments using this system and marks them as legit or spam. Since then, I haven’t received a single junk comment. As far as I can tell, no real comments have been accidentally marked as junk either.
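To make the idea of a filter that learns from flagged messages a little more concrete, here is a toy sketch in Python. It is a plain naive Bayes classifier with add-one smoothing, which is the textbook version of the idea; it is not the exact formula from Paul Graham's essay, and it is certainly not how Akismet is built internally, which I have no way of knowing. The class name, the "spam" and "ham" labels, and the training comments are all invented for illustration.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Toy word-count spam filter; labels are "spam" and "ham" (legitimate)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercase words and numbers; punctuation is dropped.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # Every flagged comment updates the counts, so the filter keeps adapting.
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed log prior for the label.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing, with one extra slot for unseen words.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("buy cialis cheap", "spam")
    f.train("buy cialis online now", "spam")
    f.train("I enjoyed this post about maps", "ham")
    f.train("thanks for the write-up on elections", "ham")
    print(f.spam_probability("buy cialis"))
    print(f.spam_probability("enjoyed the post"))
```

The difference from my hand-written rules is that nobody has to read the junk and guess at patterns. Each time a comment is flagged, the word counts shift, and the next comment is scored against the updated counts. A real service would also draw on far more training data and on signals beyond words, such as the number of outgoing links.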
Do your worst, spammers. I’m ready. And to those real human beings who want to leave a comment like, “Totally awesome site!!! By the way, I have cheap phentermine that I’m unloading at bargain-basement prices…,” my apologies. Chances are your message will end up in my circular file cabinet.