Confessions of a Spam-Catcher: How to Identify Spam
As part of my role as Lifehack’s manager, I am responsible for moderating the comments queue. Lifehack’s back-end has a “Pending” queue for comments that our spam-catching software thinks might be spam, a “Spam” queue for comments labeled “spam” either by the software or by me, and another queue for comments that have been approved, again either by the software or by me. As a general rule, I check the “Pending” queue several times a day, the “Approved” queue every day or so, and the “Spam” queue every week or so.
I’ve been doing this for two years, and I’ve gotten pretty proficient at figuring out what is and is not spam – a tough call to make sometimes, since spammers get more and more sophisticated in lock-step with those of us charged with blocking them. I present my “formula” here for two reasons: one, to give less experienced bloggers and webmasters an idea of how to catch spam on their own site, and two, to give commenters an idea of the kind of thing to avoid so their comments don’t get accidentally thrown in the “Spam” bin.
I should say, a big part of catching spam is a “feel” – intuiting that some comment just doesn’t feel right. I’m not sure I can capture exactly what goes into that feel. Andy Warhol once said that to recognize a great painting, first you have to look at a thousand paintings, and catching spam is a bit like that – the experience of having looked at thousands of spam messages cannot be easily encapsulated. But I’ll try as well as I can.
What is spam?
What makes a message spam is relative and subjective. In a sense, spam is like a weed – a weed is not any particular kind of plant, but a plant that isn’t wanted where it is. (See, for example, Wikipedia’s definition of a weed as “a plant that is considered by the user of the term to be a nuisance.”) For instance, corn is delicious, but if it’s growing in your soybean field, it’s a weed. A message that, say, pimps a word processor might be perfectly welcome on a post that asks for product recommendations for writers, while on a post that just happens to mention writing, the same message could be considered spam.
Some messages are clearly spam; for example, anything delivered by a spambot programmed to leave its message wherever it can find an open form to submit through. But a message can be left by a living person, custom-written for the particular content it’s posted to, and still be spam. This list starts with the most obvious signs and moves to more vague and difficult-to-interpret signs. My guess is that a lot of people run into the ones further down the list because they post without thinking very clearly, so pay attention.
A comment is spam if it:
- Contains links to websites that are unrelated to the content.
For example, a comment might say “I think your baby is really cute!” but the word “baby” links to a site selling baby clothes or even a Forex trading site or other scam.
- Is posted on more than one post.
This is obvious, right? Real people don’t post the same comment over and over on different posts, no matter how relevant. Most likely it’s a spambot responding to multiple posts on your blog that contain similar keywords.
- Contains more than one link.
While there are a few situations in which a legitimate comment could contain several links, they’re fairly rare. As a general rule, the likelihood of a comment being spam increases directly with the number of links; anything over three and it’s virtually guaranteed to be spam.
- Is not directly related to the post.
A lot of spambots (or even live spammers) crawl the web looking for posts with certain keywords and then insert a generic message loosely related to the topic in the hope that it will slip past any human reader who is likely to just skim through their comments. Unless a comment addresses something specific about your post, it’s likely to be spam.
- Is overly complimentary.
Most spammers are fairly astute observers of basic human psychology – particularly our desire to believe good things about ourselves. So they butter us up, saying things like “Great post! In fact, I love this whole site – I’m definitely going to come back again and again!”
- Has keywords or a business name in the “Name” field.
A basic search engine optimization strategy is to get your website’s address associated with specific keywords, and search engines look closely at the text associated with a link to determine the usefulness of the website linked to. Real people aren’t trying to game search engines, and frankly, we want to be recognized for our contribution, so we use our actual name, or a username. If you can’t imagine replying to a person by the name in their “Name” field, you’re dealing with a spammer. (For example, here’s one taken from our spam queue: “Having a good vocabulary not only gives a framework for thought. It also allows you to be concise and precise to make communication better.” This is relevant to the post, and thoughtful, but it was left by an entity named “dining room table”. It’s spam.)
- Links to a spammy business.
This is a tough call – sometimes I’ll see a thoughtful comment clearly written in direct response to the post it’s commenting on, under a real person’s name, and still mark it as spam because they link to a site whose legitimacy is questionable. Could be porn, WOW gold scams, Forex scams, get rich quick schemes, blogs with stolen content, or anything else that feels to me like someone left a comment more to get their link out than to add to the discussion.
- Quotes the post without responding to the quote.
This is a relatively sophisticated spam technique: pulling lines out of the post it’s responding to in order to make the language of the comment sound like real writing. Real people mark the quotes they’re commenting on (usually with quotation marks, but it could be by italicizing or bolding it, putting it in blockquotes, or some other means) and try to clearly separate their response from the post’s words.
- Is posted on an old post.
Old posts tend to attract a lot of spam. Real people generally recognize that if a post is a year or so old, the conversation there is pretty much over. Spambots do not realize that. It still sometimes happens that someone comments on an ancient post, but the age of the post is a big red flag.
- Is in a different language from the site.
If the point of a comment is to engage in discussion with the author of the post and his or her readers, it doesn’t make much sense to comment in a language that you’re not sure the author knows.
- Is from a Russian .ru domain.
I hate to stereotype an entire top-level domain like this. I’m sure there are Russians out there making thoughtful comments on blogs all the time. And yet I’ve never had a comment that wasn’t spam from a commenter with a .ru domain or email address.
- Tells a long, personal story.
This is experience talking – a lot of times you’ll see what appears to be a blog post in its own right in your moderation queue that starts off, at least, relevant, and is clearly written by a real person. This falls under the “Weed” heading – it might have been totally welcome except it’s out of place as a comment on your blog.
- Asks for specific support.
This is another “weed” situation: a comment on a post about, say, installing Windows 7 that asks for help with a specific problem. Unless the point of your site is to answer specific questions about computer problems, this comment is out of place. There are better and more likely places to get help than on your blog.
- Feels wrong.
Sometimes a comment just feels wrong – it is a little too smarmy, maybe, or it’s a little too formal and stiff. You click through the link and it’s a legitimate-enough site, maybe a little sketchy, but you can totally construct a case where this comment was written by a real person with something to say. The question, though, isn’t what the writer intended, but what effect the comment has on the conversation on your site. If a comment doesn’t seem to quite fit, you’re well within your rights to “spam it”.
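For webmasters who want a first pass before the human “feel” kicks in, a few of the more mechanical signs above (link count, repeated comments, old posts, .ru addresses) can be roughly sketched in Python. This is only an illustration, not what Lifehack’s software does: the `Comment` fields, the function names, and the 365-day cutoff for “a year or so” are all assumptions made up for the example, and the output is a list of flags for a moderator to weigh, not a verdict.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical, simplified link detector: counts http:// and https:// occurrences.
LINK_RE = re.compile(r"https?://", re.IGNORECASE)


@dataclass
class Comment:
    """Assumed shape of a comment record; real blog software will differ."""
    name: str
    email: str
    url: str
    body: str
    post_published: date  # when the post being commented on went live
    posted_on: date       # when the comment was submitted


def normalize(body: str) -> str:
    """Lowercase and collapse whitespace so near-identical copies compare equal."""
    return " ".join(body.lower().split())


def spam_signals(comment: Comment, seen_bodies: set) -> list:
    """Return the list of warning signs this comment trips (empty if none)."""
    signals = []

    links = len(LINK_RE.findall(comment.body))
    if links > 3:
        signals.append("more than three links")
    elif links > 1:
        signals.append("more than one link")

    # seen_bodies holds normalized bodies of comments already left on other posts.
    if normalize(comment.body) in seen_bodies:
        signals.append("duplicate of an earlier comment")

    # Assumption: "a year or so old" is treated as more than 365 days.
    if (comment.posted_on - comment.post_published).days > 365:
        signals.append("posted on an old post")

    # Strip scheme, path, and port to get the bare host of the commenter's site.
    host = re.sub(r"^https?://", "", comment.url.lower()).split("/")[0].split(":")[0]
    if host.endswith(".ru") or comment.email.lower().endswith(".ru"):
        signals.append(".ru domain or email")

    return signals
```

A moderator script might call `spam_signals(comment, seen_bodies)` for each pending comment, then add `normalize(comment.body)` to `seen_bodies` afterwards. The subjective signs (overly complimentary, off-topic, “feels wrong”) still need a human eye.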
Anyone else have advice for would-be spam-catchers? Or for commenters who might be finding their comments relegated to the spam-heaps of history? Leave a thoughtful, non-spammy comment below!