Stopping spambots with hashes and honeypots

Sunday 21 January 2007

Up until about six months ago, I was preventing spam on this site using a keyword list. As new spam would arrive, I would update the list to prevent it. It was a pain. Six months ago, I changed my comment form to use a number of tricks to make it difficult or impossible for a spambot to successfully post a comment. In that time, I have had 450 real comments, five spam comments (almost certainly by people), and perhaps 2500 failed spam attempts. That is a good ratio.

I’ve written up how I do it: Stopping spambots with hashes and honeypots.

Last week, Damien wrote Negative CAPTCHA, about fooling spambots into identifying themselves with invisible fields. This is a component of my technique, so his post spurred me to explain what I do to keep spam off this site.

I was particularly interested in the comments on Damien’s post, as they show the variety of know-it-alls who boldly proclaim facts that are plainly wrong, or miss the point.

For example, about the possibility of spambots properly parsing forms with invisible elements, Guymac wrote:

It’s a simple DOM method call to determine if an element, say a form field, is visible or not. So Bot writers could trivially work around this technique.

I was amused by his use of the word “trivially”, since it neatly glossed over the need to base the bot on a browser infrastructure, and ignored some of the ways that fields can be made invisible.

My technique works well on this site. Maybe by writing it up, we can get some more good ideas flowing. I picked up some tips from commenters on Damien’s post that I have now integrated into my system.

Comments

Manuzhai 4:26 PM on 21 Jan 2007

Well, if you talk about input type="hidden", which I think Damien actually mentioned, then it's easy to detect using something DOM-like, and you don't need browser infrastructure to use the DOM. Parsing the page using libxml2 (which has an HTML mode), for example, would allow XPath queries for input[@type = 'hidden'] just fine (but I agree that it's not really all that trivial, and this battle is all about making the detection procedures as non-trivial as possible without adding too much complexity).
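Manuzhai's point is easy to demonstrate without libxml2 at all: even Python's standard-library HTML parser can pick out `type="hidden"` inputs. A sketch (the form HTML below is made up for illustration):

```python
from html.parser import HTMLParser

class HiddenInputFinder(HTMLParser):
    """Collect the names of all input elements with type="hidden"."""
    def __init__(self):
        super().__init__()
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "hidden":
            self.hidden.append(a.get("name"))

form = """
<form action="/comment" method="post">
  <input type="text" name="name">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="email" style="display: none;">
</form>
"""

finder = HiddenInputFinder()
finder.feed(form)
print(finder.hidden)  # ['token']
```

Note that this catches only `type="hidden"`; the CSS-hidden decoy field slips through, which is exactly why hiding fields with stylesheets (as the next commenters suggest) raises the bar further.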

Lorenzo 4:47 PM on 21 Jan 2007

Never tried fighting spam, but what about hiding through CSS with "display: none;"?

Also, how come spammers don't use a browser engine such as Gecko or MSHTML to parse pages?

intepid 5:41 PM on 21 Jan 2007

I can confirm that a bunch of dummy fields hidden via stylesheet works a treat... I have been using this method for more than a year with virtually no automated spam getting through. Yes it would be possible to defeat by parsing the stylesheet as well, but at this stage I don't think spammers will bother.
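The server-side half of what intepid describes is tiny: if any CSS-hidden decoy field comes back non-empty, the submitter filled in fields a human could not see. A minimal sketch (the field names are hypothetical, not taken from any particular site):

```python
# Decoy fields rendered in the form but hidden via the stylesheet.
# A human never sees them; a form-filling bot stuffs them anyway.
HONEYPOT_FIELDS = ["email2", "website2"]

def looks_like_spam(post_data):
    """Return True if any honeypot field was filled in."""
    return any(post_data.get(field) for field in HONEYPOT_FIELDS)

print(looks_like_spam({"name": "Ned", "comment": "Nice post"}))
# False
print(looks_like_spam({"name": "x", "email2": "spam@example.com"}))
# True
```

The check is deliberately boring; the cleverness is all in the markup and stylesheet that keep the decoys invisible to humans while looking like real fields to a bot.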

Jan 5:26 AM on 22 Jan 2007

@Manuzhai: Damien talks about hiding form fields with CSS as Lorenzo suggests.
