Spam sucks. Any site which allows unauthenticated users to submit forms will have a problem with spamming software (spambots) submitting junk content. A common technique to prevent spambots is CAPTCHA, which requires people to perform a task (usually noisy text recognition) that would be extremely difficult to do with software. But CAPTCHAs annoy users, and are becoming more difficult even for people to get right.

Rather than stopping bots by having people identify themselves, we can stop the bots by making it difficult for them to make a successful post, or by having them inadvertently identify themselves as bots. This removes the burden from people, and leaves the comment form free of visible anti-spam measures.

This technique is how I prevent spambots on this site. It works. The method described here doesn't look at the content at all. It can be augmented with content-based prevention such as Akismet, but I find it works very well all by itself.

Know thy enemy

From watching spammers fail to create spam on my site, I've identified three different types of spam creators: playback spambots, form-filling spambots, and humans.

Playback bots

These are bots which have recorded POST data which they replay back to the form submission URL. A person visits the form the first time, and records the form data. Certain fields are marked as slots to be filled in with randomized spam later, but the structure of the form is played back verbatim each time. This includes the names of the fields, and the contents of hidden fields.

These bots don't even bother looking at the form as served by the site, but blindly post their canned data to the submission URL. Using unusual field names to avoid these bots will only work for a week or so, after which they will record the new field names and begin posting with them.

A playback bot can be stopped by varying the hidden data on the form so that it will not be valid forever. A timestamp is a simple way to do this, making it possible to detect when old data is being replayed. The timestamp can be made tamper-proof by hashing it with a secret and including the hash in the hidden data of the form. Replaying can be further hindered by including the client's IP address in the hash, so that data can't even be immediately replayed across an army of spambots.
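To make this concrete, here is a minimal sketch in Python (the names and details are mine, not this site's actual code): the server issues a timestamp plus a hash sealing it to the client's IP and a server secret, and on submission rejects anything tampered with, stale, or future-dated.

import hashlib
import time

SECRET = "a-long-random-server-side-secret"  # assumption: never sent to the client
MAX_AGE = 60 * 60                            # accept forms for an hour, say

def make_token(ip):
    # Issued when the form is served; both values go into hidden fields.
    timestamp = str(int(time.time()))
    seal = hashlib.md5((timestamp + ip + SECRET).encode()).hexdigest()
    return timestamp, seal

def check_token(timestamp, seal, ip):
    # Reject tampering, replays from other IPs, and stale or future data.
    if seal != hashlib.md5((timestamp + ip + SECRET).encode()).hexdigest():
        return False
    try:
        age = int(time.time()) - int(timestamp)
    except ValueError:
        return False  # missing or non-integer timestamp
    return 0 <= age <= MAX_AGE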

Form-filling bots

These bots read the form served by the site, and mechanically fill data into the fields. I don't know if they understand common field names (email, name, subject) or not. On my site, I've observed bots that look at the type of the field, and fill in data based on the type. Single-line edit controls (type=text) get name, email, and subject, while textareas get the comment body. Some bots will fill the same data into all the fields of the same type, while others will enter (for example) different first names into each of the single-line fields.

Form-filling bots can be stopped by including editable fields on the form that are invisible to people. These fields are called honeypots and are validated when the form data is posted. If they contain any text, then the submitter must be a bot, and the submission is rejected.

Using randomized, obscured field names and strict validation can also stop these bots. If the email field must have an @-sign, and the name field must not, and the bot can't tell which field is email and which is name, then its chances of making a successful post are greatly reduced.

Humans

These are actual people using your form. There's nothing you can do to stop them, other than to remove the incentive. They want link traffic. Use the rel="nofollow" attribute on all links, and be clear that you are doing it.

Building the bot-proof form

The comment form has four key components: timestamp, spinner, field names, and honeypots.

The timestamp is simply the number of seconds since some fixed point in time. For example, the PHP function time() follows the Unix convention of returning seconds since 1/1/1970.

The spinner is a hidden field used for a few things: it hashes together a number of values that prevent tampering and replays, and is used to obscure field names. The spinner is an MD5 hash of the following (a sketch in code follows the list):

  • The timestamp,
  • The client's IP address,
  • The entry id of the blog entry being commented on, and
  • A secret.
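In code, such a spinner might be computed like this (an illustrative Python sketch; the function and variable names are my own):

import hashlib

def make_spinner(timestamp, ip, entry_id, secret):
    # Bind the form to one moment, one client, and one entry,
    # sealed with a server-side secret so it can't be forged.
    return hashlib.md5(f"{timestamp}:{ip}:{entry_id}:{secret}".encode()).hexdigest()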

The field names on the form are all randomized. They are hashes of the real field name, the spinner, and a secret. The spinner gets a fixed field name, but all other fields on the form, including the submission buttons, use hashed field names.
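A sketch of that field-name hashing, continuing the Python examples above (names are again illustrative):

import hashlib

def field_name(real_name, spinner, secret):
    # Map a real field name like "comment" to its per-request alias.
    return hashlib.md5(f"{real_name}:{spinner}:{secret}".encode()).hexdigest()

On each request the form is rendered with these aliases; on submission the server recomputes the same aliases from the posted spinner to locate the data.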

Honeypot fields are invisible fields on the form. Invisible is different from hidden. Hidden is a type of field that is not displayed for editing. Bots understand hidden fields, because hidden fields often carry identifying information that has to be returned intact. Invisible fields are ordinary editable fields that have been made invisible in the browser.

The invisibility of the honeypot fields is a key way that bots reveal themselves. Because bots do not process the entirety of the HTML, CSS, and Javascript in the form, and because they do not build a visual representation of the page, and because they do not perceive the form as people do, they cannot distinguish invisible fields from visible ones. They will put data into honeypot fields because they don't know any better.

The form is built as usual (a sketch follows the list), including:

  • editable fields for all of the information we want to collect from the user,
  • hidden fields for identifying information, including the timestamp, the spinner, and the entry id,
  • invisible honeypot fields of all types, including submission buttons.
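Putting the pieces together, the rendering might look roughly like this (a sketch building on the functions above, not this site's actual code; the inline style is just the simplest way to hide the honeypot):

import time

def render_form(ip, entry_id, secret):
    # Render the comment form with a spinner, hashed field names, and a honeypot.
    timestamp = str(int(time.time()))
    spin = make_spinner(timestamp, ip, entry_id, secret)  # from the sketch above
    name = lambda real: field_name(real, spin, secret)    # hashed field aliases
    return f"""
<form action="/comment" method="post">
  <input type="hidden" name="spinner" value="{spin}">
  <input type="hidden" name="{name('timestamp')}" value="{timestamp}">
  <input type="hidden" name="{name('entry_id')}" value="{entry_id}">
  Name: <input type="text" name="{name('name')}">
  Email: <input type="text" name="{name('email')}">
  <textarea name="{name('comment')}"></textarea>
  <!-- honeypot: an ordinary editable field, made invisible -->
  <input type="text" name="{name('honeypot')}" style="display:none">
  <input type="submit" name="{name('submit')}" value="Post">
</form>"""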

Processing the post data

When the form is posted back to the server, a number of checks are made to determine if the form is valid. If any validation fails, the submission is rejected.

First the spinner field is read, and is used to hash all of the real field names into their hashed counterparts so that we can find data on the form.

The timestamp is checked. If it is too far in the past, or if it is in the future, the form is invalid. Of course a missing or non-integer timestamp is also a deal-breaker.

The value of the spinner is checked. The same hash that created it in the first place is re-computed to see that the spinner hasn't been tampered with. (Note that this check isn't actually necessary, since if the spinner had been modified, it wouldn't have successfully hashed the timestamp field name and the timestamp verification would already have failed, but the extra check is harmless and reassuring.)

Check the honeypots. If any of them have any text in them, the submission is rejected.

Validate all the rest of the data as usual, for example, name, email, website, and so on.
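Condensed into code, the checks above might look like this (the same illustrative Python as before; post is the dict of submitted fields):

import time

def validate_post(post, ip, secret, max_age=3600):
    # Return the real fields of a human post, or None to reject it.
    spin = post.get("spinner", "")
    name = lambda real: field_name(real, spin, secret)

    # Timestamp: present, an integer, not in the future, not too old.
    ts = post.get(name("timestamp"), "")
    try:
        age = int(time.time()) - int(ts)
    except ValueError:
        return None
    if not (0 <= age <= max_age):
        return None

    # Spinner: recompute and compare (redundant but reassuring).
    entry_id = post.get(name("entry_id"), "")
    if spin != make_spinner(ts, ip, entry_id, secret):
        return None

    # Honeypot: any text at all means a bot filled it in.
    if post.get(name("honeypot"), ""):
        return None

    # Everything else is ordinary validation (name, email, website, ...).
    return {real: post.get(name(real), "") for real in ("name", "email", "comment")}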

At this point, if all of the validation succeeded, you know that you have a post from a human. You can also apply content-based spam prevention, but I have not found it to be necessary.

Making honeypots invisible

This is the essence of catching the bots. The idea here is to do something to keep the honeypot fields from being visible (or tempting) to people, but that bots won't be able to pick up on. There are lots of possibilities. As you can see from looking at my comment form, I've simply added a style attribute that sets display:none, but there are plenty of other options (one is sketched after the list):

  • Use CSS classes (randomized of course) to set the fields or a containing element to display:none.
  • Color the fields the same (or very similar to) the background of the page.
  • Use positioning to move a field off of the visible area of the page.
  • Make an element too small to show the contained honeypot field.
  • Leave the fields visible, but use positioning to cover them with an obscuring element.
  • Use Javascript to effect any of these changes, requiring a bot to have a full Javascript engine.
  • Leave the honeypots displayed like the other fields, but tell people not to enter anything into them.
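For instance, the first idea, a randomized CSS class, might be generated per request like this (a hypothetical sketch, continuing the earlier Python examples):

import hashlib

def honeypot_markup(spinner, secret):
    # A per-request class name, so bots can't learn a fixed "ignore this" rule.
    cls = "f" + hashlib.md5(f"hp:{spinner}:{secret}".encode()).hexdigest()[:8]
    return (
        f"<style>.{cls} {{ display: none; }}</style>\n"
        f'<div class="{cls}"><input type="text" name="{cls}"></div>'
    )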

Criticisms

Let me address a few common criticisms.

Defeatability

In theory, it is possible for a spambot to defeat any of these measures. But in practice, bots are very stupid, and the simplest trick will confuse them. Spam prevention doesn't have to make it theoretically impossible to post spam, it just has to make your form more difficult to spam than most of the other interesting forms on the internet. Spammers don't make software that can post to any form, they make software that can post to many forms.

A relevant joke:

Jim and Joe are out hiking in the forest, when in the distance, they see a huge bear. The bear notices them, and begins angrily running toward them. Jim calmly checks the knots of his shoes and stretches his legs.

Joe asks incredulously, "What are you doing? Do you think you can outrun that bear!?"

Jim replies, "I don't have to outrun the bear, I just have to outrun you."

In any case, yes, spammers may eventually write spambots sophisticated enough to navigate honeypots properly. If and when they do, we can switch back to CAPTCHAs. In the meantime, honeypots work really well, and there are lots of ways to make them invisible we haven't even needed to use yet.

Accessibility

Users that don't use CSS or Javascript will be exposed to all of the honeypot fields. A simple solution is to label the fields so that these users will leave them untouched. As long as no text is entered into them, the form will submit just fine.
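A minimal sketch of such a labeled honeypot (the wording and names are mine):

def labeled_honeypot(alias):
    # Non-CSS users see the label and know to leave the field alone;
    # everyone else never sees it at all.
    return (
        '<div style="display:none">'
        f'<label for="{alias}">Leave this field empty:</label>'
        f'<input type="text" id="{alias}" name="{alias}">'
        "</div>"
    )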

It works

This technique has been keeping spam off my site for a year now, and works really well. I have not had problems with false positives, as Akismet has had. I have not had problems with false negatives, as keyword-based filtering has had. Spambots may get more sophisticated, but their software complexity will have to increase by orders of magnitude before they can break this method.

See also

  • Negative CAPTCHA, Damien's post about a similar technique. The comments on that post got me energized to write up my technique.
  • My blog, where many software engineering topics are discussed.

Comments

mort 9:54 AM on 23 Jan 2007

Hi Ned, a great collection of anti-spam tactics, thank you.

One thing that you didn't mention and that I consider valid is checking that a minimum interval of time passes between the request of the page containing the form and its submission. Humans just won't (or shouldn't) spend less than, say, 15 seconds reading or browsing a page before submitting content via a form, while bots don't need that time at all.

So checking a 'form-generated-at' timestamp vs. a 'form-submitted-at' one and rejecting the post when those are too close makes a good bot detection method. What do you think?

Ned Batchelder 10:21 AM on 23 Jan 2007

Mort, that is an excellent idea. An entire area I haven't explored here is verifications like yours that confirm the typical human behavior: read the posting, load the comment form, type a comment, then post.

JF 12:41 PM on 23 Jan 2007

Very good idea. Here's one more that could help with human spam/bad comments in general: add a question/combo box asking what the blog post was about. For example the choices could be: timestamps, preventing blog spam, or equilateral triangles. A human would actually need to read some of the post to get it right, and a spam bot would only have a 1 in 3 chance.

It's not as bad as a CAPTCHA either, since it actually relates to the content they are commenting on, and would require human spammers to at least read some of the post, slowing down the rate of spam.

Charles 6:55 PM on 23 Jan 2007

I was recently highly depressed by spambots. I unveiled a new, custom-written set of forum software on my site, and enabled posting without registration for a period. Within hours, the spambots appeared and started doing their thing -- on a form that they'd never seen before, even bypassing a hash system similar to the one documented here.

I ended up turning off unregistered user posting, but I was really, really irritated by the experience.

Ray 10:22 PM on 23 Jan 2007

Good food for thought in this post!

Here's my own logic: if we can make it (near?) impossible for human spammers to succeed, we can happily forget about spambots, as the bot problem would be solved. So perhaps we need to focus on making sure that humans who enter comments do so out of legitimate motivation.

The multiple choice method mentioned above is a strong pointer in that direction.

A nice twist to mort's delay method would be to not let the user know how much delay is required in error messages. If a user knows it's 15 seconds he's got something systematic to work with (in a way we might not want). Also, I'd suggest randomly varying the delay time for each request.

Chris Neale 4:46 AM on 24 Jan 2007

How would you integrate these techniques with something like the xml validation used with this (or variations of this) technique: http://peter.mapledesign.co.uk/weblog/archives/category/questsformprocessor/

Ned Batchelder 7:08 AM on 24 Jan 2007

JF: My goal here is to avoid any effort by the people I want to encourage: commenters. If we have to use those techniques, then your idea is interesting, and there are many other semantic-based CAPTCHAs that can be explored.

Ned Batchelder 7:11 AM on 24 Jan 2007

Ray: I fear that there is no way to truly separate good people from bad people. If a human spammer wants to post a comment, they will be able to pass whatever Turing test we pose for them. Content-based filters like Akismet can help here, but that will be a very tough nut to crack.

Ned Batchelder 7:15 AM on 24 Jan 2007

Chris: I haven't used an existing forms package for this site, so I don't know what would be involved to adapt it for these techniques. I imagine the field hashing and unhashing could be performed as a last step and first step wrapped around a standard forms library, but I don't really know.

Skip Montanaro 11:48 AM on 24 Jan 2007

I had only a couple forms to validate on the Musi-Cal website. I used SpamBayes to filter the submissions. This had the added advantage of flagging otherwise valid submissions where users had mistyped something (bad date, misspelled city, etc).

Skip

jjpet 12:58 PM on 24 Jan 2007

I was interested in putting this through 508 validation. Somewhat ironically, on first look it fails because INPUT elements are required to contain the alt attribute or use a LABEL, not because you can't see the fields. Very interesting, Ned.

Keith Wilkinson 8:12 PM on 24 Jan 2007

One strategy is to have the form processing program (1) capture the IP, then (2) try to send a confirmation email to the poster, and (3) if the email address is invalid, switch to displaying a "hit the back button and check your email address" message; this stops bots and people who give a fictitious email address when they post...

Senko Rasic 5:24 AM on 5 Feb 2007

For sites that require the users to have JavaScript enabled, a good way of stopping spambots could be to require that they have a full JS implementation, by presenting a challenge in the form of a javascript function that should be run in order to get some number, which should be included in the response.

The function should be randomized, e.g. using different constants, loops, math operations, so it would require the spambot to evaluate it every time.

Spambots based on e.g. firefox+greasemonkey would make it relatively easy to break this protection, but even then the spammers would need more resources (JS is not exactly fast) for spamming than they do now. Or so I hope.

dan jolt 11:21 AM on 5 Feb 2007

While I like your methods, they might cause trouble for disabled people and for those with unusual browsers.

In my opinion Javascript is best disabled - I use the Firefox NoScript plugin for this.

Also I think that your approach only works if the robot's screen is different from the user's. I'm really tempted to try a script in Autohotkey (www.autohotkey.com) that does the following:

1. Search the site for the comment section "add a comment:", then "name:", "email:", "www:".
2. Then go back to "name:" and move the mouse pointer 300 pixels to the right and click.
3. Enter name, TAB, email, TAB, TAB, spam text, TAB, Enter.
4. Reset DSL-Line for new IP
5. Reset Cookies
6. Reload your page
7. Wait 10 seconds
8. Goto #1

I wouldn't be surprised if the comment list filled up fast.

Sebastian Becker 10:25 PM on 5 Feb 2007

As bots don't know JavaScript, another method would be to use captcha validation only if JavaScript is disabled.

Dimo 8:51 AM on 8 Feb 2007

Hello,

First of all, congratulations on the strategy. I am using something similar, but quite simpler, and now I will improve it.

One question: could you please explain what is the purpose of the PHPSESSID hidden field? I doubt that you are using it, because it would mean one cannot post if session cookies are disabled or if the session has expired. So what is the benefit of this field?

Thank you.

----

P.S. Well well well -

"You took a long time entering this post. Please preview it and submit it again."

This actually means that everyone who posts a comment after spending some time reading your article thoroughly will have to submit twice. Why not move the slider more to the usability side? After all, websites are made FOR the people, not AGAINST spam bots.

In addition, please make the page position itself on the submitted comment or on the unsubmitted form after posting.

Best regards and good luck!
Dimo

Jonathan 4:53 AM on 9 Feb 2007

Thanks for the article. I've been protecting my contact form with a fairly crude method that adds a random number to a hidden field, and passes the same number in a cookie. I assumed the spambots wouldn't use cookies so would be filtered out. Until this week it worked fine, but now I've started to receive spam e-mails again. I'm going to replace the random number with a hashed version, and I might try a honeypot field too.

Mike Cherim 4:15 PM on 15 Feb 2007

Outstanding article. I have a (widely distributed) stand-alone and WordPress contact form that uses the techniques you describe as well as a few others. I call them all spam traps collectively, but I do like the names you've given them. :)

Steven Clark 5:08 PM on 17 Feb 2007

Interesting article, I'd never thought to use a honeypot technique (I came via the WSG list by the way)... I've had, like everyone, battles with spammers both on my and my clients sites and have avoided captchas like the plague. Thanks for posting this, greatly appreciated.

George 10:21 PM on 25 Feb 2007

Another way to hide text fields is to just surround them with comment tags. On my guestbook form page I set a cookie using javascript, then retrieve the cookie during form processing in the cgi script. If the cookie is not found, you get an error message stating that javascript and cookies both need to be enabled to post. Setting it with javascript will eliminate most of the non-human posters, not to mention that most bots can't store cookies.

Kathy 12:18 PM on 28 Feb 2007

All great ideas to keep spambot form submissions from going through, but how to stop their endless attempts from spoofed and randomized IP addresses that can bring a server down?

driz 5:48 AM on 9 Mar 2007

This is a great article, thank you for the insight!

Sarah 10:44 AM on 20 Mar 2007

I still don't get it. I'd like to use the hidden fields via css. Where can I find more info on exactly how to do this? There has to be more to it than display:none? How does the form not get submitted? We're bombarded with form spam to the point that I had to remove the form. Thanks for the article!

olpa 11:02 PM on 8 Apr 2007

Thanks, very nice introduction. Meanwhile, I'd like to share my solution, a smart textual captcha:

Advanced Textual Confirmation
http://bbantispam.com/atc/

It works very well, is easy to install, and there have been no complaints from the visitors.

Kasimir 2:29 PM on 18 Apr 2007

Two tricks I use on my site's contact form - they should work on a comment form too.

1. when the form page is served a token is created together with a timestamp - these are stored in a text file. When a submission is received the system checks:
a) does the token exist
b) was the submission too slow or quick for a human

2. I reject submissions containing the string "http://" and ask the user to remove those from the web addresses.

Rich S 12:48 PM on 27 Apr 2007

I've been having to deal with these problems a lot recently (and having to make 508-compliant sites makes captchas out of the question), and I have come to similar conclusions on my own. Actually, nearly identical. Strange. Anyhow, my forms now create a hash of each 'real' field name, pass the decryption key in the session to the submission script, decrypt the field names, check that the invisible honeypot fields, which either contain a random string or nothing, are not molested between the form and the submission script, replace all field keywords (e.g. email, comments, etc.) with the HTML numeric character equivalents (e.g. &#80; for P... not quite sure if bots scan for keywords near fields though), scan for common spam words and HTML, and only then, if all of these conditions are met, will the form submit.

Another thought I had was using javascript requiring very simple human interaction - focusing or clicking on the page would pull all of the form fields from a linked (but dynamically generated) javascript file using 'document.write' to output the code to the page. That would require that not only the spambot be capable of running javascript, but also that it would have to download all linked .js files to the page, and actually perform an interaction with the page to write out the randomly generated form fields.

kenrick 5:18 PM on 7 Jun 2007

One thing you are doing that's an extra check is checking the timestamp value.

"The timestamp is checked. If it is too far in the past, or if it is in the future, the form is invalid. Of course a missing or non-integer timestamp is also a deal-breaker."

If you have hashed the timestamp into the main hash, then if someone screws around with the hidden timestamp field, it will never match the created hash.

So while you can still check how long it has been since that hash was generated, it would be impossible for someone to generate a future hash without knowing your secret/hash parts.

N 4:31 PM on 13 Aug 2007

Excellent idea!

Things I noticed are that the honeypots of this blog do not seem to have:
- the value initialized.
- the id.

So, if they have id='foo' and value='bar', writing a spambot to detect honeypots on this site would be even more difficult.

Reed Hedges 1:50 PM on 21 Nov 2007

Here's a honeypot/spamtrap CGI program I use on some pages:

----


#!/bin/bash
#
# Anyone who requests this CGI gets added to the "Deny from" line in
# two files, .htaccess and .htaccess_recent.
#
# First put this CGI in your robots.txt file to prevent legitimate spiders from
# finding it. You should wait about a day since many spiders (such as Google)
# only request robots.txt once per day.
# Next, put empty links to it in various HTML pages to trick
# bad spiders into tripping it.
# After you've verified that it's working correctly, you can change
# the htaccess file below to point to a real .htaccess file that blocks
# sites.
#
# Use a cron job to run spamtrap_rotate_lists.sh to periodically move
# .htaccess_recent over .htaccess and recreate it as an empty htaccess file.
# This prevents .htaccess from filling up with old addresses.
# You must edit spamtrap_rotate_lists.sh to set the correct directory
# and file names.

file="htaccess"  # change this to the real one.

# Report a failure to update a file, and quit.
function error() {
    echo "File error adding entry to $1."
    exit 1
}

echo "Content-type: text/html"
echo

# Don't add the same address twice.
if grep -q "$REMOTE_ADDR" ${file}
then
    echo "Already in the list."
    exit
fi

# Append the client's address to the "Deny from" line in both files.
sed -i "s/Deny from.*/& ${REMOTE_ADDR}/" ${file} 2>&1 || error "${file}"
sed -i "s/Deny from.*/& ${REMOTE_ADDR}/" ${file}_recent 2>&1 || error "${file}_recent"

echo "Your address $REMOTE_ADDR will now be blocked from this site. This is a trap for automated spiders that do not honor the robots.txt file. Email 'webmaster' at this site for details."
----

file rotation script (run monthly as a cron job):

----

#!/bin/bash
# Rotate the spamtrap block lists: promote the recent list to be the
# active one, and start a fresh recent list from the default template.
dir=/usr/lib/cgi-bin/spamtrap
file=${dir}/htaccess
file_recent=${file}_recent
file_default=${file}_default
mv ${file_recent} ${file}
cp ${file_default} ${file_recent}



----

htaccess_default:

----

# htaccess_default
# 88.151.114.* is webbot.org (webbot.ru)
# you could add more known spamming domains if you want:

Deny from 88.151.114.*

Magyusz 6:40 AM on 22 Nov 2007

reCAPTCHA (http://recaptcha.net/) is an interesting project on this topic. From a developer's point of view, the best part is that you don't have to (re)invent your own CAPTCHA for your site; you can use it as a service in many ways:
http://recaptcha.net/resources.html

I've just created a new way to use its functionality:
http://code.google.com/p/mailhide-tag/
It is a JSP tag which helps developers hide mail addresses from spambots.

Sami 8:05 AM on 23 Nov 2007

Ned,

I think the methods you suggest are valid and for most, will work outstandingly.

Like with captcha, the main point is that we don't want to create traps into which existing bots fall, but traps which are completely unavoidable, even if you know the drill. That's a bit of higher mathematics.

The problem is that IF the spammer is interested enough in your site, he/she WILL write a script to defeat all these ordinary methods. The point of captcha is to force the user to do something that a computer just is not able to repeat, even if you tried to teach it to.

Of course, the quest for a perfect captcha is still on. Google has done quite well, with almost 100% accuracy.

Kent Johnson 3:42 PM on 6 Dec 2007

We just released a free (BSD license) blogging app for Django that implements honeypot spam prevention based on these ideas. See this post for details: http://blog.blogcosm.com/2007/12/06/developers-we-just-released-blogmaker-free-blogging-app-django/

Cooksey-Talbott 8:21 PM on 10 Dec 2007

Thanks for an interesting post.

I am currently doing battle with these things and have them beat for now with custom CAPTCHAS but...

Based on the info here I will have to take further measures in the future.

As my studio manages a number of sites I want my next solution to be comprehensive and more robust.

sam 10:07 PM on 11 Dec 2007

Very interesting post. Enjoyed thoroughly.

Kent Johnson 7:51 AM on 13 Dec 2007

It's obvious in retrospect, but something that tripped me up is that this strategy won't work with cached pages. The page with the comment form has to be generated for each request with the proper timestamp and IP address. If the page is cached the spinner is not correct for the current user and the form submission looks like spam.

Gary Hammond 8:37 PM on 16 Dec 2007

I have used a fairly simple technique to stop the bots. I have not had a single bot registration since implementing it.

After discovering that the bots are only registering at my site to promote a web site URL for search engine ranking, I have modified the 'register new user' script to barf an error message and fail the registration if the new user submits a web site URL. So far I have only had 2 people advise me of the problem when registering, so I just update their web site URL manually. Very simple and very effective.

gilemon 11:27 AM on 16 Feb 2008

Very useful information, Ned.

I'm developing an alternative technique called Pictcha.
It is a protection in the form of an image retrieved from the UTYP engine, which can be embedded inside Web Forms, and which will filter out various spams in a more user-friendly way than the well-known captchas.

http://nthinking.net/miss/pictcha.html (simple JavaScript implementation)

http://nthinking.net/miss/pictcha-sample.php (server-side PHP implementation; there is also a PHP lib for the server)

And as a bonus, it is learning. Thus it recycles the tremendous waste of concentration which is the conventional Form Validation by text recognition.

You may check the lab page to get more details about it:

http://nthinking.net/miss/lab.html

Ambrose 4:26 PM on 25 Mar 2008

I agree completely with Kent Johnson. There are already some sites that don't work correctly when, for example, Opera is restoring the last session.

I also find this "submitting twice" thing annoying, as Dimo said. But perhaps not too annoying. At least this is better than sites that outright discard your comments when you submit, after you just took an hour to write your comment and have not saved it somewhere.

In all, I see this (and the current state of email spam) as a sign that the spammers have already won. I believe the spammers' true intention is to destroy the web and email as viable means of communication, and we have, hopefully only reluctantly, in fact helped them achieve their real goals.

Aleksey 9:29 AM on 4 Apr 2008

I find rel="nofollow" to be somewhat rude to the commenters.


I have an idea I haven't tried yet: use rel="nofollow" for all new comments by default and remove it when the comment is marked as non-spam. With all that being clearly indicated.

On 2:40 PM on 12 May 2008

Thank you very much for this article; I guess no one had taken this dimension into enough consideration yet.
We at Web-APP will absolutely apply some of your tips.

Thanks
On
http://www.web-app.net

Bill 7:39 PM on 18 May 2008

In the pursuit of blocking bots, we need to remember people with vision disabilities. (Well, at least if you write web sites for large businesses and governments, you do.) Visually impaired people may be using a screenreader which will "see" all the hidden traps you put out for bots. And if you don't use <label> and <fieldset> tags and accessibility techniques like that, you're making life harder for some people. (And if you do improve accessibility, then the bots will find it easier to spam you, but is the purpose of your forms to solicit data from people or to block bots?) For anyone who's interested, Google on "section 508 compliance" or "accessible design."

Ted Cambron 5:38 PM on 7 Jun 2008

One thing I did as an experiment that worked out very well is to count the number of "http://" strings used in a single post. If more than a set amount (I use 3), then the post is marked as spam. Spammers seem to like to use an excessive number of links. Maybe it's better to compare the number of links to the size of the post. I ended up using the bots by replacing the potential spam with a random message like "this site rocks!". Like I said, this overly simple technique worked out very well :)

Dan 2:44 PM on 21 Jun 2008

Any chance you could post/send some of the key code for this? I've got a little hand-coded site and I am being inundated with spambots, despite my own feeble attempts to screen the posted content.

madrid 6:13 PM on 20 Jul 2008

Captcha works great.

James Phillips 12:43 AM on 3 Aug 2008

One person mentioned odd browsers.

In particular, I LIKE text-based browsers BECAUSE they don't display graphics or unreadable colour schemes, and don't have an expensive scripting engine.

It's just too bad that the (latest) HTML standards now pretty much require the DOM.

Time-based analysis may fail in cases where the form does things like time out, and the user is forced to copy&paste their comment. (I have had to do that for one site)

Xpl0si0n 1:21 AM on 29 Nov 2008

Another website with which you can detect status is www.yahooscan.eu; also with that website you can download your buddies' avatars.

KittyKat 7:13 AM on 7 Dec 2008

I LOVE the comments displayed when you click the I'm a spambot button, or put stuff in the honeypot fields *smiles* great system

Rune Jensen 8:41 PM on 12 Jan 2009

Combine the two CAPTCHA techniques. That is: prove you are a bot, and prove you are human. Like this.

You make a CSS-hidden field. To that you add a CSS-hidden label saying "Write the number eight."

Now, if the browser supports CSS, the field will not be seen, therefore you have to check for an empty field.

But if the browser does not understand CSS, the field will be labeled "Write the number eight."

On the server side, you then have to check for either an empty field or the number 8. This number should be random every time, or based on the IP. A CAPTCHA method like this basically works 100%, and it is quite user-friendly.

To catch the stupid bots (meaning, in general, Russian ones) even before this, you make a form and some fields inside an HTML comment. Then you check server-side that these fields are not filled out, since no browser will render them.

Furthermore, you put a single entity in the submit button, and check that this has been translated into a letter on the server side. For example, S&#101;nd comment.

About the time limit: the new Israeli bots are quite clever in this matter. They will wait up to 30 seconds from GET to POST, according to our statistics. Also, before this they will act like they are in fact browsing the pages. So it will not stop them.

Counting hrefs: the new way of spamming goes like

"Hey, I am looking for |product|. Anyone know where I can find it?"

And either another bot or an (innocent) user on the forum will give the |answer|.

I do not consider blocking based on content a good idea.

Lastly, some stupid bots can be blocked, in whole or from some content, if they carry a suspicious user-agent string or an empty one. Like:

JAVA, meaning they are probably harvesters, or
ru, indicating it is Russian (this goes for the Accept-Language header as well; check there for cn, meaning China, too).

Also, if it doesn't understand GZIP, it is suspicious, since even LYNX does.

All of it can be combined with javascript and cookies, which are hard for most bots to understand.

This is not all, of course. Just what I can remember from some analysis we made on our side. There are special ways of treating referer spammers and vulnerability scanners and injectors and so on.

Dan Giles 9:29 AM on 20 Jan 2009

I use several tricks; in concert, they seem to work great. Some are applicable to key submission forms, others to contact forms only.

I randomize form field names twice a day: a cron job updates a file with fields such as email=SKJEJFDLFKDLKFF, name=WIWIJDSLKSLSNBV, comment=SKJDSKJDKJSKD, etc. The tokens at the end are randomized. The program that outputs the form looks up the proper token and outputs the token instead of an obvious form field name. The process is reversed on the way back in.

I also set a cookie when the form is requested and check for the same cookie when the form is submitted. Amazing how much spam this eliminates on its own.

Never allow CC:, BCC:, or http: (or variations) in anything that may forward an e-mail. This also eliminates a lot of attempts, since without a link a lot of spam is kind of useless.

One of the form fields is also a timestamp, usually time() but doctored up so that it's not obvious that it's the number of seconds since 1970. The program processing the form gives an error if it compares it to the current time() and it is greater, or too much in the past. I also started inserting one letter (a random letter that changes every 12 hours) into the string; if that letter doesn't come back, or there are more letters or no letters, again an error. So you are a bot and you see a hidden form field like name=HDJJWIJDJWIJDIWJ value=27738G84 - tell me, what would you put into it?

I'd be happy to clarify if I typed this too fast.

Josh 7:44 AM on 25 Jan 2009

Ned-

What techniques would you use for EMAIL-submitted comments, such as mailing lists, to prevent spam? I run a mailing list and I want to prevent spammers from sending mail to the list. If I deny or kick out one email address, they come back with another.

I was considering making it so each time you post to the list you get a confirmation link that you need to click on to confirm the email is legit, and then do a CAPTCHA to make sure you are real. I was also considering trying to incorporate a SpamCop-like list to screen IP addresses. I don't want to make it difficult for people or annoying to use the list, but if it is riddled with spammers it is useless.

Thanks for your thoughts.

Best,

Josh

dsims 6:18 AM on 10 Feb 2009

great now someone will make a bot that just watches forms, thanks a lot man lol.

leo 4:03 PM on 13 Feb 2009

Yes, you're right, spam sucks. I use Akismet for WordPress to fend off spam and it works pretty well so far.

Siedenburg 1:44 PM on 2 Mar 2009

Stopping spam is essential, but these days many CAPTCHA solutions are far too often 'human proof', resulting in frustrated, defecting users, forgone new account creations, and less participation. We have listened to these complaints and created a solution that solves the usability flaws of text-based CAPTCHA and addresses spam. It's also a free web service. We are looking for feedback on what is proving to be a more effective way. http://demo.vidoop.com/captcha/ jds@vidoop.com

yuna 2:07 AM on 3 Mar 2009

hi guys.. can anyone help me? I'm doing my final project; my topic is a bluetooth honeypot.. I've already searched, but I only found honeypots for the TCP stack, and I want a honeypot for the bluetooth stack.. can anyone give me an idea of which websites I should go to..

thanks

Bill 5:42 AM on 22 Jul 2009

I have one more solution I found that stops the bots in my forms. I added a duplicate email input field on the form and tell the user to leave one blank (two inputs with the same variable name in the form). With a duplicate field present, the variable will have two entries if filled by a bot, and the result will look like this: "spam@spammer.com,spam@spammer.com". Look for the second input and you will be able to detect the bot. A human input would look like "spammer@spam.com". They will have to make the bot smart enough to leave one form field blank but not the other with the same name. Then you can randomize the fields on the form so they change locations, to make it even harder for the bot to post to the correct field while leaving the right one blank. I decided to leave the field visible for now as it doesn't look as intrusive as the captcha does. I may need to hide it in the future as bots get smarter.

yuvashri 12:39 AM on 23 Jul 2009

Thanks for giving this information. I am searching for how to change the IP address, and I also found a website for checking the IP address, ip-details.com, free of cost.

chris holbrook 10:23 AM on 5 Aug 2009

Nice post. Anybody want to buy some viagra?

Gruhn 1:02 PM on 8 Oct 2009

Here is another way to combat the Spambot. Marry it! It's just what they wouldn't expect!
http://www.webdonuts.com/2009/10/spambot/

bruce 12:56 AM on 17 Oct 2009

One idea I have is this:

After the form is submitted, use javascript to display a confirmation link, like "Please click the link to confirm you're human". That link would be inserted via javascript, and thus would not be in the page source (therefore, inaccessible to scrapers). If that link is then clicked, the form is posted.

Brian 10:05 PM on 17 Oct 2009

This is awesome. I love these ideas. However, your "spinner" confuses me. Hashes are inherently not reversible. Wouldn't an AES cipher block (ECB mode, nothing fancy) be better? Then you don't have to leave the timestamp out in the open when you rehash it. You could even store the field names within that, so that only the spinner ever stays the same.

Barry O'Callaghan 5:50 AM on 6 Nov 2009

This post is almost 3 years old now. I am just wondering if the honeypot method is still effective. A few months ago I put reCAPTCHA on my young WordPress blog for both my comments and my email. I was thinking it was a great idea at the time. But guess what: it killed the spam (not all of it), but it also stopped the albeit limited amount of comments I was getting too. So now I'm thinking captchas are not a good idea for a small site that needs to encourage users. But I'm still not sure what the best alternative is, and even were I to know, I'm unclear as to how to go about implementing it. Any examples of how to code a honeypot?

Artem 1:57 PM on 12 Nov 2009

Barry, just try Akismet or Mollom.

John 10:11 AM on 24 Dec 2009

Any idea on how to use this with ajax?

Obviously hashing the field names inside javascript gives away your salt. You could send an ajax request to php to spit out the values for you, but this gives bots a way to grab your names AND costs an extra http request.

I'm guessing I'm gonna end up using something like descendant selectors.

Julia Sykes-Turner 4:19 AM on 21 Jan 2010

There are some great ideas here! I would really appreciate your advice on an issue that my client experiences. My client very occasionally gets bizarre emails from her web form like this (where words in English are from my form php, and the garbage is from the user entry fields):

Name: unmrab
Phone: LrASqJYNJhod
Enquiry: 21vy8X href="http://kinqjyhxukwb.com/">kinqjyhxukwb>,
[url=
http://ekhouqnuhrzp.com/]ekhouqnuhrzp[/url],
[link=http://dkwclumezmns.com/]dkwclumezmns[/link],
http://ayzwqhockubx.com/

Does this look like the work of a spambot, or do you think something else is at work here?

James 9:16 AM on 24 Feb 2010

Just taken a look at this form with CSS turned off and "overkill" sprung to mind. Then again, if it works...

Ignacio Segura 7:28 PM on 18 Mar 2010

You can find a Drupal implementation of the honeypot method here:

http://www.isegura.es/blog/stop-spam-your-site-being-invisible-honeytrap-drupal-comments-form

We're also working on a Wordpress implementation, as it has been a success.

Black Testing 4:43 PM on 6 Apr 2010

Neat - But the bar to exploit this compared to Captcha is low. All the bot has to do is edit itself to look for invisible form fields as opposed to complex image processing.

Security Vs Usability Trade Off I guess....

metin 11:41 AM on 10 Apr 2010

http://nosp.me is a good alternative for hiding email addresses from spam bots.

Richard 2:14 PM on 9 Aug 2010

The time-delay between form fetch and form submission is an interesting one, but not one that is good for comment forms. We already experience problems where aggressive timeouts foil people attempting to post a lengthy comment that they have spent considerable time editing and refining via preview. To force me to submit a form within N seconds just feels capricious and short-sighted.

Blake Kidney 12:51 PM on 12 Aug 2010

A more complicated approach would be to create a "submit" button in Flash. All the html form data is sent to the Flash swf via FlashVars (or using ExternalInterface and Javascript). When a person clicks the Flash "submit" button, it POSTs the data to the server. Along with the data, Flash is able to send a variable to confirm whether or not a keyboard or mouse button was clicked in the process, confirming user input versus bot input.

Philip 9:17 AM on 27 Aug 2010

Hmm, nice thoughts on how to fool spambots. This is my idea:

Include an "I am human" button; once clicked, it inserts a new (randomly selected) label+field into a div via an ajax call, and then this has to be filled in according to the question in the label.

I currently use a simple way to fool spambots on my sites:
I made 6 images with a number range; one of the images gets randomly called when the page with the form is opened. Users must enter this number into a field, and once they click submit it checks via ajax if the number is OK. If scripts are disabled then the number is shown instead of the field. In the back end the number is verified before further processing.

Anonymous 5:44 PM on 30 Aug 2010

I say, don't rely on audio, images, CSS, javascript, or Flash.

Here are some other approaches (too bad I can't use <LI> tags here):

* Create a form with unusual field names.

* Create a varying form, where field names and prefilled data are varied.

* Include a simple text CAPTCHA (such as "Type the answer to five plus three in digits" or "Enter your favorite sport (hint: I believe its name must start with 'g')").

* Use the "spinner" if you want to.

* Use the Preview/Submit order like you use here.

* Simply do not use HTTP! Make comment submission work over the telnet or gopher protocols. Spambots won't send anything using those protocols.

Anon 5:18 AM on 10 Mar 2011

Great article, even if it seems to be a bit old :)

Piyush 5:25 PM on 18 Mar 2011

What about generating images for the form labels? The form itself acts like a captcha ....
Take the form's name field: create an image with "NAME" on the fly, and similar for the others, then randomize the sequence of images each time a user wants to register or log in. The probability that a bot can fill the form correctly greatly decreases ...

Samuel Lavoie 5:40 PM on 27 Mar 2011

Great resource article, added to delicious, thanks!

Rob Emmery 10:25 PM on 7 Apr 2011

Security by Obscurity. Eh.

Tjerja Geerts 1:17 AM on 15 Jun 2011

I recently bounced into this one:

http://www.myjqueryplugins.com/QapTcha/demo

a very nice 'human' way of activating the form. The user doesn't have to do any math or read any of the annoyingly hard-to-read Captcha texts. This one gives by far an easier user experience. (Because it's jQuery, it'll also work on iPads etc., something that most of the Flash alternatives here don't offer.) Maybe something to consider as well. :)

Philip 2:31 AM on 15 Jun 2011

The problem with the above jQuery captcha is that if you turn off javascript it is not usable. Spambots do not run scripts; they just read in the html fields, fill them, and do a submit.

Philip 2:38 AM on 15 Jun 2011

Ok, I see that it stores the true/false value in a session, which can be read out on the server side. So it can work. My bad.

Gary 5:57 AM on 5 Aug 2011

Hi, I have tried this with no success using asp. Is anyone able to post some basic code to allow me to check for a null field?

I currently use the code below, where email-confirmation is the hidden field.
if(!String.IsNullOrEmpty(Request.Form["email-confirmation"]))
IgnoreComment();


That doesn't work.

Thanks

Philip 6:01 AM on 5 Aug 2011

@Gary

You check if the string is NOT containing anything and then do IgnoreComment.
It should be:

if(String.IsNullOrEmpty(Request.Form["email-confirmation"]))
IgnoreComment();

Gary 6:35 AM on 5 Aug 2011

Thanks for the lightning-quick reply Philip. Would I place the code inside the form tags or in the head section? I wrapped that in <% code %>.

Can't seem to make this work!

Microsoft VBScript compilation error '800a03ee'

Expected ')'

/xcontact.asp, line 425

if(String.IsNullOrEmpty(Request.Form["email-confirmation"]))
------------------------------------^

Sorry for being so thick!

Philip 2:13 AM on 9 Aug 2011

If you are using asp.net VB then it is:

If Not String.IsNullOrEmpty(Request.Form("Surname")) Then
'your code here...
End If

Philip 2:15 AM on 9 Aug 2011

OR:
If String.IsNullOrEmpty(Request.Form("Surname")) Then
'Form value is empty
End If

Gary 2:55 AM on 9 Aug 2011

Hi Philip, it's your basic VB asp I am using; would any of the above work with that?
Thanks

george 9:37 PM on 30 Sep 2011

what's awful is if you don't have control of your site. If you use TypePad, for instance, you can't suppress the "URL" field that commenters are provided with. For some reason, if you accept comments to your blog, you have to allow them to type in a URL which becomes an active link. Great invitation for spammers, and TypePad sites are infested with spam as a result. They do filter out about 90+% of it, but every day some gets through. If they allowed the URL field to be turned off, then they probably wouldn't have to filter. Without a URL, spammers wouldn't even bother. Instead they give spammers free advertising space to piggyback on anyone's blog. And they don't listen when people tell them we need to be able to disallow that field.

Mike Scott 1:40 AM on 1 Oct 2011

While it's less common than it used to be, some legitimate users come through megaproxy farms and have a different IP address for each page they load. So their form submission would come from an IP address that didn't match the one in the hash value, and you would block their submission.

Tony 10:06 PM on 13 Dec 2011

This is a very good article. I modified my registration form just now.

kirpi 6:49 AM on 2 Jan 2012

What if browsers have autofill features enabled? It seems to be a (dreadful) trend for lazy/dumb users.
Honeypot fields might be filled somehow even without real users seeing them, thus producing flaws in the testing process.
Perhaps an autocomplete="off" should be adopted on the whole form tag.

Alex 1:29 PM on 30 Mar 2012

This is a great list... thank you!

One other thought: This won't degrade gracefully, but if you are submitting the form via javascript, you can also leave the action out of the form, or put some dummy url in where the form will post. Then specify the url where you wish to post the form data via the javascript itself.

Bilge 5:09 AM on 20 Sep 2012

That's a lot of trouble to go to for protecting a site nobody gives a shit about. If your site is popular enough people will write bots for your site and all of this work is irrelevant.

Daniel 2:44 AM on 13 Sep 2013

Thanks for writing this up for others to learn from - after 6 years it's still relevant. I used your spinner approach, together with some text analysis, to make a WordPress plugin which is doing a good job on my blog. There's a write-up here: http://daniemon.com/blog/block-comment-spam-without-captchas/

George 12:27 AM on 2 Dec 2013

I have used the honeypot fields for several years now, and stop almost all spam with just that. Browser auto form fillers don't see them, so your user is not blocked. Thanks for the great article, gave me several ideas for updates as we are redoing all the sites.

Benjamin 9:43 AM on 19 Dec 2013

This is such a great article. I've been searching high and low for something that will stop spam.

What do you think about using all of the methods above and incorporating a CAPTCHA as a honeypot? The user doesn't see the CAPTCHA or need to fill it in, but bots will attempt to do so and fail every time.

Alexander Walters 3:01 PM on 13 Jul 2014

Another idea that came to mind while reading the comments:

The action URL does not in fact submit to the page that commits the comment; it submits to a page that only has an HTML meta redirection (HTTP 200, not 3**). A spam bot will not follow that redirect, and the comment will be silently lost. There are some implementation details that can make such a thing safer.
