Human spammers

Wednesday 24 May 2006

I don’t know about other bloggers out there, but I’m seeing a different kind of spam in my comments these days. It looks like actual people are reading the blog posts and writing minimally appropriate comments, just to get a link to their website.

For example, on my post about The Wave, I got this comment from “christy”:

I think Wikipedia might have some info about the traces of this interesting game where fans are as much playing the game as professionals. I enjoyed watching this game.

The website linked to was a brand-new spam blog about televisions.

My post about the David Copperfield spoof video (which I called Magic is all around us) garnered this comment from “laura”:

Since the lord god is all around us, obviously the magic too. This is what i believe. What do you say?????????

Laura linked to a spam blog about dogs with an identical design to Christy’s.

These comments were clearly written by a person, given their goofy but tenuous applicability to the blog post. Someone out there is interested enough in page rank to pay someone to write comments in middling English that are “about” the blog post, and not about their cheesy portal site at all. The good news is that this suggests comment spam prevention is working and is raising the cost of spam for the spammers. The bad news is that these comments have to be cleaned out by hand.

In my comments (and in the link above), I use the rel="nofollow" attribute to ensure that search engines don’t lend any credibility to the link. As of now, I advertise that fact on the comments form. I doubt it will stop the spammers from trying, but one can hope...
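
As a rough illustration of the nofollow idea, here is a minimal Python sketch of how a commenter's link might be rendered. The function name render_comment_link and the example URL are made up for illustration; this is not the code that actually runs this blog.

```python
import html


def render_comment_link(name: str, url: str) -> str:
    """Return an anchor tag for a commenter's site, marked rel="nofollow".

    Hypothetical helper: escapes both values so a commenter cannot
    inject markup, and always adds the nofollow hint for search engines.
    """
    safe_url = html.escape(url, quote=True)
    safe_name = html.escape(name)
    return f'<a href="{safe_url}" rel="nofollow">{safe_name}</a>'


# Example with a placeholder URL (not a real commenter's site).
print(render_comment_link("christy", "http://example.com/tv"))
```

The attribute is only a hint to search engines: the link still works for human visitors, which is why it may not discourage every spammer.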

Comments

Stuart Langridge 8:40 AM on 24 May 2006

Aha. I was going to write something about this myself today, because I'm experiencing exactly the same thing. I've been leaving the comments in but stripping the URLs.

Andrew Smith 8:49 AM on 24 May 2006

While those most likely were handwritten, I wonder how long it will be (if not already) until spammers start using grammar-based text generators such as the Dada Engine.

Platypus 9:21 AM on 24 May 2006

Yeah, I've had a few of those too. A week or two ago I got a half-dozen from some schlep in Poland before I modified my top-level .htaccess to deny his IP. That's actually the nice thing about these guys; unlike the bots, which come from many different IPs back to back to back, these guys typically only have one and if you block that then they're done. Until the next one comes along, of course, but they just don't appear fast enough to be much of a pain.

Platypus 9:22 AM on 24 May 2006

Oh, and here's a close tag for Andrew who forgot to bring one.

Stuart Langridge 9:29 AM on 24 May 2006

Tut, tut, Ned. :)

Ned Batchelder 10:47 AM on 24 May 2006

If by "tut tut", you mean I should have tidied the HTML in the comment, you are probably right. But this way, we get to build a community, with Platypus helping Andrew!

Denis 11:12 AM on 24 May 2006

You're underestimating spammers. These are machine-generated as I've seen very similar ones posted day after day. They are very vague and anchor on some keyword in your post, but their real purpose is to spam the URL. I've seen things from "Great post, I definitely agree." to "It doesn't work as well on my Gentoo box."

Andrew Smith 11:29 AM on 24 May 2006

I have no idea what close tag you are talking about. I included one link and had an opening and closing tag for the "a href".

I wouldn't normally care, but half of the comments besides mine are criticizing my post. :-(

Sorry if I forgot anything (tags),
Andrew

Ian Bicking 12:00 PM on 24 May 2006

There's also a possibility that they are building their text programmatically, similar to the very spam blogs they point to. They grab the page, pick some words, use a Disassociated-Text style generation (or pick from some database of dumb comments), and put their link on it.

I'm not sure if this is happening, but it certainly seems possible. I'm inclined to guess that in part because the cost of the spammers' time is probably closely related to their proficiency with English, so it seems expensive to get English speakers (even not very good ones) to do manual comment entry.

Calvin Spealman 12:31 PM on 24 May 2006

Thankfully, I'm not popular enough to get these! Or, maybe I am and requiring people to read that squiggly text image was a good idea. Of course, that means these aren't human written, and I do believe they are generated. Perhaps some text mining looked for sentences matching keywords in your post? I find myself intrigued enough to wonder if I could write something that would scrape the web for blog posts and their comments, and then build comments based on the content of posts and mining the comments from similar posts in the database.

What I hate most about spammers is that I can't do that experiment, because my nature would cause me to open-source it and release it for free, and then someone would use my creation for evil! In the same manner, I can't write a greasemonkey script to extract real e-mail addresses from spam-guarded addresses, so I can just click them like normal.

Kyle Bennett 1:48 PM on 24 May 2006

These are machine-generated and posted. The grammar is not right, and the sentences often have no real meaning, whether or not they are appropriate to what they are commenting on. I used to get a bunch of these; CAPTCHA stopped them just as well as it did the others. The few that I've gotten since that were questionable in their content linked to real live bloggers, so those were either amateurs looking for more traffic, or simply uninteresting good-faith commenters.

David Chen 2:02 PM on 24 May 2006

I do not know if yours were human generated or not, but I have been getting quite a few similar ones on the blogs I host on FallenEarth, leading me to suspect that they're computer generated.

Another trend I've been noticing is spam comments that don't have any links in them at all, just some gibberish containing a random 5-digit number. I'm not sure what is going on there.

Peter Harkins 4:07 PM on 24 May 2006

I've played around with Markov chain text generation (length of 2) using a database of a couple thousand forum posts about a game. It's really stupid but still decent sometimes -- if I added some basic grammar so it didn't spit out run-on sentences it'd sound about as good as these spammers.
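
For anyone who hasn't seen the technique, a minimal Python sketch of Markov chain text generation with a length of 2 might look like the following. The function names, the tiny sample corpus, and the seed parameter are all illustrative assumptions; this is not the code described above, just one plausible way to do it.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting state, up to `length` words."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this word pair was never followed by anything
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Toy corpus standing in for a database of forum posts.
corpus = "the cat sat on the mat and the cat ran off"
print(generate(build_chain(corpus), length=8))
```

With a corpus this small the output mostly parrots the input; with a few thousand posts the same walk starts producing sentences that are locally plausible and globally meaningless.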

Stan Seibert 6:21 PM on 24 May 2006

Yeah, I've been filtering these out for a while, as I still manually approve comments on my blog. It looks like (not surprisingly) the spammers have figured out that writing comments which are just a dozen links to poker sites is not effective.


My favorite ones are those which look like empty praise: "Great blog, I found this article very interesting" or the ones that appear to ask a technical question: "I found your site while debugging a website problem. I can't seem to get [lame pr0n site url] to load. Does it work for you?".


It's an unfortunate testament to human ingenuity.

Alan Green 9:09 PM on 24 May 2006

Yes this is hard problem. Why not be pleased to visit cardboard.nu? It is blog with no spam.

(Sorry. Somebody had to do it. :)

Laurence Gonsalves 12:51 AM on 25 May 2006

I had a similar comment show up on my site just a couple of days ago. The comment was something like "yeah, those sure were the days" which was strangely relevant to the post and not something that looked like it could've been automatically generated. I almost didn't realize it was spam until I saw that it linked to a spam site selling pizza (?!). I've enabled Blogger's CAPTCHAs in comment posting, but I've occasionally seen other spam comments that were clearly automated get through. CAPTCHAs are only a deterrent -- they don't stop the really determined spammers.
