Print Page - Another reason to dislike reCaptcha

Public area => The Pub => Off-topic => Topic started by: Arantor on April 4th, 2012, 01:49 PM

Title: Another reason to dislike reCaptcha
Post by: Arantor on April 4th, 2012, 01:49 PM

Regular readers here will know that I dislike reCaptcha, a lot.[1]

Putting aside the fact that it is worryingly flawed,[2] we have an interesting new vector for it.

I give you: http://www.theregister.co.uk/2012/04/04/google_recaptcha_street_view/

Specifically, it's using street numbers, and possibly street names, off Street View as things it's expecting you to decipher, to improve their database. Very effective crowdsourcing, potentially, but also it makes a mockery of any anti-spam considerations it may once have had.

1.	And that I only wrote a reCaptcha plugin to prove the viability of the CAPTCHA hooks, and to avoid people whining about not having it, not because *I* wanted it.
2.	It shows you two words, one of which it actually doesn't know itself, and has been known to display numbers, and even mathematical equations and Hebrew writing, as the second 'word', in an attempt to crowdsource their meaning, which doesn't work as well as you might think, because people, including bots, just put nonsense in. In any case, even the word it does know, it accepts one letter being wrong, making it even less reliable than an actual regular CAPTCHA that displays a word. Small wonder it's been broken by bots.

Title: Re: Another reason to dislike reCaptcha
Post by: Farjo on April 4th, 2012, 02:44 PM

But how do you know it's the image that's the one it's learning? Presumably by now enough people have identified the images and so they could be the control 'word'.

This is just tabloid journalism in that they've built a whole article around a single assumption: quote: Johnston added: "If we assume Google’s objective is to get reCAPTCHA users to determine building numbers from Street View images..."

But it's sensible to assume that they'll use the street image as a control word and as an unknown word, which will fool the bots (for a while at least) and have the added benefit of improving google maps.

Title: Re: Another reason to dislike reCaptcha
Post by: Arantor on April 4th, 2012, 02:54 PM

That's the thing, it's long been possible to figure out which is the unknown word and which is the control word. Either way it's a fuck-up.

If the street number is the control word, you have a maximum of something like 10000 permutations to cope with and even then it's possible to drastically cut that back with a bit of imagination in the code.

If the street number is not the control word, nothing changes and you break the control word much as you do now with conventional OCR.

This is the thing people don't realise: I can tell you *at a GLANCE* which is the control word in a normal reCaptcha. I will likely be able to tell you with the same glance which is which with the new setup, but on top of that it is more likely they will use the street numbers as the 'unknown' rather than the control value anyway.

This is the fundamental thing with reCaptcha: you get two words and you only have to get one mostly right, the other can be failed without any ill effects. So you actually spend more computational effort figuring out which is which, and now that just got a lot easier.
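
To make that concrete, here is a minimal sketch of the check as described in this thread. It is only a model of the behaviour claimed above, not reCaptcha's actual code, and the function names are made up for illustration. Only the control word is compared against a stored answer, one wrong letter is tolerated, and the guess for the unknown word is never verified.

```python
def within_one_letter(guess: str, expected: str) -> bool:
    """True if guess equals expected apart from at most one substituted letter."""
    guess, expected = guess.strip().lower(), expected.strip().lower()
    if len(guess) != len(expected):
        return False
    mismatches = sum(1 for g, e in zip(guess, expected) if g != e)
    return mismatches <= 1


def passes_two_word_check(control_guess: str, unknown_guess: str, control_word: str) -> bool:
    """Hypothetical model: only the control word decides the outcome."""
    # There is no stored answer for the unknown word, so unknown_guess
    # cannot be checked and is deliberately ignored here.
    return within_one_letter(control_guess, control_word)
```

Under this model, a solver that can tell which word is the control word only has to OCR that one word, and only approximately.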

Title: Re: Another reason to dislike reCaptcha
Post by: Nao on April 4th, 2012, 03:08 PM

weCaptcha is way better anyway! :eheh:

Title: Re: Another reason to dislike reCaptcha
Post by: Arantor on April 4th, 2012, 03:25 PM

weQ&A is even better though, heh ;)

Title: Re: Another reason to dislike reCaptcha
Post by: Nao on April 4th, 2012, 03:27 PM

Certainly -- except for non-English speakers!

We really need to add some kind of UI to make it easy to choose a language for the question, I'd say.
It's probably better, UX-wise, than implementing a [lang] tag (which I'll probably end up doing, for completeness.)

Title: Re: Another reason to dislike reCaptcha
Post by: Arantor on April 4th, 2012, 03:31 PM

For single language sites, which is most of them to be fair, it's just not an issue.

But for multiple language sites, sure, we can look at extending that in some fashion. I might even be persuaded to allow admins to define multiple answers that are acceptable, e.g. variant spellings. Which would also allow you to write the same question in - say - two languages, and define both answers as allowed.
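
As a rough sketch of the multiple-answers idea (not Wedge code; the names `normalise`, `is_accepted` and the sample answers are assumptions for illustration), the check could compare a normalised reply against a list of accepted answers, so that variant spellings and answers in a second language all pass:

```python
import unicodedata


def normalise(answer: str) -> str:
    """Fold case and Unicode form, and collapse runs of whitespace."""
    folded = unicodedata.normalize("NFKC", answer).casefold()
    return " ".join(folded.split())


def is_accepted(given: str, accepted_answers: list[str]) -> bool:
    """True if the reply matches any admin-defined answer after normalising."""
    return normalise(given) in {normalise(a) for a in accepted_answers}


# Hypothetical question asked in English and French:
# "What colour is an elephant? / De quelle couleur est un éléphant ?"
ACCEPTED = ["grey", "gray", "gris"]
```

With this setup `is_accepted("  GRAY ", ACCEPTED)` and `is_accepted("Gris", ACCEPTED)` both pass, while `is_accepted("blue", ACCEPTED)` does not.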

Title: Re: Another reason to dislike reCaptcha
Post by: nolsilang on April 4th, 2012, 03:53 PM

On one of the forums where I used to be an admin, I used https://www.keycaptcha.com/. It was pretty good at weeding out the bots.[1] Recaptcha just doesn't cut it, it's just like an unlocked door. The door is there, but you can move the handle and it will open.[2]

1.	That was 2 years ago; I've just learned that it shows ads too now
2.	Pardon my very limited English

Title: Re: Another reason to dislike reCaptcha
Post by: Arantor on April 4th, 2012, 04:42 PM

Except that KeyCaptcha, like several of the other CAPTCHA solutions, is also going to shut out those with visual or motor impairments, as well as users of various browsers such as certain mobile browsers.

As opposed to writing a question for your forum which has no such limitations.

Title: Re: Another reason to dislike reCaptcha
Post by: Nao on April 4th, 2012, 04:52 PM

Quote from Arantor on April 4th, 2012, 03:31 PM

But for multiple language sites, sure, we can look at extending that in some fashion. I might even be persuaded to allow admins to define multiple answers that are acceptable, e.g. variant spellings. Which would also allow you to write the same question in - say - two languages, and define both answers as allowed.

Sounds like a good idea... :)

Title: Re: Another reason to dislike reCaptcha
Post by: Arantor on April 4th, 2012, 05:12 PM

Sadly the concept of allowing multiple answers to a single question is not a new idea and it isn't mine, but the approach I have in mind is more what I described than what already exists.

Specifically, there's a mod that allows you to define a regex to match against rather than anything else, so that you can declare multiple variations through a regex rather than multiple discrete answers. Effective? Certainly. Intuitive? Not so much.
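
For comparison, a sketch of the regex approach might look like this. The pattern and function name are illustrative assumptions, not taken from that mod. One expression covers both spellings plus a second-language answer, which is compact but a lot less readable than a plain list of answers.

```python
import re

# Hypothetical admin-supplied pattern: "grey", "gray" or "gris", any case.
ANSWER_PATTERN = re.compile(r"gr[ae]y|gris", re.IGNORECASE)


def matches_answer(given: str) -> bool:
    """True only if the whole reply matches the pattern, not just part of it."""
    return ANSWER_PATTERN.fullmatch(given.strip()) is not None
```

Using `fullmatch` rather than `search` matters here: with `search`, a reply such as "greyish" would also be accepted.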

Title: Re: Another reason to dislike reCaptcha
Post by: Norodo on April 4th, 2012, 05:39 PM

I have serious problems making regexes (see: any .htaccess I ever created). I would assume this goes for 99% of all admins too, or maybe I'm just daft. Regexes scare me, anyway.

Title: Re: Another reason to dislike reCaptcha
Post by: Arantor on April 4th, 2012, 05:42 PM

They're a complex beast. I wouldn't say I've mastered them, but I do know how to wield them for general cases (Nao's the regexpert around here), so I'm not fearful of using them, though I can readily see how others might be.

Title: Re: Another reason to dislike reCaptcha
Post by: nend on April 5th, 2012, 07:27 PM

I don't mind reCaptcha, but it shouldn't be the only line of defense. You have to consider what reCaptcha was designed for and that is what Google is using it for. IMHO it wasn't really designed to be a captcha but a decipher system, captcha just came second nature.

However on a second note, how about a captcha question system. The question and instructions will be generated like a captcha, so the bot will have to decipher the question and answer it too, lol.

IMHO though, most bots I have seen get through most security lately are not bots but people. They create an account and get it unlocked for the bots to use it later. You can captcha the post maybe, a lot of junk though just to keep a few bots out. :whistle:

Title: Re: Another reason to dislike reCaptcha
Post by: Arantor on April 5th, 2012, 07:45 PM

That's part of the problem, it's been billed as a 'magic bullet' back from its Carnegie Mellon days, back when it actually was pretty much a magic bullet. Now, of course, it's long since been beaten.

Quote

However on a second note, how about a captcha question system. The question and instructions will be generated like a captcha, so the bot will have to decipher the question and answer it too, lol.

If you mean putting both the instructions and puzzle in the image, surely that will be even worse for usability than just having the puzzle in an image? (Note that bots were breaking reCaptcha by its audio puzzle for a while because that was easier!)

Quote

IMHO though, most bots I have seen get through most security lately are not bots but people. They create an account and get it unlocked for the bots to use it later. You can captcha the post maybe, a lot of junk though just to keep a few bots out.

That's nothing new, especially given the CAPTCHA-solving farms. This is partly why I made animated CAPTCHAs, so that CAPTCHA farms would take a little longer to beat them.

But a CAPTCHA is not really a solution and hasn't been for a while; it's a simple automated defence - but a proper defence does involve better things like having Q&A.

Title: Re: Another reason to dislike reCaptcha
Post by: nolsilang on April 6th, 2012, 01:53 AM

Quote from nend on April 5th, 2012, 07:27 PM

IMHO though, most bots I have seen get through most security lately are not bots but people. They create an account and get it unlocked for the bots to use it later. You can captcha the post maybe, a lot of junk though just to keep a few bots out. :whistle:

Captcha protects against automation/bots (which can hit your forum thousands of times per day); if anything gets through, then the forum's staff should be able to take action. The reason they want an account is to post links, so I usually just limit that in the forum permissions. Stuff like a website link on the profile (min 10 posts), links in posts/signatures (min 20 posts) and so on. I think the forum's staff would be able to differentiate between a regular user and a would-be spammer. :)

Title: Re: Another reason to dislike reCaptcha
Post by: Arantor on April 6th, 2012, 01:59 AM

No, no it doesn't, that's precisely the point.

The SMF one has been broken by bots for years. ReCaptcha has also been broken for at least a year. It's incredibly easy to automate - I've even seen JavaScript implementations for OCRing the text in a CAPTCHA and even limited neural network solutions (i.e. a JavaScript routine that's able to learn and improve its ability to process text).

Please understand, we know what CAPTCHAs can and can't do, we've been fighting spam with them for years, and we've learned their limitations only too well - which is why almost two years ago I implemented my own from scratch, which Wedge inherited.

CAPTCHAs are outdated and do not actively solve the spam problem.

Limiting the website link to 10+ posts doesn't really solve the problem either: the bots won't notice and will try it anyway, though the human spammers might be discouraged. We have also made that entirely possible in Wedge, to limit website and signature to higher post counts, plus it's also possible to set things up so that a user can put their signature in but it isn't visible until they have made 10 posts (or whatever setting you want).
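
The "stored but hidden" behaviour can be sketched roughly as follows. This is not Wedge's implementation; the `Member` class, the threshold names and their values are assumptions for illustration. The member's signature and website are saved as entered, and the display code simply leaves them out until the post count reaches the configured minimum.

```python
from dataclasses import dataclass

# Hypothetical admin settings.
MIN_POSTS_WEBSITE = 10
MIN_POSTS_SIGNATURE = 10


@dataclass
class Member:
    name: str
    post_count: int
    website: str = ""
    signature: str = ""


def visible_profile(member: Member) -> dict:
    """Return what other users see; the stored values are left untouched."""
    return {
        "name": member.name,
        "website": member.website if member.post_count >= MIN_POSTS_WEBSITE else "",
        "signature": member.signature if member.post_count >= MIN_POSTS_SIGNATURE else "",
    }
```

Because nothing is rejected at save time, a spammer gets no feedback that the link is not being shown, and a genuine member's signature appears on its own once they reach the threshold.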