Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Another reason to dislike reCaptcha
« on April 4th, 2012, 01:49 PM »
Regular readers here will know that I dislike reCaptcha, a lot.[1]

Putting aside the fact that it is worryingly flawed,[2] we have an interesting new vector for it.

I give you: http://www.theregister.co.uk/2012/04/04/google_recaptcha_street_view/

Specifically, it's using street numbers, and possibly street names, off Street View as things it's expecting you to decipher, to improve their database. Very effective crowdsourcing, potentially, but also it makes a mockery of any anti-spam considerations it may once have had.
 1. And that I only wrote a reCaptcha plugin to prove the viability of the CAPTCHA hooks, and to avoid people whining about not having it, not because *I* wanted it.
 2. It shows you two words, one of which it doesn't actually know itself. It has been known to display numbers, and even mathematical equations and Hebrew writing, as the second 'word' in an attempt to crowdsource their meaning, which doesn't work as well as you might think, because people (and bots) just put nonsense in. In any case, even for the word it does know, it accepts one letter being wrong, making it even less reliable than a regular CAPTCHA that displays a word. Small wonder it's been broken by bots.
When we unite against a common enemy that attacks our ethos, it nurtures group solidarity. Trolls are sensational, yes, but we keep everyone honest. | Game Memorial

Farjo

  • "a valuable asset to the community"
  • Posts: 492
Re: Another reason to dislike reCaptcha
« Reply #1, on April 4th, 2012, 02:44 PM »
But how do you know it's the image that's the one it's learning? Presumably by now enough people have identified the images and so they could be the control 'word'.

This is just tabloid journalism in that they've built a whole article around a single assumption: quote: Johnston added: "If we assume Google’s objective is to get reCAPTCHA users to determine building numbers from Street View images..."

But it's sensible to assume that they'll use the street image as a control word and as an unknown word, which will fool the bots (for a while at least) and have the added benefit of improving google maps.

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Another reason to dislike reCaptcha
« Reply #2, on April 4th, 2012, 02:54 PM »
That's the thing, it's long been possible to figure out which is the unknown word and which is the control word. Either way it's a fuck-up.

If the street number is the control word, you have a maximum of something like 10000 permutations to cope with and even then it's possible to drastically cut that back with a bit of imagination in the code.

If the street number is not the control word, nothing changes and you break the control word much as you do now with conventional OCR.

This is the thing people don't realise: I can tell you *at a GLANCE* which is the control word in a normal reCaptcha. I will likely be able to tell you with the same glance which is which with the new setup, and on top of that it is more likely they will use the street numbers as the 'unknown' rather than the control value anyway.


This is the fundamental thing with reCaptcha: you get two words and you only have to get one mostly right, the other can be failed without any ill effects. So you actually spend more computational effort figuring out which is which, and now that just got a lot easier.
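
To make that concrete, here is a minimal Python sketch of the two-word check as described in this thread. It is not reCaptcha's actual code: the function names and the exact "one letter wrong" rule are assumptions for illustration only.

```python
def letters_wrong(expected: str, given: str) -> int:
    """Count mismatched positions; any length difference counts as extra misses."""
    mismatches = sum(a != b for a, b in zip(expected, given))
    return mismatches + abs(len(expected) - len(given))


def passes_two_word_check(control: str, control_guess: str,
                          unknown_guess: str, tolerance: int = 1) -> bool:
    """Hypothetical check: only the control word is verified, with some slack.

    unknown_guess is deliberately never inspected, because the service has
    nothing to compare it against. That is the weakness discussed above.
    """
    return letters_wrong(control.lower(), control_guess.lower()) <= tolerance
```

With this model, passes_two_word_check("morning", "mornimg", "asdf") succeeds even though the second word is pure nonsense, so a bot only needs to spot and roughly solve the control word.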

Nao

  • Dadman with a boy
  • Posts: 16,082

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Another reason to dislike reCaptcha
« Reply #5, on April 4th, 2012, 03:27 PM »
Certainly -- except for non-English speakers!

We really need to add some kind of UI to make it easy to choose a language for the question, I'd say.
It's probably better, UX-wise, than implementing a [lang] tag (which I'll probably end up doing, for completeness.)

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Another reason to dislike reCaptcha
« Reply #6, on April 4th, 2012, 03:31 PM »
For single language sites, which is most of them to be fair, it's just not an issue.

But for multiple language sites, sure, we can look at extending that in some fashion. I might even be persuaded to allow admins to define multiple answers that are acceptable, e.g. variant spellings. Which would also allow you to write the same question in - say - two languages, and define both answers as allowed.
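
As a rough illustration of the multiple-answers idea, here is a minimal Python sketch. It is not the forum software's code: the function names, the normalisation rules and the example question are all assumptions.

```python
from typing import Iterable


def normalise(text: str) -> str:
    """Fold case and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.casefold().split())


def is_accepted(reply: str, accepted_answers: Iterable[str]) -> bool:
    """Return True if the reply matches any admin-defined answer."""
    allowed = {normalise(answer) for answer in accepted_answers}
    return normalise(reply) in allowed


# Hypothetical bilingual question:
# "What colour is the sky? / De quelle couleur est le ciel ?"
sky_answers = ["blue", "bleu"]
```

Here is_accepted("  Blue ", sky_answers) and is_accepted("BLEU", sky_answers) both pass, while "green" does not, and the admin only has to type a plain list of answers.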

nolsilang

  • Lurking
  • Posts: 106

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Another reason to dislike reCaptcha
« Reply #8, on April 4th, 2012, 04:42 PM »
Except that KeyCaptcha, like several of the other CAPTCHA solutions, is also going to limit those with visual or motor impairments, or those using various browsers such as certain mobile browsers.

As opposed to writing a question for your forum which has no such limitations.

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Another reason to dislike reCaptcha
« Reply #9, on April 4th, 2012, 04:52 PM »
Quote from Arantor on April 4th, 2012, 03:31 PM
But for multiple language sites, sure, we can look at extending that in some fashion. I might even be persuaded to allow admins to define multiple answers that are acceptable, e.g. variant spellings. Which would also allow you to write the same question in - say - two languages, and define both answers as allowed.
Sounds like a good idea... :)

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Another reason to dislike reCaptcha
« Reply #10, on April 4th, 2012, 05:12 PM »
Sadly the concept of allowing multiple answers to a single question is not a new idea and it isn't mine, but the approach I have in mind is more what I described than what already exists.

Specifically, there's a mod that allows you to define a regex to match against rather than anything else, so that you can declare multiple variations through a regex rather than multiple discrete answers. Effective? Certainly. Intuitive? Not so much.
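
For comparison, a regex-based check might look something like this minimal Python sketch. It is not the mod's actual code: the function name and the example pattern are hypothetical.

```python
import re


def matches_answer(reply: str, pattern: str) -> bool:
    """Return True if the whole reply matches the admin-supplied regex."""
    # fullmatch avoids accepting replies that merely contain a valid answer.
    return re.fullmatch(pattern, reply.strip(), flags=re.IGNORECASE) is not None


# Hypothetical pattern covering two English spellings plus a French answer.
grey_pattern = r"gr[ae]y|gris"
```

One short pattern covers "grey", "gray" and "gris", which shows why it's effective, but the admin has to know what gr[ae]y means, which is the intuitiveness problem.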

Norodo

  • Oh you Baidu, so randumb. (60 sites being indexed at once? Jeez)
  • Posts: 469
Re: Another reason to dislike reCaptcha
« Reply #11, on April 4th, 2012, 05:39 PM »
I have serious problems making regexes (see: any .htaccess I ever created). I would assume this goes for 99% of all admins too, or maybe I'm just daft. Regexes scare me, anyway.

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Another reason to dislike reCaptcha
« Reply #12, on April 4th, 2012, 05:42 PM »
They're a complex beast. I wouldn't say I've mastered them, but I do know how to wield them for general cases (Nao's the regexpert around here), so I'm not fearful of using them, though I can readily see how others might be.

nend

  • When is a theme, no longer what it was when installed?
  • Posts: 165
Re: Another reason to dislike reCaptcha
« Reply #13, on April 5th, 2012, 07:27 PM »
I don't mind reCaptcha, but it shouldn't be the only line of defense. You have to consider what reCaptcha was designed for and that is what Google is using it for. IMHO it wasn't really designed to be a captcha but a decipher system, captcha just came second nature.

However on a second note, how about a captcha question system. The question and instructions will be generated like a captcha, so the bot will have to decipher the question and answer it too, lol.

IMHO though, most bots I have seen get through most security lately are not bots but people. They create an account and get it unlocked for the bots to use it later. You can captcha the post maybe, a lot of junk though just to keep a few bots out. :whistle:

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Another reason to dislike reCaptcha
« Reply #14, on April 5th, 2012, 07:45 PM »
That's part of the problem, it's been billed as a 'magic bullet' back from its Carnegie Mellon days, back when it actually was pretty much a magic bullet. Now, of course, it's long since been beaten.
Quote
However on a second note, how about a captcha question system. The question and instructions will be generated like a captcha, so the bot will have to decipher the question and answer it too, lol.
If you mean putting both the instructions and puzzle in the image, surely that will be even worse for usability than just having the puzzle in an image? (Note that bots were breaking reCaptcha by its audio puzzle for a while because that was easier!)
Quote
IMHO though, most bots I have seen get through most security lately are not bots but people. They create an account and get it unlocked for the bots to use it later. You can captcha the post maybe, a lot of junk though just to keep a few bots out.
That's nothing new, especially given the CAPTCHA-solving farms. This is partly why I made animated CAPTCHAs, so that CAPTCHA farms would take a little longer to beat them.

But a CAPTCHA is not really a solution and hasn't been for a while; it's a simple automated defence - but a proper defence does involve better things like having Q&A.