Wedge
Public area => The Pub => Off-topic => Topic started by: Arantor on April 4th, 2012, 01:49 PM
-
Regular readers here will know that I dislike reCaptcha, a lot.
Putting aside the fact that it is worryingly flawed, we have an interesting new vector for it.
I give you: http://www.theregister.co.uk/2012/04/04/google_recaptcha_street_view/
Specifically, it's using street numbers, and possibly street names, off Street View as things it's expecting you to decipher, to improve their database. Very effective crowdsourcing, potentially, but also it makes a mockery of any anti-spam considerations it may once have had.
-
But how do you know it's the image that's the one it's learning? Presumably by now enough people have identified the images and so they could be the control 'word'.
This is just tabloid journalism in that they've built a whole article around a single assumption: quote: Johnston added: "If we assume Google’s objective is to get reCAPTCHA users to determine building numbers from Street View images..."
But it's sensible to assume that they'll use the street image both as a control word and as an unknown word, which will fool the bots (for a while at least) and have the added benefit of improving Google Maps.
-
That's the thing, it's long been possible to figure out which is the unknown word and which is the control word. Either way it's a fuck-up.
If the street number is the control word, you have a maximum of something like 10000 permutations to cope with and even then it's possible to drastically cut that back with a bit of imagination in the code.
If the street number is not the control word, nothing changes and you break the control word much as you do now with conventional OCR.
This is the thing people don't realise, I can tell you *at a GLANCE* which is the control word in a normal reCaptcha. I will likely be able to tell you with the same glance which is which with the new setup, but on top of that it is more likely they will use the street numbers as the 'unknown' rather than the control value anyway.
This is the fundamental thing with reCaptcha: you get two words and you only have to get one mostly right, the other can be failed without any ill effects. So you actually spend more computational effort figuring out which is which, and now that just got a lot easier.
-
weCaptcha is way better anyway! :eheh:
-
weQ&A is even better though, heh ;)
-
Certainly -- except for non-English speakers!
We really need to add some kind of UI to make it easy to choose a language for the question I'd say.
It's probably better, UX-wise, than implementing a [lang] tag (which I'll probably end up doing, for completeness.)
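To make the idea a bit more concrete, here is a rough sketch of keeping one question per language and picking one by the visitor's language, with a fallback. It's Python purely for illustration; the names (QUESTIONS, DEFAULT_LANGUAGE, pick_question) and the sample questions are made up for this example and are not Wedge's actual code or schema.

```python
# Hypothetical storage: one anti-spam question per language code.
QUESTIONS = {
    "en": "What colour is the sky on a clear day?",
    "fr": "De quelle couleur est le ciel par temps clair ?",
}
DEFAULT_LANGUAGE = "en"


def pick_question(user_language: str) -> str:
    """Return the question for the visitor's language, else the default one."""
    # Normalise codes like "fr-FR" or "FR" down to "fr".
    code = user_language.strip().lower().split("-")[0]
    return QUESTIONS.get(code, QUESTIONS[DEFAULT_LANGUAGE])
```

A UI would only need to let the admin fill in that mapping, one entry per installed language.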
-
For single language sites, which is most of them to be fair, it's just not an issue.
But for multiple language sites, sure, we can look at extending that in some fashion. I might even be persuaded to allow admins to define multiple answers that are acceptable, e.g. variant spellings. Which would also allow you to write the same question in - say - two languages, and define both answers as allowed.
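As a minimal sketch of what "multiple acceptable answers" could look like (Python just for illustration, with hypothetical names; this is not how Wedge implements it), the check is simply whether the normalised answer is in the admin's list:

```python
from typing import Iterable


def normalise(text: str) -> str:
    """Case-fold and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.casefold().split())


def is_correct(given: str, accepted: Iterable[str]) -> bool:
    """True if the visitor's answer matches any of the admin-defined answers."""
    return normalise(given) in {normalise(answer) for answer in accepted}


# Hypothetical answer list: variant spellings plus a second language.
ACCEPTED = ["grey", "gray", "gris"]
```

With that list, "Gray" and " gris " would both pass, while "blue" would not.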
-
On one of the forums where I used to be an admin, I used https://www.keycaptcha.com/. It's pretty good at weeding out bots. reCaptcha just doesn't cut it; it's just like an unlocked door. The door is there, but you can move the handle and it will open.
-
Except that KeyCaptcha, like several of the other CAPTCHA solutions, is also going to limit those with visual or motor impairments, or those using certain browsers, such as some mobile browsers.
As opposed to writing a question for your forum which has no such limitations.
-
But for multiple language sites, sure, we can look at extending that in some fashion. I might even be persuaded to allow admins to define multiple answers that are acceptable, e.g. variant spellings. Which would also allow you to write the same question in - say - two languages, and define both answers as allowed.
Sounds like a good idea... :)
-
Sadly the concept of allowing multiple answers to a single question is not a new idea and it isn't mine, but the approach I have in mind is more what I described than what already exists.
Specifically, there's a mod that allows you to define a regex to match against rather than anything else, so that you can declare multiple variations through a regex rather than multiple discrete answers. Effective? Certainly. Intuitive? Not so much.
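For comparison, a regex-based check might look something like the sketch below. This is Python for illustration only, not the mod's actual code, and the pattern and function name are made up. It accepts the same variants as a discrete list would, but the admin has to write the pattern by hand:

```python
import re

# Hypothetical admin-supplied pattern: accepts "grey", "gray" or "gris".
ANSWER_PATTERN = re.compile(r"gr[ae]y|gris", re.IGNORECASE)


def matches_answer(given: str) -> bool:
    """True only if the whole (trimmed) answer matches the pattern."""
    return ANSWER_PATTERN.fullmatch(given.strip()) is not None
```

Matching the whole answer rather than searching inside it matters here: otherwise "greyish" or a sentence that merely contains "gray" would pass too.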
-
I have serious problems making regexes (see: any .htaccess I ever created). I would assume this goes for 99% of all admins too, or maybe I'm just daft. Regexes scare me, anyway.
-
They're a complex beast. I wouldn't say I've mastered them, but I do know how to wield them for general cases (Nao's the regexpert around here), so I'm not fearful of using them, though I can readily see how others might be.
-
I don't mind reCaptcha, but it shouldn't be the only line of defense. You have to consider what reCaptcha was designed for and that is what Google is using it for. IMHO it wasn't really designed to be a captcha but a decipher system, captcha just came second nature.
However on a second note, how about a captcha question system? The question and instructions will be generated like a captcha, so the bot will have to decipher the question and answer it too, lol.
IMHO though, most bots I have seen get through most security lately are not bots but people. They create an account and get it unlocked for the bots to use later. You can captcha the post maybe, a lot of junk though just to keep a few bots out. :whistle:
-
That's part of the problem, it's been billed as a 'magic bullet' back from its Carnegie Mellon days, back when it actually was pretty much a magic bullet. Now, of course, it's long since been beaten.
However on a second note, how about a captcha question system? The question and instructions will be generated like a captcha, so the bot will have to decipher the question and answer it too, lol.
If you mean putting both the instructions and puzzle in the image, surely that will be even worse for usability than just having the puzzle in an image? (Note that bots were breaking reCaptcha by its audio puzzle for a while because that was easier!)
IMHO though, most bots I have seen get through most security lately are not bots but people. They create an account and get it unlocked for the bots to use later. You can captcha the post maybe, a lot of junk though just to keep a few bots out.
That's nothing new, especially given the CAPTCHA-solving farms. This is partly why I made animated CAPTCHAs, so that CAPTCHA farms would take a little longer to beat them.
But a CAPTCHA is not really a solution and hasn't been for a while; it's a simple automated defence - but a proper defence does involve better things like having Q&A.
-
IMHO though, most bots I have seen get through most security lately are not bots but people. They create an account and get it unlocked for the bots to use later. You can captcha the post maybe, a lot of junk though just to keep a few bots out. :whistle:
A captcha protects against automation/bots (which can hit your forum thousands of times per day); if anything gets through, then the forum's staff should be able to take action. The reason they want an account is to post links, so I usually just limit that in the forum permissions. Stuff like a website link on the profile (min 10 posts), links in posts/signatures (min 20 posts) and so on. I think the forum's staff would be able to differentiate between a regular user and a would-be spammer. :)
-
No, no it doesn't, that's precisely the point.
The SMF one has been broken by bots for years. reCaptcha has also been broken for at least a year. It's incredibly easy to automate - I've even seen JavaScript implementations for OCRing the text in a CAPTCHA and even limited neural network solutions (i.e. a JavaScript routine that's able to learn and improve its ability to process text).
Please understand, we know what CAPTCHAs can and can't do, we've been fighting spam with them for years, and we've learned their limitations only too well - which is why almost two years ago I implemented my own from scratch, which Wedge inherited.
CAPTCHAs are outdated and do not actively solve the spam problem.
Limiting the website link to 10+ posts doesn't really solve the problem either: the bots won't notice and will try it anyway, though the human spammers might be discouraged. We have also made that entirely possible in Wedge, to limit website and signature to higher post counts, plus it's also possible to set things up so that a user can put their signature in but it isn't visible until they have made 10 posts (or whatever setting you want).
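Roughly, the "stored but not shown" idea works like the sketch below. It's Python for illustration only; the Member class, the threshold names and the values are hypothetical, not Wedge's real settings or code. The point is that the member can still save the field, and only the display is gated on post count.

```python
from dataclasses import dataclass

# Hypothetical thresholds; an admin would set whatever values suit the forum.
MIN_POSTS_WEBSITE = 10
MIN_POSTS_SIGNATURE = 10


@dataclass
class Member:
    posts: int
    website: str = ""
    signature: str = ""


def visible_website(member: Member) -> str:
    """The website link is kept in the profile but shown only past the threshold."""
    return member.website if member.posts >= MIN_POSTS_WEBSITE else ""


def visible_signature(member: Member) -> str:
    """The signature is saved as entered but hidden until enough posts are made."""
    return member.signature if member.posts >= MIN_POSTS_SIGNATURE else ""
```

A spam account that never reaches the threshold never gets its link displayed, and a genuine member doesn't have to re-enter anything once they do.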