Technology @lemmy.world ForgottenFlux @lemmy.world 7mo ago

Forget security – Google's reCAPTCHA v2 is exploiting users for profit | Web puzzles don't protect against bots, but humans have spent 819 million unpaid hours solving them

www.theregister.com Google's reCAPTCHA v2 just labor exploitation, boffins say

Research Findings:

- reCAPTCHA v2 is not effective in preventing bots and fraud, despite its intended purpose
- reCAPTCHA v2 can be defeated by bots 70-100% of the time
- reCAPTCHA v3, the latest version, is also vulnerable to attacks and has been beaten 97% of the time
- reCAPTCHA interactions impose a significant cost on users, with an estimated 819 million hours of human time spent on reCAPTCHA over 13 years, which corresponds to at least $6.1 billion USD in wages
- Google has potentially profited $888 billion from cookies [created by reCAPTCHA sessions] and $8.75–32.3 billion per each sale of their total labeled data set
- Google should bear the cost of detecting bots, rather than shifting it to users

"The conclusion can be extended that the true purpose of reCAPTCHA v2 is a free image-labeling labor and tracking cookie farm for advertising and data profit masquerading as a security service," the paper declares.

In a statement provided to The Register after this story was filed, a Google spokesperson said: "reCAPTCHA user data is not used for any other purpose than to improve the reCAPTCHA service, which the terms of service make clear. Further, a majority of our user base have moved to reCAPTCHA v3, which improves fraud detection with invisible scoring. Even if a site were still on the previous generation of the product, reCAPTCHA v2 visual challenge images are all pre-labeled and user input plays no role in image labeling."

Late Stage Capitalism @lemmygrad.ml sabreW4K3 @lazysoci.al 7mo ago

Google's reCAPTCHAv2 is just labor exploitation, boffins say • The Register

www.theregister.com/2024/07/24/googles_recaptchav2_labor/

172 comments

I kinda figured. It was annoying to do one, but then they wanted you to do two or three and that's absurd. Whenever it comes up now, I usually just close out.
- they wanted you to do two or three and that's absurd
  
  Yea how about 20
  
  VPN? Google will just go in a loop with these things, so I just stopped using Google completely.
  
  If you have to do that many, you either have some privacy setting on or are on a flagged IP handed out by a VPN.
  
  I tried to order some components on Digikey a few months ago and I’m still mentally scarred. Probably did a few hundred of those things over the course of 2 weeks.
  
  That’s because you’re shady.
  
  Cries in battlenet sign up process
  
  STOP BEING SNEAKY MICHAEL
- Some captchas have also just become obvious AI training. "Click on the living being in this image", "Select every image of the same object as in this example image". And the images you have to select look obviously AI-generated.
  
  Heh, I got one just the other day "Select the images containing structures built by people" lmao
  
  Those, one answers incorrectly.
- I'm surprised that this is in the news right now. This has been acknowledged as fact for a decade or so.
  
  Relevant 1053
- At a certain point I did like 10 of them, and then ended up closing the page because it never let me in, all because I was on a VPN.
- Funny thing is they stop asking if you do them really slowly. Almost as if to tell you, you're too inefficient to even be an unpaid intern or something. Anyway, if they annoy you, take your time.
Getting served a captcha often results in me closing the tab. I'm not doing stupid puzzles for you.
- Do them wrong and then close out
  
  I do it right and it says I’m wrong =\
  
  It knows they’re wrong which is why I don’t really think this article is accurate. Is it training if it already has the answers? Probably not.
- I haven't done an image one in years for the same reason.
  
  My general internet usage has plummeted between ads and captchas and all the other modern website bullshit, which is why I am here so much.
When they slow fade in the picture, I add one more software engineer to my kill list.
- I'm sure they intentionally made it so people get frustrated and leave instead.
- In case you didn't know: This is already a thing, with pictures slowly fading in for selecting stuff like traffic cones or buses.
I bypassed 35,000 Google reCAPTCHA v2 challenges using bots. Don't ever rely on this for security.
- Where can I learn this power?
  
  I just spent $3 worth of bitcoin on NoCaptchaAI. I ran their web extension on a server with a browser open, controlled by a custom web extension I made, so that a solved challenge would be returned to a swarm of clients on request.
- It is neither intended nor even stated to be intended for security.
  
  Except, that's most of its ad copy on Google's own website?
  
  reCAPTCHA uses an advanced risk analysis engine and adaptive challenges to keep malicious software from engaging in abusive activities on your website. Meanwhile, legitimate users will be able to login, make purchases, view pages, or create accounts and fake users will be blocked.
  
  It's literally billed as a security measure for a website.
  
  https://www.google.com/recaptcha/about/
There's nothing that can express my disdain for Google's reCaptcha.

😒 We're training its AI models
😒 It's free labor for Google
😒 Sometimes it wants the corner of an object, sometimes it doesn't
😒 Wildly inconsistent
😒 Always blurry and hard to see
😒 Seemingly endless
😒 It's the robot asking us humans if we're the robots
The objective of reCAPTCHA (or any captcha) isn't to detect bots. It is more about stopping automated requests and rate limiting. The captcha is 'defeated' if the time needed to solve it, whether by a human or a bot, is less than expected. Humans are very slow, so they can't beat it anyway.
- There are much better ways of rate limiting that don't steal labor from people.
  
  hCaptcha and Microsoft CAPTCHA all do the same. Can you give an example of one that can't easily be overcome just by better compute hardware?
- […] reCAPTCHA […] isn’t to detect bots. It is more of stopping automated requests […]
  
  Which is bots. Bots make automated requests, and anything that makes automated requests can also be called a bot (web crawlers are called bots too and, if well-behaved, respect robots.txt, which has "bots" in its name for this very reason; "bot" is short for "robot"). Using different words does not change the reality behind them, though it may be a sign that someone is trying to put something over on someone else.
  
  There isn't a good way to classify human users with scripts without adding too much friction to normal use. Also, bots are sometimes welcome and useful; it's a problem when someone tries to mine data in large volume or effectively DoS the server.
  
  Forget bots, there are centers in India and other countries where you can employ humans to do 'automated things' (YouTube like counts, watch hours, for example) at the same cost as bots. There are similar CAPTCHA services too. Good luck with those :)
  
  Rate limiting is the only effective option.
- I thought captchas worked in a way where they provided some known good examples, some known bad examples, and a few examples which aren't certain yet. Then the model is trained depending on whether the user selects the uncertain examples.
  
  Also it's very evident what's being trained. First it was obscured words for OCR, then Google Maps screenshots for detecting things, now you see them with clearly machine-generated images.
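For what it's worth, the known-good / known-bad / uncertain scheme described in this sub-thread can be sketched in a few lines. This is only an illustration of the commenter's description, not a confirmed account of how reCAPTCHA works (Google's statement in the post says v2 images are pre-labeled). All names here (`passes`, `submit`, the tile ids) are made up for the example.

```python
from collections import Counter, defaultdict


def passes(selected: set, known_good: set, known_bad: set) -> bool:
    """Grade only on the tiles whose answer is already known."""
    return known_good <= selected and not (selected & known_bad)


def submit(selected: set, known_good: set, known_bad: set,
           uncertain: set, votes: dict) -> bool:
    """Grade a submission; if it passes, count its picks on the uncertain tiles."""
    ok = passes(selected, known_good, known_bad)
    if ok:
        # Only users who got the known tiles right contribute label votes.
        for tile in uncertain:
            votes[tile]["yes" if tile in selected else "no"] += 1
    return ok


if __name__ == "__main__":
    votes = defaultdict(Counter)
    # Hypothetical challenge: tile "a" is a known bus, "b" is known not to be,
    # "c" and "d" are unlabeled.
    print(submit({"a", "c"}, {"a"}, {"b"}, {"c", "d"}, votes))
    print(dict(votes))
```

Under this scheme a wrong click on an uncertain tile never fails the user, which would also fit the reports elsewhere in the thread of randomly toggling one image and still passing.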
Google should bear the cost of detecting bots, rather than shifting it to users

how?
- Yeah. Written by someone who doesn’t really understand the internet.
  
  Considering the article states that reCAPTCHA v2 and v3 can be broken/bypassed by bots 70-100% of the time, they are obviously not the solution.
reCAPTCHA is exploiting users for profit

Well duh.

reCAPTCHA started out as a clever way to improve the quality of OCRing books for Distributed Proofreaders / Project Gutenberg. You know, giving to the community, improving access to public-domain texts. Then Google acquired them. Text CAPTCHAs got phased out. No more of that stuff, just computer vision rubbish to improve Google's own AI models and services.

If they had continued to depend on tasks that directly help the community, Google would at least have had to constantly make sure the community's concerns were met. But if they only have to answer to themselves for the quality of the data and nobody else even gets to see it, well, of course it turned into yet another mildly neglected Google project.
- Then Google acquired them. Text CAPTCHAs got phased out
  
  Google kept the text version for five years after the acquisition though. They used it to digitize books on Google Books, to allow full-text search of their book archive.
I will gladly solve a reCAPTCHA for you today if you pay me for it today.
- There's platforms that do that.
  
  I can pay a service to auto solve captcha and anything that can't be solved will be pushed to a human to solve.
  
  Never actually used it but it was interesting learning it existed
Remember the good old days when it was just malformed text you have to solve? I miss those days. AI was complete garbage and they had to use farms of eyeballs to solve them for bots, making it a costly operation. We've now totally gotten away from all of that.

WE ARE THE EYEBALLS AND I AIN'T GETTING PAID IN WOW GOLD TO DO IT EITHER
- that was also to train ai.
  
  No it wasn't... It was human-assisted OCR to help digitize books. Initially for Project Gutenberg, but then for Google Books once Google acquired it in 2009.
Why is that no news to me? How did so many people not know that? Should I have spread the word more, even if all the people I told were like “yea, yea, of course, but, what can I do? 🤷🏻‍♀️”?
I don't really get where this article is going. They are all over the place.

Let's start with a fuck Google. They are an evil company. But:

Other captchas are also not very effective against bots. Arguably most traditional systems would be worse than reCAPTCHA at fighting bots.

reCAPTCHA's agent validation, while a privacy violation, is faster than solving any other captcha, and if you are hit with the puzzle, it's not that much more time-consuming than any other captcha.

That profit number is very questionable and they know it. Anyway, that's not much different from, and probably less profitable than, most Google services.

Also, it's ridiculous how someone can say in the same article that the image puzzle can be solved by bots 100% of the time and that it is a scheme to get human labor to solve the puzzle. Am I the only one seeing the logical failure here?

And what's the purpose of all this? Just let bots roam free? Are they trying to sell another solution? What's the point?

I hate Google as much as the next guy. But I don't really share this article's spirit.

If I were to make a point, the point would be that people and companies should stop making registration-only sites and dynamic sites when static websites are enough for their purposes, and only go for registration or other bot-vulnerable kinds of sites if there is no way around it. But if you need to make a service that is vulnerable to bots, you need to protect it, and sadly there are no great solutions out there. If your site is small and not specifically targeted by anyone malicious, you can get by with simpler solutions. But bigger or targeted sites really can't get around needing Google or Cloudflare, and have to assume that it will only mitigate the damage.

But if anyone knows a better and more ethical solution to prevent bot spam for a service that really needs to have registrations, please tell me.
- Also worth noting that Google has always been extremely open about the fact that they use recaptcha for that purpose. It's never been a secret.
  
  Their service to the website owners is the meaningful reduction in effectiveness of bots in places where bots are harmful. The website's service to you is the content that it's being used to protect (and the stuff that has reCAPTCHA on it is stuff like games where there's a competitive advantage, things like search engines where there's a meaningful cost to heavy bot use, and login pages where there's a real security cost to mass bot use). I use a VPN, which increases the rate of captchas a lot, and I think it's a pretty reasonable way to do things, personally.
- Also, it's ridiculous how someone can say in the same article that the image puzzle can be solved by bots 100% of the time and that it is a scheme to get human labor to solve the puzzle. Am I the only one seeing the logical failure here?
  
  Most solvers aren't bots. Logical, right?
Try the headphone option.
- Finally heard a clear audio CAPTCHA for the first time in my life this past month. It was glorious. There was slight garbling before and after the characters were read, but that’s it.
  
  Besides that singular experience, all audio CAPTCHAs have been utterly 100% impossible to interpret. Blaring white noise followed by a small squeak of “threeve” or “eleventeen”.
  
  My answer to this is to give one word only.
  
  I've found them to be pretty clear usually. Half-formed words at the start/end I just ignore. Either way, even on Firefox with uBlock and all the rest, audio captchas have always passed me first try even if I think I got it wrong. I don't like posting about it in case they tighten it up after it gets more users.
No one makes a company use reCAPTCHA.
“The conclusion can be extended that the true purpose of reCAPTCHA v2 is a free image-labeling labor and tracking cookie farm for advertising and data profit masquerading as a security service,” the paper declares.

I thought this was known since it came out. It seemed even more obvious when the images leaned in heavily to traffic related pictures like stoplights.
Gonna have to disagree hard with this, based on extensive first-hand experience (web dev). I've added CAPTCHA to dozens (hundreds?) of web forms, and it all but eliminates spam.
- Right, so similar to locks? Usually can be easily bypassed if you know how, but it at least filters out the people who aren't determined enough to put in the effort.
  
  Basically, yeah. The vast majority of spambots are simple and lazy.
- My experience matches yours. I don’t enjoy putting reCAPTCHA v3 on my sites but it takes contact form spam from 70-80 messages per day to 0-2.
  
  I’d switch to other services if they could be as effective. If anybody has real-world experience with another option working I’d love to hear it.
- Honestly at first read, the paper feels like a bunch of whining text to prove a point the author believes in without any alternate proposal.
- It works against basic bots, but if you've got a dedicated adversary, it doesn't do anything
  
  (Granted, most people do not have dedicated adversaries, but when they come, you're in trouble)
  
  OK, sure, but that's like saying it's pointless to use a secure password online because the NSA could hack you if they wanted to.
They were using us to label the data.
- That's why you always make sure that labeling is "garbage in" and label whatever
I thought the whole point of reCaptcha was to provide a reliable set of data to train bots. Entering a fuzzy scanned word, identifying bikes and traffic lights, etc.

The fact that they've now got that, and the bots are trained is hardly a surprise.

Without captchas the problem of spambots would still be a million times worse.
- Yup. I like Cloudflare's checkbox, it works well and probably catches more bots than reCaptcha while being simple for humans.
  
  How does that checkbox work? Does it just look at your cookies?
I had to deal with one yesterday that wouldn't let me in no matter what I did.

So it isn't even good at figuring out who isn't a robot.
- Solving too fast. I shit you not. Sometimes you have to go really slow. Like you're 80 and can't see very well trying to discern what's in those boxes.
  
  Fuck. This explains a lot of frustration I have experienced.
We already knew that, but it's nice to have data.
reCAPTCHA v2 visual challenge images are all pre-labeled and user input plays no role in image labeling

That's funny, because when I'm faced with this, I keep adding/removing one of the images at random and it keeps accepting them as ok.
- I like this strategy.
Does this work?

https://addons.mozilla.org/de/firefox/addon/noptcha/
- Judging from the reviews, it doesn't
  
  Ah, right, there are reviews too.
- I tried it before. It worked for me on one small game website for account creation. After that it was more or less useless on any other site. It has a weird focus thing where it'll try to solve the captcha before you can enter in login details so if by chance the extension works, you'll fail the login anyways.
  
  It still needs work. I think if the dev can work out those issues it could be great. Until then, it's pretty much worthless.
Is it only 7200 people solving reCAPTCHA every hour for the past 13 years? Feels like it should be more?
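The ~7200 figure checks out as a back-of-the-envelope average. A quick sketch, using only the numbers quoted in the post (819 million hours, 13 years, $6.1 billion) and assuming 365-day years; the function names are just for illustration:

```python
# Figures quoted in the post summary.
TOTAL_HOURS = 819e6   # human hours spent on reCAPTCHA
YEARS = 13
WAGES_USD = 6.1e9     # "at least $6.1 billion USD in wages"

HOURS_PER_YEAR = 365 * 24  # assumption: leap days ignored


def average_concurrent_solvers(total_hours: float, years: float) -> float:
    """Average number of people solving a captcha at any given moment."""
    return total_hours / (years * HOURS_PER_YEAR)


def implied_hourly_wage(wages: float, total_hours: float) -> float:
    """Hourly rate implied by the wage estimate."""
    return wages / total_hours


if __name__ == "__main__":
    print(round(average_concurrent_solvers(TOTAL_HOURS, YEARS)))
    print(round(implied_hourly_wage(WAGES_USD, TOTAL_HOURS), 2))
```

So it works out to roughly 7,200 people solving captchas around the clock, every hour of every day, valued at a bit under $7.50 an hour. It is an average of continuous solving time, not a count of distinct users, which is probably why it feels low.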
I thought this was old news 20 years ago?
I thought it was detecting bots based on how you are moving your mouse, etc to solve it, but if they can be solved by AI do they want their AI trained by other AI?
Alright, I don't use google.com

Edit: this was in reply to someone. I guess my app fucked up the reply.
- But you might still be using their captcha
- Sites you visit use Google, their recaptcha, their analytics, their ads.
  
  How often do you get captchas?
  
  It doesn't happen often at all for me.
  
  Yup, and Epic Games' is the absolute worst. I can't pass it on my phone regardless of what I do, and I can pass it occasionally on my desktop. I only claim their games, so if it stops working on the two computers it apparently likes, I'll probably stop visiting their site.
  
  It seems to have something to do with Firefox and/or my ad blocker.
I always thought they are just getting the training data for AI using these.
I mean, duh? With proof of work captchas existing, there's no reason to have those image selection captchas... Ever...

How those work is by having the server generate a puzzle. Server side this is cheap to generate, while client side solving is "hard". The server can even choose the difficulty of the puzzle, and even set it dynamically. This means that when your website is under light load the captcha can be really easy/fast to solve. If your website is under attack however the captcha can be set to take seconds to solve.
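A minimal sketch of the idea, assuming a hashcash-style puzzle (find a counter so that SHA-256 of challenge plus counter starts with a given number of zero bits). The function names and the load thresholds in `pick_difficulty` are invented for illustration and are not taken from any particular captcha product.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: generating a random challenge is cheap."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, counter: int, difficulty_bits: int) -> bool:
    """Server side: checking a solution costs a single hash."""
    digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute force, about 2**difficulty_bits hashes on average."""
    counter = 0
    while not verify(challenge, counter, difficulty_bits):
        counter += 1
    return counter


def pick_difficulty(requests_per_second: float) -> int:
    """Hypothetical policy: harder puzzles when the site is under heavier load."""
    if requests_per_second < 10:
        return 8
    if requests_per_second < 100:
        return 14
    return 20


if __name__ == "__main__":
    challenge = make_challenge()
    bits = pick_difficulty(requests_per_second=5)  # light load, easy puzzle
    counter = solve(challenge, bits)
    print(bits, counter, verify(challenge, counter, bits))
```

Each extra bit of difficulty doubles the expected client work while the server's check stays at one hash, which is the asymmetry the comment is describing. A real deployment would also need to expire challenges and reject reused solutions; that is left out here.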
It is undoubtedly a new piece of research, but the cause is always the same: corporations exploit people because they are taken out of government and democratic control effectively everywhere.

Some corporations employ more people and have bigger budgets than some countries and they often influence people's lives more than the government. Yet they're effectively electoral monarchies where electors and monarchs are just a bunch of rich assholes who respond to nobody.

Only when we change that system will those headlines stop.
I like them, it's a nice mini puzzle break built into my daily grind
Dropping this from Upper Echelon: https://youtu.be/IWUHv3S8JVI?si=KWxZLqJhEPSCXbNV
Sometimes I think writers just try to find things to be edgy about. The straws this grasps at are incredible. Might as well complain about the billions of unpaid man-hours people provide by offering common courtesy for free.
This is bullshit. Author is literally insane.
