The fact that users are encouraged to include text descriptions with media content makes it perfect training data for AI.
Lemmy hates AI.
To be clear, I fully support accessibility for people with disabilities. It's ironic, though. Does Lemmy's open source code make it easier for bots to scrape it?
It's not perfect training data. Being encouraged to add alt text and actually doing it are two different things, and writing good alt text is another matter altogether. Anything that's on the internet is training data whether people want it to be or not. The only difference is an ethical one: whether the scraper honors robots.txt, i.e. a "do not scrape" notice that communicates the intentions of whoever holds the data. And if they torrent books you can guess how respectful they are.
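For anyone curious what "respecting robots.txt" looks like in practice, here's a rough sketch using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the `lemmy.example` URL, and the rules themselves are made up for illustration; they aren't taken from any real instance. The point is that the check is entirely voluntary: a scraper has to choose to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one made-up AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://lemmy.example/post/1"  # placeholder URL
    print(may_fetch("ExampleAIBot", url))  # the blocked crawler
    print(may_fetch("SomeBrowser", url))   # any other agent
```

A polite crawler would call something like `may_fetch` before every request and skip the page when it returns False. Nothing on the server side enforces it, which is why it comes down to ethics.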
At this point all the imagery data they need is already out there. Not like your picture of a cat you post to Lemmy is gonna help these companies make a better model.