‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says
Pressure grows on artificial intelligence firms over the content used to train their products
OK, so pay for it.
Pretty simple really.
Or let's use this opportunity to make copyright much less draconian.
Why not both?
I don't understand why people are defending AI companies sucking up all human knowledge by saying "well, yeah, copyrights are too long anyway".
Even if we went back to the pre-1976 term of 28 years, renewable once for a total of 56 years, there are still a ton of recent works that AI companies are using without any compensation to their creators.
I think it's because people are taking this "intelligence" metaphor a bit too far and think if we restrict how the AI uses copyrighted works, that would restrict how humans use them too. But AI isn't human, it's just a glorified search engine. At least all standard search engines do is return a link to the actual content. These AI models chew up the content and spit out something based on it. It simply makes sense that this new process should be licensed separately, and I don't care if it makes some AI companies go bankrupt. Maybe they can work adequate payment for content into their business model going forward.
I'm no fan of the current copyright law - the Statute of Anne was much better - but let's not kid ourselves that some of the richest companies in the world have any desire whatsoever to change it.
As long as capitalism exists in society, just being able to go yoink and take everyone's art will never be a practical rule set.
Every work is protected by copyright, unless stated otherwise by the author.
If you want to create a capable system, you want real data and you want a wide range of it, including data that is rarely considered to be a protected work, despite being one.
I can guarantee you that you're going to have a pretty hard time finding a dataset with diverse data containing things like napkin doodles or bathroom stall writing that's compiled with permission of every copyright holder involved.
How hard it is doesn't matter. If you can't compensate people for using their work, or exclude work people don't want used, you just don't get that data.
There's plenty of stuff in the public domain.
Sounds like an OpenAI problem and not an us problem.
I never said it was going to be easy - and clearly that is why OpenAI didn't bother.
If they want to advocate for changes to copyright law then I'm all ears, but let's not pretend they actually have any interest in that.
You make this sound like a bad thing.
And why is that a bad thing?
Why are you entitled to other people's work, just because "it's hard to find data"?