How many pages has a human author read and written before they can produce something worth publishing? I’m pretty sure that’s not even a million pages. Why does an AI require a gazillion pages to learn, but the quality is still unimpressive? I think there’s something fundamentally wrong with the way we teach these models.
To be fair, that's all they have to go on. If a picture's worth a thousand words, how many pages is a lifetime (or even a childhood) of sight and sound?
"Why does an AI require a gazillion pages to learn, but the quality is still unimpressive?"
Because humans learn how to read and interpret those pages in school. Give that book to a toddler and not much will happen other than some bite marks.
AI needs to learn language structure, grammar, math, logic, reasoning, problem solving and much more before it can even be trained on anything useful. Humans take years to acquire those skills; AI needs more content but can get through that training much faster.
Maybe it is the wrong way to train machines, but we have not invented robot schools yet, so for now it's the best we've got.
By the way, I still think companies should be banned from training on copyrighted content and user data behind closed doors. Keep your models in the public domain or get out.
The more important question is: Why can a human absorb a ton of material in their learning without anyone crying about them "stealing"? Why shouldn't the same go for AI? What's the difference? I really don't understand the common mindset here. Is it because a trained AI is used for profit?
What you're talking about is whether AI is actually inventing new work (imo, yes it is), but that's not the issue.
The issue is these models were trained on our collective knowledge & culture without permission, then sold back to us.
Unless they use only proprietary & public training data, every single one of these models should be open source, with open weights, & free for anyone to use, like libraries.
There is a difference between me reading a book and learning from it and one of the biggest companies in the world pirating millions of books for its business. And it gets really bad when normal users are sued for tens of thousands of dollars for downloading a book or an MP3 while Meta is defended for doing the same thing on a much larger scale.
Yes, we know that copyright is broken. But if it is broken, it has to be broken for everyone.
It is because a human artist is usually inspired and uses knowledge to create new art, while AI is just a mediocre mimic. A human artist doesn't accidentally put six fingers on people on a regular basis. If they draw fewer fingers, it is intentional.
I’ve been thinking about that as well. If an author has bought 500 books and read them, that’s obviously going to influence the books they write in the future. There’s nothing illegal about that. Then again, they did pay for the books, so I guess that makes it fine.
What if they got the books from a library? Well, they probably also paid taxes, so that makes it ok.
What if they pirated those books? In that case, the piracy is the problematic part, but I don’t think anyone will sue the author for copying the style of LOTR in their own works.