Thousands of authors demand payment from AI companies for use of copyrighted works

How can they prove that not some abstract public data has been used to train algorithms, but their particular intellectual property?
- Well, if you ask e.g. ChatGPT for the lyrics to a song or page after page of a book, and it spits them out 1:1 correct, you could assume that it must have had access to the original.
  
  Or at least excerpts from it. But even then, it's one thing for a person to put up a quote from their favourite book on their blog, and a completely different thing for a private company to use that data to train a model, and then sell it.
  
  Can it recreate anything 1:1? When both my wife and I tried to get them to do that they would refuse, and if pushed they would fail horribly.
  
  you could assume that it must have had access to the original.
  I don't know if that's true. If Google grabs that book from a pirate site. Then publishes the work as search results. ChatGPT grabs the work from Google results and cobbles it back together as the original.
  Who's at fault?
  I don't think it's a straight forward ChatGPT can reproduce the work therefore it stole it.
- there are a lot of possible ways to audit an AI for copyrighted works, several of which have been proposed in the comments here, but what this could lead to is laws requiring an accounting log of all material that has been used to train an AI as well as all copyrights and compensation, etc.
- Not without some seriously invasive warrants! Ones that will never be granted for an intellectual property case.
  Intellectual property is an outdated concept. It used to exist so wealthier outfits couldn't copy your work at scale and muscle you out of an industry you were championing.
  It simply does not work the way it was intended. As technology spreads, the barrier for entry into most industries wherein intellectual property is important has been all but demolished.
  i.e. 50 years ago: your song that your band performed is great. I have a recording studio and am gonna steal it muahahaha.
  Today: "anyone have an audio interface I can borrow so my band can record, mix, master, and release this track?"
  Intellectual property ignores the fact that, idk, Issac Newton and Gottfried Wilhelm Leibniz both independently invented calculus at the same time on opposite ends of a disconnected globe. That is to say, intellectual property doesn't exist.
  Ever opened a post to make a witty comment to find someone else already made the same witty comment? Yeah. It's like that.
  
  Spoken by someone who has never had something you've worked years on, be stolen.
- Personally speaking, I've generated some stupid images like different cities covered in baked beans and have had crude watermarks generate with them where they were decipherable enough that I could find some of the source images used to train the ai. When it comes to photo realistic image generation, if all the ai does is mildly tweak the watermark then it's not too hard to trace back.
  
  All but a very small few generative AI programs use completely destructive methods to create their models. There is no way to recover the training images outside of infantesimally small random chance.
  What you are seeing is the AI recognising that images of the sort you are asking for generally include watermarks, and creating one of its own.
  
  Do you have examples? It should only happen in case of overfitting, i.e. too many identical image for the same subject
- I'd think that given the nature of the language models and how the whole AI thing tends to work, an author can pluck a unique sentence from one of their works, ask AI to write something about that, and if AI somehow 'magically' writes out an entire paragraph or even chapter of the author's original work, well tada, AI ripped them off.
- I think that to protect creators they either need to be transparent about all content used to train the AI (highly unlikely) or have a disclaimer of liability, wherein if original content has been used is training of AI then the Original Content creator who have standing for legal action.
  The only other alternative would be to insure that the AI specifically avoid copyright or trademarked content going back to a certain date.
  
  Why a certain date? That feels arbitrary
- They can't. All they could prove is that their work is part of a dataset that still exists.

There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.
There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.
When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.
Folks, this isn’t a new problem, and it doesn’t need new laws.
- It's 100% a new problem. There's established precedent for things costing different amounts depending on their intended use.
  For example, buying a consumer copy of song doesn't give you the right to play that song in a stadium or a restaurant.
  Training an entire AI to make potentially an infinite number of derived works from your work is 100% worthy of requiring a special agreement. This even goes beyond simple payment to consent; a climate expert might not want their work in an AI which might severely mischatacterize the conclusions, or might want to require that certain queries are regularly checked by a human, etc
  
  Well, fine, and I can't fault new published material having a "no AI" clause in its term of service. But that doesn't mean we get to dream this clause into being retroactively for all the works ChatGPT was trained on. Even the most reasonable law in the world can't be enforced on someone who broke it 6 months before it was legislated.
  Fortunately the "horses out the barn" effect here is maybe not so bad. Imagine the FOMO and user frustration when ToS & legislation catch up and now ChatGPT has no access to the latest books, music, news, research, everything. Just stuff from before authors knew to include the "hands off" clause - basically like the knowledge cutoff, but forever. It's untenable, OpenAI will be forced to cave and pay up.
  
  My point is that the restrictions can’t go on the input, it has to go on the output - and we already have laws that govern such derivative works (or reuse / rebroadcast).
  
  The thing is, copyright isn't really well-suited to the task, because copyright concerns itself with who gets to, well, make copies. Training an AI model isn't really making a copy of that work. It's transformative.
  Should there be some kind of new model of renumeration for creators? Probably. But it should be a compulsory licensing model.
- When you sell a book, you don’t get to control how that book is used.
  This is demonstrably wrong. You cannot buy a book, and then go use it to print your own copies for sale. You cannot use it as a script for a commercial movie. You cannot go publish a sequel to it.
  Now please just try to tell me that AI training is specifically covered by fair use and satire case law. Spoiler: you can’t.
  This is a novel (pun intended) problem space and deserves to be discussed and decided, like everything else. So yeah, your cavalier dismissal is cavalierly dismissed.
  
  I completely fail to see how it wouldn't be considered transformative work
  
  No, you misunderstand. Yes, they can control how the content in the book is used - that’s what copyright is. But they can’t control what I do with the book - I can read it, I can burn it, I can memorize it, I can throw it up on my roof.
  My argument is that the is nothing wrong with training an AI with a book - that’s input for the AI, and that is indistinguishable from a human reading it.
  Now what the AI does with the content - if it plagiarizes, violates fair use, plagiarizes- that’s a problem, but those problems are already covered by copyright laws. They have no more business saying what can or cannot be input into an AI than they can restrict what I can read (and learn from). They can absolutely enforce their copyright on the output of the AI just like they can if I print copies of their book.
  My objection is strictly on the input side, and the output is already restricted.
  
  It's specifically distribution of the work or derivatives that copyright prevents.
  So you could make an argument that an LLM that's memorized the book and can reproduce (parts of) it upon request is infringing. But one that's merely trained on the book, but hasn't memorized it, should be fine.
- This is a little off, when you quote a book you put the name of the book you’re quoting. When you refer to a book, you, um, refer to the book?
  I think the gist of these authors complaints is that a sort of “technology laundered plagiarism” is occurring.
  
  Copyright 100% applies to the output of an AI, and it is subject to all the rules of fair use and attribution that entails.
  That is very different than saying that you can’t feed legally acquired content into an AI.
- I asked Bing Chat for the 10th paragraph of the first Harry Potter book, and it gave me this:
  "He couldn’t know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: ‘To Harry Potter – the boy who lived!’"
  It looks like technically I might be able to obtain the entire book (eventually) by asking Bing the right questions?
  
  Then this is a copyright violation - it violates any standard for such, and the AI should be altered to account for that.
  What I’m seeing is people complaining about content being fed into AI, and I can’t see why that should be a problem (assuming it was legally acquired or publicly available). Only the output can be problematic.
- However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.
  It's an algorithm that's been trained on numerous pieces of media by a company looking to make money of it. I see no reason to give them a pass on fairly paying for that media.
  You can see this if you reverse the comparison, and consider what a human would do to accomplish the task in a professional setting. That's all an algorithm is. An execution of programmed tasks.
  If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I'd get my ass sued. I have to buy the books and the scientific papers. STEM companies regularly pay for access to papers and codes and standards. Why shouldn't an AI have to do the same?
  
  If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I’d get my ass sued. I have to buy the books and the scientific papers.
  Well, if OpenAI knowingly used pirated work, that's one thing. It seems pretty unlikely and certainly hasn't been proven anywhere.
  Of course, they could have done so unknowingly. For example, if John C Pirate published the transcripts of every movie since 1980 on his website, and OpenAI merely crawled his website (in the same way Google does), it's hard to make the case that they're really at fault any more than Google would be.
  
  It’s an algorithm that’s been trained on numerous pieces of media by a company looking to make money of it.
  If I read your book... and get an amazing idea... Turn it into a business and make billions off of it. You still have no right to anything. This is no different.
  If I gave a worker a pirated link to several books and scientific papers in the field
  There's been no proof or evidence provided that ANY content was ever pirated. Has any of the companies even provided the dataset they've used yet?
  Why is this the presumption that they did it the illegal way?
- There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.
  That's part of the allegation, but it's unsubstantiated. It isn't entirely coherent.
  
  It's not entirely unsubstantiated. Sarah Silverman was able to get ChatGPT to regurgitate passages of her book back to her.

You know what would solve this? We all collectively agree this fucking tech is too important to be in the hands of a few billionaires, start an actual public free open source fully funded and supported version of it, and use it to fairly compensate every human being on Earth according to what they contribute, in general?
Why the fuck are we still allowing a handful of people to control things like this??
- No entity on the planet has more money than our governments. It’d be more efficient for a government to fund this than any private company.
  
  There is nothing objectively wrong with your statement. However, we somehow always default to solving that issue by having some dragon hoard enough gold, and there is something objectively wrong with that.
- Setting aside the obvious answer of "because capitalism", there are a lot of obstacles towards democratizing this technology. Training of these models is done on clusters of A100 GPU's, which are priced at $10,000USD each. Then there's also the fact that a lot of the progress being made is being done by highly specialized academics, often with the resources of large corporations like Microsoft.
  Additionally the curation of datasets is another massive obstacle. We've mostly reached the point of diminishing returns of just throwing all the data at the training of models, it's quickly becoming apparent that the quality of data is far more important than the quantity of the data (see TinyStories as an example). This means a lot of work and research needs to go into qualitative analysis when preparing a dataset. You need a large corpus of input, each of which are above a quality threshold, but then also as a whole they need to represent a wide enough variety of circumstances for you to reach emergence in the domain(s) you're trying to train for.
  There is a large and growing body of open source model development, but even that only exists because of Meta "leaking" the original Llama models, and now more recently releasing Llama 2 with a commercial license. Practically overnight an entire ecosystem was born creating higher quality fine-tunes and specialized datasets, but all of that was only possible because Meta invested the resources and made it available to the public.
  Actually in hindsight it looks like the answer is still "because capitalism" despite everything I've just said.
  
  I know the answer to pretty much all of our “why the hell don’t we solve this already?” questions is: capitalism.
  But I mean, as Lrrr would say “why does the working class, as the biggest of the classes, doesn’t just eat the other one?”.
- Because we shy away from responsibility.
  
  I think the longer response to this is more accurate. It’s more “because capitalism” than anything else.
  And capitalism over the course of the 20th century made very successful attempts of alienating completely the working class and destroying all class consciousness or material awareness.
  So people keep thinking that the problems is we as individuals are doing capitalism wrong. Not capitalism.
- You think it is so simple you can just download it and run it on your laptop?
  
  You kind of can though? The bigger models aren't really more complicated, just bigger. If you can cram enough ram or swap into a laptop, lamma.cpp will get there eventually.

Someone should AGPL their novel and force the AI company to open source their entire neural network.

This is a good debate about copyright/ownership. On one hand, yes, the authors works went into 'training' the AI..but we would need a scale to then grade how well a source piece is good at being absorbed by the AI's learning. for example. did the AI learn more from the MAD magazine i just fed it or did it learn more from Moby Dick? who gets to determine that grading system. Sadly musicians know this struggle. there are just so many notes and so many words. eventually overlap and similiarities occur. but did that musician steal a riff or did both musicians come to a similar riff seperately? Authors dont own words or letters so a computer that just copies those words and then uses an algo to write up something else is no more different than you or i being influenced by our favorite heroes or i formation we have been given. do i pay the author for reading his book? or do i just pay the store to buy it?
- Copyright laws are really out of control at this point. Their periods are far too long and, like you said, how can anyone claim to truly be original at this point? A dedicated lawyer can find reasonable prior art for pretty much anything nowadays. The only reason old sources look original is because no records exist of the sources they used.

While I am rooting for authors to make sure they get what they deserve, I feel like there is a bit of a parallel to textbooks here. As an engineer if I learn about statics from a text book and then go use that knowledge to he'll design a bridge that I and my company profit from, the textbook company can't sue. If my textbook has a detailed example for how to build a new bridge across the Tacoma Narrows, and I use all of the same design parameters for a real Tacoma Narrows bridge, that may have much more of a case.
- It's not really a parallel.
  The text books don't have copyrights on the concepts and formulae they teach. They only have copyrights for the actual text.
  If you memorize the text book and write it down 1:1 (or close to it) and then sell that text you wrote down, then you are still in violation of the copyright.
  And that's what the likes of ChatGPT are doing here. For example, ask it to output the lyrics for a song and it will spit out the whole (copyrighted) lyrics 1:1 (or very close to it). Same with pages of books.
  
  The memorization is closer to that of a fanatic fan of the author. It usually knows the beginning of the book and the more well known passages, but not entire longer works.
  By now, ChatGPT is trying to refuse to output copyrighted materials know even where it could, and though it can be tricked, they appear to have implemented a hard filter for some more well known passages, which stops generation a few words in.
- I think that these are fiction writers. The maths you'd use to design that bridge is fact and the book company merely decided how to display facts. They do not own that information, whereas the Handmaid's Tale was the creation of Margaret Atwood and was an original work.
- But you paid for the textbook
  
  Libraries exist
- You have a point but there's a pretty big difference between something like a statistics textbook and the novel "Dune" for instance. One was specifically written to teach mostly pre-existing ideas and the other was created as entertainment to sell to a wide an audience as possible.
- Plagiarism filters frequently trigger on chatgpt written books and articles.
- An AI analyzes the words of a query and generates its response(s) based on word-use probabilities derived from a large corpus of copyrighted texts. This makes its output derivative of those texts in a way that someone applying knowledge learned from the texts is not.
  
  Why, though?
  Is it because we can't explain the causal relationships between the words in the text and the human's output or actions?
  If a very good neuroscientist traced out the engineer's brain and could prove that, actually, if it wasn't for the comma on page 73 they wouldn't have used exactly this kind of bolt in the bridge, now is the human's output derivative of the text?
  Any rule we make here should treat people who are animals and people who are computers the same.
  And even regardless of that principle, surely a set of AI weights is either not copyrightable or else a sufficiently transformative use of almost anything that could go into it? If it decides to regurgitate what it read, that output could be infringing, same as for a human. But a mere but-for causal connection between one work and another can't make text that would be non-infringing if written by a human suddenly infringing because it was generated automatically.

Yea sure, right after Google and Amazon pay me for all the data they've stolen from me. LOL

So what's the difference between a person reading their books and using the information within to write something and an ai doing it?
- Because AIs aren't inspired by anything and they don't learn anything
  
  So uninspired writing is illegal?
  
  Language models actually do learn things in the sense that: the information encoded in the training model isn't usually* taken directly from the training data; instead, it's information that describes the training data, but is new. That's why it can generate text that's never appeared in the data.
  the bigger models seem to remember some of the data and can reproduce it verbatim; but that's not really the goal.
  
  What does inspiration have to do with anything? And to be honest, humans being inspired has led to far more blatant copyright infringement.
  As for learning, they do learn. No different than us, except we learn silly abstractions to make sense of things while AI learns from trial and error. Ask any artist if they've ever looked at someone else's work to figure out how to draw something, even if they're not explicitly looking up a picture, if they've ever seen a depiction of it, they recall and use that. Why is it wrong if an AI does the same?
- the person bought the book before reading it
- A person is human and capable of artistry and creativity, computers aren’t. Even questioning this just means dehumanizing artists and art in general.
  
  Not being allowed to question things is a really shitty precedent, don't you think?
- Large language models can only calculate the probability that words should go together based on existing texts.
  
  Isn't this correct? What's missing?
  Let's ask chatGPT3.5:
  Mostly accurate. Large language models like me can generate text based on patterns learned from existing texts, but we don't "calculate probabilities" in the traditional sense. Instead, we use statistical methods to predict the likelihood of certain word sequences based on the training data.

All this copyright/AI stuff is so silly and a transparent money grab.
They're not worried that people are going to ask the LLM to spit out their book; they're worried that they will no longer be needed because a LLM can write a book for free. (I'm not sure this is feasible right now, but maybe one day?) They're trying to strangle the technology in the courts to protect their income. That is never going to work.
Notably, there is no "right to control who gets trained on the work" aspect of copyright law. Obviously.
- There is nothing silly about that. It's a fundamental question about using content of any kind to train artificial intelligence that affects way more than just writers.
- I seriously doubt Sarah Silverman is suing OpenAI because she's worried ChatGPT will one day be funnier than she is. She just doesn't want it ripping off her work.
  
  What do you mean when you say "ripping off her work"? What do you think an LLM does, exactly?
- Designing and marketing a system to plagiarize works en masse? That's the cash grab.
  
  Can you elaborate on this concept of a LLM "plagiarizing"? What do you mean when you say that?

I don't know how I feel about this honestly. AI took a look at the book and added the statistics of all of its words into its giant statistic database. It doesn't have a copy of the book. It's not capable of rewriting the book word for word.
This is basically what humans do. A person reads 10 books on a subject, studies become somewhat of a subject matter expert and writes their own book.
Artists use reference art all the time. As long as they don't get too close to the original reference nobody calls any flags.
These people are scared for their viability in their user space and they should be, but I don't think trying to put this genie back in the bottle or extra charging people for reading their stuff for reference is going to make much difference.
- It’s not at all like what humans do. It has no understanding of any concepts whatsoever, it learns nothing. It doesn’t know that it doesn’t know anything even. It’s literally incapable of basic reasoning. It’s essentially taken words and converted them to numbers, and then it examines which string is likely to follow each previous string. When people are writing, they aren’t looking at a huge database of information and determining the most likely word to come next, they’re synthesizing concepts together to create new ones, or building a narrative based on their notes. They understand concepts, they understand definitions. An AI doesn’t, it doesn’t have any conceptual framework, it doesn’t even know what a word is, much less the definition of any of them.
  
  How can you tell that our thoughts don't come from a biological LLM? Maybe what we conceive as "understanding" is just a feeling emerging from a more fondamental mechanism like temperature emerges from the movement of particles.
  
  When people are writing, they aren’t looking at a huge database of information and determining the most likely word to come next, they’re synthesizing concepts together to create new ones, or building a narrative based on their notes. They understand concepts, they understand definitions.
  A huge part of what we do is like drawing from a huge mashup of accumulated patterns though. When an image or phrase pops into your head fully formed, on the basis of things that you have seen and remembered, isn't that the same sort of thing as what AI does? Even though there are (poorly understood) differences between how humans think and what machine learning models do, the latter seems similar enough to me that most uses should be treated by the same standard for plagiarism; only considered violating if the end product is excessively similar to a specific copyrighted work, and not merely because you saw a copyrighted work and that pattern being in your brain affected what stuff you spontaneously think of.
  
  I don't think this is true.
  The models (or maybe the characters in the conversations simulated by the models) can be spectacularly bad at basic reasoning, and misunderstand basic concepts on a regular basis. They are of course completely insane; the way they think is barely recognizable.
  But they also, when asked, are often able to manipulate concepts or do reasoning and get right answers. Ask it to explain the water cycle like a pirate, and you get that. You can find the weights that make the Eifel Tower be in Paris and move it to Rome, and then ask for a train itinerary to get there, and it will tell you to take the train to Rome.
  I don't know what "understanding" something is other than to be able to get right answers when asked to think about it. There's some understanding of the water cycle in there, and some of pirates, and some of European geography. Maybe not a lot. Maybe it's not robust. Maybe it's superficial. Maybe there are still several differences in kind between whatever's there and the understanding a human can get with a brain that isn't 100% the stream of consciousness generator. But not literally zero.
  
  I didn't say what you said, that's a lot of words and concepts you're attributing to me that I didn't say.
  I'm saying, LLM ingests data in a way it can average it out, in essence it learns it. It's not wrote memorization, but it's not truly reasoning either, though it's approaching it if you consider we might be overestimating human comprehension. It pulls in the data from all the places and uses the data to create new things.
  People pull in data over a decade or two, we learn it, then end up writing books, or applying the information to work. They're smart and valuable people and we're glad they read everyone's books.
  The LLM ingests the data and uses the statistics behind it to do work, the world is ending.
  
  I think you underestimate the reasoning power of these AIs. They can write code, they can teach math, they can even learn math.
  I've been using GPT4 as a math tutor while learning linear algebra, and I also use a text book. The text book told me that (to write it out) "the column space of matrix A is equal to the column space of matrix A times its own transpose". So I asked GPT4 if that was true and it said no, GPT disagreed with the text book. This was apparently something that GPT did not memorize and it was not just regurgitating sentences. I told GPT I saw it in a text book, the AI said "sorry, the textbook must be wrong". I then explained the mathematical proof to the AI, and the AI apologized, admitted it had been wrong, and agreed with the proof. Only after hearing the proof did the AI agree with the text book. This is some pretty advanced reasoning.
  I performed that experiment a few times and it played out mostly the same. I experimented with giving the AI a flawed proof (I purposely made mistakes in the mathematical proofs), and the AI would call out my mistakes and would not be convinced by faulty proofs.
  A standard that judged this AI to have "no understanding of any concepts whatsoever", would also conclude the same thing if applied to most humans.

I think this is more about frustration experienced by artists in our society at being given so little compensation.
The answer is staring us in the face. UBI goes hand in hand with developments in AI. Give artists a basic salary from the government so they can afford to live well. This isn't a AI problem this is a broken society problem. I support artists advocating for themselves, but the fact that they aren't asking for UBI really speaks to how hopeless our society feels right now.
- What incentive is there at all to work with UBI? Why would anyone try hard at anything if you're not rewarded?
  
  There will always be people who seek to challenge themselves.
  Others will want more money than is included with their UBI. What on earth would be wrong with people having a little more, as opposed to so many struggling, needing roommates, and so on? I imagine with an extra 1k-2k in their pocket monthly, a lot more people would buy or build housing, and a lot of service industries would boom with all the additional potentially disposable income.
  Or how about people being able to retire, like actually retire, without stress. We could lower the retirement age, or people could retire independently from government assistance, leading to more available jobs for younger people as more roles transition away due to automation.
  And frankly, I honestly don't see anything wrong with some portion of the populace just living on UBI and enjoying life if that's how they want to do things. Nothing wrong with people being happier, less stressed, and potentially mentally and/or physically healthier for it.
  
  Also I think it's funny that we can bail out large companies on repeat, but bailing out people is a show stopper. It's backwards. The economy is supposed to serve us, not the other way around.
  
  Good question. I'll admit that I like UBI, but I haven't done any serious reading into it. I have a break from work this month so maybe I'll try and find a book so I can answer this type of question better in the future. I don't think it's so much an issue for good jobs, but the real shit jobs might be an issue, but maybe not, and UBI in the beginning wouldn't just be for everyone it would be for selected groups.
  A brief search gave this: https://www.vox.com/future-perfect/2020/2/19/21112570/universal-basic-income-ubi-map
  Some gains:
  lower crime
  increased fertility (maybe a good thing idk?)
  decrease or eliminate extreme poverty
  improves education
  
  Big surprise, people do things despite not being paid for them!
  Also a UBI should be just enough to live (afford food and shelter) wherever you live. Then you can work for more.
  UBI is about freeing people from having to work multiple dead end jobs just to survive and enables them to have an actual pursuit of happiness. Not everyone will want to work harder, but the option opens to those who do.
  Currently if you’re struggling just to pay for food and shelter, it’s incredibly hard to spend time developing skills needed to make more.
  
  To answer you seriously (and these are out of date figures from memory) that in Australia all it would take to give everyone UBI is to tax every dollar outside of that people make starting at 30%. (Currently its 19% after your first 18k. goes to 32% over 45k and 37% over 120k and 45% after that)
  The positives are that
  People can retire younger, meaning upward job mobility is greatly improved.
  The cost of means testing, managing welfare and aged pensions and combating fraud of those systems effectively vanishes.
  Students could afford to study full time and work only part time or not at all AND pay their rent meaning a better educated population.
  It effectively combats the minimum wage being too low. Its ok for part time baristas to make the minimum when the govt is making sure that people are already at "survival".
  It indirectly funds the arts. Lets be real, how many great musicians had to stop chasing their dreams because it was "practice or go to work and eat.
  For example. Some guy making 100k a year and paying about $25k in tax currently. Under the 30% arrangement would pay 30k in tax, but the ubi would pay about $20k a year. So still $15k in front. I'm no accountant but I think for you to be worse off you have to be on about $200k a year or more.

This is tough. I believe there is a lot of unfair wealth concentration in our society, especially in the tech companies. On the other hand, I don't want AI to be stifled by bad laws.
If we try to stop AI, it will only take it away from the public. The military will still secretly use it, companies might still secretly use it. Other countries will use it and their populations will benefit while we languish.
Our only hope for a happy ending is to let this technology be free and let it go into the hands of many companies and many individuals (there are already decent models you can run on your own computer).
- So, in your "only hope for a happy ending" scenario, how do the artists get paid? Or will we no longer need them after AI runs everything ;)
  
  I don't know. I only believe that things will be worse if individuals cannot control these AIs.
  Maybe these AI have reached a peak (at least for now), and so they aren't good enough to write a compelling novel. In that case, writers who produce good novels and get lucky will still get paid, because people will want to buy their work and read it.
  Or maybe AI will quickly surpass all humans in writing ability, in which case, there's not much we can do. If the AI produces books that are better, then people will want AI produced books. They might have to get those from other countries, or they might have to get them from a secret AI someone is running on a beefy computer in their basement. If AI surpasses humans then that's not a happy day for writers, no way around it. Still, an AI that surpasses humans might help people in other ways, but only if we allow everyone to have and control their own AI.
  As the industrial revolution threatened to swallow society Carl Marx wrote about how important it was that regular people be able to control "the means of production". At least that part of his philosophy has always resonated with me, because I want to be empowered as an individual, I want the power to create and compete in our society. It's the same now, AI threatens to swallow society and I want to be able to control my own AI for my own purposes.
  If strong AI is coming, it's coming. If AI is going to be the source of power in society then I want regular people to have access to that power. It's not yet clear whether this is the case, but if strong AI is coming it's going to be big, and writers complaining about pay isn't going to stop it.
  All that said, I believe we do a terrible job of caring for individuals in our society. We need more social safety nets, we need to change to provide people better and happier lives. So I'm not saying "forget the writers, let them stave".
  
  To be honest, I don't think AI is going to get good enough to replace human creativity. Sure, some of the things that AI can do is pretty nice, but these things are mostly already solved problems - sure, AI can make passable art, but so can humans - and they can go further than art, they can create logical connections between art pieces, and extrapolate art in new reasonably justified ways, instead of the direction-less, grasping in the dark methods that AI seems to do it in.
  Sure, AI can make art a thousand times faster than a person, but if only one in thousand is tolerably good, then what's the problem?

Obligatory xkcd: https://xkcd.com/827/

Isn’t learning the basic act of reading text? I’m not sure what the AI companies are doing is completely right but also, if your position is that only humans can learn and adapt text, that broadly rules out any AI ever.
- Isn’t learning the basic act of reading text?
  not even close. that's not how AI training models work, either.
  if your position is that only humans can learn and adapt text
  nope-- their demands are right at the top of the article and in the summary for this post:
  Thousands of authors demand payment from AI companies for use of copyrighted works::Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools
  that broadly rules out any AI ever
  only if the companies training AI refuse to pay
  
  Isn’t learning the basic act of reading text?
  not even close. that’s not how AI training models work, either.
  Of course it is. It's not a 1:1 comparison, but the way generative AI works and the we incorporate styles and patterns are more similar than not. Besides, if a tensorflow script more closely emulated a human's learning process, would that matter for you? I doubt that very much.
  Thousands of authors demand payment from AI companies for use of copyrighted works::Thousands of published authors are requesting payment from tech companies for the use of >> their copyrighted works in training artificial intelligence tools
  Having to individually license each unit of work for a LLM would be as ridiculous as trying to run a university where you have to individually license each student reading each textbook. It would never work.
  What we're broadly talking about is generative work. That is, by absorbing one a body of work, the model incorporates it into an overall corpus of learned patterns. That's not materially different from how anyone learns to write. Even my use of the word "materially" in the last sentence is, surely, based on seeing it used in similar patterns of text.
  The difference is that a human's ability to absorb information is finite and bounded by the constraints of our experience. If I read 100 science fiction books, I can probably write a new science fiction book in a similar style. The difference is that I can only do that a handful of times in a lifetime. A LLM can do it almost infinitely and then have that ability reused by any number of other consumers.
  There's a case here that the renumeration process we have for original work doesn't fit well into the AI training models, and maybe Congress should remedy that, but on its face I don't think it's feasible to just shut it all down. Something of a compulsory license model, with the understanding that AI training is automatically fair use, seems more reasonable.
  
  Okay, given that AI models need to look over hundreds of thousands if not millions of documents to get to a decent level of usefulness, how much should the author of each individual work get paid out?
  Even if we say we are going to pay out a measly dollar for every work it looks over, you’re immediately talking millions of dollars in operating costs. Doesn’t this just box out anyone who can’t afford to spend tens or even hundreds of millions of dollars on AI development? Maybe good if you’ve always wanted big companies like Google and Microsoft to be the only ones able to develop these world-altering tools.
  Another issue, who decides which works are more valuable, or how? Is a Shel Silverstein book worth less than a Mark Twain novel because it contains less words? If I self publish a book, is it worth as much as Mark Twains? Sure his is more popular but maybe mine is longer and contains more content, whats my payout in this scenario?
- A key point is that intellectual property law was written to balance the limitations of human memory and intelligence, public interest, and economic incentives. It's certainly never been in perfect balance. But the possibility of a machine being able to consume enormous amounts of information in a very short period of time has never been a variable for legislators. It throws the balance off completely in another direction.
  There's no good way to resolve this without amending both our common understanding of how intellectual property should work and serve both producers and consumers fairly, as well as our legal framework. The current laws are simply not fit for purpose in this domain.
  
  I very much agree.
  
  Nothing about todays iteration of copyright is reasonable or good for us. And in any other context, this (relatively) leftist forum would clamour to hate on copyright. But since it could now hurt a big corporation, suddenly copyright is totally cool and awesome.
  (for reference, the true problem here is, as always, capitalism)

This is so stupid. If I read a book and get inspired by it and write my own stuff, as long as I'm not using the copyrighted characters, I don't need to pay anyone anything other than purchasing the book which inspired me originally.
If this were a law, why shouldn't pretty much each modern day fantasy author not pay Tolkien foundation or any non fiction pay each citation.
- There's a difference between a sapient creature drawing inspiration and a glorified autocomplete using copyrighted text to produce sentences which are only cogent due to substantial reliance upon those copyrighted texts.
  All AI creations are derivative and subject to copyright law.
  
  There’s a difference between a sapient creature drawing inspiration and a glorified autocomplete using copyrighted text to produce sentences which are only cogent due to substantial reliance upon those copyrighted texts.
  But the AI is looking at thousands, if not millions of books, articles, comments, etc. That's what humans do as well - they draw inspiration from a variety of sources. So is sentience the distinguishing criteria for copyright? Only a being capable of original thought can create original work, and therefore anything not capable of original thought cannot create copyrighted work?
  Also, irrelevant here but calling LLMs a glorified autocomplete is like calling jet engines a "glorified horse". Technically true but you're trivialising it.
  
  The thing is these models aren't aiming to re-create the work of any single authors, but merely to put words in the right order. Imo, If we allow authors to copyright the order of their words instead of their whole original creations then we are actually reducing the threshold for copyright protection and (again imo) increasing the number of acts that would be determined to be copyright protected
  
  But for text to be a derivative work of other text, you need to be able to know by looking at the two texts and comparing them.
  Training an AI on a copyrighted work might necessarily involve making copies of the work that would be illegal to make without a license. But the output of the AI model is only going to be a for-copyright-purposes derivative work of any of the training inputs when it actually looks like one.
  Did the AI regurgitate your book? Derivative work.
  Did the AI spit out text that isn't particularly similar to any existing book? Which, if written by a human, would have qualified as original? Then it can't be a derivative work. It might not itself be a copyrightable product of authorship, having no real author, but it can't be secretly a derivative work in a way not detectable from the text itself.
  Otherwise we open ourselves up to all sorts of claims along the lines of "That book looks original, but actually it is a derivative work of my book because I say the author actually used an AI model trained on my book to make it! Now I need to subpoena everything they ever did to try and find evidence of this having happened!"
- Machine learning algorithms does not get inspired, they replicate. If I tell a MLM to write a scene for a film in the style of Charlie Kaufman, it has to been told who Kaufman is and been fed alot of manuscripts. Then it tries to mimicks the style and guess what words come next.
  This is not how we humans get inspired. And if we do, we get accused of stealing. Which it is.
- Because a computer can only read the stuff, chew it and throw it up. With no permission. Without needing to practice and create its own personal voice. It’s literally recycled work by other people, because computers cannot be creative. On the other hand, human writers DO develop their own style, find their own voice, and what they write becomes unique because of how they write it and the meaning they give to it. It’s not the same thing, and writers deserve to get repaid for having their art stolen by corporations to make a quick and easy buck. Seriously, you wanna write? Pick up a pen and do it. Practice, practice, practice for weeks months years decades. And only then you may profit. That’s how it always was and it always worked fine that way. Fuck computers.
- but to read the book and be inspired by it, you first had to buy the book. That's the difference.

What did you pay the author of the books and papers published that you used as sources in your own work? Do you pay those authors each time someone buys or reads your work? At most you pay $0-$15 for a book anyway.
In regards to free advertising when your source material is used... if your material is a good source and someone asks say ChatGPT, shouldn't your work be mentioned if someone asks for a book or paper and you have written something useful for it? Assuming it doesn't hallucinate.
- That's the "paid in exposure" argument.
  And I'm not sure what my company pays, but they purchase access to scientific papers and industrial standards. The market price I've seen for them is hundreds of dollars. You either pay an ongoing subscription to access the information, or you pay a larger lump sum to own a copy that cannot legally be reproduced.
  Companies pay for this sort of thing. AI shouldn't get an exception.
  
  TBF, access to scientific papers funded by public money should be free to the public anyway. The whole needing a subscription to access them is malarkey. The researchers aren't the ones getting the money.

Love to see it!

democrats