
Sarah Silverman and other authors are suing OpenAI and Meta for copyright infringement, alleging that they're training their LLMs on books via Library Genesis and Z-Library

123 comments
  • People keep taking issue with this article's use of "summarizing" and linking to Wikipedia... Summaries of copyrighted work are obviously not illegal.

    This article is oversimplified and does a crummy job of explaining the problem. Ars Technica does a much better job explaining.

    The fact that the AI can summarize these works in detail is proof that it was trained using copyrighted material without permission (which is not fair use). Sarah Silverman is obviously not going to be hurt financially by this, but there are hundreds of thousands of authors who definitely will be affected. They have every right to sue.

    • Why does "fair use" even fall into it? I'm not familiar with their specific license, but the general definition of copyright is:

      A copyright is a type of intellectual property that gives its owner the exclusive right to copy, distribute, adapt, display, and perform a creative work, usually for a limited time.

      Nothing was copied, or distributed (in a form that anybody can consider "The Work"), or displayed, or performed. The only possible legal argument they have is adapting as a derivative work. And anybody who is familiar with how an LLM works knows that the form that results from reading in content is completely different from the source.

      LLMs/LDMs are not taking in billions of books and putting them into a database. It is a very lossy process. Stable Diffusion was trained on billions of images, yet the resulting model is 4 GB. There is no universe where you can store billions of images in a mere 4 GB. Stable Diffusion cannot and will not, pixel-by-pixel, reproduce a Van Gogh. It can make something that kind of looks like a Van Gogh, but styles are not copyrightable.
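
      To put rough numbers on the size argument, here is a back-of-the-envelope sketch. The 4 GB figure is the one quoted above; the image count (2 billion, standing in for "billions") and the 50 KB size of a small JPEG are assumptions chosen for illustration, not measured values.

```python
def bytes_per_image(model_bytes: int, num_images: int) -> float:
    """Average number of model bytes available per training image."""
    return model_bytes / num_images


MODEL_BYTES = 4 * 10**9      # "4 GB" as quoted in the comment (decimal gigabytes)
NUM_IMAGES = 2 * 10**9       # assumption: "billions" taken as 2 billion
SMALL_JPEG_BYTES = 50_000    # assumption: a small compressed image, about 50 KB

budget = bytes_per_image(MODEL_BYTES, NUM_IMAGES)
ratio = SMALL_JPEG_BYTES / budget

print(f"{budget:.1f} bytes of model per training image")
print(f"a {SMALL_JPEG_BYTES:,}-byte image is {ratio:,.0f}x larger than that budget")
```

      Under those assumptions the model has about 2 bytes per training image on average, which is far too little to hold a copy of each one. This is only an average over the whole training set.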

      The same applies to an LLM like ChatGPT. It cannot reproduce entire books, or anywhere close to that. If you ask it to recreate Page 25 of Silverman's book, it can't do it. If it doesn't even contain a minor portion of the original material, it can't even be considered a derivative work.

      They don't have a case. They have a lot of publicity and noise, but they will lose to inevitability.

      • You make a lot of excellent points, but I think the main issue of contention is just using copyrighted work to train generative AI without the author's permission regardless.

        If they did ask permission, there would be no problem. But an author or artist should be given the choice if their work is going to be used to train an AI.

  • OP, I just wanted to say thank you for writing such a good title. It's rare to get such an informative, clickbait-free title these days.

  • My pie in the sky hope is that copyright somehow becomes less stringent after all of this.

    Don't get me wrong, I want protections for creators and support reasonable copyright (life of the author + 25 years with the possibility of a 15-year extension), but letting a company lord over an IP for damn near a century isn't ideal for anyone.

    • The major outcome that I at least hope comes out of this is that the AI "creations" aren't eligible for copyright themselves. If the powers that be allow all this AI-created stuff copyright protection, it's going to be a gigantic mess.

      • For pure "prompt → image" with nothing in between, I absolutely agree. It's lazy and ripe for abuse by copyright trolls. That being said, there's a lot more in the world of AI-assisted art than what most people are aware of.

        Determining where the legal lines will be drawn is going to be a monumental task, but I think there's value in allowing authors to retain copyright on AI-assisted works. If that becomes a legal issue, I also expect the free open source models to go the way of restricting training data to public domain works, like Adobe did with Firefly.

  • If they're being trained via Library Genesis and Z-Library, shouldn't those be the target of the suit for enabling/allowing that?

    Edit: Duh, for the same reason they don't ever go after them, the same reason they don't go after other major piracy sites, etc. They're picking an easy target.
