The rephrased¹ title "FSF Threatens Anthropic over Infringed Copyright: Share Your LLMs Free" certainly doesn't do enough to dramatise how odious an act it can be.
¹ Original title is "The FSF doesn't usually sue for copyright infringement, but when we do, we settle for freedom"
> We are a small organization with limited resources and we have to pick our battles, but if the FSF were to participate in a lawsuit such as Bartz v. Anthropic and find our copyright and license violated, we would certainly request user freedom as compensation.
Sounds more like “we can’t and won’t sue, but this is the kind of compensation that we think would be appropriate”
> "Therefore, we urge Anthropic and other LLM developers that train models using huge datasets downloaded from the Internet to provide these LLMs to their users in freedom"
They don't have the rights to distribute the training data.
This is the reason why AI companies won't let anyone inspect which content was in the training set. It turns out the suspicions of many copyright holders (including the FSF) were true (of course).
Anthropic and others will never admit it, which is why they wanted to settle rather than risk going to trial. AI boosters will obviously continue to gaslight copyright holders into believing nonsense like: "It only scraped the links, so the AI didn't directly train on your content!", or "AI can't see like humans, it only sees numbers, binary digits", or "The AI didn't reproduce exactly 100% of the content, just like humans do when tracing from memory!".
They will not share the data-set used to train Claude, even if it was trained on AGPLv3 code.
(Edit: In the event of it being changed to match the actual article title, the current subject line for this thread is " FSF Threatens Anthropic over Infringed Copyright: Share Your LLMs Freel")
The FSF doesn't usually sue for copyright infringement, but when we do, we settle for freedom
But whether you can actually be compelled to do that isn't well tested in court. Challenging the GPL's enforceability in that way leads you down the path of having had no valid license at all, and for past GPL offenders that would have been the worse outcome. AI companies could change that.
"The FSF doesn't usually sue for copyright infringement, but when we do, we settle for freedom"
and this sentence at the end
" We are a small organization with limited resources and we have to pick our battles, but if the FSF were to participate in a lawsuit such as Bartz v. Anthropic and find our copyright and license violated, we would certainly request user freedom as compensation."
could be seen as "threatening".
FSF licenses contain attribution and copyleft clauses. It's "do whatever you want with it provided that you X, Y and Z". Just taking the first part without the second part is a breach of the license.
It's like renting a car without paying and then claiming "well you said I can drive around with it for the rest of the day, so where is the harm?" while conveniently ignoring the payment clause.
You may be confusing this with a "public domain" license.
Not a nothing burger, but not totally insignificant either.
And then there is the chilling effect. If the FSF can't enforce their license, who is going to sue to overturn the precedent? Large companies, publishers, and governments have mostly all done deals with the devil by now. Is Joe Blow, random developer, going to hire a strip-mall lawyer and overturn this? Seems unlikely.
I used to be on the FSF board of directors. I have provided legal testimony regarding copyleft licenses. I am excruciatingly aware of the difference between a copyleft license and the public domain.
"Sam Williams and Richard Stallman's Free as in freedom: Richard Stallman's crusade for free software"
"GNU Free Documentation License (GNU FDL). This is a free license allowing use of the work for any purpose without payment."
I'm not familiar with this license or how it compares to their software licenses, but it sounds closer to a public domain license.
Licences like AGPL also don't have redistribution as their only restriction.
Then why did you say "no harm was caused"? Clearly the harm of "using our copylefted work to create proprietary software" was caused. Do you just mean economic harm? If so, I think that's where the parent comment's confusion originates.
Wikipedia used to be under the FDL, and for a few months they lobbied the FSF to allow an escape hatch to Creative Commons, because the FDL was so annoying.
The restrictions fall not only on verbatim distribution, but on derivative works too. I am not aware whether model outputs are settled to be or not to be (hehe) derivative works in a court of law, but that question is at the very least very much valid.
> 4. MODIFICATIONS
> You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
Etc etc.
In short, it is a copyleft license. You must also license derivative works under this license.
Just fyi, the gnu fdl is (unsurprisingly) available for free online - so if you want to know what it says, you can read it!
Ignoring the fact that the statement doesn't talk about FSF code in the training data at all, [0] are you sure about that? From the start of the last of three paragraphs in the statement:
> Obviously, the right thing to do is protect computing freedom: share complete training inputs with every user of the LLM, together with the complete model, training configuration settings, and the accompanying software source code. Therefore, we urge Anthropic and other LLM developers that train models using huge datasets downloaded from the Internet to provide these LLMs to their users in freedom.
This seems to me to be consistent with the FSF's stance of "You told the computer how to do it. The right thing to do is to give the humans operating that computer the software, input data, and instructions that they need to do it, too.".[0] In fact, it talks about the inclusion of a book published under the terms of the GNU FDL, [1] which requires distribution of modified copies of a covered work to -themselves- be covered by the GNU FDL.
It would be nice if members of the class could vote to force a case to trial. For the typical token settlement amount, I’m sure many would rather have the precedent-setting case instead.
The hero we need, but not the hero we deserve..
The issue is that every CS master's student and AI researcher knows how to build a SOTA LLM. But only a few companies have the resources.
The process:
(1) steal as much data from the internet as possible (data is everything)
(2) raise incomprehensible amounts of money
(3) find a location where you can take over the energy grid for training
(4) put a black box around it so nobody can see the weights
(5) charge users $$$ to use it
(6) retrain models with user session data (opt-in by default)
(7) peek at how users are using it, (maybe) change policies to stop them from using it that way, and (maybe) rapidly develop features for that use case.
(Sorry that last one is jaded and not fair - just included to give you a picture of what could be happening with this sort of tech) …
The entire premise of the product is “built on the backs of any & everyone who has ever published a work”
> the district court ruled that using the books to train LLMs was fair use but left for trial the question of whether downloading them for this purpose was legal.
If I took a book and cut it up into individual words (or partial words even), and then used some of the words with words from every other book to write a new book, it'd be hard to argue that I'm really "distributing the first book", even if the subject of my book is the same as the first one.
This really just highlights how the law is a long way behind what's achievable with modern computing power.
The pipeline is something like: download material -> store material -> train models on material -> store models trained on material -> serve output generated from models.
These questions focus on the inputs to the model training; the question I have raised focuses on the outputs of the model. If [certain] outputs are considered derivative works of input material, then we have a cascade of questions about which parts of the pipeline are covered by the license requirements. Even if any of the upstream parts of this simplified pipeline are considered legal, it does not imply that the rest of the pipeline is compliant.
Do any products exist which are not built on uncompensated work of other people in the past?
Generally speaking societies do better when knowledge is shared and not hoarded.
Hoarding knowledge via legal constructs is great at concentrating wealth to the hoarder at the expense of everyone else.
We should restore copyright to its original term lengths.
I agree with the stance of Anthropic et al that these models should be built with all possible information.
I agree with the stance of the FSF that the resulting models should be as freely usable/available as possible.
— Published on Mar 13, 2026 10:05 AM
The Free Software Foundation (FSF), like many others, received a notice regarding settlement in the copyright infringement lawsuit Bartz v. Anthropic. It is a class action lawsuit claiming that Anthropic infringed copyright by downloading works in Library Genesis and Pirate Library Mirror datasets for purposes of training large language models (LLMs). According to the notice, the district court ruled that using the books to train LLMs was fair use but left for trial the question of whether downloading them for this purpose was legal. Apparently, the parties agreed to settle instead of waiting for the trial and they are now reaching out to potential copyright holders to offer money in lieu of potential damages.
The FSF holds copyrights to many programs in the GNU Project, as well as to several books. We publish all works that we hold copyrights to under free (as in freedom) licenses. Among the works we hold copyrights over is Sam Williams and Richard Stallman's Free as in freedom: Richard Stallman's crusade for free software, which was found in datasets used by Anthropic as training inputs for their LLMs. It was published by O'Reilly and by the FSF under the GNU Free Documentation License (GNU FDL). This is a free license allowing use of the work for any purpose without payment.
Obviously, the right thing to do is protect computing freedom: share complete training inputs with every user of the LLM, together with the complete model, training configuration settings, and the accompanying software source code. Therefore, we urge Anthropic and other LLM developers that train models using huge datasets downloaded from the Internet to provide these LLMs to their users in freedom. We are a small organization with limited resources and we have to pick our battles, but if the FSF were to participate in a lawsuit such as Bartz v. Anthropic and find our copyright and license violated, we would certainly request user freedom as compensation.
Or is the LLM going to regurgitate the same content with zero attribution, and shift all the traffic away from the original work?
When viewed in this frame, it is obvious that the work is derivative and then some.
These companies do even better because we're not allowed to share the knowledge (read, illegally copy protected works) and they are.