The rephrased¹ title "FSF Threatens Anthropic over Infringed Copyright: Share Your LLMs Free" certainly doesn't do enough to dramatise how odious an act it can be.
¹ Original title is "The FSF doesn't usually sue for copyright infringement, but when we do, we settle for freedom"
> We are a small organization with limited resources and we have to pick our battles, but if the FSF were to participate in a lawsuit such as Bartz v. Anthropic and find our copyright and license violated, we would certainly request user freedom as compensation.
Sounds more like “we can’t and won’t sue, but this is the kind of compensation that we think would be appropriate”
> "Therefore, we urge Anthropic and other LLM developers that train models using huge datasets downloaded from the Internet to provide these LLMs to their users in freedom"
They don't have the rights to distribute the training data.
This is the reason why AI companies won't let anyone inspect which content was in the training set. It turns out the suspicions of many copyright holders (including the FSF) were true (of course).
Anthropic and others will never admit it, which is why they wanted to settle rather than risk going to trial. AI boosters will obviously continue to gaslight copyright holders into believing nonsense like: "It only scraped the links, so the AI didn't directly train on your content!", or "AI can't see like humans, it only sees numbers, binary digits", or "The AI didn't reproduce exactly 100% of the content, just like humans do when tracing from memory!".
They will not share the data-set used to train Claude, even if it was trained on AGPLv3 code.
(Edit: In the event of it being changed to match the actual article title, the current subject line for this thread is " FSF Threatens Anthropic over Infringed Copyright: Share Your LLMs Freel")
The FSF doesn't usually sue for copyright infringement, but when we do, we settle for freedom
But whether you can actually be compelled to do that isn't well tested in court. Challenging the GPL's enforceability in that way leads you down the path of having had no valid license at all, and for past GPL offenders that would have been the worse outcome. AI companies could change that.
"The FSF doesn't usually sue for copyright infringement, but when we do, we settle for freedom"
and this sentence at the end
" We are a small organization with limited resources and we have to pick our battles, but if the FSF were to participate in a lawsuit such as Bartz v. Anthropic and find our copyright and license violated, we would certainly request user freedom as compensation."
could be seen as "threatening".
FSF licenses contain attribution and copyleft clauses. It's "do whatever you want with it provided that you X, Y and Z". Just taking the first part without the second part is a breach of the license.
It's like renting a car without paying and then claiming "well you said I can drive around with it for the rest of the day, so where is the harm?" while conveniently ignoring the payment clause.
You may be confusing this with a "public domain" license.
Not a nothing burger, but not totally insignificant either.
And then there is the chilling effect. If the FSF can't enforce their license, who is going to sue to overturn the precedent? Large companies, publishers, and governments have mostly all done deals with the devil by now. Is Joe Blow, random developer, going to hire a strip-mall lawyer and overturn this? Seems unlikely.
I used to be on the FSF board of directors. I have provided legal testimony regarding copyleft licenses. I am excruciatingly aware of the difference between a copyleft license and the public domain.
"Sam Williams and Richard Stallman's Free as in freedom: Richard Stallman's crusade for free software"
"GNU Free Documentation License (GNU FDL). This is a free license allowing use of the work for any purpose without payment."
I'm not familiar with this license or how it compares to their software licenses, but it sounds closer to a public domain license.
Licences like AGPL also don't have redistribution as their only restriction.
Then why did you say "no harm was caused"? Clearly the harm of "using our copylefted work to create proprietary software" was caused. Do you just mean economic harm? If so, I think that's where the parent comment's confusion originates.
Wikipedia used to be under the FDL, and for a few months they lobbied the FSF to allow an escape hatch to Creative Commons, because the FDL was so annoying.
The restrictions fall not only on verbatim distribution, but on derivative works too. I am not aware whether model outputs are settled to be or not to be (hehe) derivative works in a court of law, but that question is at the very least very much valid.
> 4. MODIFICATIONS
> You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
Etc etc.
In short, it is a copyleft license. You must also license derivative works under this license.
Just fyi, the gnu fdl is (unsurprisingly) available for free online - so if you want to know what it says, you can read it!
Ignoring the fact that the statement doesn't talk about FSF code in the training data at all, [0] are you sure about that? From the start of the last of three paragraphs in the statement:
> Obviously, the right thing to do is protect computing freedom: share complete training inputs with every user of the LLM, together with the complete model, training configuration settings, and the accompanying software source code. Therefore, we urge Anthropic and other LLM developers that train models using huge datasets downloaded from the Internet to provide these LLMs to their users in freedom.
This seems to me to be consistent with the FSF's stance of "You told the computer how to do it. The right thing to do is to give the humans operating that computer the software, input data, and instructions that they need to do it, too.".[0] In fact, it talks about the inclusion of a book published under the terms of the GNU FDL, [1] which requires distribution of modified copies of a covered work to -themselves- be covered by the GNU FDL.
It would be nice if members of the class could vote to force a case to trial. For the typical token settlement amount, I’m sure many would rather have the precedent-setting case instead.
The hero we need, but not the hero we deserve..
The issue is that every CS master's student and AI researcher knows how to build a SOTA LLM. But only a few companies have the resources.
The process:
(1) steal as much data from the internet as possible (data is everything)
(2) raise incomprehensible amounts of money
(3) find a location where you can take over the energy grid for training
(4) put a black box around it so nobody can see the weights
(5) charge users $$$ to use it
(6) retrain models with user session data (opt-in by default)
(7) peek at how users are using it, (maybe) change policies to stop them from using it that way, and (maybe) rapidly develop features for that use case.
(Sorry that last one is jaded and not fair - just included to give you a picture of what could be happening with this sort of tech) …
The entire premise of the product is “built on the backs of any & everyone who has ever published a work”
> the district court ruled that using the books to train LLMs was fair use but left for trial the question of whether downloading them for this purpose was legal.
If I took a book and cut it up into individual words (or partial words even), and then used some of the words with words from every other book to write a new book, it'd be hard to argue that I'm really "distributing the first book", even if the subject of my book is the same as the first one.
This really just highlights how the law is a long way behind what's achievable with modern computing power.
The pipeline is something like: download material -> store material -> train models on material -> store models trained on material -> serve output generated from models.
These questions focus on the inputs to the model training; the question I have raised focuses on the outputs of the model. If [certain] outputs are considered derivative works of input material, then we have a cascade of questions about which parts of the pipeline are covered by the license requirements. Even if any of the upstream parts of this simplified pipeline are considered legal, it does not imply that the rest of the pipeline is compliant.
Do any products exist which are not built on uncompensated work of other people in the past?
Generally speaking societies do better when knowledge is shared and not hoarded.
Hoarding knowledge via legal constructs is great at concentrating wealth to the hoarder at the expense of everyone else.
We should restore copyright to its original term lengths.
I agree with the stance of Anthropic et al that these models should be built with all possible information.
I agree with the stance of the FSF that the resulting models should be as freely usable/available as possible.
— Published on Mar 13, 2026 10:05 AM
The Free Software Foundation (FSF), like many others, received a notice regarding settlement in the copyright infringement lawsuit Bartz v. Anthropic. It is a class action lawsuit claiming that Anthropic infringed copyright by downloading works in Library Genesis and Pirate Library Mirror datasets for purposes of training large language models (LLMs). According to the notice, the district court ruled that using the books to train LLMs was fair use but left for trial the question of whether downloading them for this purpose was legal. Apparently, the parties agreed to settle instead of waiting for the trial and they are now reaching out to potential copyright holders to offer money in lieu of potential damages.
The FSF holds copyrights to many programs in the GNU Project, as well as to several books. We publish all works that we hold copyrights to under free (as in freedom) licenses. Among the works we hold copyrights over is Sam Williams and Richard Stallman's Free as in freedom: Richard Stallman's crusade for free software, which was found in datasets used by Anthropic as training inputs for their LLMs. It was published by O'Reilly and by the FSF under the GNU Free Documentation License (GNU FDL). This is a free license allowing use of the work for any purpose without payment.
Obviously, the right thing to do is protect computing freedom: share complete training inputs with every user of the LLM, together with the complete model, training configuration settings, and the accompanying software source code. Therefore, we urge Anthropic and other LLM developers that train models using huge datasets downloaded from the Internet to provide these LLMs to their users in freedom. We are a small organization with limited resources and we have to pick our battles, but if the FSF were to participate in a lawsuit such as Bartz v. Anthropic and find our copyright and license violated, we would certainly request user freedom as compensation.
Or is the LLM going to regurgitate the same content with zero attribution, and shift all the traffic away from the original work?
When viewed in this frame, it is obvious that the work is derivative and then some.
These companies do even better because we're not allowed to share the knowledge (read, illegally copy protected works) and they are.