With the advent of LLMs, AI-autocomplete, and agent-based development workflows, my ability to deliver reliable, high-quality code is restored and (arguably) better. Personally, I love the "hallucinations" as they help me fine-tune my prompts, base instructions, and reinforce intentionality; e.g. is that >really< the right solution/suggestion to accept? It's like pair programming without a battle of ego.
When analyzing problems, I think you have to look at both upsides and downsides. Folks have done well to debate the many, many downsides of AI and this tends to dominate the conversation. That's probably a good thing.
But, on the flip side, I personally advocate hard for AI from the point of view of accessibility. I know (more-or-less) exactly what output I'm aiming for and control that obsessively, but it's AI and my voice at the helm instead of my fingertips.
I also think it incorrect to look at it from a perspective of "does the good outweigh the bad?". Relevant, yes, but utilitarian arguments often lead to counter-intuitive results and end up amplifying the problems they seek to solve.
I'd MUCH rather see a holistic embrace and integration of these tools into our ecosystems. Telling people "no AI!" (even if very well defined on what that means) is toothless against people with little regard for making the world (or just one specific repo) a better place.
That doesn't address the controversy because you are a reasonable person assuming that other people using AI are reasonable like you, and know how to use AI correctly.
The rumors we hear have to do with projects inundated with more pull requests than they can review, where the pull requests are obviously low quality and the contributors' motives are selfish; i.e., the PRs are there to get credit for their GitHub profile. In this case, the pull requests aren't opened with the same good faith that you're putting into your work.
In general, a good policy towards AI submission really has to primarily address the "good faith" issue; and then explain how much tolerance the project has for vibecoding.
So leaving that aside, it just seems to be the revulsion that programmers feel towards a lot of LLM slop and the aggravation of getting a lot of slop submissions? Something that seems to be universal in the FOSS social environment, but also seems to be indicative of a boundary issue for me:
The fact that machines have started to write reasonable code doesn't mean that you don't have any responsibility to read or review it before you hand it to someone. You could always write shit code and submit it without debugging it or refactoring it sanely, etc. Projects have always had to deal with this, and I suspect they've dealt with this through limiting the people they talk to to their friends, putting arbitrary barriers in front of people who want to contribute, and just being bitchy. While they were doing this, non-corporate FOSS was stagnating and dying because 1) no one would put up with that without being paid, and/or 2) money could buy your way past barriers and bitchiness.
Projects need to groom contributors, not simply pre-filter contributions by identity in order to cut down on their workload. There has to be an onboarding process, and that onboarding process has to include banning and condemning people that give you unreviewed slop, and spreading their names and accounts to other projects that could be targeted. Zero tolerance for people who send you something to read that they didn't bother to read. If somebody is getting AI to work for them, then trust grows in that person, and their contributions should be valued.
I think the AI part is a distraction. AI is better for Debian than almost anyone else, because Debian is copyleft and avoids the problems that copyleft poses for other software. The problem is that people working within Free Software need some sort of structured social/code interaction where there are reputations to be gained and lost that aren't isolated to single interactions over pull requests, or trying to figure out how and where to submit patches. Where all of the information is in one place about how to contribute, and also about who is contributing.
Priority needs to be placed on making all of this stuff clear. Debian is a massive enough project, basically all-encompassing, where it could actually set up something like this for itself and the rest of FOSS could attach itself later. Why doesn't Debian have a "github" that mirrors all of the software it distributes? Aren't they the perfect place? One of the only good, functional examples of online government?
edit: There's no reason that Debian shouldn't be giving attribution to every online FOSS project that could possibly be run on Linux (it will be run on Debian, and hopefully distributed through apt-get.) Maybe a Debian contributor slash FOSS-in-general social network is the way to do that? Isn't debian.org almost that already?
Something might be required now, as some people might think that just asking an LLM is "the most they can do", but it's not about using AI; it's about being aware and responsible about using it.
If a change used to take a day or two, and now requires a few minutes, then it's fair to ask for a couple hours more prompting to add the additional tangible tests to compensate for any risks of hallucinations or low quality code sneaking in.
(If anything, the copyright to model-generated code cannot possibly be said to belong to the human contributor. They… didn’t write it! I’m glad to see that aspect was discussed though I’m surprised it wasn’t the main thrust.)
My simple solution: I use Whisper to transcribe my text, and feed the output to an LLM for cleanup (custom prompt). It's fantastic. Way better than stuff like Dragon. Now I get frustrated with transcribing using Google's default mechanism on Android - so inaccurate!
But the ability to take notes, dictate emails, etc using Whisper + LLM is invaluable. I likely would refuse to work for a company that won't let me put IP into an LLM.
Similarly, I take a lot of notes on paper, and would have to type them up. Tedious and painful. I switched to reading my notes aloud and use the above system to transcribe. Still painful. I recently realized Gemini will do a great job just reading my notes. So now I simply convert my notes to a photo and send to Gemini.
I categorize all my expenses. I have receipts from grocery stores where I highlight items into categories. You can imagine it's painful to enter that into a financial SW. I'm going to play with getting Gemini to look at the photo of the receipt and categorize and add up the categories for me.
All of these are cool applications on their own, but when you realize they're also improving your health ... clear win.
The accessibility angle is really important here. What we need is a way to stop people who make contributions they don't understand and/or cannot vouch that they are the author of (the license question is still very murky, and no, what the US Supreme Court said doesn't matter here in the EU). This is difficult though.
Quality should always be the responsibility of the person submitting changes. Whether a person used LLMs should not be a large concern if someone is acting in good-faith. If they submitted bad code, having used AI is not a valid excuse.
Policies restricting AI-use might hurt good contributors while bad contributors ignore the restrictions. That said, restrictions for non-quality reasons, like copyright concerns, might still make sense.
Is it really true that LLMs are non-deterministic? I thought that if you used the exact same input and seed with the temperature set to 0, you would get the same output. It would actually be interesting to probe the commit prompts to see how slight variants performed.
Quixotic, unworkable, pointless. It’s fundamentally impossible (at least without a level of surveillance that would obviously be unacceptable) to prove the “artisanal hand-crafted human code” label.
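To make the question concrete, here is a minimal sketch using a toy sampler, not any real LLM API. The logits in `STEP_LOGITS` are made-up numbers, and `sample_token` and `generate` are hypothetical helpers. It shows only the idealized case: at temperature 0 the choice reduces to an argmax and ignores the seed, and with a fixed seed the sampled path repeats.

```python
import math
import random

def softmax(logits, temperature):
    # Scale by temperature, subtract the max for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    # Temperature 0 is treated as greedy decoding (plain argmax).
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def generate(step_logits, temperature, seed):
    # One private RNG per run so the seed fully determines the sampling.
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for logits in step_logits]

# Hypothetical per-step logits standing in for a model's output.
STEP_LOGITS = [[2.0, 1.0, 0.5], [0.1, 0.2, 3.0], [1.0, 1.1, 0.9]]

greedy_a = generate(STEP_LOGITS, 0, seed=1)
greedy_b = generate(STEP_LOGITS, 0, seed=2)     # different seed, same result
sampled_a = generate(STEP_LOGITS, 0.8, seed=42)
sampled_b = generate(STEP_LOGITS, 0.8, seed=42)  # same seed, same result
```

In this toy setting the answer is "yes, it is deterministic". Whether a hosted model behaves this way depends on things the sketch leaves out, such as the serving stack and the hardware.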
> contributors should "fully understand" their submissions and would be accountable for the contributions, "including vouching for the technical merit, security, license compliance, and utility of their submissions".
This is in the right direction.
I think the missing link is around formalizing the reputation system; this exists for senior contributors but the on-ramp for new contributors is currently not working.
Perhaps bots should ruthlessly triage un-vouched submissions until the actor has proven a good-faith ability to deliver meaningful results. (Or the principal has staked / donated real money to the foundation to prove they are serious.)
I think the real problem here is the flood of low-effort slop, not AI tooling itself. In the hands of a responsible contributor LLMs are already providing big wins to many. (See antirez’s posts for example, if you are skeptical.)
Sure now it is easy, but in 3-10 years AI will get significantly better. It is a lot like the audio quality of an MP3 recording. It is not perfect (lossless audio is better), but for the majority of users it is "good enough".
At a certain point AI generated content, PRs, etc will be good enough for humans to accept it as "human". What happens then, when even the best checks and balances are fooled?
A lot of low quality AI contributions arrive using free tiers of these AI models, the output of which is pretty crap. On the other hand, if you max out the model configs, i.e. get "the best money can buy", then those models are actually quite useful and powerful.
OSS should not miss out on the power LLMs can unleash. Talking about the maxed out versions of the newest models only, i.e. stuff like Claude 4.5+ and Gemini 3, so developments of the last 5 months.
But at the same time, maintainers should not have to review code written by a low quality model (and the high quality models, for now, are all closed, although I heard good things about Minmax 2.5 but I haven't tried it).
Given how hard it is to tell which model made a specific output, without doing an actual review, I think it would make most sense to have a rule restricting AI access to trusted contributors only, i.e. maintainers as a start, and maybe some trusted group of contributors where you know that they use the expensive but useful models, and not the cheap but crap models.
Seriously how is lwn.net even still so popular with such an atrocious unreadable ugly website. Well yes I get the irony of asking that on HN (I use an extension to make it better).
No AI needed. Spam on the internet is a great example of the amount of unreasonable people on the internet. And for this I'll define unreasonable as "committing an action they would not want committed back at them".
AI here is the final nail in the coffin that many sysadmins have been dealing with for decades. And that is that unreasonable actors are a type of asymmetric warfare on the internet, specifically the global internet, because with some of these actors you have zero recourse. AI moved this from moderately drowning in crap to being crushed under an ocean of it.
Going to be interesting to see how human systems deal with this.
This is the technique I've picked up and got the most from over the past few months. I don't give it hard, high-level problems and then review a giant set of changes to figure it out. I give it the technical solution I was already going to implement anyway, and then have it generate the code I otherwise would have written.
It cuts back dramatically on the review fatigue because I already know exactly what I'm expecting to see, so my reviews are primarily focused on the deviations from that.
This reads almost like satire of an AI power user. Why would you like it when an LLM makes things up? Because you get to write more prompts? Wouldn't it be better if it just didn't do that?
It's like saying "I love getting stuck in traffic because I get to drive longer!"
Sorry, but that one sentence really stuck out to me.
I think that's backwards, at least as far as accepting a PR. Better that all code is reviewed as if it is probably a carefully thought out Trojan horse from a dedicated enemy until proven otherwise.
But like the XZ attack, we kind of have to assume that advanced persistent threats are a reality for FOSS too.
I can envisage a Sybil attack where several seemingly disparate contributors are actually one actor building a backdoor.
Right now we have a disparity in that many contributors can use LLMs but the receiving projects aren't able to review them as effectively with LLMs.
LLM generated content often (perhaps by definition) seems acceptable to LLMs. This is the critical issue.
If we had means of effectively assessing PRs objectively that would make this moot.
I wonder if this is a whole new class of issue. Is judging a PR harder than making one? It seems so right now.
It is. You haven't argued it at all, right here. You just asserted it as if it were self-evident, talked about your feelings, then demanded policy.
Your only job here was to convince people to align with you, and you didn't bother. It makes me suspect that you haven't really solidified the argument in your own mind.
The problem is having an unwritten rule is sometimes worse than a written one, even if it "works".
No, it's not that simple. AI generated code isn't owned by anyone, it can't be copyrighted, so it cannot be licensed.
This matters for open source projects that care about licensing. It should also matter for proprietary code bases, as anyone can copy and distribute "their" AI generated code for any purpose, including to compete with the "owner".
we do often choose automation when possible (especially in computer realms), but there are endless examples in programming and other fields of not-so-surprising-in-retrospect failures due to how automation affects human behavior.
so it's clearly not true. what we're debating is the amount of harm, not if there is any.
I also use LLM assistance, and I love it because it helps my ADHD brain get stuff done, but I definitely miss stuff that I wouldn’t miss by myself. It’s usually fairly simple mistakes to fix later but I still miss them initially.
I’ve been having luck with LLM reviewers though.
This is how I've found myself to be productive with the tools, or since productivity is hard to measure, at least it's still a fun way to work. I do not need to type everything but I want a very exact outcome nonetheless.
Both can look like the same exact type of AI-generated code. But one is a broken useless piece of shit and the other actually does what it claims to do.
The problem is just how hard it is to differentiate the two at a glance.
Think of it like random noise in an image editor: you don't own the random pixels since they're generated by the computer, but you can still use them as part of making your art - you do not lose copyright to your art because you used a random noise filter.
I agree with you that there's a huge distinction between code that a person understands as thoroughly as if they wrote it, and vibecoded stuff that no person actually understands. but actually doing something practical with that distinction is a difficult problem to solve.
AI also generates spam though, so this is a much bigger problem than merely "unreasonable" people alone.
Debian is the latest in an ever-growing list of projects to wrestle (again) with the question of LLM-generated contributions; the latest debate started in mid-February, after Lucas Nussbaum opened a discussion with a draft general resolution (GR) on whether Debian should accept AI-assisted contributions. It seems to have, mostly, subsided without a GR being put forward or any decisions being made, but the conversation was illuminating nonetheless.
Nussbaum said that Debian probably needed to have a discussion "to understand where we stand regarding AI-assisted contributions to Debian" based on some recent discussions, though it was not clear what discussions he was referring to. Whatever the spark was, Nussbaum put forward the draft GR to clarify Debian's stance on allowing AI-assisted contributions. He said that he would wait a couple of days to collect feedback before formally submitting the GR.
His proposal would allow "AI-assisted contributions (partially or fully generated by an LLM)" if a number of conditions were met. For example, it would require explicit disclosure if "a significant portion of the contribution is taken from a tool without manual modification", and labeling of such contributions with "a clear disclaimer or a machine-readable tag like '[AI-Generated]'." It also spells out that contributors should "fully understand" their submissions and would be accountable for the contributions, "including vouching for the technical merit, security, license compliance, and utility of their submissions". The GR would also prohibit using generative-AI tools with non-public or sensitive project information, including private mailing lists or embargoed security reports.
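A machine-readable tag is easy to check for mechanically. The sketch below assumes the tag appears literally as `[AI-Generated]` somewhere in a commit message; the draft gives this only as an example of a tag, and `has_ai_disclosure` is a hypothetical helper, not part of any Debian tooling.

```python
import re

# Assumed tag format, taken from the example quoted in the draft GR.
AI_TAG = re.compile(r"\[AI-Generated\]", re.IGNORECASE)

def has_ai_disclosure(commit_message: str) -> bool:
    """Return True if the commit message carries the assumed disclosure tag."""
    return bool(AI_TAG.search(commit_message))

tagged = "Fix typo in manpage [AI-Generated]\n\nGenerated with an LLM, reviewed by hand."
untagged = "Fix typo in manpage"
```

A check like this only detects disclosure that the contributor chose to make; it says nothing about undisclosed use.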
It is fair to say that it is difficult to have an effective conversation about a technology when pinning down accurate terminology is like trying to nail Jell-O to a tree. AI is the catch-all term, but much (not all) of the technology in question is actually tooling around large language models (LLMs). When participants have differing ideas of what is being discussed, deciding whether the thing should be allowed may pose something of a problem.
Russ Allbery asked for people to be more precise in their descriptions of the technologies that their proposals might affect. He asserted that it has become common for AI, as a term, "to be so amorphously and sloppily defined that it could encompass every physical object in the universe". If the project is going to make policy, he said, it needed to be very specific about what it was making policy about:
An LLM has some level of defined meaning, although even there it would be nice if people were specific. Reinforcement learning is a specific technique with some interesting implications, such as the existence of labeled test data used to train the algorithm. "AI" just means whatever the person writing a given message wants it to mean and often changes meaning from one message to the next, which makes it not useful for writing any sort of durable policy.
Gunnar Wolf agreed with Allbery, but Nussbaum claimed that the specific technology did not matter. The proposal boiled down to the use of automated tools for code analysis and generation:
I see the problem we face as similar to the historical questions surrounding the use of BitKeeper by Linux (except that the choice of BitKeeper imposed its use by other contributors). It is also similar to the discussions about proprietary security analysis tools: since those tools are proprietary, should we ignore the vulnerability reports they issue?
If we were to adopt a hard-line "anti-tools" stance, I would find it very hard to draw a clear line.
Drawing clear lines, however, is something that a number of Debian developers felt was important. Sean Whitton proposed that the GR should not only say "LLM" rather than "AI", but it should also distinguish between the uses of LLMs, such as code review, generating prototypes, or generating production code. He envisioned ballot options that could allow some, but not all, of those uses. Distinguishing between the various so-called AI technologies would help in that regard. He urged Nussbaum "not to argue too hard for something that is more general than LLMs because that might alienate the people you want to agree to disagree with." Andrea Pappacoda said that the specific technology mattered a lot; he wanted the proposal to have clear boundaries and avoid broad terms like AI. He was uncomfortable with the idea of banning LLMs, and not sure where to draw the line. "What I can confidently say, though, is that a project like Claude's C Compiler should not have a place in Debian."
The conversation did not focus solely on the terminology, of course. Simon Richter had questions about the implications of allowing AI-driven contributions from the standpoint of onboarding new contributors to Debian. An AI agent, he said, could take the place of a junior developer. Both could perform basic tasks under guidance, but the AI agent would not learn anything from the exchange; the project resources spent in guiding such a tool do not result in long-lasting knowledge transfer.
AI use presents us (and the commercial software world as well) with a similar problem: there is a massive skill gap between "gets some results" and "consistently and sustainably delivers results", bridging that gap essentially requires starting from scratch, but is required to achieve independence from the operators of the AI service, and this gap is disrupting the pipeline of new entrants.
He called that the onboarding problem, and said that an AI policy needed to solve that problem; he did not want to discourage people by rejecting contributions or expend resources on mentoring people who did not want to be mentored. Accepting AI-assisted drive-by contributions is harmful because it is a missed opportunity to onboard a new contributor. "The best-case outcome is that a trivial problem got solved without actually onboarding a new contributor, and the worst-case outcome is that the new contributor is just proxying between an AI and the maintainer". He also expressed concerns around the costs associated with such tools, and speculated it might discourage contribution from users who could not afford to use for-pay tools.
Nussbaum agreed that the cost could be a problem in the future. For now, he said, it is not an issue because there are vendors providing access for free, but that could change. He disagreed that Debian was likely to run out of tasks suitable for new contributors, even if it does accept AI-driven contributions, and suggested that it may make harder tasks more accessible. He pointed to a study written by an Anthropic employee and a person participating in the company's fellows program, about how the use of AI impacts skill formation: "A takeaway is that there are very different ways to interact with AI, that produce very different results both in terms of speed and of understanding". He did not seem to be persuaded that use of AI tools would be a net negative in onboarding new contributors.
Ted Ts'o argued against the idea that AI would have a negative impact:
Some anti-AI voices are concerned that use of AI will decrease the ability to gain seasoned contributors, with the implied concern that this is self-defeating because it restricts the ability to gain new members in the future. And you are now saying we should gate keep contributors that might be using AI as being unworthy of contributing to Debian? I'd say that is even more self-defeating.
Matthew Vernon said that the proposed GR minimized the ethical dimension of using generative AI. The organizations that are developing and marketing tools like ChatGPT and Claude are behaving unethically, he said, by systematically damaging the wider commons in the form of automated scraping and doing as they like with others' intellectual property. "They hoover up content as hard as they possibly can, with scant if any regard to its copyright or licensing". He also cited environmental concerns and other harms that are attributed to generative AI tools, "from non-consensual nudification to the flooding of free software projects with bogus security reports". He felt that Debian should take a clear stand against those tools and encourage other projects to do the same:
At its best, Debian is a group of people who come together to make the world a better place through free software. I think we should be centering the appalling behaviour of the organisations who are pushing genAI on everyone, and the real harms they are causing; and we should be pushing back on the idea that genAI is either a social good or inevitable.
There was also debate around the question of copyright, both in terms of the licenses of material used to train models, as well as the output of LLM tools. Jonathan Dowland thought that it might be better to forbid some contributions now, since some see risks in accepting such contributions, and then relax the project's position later on when the legal situation is clearer.
Thorsten Glaser took a particularly harsh stance against LLM-driven contributions, going so far as to suggest that some upstream projects should be forced out of Debian's main archive into non-free unless "the maintainers revert known slop commits". Ansgar Burchardt pointed out that would have the effect of banning the Linux kernel, Python, LLVM, and others. Glaser's proposal did not seem particularly popular. He had taken a similar stance on AI models in 2025; he argued most should be outside the main archive, when the project discussed a GR about AI models and the Debian Free Software Guidelines (DFSG). That GR never came to a vote, in part because it was unclear whether the language would forbid anti-spam technologies because one could not include the corpus of spam used as training data along with filters.
Allbery did not want to touch on copyright issues but had a few words to say about the quality of AI-assisted code. It is common for people to object to code generated by LLMs on quality grounds, but he said that argument does not make sense. Humans are capable of producing better code than LLMs, but they are also capable of producing worse code. "Writing meaningless slop requires no creativity; writing really bad code requires human ingenuity."
Bdale Garbee seconded that notion, and said that he was reluctant to take a hard stance one way or the other. "I see it as just another evolutionary stage we don't really understand the longer term positive and negative impacts of yet." He wanted to focus on long-term implications and questions such as "what is the preferred form of modification for code written by issuing chat prompts?" Nussbaum answered that would be "the input to the tool, not the generated source code".
That may not be an entirely satisfying answer, however, given that LLM output is not deterministic and the various providers of LLM tools retire models with some frequency. A user may have the prompt and other materials fed to an LLM to generate a result at a specific point in time, but it might generate a much different result later on, even if one has access to the same vendor's tools or models to run locally.
It is clear from the discussion that Debian developers are not of one mind on the question of accepting AI-generated contributions; the developers have not yet even converged on a shared definition of what constitutes an AI-generated contribution.
What many do seem to agree on is that Debian is not quite ready to vote on a GR about AI-generated contributions. On March 3, Nussbaum said that he had proposed the GR "in response to various attacks against people using AI in the context of Debian"; he felt then it was something that needed to be dealt with urgently. However, the GR discussion had been civil and interesting. As long as the discussions around AI remained calm and productive, the project could just continue exploring the topic in mailing-list discussions. He guessed that, if there were a GR, "the winning option would probably be very nuanced, allowing AI but with a set of safeguards".
The questions of what to do about AI models in the archive, how to handle upstream code generated with LLMs, and LLM-generated contributions written specifically for Debian remain unanswered. For now, it seems, they will continue to be handled on a case-by-case basis by applying Debian's existing policies. Given the complexity of the questions, diverse opinions, and rapid rate of change of technologies lumped in under the "AI" umbrella, that may be the best possible, and least disruptive, outcome for now.
I think there can also be differences on different hardware, and also temperature is usually set higher than zero because it produces more "useful/interesting" outputs.
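One plausible mechanism for such differences is that floating-point addition is not associative, so a different summation order can change the last bits of a result. The sketch below is a toy illustration with made-up numbers, not a measurement of any real model or hardware. It shows a one-bit difference flipping a greedy (temperature 0) choice between two nearly tied candidates.

```python
# The same three values summed in two different orders.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6

def greedy(logits):
    # Argmax; on an exact tie the lower index wins.
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical rival candidate with a logit of exactly 0.6.
pick_left = greedy([0.6, left])    # second candidate is strictly larger
pick_right = greedy([0.6, right])  # exact tie, first candidate wins
```

Once one token differs, everything generated after it can diverge, which is why a tiny numeric difference can matter even at temperature 0.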
Difficulty of enforcing is a detail. Since the rule exists, it can be used when detection is done. And importantly it means that ignoring the rule means you’re intentionally defrauding the project.
AI is predictive at a token level. I think the usefulness and power of this has been nothing short of astonishing; but this token prediction is fundamentally limiting. The difference between human _driven_ vs AI generated code is usually in design. Overly verbose and leaky abstractions, too many small abstractions that don't provide clear value, broad sweeping refactors when smaller more surgical changes would have met the immediate goals, etc. are the hallmarks of AI generated code in my experience. I don't think those will go away until there is another generational leap beyond just token prediction.
That said, I used human "driven" instead of human "written" somewhat intentionally. I think AI in even its current state will become a revolutionary productivity boosting developer aid (it already is to some degree). Not dissimilar to other development tools like debuggers and linters, but with much broader usefulness and impact. If a human uses AI in creating a PR, is that something to worry about? If a contribution can pass review and related process checks, does it matter how much or how little AI was used in its creation?
Personally, my answer is no. But there is a vast difference between a human using AI and an AI generated contribution being able to pass as human. I think there will be increasing degrees of the former, but the latter is improbable to impossible without another generational leap in AI research/technology (at least IMO).
---
As a side note, overuse of AI to generate code _is_ a problem I am currently wrangling with. Contributors who are over-relying on vibecoding are creating material overhead in code review and maintenance in my current role. It's making maintenance, which was already a long tail cost generally, an acute pain.
Can you reliably tell that the contributor is truly the author of the patch and that they aren't working for a company that asserts copyright on that code? No, but it's probably still a good idea to have a policy that says "you can't do that", and you should be on the lookout for obvious violations.
It's the same story here. If you do nothing, you invite problems. If you do something, you won't stop every instance, but you're on stronger footing if it ever blows up.
Of course, the next question is whether AI-generated code that matches or surpasses human quality is even a problem. But right now, it's academic: most of the AI submissions received by open source projects are low quality. And if it improves, some projects might still have issues with it on legal (copyright) or ideological grounds, and that's their prerogative.
This is the basis of the argument - it doesn't matter if you use AI or not, but it does matter if you know what you're doing or not.
Depends on the assumptions. If you assume good intent of the submitter and you spend time to explain what he should improve, why something is not good, etc, then it's a lot of effort. If you assume bad intent, you can just reject with something like "too large review from unproven user, please contribute something smaller first".
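The "reject large changes from unproven users" rule can be sketched as a cheap triage step. Everything here is an assumption for illustration: the `Submission` type, the `triage` function, the idea of a known set of trusted authors, and the 200-line threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    author: str
    lines_changed: int

REJECT_MESSAGE = (
    "too large review from unproven user, "
    "please contribute something smaller first"
)

def triage(sub, trusted_authors, max_lines_for_new=200):
    """Decide whether a submission gets a human review at all."""
    if sub.author in trusted_authors:
        return "review"
    if sub.lines_changed > max_lines_for_new:
        # Cheap rejection: no detailed feedback is spent on the submission.
        return "reject: " + REJECT_MESSAGE
    return "review"

trusted = {"alice"}
big_from_new = triage(Submission("mallory", 5000), trusted)
small_from_new = triage(Submission("mallory", 20), trusted)
big_from_trusted = triage(Submission("alice", 5000), trusted)
```

The design choice is that newcomers still get reviewed, but only on changes small enough that a wasted review is cheap.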
Yes, we might need to take things a bit slower, and build relations to the people you collaborate with in order to have some trust (this can also be attacked, but this was already possible).
The core issue is that it takes a large amount of effort to even assess this, because LLM generated code looks good superficially.
It is said that static FP languages make it hard to implement something if you don't really understand what you are implementing. Dynamically typed languages make it easier to implement something when you don't fully understand what you are implementing.
LLMs take this to another level when they enable one to implement something with zero understanding of what they are implementing.
McDonalds cooks ~great~ (edit: fair enough, decent) burgers when measured objectively, but people still go to more niche burger restaurants because they want something different and made with more care.
That's not to say that a human can't use AI with intent, but then AI becomes another tool and not an autonomous code generating agent.
Crystal ball or time machine?
I like it because I have no expectation of perfection-- out of others, myself, and especially not AI. I expect "good enough" and work upwards from there, and with (most) things, I find AI to be better than good enough.
Which it might. And needs to be judged on a case-by-case basis, under current copyright law.
Now, with that said I don't think we're very far from automated agents causing problems all on their own.
This is one of those areas where you might have been right.. 4-6 months ago. But if you're paying attention, the floor has moved up substantially.
For the work I do, last year the models would occasionally produce code with bugs, linter errors, etc, now the frontier models produce mostly flawless code that I don't need to review. I'll still write tests, or prompt test scenarios for it but most of the testing is functional.
If the exponential curve continues I think everyone needs to prepare for a step function change. Debian may even cease to be relevant because AI will write something better in a couple of hours.
For AI generated code, if previous PRs aren't loaded into context then there's no lasting benefit from the time taken to review and it's a blank slate each time. I think ultimately it can be solved with workflow changes (i.e. AI written code should be attributed to the AI in VCS, the full trace and manual edits should be visible for review, all human input prompts to the AI should be browsable during review without having to scroll 10k lines of AI reasoning.)
The people following the policies are the most likely to use AI responsibly and not submit low-effort contributions.
I’m more interested in how we might allow people to build trust so that reviewers can positively spend time on their contributions, whilst avoiding wasting reviewers' time on drive-by contributors. This seems like a hard problem.
Therefore, policies restricting AI-use on the basis of avoiding low-quality contributions are probably hurting more than they’re helping.
Actually not shrink, but just transfer it to reviewers.
IIRC Mitchell Hashimoto recently proposed some system of attestations for OSS contributors. It’s non-obvious how you’d scale this.
If everything the maintainer wants can (hypothetically) be one-shotted, then there is no need to accept PRs at all. Just allow forks in case of open source.
That's an OK view to hold, but I'll point out two things. First, it's not how the tech is usually wielded to interact with open-source software. Second, your worldview is at odds with the owners of this technology: the main reason why so much money is being poured into AI coding is that it's seen by investors as a replacement for the individual.
They can spin up LLM-backed contributors faster than you can ban them.
But the projects aren't drowning under PRs from reputable people. They're drowning in drive-by PRs from people with no reputation to speak of. Even if you outright ban their account, they'll just spin up a new one and try again.
Blocking AI submissions serves as a heuristic to reduce this flood of PRs, because the alternative is to ban submissions from people without reputation, and that'd be very harmful to open source.
And AI cannot be the solution here, because open source projects have no funds. Asking maintainers to fork over $200/month for "AI code reviews" just kills the project.
Wait, what? In what world are McDonalds burgers "great"? They're cheap. Maybe even a good value. But that's not the same as great.
Past performance does not guarantee future results, of course. But acting like AI is now magically going to stagnate is also a really bold bet.
If you believe the outputs of LLMs are derivative products of the materials the LLMs were trained on (which is a position I lean towards myself, but I also understand the viewpoint of those who disagree), then no, that's not a good thing, because it would be a license violation to accept those derived products without following the original material's license terms, such as attribution and copyleft terms. You are now party to violating the original materials' copyright by accepting AI generated code. That's ethically dubious, even if those original authors may have a hard time bringing a court case against you.
Without that policy it feels rude to ask, and rude to ignore in case they didn't use AI.
Hence why banning AI contributions is meaningless, you literally only punish 'good' actors.
Hmmm, no? That's actually very common in open source. Maybe "banning" isn't the right word, but lots of projects don't accept random drive-by submissions and never have. Debian is a perfect example, you are very unlikely to get a nontrivial patch or package into Debian unless you have some kind of interaction or rapport with a package maintainer, or commit to the process of building trust to become a maintainer yourself.
I have seen high profile GitHub projects that summarily close PRs if you didn't raise the bug/feature as an issue or join their discord first.
It seems that gun control—though imperfect—in regions that have implemented it has had a good bit of success and the legitimate/non-harmful capabilities lost seem worth it to me in trade for the gains. (Reasonable people can disagree here!)
Whereas it seems to me that if we accept the proposition that the vast majority of code in the future is going to be written by AI (and I do), these valuable projects that are taking hard-line stances against it are going to find themselves either having to retreat from that position or facing insurmountable difficulties in staying relevant while holding to their stance.
1. You lay out a policy stating that all code, especially AI code, has to be written to a high quality level and have been reviewed for issues prior to submission.
2. Given that even the fastest AI models do a great job of code reviews, you set up an agent using Codex-Spark or Sonnet, etc. to scan submissions for a few different dimensions (maintainability, security, etc).
3. If a submission comes through that fails review, that's a strong indication that the submitter hasn't put even the lowest effort into reviewing their own code. Especially since most AI models will flag similar issues. Knock their trust score down and supply feedback.
3a. If the submitter never acts on the feedback - close the submission and knock the trust score down even more.
3b. If the submitter acts on the feedback - boost trust score slightly. We now have a self-reinforcing loop that pushes thoughtful submitters to screen their own code. (Or AI models to iterate and improve their own code)
4. Submission passes and trust score of submitter meets some minimal threshold. Queued for human review pending prioritization.
I haven't put much thought into this but it seems like you could design a system such that "clout chasing" or "bot submissions" would be forced to either deliver something useful or give up _and_ lose enough trust score that you can safely shadowban them.
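To make the loop above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names (`Submitter`, `on_automated_review`, `on_feedback_outcome`), the integer trust scale, and the specific penalties and threshold are made up, and the automated review itself is reduced to a pass/fail flag supplied by the caller.

```python
from dataclasses import dataclass

# Illustrative values only; a real project would tune these.
REVIEW_THRESHOLD = 50   # minimum trust to reach the human review queue (step 4)
FAIL_PENALTY = 10       # step 3: submission failed the automated review
IGNORED_PENALTY = 20    # step 3a: feedback was never acted on
FIX_BONUS = 5           # step 3b: feedback was acted on


@dataclass
class Submitter:
    name: str
    trust: int = 50  # hypothetical starting score on a 0-100 scale


def _clamp(score: int) -> int:
    """Keep the trust score within 0..100."""
    return max(0, min(100, score))


def on_automated_review(submitter: Submitter, passed: bool) -> str:
    """Steps 3 and 4: react to the automated review verdict."""
    if not passed:
        submitter.trust = _clamp(submitter.trust - FAIL_PENALTY)
        return "send_feedback"
    if submitter.trust >= REVIEW_THRESHOLD:
        return "queue_for_human_review"
    return "hold"  # passed, but the submitter has not earned enough trust yet


def on_feedback_outcome(submitter: Submitter, acted: bool) -> str:
    """Steps 3a and 3b: react to what the submitter did with the feedback."""
    if acted:
        submitter.trust = _clamp(submitter.trust + FIX_BONUS)
        return "re_review"
    submitter.trust = _clamp(submitter.trust - IGNORED_PENALTY)
    return "close"
```

With these made-up numbers a failed review costs more than a fix earns back, so a submitter who keeps failing drifts below the threshold even if they respond every time. Whether that asymmetry is desirable is exactly the kind of thing that would need tuning.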
We need to rethink some UX design and processes here, not pretend low quality people are going to follow your "no low quality pls i'm serious >:(" rules. Rather, design the processes against low quality.
Also, we're in a new world where code-change PRs are trivial, and the hard part isn't writing code anymore but generating the spec. Maybe we don't even allow PRs anymore except for trusted contributors, everyone else can only create an issue and help refine a plan there from which the code impl is derived?
You know, even before LLMs, it would have been pretty cool if we had a better process around deliberating and collaborating around a plan before the implementation step of any non-trivial code change. Changing code in a PR with no link to discussion around what the impl should actually look like always did feel like the cart before the horse.
Some of the best burgers I've ever had came from fast food.
For example, someone might have done a lot of investigation to find the root cause of an issue, followed by getting Claude Code to implement the fix, which they then tested. That has a good chance of being a good contribution.
I prefer tackling this from the trust side. One approach would be to only allow new contributors to make small patches. If they get those contributions accepted, then allow progressively larger contributions. That would be one approach that I think would be more effective at helping with the real problem, which is higher volumes of low-effort contributions overwhelming maintainers.
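A rough sketch of what that size gate could look like, in Python. The tier boundaries and the function names (`max_patch_lines`, `within_limit`) are hypothetical, and it assumes the project can count a contributor's previously accepted patches and the changed lines in a new one.

```python
# Hypothetical tiers: (minimum accepted contributions, max changed lines allowed).
# Must be sorted by the first element, ascending.
TIERS = [(0, 50), (3, 300), (10, 2000)]


def max_patch_lines(accepted_count: int) -> int:
    """Return the largest patch size allowed for this contributor's track record."""
    limit = TIERS[0][1]
    for min_accepted, max_lines in TIERS:
        if accepted_count >= min_accepted:
            limit = max_lines
    return limit


def within_limit(accepted_count: int, changed_lines: int) -> bool:
    """True if a patch of this size is allowed for this contributor."""
    return changed_lines <= max_patch_lines(accepted_count)
```

A brand-new contributor could send a 40-line fix but not a 400-line refactor; after three accepted patches the 300-line tier opens up.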
(as an aside - this reminds me of the trend of Object Oriented Ontology that specifically /tried/ to imbue agency onto large-scale phenomena that were difficult to understand discretely. I remember "global warming" being one of those things - and I can see now how this philosophy would have done more to obscure the dominion of experts wrt that topic)
I sincerely doubt that, because it still can't even generate a few hundred line script that runs on the first try. I would know, I just tried yesterday. The first attempt was using hallucinated APIs and while I did get it to work eventually, I don't think it can one shot a complex application if it can't one shot a simple script.
IMO, AI has already stagnated and isn't significantly better than it was 3 years ago. I don't see how it's supposed to get better still when the improvement has already stopped.
In that case a lot of proprietary software is in breach of copyleft licences. It's probably by far the commonest breach.
> You are now party to violating the original materials' copyright by accepting AI generated code. That's ethically dubious
That is arguable. Is it always ethically dubious to breach a law? If not, why is it ethically dubious to breach this law in this particular way?
But post Sandy Hook, it's clear which side prevailed in this argument.
It is the conservative position: it will be easier to walk back the policy and start accepting AI produced code some time down the road when its benefits are clearer than it will be to excise AI produced code from years prior if there's a technical or social reason to do that.
Even if the promise of AI is fulfilled and projects that don't use it are comparatively smaller, that doesn't mean there's no value in that, in the same way that people still make furniture in wood with traditional methods today even if a company can make the same widget cheaper in an almost fully automated way.
> you are very unlikely to get a nontrivial patch or package into Debian unless you have some kind of interaction or rapport with a package maintainer
I did mean the "trivial" patches as well, as often it's a lot of these small little fixes to single issues that improve software quality overall.
But yes, it's true that it's not uncommon for projects to refuse outside PRs.
This already causes massive amounts of friction and contributes (heh) heavily to what makes Open Source such a pain in the ass to use.
Conversely, many popular "good" open source libraries rely extensively on this inflow of small contributions to become comprehensively good.
And so it's a tradeoff. Forcing all open source into refusing drive-by PRs will have costs. What makes sense for major security-sensitive projects with large resources doesn't make sense for others.
It's not that we won't have open source at all. It's that it'll just be worse and encourage further fragmentation. e.g. One doesn't build a good .ZIP library by carefully reading the specification, you get it by collecting a million little examples of weird zip files in the wild breaking your code.
This is even true despite the fact that there are bad actors only a few minutes drive away in many cases (Chicago->Indiana border, for example).
In terms of your plan though, you're just building a generative adversarial network here. Automated review is relatively easy to "attack".
Yet human contributors don't put up with having to game an arbitrary score system. StackOverflow imploded in no small part because of it.
And for the major projects where there was a flood of PRs, it was fairly easy to identify if someone knew what they were talking about by looking at their language; Correct use of jargon, especially domain-specific jargon.
The broader reason why "unknown contributor" PRs were held in high regard is that, outside of some specific incidents (thank you, DigitalOcean and your stupid tshirts), the odds were pretty good of a drive by PR coming from someone who identified a problem in your software by using it. Those are incredibly valuable PRs, especially as the work of diagnosing the problem generally also identifies the solution.
It's very hard to design a UX that impedes clueless fools spamming PRs but not the occasional random person who finds sincere issues and has the time to identify (and fix) them, but not to commit to permanent project contribution.
> and the hard part isn't writing code anymore but generating the spec
My POV: This is a bunch of crap and always has been.
Any sufficiently detailed specification is code. And the cost of writing such a specification is the cost of writing code. Every time "low code" has been tried, it doesn't work for this very reason.
e.g. The work of a ticket "Create a product category for 'Lime'" consists not of adding a database entry and typing in the word 'Lime', it consists of the human work of calling your client and asking whether it should go under Fruit or Cement.
I routinely generate applications for my personal use using OpenCode + Claude Sonnet/Opus.
Yesterday I generated an app for my son to learn multiplication tables using spaced repetition algorithm and score keeping. It took me like 5 minutes.
Of course if you use ChatGPT it will not work but there is no way Claude Code/Open Code with any modern model isn't able to generate a one hundred line script on the first try.
The latter is where you get all known contributors from! So if you close off unknown contributors the project will eventually stagnate and die.
In reality it's logarithmic. Maybe with the occasional jolt. You'd think with Moore's "law" that we'd know better by now that explosive growth isn't forever. Or at least that we're bound to physics as a cap to hit.
Eh?
Ever hear the saying that the first 90% of a program is 90% of the work, and the last 10% of the program is also 90% of the work?
AI/LLMs have improved massively in that context. That's not even including the other model types such as visual/motion-visual/audio which are to the point that telling their output from reality is a chore.
And one shotting a simple script simply doesn't mean much without context. I have it dump relatively complex PowerShell scripts often enough and it's helped me a lot with being able to explain scripting actions to other humans, where before I'd make assumptions about the other user's knowledge where it was not warranted.
Sure, but this doesn't really seem relevant to the conversation. Someone else violating software license terms doesn't justify me (or Debian, in the case of TFA) doing so.
> Is it always ethically dubious to breach a law?
I'm not really concerned with the law, here. I think it is ethically dubious to use someone else's work without compensating them in the manner they declared. Copyright law happens to be the method we've used for a couple hundred years to standardize the discussion about that compensation, and sometimes enforce it. Breaching the law doesn't really enter into the conversation, except as a way our society agrees to hold everyone to a minimum ethical standard.
OK, that is reasonable. I do not think copyright is a good mechanism though, and I think the need to compensate depends on multiple factors depending on what you use a work for and under what circumstances.