https://github.com/LibreTranslate/LibreTranslate/blob/main/A...
imo AI bots have significantly affected OSS and we need better qualitative measures to define success
not just this issue — but the entire repo.
contributors like @ethanwater, @developerfred, and @Geetk172 — people actively working on bounties — were getting buried.
two identity fields — author and committer — and they can be different people.
metric growth — a substantial part of
This sentence also illustrates the absurdity of this investment model. It imposes a trade-off between building good software, and complying with the investor's metrics. They probably call such metrics evidence-based, but this example shows that they arbitrarily capture some numbers to obscure the lack of meaningful measurements.
When the article mentioned email matching, I was concerned that it would break down when a contributor's email address changes. (I have contributed to more than a few projects over the years, using email addresses that no longer exist.)
However, it looks like they're not using the email address recorded in the author's original git commit, but instead a GitHub-generated address whose unique parts are the GitHub user ID and username. That should survive authors changing their email addresses. It would still break down if a contributor loses access to their account and has to create a new one, but that's probably less common.
The captcha - maybe.
- Protect the PR submitting feature behind some CAPTCHA
- Give repo owners some way to manage external contributors, instead of forcing them to resort to hacks like the one in this article
Just move to Codeberg, SourceHut, or even GitLab. Serious contributors will go there with you; the lazy people farming GitHub karma with LLMs probably won't.
Seriously. Just ask for a US$10 deposit for each PR. If the PR is accepted (not even merged, just accepted as "this is a good effort"), give it back. Hell, give double the amount for good effort and you got yourself a cheap way to attract good contributors.
Best case, bots will balk at the payment. Worst case, the funds can be used to hire someone specifically for triage.
The writing style in their onboarding doc has common AI tells (in the quote: em dashes, “it’s not A, it’s B” sentence).
I can understand that, perhaps they want to fight fire with fire or don’t have time as they already say. Still, it all feels like inadequate half measures to me.
Also please let us delete PRs just like we can delete issues.
A negative score would come from reports by other users for spammy content or unacknowledged issues. In the middle, a neutral score (±0) or a slightly positive one would go to issues or PRs with clearly good intentions that couldn't end in a proper merged PR or turned out not to be issues (e.g. the issue existed but was filed against the wrong repo, or the PR was good but needed other work to be implemented first, maybe in the long run, etc.).
Altman after the verdict: "It's okay to steal a charity"
> While GitHub reports massive metric growth — a substantial part of which is AI-generated — we as an open source project team have to do the heavy lifting of cleaning up AI slop from our repository and come up with esoteric workarounds to keep the level of legitimacy of our open source audience.
AI generated slop!
Soon there will be no more AI doomer comments. The bots will take over that job too.
---
I'm working for an open source company, and my God, are 95% of contributions useless.
There are really dumb ones where the bot writes 10 paragraphs about how he implemented the feature, but the entire changeset is adding one line to .gitignore or adding a CLAUDE.md file.
There are even worse ones where the bot submits 3000 lines of code that seemingly work, but you have to spend an hour to figure out why they don't.
The dumb ones are so much better.
Why not use hooks to automatically reject issue comments, PRs, etc. from users that didn't go through onboarding, rather than repurposing GH features that aren't really designed for that use (and are hence in danger of being changed someday)?
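A minimal sketch of the allowlist check such a hook would need. It assumes the allowlist is a plain file with one username per line, which the article does not specify, and `is_onboarded` and `decide` are hypothetical helper names. Whether GitHub offers a suitable trigger for running it is left open here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# is_onboarded USER FILE: succeeds only if USER is a whole line in FILE.
# -F = fixed string, -x = match the entire line, -q = no output.
is_onboarded() {
  grep -Fxq -- "$1" "$2"
}

# decide USER FILE: prints "allow" or "close" for a new PR or comment.
decide() {
  if is_onboarded "$1" "$2"; then
    echo allow
  else
    echo close
  fi
}

# Demo allowlist (assumed format: one username per line).
allowlist=$(mktemp)
printf '%s\n' alice bob > "$allowlist"

decide alice "$allowlist"
decide mallory "$allowlist"
```

In an automated job, a "close" decision could be followed by something like `gh pr close "$PR_NUMBER" --comment "Please complete onboarding first."`, where `PR_NUMBER` is a placeholder supplied by whatever runs the script.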
Currently, more than 10% of all commits in the archestra repo are essentially noise (369 of 3521 commits), accounting for more than half of all commits in the last month (303 of 578 commits).
But maybe (probably) the amount of such commits will go down over time, compared to the growing amounts of AI slop
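The shares quoted above can be checked with a few lines of shell. This is only the arithmetic on the numbers given in the comment; `pct` is a helper name made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# pct PART TOTAL: prints PART as a percentage of TOTAL, one decimal place.
pct() {
  LC_ALL=C awk -v part="$1" -v total="$2" \
    'BEGIN { printf "%.1f", 100 * part / total }'
}

echo "all commits: $(pct 369 3521)%"   # 369 of 3521 commits
echo "last month:  $(pct 303 578)%"    # 303 of 578 commits
```

That works out to roughly 10.5% of all commits and 52.4% of last month's commits, consistent with "more than 10%" and "more than half".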
> When requiring approvals only for first-time contributors (the first two settings), a user that has had any commit or pull request merged into the repository will not require approval. A malicious user could meet this requirement by getting a simple typo or other innocuous change accepted by a maintainer, either as part of a pull request they have authored or as part of another user's pull request.
Also devs: stop giving us real world problems to solve
Wonder if a dollar would work for now until more people give bots credit cards.
(Why there is a race for AI commits/PRs to projects is beyond me though...)
Cowboy coders got a virtual cowgirl coder and sold it to everyone, hmm, maybe... (respected or not, solo devs don't always have the requisite skills to not be a cowboy, either due to lack of experience or lack of innate skill)
I don't know that I completely buy this narrative, though. There has been a strong, top-down push for this since the "beginning".
We made "Github contributions" a metric for people applying for dev jobs. So, of course, because devs are the kind of people we are, they started working out how to game that metric.
Some folks decided to start paying bounties on bug fixes, features, etc. Those bounties are fairly trivial by western standards, but are significant for developing countries. This creates a new career for developers; racing to collect the bounties on offer.
LLMs have exacerbated these problems by letting the people who were already doing this do it faster, and by letting more people pretend to be software developers and get in on the action.
If we stopped allowing LLM-authored contributions we'd still have too many shitty PRs. It would just be back to pre-LLM levels of "too many".
The answer is to make Github contributions valueless. Stop paying bounties, and stop using them to assess candidates.
Try talking more about the meta of coding itself. Get into the developer's head by _talking_ to them and understanding how they would approach and attack different problems. You can show them code and ask them what they would do differently / how they would go about implementing X-Y-Z. Just because you can write foobar doesn't mean you understand how to apply algorithms or w/e specific problems [your] team has. It's _far_ better to understand how they would solve a problem over their syntax anyway.
At least bringing up the underlying method (restrict to contributors) has spawned the discussion about how that's probably a bad idea on the security side.
Your solution would be great if GitHub would also allow me to whitelist specific users, but unfortunately this still won't block "implementation plans" in comments.
I strongly prefer the git email model, where it’s often trivial to control the flow of change proposals. GitHub does not have the same wealth of tools and versatility.
You can’t submit a PR because your laptop is too slow? Rent some hash rate from someone, and now you’ve just made a system of paying botnet owners to be able to make a typo fix on a github repo. HashCash was never used in the real world for a reason, it sounds cute but the incentives are so insane as to only work in a vacuum where you assume everyone isn’t cheating.
If you are insecure because someone has had one of their otherwise completely innocent PRs merged into your repo... you are insecure, period.
Seriously, chill, then think about how you'd implement it. Then think how it'd go wrong. Then think about how to fix those problems. Repeat until you realize there's a better solution or until you solve the problem without making it overly convoluted. More often than not the former is the better option. More often the latter is just a variant of the sunk cost fallacy and your ego. Reality is (un)surprisingly complex and solutions aren't usually trivial
Unless I knew the maintainers personally, this would prevent most of my contributions, which are most often accepted. Maybe it's worth losing out on my small contributions to avoid slop. But things would absolutely be lost this way.
---
I know it's against convention to comment on downvotes, but really? Really? This is controversial? The OP came up with an elegant solution that cleanly solved their problem without subjecting contributors to anything more than a captcha. Then somebody comes along and says "oh, it's so easy, just charge $10". You're going to set up payment infrastructure, incur administrative overhead with human support managing refunds, and deter 99% of actual humans from contributing, and then call that the easy solution that OP is so stupid for not thinking of first? Give me a fucking break. This site really is just Reddit-lite, anyone who thinks about engineering problems seriously would realise this does not stand up as anything beyond a pithy internet solution with three seconds of thought into what actually implementing it would entail.
Let's say I'm a maintainer of an open source project on Github/Gitlab. How would you actually implement this deposit-refund loop in practice?
Or teenagers without full access to online banking.
Or the unemployed.
No one, meat or chip, would just set aside $10 "for the opportunity to contribute"
This is "let them eat cake" level of out of touchness.
Imagine you want to get a doctor's opinion, or maybe a couple of opinions. But a zillion AI-amateurs have registered themselves as doctors. How do you separate wheat from the chaff?
The xz supply chain attacker hid their real identity, created fake ones, and gained recognition over time in order to gain more access and add the backdoor. So TLAs and other bad actors, at least, are interested in gaining recognition.
How does the website trigger the CI script? Through GH rest API?
Discussion on Hacker News:
When, a few months ago, GitHub shared statistics celebrating AI's enormous contribution to their product metrics while completely missing the point of degraded contribution quality, we already felt that things were going south.
The first worrying moment was the issue we posted with a $900 bounty. We were hoping to motivate someone to contribute and bring shiny new "MCP Apps" support to our platform. We quickly got the attention of legitimate contributors proposing plans, asking questions, submitting attempts — but soon...
AI bots arrived and blew up the issue, taking it to 253 comments total, poisoning the conversation with pointless "implementation plans" and even pure aggression toward the maintainers!
AI accounts started flooding not just this issue — but the entire repo. Every sloppy comment triggered a notification for every team member watching the repo. Our GitHub notifications became a wall of noise. Real conversations from contributors like @ethanwater, @developerfred, and @Geetk172 — people actively working on bounties — were getting buried.
Later, the problem took the form of an epidemic. For example, just for the issue to add x.ai provider support to Archestra, we received 27 pull requests, most of which their authors hadn't even tried to test.
One of our team members had to spend half a day every week cleaning AI garbage out of the repo, removing untested PRs and closing hallucinated issues. When we forgot to do so, our repo quickly became a place completely unfriendly to legitimate contributors.
At first, we tried to calculate the "reputation" of contributors and built "London-Cat", a tiny bot calculating a contributor's reputation based on merged PRs and a few other signals (example). It obviously didn't stop the spam, but it helped us figure out "who is who".
As a next step, we built an "AI sheriff" (example), which obviously closed a few legitimate PRs 🤦.
The constant flow of useless AI comments and proposals was only getting worse, turning legitimate contributors away and making us reconsider: should we stop motivating contributions with bounties? Should we stop giving fun test tasks to our job candidates?
We've decided that we need to fight back and insist on making our repo a comfortable and safe space for legitimate contributors, responsible AI users, newbies, and seasoned engineers.
Today we're blocking the ability to create issues, open PRs, and leave comments for those who didn't go through the onboarding.

Contributor onboarding, five steps to get whitelisted
It's a nuclear option, yes. It's especially sensitive for a VC-backed startup that is measured thoroughly by GitHub activity, but we have to pull the trigger: we value quality over quantity. We don't value metrics pumped by AI slop.
We want Archestra to be a great piece of software that everyone can contribute to, without it being swallowed by AI bots.
There is no straightforward way to whitelist those who can comment or create PRs on an open source repo, so we had to hack around.
There is a setting called "Limit to prior contributors." Simple rule: if you haven't previously committed to main, you can't comment on issues or PRs.

Prior contributors setting
The setting can't tell the difference between an AI bot and a real developer who signed up to work on a bounty. Both are "not prior contributors." Both get locked out.
GitHub defines "prior contributor" as someone whose GitHub account is the author of a commit on main. Git commits have two identity fields — author and committer — and they can be different people.
You can create a commit attributed to someone else using Git's --author flag. If the email matches their GitHub account, GitHub links the commit to their profile and grants them contributor status.
Every GitHub account has a noreply email: <id>+<username>@users.noreply.github.com. Look up the ID via the API and commit:
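# Look up the numeric GitHub user ID
ID=$(gh api users/their-username --jq '.id')
# Commit with the external user as author (we remain the committer)
git commit \
  --author="their-username <${ID}+their-username@users.noreply.github.com>" \
  -m "chore: add their-username to external contributors"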
Push to main, and they can comment immediately.

Commit attributed to external user
The external user shows up as the author, our account as the committer. That's all GitHub needs to consider them a prior contributor.
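The author/committer split can be reproduced locally without touching GitHub. The sketch below uses a throwaway repository, a made-up user ID (12345), a placeholder maintainer identity, and an empty commit purely for demonstration. `noreply_email` is a hypothetical helper, and a real ID would come from the GitHub API.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the noreply address from a numeric user ID and a username.
noreply_email() {
  printf '%s+%s@users.noreply.github.com' "$1" "$2"
}

# Throwaway repo so the demo never touches a real project.
repo=$(mktemp -d)
git -C "$repo" init --quiet
git -C "$repo" config user.name "Maintainer"
git -C "$repo" config user.email "maintainer@example.com"

# Placeholder external contributor (ID is invented for the example).
user="their-username"
id="12345"

# --allow-empty keeps the demo self-contained; a real commit would change a file.
git -C "$repo" commit --quiet --allow-empty \
  --author="$user <$(noreply_email "$id" "$user")>" \
  -m "chore: add $user to external contributors"

# %an/%ae = author name/email, %cn/%ce = committer name/email.
identities=$(git -C "$repo" log -1 --format='%an <%ae> | %cn <%ce>')
echo "$identities"
```

The printed line shows the external user's noreply identity on the author side and the locally configured identity on the committer side.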
The full flow:
EXTERNAL_CONTRIBUTORS.md file, and pushes a commit to main authored under their account.
While GitHub reports massive metric growth — a substantial part of which is AI-generated — we as an open source project team have to do the heavy lifting of cleaning up AI slop from our repository and come up with esoteric workarounds to keep the level of legitimacy of our open source audience.
Slop not only demotivates contributors who want to spend their time doing good but have to break through a wall of noise instead; it also brings a substantial security risk, as happened in the LiteLLM repo, where attackers tried to steer the conversation using AI bots.
Dear community, it's time to have a serious talk about the effect AI has on open source.
Sure, but looking at the cost to do it at scale is the wrong metric. I surely can't compete with a career spammer on emails-per-second or even emails-per-dollar, but I also don't need to.
It's more about the expected-value versus the cost. For example, my expected benefit from one email to my family is (while hard to quantify) hopefully much higher than a spammer's expected benefit of one spam email going out, which has a very small chance of leading to any amount of money. Attaching a CPU-churn cost per email is something I can ignore on my desktop, but they have to at least budget for it.
I'd also like to note that the win-condition isn't as extreme as making spam (or other "crimes") truly unprofitable, it just needs to be less profitable than other things the time/resources could be used for.
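The argument can be written as a pair of inequalities. The symbols are notation introduced here for illustration: $p$ is the chance a message pays off, $v$ its value to the sender if it does, $c$ the per-message cost (for example CPU time), and $a$ the return from the sender's next-best use of the same resources.

```latex
\text{a sender bothers when}\quad p \cdot v > c,
\qquad
\text{and spam loses its appeal once}\quad p \cdot v - c < a .
```

For a message to family, $p$ is close to 1 and $v$ is large, so a small $c$ changes nothing. For spam, $p$ is tiny, so even a small $c$ can flip the second inequality without making spam strictly unprofitable.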
We really need to solve spam itself here, and I think there may be a way to do it. I.e., the problem of spam is the N-to-N scaling of connections. The network has never been able to solve that problem (the number of possible connections grows quadratically with the number of participants). Limiting communication in terms of mesh networking may be the ultimate solution: bots can't get to you because they can't reach you.
What needs to be invented is a bridging protocol - some way to establish "legitimate" lines of communication over a network, while preserving (to some degree) privacy and decentralization. AI can only enter this network by being explicitly added to the channel, and thereby explicitly and easily blocked (and also solving the general SPAM issue once and for all).
If companies can screw you over and claim it's a mistake, there isn't much a person can do.
It's all about levels of trust: a maintainer going rogue is the least likely; a past contributor going rogue is more likely, but not by much; a stranger who got a typo PR merged is more likely still; and a complete stranger is the least trustworthy.
> Maybe GitHub should temporarily block accounts from raising PRs if like 95%+ of them are getting rejected.
It's so bad I'd be okay with a lower bar where it's flagged if they're posting the same message over multiple repos... FFS they aren't even stopping this shit https://news.ycombinator.com/item?id=47964617
The rate of commits/PRs in total
The rate of PRs to repos they don’t own
The reject rate of PRs
The number of bans
An estimated “AI” or bot score or status flag
There are a few better attempts at GitHub metrics calculators, but I have not seen any that move beyond the paradigm that more commits are assumed good by default. It’s time to foreground quality, not just quantity. The GitHub “4 KPIs” are entirely action-oriented.
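As a rough illustration of the reject-rate signal, combined with the 95% threshold floated earlier in the thread, a flagging rule might look like the sketch below. The function name, the default threshold, and the minimum-sample guard of 20 PRs are assumptions made for the example, not anything GitHub provides.

```shell
#!/usr/bin/env bash
set -euo pipefail

# should_flag REJECTED TOTAL [THRESHOLD_PCT] [MIN_PRS]
# Succeeds when the account has at least MIN_PRS pull requests and at least
# THRESHOLD_PCT percent of them were rejected. Integer math only.
should_flag() {
  local rejected=$1 total=$2 threshold=${3:-95} min_prs=${4:-20}
  [ "$total" -ge "$min_prs" ] &&
    [ $(( rejected * 100 )) -ge $(( threshold * total )) ]
}

# Example: 97 of 100 PRs rejected.
if should_flag 97 100; then
  echo "flag account"
else
  echo "leave account alone"
fi
```

The minimum-sample guard is there so that a newcomer with a couple of rejected PRs is not treated like a spam account.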
Frontier users: 527,865
Light indexed: 527,865
Ready to queue: 9,083
Fast scores ready: 0
Activity events 24h: 30,266
Fast scores completed 24h: 19,123
Deep jobs completed 24h: 3,043
Fast-score ETA: n/a
Deep-hydrate ETA: 69h
Stale running jobs: 0
GitHub backpressure jobs: 19,113
High automation signals: 4,608
Medium automation signals: 1,327
Completed jobs: 74,714
Biggest challenge is GitHub's rate limits. At this pace it will take two more months to have 98% coverage. But after that the maintenance should be quite straightforward.
The Elo rating system doesn't make sense in this context; it's designed around collecting zero sum game results for a given community of players and building a model around it.
A similar system would be nice for issues, though I'm not sure what it'd look like if issues are the springboard for contributing PRs.
Not likely to ever happen (as others said). GitHub/MS want to sell Copilot subscriptions/tokens, and LLM-generated PRs are a part of that business model.
Given any manipulatable scheme, AI will figure out how to manipulate it. For the OP, what happens if a single AI manages to get through to contributor? Then it starts elevating other AIs to contributor, and we're off again. There doesn't have to be a purpose to this. Trolls will troll, and trolls armed with AI bots can devote endless energy to doing so. The more you work to keep them out, the more fun it becomes for them.
I wish I had an answer for that problem. But I don't.
If you don't trust the maintainer, you can always fork a repo and let them merge on their own.
Your suggestion would help a bit but I would prefer the opposite: before someone can 'pollute' my pull request space and draw attention from subscribers I would prefer an acceptance step (just like a moderator on a forum) instead of having to archive the PRs.
This is especially important as (AI) spam increases and just because I am away for a few days or weeks I don't want those PRs lurking around.
Right, but that's not what happened though.
Someone went to the public square, said "Hey, I'm looking for any sort of doctor, and I'll pay you $900 if you tell me your plan and then whatever plan I chose wins" and then they get surprised they get flooded by zillion AI-amateurs.
You don't generate a ton of chaff then try to find the wheat, you ensure your process doesn't generate a ton of chaff in the first place. Offering large monetary rewards for relatively simple work for anyone in the public is bound to generate a ton of chaff...
More than likely GitHub would have to maintain their own internal wallet solution for this, which is a big engineering lift. But we're all just having a discussion.
```
# First-time contributors
Due to the increased number of AI bots and low-effort contributions, we are being forced to add some friction for first-time contributors. PRs are closed for anyone not explicitly added to our list of authorized users.
To be accepted in the list, you must do one of the following:
- Show a history of meaningful contributions to projects in related technologies made before Jan 1st, 2023.
- Be vouched for by one of the existing contributors in the core team.
- (If you have GitHub Sponsors/Polar/Patreon) Be a sponsor of the project for the last 3 months.
- Submit a small payment, which will be held in escrow until your PR is accepted. The following methods are accepted (choose all that apply): PayPal, SEPA, Crypto, Venmo, Pix, UPI, M-Pesa, etc.
```
Polar.sh is already doing things that are a lot more complex in this space.
If you are in a civilized country that allows direct payments (i.e., anywhere but North America nowadays) and you don't want to deal with GitHub or any external system, there is always good old "make a M-PESA/SEPA/Pix/UPI transfer to account XYZ".
> the thought put into it as the actual solution by people with a stake in actually solving the problem
Let me flip your argument: think of how much time and thought is poured into problems like this one by people who don't even try to implement a Pfand system beforehand.
If I was told that I could make a deposit of $10 to get less stressed maintainers and a faster PR review cycle, I wouldn't even blink. I wouldn't even ask for the money back.
Then they'll get removed by the humans? It's about cutting down work, not about eliminating the work entirely.
The current approach removes about 99% of their overhead it would seem. If they have to do a few manual interventions here and there, that seems like a huge win overall
There are a lot of political tricks that get used.
What is scary is that some of those users are malicious state actors, like North Korea and Russia...
The totality of someone's currency is their reputation.
Of course, now the decision becomes...who is the central currency issuer that creates it?
> to a genuine solution
Except it isn't. It is a lazy and impractical solution.
> More than likely GitHub would have to maintain their own internal wallet solution
Great, so you even found one of the main issues, which pushes the problem off to a third party and makes it an impossible solution for anyone but GitHub (still a problematic "solution" though).
> This is an overly negative response
Yet it isn't, because even as you noted it's not realistic to implement. There are two types of lazy, and this is the kind that creates more work, not less.
...which is not available to maintainers to use in this way.
> there is always good old "make a M-PESA/SEPA/Pix/UPI transfer to account XYZ"
And then lock out anyone who is not from the same country as the maintainer, on a platform that is known for its global reach.
Moreover, you're introducing significant anti-human friction. For privacy-conscious people, it's a complete non-starter; I'm not giving my payment information, not for a $1 transaction, and compromising my anonymity just to make a PR for the benefit of other people. That's a small subset. Then, you have the lazy people. The majority of the population will simply not bother with something if it has friction. Getting out their credit card is one of those things, and it's why products/services that offer free trials or a free tier tend to be overwhelmingly more successful -- people want to see a tangible benefit to themselves before they engage in high-friction processes (where "high-friction" is as little friction as requiring a payment, yes). "Free to play" video games with microtransactions engineer first-time purchases to be cheap ($1 or $5) and have 5x or 10x the value of the normal microtransactions, because that first hurdle of getting somebody to hand over their payment information is by far the biggest.
I'll take the captcha, thanks. And maintainers will too, because they'd rather have the solution that filters bots and keeps humans contributing rather than the one that filters out both humans and bots.
I was also wondering how automated or manual you would envision the review process. I'm guessing your hope would be that the small deposit would stem the flow of submissions enough to make it all possible to review manually again, and you would also manually return all the payments sent to escrow?
/s
You could probably use some kind of pairwise ranking algorithm (like anything based on the Bradley-Terry model) to rate human vs. AI contributions, but that would take a lot of manual effort. Google is using it to (supposedly) improve their searching algorithms. They give testers two different versions and ask them what's better.
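For reference, the Bradley-Terry model assigns each item a positive strength and models a pairwise preference as below. Reading the items $i$ and $j$ as two contributions being compared is an interpretation added here, and $p_i$, $p_j$ are the usual strength parameters.

```latex
P(i \text{ preferred over } j) = \frac{p_i}{p_i + p_j}, \qquad p_i,\, p_j > 0
```

The strengths are fitted from many such pairwise judgments, which is where the manual effort mentioned above comes from.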
And a registration system that amounts to a more complicated captcha doesn't? How long until someone starts farming accounts and runs bots that jump through these hoops as well?
> wonder if the idea is fundamentally broken.
It's only "fundamentally broken" if you need to build a perfect system that needs to accept 100% of legit PRs without raising any level of friction.
But we don't need that. Pfand systems are not meant to be perfect, and they are not meant to be the single solution to any problem involving the commons. They will not get rid of all bad behavior, but they will certainly bring a global-scale problem down to levels that can then be managed by other smaller, context-aware systems.
Paypal/SEPA transfers are free in Europe. And even if I lived in the US and had to pay a small processor fee, I'd be more than willing to cover the $0.50 in fees if that meant I was receiving contributions from people who went through all the trouble.
The issue here is the core model is broken (misaligned incentives). That's not something you are going to fix with a github "downstream". A token system could help but it's easy to imagine ways that could be gamed, if not implemented well.
Yes, that friction is intentional. The lazy people don't want to do it? Great, there is very little chance their contributions are worthwhile. The privacy-conscious people won't do it? Then let them work on their own repositories and complain loudly about the idiot maintainer who puts up these insane barriers. Then the maintainer can go take a look at the forks made by the loud complainers and see if it is worth whitelisting them.
> it's why products/services that offer free trials or a free tier tend to be overwhelmingly more successful
Drug dealers also offer the first hit for free, why don't you use that as an example as well? ;)
To answer this properly in case the quip was too vague: there is no reason for "number of PRs opened by new contributors" to be a viable/interesting KPI for any FOSS project.
> I'll take the captcha, thanks.
First you need to show me all your cool FOSS projects.
Mind you: that's one of the most convoluted ways there is to get involved, because it involves a bunch of smart contract operations and on-chain voting. If we are talking about crypto only as a payment network, things are even simpler.
If search ads are blocked on search engines, then there is no revenue for the browser. It's that simple (on top of that Brave has other revenues, but the majority is search ads).
So it's a game of hoping that the majority won't change the default.
This is the main reason Brave does not block search ads specifically by default, but still blocks the other ads. Blocking the other ads has no consequences, since this revenue is not shared back to the browser anyway.
This is why the business model of Brave is cynical.
-> It's the same model as AdBlock and the "Acceptable Ads" (block all ads, except the acceptable ads, unless you disallow them)
A generic python library used by generic people who have no interest in this field is something else.
Also, we are talking about people who are tech-savvy enough to be interested in participating in a FOSS project. Opening an account at an exchange is not rocket science.
2) You are not "paying" $10. The money would be returned to you. In case you haven't heard of Pfand systems: https://en.wikipedia.org/wiki/Container-deposit_legislation
And those who are living in countries the USA doesn't like will probably have no issue learning how to work with crypto. Of all the complex things they need to do to work around the restrictions, setting up a wallet barely registers.