¹ Glossing over the "what they're getting in return" part. ² https://www.warpbuild.com/
https://docs.github.com/en/enterprise-cloud@latest/organizat...
But then their status page isn't really trustworthy anymore. A lot of the issues I have been running into are temporary, partial, localized failures: things slowing down to the point of unusability, being served a main/HEAD that is outdated by more than 30 minutes, etc.
So those won't even show up in these statistics.
I find it hard to believe that an Azure migration would be that detrimental to performance, especially with no doubt "unlimited credit" to play with?
You can provision Linux machines easily on Azure and... that's all you need? Or is the thinking that without bare-metal NVMe MySQL it can't cope (which is a bit of a different problem, tbf).
A migration like this is a monumental undertaking to the level of where the only sensible way to do a migration like this is probably to not do it. I fully expect even worse reliability over the next few years before it'll get better.
More recently:
Addressing GitHub's recent availability issues
https://github.blog/news-insights/company-news/addressing-gi...
(with a smattering of submissions here the last few weeks but no discussion)
Just don’t like the slop that’s getting us there.
This sounded crazy in 2020 when I said that in [0]. Now it doesn't in 2026 and many have realized how unreliable GitHub has become.
If there was a prediction market on the next time GitHub would have at least one major outage per week, you would be making a lot of money since it appears that AI chatbots such as Tay.ai, Zoe and Copilot are somewhat in charge of wrecking the platform.
Any other platform wouldn't tolerate such outages.
GHA can’t even be called Swiss cheese anymore, it’s so much worse than that. Major overhauls are needed. The best we’ve got is Immutable Releases which are opt in on a per-repository basis.
I never use Github Copilot; it does go down a lot, if their status page is to be believed; I don't really care when it goes down, because it going down doesn't bring down the rest of Github. I care about Github's uptime ignoring Copilot. Everyone's slice of what they care about is a little different, so the only correct way to speak on Github's uptime is to be precise and probably focus on a lot of the core stuff that tons of people care about and that's been struggling lately: Core git operations, website functionality, api access, actions, etc.
> For us, availability is job #1, and this migration ensures GitHub remains the fast, reliable platform developers depend on
That went about as well as everyone thought back then.
Does anyone else remember back in ~2014-2015 sometime, when half the community was screaming at GitHub to "please be faster at adding more features"? I wish we could get back to platforms (or OSes, for that matter) focusing on reliability and stability. Seems those days are long gone.
People on lobsters a month ago were congratulating Github on achieving a single nine of uptime.[1]
I make jokes about putting all our eggs in one basket under the guise of “nobody got fired for buying x; but there are sure a lot of unemployed people”- but I think there’s an insidious conversation that always used to erupt:
“Hey, take it easy on them, it’s super hard to do ops at this scale”.
Which lands hard on my ears when the normal argument in favour of centralising everything is that “you can’t hope to run things as good as they do, since there’s economies of scale”.
These two things can't be true simultaneously, and this is the evidence.
[0]: https://mrshu.github.io/github-statuses/
[1]: https://lobste.rs/s/00edzp/missing_github_status_page#c_3cxe...
The real problem today IMO is that Microsoft waited so long to drop the charade that they now felt like they had to rip the bandaid. From what I've heard the transition hasn't gone very smoothly at all, and they've mostly been given tight deadlines with little to no help from Microsoft counterparts.
I understand how appealing it is to build an AI coding agent and all that, but shouldn't they, above everything else, make sure they remain THE platform for code distribution, collaboration, and the like? And it doesn't need to be humans; that can be agents as well.
They should serve the AI agent world first and foremost. Because if they don't pull that off, and don't pull off building one of the best coding agents (which so far they haven't), there isn't much left.
There are so many new features needed in this new world. It's really unclear why we hear so little about them, while maintainers sound the alarm that they're drowning in slop.
2026-02-27T10:11:51.1425380Z ##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
2026-02-27T10:11:56.2331271Z ##[error]The operation was canceled.
I had to disable the workflows.
GitHub support's response has been:
“We recommend reviewing the specific job step this occurs at to identify any areas where you can lessen parallel operations and CPU/memory consumption at one time.”
That plus other various issues makes me start to think about alternatives, and it would have never occurred to me one year back.
[0] https://github.com/Barre/ZeroFS/actions/runs/22480743922/job...
That's a high bar though. Few things are better than Swiss cheese.
Sure they can. Perhaps a useful example of something like this would be to consider cryptography. Crypto is ridiculously complex and difficult to do correctly. Most individual developers have no hope of producing good cryptographic code on the same scale and dependability of the big crypto libraries and organizations. At the same time these central libraries and organizations have bugs, mistakes and weaknesses that can and do cause big problems for people. None of that changes the fact that for most developers “rolling your own crypto” is a bad idea.
We wouldn’t couple so much if we knew reliability would be this low. It will influence future decisions.
Then there's Azure DevOps (formerly known as Visual Studio Team System), dead on the ocean floor.
Although given how badly GitHub seems to be doing, perhaps it's better to be ignored.
When I saw his interview: https://thenewstack.io/github-ceo-on-why-well-still-need-hum... I thought "oh, there is some semblance of sanity at Microsoft".
This was after seeing those ridiculous PRs where Microsoft engineers patiently deconstructed AI slop PRs they were forced to deal with on the open source repos they maintained.
When he was gone a few months later and GitHub was folded into Microsoft's org chart, the writing was firmly on the wall.
That's the reason you hear the complaints: they're from people who no longer want to be using this product but have no choice.
Because Microsoft doesn't need to innovate or even provide good service to keep the flies glued, they do what they've been doing: focus all their resources on making the glue stickier rather than focusing on making people want to stay even if they had an option to leave.
Codespaces specifically is quite good for agent-heavy teams: launch a full-stack runtime for PRs that are agent-owned.
> keep hearing that Github is terrible
I do not doubt people are having issues, and I'm sure there have been outages and problems, but none have affected my work for weeks. GH is many things to many teams, and my sense is that some parts of it are currently less stable than others. But the overall package is still quite good and delivers a lot of value, IMO.
There is a bit of an echo chamber effect with GH to some degree.
Once we got the email that they were going to charge for self-hosted runners that was the final nail in the coffin for us. They walked it back but we've lost faith entirely in the platform and vision.
You can pin actions versions to their hash. Some might say this is a best practice for now. It looks like this, where the comment says where the hash is supposed to point.
Old --> uses: actions/checkout@v4
New --> uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
There is a tool to sweep through your repo and automate this: https://github.com/mheap/pin-github-action

Perhaps mixing the CI with the CD made that worse, because deployment and delivery usually have complexities of their own. Back in the day you'd probably use Jenkins for the delivery piece and the E2E nightlies, and use something more lightweight for running your tests and linters.
For that part I feel like all you need, really, is to be able to run a suite of well structured shell scripts. Maybe if you're in git you follow its hooks convention to execute scripts in a directory named after the repo event or something. Forget about creating reusable 'actions' which depend on running untrusted code.
Provide some baked in utilities to help with reporting status, caching, saving junit files and what have you.
The only thing that remains is setting up a base image with all your tooling in it. Docker does that, and is probably the only bit where you'd have to accept relying on untrusted third parties, unless you can scan them and store your own cached version of it.
I make it sound simpler than it is but for some reason we accepted distributed YAML-based balls of mud for the system that is critical to deploying our code, that has unsupervised access to almost everything. And people are now hooking AI agents into it.
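To make the idea concrete, a runner in that spirit can be very small. Below is a rough sketch only; the `.ci/<event>.d` directory layout is made up for illustration, not an existing convention. It runs every executable hook script for an event in lexical order and stops the pipeline at the first failure:

```shell
# Sketch of a hook-style CI runner (hypothetical layout: one directory
# of numbered scripts per repo event, e.g. .ci/push.d/01-lint).
run_hooks() {
  event="$1"
  hook_dir=".ci/${event}.d"
  if [ ! -d "$hook_dir" ]; then
    echo "no hooks for event: $event"
    return 0
  fi
  for script in "$hook_dir"/*; do
    [ -x "$script" ] || continue      # skip non-executable files
    echo "==> $script"
    "$script" || return 1             # first failure fails the pipeline
  done
  echo "all hooks passed: $event"
}
```

Status reporting, caching, and artifact collection would then just be ordinary utilities the hook scripts call, rather than YAML-level concepts.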
They're not even struggling to get their average to three 9s, they're struggling to get ANY service to that level.
Copilot may be the least stable at one 9, but the services I would consider most critical (Git & Actions) are also at one 9.
It's just that everybody is using 100 tools and dependencies which themselves depend on 50 others to be working.
That's only a valid sentiment if you only use the big players. Both of those have medium/smaller competitors that have shown (for decades) that they are extremely boring, therefore stable.
The main desiderata with these kinds of action-pinning tools are that they (1) leave a tag comment, (2) leave that comment in a format that Dependabot and/or Renovate understands for bumping purposes, and (3) actually put the full tag in the comment, rather than the cutesy short tag that GitHub encourages people to make mutable (v4.x.y instead of v4).
[1]: https://github.com/suzuki-shunsuke/pinact
[1] https://app.radicle.xyz/nodes/radicle.dpc.pw/rad%3Az2tDzYbAX...
I'm at a much smaller outfit now so we have more freedom but I'd dread to think the arguments I would've had at the 4000+ employee companies I was at before.
The pages got slower, rendering became a nightmare.
Then they introduced GitHub Actions (half-baked), again very unreliable.
Then they introduced Copilot, again not very reliable.
It's easy to see why availability has gone down the drain.
Are they still on the Rails monolith? They speak about it less these days.
These days it is very common that something like opening the diff view of a trivial PR takes 15-30 seconds to load. Sure, it will eventually load after a long wait or an F5, but it is still negatively impacting my productivity.
That’s… one 9 of reliability. You could argue the title understates the problem.
> You don't need every single service to be online in order to use GitHub.
Well that’s how they want you to use it, so it’s an epic failure in their intended use story. Another way to put this is ”if you use more GitHub features, your overall reliability goes down significantly and unpredictably”.
Look, I have never been obsessed with nines for most types of services. But the cloud service providers certainly were using it as major selling/bragging points until it got boring and old because of LLMs. Same with security. And GitHub is so upstream that downstream effects can propagate and cascade quite seriously.
I’d go so far as to say that there are more crypto libraries than there are “default” options for SaaS Git VCS (GitLab and GitHub are the mainstays in companies, and maybe Azure DevOps if you hate your staff; nobody sensible is using Bitbucket), while for TLS implementations alone there’s Rustls, GnuTLS, BoringSSL, LibreSSL, wolfSSL, NSS, and AWS-LC that come to mind immediately.
Also of note is that the Microsoft org chart always showed GitHub in that structure, while the org chart available to GitHub stopped at their CEO. It's not that they were finally rolled into Microsoft's org chart so much as they lifted the veil and stopped pretending.
If you have ever operated GitHub Enterprise Server, it’s a nightmare.
It doesn’t support active-active, only passive standbys. Minor version upgrades can’t be done without downtime, and don’t support rollbacks. If you deploy an update and it has a bug, the only thing you can do is restore from backup, leading to data loss.
This is the software they sell to their highest margin customers, and it fails even basic sniff tests of availability.
Data loss for source code is a really big deal.
Downtime for source control is a really big deal.
Anyone that would release such a product with a straight face, clearly doesn’t care deeply about availability.
So, the fact that their managed product is also having constant outages isn’t surprising.
I think the problem is that they just don’t care.
And then on top of all that, their traffic is probably skyrocketing like mad because of everyone else using AI coders. Look at popular projects -- a few minutes after an issue is filed they have sometimes 10+ patches submitted. All generating PRs and forks and all the things.
That can't be easy on their servers.
I do not envy their reliability team (but having been through this myself, if you're reading this GitHub team, feel free to reach out!).
TravisCI
Jenkins
scripts dir
Etc
Scarcely a day goes by without an outage at a cloud service. Forget five nines – the way things are going, one nine is looking like an ambitious goal.
GitHub has had a rough month so far. On February 9, Actions, pull requests, notifications, and Copilot all experienced issues. The Microsoft tentacle admitted it was having problems with "some GitHub services" at 1554 UTC before it confessed to notification delays of "around 50 minutes".
It took until 1929 UTC for the company to confirm that things were back to normal, although the delay was down to "approximately 30 minutes" by 1757 UTC.
One of its flagship technologies, Copilot, also suffered. From 1629 UTC on February 9 to 0957 UTC on February 10, GitHub reported problems in Copilot policy propagation for some users. The code shack said: "This may prevent newly enabled models from appearing when users try to access them."
And so it goes on. GitHub changed its status page a while ago, making it harder to visualize the availability of its services. Yes, the details are front and center, but getting a sense of how things have gone over the last 90 days, particularly overall uptime, is trickier.
The "missing" status page exists in reconstructed form via the public status feed, though this is an unofficial source so requires caution. It reveals that GitHub's stability has been poor: uptime dropped below 90 percent at one point in 2025.
The code shack isn't alone in experiencing service instability. While five nines (99.999 percent uptime) represents the gold standard, some vendors struggle to maintain even 90 percent — a concern for customers relying on these platforms.
GitHub's Service Level Agreement for Enterprise Cloud customers specifies 99.9 percent uptime, although the company does not guarantee this for all users.
The travails of GitHub customers highlight the need to plan for downtime as well as uptime. ®
Like, they are down to one 9 of availability and very, very close to losing even that (90.2x%).
This also fits more closely with my personal experience than the 99.900-99.989 range the article indicates...
Though honestly, 99.9% means 8.76h of downtime a year. If we say no more than 20 minutes of downtime per 3 hours (sliding window), no more than 1h a day, and >50% of the downtime falling in (localized) off-working hours (e.g. nights, Saturdays, Sundays), then 99.9% is something you can work with. Sure, it would sometimes be slightly annoying, but it should not cause any real issues.
On the other hand, 90.21%... that is about 857.6 hours, roughly 35.7 days, of outage a year. Probably still fine if, for each location, the working-hour availability is 99.95% and the previous constraints hold. But, uh, wtf. That just isn't right for a company of that size.
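For anyone sanity-checking these budgets, the arithmetic is just (1 - availability) * 8760 hours in a year; a quick helper (using awk for the floating point):

```shell
# Yearly downtime budget for a given availability fraction
# (8760 hours in a non-leap year).
downtime_hours() {
  awk -v a="$1" 'BEGIN { printf "%.2f\n", (1 - a) * 8760 }'
}

downtime_hours 0.999    # three nines: prints 8.76
downtime_hours 0.9021   # prints 857.60 (roughly 35.7 days)
```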
Like: https://github.com/actions/checkout/tree/11bd71901bbe5b1630c...
So I'm pretty sure that for the same commit hash, I'll be executing the same content.
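If you want to resolve a tag to its commit hash yourself rather than via a tool, `git ls-remote` is enough. A small sketch of a helper, which prefers the peeled `^{}` line so that annotated tags resolve to the commit rather than the tag object:

```shell
# Resolve a tag in a remote repo to the commit hash to pin against.
# Works with any URL git understands (https://, file://, ssh).
pin_hash() {
  repo="$1" tag="$2"
  git ls-remote "$repo" "refs/tags/$tag" "refs/tags/$tag^{}" |
    awk '$2 ~ /\^/  { peeled = $1 }   # annotated tag: peeled commit
         $2 !~ /\^/ { plain  = $1 }   # lightweight tag (or tag object)
         END { print (peeled != "" ? peeled : plain) }'
}

# e.g. pin_hash https://github.com/actions/checkout v4.2.2
```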
The improvements to PR review have been nice though
There's clearly one small team that works on it. There are pros and cons to that.
It hasn't even got an obnoxious Copilot button yet for example, but on the other hand it was only relatively recently you could properly edit comments in markdown.
If the client has existing AzDo Pipelines then I'd suggest keeping them there.
Nonetheless it looks like he was both willing and able to push back on a good deal of the AI stupidity raining down from above and then he was removed and then, well, this...
This article[0] gives a good overview of the challenges, and also has a link to a concrete attack where this was exploited.
[0]: https://nesbitt.io/2025/12/06/github-actions-package-manager...
I think this is a really important point that is getting overlooked in most conversations about GitHub's reliability lately.
GitHub was not designed or architected for a world where millions of AI coding agents can trivially generate huge volumes of commits and PRs. This alone is such a huge spike and change in user behavior that it wouldn't be unreasonable to expect even a very well-architected site to struggle with reliability. For GitHub, N 9s of availability pre-AI simply does not mean the same thing as N 9s of availability post-AI. Those are two completely different levels of difficulty, even when N is the same.
I dunno; probably the worst UX downgrade so far. Almost no PRs are "fully available" on page load; it requires additional clicks and scrolling to "unlock" all the context, which kind of sucks.
Used to be you loaded the PR diff and you actually saw the full diff, except really large files. You could do CTRL+F and search for stuff, you didn't need to click to expand even small files. Reviewing medium/large PRs is just borderline obnoxious today on GH.
They have somehow found the worst possible amount of context for doing review. I tend to pull everything down to VS Code if I want to have any confidence these days.