A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning. This was not an attack; the change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components. We will share more information as we have it today.
Weird that https://www.cloudflarestatus.com/ isn't reporting this properly. It should be full of red blinking lights.
Wise was just down which is a pretty big one.
Also odd that some websites were down this time that weren't down during the global outage in November.
The previous one affected European users for >1h and made many Cloudflare websites nearly unusable for them.
Cynicism aside, something seems to be going wrong in our industry.
All give me
"500 Internal Server Error cloudflare.."
So I'm guessing yes.
Then I go to Hacker News to check. Lo and behold, it's Cloudflare. This is sort of worrying...
And they're back before I finished the comment. Such a pity, I was hoping to hog some more Claude for myself through Claude Code.
The Cloudflare status page says that it's the dashboard and Cloudflare APIs that are down. I wonder if the problem is focused on larger sites because they are more dependent on / integrated with Cloudflare APIs. Or perhaps it's only an Enterprise tier feature that's broken.
If it's not everything that is down, I guess things are slightly more resilient than last time?
Imagine how productive we'll be now!
What solutions are there for Multi DNS/CDN failover that don't rely on a single point of failure?
So my guess is: yes, it's down.
If it turns out that this was really just random bad luck, it shouldn't affect their reputation (if humans were rational, that is...)
But if it is what many people seem to imply, that this is the outcome of internal problems/cuts/restructuring/profit-increases etc, then I truly very much hope it affects their reputation.
But I'm afraid it won't. Just like Microsoft continues to push out software that, compared to competitors, is unstable, insecure, frustrating to use, lacks features, etc., without it harming their reputation or even their bottom line too much. I'm afraid Cloudflare has a de-facto monopoly (technically: a big moat) and can by now get away with offering poorer quality at increasing prices.
How do they not have better isolation of these issues, or redundancy of some sort?
It turns out that so far there isn't one, other than contacting the CEO of Cloudflare rather than switching on a temporary mitigation measure to ensure minimal downtime.
Therefore, many engineers at affected companies would have failed their own systems design interviews.
The problem is architectural.
And no staged rollout I assume?
I will repeat it because it's so surreal: React (a frontend JS framework) can now bring down critical Internet infrastructure.
"Cloudflare Dashboard and Cloudflare API service issues"
Investigating - Cloudflare is investigating issues with Cloudflare Dashboard and related APIs.
Customers using the Dashboard / Cloudflare APIs are impacted as requests might fail and/or errors may be displayed. Dec 05, 2025 - 08:56 UTC
(edit: it's working now (detecting downdetector's down))
At this point picking vendors that don't use Cloudflare in any way becomes the right thing to do.
The other companies working at that scale have all sensibly split off into geographical regions & product verticals with redundancy & it's rare that "absolutely all of AWS everywhere is offline". This is two total global outages in as many weeks from Cloudflare, and a third "mostly global outage" the week before.
Representative of having the best developers behind it.
So much for React being just a frontend library, amirite
Whenever I deploy a new release to my 5 customers, I am pedantic about having a fast rollback. Maybe I'm not following the apparent industry standard and instead should just wing it.
So it seems like it's just the big ol' "throw this big orange reverse proxy in front of your site for better uptime!" is what's broken...
[0] Workers, Durable Objects, KV, R2, etc
> We are sorry, something went wrong.
> Please try refreshing the page in a few minutes. If the problem persists, please visit status.cloud.microsoft for updates regarding known issues.
The status page of course says nothing
Either way it's been interesting to see the bullets I've been dodging.
The issue is the uninformed masses being led to use Windows when they buy a computer. They don't even know how much better a system could work, and so they accept whatever is shoved down their throats.
Eh.... This is _kind_ of a counterfactual, tho. Like, we are not living in the world where MS did not do that. You could argue that MS was in a good place to be the dominant server and mobile OS vendor, and simply screwed both up through poor planning, poor execution, and (particularly in the case of server stuff) a complete disregard for quality as a concept.
I think someone who'd been in a coma since 1999 waking up today would be baffled at how diminished MS is, tbh. In the late 90s, Microsoft practically _was_ computers, with only a bunch of mostly-dying UNIX vendors for competition. And one reasonable lens through which to interpret its current position is that it's basically due to incompetence on Microsoft's part.
I've said to many people/friends that use Cloudflare to look elsewhere. When such a huge percentage of the internet flows through a single provider, and when that provider offers a service that allows them to decrypt all your traffic (if you let them install HTTPS certs for you), not only is that a hugely juicy target for nation-states but the company itself has too much power.
But again, what other companies can offer the insane amount of protection they can?
Started after the GFC and the mass centralisation of infrastructure
No, I did (metaphorically, for the websites I control). And I did it because otherwise those sites are fully offline or unusable thanks to the modern floods of unfilterable scrapers.
Months of piecemeal mitigations, but Attack Mode is the only thing that worked. Blame the LLM gold rush and the many, many software engineers with no ethics and zero qualms about racing to find the bottom of the Internet.
It also went down multiple times in the past; not to say that's bad, everyone does from time to time.
You haven't actually watched Mad Max, have you? I do recommend it.
Even if you could, having two sets of TLS termination is going to be a pain as well.
P.S. it’s a joke, guys, but you have to admit it’s at least partially what’s happening
>A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning.
>The change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components.
>We will share more information as we have it today.
It doesn’t look good when similar WAF issues caused their big outage a few years back.
In some cases it is also a valid business decision. If you have two hours of downtime every 5 years, it may not have a significant revenue impact. Most customers think it's too much bother to switch to a competitor anyway, and even if it were simple, the competition might not be better. Nobody gets fired for buying IBM.
The decision was probably made by someone else who moved on to a different company, so they can blame that person. It's only when down time significantly impacts your future ARR (and bonus) that leadership cares (assuming that someone can even prove that they actually lose customers).
It’s actually fairly easy to know which 3rd party services a SaaS depends on and map these risks. It’s normal due diligence for most companies to do so before contracting a SaaS.
There are many alternatives
Of varying quality depending on the service. Most of the anti-bot/captcha crap seems to be equivalently obnoxious, but the handful of sites that use PerimeterX… I've basically sworn off DigiKey as a vendor since I keep getting their bullshit "press and hold" nonsense even while logged in. I don't like that we're trending towards a centralized internet, but that's where we are.
https://blog.cloudflare.com/deep-dive-into-cloudflares-sept-...
It feels noteworthy because React started out frontend-only, but pedantically it's just another backend with a vulnerability.
If communication disappears entirely during an outage, the whole operation suffers. And if that is truly how a company handles incidents, then it is not a practice I would want to rely on. Good operations teams build processes that protect both the system and the people using it. Communication is one of those processes.
There is no quicker way for customers to lose trust in your service than it to be down and for them to not know that you're aware and trying to fix it as quickly as possible. One of the things Cloudflare gets right is the frequent public updates when there's a problem.
You should give someone the responsibility for keeping everyone up to date during an incident. It's a good idea to give that task to someone quite junior - they're not much help during the crisis, and they learn a lot about both the tech and communication by managing it.
"How do you know?"
"I'm holding it!"
software was a mistake
How many awful things in tech can be rationalized away by "sorry, but this is for you/our protection"?
On what? There are lots of CDN providers out there.
If you switch from CF to the next CF competitor, you've not improved this dependency.
The alternative here is complex or even non-existent. Complex would be some system that allows you to hot-swap a CDN, or to have fallback DDoS protection services, or to build your own in-house, which, IMO, is the worst thing to do if your business is elsewhere. If you sell, say, pet food online, the dependency risk that comes with a vendor like CF is quite certainly less than the investment needed for, and risk associated with, building DDoS protection or a CDN on your own; all investment that's not directed at selling more pet food or getting higher margins doing so.
I guess it's an organizational consequence of mitigating attacks in real time, where rollout delays can be risky as well. But if you're going to do that, it would appear that the code has to be written much more defensively than they're writing it right now.
They have blameless post mortems, but maybe "We actually do make mistakes so this practice is not good" wasn't a lesson anybody wanted to hear.
Mentioning React Server Components on the status page can be seen as a bad way to shift the blame. It would have been better not to specify which CVE they were trying to patch. The issue is their rollout management, not the vendor or the CVE.
That's not how status pages work if implemented correctly. The real reason status pages aren't updated is SLAs. If you agree in a contract to 99.99% uptime, your status page had better reflect that or it invalidates many contracts. This is why AWS also lies about its uptime on its status page.
These services rarely experience outages according to their own figures, but rather 'degraded performance' or some other language that talks around the issue rather than acknowledging it.
It's like when buying a house you need an independent surveyor not the one offered by the developer/seller to check for problems with foundations or rotting timber.
> Investigating - Cloudflare is investigating issues with Cloudflare Dashboard and related APIs.
> These issues do not affect the serving of cached files via the Cloudflare CDN or other security features at the Cloudflare Edge.
> Customers using the Dashboard / Cloudflare APIs are impacted as requests might fail and/or errors may be displayed.
Reddit was once down for a full day and that month they reported 99.5% uptime instead of 99.99% as they normally claimed for most months.
There is this amazing combination of nonsense going on to achieve these kinds of numbers:
1. Straight-up fraudulent information on the status page. Reporting incidents as more minor than any internal monitor would claim.
2. If it's working for at least a few percent of customers it's not down. Degraded is not counted.
3. If any part of anything is working, then it's not down. In the Reddit example, even if the site was dead, as long as the image server was 1% functional for some internal ping, the status stayed good.
This one is green: https://downdetectorsdowndetector.com
This one is not opening: https://downdetectorsdowndetectorsdowndetector.com
This one is red: https://downdetectorsdowndetectorsdowndetectorsdowndetector....
Plus most people don't get blamed when AWS (or to a lesser extent Cloudflare) goes down, since everyone knows more than half the world is down, so there's not an urgent motivation to develop multi-vendor capability.
Left alone, corporations that rival governments emerge, and they are completely unaccountable. At least there is some accountability of governments to the people, depending on your flavour of government.
the problem is, below a certain scale you can't operate anything on the internet these days without hiding behind a WAF/CDN combo... with the cut-off mark being "we can afford a 24/7 ops team". even if you run a small niche forum no one cares about, all it takes is one disgruntled donghead that you ban to ruin the fun - ddos attacks are cheap and easy to get these days.
and on top of that comes the shodan skiddie crowd. some 0day pops up, chances are high someone WILL try it out in less than 60 minutes. hell, look into any web server log, the amount of blind guessing attacks (e.g. /wp-admin/..., /system/login, /user/login) or path traversal attempts is insane.
CDN/WAFs are a natural and inevitable outcome of our governments and regulatory agencies not giving a shit about internet security and punishing bad actors.
They need that same mindset for themselves in config/updates/infra changes but probably easier said than done.
So now a vuln check for a component deployed on, being generous, 1% of servers causes an outage for 30% of the internet.
The argument is dumb.
It's a scheduled maintenance, so SLA should not apply right ?
If the in house tech team breaks something and fixes it, that's great from an engineer point of view - we like to be useful, but the person at the top is blamed.
If an outsourced supplier (one which the consultants recommend, look at Gartner Quadrants etc) fails, then the person at the top is not blamed, even though they are powerless and the outage is 10 times longer and 10 times as frequent.
Outsourcing is not about outcome, it's about accountability, and specifically avoiding it.
bunny.net
fastly.com
gcore.com
keycdn.com
Cloudfront
Probably some more I forgot now. CF is not the only option and definitely not the best option.
> Yeah, now we'll save everyone from DDoS, everything's perfect, we'll speed up your site,
... and host the providers selling DDoS services. https://privacy-pc.com/articles/spy-jacking-the-booters.html
That sums up my gripe with the vocal cloudflare haters. They will tell you all day long to move but every solution they push costs more time and money.
I really don’t buy this requirement to always deploy state changes 100% globally immediately. Why can’t they just roll out to 1%, scaling to 100% over 5 minutes (configurable), with automated health checks and pauses (roughly as sketched below)? That will go a long way towards reducing the impact of these regressions.
Then if they really think something is so critical that it goes everywhere immediately, then sure set the rollout to start at 100%.
Point is, design the rollout system to give you that flexibility. Routine/non-critical state changes should go through slower ramping rollouts.
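A rough sketch of what such a ramping rollout could look like; all function names, step sizes, and timings here are invented for illustration, and this is not Cloudflare's actual rollout system:

    use std::{thread, time::Duration};

    fn set_traffic_percentage(pct: u8) {
        // Stand-in for whatever mechanism routes a slice of traffic to the new config.
        println!("routing {pct}% of traffic to the new rule set");
    }

    fn health_check_ok() -> bool {
        // In a real system: compare 5xx rate / latency of the canary slice against the baseline fleet.
        true
    }

    fn staged_rollout(steps: &[u8], pause: Duration) -> Result<(), String> {
        for &pct in steps {
            set_traffic_percentage(pct);
            thread::sleep(pause);
            if !health_check_ok() {
                set_traffic_percentage(0); // automatic rollback
                return Err(format!("health check failed at {pct}%, rolled back"));
            }
        }
        Ok(())
    }

    fn main() {
        // Routine change: slow ramp. A truly critical change could pass &[100] instead.
        match staged_rollout(&[1, 5, 25, 50, 100], Duration::from_secs(60)) {
            Ok(()) => println!("rollout complete"),
            Err(e) => eprintln!("{e}"),
        }
    }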
Blue/green and temporarily ossify capacity? Regional?
The intent of the postmortems is to learn what the issues are and prevent or mitigate similar issues happening in the future. If you don't make changes as a result of a postmortem then there's no point in conducting them.
Or they could say, "we want to continue to prioritise speed of security rollouts over stability, and despite our best efforts, we do make mistakes, so sometimes we expect things will blow up".
I guess it depends what you're optimising for... If the rollout speed of security patches is the priority then maybe increased downtime is a price worth paying (in their eyes anyway)... I don't agree with that, but at least it's an honest position to take.
That said, if this was to address the React CVE then it was hardly a speedy patch anyway... You'd think they could have afforded to stagger the rollout over a few hours at least.
React seems to think that it was React:
https://react.dev/blog/2025/12/03/critical-security-vulnerab...
Reality is that in an incident, everyone is focused on fixing the issue, not updating status pages; automated checks fail or have false positives often too. :/
I'm sure there are gray areas in such contracts but something being down or not is pretty black and white.
Their own website seems down too https://www.cloudflare.com/
--
500 Internal Server Error
cloudflare
> The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive.
> Back to that two page function. Yes, I know, it’s just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I’ll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn’t have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.
> Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it’s like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.
> When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.
From https://www.joelonsoftware.com/2000/04/06/things-you-should-...
Sometimes the solution is to not let certain people do certain things which are risky.
The compensation is peanuts. $137 off a $10,000 bill for 10 hours of downtime, or 98.68% uptime in a month, is well within the profit margins.
Is it? Say you've got some big geographically distributed service doing some billions of requests per day with a background error rate of 0.0001%, what's your threshold for saying whether the service is up or down? Your error rate might go to 0.0002% because a particular customer has an issue so that customer would say it's down for them, but for all your other customers it would be working as normal.
"Might fail"
Or negligently.
This is so obviously not true that I'm not sure if you're even being serious.
Is the control panel being inaccessible for one region "down"? Is their DNS "down" if the edit API doesn't work, but existing records still get resolved? Is their reverse proxy service "down" if it's still proxying fine, just not caching assets?
.unwrap() literally means “I’m not going to handle the error branch of this result, please crash”.
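As a minimal, hypothetical Rust illustration of that difference (the file name and fallback value are made up; this is not the actual Cloudflare code):

    fn load_data_file() -> Result<String, std::io::Error> {
        std::fs::read_to_string("features.dat")
    }

    fn main() {
        // This crashes the whole process if the file is missing or unreadable:
        // let data = load_data_file().unwrap();

        // This handles the error branch and falls back instead:
        let data = match load_data_file() {
            Ok(d) => d,
            Err(e) => {
                eprintln!("failed to load new data file, keeping old data: {e}");
                String::from("old data")
            }
        };
        println!("using {} bytes of data", data.len());
    }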
Needs an ASN and a decent chunk of PI address space, though, so not exactly something a random startup will ever be likely to play with.
Netflix doesn't put in the contract that they will have high-quality shows. (I guess, don't have a contract to read right now.)
Most of the time people will just get by and treat even a full day of downtime as a minor inconvenience. Loss of revenue for the day - well, you most likely will have to eat that, because going to court and having lawyers fight over it will most likely cost you as much as just forgetting about it.
If your company goes bankrupt because AWS/Cloudflare/GCP/Azure is down for a day or two - guess what - you won't have money to sue them ¯\_(ツ)_/¯ and most likely will have bunch of more pressing problems on your hand.
1. Accident happens.
2. Investigators conclude Accident would not happen if people did X. Recommend regulator requires that people do X, citing previous such recommendations each iteration.
3. Regulator declined this recommendation, arguing it's too expensive to do X, or people already do X, or even (hilariously) both.
4. Go to 1.
Too often, what happens is that eventually
5. Extremely Famous Accident happens, e.g. killing loved celebrity Space Cowboy.
6. Investigators conclude Accident would not happen if people did X, remind regulator that they have previously recommended requiring X.
7. Press finally reads dozens of previous reports and so News Story says: Regulator killed Space Cowboy!
8. Regulator decides actually they always meant to require X after all.
Next.JS just happens to be the biggest user of this part of React, but blaming Next.JS is weird...
Having no backup or contingency plan for when any third-party system goes down, on a time-critical service, means that you're willing to risk another disaster around the corner.
In those industries, agreeing to wait a "day or two" for them is not only unacceptable, it isn't even an option.
I'm not blaming anyone. Mostly outlining who was impacted as it's not really related to the front-end parts of the framework that the initial comment was referring to.
On the one hand, you'd like to prevent the thing the regulation is seeking to prevent.
On the other hand, you'd have costs for the regulation to be implemented (one-time and/or ongoing).
"Is the good worth the costs?" is a question worth asking every time. (Not least because sometimes it lets you downscope/target regulations to get better good ROI)
*Yes, the easy pessimistic take is 'industry fights all regulation on cost grounds', but the fact that the argument is abused doesn't mean it doesn't have some underlying merit
There is indeed a good reason regulators aren't just obliged to institute all recommendations - that would be a lot of new rules. The only accident report I remember reading with zero recommendations was a MAIB (maritime accidents) report here which concluded that a crew member of a fishing boat had died at sea after their vessel capsized because both they and the skipper (who survived) were on heroin. The rationale for not recommending anything was that heroin is already illegal, operating a fishing boat while on heroin is already illegal, and it's also obviously a bad idea, so there's nothing to recommend. "Don't do that".
Cost is rarely very persuasive to me, because it's very difficult to correctly estimate what it will actually cost to change something once you decided it's required - based on current reality where it is not. Mass production and clever cost reductions resulting from the normal commercial pressures tend to drive down costs when we require something but not before (and often not after we cease to require it either)
It's also difficult to anticipate all the benefits of a good change without trying it. Lobbyists against a regulation will often try hard not to imagine benefits; after all, they're fighting not to be regulated. But once it's in action, it may be obvious to everyone that this was just a better idea and absurd that it wasn't always the case.
Remember when you were allowed to smoke cigarettes on aeroplanes? That seems crazy, but at the time it was normal and I'm sure carriers insisted that not being allowed to do this would cost them money - and perhaps for a short while it did.
For trapping a bad data load it's as simple as:

    try {
        data = loadDataFile();
    } catch (Exception e) {
        LOG.error("Failed to load new data file; continuing with old data", e);
    }

This kind of code is common in such codebases and it will catch almost any kind of error (except out of memory errors).

    try {
        data = loadDataFile();
    } catch (Exception e) {
        LOG.error("Failed to load new data file", e);
        System.exit(1);
    }

So the "bad data load" was trapped, but the programmer decided that either it would never actually occur, or that it is unrecoverable, so it is fine to .unwrap(). It would not be any less idiomatic if, instead of crashing, the programmer decided to implement some kind of recovery mechanism. It is that programmer's fault, and has nothing to do with Rust.
Also, if you use general try-catch blocks like that, you don't know if that try-catch block actually needs to be there. Maybe it was needed in the past, but something changed, and it is no longer needed, but it will stay there, because there is no way to know unless you specifically look. Also, you don't even know the exact error types. In Rust, the error type is known in advance.
> It is that programmer's fault, and has nothing to do with Rust.
It's Rust's fault. It provides a function in its standard library that's widely used and which aborts the process. There's nothing like that in the stdlibs of Java or .NET
> Also, if you use general try-catch blocks like that, you don't know if that try-catch block actually needs to be there.
I'm not getting the feeling you've worked on many large codebases in managed languages to be honest? I know you said you did but these patterns and problems you're raising just aren't problems such codebases have. Top level exception handlers are meant to be general, they aren't supposed to be specific to certain kinds of error, they're meant to recover from unpredictable or unknown errors in a general way (e.g. return a 500).
Difficult, but not impossible.
What is calculable, and does NOT scale down, is the cost of compliance documentation and processes. Changing from 1 form of documentation to 4 forms of documentation has a measurable cost that will be imposed forever.
> It's also difficult to anticipate all benefits from a good change without trying it.
That's not a great argument, because it can be counterbalanced by the equally true opposite: it's difficult to anticipate all downsides to a change without trying it.
> Remember when you were allowed to smoke cigarettes on aeroplanes?
Remember when you could walk up to a gate 5 minutes before a flight, buy a ticket, and fly?
The current TSA security theater has had some benefits, but it's also made using airports far worse as a traveler.
In the Rust case you just don’t call unwrap() if you want to swallow errors like that.
It’s also false that catching all exceptions is how you end up with reliable software. In highly available architectures (e.g. many containers managed by kubernetes), if you end up in a state where you can’t complete work at all, it’s better to exit the process immediately to quickly get removed from load balancing groups, etc.
General top level exceptions handlers are a huge code smell because catching exceptions you (by definition) didn’t expect is a great way to have corrupted data.
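A minimal sketch of that fail-fast style, assuming an orchestrator such as Kubernetes that restarts crashed containers and pulls them out of load balancing; the paths and names here are illustrative only:

    use std::process;

    fn load_required_config() -> Result<String, std::io::Error> {
        std::fs::read_to_string("/etc/service/config.toml")
    }

    fn main() {
        // If the process can't do useful work at all, exiting immediately lets the
        // orchestrator replace it, instead of leaving a half-alive instance serving
        // errors behind the load balancer.
        let config = match load_required_config() {
            Ok(c) => c,
            Err(e) => {
                eprintln!("cannot start without config: {e}");
                process::exit(1)
            }
        };
        println!("starting with {} bytes of config", config.len());
        // ... serve traffic ...
    }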
Unit test, Integration Test, Staging Test, Staging Rollout, Production Test, Canary, Progressive Rollout
Can all be automated; you can smash through all of that quickly with no human intervention.
It is the same as runtime exceptions in Java. In Rust, if you want to have a top-level "exception handler" that catches everything, you can do
    ::std::panic::catch_unwind(|| {
        // ...
    })
In the case of Cloudflare, the programmer simply chose not to handle the error. It would have been the same if the code was written in Java; there simply would be no top-level try-catch block.
The TSA makes no sense as a safety intervention; it's theatre. It's supposed to look like we're trying hard to solve the problem, not be an attempt to solve the problem. And if there was an accident investigation for 9/11, I can't think why; that's not an accident.
As to your specific claim about enforcement, actually we don't even know whether we'd increase paperwork overhead in many cases. Rationalization driven by new regulation can actually reduce this instead.
For a non-regulatory (at least in the sense that there's no government regulators involved) example consider Let's Encrypt's ACME which was discussed here recently. ACME complies with the "Ten Blessed Methods". But prior to Let's Encrypt the most common processes weren't stricter, or more robust, they were much worse and much more labour intensive. Some of them were prohibited more or less immediately when the "Ten Blessed Methods" were required because they're just obviously unacceptable.
The Proof of Control records from ACME are much better than what had been the usual practice prior yet Let's Encrypt is $0 at point of use and even if we count the actual cost (borne by donations rather than subscribers) it's much cheaper than the prior commercial operators had been for much more value delivered.