We found an undocumented bug in the Apollo 11 guidance computer code

For anyone who liked this, I highly suggest you take a look at the CuriousMarc youtube channel, where he chronicles lots of efforts to preserve and understand several parts of the Apollo AGC, with a team of really technically competent and passionate collaborators.

One of the more interesting things they have been working on, is a potential re-interpretation of the infamous 1202 alarm. It is, as of current writing, popularly described as something related to nonsensical readings of a sensor which could (and were) safely ignored in the actual moon landing. However, if I remember correctly, some of their investigation revealed that actually there were many conditions which would cause that error to have been extremely critical and would've likely doomed the astronauts. It is super fascinating.

Still my all time favorite snippet of code.

    TC    BANKCALL    # TEMPORARY, I HOPE HOPE HOPE
    CADR  STOPRATE    # TEMPORARY, I HOPE HOPE HOPE
    TC    DOWNFLAG    # PERMIT X-AXIS OVERRIDE

https://github.com/chrislgarry/Apollo-11/blob/master/Luminar...

Has someone verified this was an actual bug?

One of AI’s strengths is definitely exploration, f.e. in finding bugs, but it still has a high false positive rate. Depending on context that matters or it wont.

Also one has to be aware that there are a lot of bugs that AI won’t find but humans would

I don’t have the expertise to verify this bug actually happened, but I’m curious.

Super interesting. I wish this article wasn’t written by an LLM though. It feels soulless and plastic.

Software that ran on 4KB of memory and got humans to the moon still has undiscovered bugs in it. That says something about the complexity hiding in even the smallest codebases.

> The specs were derived from the code itself

Oh dear. I strongly suggest this author look specification up in a dictionary.

Someone please amend the title and add "using claude code" because that's customary nowadays.

This is so insightfully and powerfully written I had literal chills running down my spine by the end.

What a horrible world we live in where the author of great writing like this has to sit and be accused of "being AI slop" simply because they use grammar and rhetoric well.

Are there any consequences for the Artemis 2 mission (ironic)?

Still my all time favorite snippet of code.

    TC    BANKCALL    # TEMPORARY, I HOPE HOPE HOPE
    CADR  STOPRATE    # TEMPORARY, I HOPE HOPE HOPE
    TC    DOWNFLAG    # PERMIT X-AXIS OVERRIDE

https://github.com/chrislgarry/Apollo-11/blob/master/Luminar...

Are there any consequences for the Artemis 2 mission (ironic)?

And that's why it's harder (or easier?) to make the same landing again -- we taking way less chances. Today we know of way more failure modes than back then.

Has someone verified this was an actual bug?

One of AI’s strengths is definitely exploration, f.e. in finding bugs, but it still has a high false positive rate. Depending on context that matters or it wont.

Also one has to be aware that there are a lot of bugs that AI won’t find but humans would

I don’t have the expertise to verify this bug actually happened, but I’m curious.

It's not even clear if AI was used to find the bug: they mention modeling the software with an "ai native" language, whatever that means. What is not clear is how they found themselves modeling the gyros software of the apollo code to begin with.

But, I do think their explanation of the lock acquisition and the failure scenario is quite clear and compelling.

Super interesting. I wish this article wasn’t written by an LLM though. It feels soulless and plastic.

Software that ran on 4KB of memory and got humans to the moon still has undiscovered bugs in it. That says something about the complexity hiding in even the smallest codebases.

My guess is that in such low memory regimes, program length is very loosely correlated with bug rate.

If anything, if you try to cram a ton of complexity into a few kb of memory, the likelihood of introducing bugs becomes very high.

> The specs were derived from the code itself

Oh dear. I strongly suggest this author look specification up in a dictionary.

It's (what they're describing is) just reverse engineering. That's what reverse engineering is.

Someone please amend the title and add "using claude code" because that's customary nowadays.

Also add "AI can make mistakes". Thank you.

This is so insightfully and powerfully written I had literal chills running down my spine by the end.

What a horrible world we live in where the author of great writing like this has to sit and be accused of "being AI slop" simply because they use grammar and rhetoric well.

I was completely riveted the whole read. The description of Collins' dilemma is the first time I've seen an actual real world scenario described that might cause him to return to Earth alone.

If an LLM wrote that, then I no longer oppose LLM art.

It's not setting off any LLM alarm bells to me. It just reads like any other scientific article, which is very often soulless

For what it’s worth, Pangram thinks this article is fully human-written: https://www.pangram.com/history/f5f68ce9-70ac-4c2b-b0c3-0ca8...

Any specific sections that stick out? Juxt in the past had really great articles, even before LLMs, and know for a fact they don't lack the expertise or knowledge to write for themselves if they wanted and while I haven't completely read this article yet, I'd surprise me if they just let LLMs write articles for them today.

"Written by an LLM" based on what data or symptom?

I'm starting to develop a physiological response when I recognize AI prose. Just like an overwhelming frustration, as if I'm hearing nails on chalkboard silently inside of my head.

Not to single out your comment, but it feels like it's gotten to the point where HN could use a rule against complaining about AI generated content.

It seems like almost every discussion has at least someone complaining about "AI slop" in either the original post or the comments.

You have no evidence that it was.

This is the top reply on a substantial percentage of HN posts now and we should discourage it.

It is:

- sneering

- a shallow dismissal (please address the content)

- curmudgeonly

- a tangential annoyance

All things explicitly discouraged in the site guidelines. [1]

Downvoting is the tool for items that you think don't belong on the front page. We don't need the same comment on every single article.

[1] - https://news.ycombinator.com/newsguidelines.html

I've seen way, way worse. Either someone LLM-polished something they already wrote, or they did their own manual editing pass.

The short sentence construction is the most suspicious, but I actually don't see anything glaring. It normally jumps out and hits me in the face.

And that's why it's harder (or easier?) to make the same landing again -- we taking way less chances. Today we know of way more failure modes than back then.

It's not setting off any LLM alarm bells to me. It just reads like any other scientific article, which is very often soulless

"Written by an LLM" based on what data or symptom?

You have no evidence that it was.

But, I do think their explanation of the lock acquisition and the failure scenario is quite clear and compelling.

They have some spec language and here,

https://github.com/juxt/Apollo-11/tree/master/specs

have many thousands of lines of code in it.

Anyways, it seems it would take a dedicated professional serious work to understand if this bug is real. And considering this looks like an Ad for their business, I would be skeptical.

> It's not even clear if AI was used to find the bug: they mention modeling the software with an "ai native" language, whatever that means.

Could the "AI native language" they used be Apache Drools? The "when" syntax reminded me of it...

https://kie.apache.org/docs/10.0.x/drools/drools/language-re...

(Apache Drools is an open source rule language and interpreter to declaratively formulate and execute rule-based specifications; it easily integrates with Java code.)

>It's not even clear if AI was used to find the bug

It's not even clear you read the article

How did you pick out AI native and miss the rest of the SAME sentence?

> We found this defect by distilling a behavioural specification of the IMU subsystem using Allium, an AI-native behavioural specification language.

> It's not even clear if AI was used to find the bug

The intro says “We used Claude and Allium”. Allium looks like a tool they’ve built for Claude.

So the article is about how they used their AI tooling and workflow to find the bug.

For what it’s worth, Pangram thinks this article is fully human-written: https://www.pangram.com/history/f5f68ce9-70ac-4c2b-b0c3-0ca8...

Then pangram isn't very good, because that article is full of Claude-isms.

Pangram doesn't reliably detect individual LLM-generated phrases or paragraphs among human written text.

It seems to look at sections of ~300 words. And for one section at least it has low confidence.

I tested it by getting ChatGPT to add a paragraph to one of my sister comments. Result is "100% human" when in fact it's only 75% human.

Pangram test result: https://www.pangram.com/history/1ee3ce96-6ae5-4de7-9d91-5846...

ChatGPT session where it added a paragraph that Pangram misses: https://chatgpt.com/share/69d4faff-1e18-8329-84fa-6c86fc8258...

The AI writing detectors are very unreliable. This is important to mention because they can trigger in the opposite direction (reporting human written text as AI generated) which can result in false accusations.

It’s becoming a problem in schools as teachers start accusing students of cheating based on these detectors or ignore obvious signs of AI use because the detectors don’t trigger on it.

So you're saying Pangram isn't worth much?

Here's one tell-tale of many: "No alarm, no program light."

Another one: "Two instructions are missing: [...] Four bytes."

One more: "The defensive coding hid the problem, but it didn’t eliminate it."

Not to single out your comment, but it feels like it's gotten to the point where HN could use a rule against complaining about AI generated content.

It seems like almost every discussion has at least someone complaining about "AI slop" in either the original post or the comments.

I disagree. I like to read articles and explore Show HN posts, but in the past 6 months I’ve wasted a lot of time following HN links that looked interesting but turned out to be AI slop. Several Show HN posts lately have taken me to repos that were AI generated plagiarisms of other projects, presented on HN as their own original ideas.

Seeing comments warning about the AI content of a link is helpful to let others know what they’re getting into when they click the link.

For this article the accusations are not about slop (which will waste your time) but about tell-tell signs of AI tone. The content is interesting but you know someone has been doing heavy AI polishing, which gives articles a laborious tone and has a tendency to produce a lot of words around a smaller amount of content (in other words, you’re reading an AI expansion of someone’s smaller prompt, which contained the original info you’re interested in)

Being able to share this information is important when discussing links. I find it much more helpful than the comments that appear criticizing color schemes, font choices, or that the page doesn’t work with JavaScript disabled.

You're suggesting this is the complainant's fault?

Stop voting up slop articles and I'll stop commenting on it.

HN has gotten to the point where it’s not even worth clicking the link because of course it’s ai slop.

There is some real content in the haystack, but we almost need some kind of curator to find and display it rather than a vote system where most people vote on the title alone.

I'm starting to develop a physiological response when I recognize AI prose. Just like an overwhelming frustration, as if I'm hearing nails on chalkboard silently inside of my head.

I feel ya.... and i have to admit in the past i tried it for one article in my own blog thinking it might help me to express... tho when i read that post now i dont even like it myself its just not my tone.

therefor decided not gonne use any llm for blogging again and even tho it takes alot more time without (im not a very motivated writer) i prefer to release something that i did rather some llm stuff that i wouldnt read myself.

This is the top reply on a substantial percentage of HN posts now and we should discourage it.

It is:

- sneering

- a shallow dismissal (please address the content)

- curmudgeonly

- a tangential annoyance

All things explicitly discouraged in the site guidelines. [1]

Downvoting is the tool for items that you think don't belong on the front page. We don't need the same comment on every single article.

[1] - https://news.ycombinator.com/newsguidelines.html

It's not a shallow dismissal; it's a dismissal for good reason. It's tangential to the topic, but not to HN overall. It's only curmudgeonly if you assume AI-written posts are the inevitable and good future (aka begging the question). I really don't know how it's "sneering", so I won't address that.

The site guidelines were written pre-AI and stop making sense when you add AI-generated content into the equation.

Consider that by submitting AI generated content for humans to read, the statement you're making is "I did not consider this worth my time to write, but I believe it's worth your time to read, because your time is worth less than mine". It's an inherently arrogant and unbalanced exchange.

> Downvoting is the tool for items that you think don't belong on the front page.

You can’t downvote submissions. That’s literally not a feature of the site. You can only flag submissions, if you have more that 31 karma.

No idea why you're being downvoted. I've done my bit to redress the balance, I hope others do the same.

> It's not even clear if AI was used to find the bug: they mention modeling the software with an "ai native" language, whatever that means.

Could the "AI native language" they used be Apache Drools? The "when" syntax reminded me of it...

https://kie.apache.org/docs/10.0.x/drools/drools/language-re...

(Apache Drools is an open source rule language and interpreter to declaratively formulate and execute rule-based specifications; it easily integrates with Java code.)

They have some spec language and here,

https://github.com/juxt/Apollo-11/tree/master/specs

have many thousands of lines of code in it.

Anyways, it seems it would take a dedicated professional serious work to understand if this bug is real. And considering this looks like an Ad for their business, I would be skeptical.

How did you pick out AI native and miss the rest of the SAME sentence?

> We found this defect by distilling a behavioural specification of the IMU subsystem using Allium, an AI-native behavioural specification language.

> It's not even clear if AI was used to find the bug

The intro says “We used Claude and Allium”. Allium looks like a tool they’ve built for Claude.

So the article is about how they used their AI tooling and workflow to find the bug.

It’s becoming a problem in schools as teachers start accusing students of cheating based on these detectors or ignore obvious signs of AI use because the detectors don’t trigger on it.

So you're saying Pangram isn't worth much?

It's (what they're describing is) just reverse engineering. That's what reverse engineering is.

Also add "AI can make mistakes". Thank you.

>It's not even clear if AI was used to find the bug

It's not even clear you read the article

Even worse, the other child comments are speculating (and didn't RTFA either) when the answer is clear in the article.

> We found this defect by distilling a behavioural specification of the IMU subsystem using Allium, an AI-native behavioural specification language.

Then pangram isn't very good, because that article is full of Claude-isms.

Pangram has a very low false positive rate, but not the best false negative rate: https://www.pangram.com/blog/third-party-pangram-evals

> because that article is full of Claude-isms

Not sure how I feel about the whole "LLMs learned from human texts, so now the people who helped write human texts are suddenly accused of plagiarizing LLMs" thing yet, but seems backwards so far and like a low quality criticism.

Is it possible for a tool to know if something is AI written with high confidence at all? LLMs can be tuned/instructed to write in an infinite number of styles.

Don't understand how these tools exist.

It has Claude-isms, but it doesn't feel very Claude-written to me, at least not entirely.

What's making it even more difficult to tell now is people who use AI a lot seem to be actively picking up some of its vocab and writing style quirks.

You sound like a flat earther and a moon landing denier combined.

Pangram doesn't reliably detect individual LLM-generated phrases or paragraphs among human written text.

It seems to look at sections of ~300 words. And for one section at least it has low confidence.

I tested it by getting ChatGPT to add a paragraph to one of my sister comments. Result is "100% human" when in fact it's only 75% human.

Pangram test result: https://www.pangram.com/history/1ee3ce96-6ae5-4de7-9d91-5846...

ChatGPT session where it added a paragraph that Pangram misses: https://chatgpt.com/share/69d4faff-1e18-8329-84fa-6c86fc8258...

This is useful, thanks! TIL

Here's one tell-tale of many: "No alarm, no program light."

Another one: "Two instructions are missing: [...] Four bytes."

One more: "The defensive coding hid the problem, but it didn’t eliminate it."

That's just writing. I frequently write like that.

This insistence that certain stylistics patterns are "tell-tale" signs that an article was written by AI makes no sense, particularly when you consider that whatever stylistic ticks an LLM may possess are a result of it being trained on human writing.

This is my exact writing style - I'm screwed.

This is just writing; terse maybe and maybe not grammatically correct, but people write like that.

I've seen way, way worse. Either someone LLM-polished something they already wrote, or they did their own manual editing pass.

The short sentence construction is the most suspicious, but I actually don't see anything glaring. It normally jumps out and hits me in the face.

>Hemingway's 4 Fast Rules For Effective Writing

1. Use Short Sentences

https://www.wordsthatsing.com.au/post/hemingway-rules

My guess is that in such low memory regimes, program length is very loosely correlated with bug rate.

If anything, if you try to cram a ton of complexity into a few kb of memory, the likelihood of introducing bugs becomes very high.

Yet here we are compounding the issues by adding more and more layers to these systems... The higher the level it becomes the more security risks we take.

I was completely riveted the whole read. The description of Collins' dilemma is the first time I've seen an actual real world scenario described that might cause him to return to Earth alone.

If an LLM wrote that, then I no longer oppose LLM art.

Pangram has a very low false positive rate, but not the best false negative rate: https://www.pangram.com/blog/third-party-pangram-evals

It has Claude-isms, but it doesn't feel very Claude-written to me, at least not entirely.

What's making it even more difficult to tell now is people who use AI a lot seem to be actively picking up some of its vocab and writing style quirks.

This is useful, thanks! TIL

You sound like a flat earther and a moon landing denier combined.

The site guidelines were written pre-AI and stop making sense when you add AI-generated content into the equation.

No idea why you're being downvoted. I've done my bit to redress the balance, I hope others do the same.

>Hemingway's 4 Fast Rules For Effective Writing

1. Use Short Sentences

https://www.wordsthatsing.com.au/post/hemingway-rules

Yet here we are compounding the issues by adding more and more layers to these systems... The higher the level it becomes the more security risks we take.

I thought that was the least likeable part of the article. They speculated wildly, somehow making the leap that a trained astronaut would not resort to a computer reset if the problems persisted to weave the narrative that this bug was super-duper-serious indeed. They didn't need that and it weakened the presentation.

Even worse, the other child comments are speculating (and didn't RTFA either) when the answer is clear in the article.

> We found this defect by distilling a behavioural specification of the IMU subsystem using Allium, an AI-native behavioural specification language.

> distilling

A.k.a. as fabricating. No wonder they chose to use "AI".

> because that article is full of Claude-isms

Real talk. You're not just making a good point -- you're questioning the dominant paradigm

I'm sure some human writers would write:

> The specification forces this question on every path through the IMU mode-switching code. A reviewer examining BADEND would see correct, complete cleanup for every resource BADEND was designed to handle.

> The specification approaches from the other direction: starting from LGYRO and asking whether any paths fail to clear it.

> *Tests verify the code as written; a behavioural specification asks what the code is for.*

However this is a blog post about using Claude for XYZ, from an AI company whose tagline is

"AI-assisted engineering that unlocks your organization's potential"

Do you really think they spent the time required to actually write a good article by hand? My guess is that they are unlocking their own organizations potential by having Claude writes the posts.

Is it possible for a tool to know if something is AI written with high confidence at all? LLMs can be tuned/instructed to write in an infinite number of styles.

Don't understand how these tools exist.

The WikiEDU project has some thoughts on this. They found Pangram good enough to detect LLM usage while teaching editors to make their first Wikipedia edits, at least enough to intervene and nudge the student. They didn’t use it punatively or expect authoritative results however. https://wikiedu.org/blog/2026/01/29/generative-ai-and-wikipe...

They found that Pangram suffers from false positives in non-prose contexts like bibliographies, outlines, formatting, etc. The article does not touch on Pangram’s false negatives.

I personally think it’s an intractable problem, but I do feel pangram gives some useful signal, albeit not reliably.

This is my exact writing style - I'm screwed.

I doubt you write like that. Where can I find your writing other than your comments which IMO don't read like the blog post?

That's just writing. I frequently write like that.

These are just some of the good examples I found.

My hunch that this is substantially LLM-generated is based on more than that.

In my head it's like a Bayesian classifier, you look at all the sentences and judge whether each is more or less likely to be LLM vs human generated. Then you add prior information like that the author did the research using Claude - which increases the likelihood that they also use Claude for writing.

Maybe your detector just isn't so sensitive (yet) or maybe I'm wrong but I have pretty high confidence at least 10% of sentences were LLM-generated.

Yes, the stylistic patterns exist in human speech but RLHF has increased their frequency. Also, LLM writing has a certain monotonicity that human writing often lacks. Which is not surprising: the machine generates more or less the most likely text in an algorithmic manner. Humans don't. They wrote a few sentences, then get a coffee, sleep, write a few more. That creates more variety than an LLM can.

Fun exercise: https://en.wikipedia.org/wiki/Wikipedia:AI_or_not_quiz

I am reminded of the Simpsons episode in which Principal Skinner tries to pass off the hamburgers from a near-by fast food restaurant for an old family recipe, 'steamed hams,' and his guest's probing into the kitchen mishaps is met with increasingly incredible explanations.

I’m so glad the witch hunt has moved on to phrasing so I get less grief for my em dashes.

See also: “I'm Kenyan. I Don't Write Like ChatGPT. ChatGPT Writes Like Me” by Marcus Olang', https://marcusolang.substack.com/p/im-kenyan-i-dont-write-li...

For what it’s worth, Pangram reports that Marcus’ article is 100% LLM-written: https://www.pangram.com/history/640288b9-e16b-4f76-a730-8000...

I hate that I can’t write em dashes freely anymore without people accusing the writing of being AI generated.

Even though they are perfect for usage in writing down thoughts and notes.

Seeing comments warning about the AI content of a link is helpful to let others know what they’re getting into when they click the link.

> you’re reading an AI expansion of someone’s smaller prompt, which contained the original info you’re interested in

This got me thinking: what if LLMs are used to do the opposite? To condense a long prompt into a short article? That takes more work but might make the outcome more enjoyable as it contains more information.

You're suggesting this is the complainant's fault?

Yes, because all of them are now irrational about the possibility of LLM writing something they read.

Stop voting up slop articles and I'll stop commenting on it.

HN has gotten to the point where it’s not even worth clicking the link because of course it’s ai slop.

There is some real content in the haystack, but we almost need some kind of curator to find and display it rather than a vote system where most people vote on the title alone.

> Downvoting is the tool for items that you think don't belong on the front page.

You can’t downvote submissions. That’s literally not a feature of the site. You can only flag submissions, if you have more that 31 karma.

> distilling

A.k.a. as fabricating. No wonder they chose to use "AI".

They found that Pangram suffers from false positives in non-prose contexts like bibliographies, outlines, formatting, etc. The article does not touch on Pangram’s false negatives.

I personally think it’s an intractable problem, but I do feel pangram gives some useful signal, albeit not reliably.

I’m so glad the witch hunt has moved on to phrasing so I get less grief for my em dashes.

> you’re reading an AI expansion of someone’s smaller prompt, which contained the original info you’re interested in

Yes, because all of them are now irrational about the possibility of LLM writing something they read.

If you’re looking for a place that surfaces only human-written content regardless of whether it’s interesting, rather than interesting content regardless of how it was written, HN is not the place.

There might be a market for your alternative though. Should be easy enough to build with Claude Code.

It’s a dismissal with no evidence i.e. it’s a witch hunt. And no one should support that.

And flagging is appropriate when you think content is not authentic

Twelve year old account and who knows how much lurking before that and I've never noticed this. Good lord.

Optimistically, I guess I can call myself some sort of live-and-let-live person.

I didn't say they're dispositive. I said they're suspicious. Most people don't write effectively.

Real talk. You're not just making a good point -- you're questioning the dominant paradigm

I'm sure some human writers would write:

> The specification approaches from the other direction: starting from LGYRO and asking whether any paths fail to clear it.

> *Tests verify the code as written; a behavioural specification asks what the code is for.*

However this is a blog post about using Claude for XYZ, from an AI company whose tagline is

"AI-assisted engineering that unlocks your organization's potential"

Do you really think they spent the time required to actually write a good article by hand? My guess is that they are unlocking their own organizations potential by having Claude writes the posts.

Your guess is worth what you paid for it.

> Do you really think they spent the time required to actually write a good article by hand?

Given I'm familiar with Juxt since before, used plenty of their Clojure libraries in the past and hanged out with people from Juxt even before LLMs were a thing, yes, I do think they could have spent the time required to both research and write articles like these. Again, won't claim for sure I know how they wrote this specific article, but I'm familiar with Juxt enough to feel relatively confident they could write it.

Juxt is more of a consultancy shop than "AI company", not sure where you got that from, guess their landing page isn't 100% clear what they actually does, but they're at least prominent in the Clojure ecosystem and has been for a decade if not more.

These are just some of the good examples I found.

My hunch that this is substantially LLM-generated is based on more than that.

Maybe your detector just isn't so sensitive (yet) or maybe I'm wrong but I have pretty high confidence at least 10% of sentences were LLM-generated.

Fun exercise: https://en.wikipedia.org/wiki/Wikipedia:AI_or_not_quiz

Here's an alternative way of thinking about this...

Someone probably expended a lot of time and effort planning, thinking about, and writing an interesting article, and then you stroll by and casually accuse them of being a bone idle cheat, with no supporting evidence other than your "sensitive detector" and a bunch of hand-wavy nonsense that adds up to naught.

Those aren’t good examples - that’s just LLMs living for free in your head.

I doubt you write like that. Where can I find your writing other than your comments which IMO don't read like the blog post?

See also: “I'm Kenyan. I Don't Write Like ChatGPT. ChatGPT Writes Like Me” by Marcus Olang', https://marcusolang.substack.com/p/im-kenyan-i-dont-write-li...

For what it’s worth, Pangram reports that Marcus’ article is 100% LLM-written: https://www.pangram.com/history/640288b9-e16b-4f76-a730-8000...

In theory, wouldn't be too hard be to settle the question if whether he used ChatGPT to write it: get Olang to write a few paragraphs by hand, then have people judge (blindly) if it's the same style as the article. Which one sounds more like ChatGPT.

I hate that I can’t write em dashes freely anymore without people accusing the writing of being AI generated.

Even though they are perfect for usage in writing down thoughts and notes.

Why do you care what others accuse you of?

One thing you can try⸺admittedly it's not quite correct⸺is replacing them with a two-em dash. I've never seen an AI use one, and it looks pretty funky.

I have nothing against em dashes. As long as your writing is human, experienced readers will be able to tell it's human. Only less experienced ones will use all or nothing rules. Em dashes just increase the likelihood that the text was LLM generated. They aren't proof.

This is just writing; terse maybe and maybe not grammatically correct, but people write like that.

It's not just terseness, it's the rhythm and "it's not x, it's y".

In fact, the latter is the opposite of terseness. LLMs love to tell you what things are not way more than people do.

See https://www.blakestockton.com/dont-write-like-ai-1-101-negat...

(The irony that I started with "it's not just" isn't lost on me)

> This got me thinking: what if LLMs are used to do the opposite? To condense a long prompt into a short article? That takes more work but might make the outcome more enjoyable as it contains more information.

You're fighting an uphill battle against the inherent tendency to produce more and longer text. There's also the regression to the mean problem, so you get less information (and more generic) even though the text is shorter.

Basically, it doesn't work

It’s a dismissal with no evidence i.e. it’s a witch hunt. And no one should support that.

And flagging is appropriate when you think content is not authentic

Twelve year old account and who knows how much lurking before that and I've never noticed this. Good lord.

Optimistically, I guess I can call myself some sort of live-and-let-live person.

Basically, it doesn't work

It's not just terseness, it's the rhythm and "it's not x, it's y".

In fact, the latter is the opposite of terseness. LLMs love to tell you what things are not way more than people do.

See https://www.blakestockton.com/dont-write-like-ai-1-101-negat...

(The irony that I started with "it's not just" isn't lost on me)

> (The irony that I started with "it's not just" isn't lost on me)

But an LLM wouldn't write "It's not just X, it's the Y and Z". No disrespect to your writing intended, but adding that extra clause adds just the slightest bit of natural slack to the flow of the sentence, whereas everything LLMs generate comes out like marketing copy that's trying to be as punchy and cloying as possible at all times.

There might be a market for your alternative though. Should be easy enough to build with Claude Code.

If the content was interesting, the author would've written about it himself.

By asking AI to write the article for you, you're asserting that the subject matter is not interesting enough to be worth your time to write, so why would it be worth my time to read?

I didn't say they're dispositive. I said they're suspicious. Most people don't write effectively.

Your guess is worth what you paid for it.

> Do you really think they spent the time required to actually write a good article by hand?

Those aren’t good examples - that’s just LLMs living for free in your head.

Why do you care what others accuse you of?

One thing you can try⸺admittedly it's not quite correct⸺is replacing them with a two-em dash. I've never seen an AI use one, and it looks pretty funky.

> (The irony that I started with "it's not just" isn't lost on me)

If the content was interesting, the author would've written about it himself.

By asking AI to write the article for you, you're asserting that the subject matter is not interesting enough to be worth your time to write, so why would it be worth my time to read?

You just need AI to read it for you and summarise back in to the original prompt.

So LLMs write effectively and when people do you accuse them of using an LLM?

Here's an alternative way of thinking about this...

To start, this is more or less an advertising piece for their product. It's pretty clear that they want to sell you Allium. And that's fine! They are allowed! But even if that was written by a human, they were compensated for it. They didn't expend lots of effort and thinking, it's their job.

More importantly, it's an article about using Claude from a company about using Claude. I think on the balance it's very likely that they would use Claude to write their technical blog posts.

While I agree with the sentiment, using AI to write the final draft of the article isn’t cheating. People may not like it, but it’s more a stylistic preference.

Yet another way the mere possibility of AI/LLM being involved diminishes the value of ALL text.

If there is constant vigilance on the part of the reader as to how it was created, meaning and value become secondary, a sure path to the death of reading as a joy.

The times I've written articles, and those have gone through multiple rounds of reviews (by humans) with countless edits each time, before it ends up being published, I wonder if I'd pass that test in those cases. Initial drafts with my scattered thoughts usually are very different from the published end results, even without involving multiple reviewers and editors.

That nuance is lost on the majority of anti-AI folks who’ve learned they get positive social reactions by declaring essentially everything to be AI written and condemnable.

“An em dash… they’re a witch!”… “it’s not just X, it’s Y… they’re a witch!”

Evidently there are no experienced readers who post AI accusations.

So LLMs write effectively and when people do you accuse them of using an LLM?

Yet another way the mere possibility of AI/LLM being involved diminishes the value of ALL text.

If there is constant vigilance on the part of the reader as to how it was created, meaning and value become secondary, a sure path to the death of reading as a joy.

While I agree with the sentiment, using AI to write the final draft of the article isn’t cheating. People may not like it, but it’s more a stylistic preference.

Evidently there are no experienced readers who post AI accusations.

You just need AI to read it for you and summarise back in to the original prompt.

More importantly, it's an article about using Claude from a company about using Claude. I think on the balance it's very likely that they would use Claude to write their technical blog posts.

> They didn't expend lots of effort and thinking, it's their job.

Your job doesn't require you to think or expend effort?

That nuance is lost on the majority of anti-AI folks who’ve learned they get positive social reactions by declaring essentially everything to be AI written and condemnable.

“An em dash… they’re a witch!”… “it’s not just X, it’s Y… they’re a witch!”

> anti-AI folks who’ve learned they get positive social reactions by declaring essentially everything to be AI written and condemnable.

that's a strawman alright; all the comments complaining how they can't use their writing style without being ganged up on are positive karma from my angle, so I'm not sure the "positive social reactions" are really aligned with your imagination. Or does it only count when it aligns with your persecution complex?

> They didn't expend lots of effort and thinking, it's their job.

Your job doesn't require you to think or expend effort?

The Apollo Guidance Computer (AGC) is one of the most scrutinised codebases in history. Thousands of developers have read it. Academics have published papers on its reliability. Emulators run it instruction by instruction. We found a bug in it that had been missed for fifty-seven years: a resource lock in the gyro control code that leaks on an error path, silently disabling the guidance platform's ability to realign.

We used Claude and Allium, our open-source behavioural specification language, to distil 130,000 lines of AGC assembly into 12,500 lines of specs. The specs were derived from the code itself, and the process signposted us directly to the defect.

Charting the code

The source code has been publicly available since 2003, when Ron Burkey and a team of volunteers began painstakingly transcribing it by hand from printed listings at the MIT Instrumentation Laboratory. In 2016, former NASA intern Chris Garry’s GitHub repository went viral, topping the trending page. Thousands of developers scrolled through the assembly language of a machine with 2K of erasable RAM and a 1MHz clock.

The AGC’s programs were stored in 74KB of core rope: copper wire threaded by hand through tiny magnetic cores in a factory (a wire passing through a core was a 1; a wire bypassing it was a 0). The women who wove it were known internally as the “Little Old Ladies”, and the memory itself was called LOL memory. The program was physically woven into the hardware. Ken Shirriff has analysed it down to individual gates, and the Virtual AGC project runs the software in emulation, having confirmed the recovered source byte-for-byte against the original core rope dumps.

As far as we can determine, no formal verification, model checking or static analysis has been published against the flight code. The scrutiny has been deep, but it has been a particular kind of scrutiny: reading the code, emulating the code, verifying the transcription.

We took a different approach. We used Allium to distil a behavioural specification from the Inertial Measurement Unit (IMU) subsystem, the gyroscope-based platform that tells the spacecraft which way it is pointing. The specification models the lifecycle of every shared resource: when it is acquired, when it must be released, and on which paths.

It surfaced a flaw that reading and emulation had missed.

Losing the reference

The AGC manages the IMU through a shared resource lock called LGYRO. When the computer needs to torque the gyroscopes (to correct platform drift or perform a star alignment), it acquires LGYRO at the start and releases it when all three axes have been torqued. The lock prevents two routines from fighting over the gyro hardware at the same time.

The lock is acquired on the way in and released on the way out. But there is a third possibility, and it doesn’t release the lock.

‘Caging’ is an emergency measure: a physical clamp that locks the IMU’s gimbals in place to protect the gyroscopes from damage. The crew could trigger it with a guarded switch in the cockpit.

When the torque completes normally, the routine exits via STRTGYR2 and the LGYRO lock is cleared. When the IMU is caged while a torque is in progress, the code exits via a routine called BADEND, which does not clear the lock. Two instructions are missing:

        CAF  ZERO
        TS   LGYRO

Four bytes.

Once LGYRO is stuck, every subsequent attempt to torque the gyros finds the lock held, sleeps waiting for a wake signal that will never come, and hangs. Fine alignment, drift compensation, manual gyro torque: all blocked.

On 21 July 1969, while Neil Armstrong and Buzz Aldrin walked on the lunar surface, Michael Collins orbited alone in the Command Module Columbia. Every two hours he disappeared behind the Moon, out of radio contact with Earth. “I am alone now, truly alone, and absolutely isolated from any known life. I am it,” he wrote in Carrying the Fire. “If a count were taken, the score would be three billion plus two over on the other side of the moon, and one plus God knows what on this side.”

During each pass he ran Program 52, a star-sighting alignment that kept the guidance platform pointed in the right direction. If the platform drifted, the engine burn to bring him home would point the wrong way.

Radio silence

Here’s how the bug might have manifested.

Collins has just finished his star sightings at the optics station in the lower equipment bay and keyed in the final commands. The computer is torquing the gyroscopes to apply the correction across all three axes.

He moves back toward the main panel in a cramped cockpit, past a cage switch protected by a flip-up cover. An elbow catches the cover and nudges the switch. The code handles this gracefully: a routine called CAGETEST detects the cage, abandons the torque and exits. The P52 fails, and he understands why: the cage interrupted the correction. He uncages the IMU and heads back to the optics station to realign.

He starts a new P52. The program hangs.

No alarm, no program light. The DSKY (display and keyboard, his only interface to the computer) accepts the input and does nothing. He tries V41, the manual gyro torque verb. Same result. Everything else on the computer works. Only gyro operations are dead.

The first failure looked normal: a cage event during alignment, with a known recovery. The second gives no clue what is wrong. The trained response to an accidental cage is to uncage and realign. Collins had been trained to restart the computer, but nothing about this failure would suggest he needed to. Commands were accepted, everything else worked. It would look like faulty hardware, not a stuck lock.

“My secret terror for the last six months has been leaving them on the Moon and returning to Earth alone”, Collins later wrote of the rendezvous. A dead gyro system behind the Moon, with Armstrong and Aldrin on the surface waiting for a rendezvous burn that depends on a platform he can no longer align, is exactly that scenario.

A hard reset would have cleared it. But the 1202 alarms during the lunar descent had been stressful enough with Mission Control on the line and Steve Bales making a snap abort-or-continue call.

Behind the Moon, alone, with a computer that was accepting commands and doing nothing, Collins would have had to make that call by himself.

The known landmarks

Margaret Hamilton (as “rope mother” for LUMINARY) approved the final flight programs before they were woven into core rope memory. Her team at the MIT Instrumentation Laboratory pioneered concepts we now take for granted: priority scheduling, asynchronous multitasking, restart protection and software-based error recovery. Even the term ‘software engineering’ is hers.

Their priority scheduling saved the Apollo 11 landing when the 1202 alarms fired during descent, shedding low-priority tasks under load exactly as designed. Most modern systems don’t handle overload that gracefully.

The most serious bugs that did surface were specification errors, not coding mistakes. Don Eyles, who wrote the lunar landing guidance code, documented several. For example, the ICD for the rendezvous radar specified that two 800 Hz power supplies would be frequency-locked but said nothing about phase synchronisation. The resulting phase drift made the antenna appear to dither, generating roughly 6,400 spurious interrupts per second per angle and consuming roughly 13% of the computer’s capacity during Apollo 11’s descent. This was the underlying cause of the 1202 alarms.

This defect has the same shape. BADEND is a general-purpose termination routine shared by all IMU mode-switching operations. It clears MODECADR (the stall register), wakes sleeping jobs, and exits. But LGYRO is a gyro-specific lock, acquired only by the pulse-torquing code and released only by the normal completion path in STRTGYR2. When the error path routes through BADEND, it handles the general resources correctly, but not the gyro-specific lock.

The AGC was written so defensively that latent faults like this would be silently corrected by the restart logic, which clears LGYRO as a side effect of full erasable-memory initialisation. Any test that happened to trigger a restart after the bug would see the system recover seamlessly.

The defensive coding hid the problem, but it didn’t eliminate it. A cage event without a subsequent restart would still leave the gyros locked. Collins would have no way to realign the guidance platform and no diagnostic clue pointing to the fix.

Star sighting

We found this defect by distilling a behavioural specification of the IMU subsystem using Allium, an AI-native behavioural specification language. The specification models each shared resource as an entity with a lifecycle: acquired, held, released.

The IMU entity declares a gyros_busy field modelling LGYRO. Two rules govern it:

rule GyroTorque {
    -- Sends gyro torquing pulse commands. Reserves the gyros,
    -- enables power supply, and dispatches pulses per axis.
    when: GyroTorque(command: GyroTorqueCommand)

    requires:
        imu.mode != caged
        imu.gyros_busy = false

    ensures:
        imu.gyros_busy = true
        GyroTorqueStarted()
}

rule GyroTorqueBusy {
    -- Gyros already reserved by another torquing operation.
    -- Caller sleeps until LGYRO is cleared.
    when: GyroTorque(command: GyroTorqueCommand)

    requires: imu.gyros_busy = true

    ensures:
        JobSleep(job: calling_job())
}

GyroTorque requires gyros_busy = false and ensures gyros_busy = true: the lock is acquired. Somewhere, on every path that follows, the lock must be released. The spec doesn’t show where in the code the release happens, but it makes the obligation explicit: if gyros_busy goes to true, something must set it back to false.

With that obligation written down, Claude traced every path that runs after gyros_busy is set to true. The normal completion path (STRTGYR2) clears it. The cage-interrupted path (BADEND) does not. MODECADR, the other shared resource, is correctly cleared in BADEND: LGYRO is missing.

The specification forces this question on every path through the IMU mode-switching code. A reviewer examining BADEND would see correct, complete cleanup for every resource BADEND was designed to handle.

The specification approaches from the other direction: starting from LGYRO and asking whether any paths fail to clear it.

Tests verify the code as written; a behavioural specification asks what the code is for.

A specification distilled by Allium models resource lifecycles across all paths, including the ones nobody thought to test. You can view the Allium specifications and reproduction of the bug on GitHub.

Course correction

Hamilton’s team released resources by loading the constant zero into the accumulator (CAF ZERO) and storing it into the lock register (TS LGYRO). Every release placed manually, by a programmer who remembered every path that could reach that point.

Modern languages have tried to make lock leaks structurally impossible: Go has defer, Java has try-with-resources, Python has with, Rust’s ownership system makes lock leaks a compile-time error.

Nevertheless, lock leaks persist. MITRE classifies the pattern as CWE-772: “Missing Release of Resource after Effective Lifetime”, and rates its likelihood of exploitation as high. Not all resources are managed by a language runtime. Database connections, distributed locks, file handles in shell scripts, infrastructure that must be torn down in the right order: these are still often the programmer’s responsibility. Anywhere the programmer is responsible for writing the cleanup, the same bug is waiting.

Every Apollo crew came home safely. But the IMU mode-switching routines were carried forward across missions in both the Command Module software (COMANCHE) and the Lunar Module software (LUMINARY). The fault was never noticed and never fixed.

A fifty-seven-year-old bug hid in flight-proven assembly. What’s hiding in yours? Let’s talk.

Thanks to Farzad “Fuzz” Pezeshkpour for independently reproducing the bug, and to Danny Smith and Prashant Gandhi for reviewing early drafts of this article.

> anti-AI folks who’ve learned they get positive social reactions by declaring essentially everything to be AI written and condemnable.

You have the same problem apparently. You think it’s okay to go witch hunting and accuse people with no real evidence.

Hacker Times

Hacker Times

We found an undocumented bug in the Apollo 11 guidance computer code

Discussion

Discussion

Charting the code

Losing the reference

Radio silence

The known landmarks

Star sighting

Course correction