It turns out literate programming is useful for a lot more than just programming!
You can change the code by changing either tests or production code, and letting the other follow.
Code reviews are a breeze because if you’re confused by the production code, the test code often holds an explanation - and vice versa. So just switch from one to the other as needed.
Lots of benefits. The downside is how much extra code you end up with of course - up to you if the gains in readability make up for it.
https://podlite.org does this in a language-neutral way: Perl, JS/TS, and Raku for now.
Here's an example:
#!/usr/bin/env raku
=begin pod
=head1 NAME
Stats::Simple - Simple statistical utilities written in Raku
=head1 SYNOPSIS
use Stats::Simple;
my @numbers = 10, 20, 30, 40;
say mean(@numbers); # 25
say median(@numbers); # 25
=head1 DESCRIPTION
This module provides a few simple statistical helper functions
such as mean and median. It is meant as a small example showing
how Rakudoc documentation can be embedded directly inside Raku
source code.
=end pod
unit module Stats::Simple;
=begin pod
=head2 mean
mean(@values --> Numeric)
Returns the arithmetic mean (average) of a list of numeric values.
=head3 Parameters
=item @values
A list of numeric values.
=head3 Example
say mean(1, 2, 3, 4); # 2.5
=end pod
sub mean(*@values --> Numeric) is export {
die "No values supplied" if @values.elems == 0;
@values.sum / @values.elems;
}
=begin pod
=head2 median
median(@values --> Numeric)
Returns the median value of a list of numbers.
If the list length is even, the function returns the mean of
the two middle values.
=head3 Example
say median(1, 5, 3); # 3
say median(1, 2, 3, 4); # 2.5
=end pod
sub median(*@values --> Numeric) is export {
die "No values supplied" if @values.elems == 0;
my @sorted = @values.sort;
my $n = @sorted.elems;
return @sorted[$n div 2] if $n % 2;
(@sorted[$n div 2 - 1] + @sorted[$n div 2]) / 2;
}
=begin pod
=head1 AUTHOR
Example written to demonstrate Rakudoc usage.
=head1 LICENSE
Public domain / example code.
=end pod
Given that there is data showing context files which explain code can reduce model performance, it is not straightforward that literate programming is better, so without data this article is useless.
Most of these LLM things are kind of separate systems, with their own UI. The idea of agency being inlaid into existing systems the user knows, like this, with immediate bidirectional feedback as the user and LLM work the page, is incredibly, incredibly compelling to me.
Series of submissions (descending in time): https://news.ycombinator.com/item?id=47211249 https://news.ycombinator.com/item?id=47037501 https://news.ycombinator.com/item?id=45622604
I don't know whether "literate programming" per se is required. Good names, docstrings, type signatures, strategic comments re: "why", a good README, and thoughtfully-designed abstractions are enough to establish a solid pattern.
Going full "literate programming" may not be necessary. I'd maybe reframe it as a focus on communication. Notebooks, examples, scripts and such can go a long way to reinforcing the patterns.
Ultimately that's what it's about: establishing patterns for both your human readers and your LLMs to follow.
The big problem with documentation is that if it was accurate when it was written, it's just a matter of time before it goes stale compared to the code it's documenting. And while compilers can tell you if your types and your implementation have come out of sync, before now there's been nothing automated that can check whether your comments are still telling the truth.
Somebody could make a startup out of this.
This allows a trusted and tested abstraction layer that does not shift, which makes maintenance easier, makes the code that the agents generate easier to review, and uses far fewer tokens.
So as always, just build better abstractions.
Boring and reliable, I know.
If you need guides to the code base beyond what the programming language provides, just write a directory level readme.md where necessary.
This was always the primary role. The only people who ever said it was about writing just wanted an easy sales pitch aimed at everyone else.
Literate programming failed to take off because with that much prose it inevitably misrepresents the actual code. Most normal comments are bad enough.
It's hard to maintain any writing that doesn't actually change the result. You can't "test" comments. The author doesn't even need to know why the code works to write comments that are convincing at first glance. If we want to read lies influenced by office politics, we already have the rest of the docs.
and don't we have doc-blocks?
A bunch of us thought learning to talk to computers would get us out of learning to talk to humans, and so we spent 4 of the most important years of emotional growth engaging in that, only to graduate and discover we were even further behind everyone else in that area.
- Configuration is massively duplicated, across repositories
- No one is willing to rip out redundancy, because comprehensive testing is not practical
- In order to understand the configuration, you have to read lots of code, again across multiple repositories (this in particular is a problem for LLM assistance, at least the way we currently use it)
I love the idea, but in practice it’s currently a nightmare. I think if we took a week we could clean things up a fair bit, but we don’t have a week (at least as far as management is concerned), and again, without full functional testing, it’s difficult to know when you’ve accidentally broken someone else’s subsystem
Sometimes what we manage with config is itself processing pipelines. A tool like darktable has a series of processing steps that are run. Each of those has config, but the outer layer is itself a config of those inner configs. And the outer layer is a programmable pipeline; it's not that far apart from thinking of each user coming in and building their own http handler pipeline, making their own bespoke computational flow.
I guess my point is that computation itself is configuration. XSLT probably came closest to that sun. But we see similar lessons everywhere we look.
I’d like to have a good issue tracking system inside git. I think Fossil, the SQLite project’s version control system, has this functionality, but I never used it.
One thing to solve is that different kinds of users need to interact with it in different kinds of ways. Non-programmers can use Jira, for example. Issues are often treated as mutable text boxes rather than versioned specification (and git is designed for the latter). It’s tricky!
I'm thinking that we're approaching a world where you can both test for comments and test the comments themselves.
The biggest problem is that humans don't need the documentation until they do. I recall one project that extensively used docblock style comments. You could open any file in the project and find at least one error, either in the natural language or the annotations.
If the LLM actually uses the documentation in every task it performs- or if it isn't capable of adequate output without it- then that's a far better motivation to document than we actually ever had for day to day work.
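One mechanism that already points in this direction is executable documentation, e.g. Python's doctest module, which runs the examples embedded in docstrings and fails when they drift out of sync with the implementation. A minimal sketch (the function and values are invented for illustration):

```python
def mean(values):
    """Return the arithmetic mean of a list of numbers.

    The examples below are executed by doctest, so if the
    implementation drifts, the documentation fails loudly:

    >>> mean([10, 20, 30, 40])
    25.0
    >>> mean([1, 2, 3, 4])
    2.5
    """
    if not values:
        raise ValueError("no values supplied")
    return sum(values) / len(values)

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # silent only while every docstring example still holds
```

This only verifies the examples, not free-form prose, but it is one existing way a comment can be "tested."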
Basically, it's incredibly helpful to document the higher-level structure of the code, almost like extensive docstrings at the file level and subdirectory level and project level.
The problem is that major architectural concepts and decisions are often cross-cutting across files and directories, so those aren't always the right places. And there's also the question of what properly belongs in code files, vs. what belongs in design documents, and how to ensure they are kept in sync.
New eyes don’t have the curse of knowledge. They don’t filter out the bullshit bits. And one of the advantages of creating reusable modules is you get more new eyes on your code regularly.
This may also be a place where AI can help. Some of the review tools are already calling us out on making the code not match the documentation.
We need metadata in source code that LLMs don't delete and interpreters/compilers/linters don't barf on.
- Module level comments with explanations of the purpose of the module and how it fits into the whole codebase.
- Document all methods, constants, and variables, public and private. A single terse sentence is enough, no need to go crazy.
- Document each block of code. Again, a single sentence is enough. The goal is to be able to know what that block does in plain English without having to "read" code. Reading code is a misnomer because it is a different ability from reading human language.
Example from one of my open-source projects: https://github.com/trane-project/trane/blob/master/src/sched...
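A miniature, language-neutral sketch of this commenting density (all names here are hypothetical, not taken from the linked project):

```python
"""Scheduler helpers: pick which exercises to present next.

This module sits between the course data and the UI; it decides
the order in which units are shown to the student.
"""

# Maximum number of candidates considered per batch.
BATCH_SIZE = 10

def select_batch(candidates, scores):
    """Return the top BATCH_SIZE candidates ordered by score."""
    # Pair each candidate with its score so sorting keeps them together.
    paired = list(zip(candidates, scores))
    # Sort best-first; ties keep their original relative order.
    paired.sort(key=lambda p: p[1], reverse=True)
    # Keep only the strongest candidates for this batch.
    return [c for c, _ in paired[:BATCH_SIZE]]
```

Every block gets one terse sentence, so a reader can follow the function in plain English without parsing the code itself.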
Naming is so incredibly important. The wrong name for a configuration key can have cascading impacts, especially when there is "magic" involved, like stripping out or adding common prefixes to configuration values.
We have a concept called a "domain" which is treated as a magic value everywhere, such as adding a prefix or suffix. But domain isn't well-defined, and in different contexts it is used different ways, and figuring out what the impact is of choosing a domain string is typically a matter of trial and error.
This is especially pronounced in the programming workplace, where the most "senior" programmers are asked to stop programming so they can review PRs.
fun isEven(number: Int): Boolean { return number % 2 == 0 }
I would say this expresses the intent, no need for a comment saying "check if the number is even".
Most of the code I read (at work) is not documented, yet I still understand the intent. In open source projects, I used to go read the source code because the documentation was nonexistent or out of date. To the point where now I go directly to the source code, because if the code is well written, I can actually understand it.
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
-- Linus Torvalds
- Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).
- LLMs are trained on tons of source code, which is arguably a smaller space than natural languages. My experience is that LLMs are really good at e.g. translating code between two programming languages. But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous.
- I wonder if it is a question of "natural languages vs programming languages" or "bad code vs good code". I could totally imagine that documenting bad code helps the LLMs (and the humans) understand the intent, while documenting good code actually adds ambiguity.
What I learned is that we write code for humans to read. Good code is code that clearly expresses the intent. If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-).
Of course there is an argument to make that the quality of code is generally getting worse every year, and therefore there is more and more a need for documentation around it because it's getting hard to understand what the hell the author wanted to do.
does literate code have a place for the big picture though?
Have you tried naming things properly? A reader that knows your language could then read your code base as a narrative.
The question being - are LLMs 'good' at interpreting and making choices/decisions about data structures and relationships?
I do not write code for a living but I studied comp sci. My impression was always that the good software engineers did not worry about the code, not nearly as much as the data structures and so on.
A lighter API footprint probably also means a higher amount of boilerplate code, but these models love cranking out boilerplate.
I’ve been doing a lot more Go instead of dynamic languages like Python or TypeScript these days. Mostly because if agents are writing the program, they might as well write it in a language that’s fast enough. Fast compilation means agents can quickly iterate on a design, execute it, and loop back.
The Go ecosystem is heavy on style guides, design patterns, and canonical ways of doing things. Mostly because the language doesn’t prevent obvious footguns like nil pointer errors, subtle race conditions in concurrent code, or context cancellation issues. So people rely heavily on patterns, and agents are quite good at picking those up.
My version of literate programming is ensuring that each package has enough top-level docs and that all public APIs have good docstrings. I also point agents to read the Google Go style guide [1] each time before working on my codebase. This yields surprisingly good results most of the time.
If you get the architecture wrong, everyone complains. If you get it right, nobody notices it's there.
It's not practical to have codebases that can be read like a narrative, because that's not how we want to read them when we deal with the source code. We jump to definitions, arriving at different pieces of code in different paths, for different reasons, and presuming there is one universal, linear, book-style way to read that code, is frankly just absurd from this perspective. A programming language should be expressive enough to make code read easily, and tools should make it easy to navigate.
I believe my opinion on this matters more than the opinion of an average admirer of LP. By their own admission, they still mostly write code in boring plain text files; I write programs in org-mode all the time. Literally (no pun intended) all my libraries, outside of those written for a day job, are written in Org. I think it's important to note that they are all Lisp libraries, as my workflow might not be as great for something like C. The documentation in my Org files is mostly reduced to examples — I do like docstrings, but I appreciate an exhaustive (or at least a rich enough) set of examples more, and writing them is much easier: I write them naturally as tests while I'm implementing a function. The examples are written in Org blocks, and when I install a library or push an important commit, I run all tests, of which examples are but special cases. The effect is that this part of the documentation is always in sync with the code (of course, some tests fail, and they are marked as such when tests run). I know how to sync this with docstrings, too, if necessary; I haven't: it takes time to implement and I'm not sure the benefit will be that great.
My (limited, so far) experience with LLMs in this setting is nice: a set of pre-written examples provides a good entry point, and an LLM is often capable of producing a very satisfactory solution, immediately testable, of course. The general structure of my Org files with code is also quite strict.
I don't call this “literate programming”, however — I think LP is a mess of mostly wrong ideas — my approach is just a “notebook interface” to a program, inspired by Mathematica Notebooks, popularly (but not in a representative way) imitated by the now-famous Jupyter notebooks. The terminology doesn't matter much: what I'm describing is what the silly.business blogpost is largely about. The author of nbdev is in the comments here; we're basically implementing the same idea.
silly.business mentions tangling, which is a fundamental concept in LP and a good example of what I dislike about it: tangling, like several concepts behind LP, is only a thing due to limitations of the programming systems that Donald Knuth was using. When I write Common Lisp in Org, I do not need to tangle, because Common Lisp does not have many of the limitations that apparently influenced the concepts of LP. Much like the “reading like a narrative” idea is misguided, for reasons I outlined in the beginning. Lisp is expressive enough to read like prose (or like anything else) to as large a degree as required, and, more generally, to have code organized as non-linearly as required. This argument, however, is irrelevant if we want LLMs, rather than us, to read codebases like a book; but that's a different topic.
"Everything's broken! What do we even pay you for!?"
Go was designed based on Rob Pike's contempt for his coworkers (https://news.ycombinator.com/item?id=16143918), so it seems suitable for LLMs.
I am currently fighting the recursive improvement loop part of working with agents.
Translating from a natural language spec to code involves a truly massive amount of decision making because it’s ambiguous. For a non trivial program, 2 implementations of the same natural language spec will have thousands of observable differences.
Where we are today, that is agents require guardrails to keep from spinning out, there is no way to let agents work on code autonomously or constantly recompile specs that won’t end up with all of those observable differences constantly shifting, resulting in unusable software.
Tests can’t prevent this because for a test suite to cover all observable behavior, it would need to be more complex than the code. In which case, it wouldn’t be any easier for machine or human to understand. The only solution to this problem is that LLMs get better.
Personally I think at the point they can pull this off, they can do any white collar job, and there’s no point in planning for that future because it results in either Mad Max or Star Trek.
Natural languages are richer in ideas, it may be harder to get working code going from a purely natural description to code, than code to code, but you don't gain much from just translating code. One is only limited by your imagination the other already exists, you could just call it as a routine.
You only have a SENSE for good code because it's a natural language with conventions and shared meaning. If the goal of programming is to learn to communicate better as humans then we should be fighting ambiguity not running from it. 100 years from now nobody is going to understand that your conventions were actually "good code".
I have full examples of something that is heavily commented and explained, including links to any schemas or docs. I have gotten good results when I ask an LLM to use that as a template, that not everything in there needs to be used, and it cuts down on hallucinations by quite a bit.
If good code was enough on its own we would read the source instead of documentation. I believe part of good software is good documentation. The prose of literate source is aimed at documentation, not line-level comments about implementation.
Not only that, but there's something very annoying and deeply dissatisfying about typing a bunch of text into a thing for which you have no control over how its producing an output, nor can an output be reproduced even if the input is identical.
Agreed natural language is very ambiguous and becoming more ambiguous by the day "what exactly does 'vibe' mean?".
People spoke in a particular way, say 60 years ago, that left very little room for interpretation of what they meant. The same cannot be said today.
I loathe this take.
I have rocked up to codebases where there were specific rules banning comments because of this attitude.
Yes, comments can lie; yes, there are no guards ensuring they stay in lock step with the code they document. But not having them is a thousand times worse - I can always see WHAT code is doing, that's never the problem; the problem is WHY it was done in this manner.
I put comments like "This code runs in O(n) because there are only a handful of items ever going to be searched - update it when there are enough items to justify an O(log2 n) search"
That tells future developers that the author (me) KNOWS it's not the most efficient code possible, but it IS when you take into account things unknown by the person reading it
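A sketch of what such a comment can look like in practice (the code and names are invented for illustration):

```python
def find_user(users, user_id):
    # Deliberately O(n): the users list never exceeds a few dozen
    # entries in production, so a linear scan is simpler and, at this
    # size, faster than maintaining a sorted index. Revisit (binary
    # search, or a dict keyed by id) if the list grows into the thousands.
    for user in users:
        if user["id"] == user_id:
            return user
    return None
```

The comment records the constraint the author knew about, so a future reader doesn't "optimize" code that was already the right tradeoff.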
Edit: Tribal knowledge is the worst type of knowledge. It's assumed that everyone knows it and passes it along when new people onboard, but in reality (for me) the people doing the onboarding have always had fragments, or incorrect assumptions about what was being conveyed to them, and just like the children's game of "telephone", the passing of the knowledge always ends in disaster.
My Git history contains links between the false starts and misunderstandings and the corrections, which then also include a paragraph on why this was a misunderstanding or false start. It is a lot better than just a single linear log from LLMs.
And maybe there is a way to trim the parts out of it that are not needed... like to automatically produce an initial prompt which is equivalent to the results of a longer session, but is precise enough so as to not need clarification upon reprocessing it. Something like that? I'm not sure if that's something that already exists.
That's 100% how I work -- reading the source. If the code is confusing, the code needs to be fixed.
other than that you seem to be arguing against someone other than me. I certainly agree that agents / existing options would be chaotic hell to use this way. But I think the high-level idea has some potential, independent of that.
An axiom I have long held regarding documenting code is:
Code answers what it does, how it does it, when it is used,
and who uses it. What it cannot answer is why it exists.
Comments accomplish this.
The Divio documentation system (originally developed at: https://docs.divio.com/documentation-system/) divides documentation along two axes:
- Action (Practical) vs. Cognition (Theoretical)
- Acquisition (Studying) vs. Application (Working)
which for my current project has resulted in:
- readme.md --- (Overview) Explanation (understanding-oriented)
- Templates (small source snippets) --- Tutorials (learning-oriented)
- Literate Source (pdf) --- How-to Guides (problem-oriented)
- Index (of the above pdf) --- Reference (information-oriented)
Comments only lie if they are allowed to become lies.
Just like a method name can lie. Or a class name. Or ...
Surely you don’t mean everyone in the 1960s spoke directly, free of metaphor or euphemism or nuance or doublespeak or dog whistle or any other kind or ambiguity? Then why are there people who dedicate their entire life to interpreting religious texts and the Constitution?
Literate programming is the idea that code should be intermingled with prose such that an uninformed reader could read a code base as a narrative, and come away with an understanding of how it works and what it does.
Although I have long been intrigued by this idea, and have found uses for it in a couple[1] of different cases[2], I have found that in practice literate programming turns into a chore of maintaining two parallel narratives: the code itself, and the prose. This has obviously limited its adoption.
Historically in practice literate programming is most commonly found as Jupyter notebooks in the data science community, where explanations live alongside calculations and their results in a web browser.
Frequent readers of this blog will be aware that Emacs Org Mode supports polyglot literate programming through its org-babel package, allowing execution of arbitrary languages with results captured back into the document, but this has remained a niche pattern for nerds like me.
Even for someone as enthusiastic about this pattern as I am, it becomes cumbersome to use Org as the source of truth for larger software projects, as the source code essentially becomes a compiled output, and after every edit in the Org file, the code must be re-extracted and placed into its destination ("tangled", in Org Mode parlance). Obviously this can be automated, but it's easy to get into annoying situations where you or your agent has edited the real source and it gets overwritten on the next tangle.
That said, I have had enough success with using literate programming for bookkeeping personal configuration that I have not been able to fully give up on the idea, even before the advent of LLMs.
For example: before coding agents, I had been adapting a pattern for using Org Mode for manual testing and note-taking: instead of working on the command line, I would write commands into my editor, edit them in place until each step was correct, and run them there, so that when I was done I would have a document explaining exactly the steps that were taken, without extra note-taking. Combining the act of creating the note and running the test gives you the notes for free when the test is completed.
This is even more exciting now that we have coding agents. Claude and Kimi and friends all have a great grasp of Org Mode syntax; it's a forgiving markup language and they are quite good at those. All the documentation is available online and was probably in the training data, and while a big downside of Org Mode is just how much syntax there is, that's no problem at all for a language model.
Now when I want to test a feature, I ask the clanker[3] to write me a runbook in Org. Then I can review it – the prose explains the model's reflection of the intent for each step, and the code blocks are interactively executable once I am done reviewing, either one at a time or the whole file like a script. The results will be stored in the document, under the code, like a Jupyter notebook.
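A hypothetical runbook of this shape might look like the following (the service, port, and paths are invented for illustration):

```org
* Verify the export feature                                   :runbook:
The agent wrote the steps below; review the prose, then execute
each block with C-c C-c, or the whole file at once.

** Start the service against a scratch database
#+begin_src shell :results output
EXPORT_DB=/tmp/scratch.db ./service --port 8080 &
sleep 1 && echo started
#+end_src

** Request an export and inspect the first lines
#+begin_src shell :results output
curl -s localhost:8080/export | head -n 5
#+end_src
```

After each block runs, Org captures its output into a `#+RESULTS:` drawer beneath it, which is what makes the finished file double as the test record.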
I can edit the prose and ask the model to update the code, or edit the code and have the model reflect the meaning upon the text. Or ask the agent to change both simultaneously. The problem of maintaining the parallel systems disappears.
The agent is told to handle tangling, and the problem of extraction goes away. The agent can be instructed with an AGENTS.md file to treat the Org Mode file as the source of truth, to always explain in prose what is going on, and to tangle before execution. The agent is very good at all of these things, and it never gets tired of re-explaining something in prose after a tweak to the code.
The fundamental extra labor of literate programming, which I believe is why it is not widely practiced, is eliminated by the agent and it utilizes capabilities the large language model is best at: translation and summarization.
As a benefit, the code base can now be exported into many formats for comfortable reading. This is especially important if the primary role of engineers is shifting from writing to reading.
I don't have data to support this, but I also suspect that literate programming will improve the quality of generated code, because the prose explaining the intent of each code block will appear in context alongside the code itself.
I have not personally had the opportunity to try this pattern yet on a larger, more serious codebase. So far, I have only been using this workflow for testing and for documenting manual processes, but I am thrilled by its application there.
I also recognize that the Org format is a limiting factor, due to its tight integration with Emacs. However, I have long believed that Org should escape Emacs. I would promote something like Markdown instead; however, Markdown lacks the ability to include metadata[4]. But as usual in my posts about Emacs, it's not Emacs's specific implementation of the idea that excites me per se, even though in this case Org's implementation of literate programming does.
It is the idea itself that is exciting to me, not the tool.
With agents, does it become practical to have large codebases that can be read like a narrative, whose prose is kept in sync with changes to the code by tireless machines?
I think that's a compelling question.
Org Mode has concepts like properties which allow acting on the document programmatically from Emacs Lisp. In the past this meant I was often tempted to fumble around with Lisp for awhile to get some imagined interactive feature in my document, for which I never had time in practice. But now the LLM will happily shove some Emacs Lisp into the file variables section of the document with bespoke functionality for that interactive document specifically.
The lack of metadata in Markdown also means that there is nowhere to store information about codeblocks that would be extracted from a literate document. Org Mode provides header arguments that can be applied to source code blocks, providing instruction to the machine about execution details like where the code should be executed, which might even be a remote machine.
I think we’ll either get to the point where AI is so advanced it replaces the manager, the PM, the engineer, the designer, and the CEO, or we’ll keep using formal languages to specify how computers should work.
https://github.com/super-productivity/super-productivity/wik...
Even with a well-described framework it is still hard to maintain proper boundaries and there is always a temptation to mix things together.
The compiler ensures that the code is valid, and what ensures that ‘// used a suboptimal sort because reasons’ is updated during a global refactor that changes the method? … some dude living in that module all day every day exercising monk-like discipline? That is unwanted for a few reasons, notably the routine failures of such efforts over time.
Module names and namespaces and function names can lie. But they are also corrected wholesale and en-masse when first fixed, those lies are made apparent when using them. If right_pad() is updated so it’s actually left_pad() it gets caught as an error source during implementation or as an independent naming issue in working code. If that misrepresentation is the source of an emergent error it will be visible and unavoidable in debugging if it’s in code, and the subsequent correction will be validated by the compiler (and therefore amenable to automated testing).
Lies in comments don’t reduce the potential for lies in code, but keeping inline comments minimal and focused on exceptional circumstances can meaningfully reduce the number of aggregate lies in a codebase.
There's a generation of people that 'typ lyk dis'.
So yes.
I <3 great commit comments (edit: improved clarity), but I am leaning more heavily toward good comments at the same level as the dev is reading - right there in the code - rather than telling them to look at git blame and find the appropriate commit message (keeping in mind that there might have been changes to the line(s) of code, and commits might intertwine, thus making it a mission to find the commit holding the right message(s)).
edit: I forgot to add - commit messages are great, assuming the people merging the PR into main aren't squashing the commits (a lot of people do this because of a lack of understanding of our friend rebase)
Great point. Well-placed documentation as to why an approach was not taken can be quite valuable.
For example, documenting that domain events are persisted in the same DB transaction as changes to corresponding entities and then picked up by a different workflow instead of being sent immediately after a commit.
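That approach is often called a transactional outbox; a minimal sketch, with the table and event names assumed for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, event TEXT)")

def place_order(order_id):
    # The entity change and the domain event commit atomically:
    # either both are visible or neither is. A separate workflow
    # later polls the outbox table and publishes the events,
    # instead of sending them immediately after the commit.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event) VALUES (?)", (f"order_placed:{order_id}",))

place_order(1)
events = [row[0] for row in conn.execute("SELECT event FROM outbox ORDER BY id")]
```

A comment explaining why events are not published directly saves the next maintainer from "simplifying" away the atomicity guarantee.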
Practically, it only encodes information that made it into `main`, not what an author just mulled over in their head or just had a brief prototype for, or ran an unrelated toy simulation over.