My ability to get this right is often a matter of how well I know the domain. If I don't know the domain as well as I think I do, I fall into a lot of rework. If I know the domain better than I imagine, then I waste my time with a baby-step process when I could have run. All of this is a big judgement call, and I have "regrets" in both directions.
I don't think it holds in 100% of situations, but if you're going to err in one direction or the other, I'd rather do something smaller and release too early than do something bigger and waste time.
See also "Why does the C++ standard ship every three years" (as opposed to ship when the desired features are ready):
https://news.ycombinator.com/item?id=20428703 (2019-07-13, 220 comments)
I think the author is indeed someone who just really enjoys learning and doing all sorts of things, so the rabbitholing is part of the fun that tickles their brain.
Don't fall prey to sunk cost fallacy. Just because you spent hours researching a PhD level topic doesn't mean you now have to use it in your project, if it's not quite the right application.
You start with a simple goal → then research → then keep expanding scope → and never ship.
The people who actually finish things do the opposite: lock scope early, ignore “better ideas”, ship v1.
Most projects don’t fail for lack of ideas; they fail because they never converge.
The real problem is avoidance: cuts are warranted and you don't want to make them, so you ... hide, often by working hard on something else.
The solution is to value your time. Most don't, so (self-) managers instead need to dangle other opportunities: finish this so you can do that. You can't take candy from a baby without trouble; instead, you trade for something else.
This resonates hard. LLMs enable true perfectionism, the ability to completely fulfil your vision for a project. This lets you add many features without burning out due to fatigue or boredom. However (as the author points out), most projects' original goal does not require these complementary features.
Prototype a minority of the time. Research a majority of the time. At some point the ratio flips as research fades out and production increases.
Maybe I lack imagination or curiosity, but it makes it difficult to come up with an idea and follow it through.
do you want to learn a new skill? do you want to scratch a very specific personal itch for just yourself? do you want to solve problems for others as well? do you want to build a startup/business around the idea?
all of these necessitate different approaches and strategies to research and coding. scratching an itch? maybe fully vibe coding is fine. want to learn? ditch the vibes and write by hand and ignore prior art. want to build a business? do some actual market research first and decide if this is something you actually want to pursue.
this post was a good reminder for me to identify the why as early on as possible and to be ok with just building something for myself without always having to monetize a side project which, for me, just zaps all joy from it.
Project where the sole user is you in your kitchen? Sure, hack it together.
Project where you actually want other people to use the product? A research phase matters and helps here.
Consider what the goal is, and the amount of effort to invest typically becomes more evident.
Basically, you will end up dependent on the massive complexity of a compiler, driven by the syntax complexity, and, as the cherry on top, thanks to ISO you'll get feature creep that creates a cycle of planned obsolescence every 5 to 10 years.
Oh, sorry, "they" called that "innovation".
"impediment to action advances action. what stands in the way, becomes the way".
It's in a field that I have little experience with (Information Retrieval). So there is obviously prior art that I could learn from or even integrate with.
This article motivates me further to learn things by focusing on building my own and peek into prior art as I go, when I'm stuck or need ideas.
Recently a Clojure documentary came out and the approach of Rich Hickey was seemingly the opposite: Deep research of prior art, papers, other languages over a long period of time.
However, he also mentioned that he made other languages before. So the larger story starts earlier, by making things and learning from practice.
Maybe that's also the bigger lesson: Don't overthink, start by making the thing. But later when you learned a bunch of practical lessons and maybe hit a wall or two, then you might need that deeper research to push further.
Organizations and Conferences:
1. Insist on doing everything through “channels.” Never permit short-cuts to be taken in order to expedite decisions.
2. Make “speeches.” Talk as frequently as possible and at great length. Illustrate your “points” by long anecdotes and accounts of personal experiences.
3. When possible refer all matters to committees, for “further study and consideration”. Attempt to make the committees as large as possible – never less than five.
4. Bring up irrelevant issues as frequently as possible.
5. Haggle over precise wordings of communications, minutes, resolutions.
6. Refer back to matters decided upon at the last meeting and attempt to re-open the question of the advisability of that decision.
7. Advocate “caution.” Be “reasonable” and urge your fellow-conferees to be “reasonable” and avoid haste which might result in embarrassments or difficulties later on.
8. Be worried about the propriety of any decision – raise the question of whether such action as is contemplated lies within the jurisdiction of the group or whether it might conflict with the policy of some higher echelon.
Managers and Supervisors:
1. Demand written orders.
2. “Misunderstand” orders. Ask endless questions or engage in long correspondence about such orders. Quibble over them when you can.
3. Do everything possible to delay the delivery of orders. Even though parts of the order may be ready beforehand, don’t deliver it until it’s completely ready.
4. Don’t order new working materials until your current stocks have been virtually exhausted, so that the slightest delay in filling your order will mean a shutdown.
5. Order high-quality materials which are hard to get. If you don’t get them, argue about it. Warn that inferior materials will mean inferior work.
6. In making work assignments, always sign out the unimportant jobs first. See that important jobs are assigned to inefficient workers with poor equipment.
7. Insist on perfect work in relatively unimportant products; send back for refinishing those which have the least flaws. Approve other defective parts whose flaws are not visible to the naked eye.
8. Make mistakes in routing so that parts and materials will be sent to the wrong place in the plant.
9. When training new workers, give incomplete or misleading instructions.
10. To lower morale and, with it, production, be pleasant to inefficient workers; give them undeserved promotions. Discriminate against efficient workers; complain unjustly about their work.
11. Hold meetings when there is critical work to be done.
12. Multiply paperwork in plausible ways. Start duplicating files.
13. Multiply the procedures and clearances involved in issuing instructions, making payments, and so on. See that three people have to approve everything where one would do.
14. Apply all regulations to the last letter.
You’ll almost never see a PhD thesis that has anything particularly interesting, novel or directly applicable to the sciences.
I always thought perfectionism meant extremely high achievements (for too great of a cost). But it can also be quitting without any progress because you can't accept anything less than perfect (which may or may not be achievable). Perfectionism can be someone procrastinating on a large task.
Kids these days just want to use prefab libraries and frameworks with a million dependencies doing god knows what and written by randos.
(Unrelated to how commenters these days just want an excuse to use the term "peanut smut".)
Day 400: Having thoroughly described a universal theory of everything, we set out to build an experimental apparatus in orbit at a Lagrange point capable of detecting a universal particle which acts a mediator for all observable forces in the known universe.
This is definitely the wrong way of going about a research project, and I have rarely seen anyone approach research projects this way. You should read two or at most three papers and build upon them. You only do a deep review of the research literature later in the project, once you have some results and you have started writing them down.
I had a coworker who would always be diplomatic about code changes he felt could be improved, but when he felt he was nitpicking, he would say: "It's better than it was." It allowed him to provide criticism while also giving permission to go ahead even if there were minor things that weren't perfect. I strongly endorse this kind of attitude.
Acknowledge it is normal? Attempt to buy deeper into the delusion ("Yeah my work is awesome and unique!"). Use stimulants to force enthusiastic days every once in awhile?
Uhh... unless you plan to stay in academia? Then, this is a terrible idea.
If it helps anything at all: It's normal. At this point, you've already proven you're smart and knowledgeable. Now, the universe wants to see if you can also finish what you've started. That's the main thing a PhD proves: That you can take an incredibly interesting topic and then do all the boring stuff that they need you to do to be formally compliant with arbitrary rules.
Focus on finishing. Reduce the scope as much as possible again. Down to your core message (or 3-4 core messages, I guess, for paper-based dissertations).
Listen to the feedback you get from your advisor.
You got this!
But there are some things to remember that are incredibly important:
- a paper doesn't *prove* something, it suggests it is *probably* right
- under the conditions of the paper's settings, which aren't yours
- just because someone had X outcome before doesn't mean you won't get Y outcome
- those small details usually dominate success
- sometimes a seemingly throwaway one-line sentence is what you're missing
- sometimes the authors don't know and the answer is 5 papers back that they've been building on
- DO NOT TREAT PAPERS AS *ABSOLUTE* TRUTH
- no one is *absolutely* right, everyone is *some* degree of wrong
- other researchers are just like you, writing papers just like you
- they also look back at their old papers and say "I'm glad I'm not that bad anymore"
- a paper demonstrating your idea is a positive signal, you're thinking in the right direction
As soon as you start treating papers as "this is fact" you tend to overly generalize the results. But the details dominate, so you just kill your own creativity. You kill your own ideas before you know they're right or wrong. More importantly, you don't know how right or how wrong.

There's a difference between "nit: this could be changed to XYZ" and "we should use XYZ here", where it was understood that nits could be ignored if you felt they were a preference rather than something urgent.

The trick to overcoming this is not to aim for "clean" but for "cleaner than before". Just keep chipping away at it, whether it is a messy codebase or a messy kitchen.
That was also on my mind thanks to the documentary. Then I followed up with "Simple Made Easy" and "Hammock Driven Development", and it makes me want to learn Clojure.
Clojure documentary on CultRepo channel: https://www.youtube.com/watch?v=Y24vK_QDLFg
Simple Made Easy: https://www.youtube.com/watch?v=SxdOUGdseq4
Hammock Driven Development: https://www.youtube.com/watch?v=f84n5oFoZBc
Like, if you stay focused, is it even really a side project?
Which is why my 2d top-down sprite-based rpg now has a 3d procedural animation engine, a procedural 3d character generator with automagic rigging, a population simulator that would put Europa Universalis to shame if I ever get around to finishing it (ha!), a pixel art editor, and a 2d procedural animation engine using active ragdolls...
You might wonder why a 2d game needs 3d procedural animation, well...
The scope creeps in mysterious ways
in my field this would be terrible advice. instead you need to be doing something that your audience actually will give a shit about.
Moreover, I am not suggesting you don't look at other papers at all. But google scholar and some quick skimming of abstracts and papers you find should suffice to check if someone has already done the work. If you start fully reading more than a handful of papers, your ideas are already locked in by what others have done, and it becomes way harder to produce something novel.
What I am describing would be something higher level, more like a comment on approach, or an observation that there is some high-level redundancy or opportunity for refactor. Something like "in an ideal world we would offload some of this to an external cache server instead of an in-memory store but this is better than hitting the DB on every request".
That kind of observation may come up in a top-level comment on a code review, but it might also come up in a tech review long before a line of code has been written. It is about extending that attitude to all aspects of dev.
The other saying I say is "completion not perfection". That helps me in yard work especially. I'm not going for the cover shot of "Better Homes and Gardens", I just need the lawn to be cut.
The sand blows in endlessly. You don’t aim for a pristine, sandless land. But you can’t ignore it or it takes over.
I’ll just pick up a few things and ferry them towards their “home.” Or go do a small amount of yard work. Etc.
When I did my MSc thesis he told me it was a pretty good PhD. (Before giving me a month’s work in corrections.) I didn’t understand back then, but I understand now. It was small, replicable, and novel (still is)! Just replicate three times and be done with it. You’ve proven your mastery. Now start something serious.
My professor once told me he presented at a small conference where everybody in the audience had a PhD in mathematics, and maybe 2 of the 50 or so people could follow along. The point he was trying to make is that at some point the people in the audience were not really interested in what was being presented, because it is difficult to just follow along with some really niche topic.
Replicating existing results doesn't meet that criterion, so unknowingly repeating someone's work is an existential crisis for PhD students. It can mean that you worked for 4-6 years on something that the committee then can't/won't grant a doctorate for, effectively forcing you to start over.
Theoretically, your advisor is supposed to help prevent this as well by guiding you in good directions, but not all advisors are created equal.
He discussed this topic and how generally it's left to those who are more notable in a field to ask the 'dumb' questions everyone else is afraid to ask. And such questions often need to be asked to get the audience on board and open the floodgates with areas of niche research - the speaker themself is often too far into the rabbit hole to discern the difference between opaque and obvious.
So it stands to reason, at smaller conferences this would be a big problem, with fewer thought leaders in attendance whose reputations are intact enough that they wouldn't mind looking foolish.
It’s as if a committee of middle managers got together and said, “how can we replicate and scale the work of people like Einstein?”
Hi friends,
I’ll be attending Babashka Conf on May 8 and Dutch Clojure Days on May 9. If you’re attending either (or just visiting Amsterdam), drop me a line!
When I have an idea for a project, it tends to go in one of these two directions:
1. I just do it. Maybe I make a few minor revisions, but often it turns out exactly how I’d imagined and I’m happy.
2. I think, “I should look for prior art”. There’s a lot of prior art, dealing with a much broader scope than I’d originally imagined. I start to wonder if I should incorporate that scope. Or perhaps try to build my thing on top of the existing sorta-nearby solutions. Or maybe I should just use the popular thing. Although I could do a better job than that thing, if I put a bunch of time into it. But actually, I don’t want to maintain a big popular project, nor do I want to put that much time into this project. Uh oh, now I’ve spent a bunch of time, having neither addressed the original issue nor experienced the joy of creating something.
I prefer the first outcome, and I think the pivotal factor is how well I’ve internalized my own success criteria.
For example, last weekend I hosted my friend Marcin and we decided it’d be fun to do some woodworking, so we threw together this shelf and 3d-printed hangers for my kitchen:

Absolute banger of a project:
The main success criterion was to jam on woodworking with a friend, and that helped me not overthink the object-level success criteria: Just make a shelf for my exact kitchen!
In contrast, this past Friday I noticed difftastic did a poor job, so I decided to shop around for structural/semantic diff tools and related workflows (a topic I’ve never studied, that I’m increasingly interested in as I’m reviewing more and more LLM-generated code).
I spent 4 hours over the weekend researching existing tools (see my notes below), going through dark periods of both “semantic tree diffing is a PhD-level complex problem” and “why do all of these have MCP servers? I don’t want an MCP server”, before I came to my senses and remembered my original success criteria: I just want a nicer diffing workflow for myself in Emacs, I should just build it myself — should take about 4 hours.
I’m cautiously optimistic that, having had this realization and committing myself to a minimal scope, I’ll be able to knock out a prototype before running out of motivation.
However, other long-running interests of mine seem to be deep in the well of outcome #2.
That is, I’ve spent hundreds of hours on background research and little prototypes, but haven’t yet synthesized anything that addresses the original motivating issue.
It’s not quite that I regret that time — I do love learning by reading — but I have a nagging sense of unease that my inner critic (fear of failure?) is silencing my generative tendencies, keeping me from the much more enjoyable (and productive!) learning by doing.
I think in these cases the success criteria has been much fuzzier: Am I trying to replace my own usage of Rust/Clojure? Only for some subset of problems? Or is it that I actually just need a playground to learn about language design/implementation, and it’s fine if I don’t end up using it?
Ditto for CAD: Am I trying to replace my commercial CAD tool in favor of my own? Only for some subset of simple or particularly parametric parts? Do I care if it’s useful for others? Does my tool need to be legibly different from existing open-source tools?
It’s worth considering these questions, sure. But at the end of the day, I’d much rather have done a lot than have only considered a lot.
So I’m trying to embrace my inner clueless 20-year-old and just do things — even if some turn out to be “obviously bad” in hindsight, I’ll still be coming out ahead on net =D
Of course, there’s only so much time to “just do things”, and there’s a balance to be had. I’m not sure how many times I’ll re-learn YAGNI (“you ain’t gonna need it”) in my career, but I was reminded of it again after writing a bunch of code with an LLM agent, then eventually coming to my senses and throwing it all out.
I wanted a Finda-style filesystem-wide fuzzy path search for Emacs. Since I’ve built (by hand, typing the code myself!) this exact functionality before (walk filesystem to collect paths, index them by trigram, do fast fuzzy queries via bitmap intersections), I figured it’d only take a few hours to supervise an LLM to write all the code.
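The pipeline described (walk the filesystem, index paths by trigram, intersect candidate sets) can be sketched in Python. Everything below is illustrative rather than Finda's actual code: plain Python sets stand in for compressed bitmaps, and a simple substring check serves as the verification pass.

```python
from collections import defaultdict

def trigrams(s):
    """Lowercased character trigrams of a string."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self, paths):
        self.paths = list(paths)
        # trigram -> set of path ids; a real index would use compressed bitmaps
        self.index = defaultdict(set)
        for pid, path in enumerate(self.paths):
            for t in trigrams(path):
                self.index[t].add(pid)

    def search(self, query):
        qgrams = trigrams(query)
        if not qgrams:  # query shorter than 3 chars: nothing to filter on
            return list(self.paths)
        # A match must contain every trigram of the query, so intersect the sets.
        candidates = set.intersection(*(self.index.get(t, set()) for t in qgrams))
        # Trigram overlap admits false positives, so verify each candidate.
        return [self.paths[i] for i in sorted(candidates)
                if query.lower() in self.paths[i].lower()]
```

The intersection step is why this is fast: the query only ever touches the (usually tiny) candidate sets for its own trigrams, never the full path list.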
I started with a “plan mode” chat, and the LLM suggested a library, Nucleo, which turned up since I wrote Finda (10 years ago, eek!). I read through it, found it quite well-designed and documented, and decided to use it so I’d get its smart case and Unicode normalization functionality. (E.g., query foo matches Foo and foo, whereas query Foo won’t match foo; similarly for cafe and café.)
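The smart-case and normalization semantics described there can be approximated in a few lines of Python (this sketches the behavior only, not Nucleo's implementation):

```python
import unicodedata

def fold_diacritics(s):
    """café -> cafe: decompose, then drop combining marks."""
    return "".join(ch for ch in unicodedata.normalize("NFKD", s)
                   if not unicodedata.combining(ch))

def smart_case_match(query, candidate):
    """Substring match with diacritics folded, case-insensitive
    unless the query contains an uppercase letter (smart case)."""
    q, c = fold_diacritics(query), fold_diacritics(candidate)
    if any(ch.isupper() for ch in query):
        return q in c              # uppercase present: match case-sensitively
    return q.lower() in c.lower()  # all-lowercase query: ignore case
```

So `smart_case_match("foo", "Foo")` succeeds while `smart_case_match("Foo", "foo")` does not, matching the asymmetry described above.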
Finding a great library wasn’t the problem; the problem was that Nucleo also supported some extra functionality: anchors (^foo only matches at the beginning of a line).
This got me thinking about what that might mean in a corpus that consists entirely of file paths. Anchoring to the beginning of a line isn’t useful (everything starts with /), so I decided to try and interpret the anchors with respect to the path segments. E.g., ^foo would match /root/foobar/ but not /root/barfoo/.
But to do this efficiently, the index needs to keep track of segment boundaries so that the query can be checked against each segment quickly.
But then we also need to handle a slash occurring in an anchored query (e.g., ^foo/bar) since that wouldn’t get matched when only looking at segments individually (root, foo, bar, and baz of a matching path /root/foo/bar/baz/).
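A hypothetical sketch of those segment-anchored semantics: testing the needle against every suffix of the path that begins at a segment boundary handles both the plain `^foo` case and the slash-containing `^foo/bar` case.

```python
def anchored_match(query, path):
    """Interpret a leading ^ as anchored to the start of a path segment.

    ^foo matches /root/foobar/ but not /root/barfoo/, and ^foo/bar must
    match across a segment boundary, so we check every suffix of the
    path that starts at a segment boundary.
    """
    if not query.startswith("^"):
        return query in path
    needle = query[1:]
    segments = [s for s in path.split("/") if s]
    return any("/".join(segments[i:]).startswith(needle)
               for i in range(len(segments)))
```

A production version would precompute segment-boundary offsets in the index rather than re-splitting every path at query time, which is exactly the bookkeeping the paragraph above describes.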
Working through this took several hours: first throwing around design ideas with an LLM, having it write code to wrap Nucleo’s types, then realizing its code was bloated and didn’t spark joy, so finally writing my own (smaller) wrapper.
Then, after a break, I realized:
Instead of anchors, I can just add / to the start or end of a query (this works for everything except anchoring to the end of a filename). So I tossed all of the anchoring code.
I’m pretty sure I still came out ahead compared to if I’d tried to write everything myself sans LLM or discussion with others, but I’m not certain.
Perhaps there’s some kind of conservation law here: Any increases in programming speed will be offset by a corresponding increase in unnecessary features, rabbit holes, and diversions.
Speaking of unnecessary diversions, let me tell you everything I’ve learned about structural diffing recently — if you have thoughts/feelings/references in this space, I’d love to hear about ‘em!
When we’re talking about code, a “diff” usually means a summary of the line-by-line changes between two versions of a file. This might be rendered as a “unified” view, where changed lines are prefixed with + or - to indicate whether they’re additions or deletions. For example:

We’ve removed coffee and added apple.
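Python's stdlib difflib produces exactly this kind of unified output (the surrounding grocery-list contents here are assumed for illustration):

```python
import difflib

old = ["bread", "coffee", "milk"]
new = ["bread", "milk", "apple"]

# lineterm="" keeps difflib from appending newlines to the header lines.
for line in difflib.unified_diff(old, new,
                                 fromfile="a/groceries.txt",
                                 tofile="b/groceries.txt",
                                 lineterm=""):
    print(line)
```

This prints the `---`/`+++` headers, a `@@ -1,3 +1,3 @@` hunk header, and then the context lines with `-coffee` and `+apple` marked.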
The same diff might also be rendered in a side-by-side view, which can be easier to read when there are more complex changes:

The problem with these line-by-line diffs is that they’re not aware of higher-level structure like functions, types, etc. — if some braces match up somehow between versions, they might not be shown at all, even if the braces “belong” to different functions.
There’s a wonderful tool, difftastic, which tries to address this by calculating diffs using treesitter-provided concrete syntax trees. It’s a huge improvement over line-based diffs, but unfortunately it doesn’t always do a great job matching entities between versions.
Here’s the diff that motivated this entire foray:

Note that it doesn’t match up struct PendingClick; it shows it as deleted on the left and added on the right.
I haven’t dug into why difftastic fails to match here, but I do feel like it’s wrong — even if the overall diff would be longer, I’d still rather see PendingClickRequest and PendingClick matched up between both sides.
Here’s a summary of tools / references in the space:
The most “baked” and thoughtful semantic diff tool I found is, perhaps unsurprisingly, semanticdiff.com, a small German company with a free VSCode plugin and web app that shows diffs for github PRs. Unfortunately they don’t have any code libraries I can use as a foundation for the workflow I want.
Context-sensitive keywords in particular were a constant source of annoyance. The grammar looks correct, but it will fail to parse because of the way the lexer works. You don’t want your tool to abort just because someone named their parameter “async”.
mergiraf: treesitter-based merge-driver written in Rust
weave: also a treesitter-based merge-driver written in Rust
sem: sem diff --verbose HEAD~4 showed lines as having changed that…didn’t change at all.
diffast: tree edit-distance of ASTs based on an algorithm from a 2008 academic paper.
autochrome: Clojure-specific diffs based on dynamic programming
Tristan Hume has a great article on Designing a Tree Diff Algorithm Using Dynamic Programming and A*
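The dynamic-programming core such algorithms build on is the classic longest-common-subsequence recurrence. A minimal line-level version (nothing like Hume's full tree algorithm, just the shared skeleton) looks like:

```python
def dp_diff(a, b):
    """Minimal LCS-style diff via dynamic programming.

    Returns a list of (op, item) where op is ' ', '-', or '+'.
    Tree-diff algorithms use the same recurrence over tree nodes,
    with smarter search (e.g. A*) to keep it tractable.
    """
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:], b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table to emit keep/delete/insert operations.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append((" ", a[i])); i += 1; j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append(("-", a[i])); i += 1
        else:
            out.append(("+", b[j])); j += 1
    out += [("-", x) for x in a[i:]] + [("+", x) for x in b[j:]]
    return out
```

The interesting part of the tree-diff papers is precisely what replaces the `a[i] == b[j]` test: a cost model for matching, renaming, or moving subtrees rather than comparing lines for equality.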
My primary use case is reviewing LLM output turn-by-turn — I’m very much in-the-loop, and I’m not letting my agent (or dozens of them, lol) run wild generating 10k+ lines of code at a time.
Rather, I give an agent a scoped task, then come back in a few minutes and want to see an overview of what it did and then either revise/tweak it manually in Emacs or throw the whole thing out and try again (or just write it myself).
The workflow I want, then, is to
Basically, I want something like Magit’s workflow for reviewing and staging changes, but on an entity level rather than file/line level.
In light of the "minimal scope, just get your project done” lesson I’ve just re-learned for the nth time, my plan is to:
Once that seems reasonable (i.e., it does a better job than difftastic did on that specific commit), I’ll:
Mayyybe if I’m happy with it I’ll end up releasing something. But I’m not trying to collect Github stars or HN karma, so I might just happily use it in the privacy of my own home without trying to “commercialize it”.
After all, sometimes I just want a shelf.
I’m in the market for a few square meters of Tyvek or other translucent, non-woven material suitable for building a light diffuser — let me know if you have any favorite vendors that can ship to the EU.
How They Made This - Coinbase Commercial Breakdown. Crypto is a negative-sum parasite on productive economic activity, but has the silver lining of funneling a lot of capital to weird creative folks.
The Easiest Way To Design Furniture…. Laura Kampf on getting off the computer and designing physical spaces with tape, lil’ wood sticks, and cardboard.
Hotel California - Reimagined on the Traditional Chinese Guzheng
C is not a low-level language: Your computer is not a fast PDP-11.
Loon is a Lisp. Thrilled to discover I’m not the only one who wants to mash together Clojure and Rust. The current implementation seems to have been manically vibe-coded and I quickly ran into some terrible bugs, but on the other hand it exists so I’m not going to be a hater.
Made a print in place box so I can easily hand out printed bees🐝. “I’m quite content with the result, the bees fit snugly and the box opens and closes nicely”
“There isn’t a lot of reliable information out there about how to buy a gas mask, especially for the specific purpose of living under state repression. But hopefully after reading this guide you’ll feel equipped to make an educated decision.”
“This zoomable map shows every page of every issue of BYTE starting from the front cover of the first issue (top left) to the last page of the final edition (bottom right). The search bar runs RE2 regex over the full text of all 100k pages.” A lovely reminder that user-interfaces can be extremely fast and information dense.