2025 Taxes
Dumped all PDFs of all my tax forms into a single folder, asked Claude to rename them nicely. Asked it to use Gemini 2.5 Flash to extract all tax-relevant details from all statements / tax forms. Had it put together a web UI showing all income, deductions, etc, for the year. Had it estimate my 2025 tax refund / underpayment.
Result was amazing. I now actually fully understand my tax position. It broke down all the progressive tax brackets, added notes for all the extra federal and state taxes (e.g. Medicare, CA Mental Health tax, etc).
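For anyone curious what the bracket breakdown amounts to, here is a minimal sketch of a progressive tax calculation. The brackets and rates are made up for illustration and are not real 2025 federal or California figures; `BRACKETS` and `progressive_tax` are hypothetical names.

```python
from decimal import Decimal

# Hypothetical brackets for illustration only: (lower bound, upper bound, rate).
# These are NOT real federal or state figures.
BRACKETS = [
    (Decimal("0"), Decimal("10000"), Decimal("0.10")),
    (Decimal("10000"), Decimal("40000"), Decimal("0.20")),
    (Decimal("40000"), None, Decimal("0.30")),
]


def progressive_tax(taxable_income, brackets=BRACKETS):
    """Return (total tax, per-bracket breakdown) for a taxable income."""
    total = Decimal("0")
    breakdown = []
    for lower, upper, rate in brackets:
        if taxable_income <= lower:
            break
        # Only the slice of income inside this bracket is taxed at its rate.
        top = taxable_income if upper is None else min(taxable_income, upper)
        slice_tax = (top - lower) * rate
        breakdown.append((lower, top, rate, slice_tax))
        total += slice_tax
    return total, breakdown


total, breakdown = progressive_tax(Decimal("50000"))
for lower, top, rate, slice_tax in breakdown:
    print(f"{lower} to {top} at {rate}: {slice_tax}")
print("total:", total)
```

The point of the breakdown is that a higher bracket only applies to the income above its lower bound, never to the whole amount.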
Finally had Claude prepare all of my docs for upload to my accountant: FinCEN reporting, summary of all docs, etc.
Desk Fabrication
Planning on having a furniture maker fabricate a custom solid walnut desktop for an office standing desk. Want to create a STEP file of the exact cuts / bevels / countersinks / etc to help with fabrication.
Worked with Codex to plan out and then build an interactive in-browser 3D CAD experience. I can ask Codex to add some component (e.g. a grommet) and it will generate parameterized B-rep geometry for that feature and then allow me to control the parameters live in the web UI.
Codex found the Open CASCADE Technology (OCCT) B-rep modeling library, which has a WebAssembly-compiled version, and integrated it.
Now have a WebGL view of the desk, can add various components, change their parameters, and see the impact live in 3D.
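To give a rough idea of what "parameterized" means here, a minimal sketch of a desk with a grommet feature as plain parameter records. This is not OCCT and not the code Codex generated; every class, field name, and the 20 mm edge margin is a hypothetical stand-in, and a real B-rep kernel would turn these parameters into solid geometry.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Grommet:
    """A circular cable pass-through (hypothetical parameter record)."""
    x_mm: float          # centre, measured from the left edge
    y_mm: float          # centre, measured from the front edge
    diameter_mm: float

    def removed_volume_mm3(self, thickness_mm: float) -> float:
        # A through-hole removes a cylinder of material.
        return math.pi * (self.diameter_mm / 2) ** 2 * thickness_mm


@dataclass
class Desk:
    width_mm: float
    depth_mm: float
    thickness_mm: float
    grommets: list = field(default_factory=list)

    def add_grommet(self, grommet: Grommet, edge_margin_mm: float = 20.0) -> None:
        # Reject holes that would sit too close to, or break through, an edge.
        r = grommet.diameter_mm / 2
        ok_x = edge_margin_mm + r <= grommet.x_mm <= self.width_mm - edge_margin_mm - r
        ok_y = edge_margin_mm + r <= grommet.y_mm <= self.depth_mm - edge_margin_mm - r
        if not (ok_x and ok_y):
            raise ValueError("grommet does not fit inside the desk outline")
        self.grommets.append(grommet)

    def volume_mm3(self) -> float:
        slab = self.width_mm * self.depth_mm * self.thickness_mm
        return slab - sum(g.removed_volume_mm3(self.thickness_mm) for g in self.grommets)


desk = Desk(width_mm=1800, depth_mm=800, thickness_mm=40)
desk.add_grommet(Grommet(x_mm=1700, y_mm=700, diameter_mm=60))
print(round(desk.volume_mm3()))
```

Changing `diameter_mm` and recomputing is the same loop the web UI runs, just without the 3D view.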
I'm starting to believe using them is more likely to make you obsolete than not.
What I find interesting is that I have little motivation to open source it. Making it usable for others requires a substantial amount of time, which would otherwise be just a fraction of the development time.
What do people think of it?
I personally don't think that's a badge of honor. Aside from losing your coding skills, you miss opportunities to generate AI pieces and connect them to existing systems that can't be fed into the AI. Plus, making small changes yourself is easier than having the AI make them without messing something else up.
Hope you don't get audited
You couldn’t do that with TurboTax or block’s tax file? You don’t have to submit or pay.
What scares me though is how I've (still) seen ChatGPT make up numbers in some specific scenarios.
I have a ChatGPT project with all of my bloodwork and a bunch of medical info from the past 10 years uploaded. I think it's more context than ChatGPT can handle at once. When I ask it basic things like "Compare how my lipids have trended over the past 2 years" it will sometimes make up numbers for tests, or it will mix up the dates on certain data points.
It's usually very small errors that I don't notice until I really study what it's telling me.
And also the opposite problem: A couple days ago I thought I saw an error (when really ChatGPT was right). So I said "No, that number is wrong, find the error" and instead of pushing back and telling me the number was right, it admitted to the error (there was no error) and made up a reason why it was wrong.
Hallucinations have gotten way better compared to a couple years ago, but at least ChatGPT seems to still break down especially when it's overloaded with a ton of context, in my experience.
Code doesn't need to be "beautiful", but the beauty of code has nothing to do with maintainability. Linus once said "Bad programmers worry about the code. Good programmers worry about data structures and their relationships." The actual hard part of software is not the code, it's what isn't in the code - the assumptions, relationships, feedback loops, emergent behaviours, etc. Maintainability in that regard is about system design. Imagine software as a graph, the nodes being pieces of code and the edges being those implicit relationships. LLMs are good at generating the nodes but useless at the edges.
The only thing that seems to work is to have validation criteria (e.g. a test suite) that the LLM can use to do a guided random walk towards a solution where the edges and nodes align to satisfy the criteria. This can be useful if what you are doing doesn't really matter, like in the case of all the pet projects and tools people share. But it does matter if your program assumes responsibility somewhere, like if you're handling user data. This idea of guardrail-style programming has been around for a while, but nobody drives by bouncing off the guardrails to get to their destination, because it's much more efficient to encode what a program should do instead of what it shouldn't, which is the case with this type of mega-test-driven-development. Is it more efficient to tell someone where not to go when giving directions as opposed to telling them how to get there?
Take the Cloudflare Next.js experiment for example - their version passed all the Next.js tests but still had issues because the test suite didn't even come close to encoding how the system works.
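A toy illustration of that failure mode (unrelated to the actual Next.js experiment; the function and the test cases are made up): an implementation can satisfy every test it was given and still be wrong about the behaviour the tests never encoded.

```python
def is_leap_year(year: int) -> bool:
    # Deliberately incomplete: ignores the century rules.
    return year % 4 == 0


# A hypothetical test suite that the implementation satisfies completely.
TEST_CASES = {2000: True, 2004: True, 2023: False, 2024: True}

passed = all(is_leap_year(year) == expected for year, expected in TEST_CASES.items())
print("suite passes:", passed)

# 1900 is not a leap year (divisible by 100 but not by 400), yet no test says so,
# so the wrong answer below goes unnoticed.
print("is_leap_year(1900):", is_leap_year(1900))
```

The suite is green, but the century rule lives only in someone's head, which is exactly the kind of edge the tests did not capture.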
So no, you still need to care about maintainability. You don't need to obsess over code aesthetics or design patterns or whatever, but you never needed to do that. In fact, more than ever programmers need to be concerned with the edges of their software and how they can guide the LLMs to generate the nodes (code) while maintaining the invariants of the edges.
This is a narrow view of software engineering. Thinking that your role is "code that works" is hardly better than thinking you're a "(human) resource that produces code". Your job is to provide value. You do that by building knowledge, not only of the system you're developing but of the problem space you're exploring, the customers you're serving, the innovations you can do that your competitors can't.
It's like saying that a soccer player's purpose is "to kick a ball" and therefore a machine that launches balls faster and further than any human will replace all soccer players, and soon all professional teams will be made up of robots.
*Piece Together*
An animated puzzle game that I built with a fairly heavy reliance on agentic coding, especially for scaffolding. I did have to jump in and tweak some things manually (the piece-matching algorithm, responsive design, etc.), but overall I’d estimate that LLMs handled about 80% of the work. It's heavily based on the concept of animated puzzles in the early edutainment game The Island of Dr. Brain.
https://animated-puzzles.specr.net
*Lend Me Your Ears*
Lend Me Your Ears is an interactive web-based game inspired by the classic Simon toy (originally by Milton Bradley). It presents players with a sequence of musical notes and challenges them to reproduce the sequence using either an on-screen piano, MIDI keyboard, or an acoustic instrument such as a guitar.
https://lend-me-your-ears.specr.net
*Shâh Kur - Invisible Chess*
A voice-controlled blindfold chess game that uses novel approaches (last N pieces moved hidden, fade over time, etc). I've already been playing it daily on my walks.
*Word game to find the common word*
It's based off an old word game where one person tries to come up with three words: sign, watch, bus. The other person has to think of a common word that forms compound-style words with each of them: stop.
I was quite surprised to see that this didn't exist online already.
https://common-thread.specr.net
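The rule of the word game can be sketched in a few lines. This is only an illustration of the matching rule, not the site's code; the tiny `COMPOUNDS` list and the function names are hypothetical, and a real game would need a much larger dictionary.

```python
# Hypothetical mini word list for illustration.
COMPOUNDS = {"stop sign", "stopwatch", "bus stop", "bus pass", "watch tower", "sign post"}
NORMALISED = {c.replace(" ", "") for c in COMPOUNDS}


def links(candidate: str, clue: str) -> bool:
    """True if candidate and clue form a known compound in either order."""
    return candidate + clue in NORMALISED or clue + candidate in NORMALISED


def common_words(clues, candidates):
    """Return the candidates that pair up with every clue."""
    return [w for w in candidates if all(links(w, c) for c in clues)]


print(common_words(["sign", "watch", "bus"], ["stop", "pass", "tower"]))
```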
*A Slide Puzzle*
Slide puzzles for qualified MENSA members. I built it for a friend who's basically a real-life equivalent of Dustin Hoffman's character from Rain Man. So you might have to rearrange a slide puzzle from the periodic table of elements, or the U.S. presidents by portrait, etc.
https://slide-puzzles.specr.net
*Glyphshift*
Transforms random words on web pages into different writing systems like Hiragana, Braille, and Morse Code to help you learn and practice reading these alphabets so you can practice the most functionally pointless task, like being able to read braille visually.
https://github.com/scpedicini/glyph-shift
All of these were built with varying levels of assistance from agentic coding. None of them were purely vibe-coded and there was a great deal of manual and unit testing to verify functionality as it was built.
I prefer having Claude make even small changes at this point since every change it makes ends up tweaking it to better understand something about my coding convention, standard, interpretation etc... It does pick up on these little changes and commits them to memory so that in the long run you end up not having to make any little changes whatsoever.
And to drive this point further, even prior to using LLMs, if I review someone's work and see even a single typo or something minor that I could probably just fix in a second, I still insist that the author is the one to fix it. It's something my mentor at Google did with me which at the time I kind of felt was a bit annoying, but I've come to understand their reason for it and appreciate it.
I imagine your accountant had the same reaction I do when an amateur shows me their vibe codebase.
1. I keep all my accounts in accounting software (originally Wave, then beancount)
2. Because the machinery is all in programmatically queryable means, the data is not in token-space, only the schema and logic
I then use tax software to prep my professional and personal returns. The LLM acts as a validator, and ensures I've done my accounts right. I have `jmap` pull my mail via IMAP, my Mercury account via a read-only transactions-only token and then I let it compare against my beancount records to make sure I've accounted for things correctly.
For the most part, you want it to be handling very little arithmetic in token-space though the SOTA models can do it pretty flawlessly. I did notice that they would occasionally make arithmetic errors in numerical comparison, but when using them as an assistant you're not using them directly but as a hypothesis generator and a checker tool and if you ask it to write out the reasoning it's pretty damned good.
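A minimal sketch of what keeping the arithmetic out of token-space can look like: the comparison is done by a small script with exact decimals, and the model only has to read the differences. This does not use the beancount or Mercury APIs; the `(date, amount)` row shape and the `reconcile` name are assumptions for illustration.

```python
from collections import Counter
from decimal import Decimal


def reconcile(bank_rows, ledger_rows):
    """Compare two lists of (date, amount) pairs as multisets.

    Returns (missing_from_ledger, missing_from_bank).
    """
    bank = Counter((d, Decimal(a)) for d, a in bank_rows)
    ledger = Counter((d, Decimal(a)) for d, a in ledger_rows)
    # Counter subtraction keeps only positive counts, so duplicates are handled.
    return sorted((bank - ledger).elements()), sorted((ledger - bank).elements())


# Made-up sample data.
bank_rows = [("2025-01-03", "-42.10"), ("2025-01-05", "1500.00"), ("2025-01-05", "1500.00")]
ledger_rows = [("2025-01-03", "-42.10"), ("2025-01-05", "1500.00"), ("2025-01-09", "-12.00")]

missing_from_ledger, missing_from_bank = reconcile(bank_rows, ledger_rows)
print("in bank, not in ledger:", missing_from_ledger)
print("in ledger, not in bank:", missing_from_bank)
```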
For me Opus 4.6 in Claude Code was remarkable for this use-case. These days, I just run `,cc accounts` and then look at the newly added accounts in fava and compare with Mercury. This is one of those tedious-to-enter trivial-to-verify use-cases that they excel at.
To be honest, I was fine using Wave, but without machine-access it's software that's dead to me.
And it usually takes just as long.
Similarly to your directions analogy, I’ve been using the analogy of trying to ensure that a 1,000-restaurant franchise produces the exact same peanut butter sandwich for every customer.
It’s much easier to figure out the primitives that your employees understand and then use those primitives to describe exactly how to build a sandwich than it is to write a massive specification that describes what they should produce and just let them figure it out.
It still has gaps. I don't think they've landed on the right model for CI. Like Earthly, their model is a CI runner + local cache. I believe a distributed cache (like Bazel) makes more sense.
If I were choosing between the two I'd personally always pick Dagger, but I think there is a strong argument for Earthly for simpler projects. If you're using multiple Earthfiles or a few hundred lines of Earthly, I think you've outgrown it.
* https://www.stavros.io/posts/i-made-a-voice-note-taker/ - A voice note recorder.
* https://github.com/skorokithakis/stavrobot - My secure AI personal assistant that's made my life admin massively easier.
* https://github.com/skorokithakis/macropad - A macropad.
* https://github.com/skorokithakis/sleight-of-hand - A clock that ticks seconds irregularly but is accurate for minutes.
* https://pine.town - A whimsical little massively multiplayer drawing town.
* https://encyclopedai.stavros.io - A fictional encyclopedia.
* https://justone.stavros.io - A web implementation of the board game Just One.
* https://www.themakery.cc - The website and newsletter for my maker community.
* https://theboard.stavros.io - A feature board that implements itself.
* https://github.com/skorokithakis/dracula - A blood test viewer.
* https://github.com/skorokithakis/support-email-bot - An email bot to answer common support queries for my users.
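As an aside on the irregular clock in the list above, one way such a thing could work (a guess at the idea, not the code in that repository; the function name and the jitter parameter are hypothetical) is to draw random tick lengths and rescale them so each minute still lasts exactly 60 seconds:

```python
import random


def irregular_ticks(n_ticks=60, total_seconds=60.0, jitter=0.5, seed=None):
    """Return n_ticks uneven intervals that still add up to total_seconds."""
    rng = random.Random(seed)
    # Each tick gets a random weight around 1.0 ...
    weights = [rng.uniform(1 - jitter, 1 + jitter) for _ in range(n_ticks)]
    # ... and the weights are rescaled so the minute as a whole is exact.
    scale = total_seconds / sum(weights)
    return [w * scale for w in weights]


intervals = irregular_ticks(seed=1)
print(f"shortest {min(intervals):.2f}s, longest {max(intervals):.2f}s, "
      f"total {sum(intervals):.2f}s")
```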
Maybe some of these will beat the rap.
I chuckle when I see some of them because you could achieve the same (or often faster) result by jotting a note onto a notecard and sticking it in your pocket.
Most of the other automations running don't really seem to serve any real purpose at all.
But hey, if it's fun, have at it.
(I actually did write my own note-taking application, but that was before LLMs were any good at writing code.)
To then "aggregate" all of the json outputs, I had Claude look at the json outputs, and then iterate on a Python tool to programmatically do it. I saw it iterating a few times on this: write the most naive Python tool, run it, throws exception, rinse and repeat, until it was able to parse all the json files sensibly.
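The kind of tool it converged on looks roughly like the sketch below. This is a reconstruction of the general shape, not the actual script; the field names in the sample files (`form`, `wages`, `interest`) are invented for the demo.

```python
import json
import tempfile
from pathlib import Path


def aggregate(folder):
    """Merge every *.json file in folder into one list, collecting failures."""
    records, failures = [], []
    for path in sorted(Path(folder).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
            failures.append((path.name, str(exc)))
            continue
        # Accept either a single object or a list of objects per file.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict):
                records.append({"source": path.name, **item})
            else:
                failures.append((path.name, "unexpected top-level value"))
    return records, failures


# Demo with made-up files, including one that does not parse.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.json").write_text('{"form": "W-2", "wages": 1000}', encoding="utf-8")
    Path(tmp, "b.json").write_text('[{"form": "1099-INT", "interest": 12}]', encoding="utf-8")
    Path(tmp, "c.json").write_text("{not valid json", encoding="utf-8")
    records, failures = aggregate(tmp)

print(len(records), "records;", len(failures), "failures")
```

Collecting failures instead of raising on the first bad file is what makes the "run it, see what breaks, fix, repeat" loop converge quickly.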
I get the sentiment, but this is natural with a groundbreaking new technology. We are still in the process of figuring out how to best apply generative LLMs in a productive way. Lots of people tinker and share their results. Most is surely hype and will get thrown away and forgotten soon, but some is solid. And I am glad for it as I did not take part in that but now enjoy the results as the agents have become really good now.
I do think it'll be a while before LLMs make significant contributions to complex projects, though. For example I can't imagine many maintainers of the Linux kernel use LLMs much.
It also seems like none of them are relatively unique and all of them have been done before.
At work, I would say I've done plenty of "useful" things with AI, but that's hard to show off given that I work on an internal application.
Which should pair well with the “write a script” tactic.
Businesses wish this were the case, and many will even say it or start to believe it. But it doesn't bear out to be true in practice.
Think about it this way, engineers are expensive so a company is going to want to have as few of them as possible to do as much work as possible. Long before LLMs came along there have been many rounds of "replace expensive engineers" fads.
Visual programming was going to destroy the industry, where any idiot could drag and drop a few boxes and put together software. Turns out that didn't work out and now visual programming is all but dead. Then we had consultants and software consultancies. Why keep engineers on staff and have to deal with benefits and HR functions when you can hire consultants for just long enough to get the job done and end their contracts? Then we had offshoring. Why hire expensive developers in markets like California when you can hire far cheaper engineers abroad in a country with lower wages and laxer employment law? (It's not a quality thing either, many of these engineers are unquestionably excellent.)
Or, think about what happens when software companies get acquired. It's almost unheard of for the acquiring company to lay off all of the engineering staff from the acquired company right away; if anything it's the opposite, with vesting incentives to convince engineers to stay.
If all that mattered was the code and the systems, and people were cogs that produced code that businesses wanted to optimise, then none of these actions make sense. You'd see companies offshore and use consultants with the company that does "good enough" as cheaply as possible. You'd see engineers from acquisitions be laid off immediately, replaced with cheaper staff as fast as possible.
There are businesses that operate like this, it happens all the time. But all of the most successful and profitable tech companies in the world don't do this. Why?
The second thing Claude Code does is that when it reaches the end of its context window it runs /compact on the session, which takes a summary of the current session, dumps it into a file, and then starts a new session with that summary. But it also retains logs of all the previous sessions that it can use and search through.
Looking over my session of Claude Code, out of the 256k tokens available, about 50k of these tokens are used among "memory" and session summaries, and 200k tokens are available to work with. The reality is that the vast majority of tokens Claude Code uses is for its own internal reasoning as opposed to being "front-end" facing so to speak.
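A toy model of that behaviour, to make the budget concrete. This is only a sketch of the idea as described above, not Claude Code's implementation: the `Session` class, the token counts per message, and the 2,000-token summary size are all hypothetical stand-ins.

```python
class Session:
    """Toy context window that compacts itself when nearly full (hypothetical)."""

    def __init__(self, window=256_000, reserved=50_000):
        self.window = window
        self.reserved = reserved      # memory files and earlier summaries
        self.messages = []            # (text, token_count) pairs
        self.archive = []             # full logs of compacted sessions

    @property
    def used(self):
        return self.reserved + sum(tokens for _, tokens in self.messages)

    def add(self, text, tokens):
        # Compact first if this message would overflow the window.
        if self.used + tokens > self.window:
            self.compact()
        self.messages.append((text, tokens))

    def compact(self, summary_tokens=2_000):
        # Keep the full log searchable, then carry on with only a short summary.
        self.archive.append(list(self.messages))
        summary = f"summary of {len(self.messages)} earlier messages"
        self.messages = [(summary, summary_tokens)]


session = Session()
for step in range(30):
    session.add(f"step {step}", 10_000)

print("working budget:", session.window - session.reserved)
print("compactions:", len(session.archive), "tokens in use:", session.used)
```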
Additionally given that ChatGPT Codex just increased its context length from 256k to 1 million tokens, I expect Anthropic will release an update within a month or so to catch up with their own 1 million token model.
And even for the ones that might "beat the rap", I don't understand from your descriptions why they are interesting or unique. A voice note recorder? Cool. There are already hundreds if not thousands of those, why did you need to make your own in the first place? I'm not saying that yours isn't special, I'm just saying that it doesn't help to post the blandest description possible if you're trying to impress people with the utility of your utility.
And with AI the result of 99.9% is abandonware. Just piles of code no one will ever touch again.
Which proves the point of no productivity gains. It's just cheap dopamine hits.
Sounds like something that could be tried as a fix for a kind of OCD (obsessive seconds counting).
Quite simply, I don't think that they are asking or arguing in good faith.
From where I stand this thing is going to provide great leverage to those who don’t simply just write code. I personally doubt the thing will ever get to a place where it can be trusted to operate alone - it needs a team of people and to go super fast you need more people.
Moreover, the price won’t be high due to competition.
I’ve changed my view on LLMs as being good, as long as competition is fierce.
This is exactly the same reason why the appropriate question to ask about Haskell is "where are the open source projects that are useful for something that is not programming?"
The answer for Haskell after 3 decades is very, very little. Pandoc, git-annex, xmonad. Might be something else since I last did the exercise but for Haskell the answer is not much. Then we examine why the kids (us kids of all ages) can't or don't write Haskell programs.
The answer for LLM coding may be very different. But the question "where is the software that does something that solves a problem outside its own orbit" is crucial. (You have a problem. You want to use foo to solve it, now you have two problems but you can use foo to solve a part of the second one!!)
The price of getting code written just went down. Where are the site/business launches? Apps? New ideas being built? Specifically. With links. Not general, hand-wavy "these are the sorts of things that ..." because even if it's superb analysis, without some data that can be checked it's indistinguishable from hype.
Whatever data we get will be very informative.
Simon toy that's integrated into an ear training tool?
Blindfold chess with Last N moves hidden?
Mensa-style slide puzzles?
An extension that converts random words into phonetic equivalents like morse, braille, and vorticon?
I've also made some way less useful stuff like a win32 app that lets you physically grab a window and hurl it, which invokes a WM_DESTROY when it is completely off the screen.
And an app that measures low frequencies to tell if you are blowing into the mic and then increases the speed of the CPU fan to cool it down.
I believe your skills are atrophying when you use these things no matter how trivial the case. That compounds with their bias towards solving problems by producing more code to further reduce your productivity without them.
Constant enshittification and UI redesigns are driven by the provider to justify monthly extortion.
I looked into doing it manually, but gave up. Way too much dirty work and I had no energy for that.
Then I discovered that claude CLI got good - and told it to do it (with some handholding).
And it did it. Build process modernized. No more outdated dependencies. Then I added some features I missed in the original wick editor. Again, it did it and it works.
A working editor that was abandoned and missed features - now working again with the missing features. With minimal work done from my side (but I did put in work before to understand the source).
I call this a very useful result. There are lots of abandoned half-working projects out there. Lots of value to be recovered. Unlike Haskell, agents are not just busy with building agents, but real tools. Currently I have the agents refactor an old codebase of mine. Lots of tech debt. Lots of hacks. Bad documentation. There are features I wanted to implement for ages but never did as I did not want to touch that ugly code again. But Claude did it. It is almost scary what they are already capable of.
I do agree with you to some extent. I think anyone who uses LLMs will need to set aside some time writing code by hand to keep their skills sharp.
1. The closer the context gets to full the worse it performs.
2. The more context it has the less it weights individual items.
That is, Claude might learn you hate long functions and add a line about short functions. When that is the only thing in its memory, it is likely to follow it very closely. But when it’s one piece of a much longer context, it is much more likely to ignore it.
3. Tokens cost money even if you are currently being subsidized.
4. You have no idea how new models and new system prompt will perform with your current memory.md file.
5. Unlike learning something yourself, anything you teach Claude is likely to start being controlled by your employer. They might not let you take it with you when you go.
Seems like the bar is now it has to be a mass market product. On another post someone else commented a SaaS doesn't count if it doesn't earn sustainable revenue.
I guess OpenClaw also doesn't count because we don't know how much Peter got from OpenAI.
This is an ideological flame war, not a rational discussion. There's no convincing anyone.
keep in mind that those 50k memory tokens would likely be cached after the first run and thus significantly cheaper
Moreover though, I'm not even saying you shouldn't do those things. I'm actually playing around with AI quite a bit, and certainly have created my share of useless/productivity tools. But it's not a flex to show off your own Flappy Birds or OpenNanoClaw clone, even if they are written in COBOL or MUMPS.
And they definitely do not have to be "extremely useful". But they should answer the question: what problem does it solve?
For example, I checked out their "Fictional Encyclopedia". It's an absolutely terrible project, much worse than useless, because it claims to be an "encyclopedia" right in the name (the tagline is "Everything about everything"), yet it's engineered to just completely make things up, and nowhere on the page does it indicate this! I looked up my own niche open-source project, and was prepared to be at least somewhat impressed that it pulled together facts on the fly into an encyclopedic form. For the first couple of paragraphs that seemed like it might be the case, then it veered into complete fantasy and just kept going.
Like what is the point of this? I can already ask a chatbot the same question and at least then I have explicit indicators that it might be hallucinating. But this page deliberately confuses truth and reality for absolutely zero purpose. It's a waste of brain cells, for both the creator and the consumer, with no redeeming value. It's neither interesting, nor different, nor valuable. AND it's burning tokens to boot!
I mean, come on, the bar is not that high. Some of stavros' projects may even be over it. But the first projects I checked were sub-basement, and I am not interested in searching through mounds of trash for what might be a quarter dollar. I'm actually kind of disappointed that stavros didn't have (or apply) the sense or taste to whittle down that list of 11 (!) projects to some 3 that show off the value of their work. Which I'm starting to understand is everyone's issue with AI brain rot; it seems to just encourage "here's everything, I dunno, you figure it out" which is maddening and deserves the pushback it gets.
And it’s exactly what I expected - lines of code. Cute. But… so what? This is not good for the AI hype, nor for any continued support for future investment.
On the other hand all this stuff is going to drive continual innovation. The more tokens generated the more model producers invest. And we might eventually get to a place of local models.
That’s not even mentioning that this tool doesn’t do much beyond wrap a call to Claude. And it’s using Claude to display blood test data to the end user. This is not something I’d trust an LLM to not mess up. You’d really want to double check every single result.
Something like this would be anxiety inducing for most people, I bet. That'd be an excellent experiment: track heart rate, EEG, and performance on a range of cognitive tasks with 2-minute breaks between tasks, one group exposed to the irregular ticking, another exposed to regular ticking, another with silence, and one last one with pleasant white noise.
Mon, Mar 9, 2026
I started programming in middle school. The first thing I remember writing is HTML for my neopets homepage. That morphed into writing static sites for Minecraft servers, and later on Java plugins for Minecraft.
Programming is so fun. I took it on as a hobby and it became an obvious career path, but I didn’t realize how well engineers were paid until my junior year in college. I had received an internship at AWS and was astonished. That led to a return offer and eventually to where I am today, with about seven years of professional experience and another seven of self-teaching/programming as a hobby.
I felt I was a year or two ahead of the application-focused classes I took. The more theoretical courses helped round out my knowledge and were the foundation I needed to work at tech companies.
In college I programmed all of the time. I found problems to solve. I found libraries and languages to try. I always wanted to learn more — to make sure my code was well-architected, maintainable, “clean”. This led to me reading programming books for fun. Clean Code taught me how to write better Java. You Don’t Know JavaScript taught me that JavaScript is actually quite a good language. Category Theory for Programmers taught me that it’s really hard to find a job writing Haskell.
I tend to qualify my statements before talking about AI. I write this to make it clear that I have a passion for programming. The money is quite nice but it’s incidental. I would be doing this if I were rich. I honestly cannot imagine my life without programming — it’s so satisfying to learn, solve problems, and build something others can use.
I was initially quite hesitant about applying AI to programming. I avoided GitHub Copilot when it came out. I thought Cursor was overhyped. I didn’t understand why someone would use Claude Code (a CLI/TUI interface) over an IDE.
I have spent years caring about architecture, type systems, maintainability. I am quite good at paying attention to every little detail. I wanted full control over my code. How could I have control if AI is writing everything? How could I be sure it wasn’t writing my project in a substandard way?
Effectively using AI required a fundamental shift in how I thought about my projects. Why did I care about types? Why do we have design patterns? Why does code need to be maintainable or “well written”? For hobby projects, it can be a source of pleasure to write and see beautiful code.
That’s not an acceptable reason for projects I’m paid to work on, though. At work, all that matters is that value is delivered to the business. Code needs to be maintainable so that new requirements can be met. Code follows design patterns, when appropriate, because they are known solutions to common problems, and thus are easy to talk about with others. Code has type systems and static analysis so that programmers make fewer mistakes.
Speaking in the context of solving a problem: does AI need to write beautiful code? No. It needs to write code that works. The code doesn’t need to be maintainable in the traditional sense. If you have sufficient tests, you can throw some LLMs at a pile of “bad” code and have them figure it out. Type systems and static analysis continue to be useful to LLMs, if not more so than humans.
This is all to say: if you care about solving a problem more than gaining satisfaction, LLMs fit the bill. I’ve discovered that, largely, I enjoy solving problems more than I care about writing code. I haven’t written code, for work or personal projects, since October 2025. I’ve only written prompts and reviewed (a lot of) LLM output. This has led to me making a massive number of projects.
I’ve done all of these in the last 9 months with the help of Cursor and Claude Code.
I’ve accomplished so much with AI. I still am figuring out how to use it effectively. There is so much power in these tools — there has never been a better time to be a programmer who just wants to build things.
At the same time, it is a bit exhausting. Work in particular has been difficult as we slowly embrace these new tools. There are many problems to solve around testing, developer experience, and velocity.
For personal projects, testing and documentation have become the bottleneck. I have to make sure that the LLM has produced the correct thing, and that the documentation it has written is truthful.
As an industry I think we have to invest a significant amount of effort into better tools for testing and generated documentation.
I’ll continue to use these tools with the hope that they don’t make me obsolete too quickly.
We hate having to feel like we have to double check everything. We have an asymmetric relationship with gains and losses etc.
Is it me or is this stuff flying over people's heads?
Steve Jobs once said that the belief that an idea is 90% of the work is a disease. He was and is absolutely right.
It's just the right amount of "did that clock just skip a beat? Nah must just be my imagination".