antirez 21 hours ago. 88634 views.
I started working on the new Array data type for Redis in the first days of January. The PR landed in the repository only now, so this code was cooked for four months. I worked on the implementation kinda part time (kinda because many weeks were actually full time: sometimes detaching yourself from the keyboard is complicated), and even before LLMs the implementation was likely something I could do in four months. What changed is that in the same time span, I was able to do a lot more. This is the short story of what happened.
In the first month I just wrote the specification document: the rationale for the new data type, the C structures, the sparse representation used, the exact semantics of the array cursor for ring buffers and ARINSERT. For days I wrote a long specification by hand, then I paired with Opus initially, then GPT 5.3 was released and I switched all the design and development to Codex. Since then I use only GPT 5.x for system programming tasks. Thanks to AI, the specification evolved a lot, through a back and forth of feedback, intellectual challenges about what was the best design, what was the right compromise, what was over-engineered and what was not.
Starting from the second month, I began the implementation using automatic programming (auto coding if you prefer), constantly reviewing the developed code. Then I realized that the level of indirection I picked was wrong. I really wanted people to be able to do ARSET myarray 293842948324 foo and have everything still work without huge allocations. The two levels of directory + slices (sparse and dense) I had were not enough. Because I had AI, I took no compromises, and I decided to go the extra mile. Once certain conditions are reached, the data structure internally changes shape and becomes a super directory of sliced dense directories, which in turn point to the actual array slices (4096 elements per slice, by default). This design still provided the internal "is actually an array" representation I wanted, and the memory characteristics I sought, while allowing ARSCAN and ARPOP to scan the existing arrays in time proportional to the existing elements and not to the range span.
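To make that final shape a bit more concrete, here is a minimal, purely illustrative sketch of such a layered representation; the type and field names are hypothetical and are not the actual definitions from sparsearray.c:

    #include <stdint.h>

    #define AR_SLICE_LEN 4096          /* elements per slice, the default cited above */

    typedef struct arraySlice {        /* dense run of up to 4096 elements */
        uint64_t base;                 /* logical index of the first element */
        uint32_t used;                 /* elements actually populated */
        void *elem[AR_SLICE_LEN];
    } arraySlice;

    typedef struct denseDirectory {    /* covers a contiguous range of slices */
        uint64_t base;
        uint32_t numslices;
        arraySlice **slices;           /* NULL entries for missing slices */
    } denseDirectory;

    typedef struct superDirectory {    /* sparse, sorted set of dense directories */
        uint32_t numdirs;
        denseDirectory **dirs;
    } superDirectory;

With a layout like this, ARSET myarray 293842948324 foo only has to allocate the directory entries and the single slice that covers that index, while scans can walk just the populated slices, so the cost tracks the number of existing elements rather than the index range.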
Then, it was time to read all the code, line by line. Everything was working, and this type has massive test coverage, thanks again to AI, but things that superficially work are not necessarily optimal. I found many small inefficiencies or design errors that I didn't want, so I started a process of manual and AI-assisted rewrite of many modules. When this stage was done, I started, during the third month, to stress test the implementation in many different ways. I started to be confident that it was really solid, useful, well designed.
Then… it happened. While modeling different use cases to see if the data structure was comfortable to use, I started to put markdown files into Redis arrays, because files are a very good match for it. At this point, as I was working toward other goals with agents, I realized that I could have the centralized knowledge base of skills markdown files that I needed, so from a need of mine I decided to implement ARGREP. But I wanted regular expressions, too. What library to pick?
I ended up picking TRE (thanks Ville Laurikari!), because when you have regexps in Redis, you want to be sure that there are no pathological patterns in time or space. But TRE was very inefficient in a specific and extremely useful case, that is, matching patterns like foo|bar|zap. So with the help of GPT I optimized it, fixed a few potential security issues, and extended the tests. I had everything in place.
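The post doesn't show the actual TRE patch, but the general idea behind speeding up a plain literal alternation is easy to sketch: detect that the pattern is just literals separated by |, then search each literal directly instead of running the full regex engine. Something along these lines (names and structure are purely illustrative, not the real code):

    #include <stddef.h>
    #include <string.h>

    /* Return a pointer to the leftmost occurrence in 'text' of any of the
     * literals, or NULL if none match. '*matchlen' receives the length of
     * the literal that matched. */
    static const char *literal_alt_search(const char *text,
                                          const char **lits, size_t nlits,
                                          size_t *matchlen)
    {
        const char *best = NULL;
        for (size_t i = 0; i < nlits; i++) {
            const char *hit = strstr(text, lits[i]);
            if (hit && (best == NULL || hit < best)) {
                best = hit;
                *matchlen = strlen(lits[i]);
            }
        }
        return best;
    }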
You know what the biggest realization of all this was? For high quality system programming tasks you still have to be fully involved, but I ventured to a level of complexity that I would have otherwise skipped. AI provided the safety net for two things: certain massive tasks that are very tiring (like the 32 bit support that was added and tested later), and at the same time the virtual work force required to make sure there are no obvious bugs in complicated algorithms. Writing the initial, huge specification was the key to the successive work, as was reviewing every single line of sparsearray.c and t_array.c and modifying everything that was not a good fit.
I didn't spend any words on the use cases here, since I tried to document them in detail in the PR message itself:
https://github.com/redis/redis/pull/15162
So it was not really useful to repeat myself here. Suffice it to say that I really believe it is about time for Redis to have a data type where the numerical index is part of the semantics.
I hope the Array PR will be accepted soon, and that we can benefit from the new use cases it opens. Of course, feedback is welcome. Thank you.
or maybe the conclusion is that model providers need to clean up their training data!
Who is going to do an LLM-free fork?
The RE component is interesting, but as commentary here has noted, it seems orthogonal to the array data structure (i.e., usable on others as well). Does it not make more sense to accomplish this with Lua scripting? Or, if performance of Lua is an issue, perhaps by abstracting the op to be composable on top of any command that returns a range of values.
I say this with reverence for Antirez as the expert in this space, but some of this new feature set feels like the sort of solution I tend to see arise from LLM-driven development; namely, creation of new functionality instead of enhancement of existing functionality, plus overcomplicating features when composition with others might be more effective.
I start with a high level design md doc which an AI helps write. Then I ask another AI - whether the same model without the context, or another model - to critique it and spot bugs, gaps and omissions. It always finds stuff that is obvious in hindsight. So I ask it to summarize its findings, paste that into the first AI, and ask its opinion. We form an agreed change, make it, and carry on this adversarial round robin until no model can suggest anything that seems weighty.
I then ask the AI to make a plan. And I round robin that through a bunch of AIs adversarially as well. In the end, the plan looks solid.
Then the end to end test cases plan and so on.
By the end of the first day or week or month - depending on the scale of the system - we are ready to code.
And as code gets made I paste it into other AIs with the spec and plan and ask them to spot bugs, omissions and gaps too, and so on, continually using other AIs to check on the main one doing the implementation.
And of course you have to go read the code, because I have found that the AI misses polish.
- the project essentially spans almost 3 different (albeit minor) generations of LLMs. Have you noticed major differences in their personas, behavior, output for that specific use case?
- when using AI for feedback, have you ever considered giving it different "personalities"? I have a few skills that role-play as very different reviewers with their own different (by design conflicting) personalities. I found this to improve the output, but also to be extremely tiring and to often have a high noise ratio.
- when, if ever, did you feel that AI was slowing you down massively compared to just doing it yourself (e.g. some specific bug or performance or design fix)? Are there recurring patterns?
- conversely, how often did the AI have moments where it genuinely gave you feedback or ideas that would not have come to you?
- last: do you have specific prompts, skills, setups, etc to work on specific repositories?
He is not "your avg dev" and it took him 4 months with an LLM.
This is not a seal of approval for you to go and command all your developers to move fully to Claude Code/Codex/any other AI coding tool.
I'm looking at you - any avg CEO of a startup.
@antirez: Introducing a regex feature that late into the project, for a seemingly unrelated feature, feels a bit weird? Can you explain your rationale on that a bit more? Thanks!
Very cool anyway! Can I expect a YouTube video about this soon?
> And of course you have to go read the code, because I have found that the AI misses polish
Since you mentioned using other agents, do you get mileage out of code reviews with another agent polishing the unpolished bits? My colleagues swear by it, though I personally remain skeptical about its value without a human reviewer.
> Then I ask another AI
Maybe thesis-antithesis-synthesis works better in applied computer science... https://en.wikipedia.org/wiki/Dialectic#Criticisms
If every user of an LLM took this much care and attention, many people would have fewer issues with LLM-assisted coding. In this case the author has demonstrated they can write plenty of code without an LLM, so why not use it carefully to benefit their productivity?
Then it quickly lost its original meaning as people started using it for virtually all forms of AI-assisted coding.
maybe shortening the term to "auto-code" would help tho.
I'm doing my work mostly the same way Antirez is: writing a detailed spec (which is actually 80% of the hard work, even without LLMs), then, where I would have written the "boring stuff", using the LLM to "autocomplete", and then spotting all the mistakes (which requires being a senior to see / fix), correcting, and iterating.
It makes the work "feel" easier because we mostly skip writing the boilerplate, but it still doesn't replace coders. And companies that think they will be able to skip training juniors (in order to later replace seniors) and still have seniors onboard are making a huge mistake.
And I’m not saying that to poke fun at you (my workflow is essentially identical to yours), or at Google, but rather to say that there’s nothing new :)
AI is a fantastic accelerator of effective and ineffective workflows alike. It’s showing us which are effective and ineffective on way shorter timescales / in realtime!
2. Nope, I don't really give it personalities, but I use subtle prompt differences to maximize certain responses I want, to make the model focus on a given detail or act with a specific kind of engineering mindset.
3. It never happened that the AI slowed me down, since I always had the full context and code details of what was happening in mind. I believe this happens more when you don't have a clear idea. Also, GPT >= 5.3/4 is not the past generation of models; it is very hard to trap it into a situation where it seems unable to understand what you mean.
4. A few times the AI provided fresh insights that I really liked. Most of the time it was the other way around. Certain implementations were written by the AI at a very impressive level of quality.
5. I don't use general skills; I build skills with deep search when needed for specific projects, and build an AGENT.md that works as a knowledge base as I work with the AI. One thing that I use a lot, when there is a very complex problem, is to tell GPT that I have a friend called Machiavelli who is an incredible computer scientist, to ask it to write him an email in /tmp/letter.md with the problem we are facing, and to say I'll try to get a reply. Then I ask GPT 5.5 Pro on the web with extensive reasoning turned on. It will sometimes take 30 minutes or more to reply. Often, after I feed back the reply, the agent is able to see things a lot more clearly.
To clarify, from TFA:
> even before LLMs the implementation was likely something I could do in four months. What changed is that in the same time span, I was able to do a lot more
The initial timeframe was 4 months, he was able to do more work within the same timeframe with LLMs.
He's not, but his work is obviously not average.
Average dev work is plumbing and CRUDs.
Virtually all major Redis features are a solo job of the post author.
By the way, reviewers are paid good money for this and know the setup.
wc -l t_array.c sparsearray.c
2012 t_array.c
2063 sparsearray.c
4075 total (including comments)
Sure, there are also the AOF / RDB glue, the tests, and the vendored TRE library for ARGREP. But all in all it's self-contained complexity with little interaction with the rest of the server. A quick note: if we focus only on that part of the implementation, skipping tests and persistence code (which is not huge), 4075 lines in 4 months is an average of 33 lines per day, which is quite low.
2000 lines the sparse array.
2000 lines the t_array commands and upper layer implementation.
~500 lines of AOF / RDB code.
All the other stuff is tests, JSON command descriptions, and the TRE library under "deps".
> Then I ask GPT 5.5 Pro on the web with extensive reasoning set on. It will take sometimes 30 minutes or more to reply.
Any reason why Codex can't do that?
I've been working on a database adapter for a couple months using an LLM... I've got a couple minor refactors to do still, then getting the "publish" to jsr/npm working... I've mostly held off as I haven't actually done a full review of the code... I've reviewed the tests and confirmed they're working, though. The hard part is that there are some features I really want when connecting from Windows to a Windows SQL Server instance that aren't available in Linux/containers. I don't think I'll ever choose SQL again, but at least I can use/access a good API with Windows direct auth and FILESTREAM access in Deno/Bun/Node.
FWIW: My final implementation landed on ODBC via Rust + FFI, so after I get the mssql driver out, I'll strip a few bits in a fork and publish a more generic ODBC client adapter, with using/dispose and async iterators as first-class features in the driver.
I haven't been using multiple AIs adversarially like the OP, but I might consider giving it a try with Codex and Opus. That said, my AI workflow has been pretty similar... lots of iterations on just design, then iterations on documentation, testing, etc... then iterations on implementation, testing, validation and human review in the mix.
My analogy is that it's really close to working with a foreign dev team, but your turnaround is in minutes instead of days, where it's much more interactive.
To get a quality, lasting result you're ultimately having to carefully study everything; otherwise you end up quickly accumulating cognitive debt, and the speedup soon shrinks as you're constantly having to revisit the initial approaches.
* I can work in code I'm not familiar with much more easily.
* LLMs often identify confusion or uncertainty upfront, so I can address it earlier.
* I'm much less mentally taxed so I can go for longer at my top end.
* Meetings, disruptions, end of day is WAY less critical since I can lean on the LLM to get back into things.
* I can do something else productive while the LLM is running. Bug fixes, documentation, PR reviews, etc.
antirez - you inspire a generation of devs. Thanks for all you do.
Because spotting holes in specs has never been one of my strengths. And working without technical colleagues much of the time, it's a boon to be able to "rubber-duck" my ideas with something that is at least more intelligent than plastic.
Grabbing multipliers from thin air, the coding bit may only be 2x faster with a poorer-quality outcome, but working out what's needed is a good 5x faster.
And yes, I'm using the same adversarial AI MO as @wood_spirit, combined with Matt Pocock's excellent /grill-me and /grill-with-docs skills [1] and Plannotator [2] to review the plans.
This is arguably a key quote: "Then, it was time to read all the code, line by line. ... I found many small inefficiencies or design errors ... so I started a process of manual and AI-assisted rewrite of many modules." We should not underestimate that step: reading code line by line might easily require more time than writing it from scratch.
Now I just need a way to protect my chats from any potential discovery, and <pew pew> business’ll be easy.
This looks like a very useful feature. Thank you again for the reply.
If the initial development bar is relatively high, it's far, far easier to identify flaws and gaps when you have the whole thing in front of you all at once.
cf. Valkey and others
... just speaking as someone who sometimes has to review very long PRs, though, I feel like 25% is a roughly normal signal-to-noise ratio. 5,000 lines of core logic is a LOT, and the tests and dependencies do still need to be read.
EDIT: I feel like the problem, as a reviewer, is processing 4 months of intensive research/development and providing useful feedback. At that point, there's probably not much major input you can have into the core architecture or strategy, so you're probably not providing much more than a bugbot would.
> For high quality system programming tasks you still have to be fully involved, but I ventured to a level of complexity that I would have otherwise skipped. AI provided the safety net for two things: certain massive tasks that are very tiring (like the 32 bit support that was added and tested later), and at the same time the virtual work force required to make sure there are no obvious bugs in complicated algorithms.
I feel strange making "dev" documentation though, since it seems a bit redundant/superfluous. I fully suspect nobody is going to read it at this point.
I remain unconvinced by the "faster to write it by hand than read it" arguments though. My experience throughout my career is that most people, myself included, top out at a couple of hundred lines of tested, production-ready code per day. I can productively review a couple of thousand.
Like:
[0] https://csci1710.github.io/2026/ and https://forge-fm.github.io/book/2026/
Might?
Reading code you did not write is always going to take more time if you do it properly.
When antirez says 'I ventured to a level of complexity that I would have otherwise skipped,' I don't think you can call that a minor gain. The alternative is likely something 'good enough' that leaves the community dissatisfied for months, and then, after the initial design mistakes become load-bearing, the ideal implementation can never be realized.
"Automatic programming"
Sure you can? In this concrete case, Redis is very "flat": there are the data structure implementations, and there are the commands that use them. 1 + N. You could have feedback about the data structure (i.e. whether it's optimal for the use cases), or about any of the commands (i.e. not just their impls, but also whether they're the best core API surface to lock in long-term, or even whether they're worth including at all).
Any given feedback would necessitate fairly limited rework to address, as you're either modifying the data structure (and its tests) or a command (and its tests and docs.)
Let the LLM cook by doing the issues one by one. In the meantime I could start reviewing them: checkout, running, reading. It was definitely faster, since it also correctly linked everything, etc. Of course, once the change goes beyond that, it probably doesn't work. However, I really thought a good idea would be to have it pick up that work, implement it according to the issue description, and update the MR once the description changes, at least as long as the MR is 1-3 lines. And even if it does not work, I can just discard it.
(A lot of these problems are often typos that do not even need a checkout; they come in through bigger MRs that should not be blocked because of them.)
Thanks for the links, going to have a read and see if I can apply any to my work.
> I started to refer to the process of writing software using AI assistance (soon to become just "the process of writing software", I believe) with the term "Automatic Programming"
I was confused because the last time I checked on things, it was still about fostering community input and advancement but not necessarily consensus. Things have tipped back in the original direction since then. I don't think "Redis was completely built in this way since the start" is completely accurate, but also the community effort under the new governance model never got very deeply entrenched while you were away.
Then you need a senior to go find the 100 mistakes it made, fix them, and iterate, which is why you can't replace "natural intelligence".
And there are real mathematical reasons why computers won't be able to break through "mathematical reasoning" on their own (undecidability, etc.).
In particular, doing direct comparisons between metrics like that doesn't work. "Lines of code" isn't a good way to measure complexity of the code, and the amount of time it takes to review the code will vary quite a bit based on the use case.
There's a lot of diversity in what kind of code people write and just because it worked for someone else doesn't mean it will work for the kinds of problems you solve. It's anecdotal evidence that someone else found it useful, your mileage may vary.
It would absolutely NOT work for production code with critical concurrency / embedded / real-time stuff.
To quote another of his posts:
> I fixed transient failures in the Redis test. This is very annoying work, timing related issues, TCP deadlock conditions, and so forth. Claude Code iterated for all the time needed to reproduce it, inspected the state of the processes to understand what was happening, and fixed the bugs.
...
> In the past weeks I operated changes to Redis Streams internals. I had a design document for the work I did. I tried to give it to Claude Code and it reproduced my work in, like, 20 minutes or less (mostly because I'm slow at checking and authorizing to run the commands needed).
From "Don't fall into the anti-AI hype" https://antirez.com/news/158
He's saying you should be writing up complex, highly detailed specs for the LLM to turn into code, stressing that it's critical to work in a self-contained and "textually representable" problem domain. This is not one-shotting complete products from a vague prompt. You're still going to need software architects, and they'll still be doing much the same work. Turning a fully specified design into code has never been a "10x" task; it was always regarded as a relatively straightforward, if often tricky, part of the job. And the way he worked with Redis makes it clear that you can't take what the AI delivers at face value, either: you'll have to go through it yourself, and that will take time and effort.