This, IMO, is the biggest insight into where we're at and where we're going:
> Because both evaluation and self-modification are coding tasks, gains in coding ability can translate into gains in self-improvement ability.
There's a thing I noticed early on with LLMs: once they unlock one capability, you can use it to compose and improve other capabilities, related or not. For example, "reflexion" goes into coding - hey, this didn't work, let me try ... Then "tools". Then "reflexion" + "tools". And so on.
You can take workflows whose individual parts aren't so precise and make them better by composing them, letting one component influence another. Like e2e coding gets better by checking with good old-fashioned ("gof") tools (linters, compilers, etc.). Then it gets even better by adding a code review stage. Then even better by adding a static analysis phase.
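The layered checks described above can be pictured as a gate pipeline: each stage either passes the artifact along or returns an error that would be fed back to the coding step. A minimal sketch, with stage names and commands as placeholders rather than a real toolchain:

```python
import subprocess
import sys

# Illustrative check pipeline (placeholder stages, not a real toolchain).
# Each stage is a command whose exit status gates the next stage.
STAGES = [
    ("lint",    [sys.executable, "-c", "print('lint ok')"]),   # stand-in for a linter
    ("compile", [sys.executable, "-c", "print('build ok')"]),  # stand-in for a compiler
]

def run_pipeline(stages):
    """Run each stage in order; stop at the first failure.

    Returns [] if every gate passed, otherwise [(stage_name, stderr)]
    describing the failure that would go back to the coding agent as context.
    """
    for name, cmd in stages:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return [(name, result.stderr)]
    return []
```

The point of the composition is that each stage is cheap and imprecise on its own, but a failing stage turns into concrete feedback for the next coding attempt.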
Now we're seeing this all converge on "self improving" by combining "improving" components. And so on. This is really cool.
I've had to really shift how I think about building code bases; a lot of logic can go into Claude skills and sub-agents. It essentially requires relearning software engineering.
It is the core of any and all learning/excellence: exposure to chaotic perturbations allows selection of solutions, which are then generalized to further, ever more straining problems, producing increasingly applicable solutions.
This is the core of evolution, and is actually derivable from just a single rule.
But this idea of having a task agent & meta agent maybe has wings. Neat submission.
OTOH, there's loads you can do for evaluation before a human even sees the artifact. Things like: does the site load, does it behave the same, did anything major change on the happy path, etc. There's a recent-ish paper where, instead of classic "LLM as a judge", they used LLMs to come up with rubrics, and other instances check the original prompt + rubrics on a binary scale. Saw improvements across a lot of evaluations.
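That rubric idea can be sketched as a small harness: one model call proposes binary yes/no criteria from the prompt, and separate calls grade the artifact against each criterion. The names here (`propose_rubric`, `toy_judge`, `evaluate`) are hypothetical, and both model calls are replaced with deterministic stubs so the control flow is runnable:

```python
from typing import Callable

def propose_rubric(prompt: str) -> list[str]:
    # In the real setup an LLM would turn the original prompt into
    # binary yes/no criteria; here the rubric is hard-coded.
    return [
        "Does the page load without errors?",
        "Is the happy path unchanged?",
        "Does the artifact address the original prompt?",
    ]

def toy_judge(criterion: str, prompt: str, artifact: str) -> bool:
    # Trivial stand-in judge: pass a criterion if its key word appears in
    # the artifact. A real judge would be a separate model instance.
    key = criterion.split()[2].strip("?")  # e.g. "page", "happy", "artifact"
    return key.lower() in artifact.lower()

def evaluate(prompt: str, artifact: str,
             judge: Callable[[str, str, str], bool]) -> float:
    rubric = propose_rubric(prompt)
    passed = sum(judge(c, prompt, artifact) for c in rubric)
    return passed / len(rubric)  # fraction of rubric items satisfied

if __name__ == "__main__":
    artifact = "The page loads, the happy path still works, and it matches the prompt."
    print(f"rubric score: {evaluate('build a landing page', artifact, toy_judge):.2f}")
```

The binary per-criterion scale is the point: each check is an easy yes/no question for a judge model, and the aggregate score is just the pass fraction.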
Then there's "evaluate by having an agent do it" for any documentation tracking. Say you have a project, you implement a feature, and document the changes. Then you can have an agent take that documentation and "try it out". Should give you much faster feedback loops.
Larger composition, though, starts to run into typical software design problems, like dependency graphs, shared state, how to upgrade, etc.
I've been working on this front for over two years now too: https://github.com/smartcomputer-ai/agent-os/
On this view, learning in general operates via selection under uncertainty. This is less visible in individual cognition, where we tend to over-attribute agency, but it is explicit in science: hypotheses are proposed, subjected to tests, and selectively retained, precisely because the future cannot be deduced from the present.
In that sense, generation/discrimination is a particular implementation of this broader principle (a way of instantiating variation and selection), not the primitive itself.
I've always felt that the most important part of engineering was feedback loops.
Maybe nature is the greatest engineer ever?
Abstract:
Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed, handcrafted meta-level mechanisms, fundamentally limiting how fast such systems can improve. The Darwin Gödel Machine (DGM) demonstrates open-ended self-improvement in coding by repeatedly generating and evaluating self-modified variants. Because both evaluation and self-modification are coding tasks, gains in coding ability can translate into gains in self-improvement ability. However, this alignment does not generally hold beyond coding domains. We introduce **hyperagents**, self-referential agents that integrate a task agent (which solves the target task) and a meta agent (which modifies itself and the task agent) into a single editable program. Crucially, the meta-level modification procedure is itself editable, enabling metacognitive self-modification, improving not only the task-solving behavior, but also the mechanism that generates future improvements. We instantiate this framework by extending DGM to create DGM-Hyperagents (DGM-H), eliminating the assumption of domain-specific alignment between task performance and self-modification skill to potentially support self-accelerating progress on any computable task. Across diverse domains, the DGM-H improves performance over time and outperforms baselines without self-improvement or open-ended exploration, as well as prior self-improving systems. Furthermore, the DGM-H improves the process by which it generates new agents (e.g., persistent memory, performance tracking), and these meta-level improvements transfer across domains and accumulate across runs. DGM-Hyperagents offer a glimpse of open-ended AI systems that do not merely search for better solutions, but continually improve their search for how to improve.
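Reading the abstract operationally: the system keeps an archive of agents, selects a parent, lets the meta agent rewrite both itself and the task agent, and keeps every variant for open-ended exploration. A toy sketch of that loop, with evaluation and self-modification reduced to numbers (an assumed structure for illustration, not the actual DGM-H implementation):

```python
import random

# Toy sketch of a DGM-style self-improvement loop. An "agent" here is just a
# (task_skill, meta_skill) pair; in the real system both parts are editable
# programs and evaluation is a benchmark run.

def evaluate(agent):
    task_skill, meta_skill = agent
    return task_skill  # benchmark score driven by task-solving ability

def meta_step(agent, rng):
    # The meta agent proposes an edit: its own skill controls edit quality,
    # and it may also improve itself (metacognitive self-modification).
    task_skill, meta_skill = agent
    return (task_skill + rng.random() * meta_skill,
            meta_skill + rng.random() * 0.1)

def run(generations=20, seed=0):
    rng = random.Random(seed)
    archive = [(0.1, 0.1)]  # seed agent
    for _ in range(generations):
        parent = max(archive, key=evaluate)  # select a promising parent
        child = meta_step(parent, rng)       # self-modify
        archive.append(child)                # keep everything (open-endedness)
    return max(evaluate(a) for a in archive)

if __name__ == "__main__":
    print(f"best score after 20 generations: {run():.3f}")
```

The detail the abstract stresses is that `meta_step` itself is part of what gets edited; in this toy that is compressed into `meta_skill` growing over time and amplifying later edits.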
Another thing that gets quantized is video preferences, to maximize engagement.
Here is a breakdown - https://vectree.io/c/plant-self-incompatibility-logic
# API keys, put these into .env file
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GEMINI_API_KEY=...
# Install things
sudo dnf install -y python3.12-devel
sudo dnf install -y graphviz graphviz-devel cmake ninja-build bzip2-devel zlib-devel ncurses-devel libffi-devel
# Create virtual environment
python3.12 -m venv venv_nat
source venv_nat/bin/activate
pip install -r requirements.txt
pip install -r requirements_dev.txt
# To build the docker container
docker build --network=host -t hyperagents .
# Set up initial agents
bash ./setup_initial.sh
# See the script for args, and baseline selections
python generate_loop.py --domains <domain>
By default, outputs will be saved in outputs/ directory.
agent/            code for using foundation models
analysis/         scripts used for plotting and analysis
domains/          code for each domain
utils/            common code used in the repo
run_meta_agent.py script to help run the meta agent and get the diffs
meta_agent.py     main implementation of the meta agent
task_agent.py     main implementation of the task agent
generate_loop.py  entry point for running the algorithm

The experiment logs are stored as a multi-part ZIP archive. To extract them, ensure all .z01, .z02, etc. files are in the same directory as the .zip file, then run:
zip -s 0 outputs_os_parts.zip --out unsplit_logs.zip
unzip unsplit_logs.zip
[!WARNING]
This repository involves executing untrusted, model-generated code. We strongly advise users to be aware of the associated safety risks. While it is highly unlikely that such code will perform overtly malicious actions under our current settings and with the models we use, it may still behave destructively due to limitations in model capability or alignment. By using this repository, you acknowledge and accept these risks.
If you find this project useful, please consider citing:
@misc{zhang2026hyperagents,
title={Hyperagents},
author={Jenny Zhang and Bingchen Zhao and Wannan Yang and Jakob Foerster and Jeff Clune and Minqi Jiang and Sam Devlin and Tatiana Shavrina},
year={2026},
eprint={2603.19461},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2603.19461},
}