and also wrote about it https://s2.dev/blog/distributed-ai-agents
i guess gastown is a better choice for now? idk i don't feel good about "relatively stable"
I was much more focused on integrating with ticketing systems (Notion, Github Issues, Jira, Linear), and then having coding agents specifically work towards merging a PR. Scion's support for long running agents and inter-container communication looks really interesting though. I think I'll have to go plan some features around that. Some of their concepts make less sense to me: I chose to build on top of k8s, whereas they seem to be trying to make something that recreates the control plane. I'm somewhat skeptical that the recreation and grove/hub are needed, but maybe they'll make more sense once I see them in action for the first time.
> https://en.wikipedia.org/wiki/SCION_(Internet_architecture)
You...do have all the same abstraction layers, right? No? Oh. Well, don't worry, Google/Amazon/Microsoft can sell you those if you don't want to pay your IT staff to prop it up for you.
---
Look, snark aside, yours is the correct take. Google's solutions are amazing, but they're also built for an organization as large and complex as Google. Time will tell if this is an industry-standard abstraction (a la S3 APIs) or just a Google product for Google-like orgs/functions (a la K8s).
My main complaints with Gastown are that (1) it's expensive, partly because (2) it refuses to use anything but Claude models, in spite of my configuration attempts, (3) I can't figure out how to back up or add a remote to its beads/dolt bug database, which makes me afraid to touch the installation, and (4) upgrading it often causes yak shaving and lost context. These might all be my own skill issues, but I do RTFM.
But wow, Gastown gets results. There's something magic about the dialogue and coordination between the mayor and the polecats that leads to an even better experience than Claude Code alone.
I've not been impressed with any of them. I do use their ADK in my custom agent stack for the core runtime. That one I think is good and has legs for longevity.
The main enterprise problem here is getting the various agent frameworks to play nice. How should one have shared runtimes, session clones, sandboxes, memory, etc between the tooling and/or employees?
The "Build container images" section is confusing and written for someone who is familiar with GitHub Actions and ghcr.io. `fork this repo` should be linkified. The page never actually tells you where the repo is; you have to infer it from a `go install` command. It's https://github.com/GoogleCloudPlatform/scion/.
Under "Building Custom Images," option 1 seems to require having cloned the repository, but as mentioned earlier, there's no indication of where the repo is hosted. This option fails for me with `=> ERROR [internal] load metadata for ghcr.io/sowbug/core-base:latest`, and I don't know what that means. My guess would be it's a dependency error because the build-images.sh script was not tested with a completely clean registry like mine. (Update: I got farther by adding `--target core-base` to the script arguments. This means that the script documentation "Build scion-base + all harness images" is misleading, because it requires you to know that scion-base depends on core-base, but the "Quick Start" script doesn't build that for you.) (Update 2: this still fails with "failed to fetch anonymous token," which I'm guessing is needed for me to upload to my registry. To reiterate, until today I had no idea that I had a registry.)
Option 2 starts with "If your project is hosted on GitHub," but I don't have a project (I want to test-drive Scion to start one), so I don't know whether that option applies to me. I tried it anyway, and it failed with an error similar to option 1's -- "core-base:latest: not found."
I'm not going to try Option 3, because it appears to give even less background than the prior options.
The instructions are inconsistent about referring to `ghcr.io/myorg` or `ghcr.io/<your-username>`. If anything, it is consistently backward in that it includes ghcr.io/myorg in text designed to be copied/pasted, and ghcr.io/<your-username> in text designed to be read.
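For anyone hitting the same `core-base` error, the ordering problem can be sketched like this. It is only an illustration of the dependency chain described above, not the logic of build-images.sh: the one edge taken from the comment is that scion-base depends on core-base, and the harness image names and their dependency on scion-base are assumptions.

```python
from graphlib import TopologicalSorter

# Image -> set of images it is built FROM.
# Only "scion-base depends on core-base" comes from the comment above;
# the harness image names below are hypothetical placeholders.
DEPENDS_ON = {
    "core-base": set(),
    "scion-base": {"core-base"},
    "harness-claude": {"scion-base"},  # hypothetical name
    "harness-gemini": {"scion-base"},  # hypothetical name
}


def build_order(deps):
    """Return images so that every base image comes before its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(DEPENDS_ON)
print(order)
```

With an empty registry, nothing later in that list can be built until `core-base` exists, which would explain why adding `--target core-base` got further.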
If you look at this orchestration example
https://github.com/ptone/scion-athenaeum
it's just markdown - Scion is the game engine
(a port of gastown to run on scion is in progress)
Designed to manage concurrent agents running in containers across local and remote compute, Scion is an experimental orchestration testbed that enables developers to run groups of specialized agents with isolated identities, credentials, and shared workspaces.
Google describes Scion as a "hypervisor for agents" that lets developers integrate multi-agent system components like agent memory, chatrooms, and task management as orthogonal concerns.
Scion orchestrates "deep agents" (Claude Code, Gemini CLI, Codex, and others) as isolated, concurrent processes. Each agent gets its own container, git worktree, and credentials — so they can work on different parts of your project without stepping on each other. Agents run locally, on remote VMs, or across Kubernetes clusters.
Scion enables developers to manage a graph of tasks that evolve dynamically and execute in parallel, pursuing distinct goals such as coding, auditing, and testing. Rather than relying on a fixed set of agents, it supports distinct agent lifecycles, with some agents being specialized and long-lived, while others are ephemeral and tied to a single task.
One basic tenet of Scion is preferring isolation over constraints to make agent operation safe. This means that instead of constraining an agent's behavior by defining rules and embedding them into its context, Scion opts for letting agents do whatever they need to do to complete their tasks while enforcing outside boundaries and guardrails:
Scion favors running agents in --yolo mode, while isolating them in containers, git worktrees, and on compute nodes subject to network policy at the infrastructure layer.
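The isolation-first idea can be sketched with plain git and Docker commands. This is a minimal illustration under stated assumptions, not how Scion itself launches agents: the function names, the `agent/<name>` branch scheme, the `/workspace` mount point, and the image name are all hypothetical. The sketch only builds the command lines; it does not run them.

```python
import shlex
from pathlib import PurePosixPath


def worktree_path(repo: PurePosixPath, agent: str) -> PurePosixPath:
    """Sibling directory that will hold this agent's private checkout."""
    return repo.parent / f"{repo.name}-{agent}"


def worktree_cmd(repo: PurePosixPath, agent: str) -> list[str]:
    """git command giving the agent its own branch and working directory."""
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", f"agent/{agent}", str(worktree_path(repo, agent))]


def container_cmd(repo: PurePosixPath, agent: str, image: str,
                  network: str = "none") -> list[str]:
    """docker command that mounts only the agent's worktree.

    The boundary is enforced outside the agent: it sees one directory and
    the given network, whatever its prompt says. Note that git inside the
    container would also need access to the main repository's .git
    directory, which this sketch leaves out.
    """
    return ["docker", "run", "--rm",
            "--name", f"agent-{agent}",
            "--network", network,
            "-v", f"{worktree_path(repo, agent)}:/workspace",
            "-w", "/workspace",
            image]


repo = PurePosixPath("/src/app")
print(shlex.join(worktree_cmd(repo, "auditor")))
print(shlex.join(container_cmd(repo, "auditor", "example/agent:latest")))
```

The point of the design is that the permissive mode lives inside the container, while the limits (mounts, network) are set by whoever starts it.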
Scion supports multiple popular agents through adapters called harnesses, which manage lifecycle, authentication, and configuration. Supported agents include Gemini, Claude Code, OpenCode, and Codex, though support for the latter two is currently partial.
Developers can use distinct containerization runtimes with Scion, including Docker, Podman, Apple containers, and Kubernetes via named profiles.
To use Scion, developers should familiarize themselves with its lexicon, which includes concepts such as grove, corresponding to a project; hub, a central control plane for orchestration; and runtime broker, a machine where hubs run, among others.
To showcase Scion's capabilities, Google has released the codebase for a game, Relics of the Athenaeum, in which groups of agents collaborate to solve computational puzzles. The code demonstrates how distinct agents running on different harnesses work together to impersonate distinct characters, with a game runner in charge of spawning new characters/agents and those agents in turn spawning worker and specialized agents dynamically. Collaboration occurs through a shared workspace for reading and writing data about the challenge and solutions, as well as via direct messages and party-wide broadcasts.
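The three collaboration channels mentioned above (shared workspace, direct messages, party-wide broadcasts) can be pictured with a small in-memory model. This is a toy sketch, not Scion's messaging API: the `Party` class, its method names, and the character names are all hypothetical.

```python
from collections import defaultdict


class Party:
    """Toy model of a group of agents with three ways to collaborate."""

    def __init__(self, members):
        self.members = set(members)
        self.inboxes = defaultdict(list)  # agent name -> [(sender, text)]
        self.workspace = {}               # shared notes, stand-in for shared files

    def spawn(self, name):
        """Add a new agent at runtime, as a game runner or another agent might."""
        self.members.add(name)

    def send(self, sender, recipient, text):
        """Direct message to one agent."""
        if recipient not in self.members:
            raise KeyError(f"unknown agent: {recipient}")
        self.inboxes[recipient].append((sender, text))

    def broadcast(self, sender, text):
        """Party-wide message to everyone except the sender."""
        for member in sorted(self.members):
            if member != sender:
                self.inboxes[member].append((sender, text))


party = Party(["runner", "mage"])
party.spawn("scout")                                  # spawned dynamically
party.workspace["puzzle-1"] = "candidate answer: 42"  # shared data
party.send("runner", "mage", "take puzzle 1")
party.broadcast("mage", "puzzle 1 solved")
print(dict(party.inboxes))
```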
I modified file_read/write/edit to put the file contents in the system prompt. This saves context space, e.g. when the agent would otherwise reread a file after a failed edit even though it already has the most recent contents. It also does not need to infer the modified content from read+edits. It still sees the edits as messages, but the current actual contents are always there.
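Roughly, the idea looks like the sketch below. It is a simplified illustration rather than the actual implementation: the `FileContext` class, its method names, and the prompt layout are made up for the example, and file contents are passed in instead of being read from disk to keep it self-contained.

```python
class FileContext:
    """Keeps the latest contents of every file the agent has touched."""

    def __init__(self, base_prompt: str):
        self.base_prompt = base_prompt
        self.files: dict[str, str] = {}  # path -> current contents
        self.messages: list[str] = []    # short tool results only, no file bodies

    def on_read(self, path: str, contents: str) -> None:
        """Record a read: contents go to the prompt, not the message log."""
        self.files[path] = contents
        self.messages.append(f"read {path} (contents are in the system prompt)")

    def on_edit(self, path: str, old: str, new: str) -> bool:
        """Apply a search/replace edit; on failure nothing needs rereading."""
        current = self.files.get(path, "")
        if old not in current:
            self.messages.append(f"edit {path} failed: text to replace not found")
            return False
        self.files[path] = current.replace(old, new, 1)
        self.messages.append(f"edit {path}: replaced {old!r} with {new!r}")
        return True

    def system_prompt(self) -> str:
        """Base prompt followed by the current contents of each known file."""
        sections = [self.base_prompt]
        for path in sorted(self.files):
            sections.append(f"--- {path} (current contents) ---\n{self.files[path]}")
        return "\n\n".join(sections)


ctx = FileContext("You are a coding agent.")
ctx.on_read("app.py", "print('helo')\n")
ctx.on_edit("app.py", "helo", "hello")
print(ctx.system_prompt())
```

Each file appears once in the prompt at its current state, however many reads and edits happened along the way.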
My AGENTS.md loader. The agent does not decide, it's deterministic based on what other files/dirs it has interacted with. It can still ask to read them, but it rarely does this now.
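A deterministic loader along these lines could be sketched as follows. The function name, the rule of collecting `AGENTS.md` from every ancestor directory of a touched file, and the example paths are assumptions for illustration; the real loader may use different rules.

```python
from pathlib import PurePosixPath


def agents_files_for(touched, existing):
    """Pick AGENTS.md files based only on which paths were touched.

    touched:  repo-relative file paths the agent has interacted with
    existing: set of repo-relative AGENTS.md paths present in the repo
    Returns matches ordered from the repo root down, without duplicates.
    """
    wanted = []
    for raw in touched:
        path = PurePosixPath(raw)
        # parents runs from the nearest directory up to "."; reverse it so
        # the most general instructions come first.
        for directory in reversed(path.parents):
            candidate = str(directory / "AGENTS.md")
            if candidate in existing and candidate not in wanted:
                wanted.append(candidate)
    return wanted


existing = {"AGENTS.md", "services/api/AGENTS.md", "web/AGENTS.md"}
touched = ["services/api/handler.py", "docs/readme.md"]
print(agents_files_for(touched, existing))
```

Because the result depends only on the touched paths, the same session history always loads the same instruction files, with no model decision involved.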
I've also backed the agent's environment or sandbox with Dagger, which brings a number of capabilities like being able to drop into a shell in the same environment, make changes, and have those propagate back to the session. Time travel, clone/fork, and a VS Code virtual FS are some others. I can go into a shell at any point in the session history. If my agent deletes a file it shouldn't, I can undo it with the click of a button.
I can also interact with the same session, at the same time, from VS Code, the TUI, or the API. Different modalities are ideal for different tasks (e.g. VS Code multi-diff for code review / edits; TUI for session management / cleanup).
One that I retired was used for serving FTP (among other transfer stuff). FTP, of all things: it needs to have ports open and routed back from the client. And for extra points, they had the pods capped at 1 CPU. And I had to explain the thing to the perpetrator and their boss. Madness.
If someone wants production K8s, I'm steering them (and their budget) to a managed control plane from one of the major cloud providers. Trying to prop it up locally when it really hates having to work directly with bare metal does not spark joy.