Exploring the architecture of coding agents by rebuilding a Claude Code-style CLI from scratch in Swift.

A complete 9-part learning series is available on ivanmagda.dev.
Claude Code feels unusually effective compared to other coding agents, and I suspect most of it comes from architectural restraint rather than architectural complexity. I studied the tool surface, traced the interaction loop, and tried to isolate which design choices actually matter.
My working theory: coding agents benefit more from a small set of excellent tools and tight loop design than from large orchestration layers.
Claude Code doesn't have many tools. The tools it does have are simple: a search tool, a file editing tool. But those tools are really good. And the system leans on the model far more than most agent implementations: less scaffolding, more trust in the LLM to do the heavy lifting.
This project tests that idea by rebuilding the core mechanics from scratch in Swift, one stage at a time, to see how little architecture you actually need.
This project tests a few specific ideas about coding agents; each stage is designed to isolate one mechanism and see what it enables.
The whole thing boils down to one loop:
```swift
func run(query: String) async throws -> String {
    messages.append(.user(query))

    while true {
        let request = APIRequest(
            model: model, system: systemPrompt, messages: messages, tools: Self.toolDefinitions
        )
        let response = try await apiClient.createMessage(request)
        messages.append(Message(role: .assistant, content: response.content))

        guard response.stopReason == .toolUse else {
            return response.content.textContent
        }

        var results: [ContentBlock] = []
        for block in response.content {
            if case .toolUse(let id, let name, let input) = block {
                let output = await executeTool(name: name, input: input)
                results.append(.toolResult(toolUseId: id, content: output, isError: false))
            }
        }
        messages.append(Message(role: .user, content: results))
    }
}
```
The loop is the invariant. Tools are the variable. Every stage adds entries to the tool handler dictionary and injection points before the API call, but the loop body itself never changes.
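One way to keep the loop fixed while the tool set grows is a name-to-handler dictionary that `executeTool` looks up. A minimal sketch of that pattern; the handler signature and tool names here are assumptions for illustration, not the project's actual API:

```swift
import Foundation

// Hypothetical handler signature: tool input arrives as a decoded
// dictionary, and the result goes back to the model as a plain string.
typealias ToolHandler = ([String: String]) -> String

// Each stage registers new entries here; the loop body never changes.
var toolHandlers: [String: ToolHandler] = [
    "bash": { input in "ran: \(input["command"] ?? "")" },
    "read_file": { input in "contents of \(input["path"] ?? "")" }
]

func executeTool(name: String, input: [String: String]) -> String {
    guard let handler = toolHandlers[name] else {
        return "Unknown tool: \(name)"
    }
    return handler(input)
}
```

Adding a stage then means adding a dictionary entry, which is what keeps the loop body untouched.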
Progress is tracked via git tags. The roadmap is split into two phases: core mechanics first, then product-level features.
The minimum viable agent: a loop and a small set of good tools.
| Stage | What It Adds | Tag |
|---|---|---|
| 00 | Bootstrap: SPM project, two-target layout, CI | 00-bootstrap |
| 01 | Agent loop + bash tool | 01-agent-loop |
| 02 | Tool dispatch: read_file, write_file, edit_file with path safety | 02-tool-dispatch |
| 03 | Todo tracking with nag reminder injection | 03-todo-write |
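Stage 02's path safety can be approximated by resolving the requested path against the working directory and rejecting anything that escapes it. A sketch assuming a simple resolve-and-prefix check; the project's actual rule may be stricter:

```swift
import Foundation

// Reject paths that escape the agent's working directory after
// resolving "." and ".." components. A simplified check for
// illustration, not the project's exact implementation.
func isPathSafe(_ requested: String, workingDirectory: String) -> Bool {
    let base = URL(fileURLWithPath: workingDirectory).standardizedFileURL.path
    let resolved = URL(fileURLWithPath: requested,
                       relativeTo: URL(fileURLWithPath: base))
        .standardizedFileURL.path
    return resolved == base || resolved.hasPrefix(base + "/")
}
```

The prefix comparison happens after `standardizedFileURL` collapses `..` segments, so `../etc/passwd` resolves outside the base directory and fails the check.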
The features that make an agent feel like a usable product: context, memory management, and persistence.
| Stage | What It Adds | Tag |
|---|---|---|
| 04 | Subagents: recursive loop with fresh context | 04-subagents |
| 05 | Skill loading: .md files injected as tool results | 05-skill-loading |
| 06 | Context compaction: 3-layer strategy (micro, auto, manual) | 06-context-compaction |
| 07 | Task system: file-based CRUD with dependency DAG | 07-task-system |
| 08 | Background tasks: Task {} + actor-based notification queue | 08-background-tasks |
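Stage 08's actor-based notification queue might look like the sketch below: background work runs in a `Task {}` and posts completion messages to an actor, which the agent loop drains before each API call. The names and shapes here are illustrative assumptions, not the project's actual types:

```swift
// An actor serializes access to the pending-notification buffer, so
// background Tasks can append concurrently while the agent loop
// drains it between API calls.
actor NotificationQueue {
    private var pending: [String] = []

    // Called from background Tasks when work finishes.
    func post(_ message: String) {
        pending.append(message)
    }

    // Called by the agent loop before each API request; returns and
    // clears everything accumulated since the last drain.
    func drain() -> [String] {
        let messages = pending
        pending.removeAll()
        return messages
    }
}
```

A background task would call `await queue.post("build finished")`, and the loop could inject the drained messages into the conversation the same way the stage 03 reminders are injected.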
Two-target Swift Package Manager project:
Core is the library: API client, shell executor, agent loop, tools.
CLI is just the entry point. The executable is called agent.
Raw HTTP to POST https://api.anthropic.com/v1/messages using AsyncHTTPClient. Works on both macOS and Linux.
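The shape of that POST is roughly the following. This sketch uses Foundation's `URLRequest` for brevity where the project uses AsyncHTTPClient; the header names follow the Anthropic Messages API, but the request struct is a trimmed-down assumption:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest on Linux
#endif

// Minimal request body for POST /v1/messages; real code would also
// carry the system prompt, tool definitions, and full content blocks.
struct MessagesRequest: Codable {
    let model: String
    let max_tokens: Int
    let messages: [[String: String]]
}

func makeRequest(apiKey: String, body: MessagesRequest) throws -> URLRequest {
    var request = URLRequest(url: URL(string: "https://api.anthropic.com/v1/messages")!)
    request.httpMethod = "POST"
    request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
    request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
    request.setValue("application/json", forHTTPHeaderField: "content-type")
    request.httpBody = try JSONEncoder().encode(body)
    return request
}
```

AsyncHTTPClient's `HTTPClientRequest` builds the same method, headers, and body, but runs on SwiftNIO, which is what makes the Linux story clean.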
This project is not a finished product. It's a staged exploration of coding-agent architecture: intentionally minimal, intentionally incomplete.
`Process` for shell command execution.

```shell
git clone https://github.com/ivan-magda/swift-claude-code.git
cd swift-claude-code

# Set up your API key and model
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY and MODEL_ID

swift build
swift run agent
```
The 9-part series explains how the agent loop, tool_use, and tool_result work.

MIT
The hard part was picking when to trigger it. Too early and you're throwing away useful context. Too late and the model's already struggling. I ended up just using a simple token count: nothing clever, but it works.
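That trigger can be as small as a threshold check before each API call. A sketch using a crude characters-per-token estimate; the heuristic and the limit are illustrative assumptions, not the project's actual numbers:

```swift
// Rough heuristic: ~4 characters per token for English-ish text.
// Good enough to decide when to compact, without a real tokenizer.
func estimatedTokens(of messages: [String]) -> Int {
    messages.reduce(0) { $0 + $1.count / 4 }
}

// Compact once the estimated conversation size crosses the threshold.
func shouldCompact(messages: [String], limit: Int = 100_000) -> Bool {
    estimatedTokens(of: messages) > limit
}
```

Checking this once per loop iteration, right before building the API request, keeps the compaction decision out of the loop body itself.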
And yeah, the Swift angle was genuinely fun. Defining tool schemas as Codable structs that auto-generate JSON schemas, and getting compiler errors instead of runtime API failures, is a huge win.
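A simplified version of that idea: each tool's input is a Codable struct, and a small protocol maps it to a JSON-schema dictionary. This sketch hand-writes the property list; the protocol, names, and derivation strategy are assumptions, not the project's actual code:

```swift
import Foundation

// Each tool input declares its own JSON-schema fragment. Because the
// same struct is the decoding target, a field-name typo surfaces as a
// compile error rather than a runtime API failure.
protocol ToolInput: Codable {
    static var schemaProperties: [String: String] { get }
}

struct EditFileInput: ToolInput {
    let path: String
    let oldText: String
    let newText: String

    static var schemaProperties: [String: String] {
        ["path": "string", "oldText": "string", "newText": "string"]
    }
}

// Builds the JSON-schema object the API expects for a tool definition.
func jsonSchema<T: ToolInput>(for type: T.Type) -> [String: Any] {
    [
        "type": "object",
        "properties": T.schemaProperties.mapValues { ["type": $0] },
        "required": Array(T.schemaProperties.keys)
    ]
}
```

Wiring the schema and the decode target to one type is what turns mismatches into compiler feedback instead of 400 responses from the API.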
I've got some follow-up ideas inspired by this project. The approach feels promising.