EvanFlow is a single TDD-driven loop. Say "let's evanflow this" and it walks brainstorm → plan → execute → tdd → iterate → STOP. Real checkpoints at design and plan approval. Never auto-commits, never auto-stages, never proposes integration - every git op is your call.
The three things that actually changed how I work:
1. Vertical-slice TDD. One failing test → minimal impl → next test. Watch each test fail before writing the impl that passes it. (Sounds obvious. Almost no agent does it by default. ~62% of LLM-generated test assertions are wrong per HumanEval research, so testing TDD discipline matters more than the impl discipline.)
2. Embedded grilling at decision points. Before locking a plan: what breaks if a user does X? What's the rollback? What's explicitly out of scope? Catches design flaws while they're still cheap.
3. Iterate-until-clean (hard cap of 5 rounds). Re-read the diff against dead code, naming, the deletion test, assertion correctness, and a Five Failure Modes pass (hallucinated actions, scope creep, cascading errors, context loss, tool misuse). For UI: screenshot via headless Chromium.
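To make the assertion-correctness point concrete, here is a minimal sketch of the "would a one-character bug still pass?" check. The names `add`, `add_mutant`, and `assertion_catches_bug` are made up for illustration and are not part of EvanFlow; the idea is just that an assertion is only worth keeping if it passes on the real implementation and fails on a slightly broken one.

```shell
# add is the intended implementation; add_mutant differs by one character.
add() { echo $(($1 + $2)); }
add_mutant() { echo $(($1 - $2)); }

# Succeeds only if "add A B = EXPECTED" holds for the real implementation
# AND fails for the mutant, i.e. the assertion can actually detect the bug.
assertion_catches_bug() {
  a="$1" b="$2" expected="$3"
  [ "$(add "$a" "$b")" = "$expected" ] &&
    [ "$(add_mutant "$a" "$b")" != "$expected" ]
}

# A weak assertion: 2 + 0 and 2 - 0 both give 2, so the bug slips through.
assertion_catches_bug 2 0 2 || echo "weak: add 2 0 = 2 still passes with the bug"
# A stronger assertion: 2 + 3 = 5 but 2 - 3 = -1, so the bug is caught.
assertion_catches_bug 2 3 5 && echo "strong: add 2 3 = 5 fails with the bug"
```

Both echo lines print here, which is the point: the first assertion looks fine but proves nothing about the operator.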
For bigger plans with 3+ independent units sharing types, it forks into a parallel coder/overseer orchestration. Integration tests at touchpoints ARE the cohesion contract.
Three install paths: Claude Code plugin marketplace, npx skills add, manual copy. MIT.
1) Do you not feel self-conscious or weird about calling this "EvanFlow"? Seems like a lot of people these days are naming their AI tools/skills/whatever after themselves which seems self-absorbed. Either that or they hope that if their thing takes off like OpenClaw did then they'll grab the fame that comes along with it.
2) Why does your TDD flow miss the refactor step of TDD?
It sucked so hard I thought the idea of agentic coding was just a joke. I've tried it periodically and it literally never stopped sucking.
I figure if it can't do that part it isn't worth using it for any part.
Ever since then whenever people tell me it's gotten better I've tried it out and nope, still sucks.
I still get gaslit about how well it works by people who just discovered TDD though, and watch it power through CRUD boilerplate getting impressed, blissfully unaware that boilerplate spew is an antipattern.
Just write it yourself. I promise it's worth it.
How are these separate steps?
TDD is how you execute, not something you tack on afterwards.
Sometimes it's helpful to ask oneself what's the benefit of an answer. I cannot think of any for your question and the way you worded it is a bit cringe. People name things after themselves all the time. It does not matter in the slightest.
Everybody who grew up to listen to Pearl Jam had seen or used an Evenflo pacifier, baby bottle, or car seat. That's one reason the song already sounded so familiar.
> Several rules come from 2025-2026 industry research on agentic coding failure modes
What are some of the papers you read?
We don't question when scientists name stuff after themselves so why question this? At least he gets some recognition for his work.
I can think of one example that did go somewhere: Linux.
A TDD-driven iterative feedback loop for software development with Claude Code.
16 cohesive skills + 2 custom subagents walk an idea from brainstorm through implementation, with checkpoints throughout where you stay in control. One entry point: say "let's evanflow this" and the orchestrator runs the loop.
brainstorm → plan → execute (sequential or parallel) → tdd → iterate → STOP
The loop is conductor, not autopilot: real checkpoints at design approval, plan approval, and after iteration. The agent stops short of every git operation and waits for your direction. No auto-commits. No forced ceremony. No "must invoke a skill" tax.
The recommended path is Claude Code's plugin marketplace:
/plugin marketplace add evanklem/evanflow
/plugin install evanflow@evanflow
Restart, then try:
"Let's evanflow this - I want to add a small feature that does X."
evanflow-go fires and walks the loop. The git-guardrails hook auto-activates with the plugin (no settings.json edit needed). Skills appear under the evanflow: namespace (e.g., /evanflow:evanflow-go).
See Installation below for two alternative paths.
The loop is built around discipline that compounds across iterations, not single-shot generation. Every step has a checkpoint that gates the next:
For plans with 3+ truly independent units, the loop forks into a parallel coder/overseer orchestration: one coder per unit (using vertical-slice TDD with a RED checkpoint), one overseer per coder (read-only review subagent that can't modify code), plus an integration overseer that runs named integration tests at every touchpoint. The integration tests are the executable contract: interfaces can't drift if both sides have to satisfy the same passing test.
Several rules come from 2025-2026 industry research on agentic coding failure modes and are baked into every skill:
- evanflow-tdd and the overseer review explicitly check whether a one-character bug in the implementation would still let the assertion pass.
- evanflow-compact triggers when symptoms appear (re-asking established questions, contradicting earlier decisions). Industry data: ~65% of enterprise AI coding failures trace to context drift, not raw token exhaustion.

| Skill | Purpose |
|---|---|
| evanflow-brainstorming | Clarify intent, propose 2-3 approaches with embedded grill (stress-test). Mockup quick-mode for visual-only requests. |
| evanflow-writing-plans | File structure first, bite-sized tasks, embedded grill. Step 2.5 offers evanflow-coder-overseer if the plan is parallelizable. |
| evanflow-executing-plans | Task-by-task with inline verification. Step 0 re-offers parallel path. Hands off to iterate, then STOPS. |
| evanflow-tdd | Vertical-slice TDD. One test → one impl → repeat. Behavior through public interface. Assertion-correctness warning. |
| evanflow-iterate | Self-review loop after implementation. Re-read diff, fix issues, run quality checks, screenshot UI (via headless Chromium). Five Failure Modes checklist. Hard cap of 5 iterations. |

| Skill | Purpose |
|---|---|
| evanflow-go | Single entry point. Say "let's evanflow this" and it walks the whole loop. |
| evanflow-glossary | Extract canonical domain terms into CONTEXT.md. Flag ambiguities and synonyms. |
| evanflow-improve-architecture | Surface refactor opportunities via the deletion test + deep-modules vocabulary. |
| evanflow-design-interface | "Design it twice": spawn 3+ parallel sub-agents with radically different constraints, compare on depth/simplicity/efficiency. |
| evanflow-debug | Root-cause discipline. Hypothesis stated explicitly, embedded grill before fixing, failing test first. |
| evanflow-review | Both halves of code review (giving + receiving). Don't capitulate to feedback you can't justify. |
| evanflow-prd | Synthesize a PRD from existing context. For substantial new features. |
| evanflow-qa | Conversational bug discovery → issue draft. Asks before filing. |

| Skill | Purpose |
|---|---|
| evanflow-compact | Long-session context management. Strategies for proactive summarization at clean boundaries. Drift symptoms checklist. |

| Skill | Purpose |
|---|---|
| evanflow | The index. Shared vocabulary + when to invoke each evanflow-* skill. |
In agents/, invoked via the Agent tool with the subagent_type parameter:

| Subagent | Tool restrictions | Purpose |
|---|---|---|
| evanflow-coder | Read, Edit, Write, Glob, Grep, Bash, TodoWrite | Implementation subagent for evanflow-coder-overseer. Tools + system prompt prevent git ops, out-of-scope edits, value hallucination. |
| evanflow-overseer | Read, Grep, Glob (no Edit/Write/Bash) | Read-only review subagent. Tools physically enforce "report findings, never fix." |
hooks/block-dangerous-git.sh: PreToolUse hook that blocks destructive git ops (git push, git reset --hard, git clean -f, git branch -D, git checkout ., git restore .). Auto-activates with the plugin install path.
- jq: used by the hook script to parse Claude's JSON tool input. Install via apt install jq, brew install jq, or your platform's package manager. If jq is missing, the guardrail hook fails silently and dangerous git ops are NOT blocked.

Optional but recommended:
- chromium or google-chrome: for evanflow-iterate's visual verification of UI changes (chromium --headless --screenshot=...). Falls back gracefully if missing: the skill flags it and asks you to verify visually.

Three paths, in priority order. All three end with the same skill set in your .claude/skills/. The plugin path additionally auto-wires the guardrail hook.
This is the cleanest install. Skills, agents, AND the guardrail hook all activate automatically.
/plugin marketplace add evanklem/evanflow
/plugin install evanflow@evanflow
Restart Claude Code (or /reload-plugins). Skills appear namespaced as /evanflow:evanflow-go, /evanflow:evanflow-tdd, etc. Auto-invocation via "let's evanflow this" still works regardless of namespace.
To uninstall: /plugin uninstall evanflow@evanflow.
The npx skills@latest add CLI works against any GitHub repo with SKILL.md-shaped folders. It installs skills only; it does not install the guardrail hook or custom subagents (you'd add those manually if you want them).
# Install all 16 skills at once
npx skills@latest add evanklem/evanflow -s '*' -y
# Or install individual skills
npx skills@latest add evanklem/evanflow/evanflow-go
npx skills@latest add evanklem/evanflow/evanflow-tdd
# ...
This places skills under ~/.claude/skills/ (global) or .claude/skills/ (project, auto-detected).
For users who want full control, no CLI dependencies.
git clone https://github.com/evanklem/evanflow.git
cd evanflow
# Skills (project-level; adjust to ~/.claude/skills/ for global)
mkdir -p .claude/skills
cp -r skills/* .claude/skills/
# Agents (custom subagents used by evanflow-coder-overseer)
mkdir -p .claude/agents
cp agents/*.md .claude/agents/
# Git guardrails hook (optional but recommended)
mkdir -p .claude/hooks
cp hooks/block-dangerous-git.sh .claude/hooks/
chmod +x .claude/hooks/block-dangerous-git.sh
Then register the hook in your .claude/settings.json:
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/block-dangerous-git.sh"
          }
        ]
      }
    ]
  }
}
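Because the hook depends on jq and fails silently without it, a quick preflight check after a manual install may be worth running. The sketch below is an assumption-laden helper, not something shipped with EvanFlow: the `preflight` function name is hypothetical, and the default path is simply the one used in the copy commands above.

```shell
# preflight [HOOK_PATH]: returns 0 if jq is on PATH and the hook file is
# executable, 1 otherwise. Problems are reported on stderr.
preflight() {
  hook="${1:-.claude/hooks/block-dangerous-git.sh}"
  status=0
  if ! command -v jq >/dev/null 2>&1; then
    echo "jq not found: the guardrail hook would fail silently" >&2
    status=1
  fi
  if [ ! -x "$hook" ]; then
    echo "hook missing or not executable: $hook" >&2
    status=1
  fi
  return "$status"
}

# Example (run from the project root):
#   preflight && echo "guardrail prerequisites look fine"
```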
Optionally, paste examples/CLAUDE.md.snippet into your project's CLAUDE.md to brief Claude about EvanFlow's conventions.
Restart Claude Code. Try saying:
"Let's evanflow this - I want to add a small feature that does X."
evanflow-go should fire and walk you through the loop. To verify the guardrail hook (paths 1 and 3 only): try git reset --hard HEAD from the Bash tool. It should be blocked with "BLOCKED: ... matches dangerous pattern".
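For intuition about what that check does, here is a rough sketch of the pattern-matching step only. It is not the real block-dangerous-git.sh: the real hook parses Claude's JSON tool input with jq, while this hypothetical `check_command` function takes the command string directly. The pattern list mirrors the blocked operations named in this README, and returning status 2 assumes the Claude Code convention that a PreToolUse hook exiting with 2 blocks the tool call.

```shell
# check_command CMD: returns 2 (and explains on stderr) if CMD matches one
# of the dangerous patterns, 0 otherwise.
check_command() {
  cmd="$1"
  while IFS= read -r pattern; do
    if printf '%s\n' "$cmd" | grep -qE -- "$pattern"; then
      echo "blocked: matches '$pattern'" >&2
      return 2
    fi
  done <<'EOF'
git push
git reset --hard
git clean -f
git branch -D
git checkout \.([[:space:]]|$)
git restore \.([[:space:]]|$)
EOF
  return 0
}

check_command 'git status' && echo "allowed"
check_command 'git reset --hard HEAD' || echo "exit status $?"
```

The last two patterns require the lone `.` to be followed by whitespace or end of line, so `git checkout .gitignore` is not caught by accident. Whether the real script is that careful is not something this sketch can tell you.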
Every skill has a clear structure with a ## Hard Rules section. To adapt to your project:
- Replace the <frontend> and <backend> placeholders in skills like evanflow-writing-plans with your actual paths if you find yourself answering the same question repeatedly.
- Put your exact typecheck, lint, and test commands in CLAUDE.md. The skills reference these abstractly.
- Adjust evanflow-iterate if you don't have chromium available: substitute google-chrome --headless or another tool.
- Edit evanflow-coder-overseer to match your project's conventions (your authentication middleware name, your DB write helper, etc.).

The skills are designed to be edited. Treat them as starting points, not gospel.
If you fork to make a vendor-specific variant (your-name-flow), great - that's the spirit.
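As one example of that kind of adaptation, the screenshot step in evanflow-iterate could probe for whichever headless browser is installed instead of hard-coding chromium. This is only a sketch: `find_browser` and `screenshot` are hypothetical helper names, and the candidate binary list is a guess at common install names rather than anything the skill defines.

```shell
# find_browser: prints the first available browser binary, or returns 1.
find_browser() {
  for candidate in chromium chromium-browser google-chrome google-chrome-stable; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# screenshot URL OUTFILE: captures URL headlessly, or asks for a manual
# visual check when no browser is available.
screenshot() {
  url="$1" out="$2"
  if ! browser="$(find_browser)"; then
    echo "no headless browser found; please verify $url visually" >&2
    return 1
  fi
  "$browser" --headless --screenshot="$out" "$url"
}

# Example (the URL is a placeholder for your dev server):
#   screenshot http://localhost:3000 /tmp/ui.png
```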
You say: "let's evanflow this - I want to add a feature that does X"
        │
        ▼
evanflow-go (the conductor)
        │
        ├─ Phase 0: Restate idea, scope check
        ├─ Phase 1: evanflow-brainstorming (CHECKPOINT: design approval)
        ├─ Phase 2: evanflow-writing-plans (CHECKPOINT: plan approval)
        │     └─ Step 2.5: parallelization check
        ├─ Phase 3: evanflow-executing-plans (sequential)
        │     OR
        │   evanflow-coder-overseer (parallel)
        │     ├─ contract with named tests + integration tests
        │     ├─ RED checkpoint (all coders write failing tests, orchestrator verifies)
        │     ├─ GREEN phase (vertical-slice TDD per coder)
        │     ├─ per-coder overseers (review, never fix)
        │     └─ integration overseer (runs touchpoint tests)
        ├─ Phase 4: evanflow-iterate (5x cap, Five Failure Modes pass)
        └─ Phase 5: STOP. Report what was done. Await your direction.
Cross-cutting: evanflow-compact runs at clean boundaries when context gets heavy.
Special-purpose skills (evanflow-debug, evanflow-improve-architecture, evanflow-design-interface, evanflow-glossary, evanflow-prd, evanflow-qa, evanflow-review) are pulled in mid-flow when relevant.
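The Phase 4 cap in the flow above can be pictured as a bounded loop: keep reviewing until a round finds nothing, but never go past five rounds. The sketch below is illustrative only. `iterate_until_clean` and `fake_review` are hypothetical names, and the review callback is assumed to print the number of issues still open after a given round.

```shell
MAX_ROUNDS=5

# iterate_until_clean CHECK_CMD: calls CHECK_CMD with the round number
# until it reports 0 issues or the cap is reached.
iterate_until_clean() {
  check_cmd="$1"
  round=1
  while [ "$round" -le "$MAX_ROUNDS" ]; do
    issues="$("$check_cmd" "$round")"
    if [ "$issues" -eq 0 ]; then
      echo "clean after $round round(s)"
      return 0
    fi
    round=$((round + 1))
  done
  echo "stopped at cap of $MAX_ROUNDS rounds with $issues issue(s) left"
  return 1
}

# Stand-in review: reports 2 issues after round 1, 1 after round 2, then 0.
fake_review() {
  remaining=$((3 - $1))
  if [ "$remaining" -lt 0 ]; then remaining=0; fi
  echo "$remaining"
}

iterate_until_clean fake_review   # prints "clean after 3 round(s)"
```

Hitting the cap returns a non-zero status rather than looping forever, which matches the "STOP and report" posture of Phase 5.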
.
├── .claude-plugin/
│   ├── plugin.json          ← plugin identity (name, description, version)
│   └── marketplace.json     ← marketplace manifest (lists EvanFlow as one bundled plugin)
├── skills/                  ← 16 SKILL.md folders
│   ├── evanflow/
│   ├── evanflow-go/
│   ├── evanflow-brainstorming/
│   └── ... (etc)
├── agents/                  ← 2 custom subagent definitions
│   ├── evanflow-coder.md
│   └── evanflow-overseer.md
├── hooks/
│   ├── hooks.json           ← auto-activated when plugin installs
│   └── block-dangerous-git.sh
├── examples/
│   └── CLAUDE.md.snippet    ← for the manual-copy install path
├── docs/
│   └── skills-audit.md      ← verdict on all 38 candidate skills considered
├── README.md
└── LICENSE                  ← MIT
EvanFlow synthesizes ideas from:
hooks/ (script copied verbatim). Original by Matt Pocock.
Industry research informing the design:
MIT. See LICENSE.
Issues and pull requests welcome. EvanFlow is opinionated by design: proposals to add ceremony or auto-actions will be politely declined. Proposals to further reduce ceremony, sharpen rules, or add evidence-backed improvements are very welcome.
And djb (the djb) also wrote djbdns.
There are plenty of examples, usually when it coincides with someone's first project.
Debian is a portmanteau of Debra (Ian's girlfriend) and Ian.
I don't mind it. It's just a name
No x. No y. No z. Just abc.
It's like nails on a chalkboard...