If you’re just looking for the TDD part, https://github.com/nizos/tdd-guard is the only project I’ve come across that actually enforces it with hooks and blocks edits, rather than relying on a prompt that gets context-rotted away.

To be honest, the official superpowers/brainstorming skill already does TDD so well that I don't see much need for this. TDD is definitely the way to go with agentic development.

EvanFlow - thoughts arrive like butterflies?

Two questions

1) Do you not feel self-conscious or weird about calling this "EvanFlow"? It seems like a lot of people these days are naming their AI tools/skills/whatever after themselves, which seems self-absorbed. Either that or they hope that if their thing takes off like OpenClaw did, they'll grab the fame that comes along with it.

2) Why does your TDD flow miss the refactor step of TDD?

The refactor step is the silent casualty in AI-assisted TDD. Once the test is green, Claude optimizes for moving to the next test, not for cleaning up the impl that just passed. An "iterate-until-clean" pass at the end is a different thing: you're refactoring cold code, not refactoring with a freshly-written test as the safety net.

Built this as an opinionated Claude Code development flow based on evidence-based practices and what has been working for me while developing professional code.

EvanFlow is a single TDD-driven loop. Say "let's evanflow this" and it walks brainstorm → plan → execute → tdd → iterate → STOP. Real checkpoints at design and plan approval. Never auto-commits, never auto-stages, never proposes integration - every git op is your call.

The three things that actually changed how I work:

1. Vertical-slice TDD. One failing test → minimal impl → next test. Watch each test fail before writing the impl that passes it. (Sounds obvious. Almost no agent does it by default. ~62% of LLM-generated test assertions are wrong per HumanEval research, so discipline in the tests matters more than discipline in the impl.)

2. Embedded grilling at decision points. Before locking a plan: what breaks if a user does X? What's the rollback? What's explicitly out of scope? Catches design flaws while they're still cheap.

3. Iterate-until-clean (hard cap of 5 rounds). Re-read the diff against dead code, naming, the deletion test, assertion correctness, and a Five Failure Modes pass (hallucinated actions, scope creep, cascading errors, context loss, tool misuse). For UI: screenshot via headless Chromium.

For bigger plans with 3+ independent units sharing types, it forks into a parallel coder/overseer orchestration. Integration tests at touchpoints ARE the cohesion contract.

Three install paths: Claude Code plugin marketplace, npx skills add, manual copy. MIT.

superpowers/brainstorming is doing TDD as well.

How does this handle “dumb zone” evasion while looping?

TDD in 2026? Besides, TDD’s main benefit is coming up with a decent architecture for your system… LLMs can already do that if instructed. I don’t see the point of TDD.

When I first used agentic coding I was already doing strict TDD and I just tried using it for the refactor step.

It sucked so hard I thought the idea of agentic coding was just a joke. I've tried it periodically and it literally never stopped sucking.

I figure if it can't do that part, it isn't worth using for any part.

Ever since then whenever people tell me it's gotten better I've tried it out and nope, still sucks.

I still get gaslit about how well it works by people who just discovered TDD, though, and who watch it power through CRUD boilerplate, impressed and blissfully unaware that boilerplate spew is an antipattern.

How? I saw superpowers/brainstorming but never saw it produce TDD code.

Oh, he don't know, so he chases them away

Seeeethinnggg tests failing not complete... again

Someday soon he'll begin his life again

Please don’t post AI generated comments :(

Just write it yourself. I promise it’s worth it

I’ve thought of going down the TDD model for LLMs as a way of providing constraints on their behavior. I would think that “vertical slice” TDD would encourage the LLM to start tailoring the tests to the implementation rather than establishing the invariants up front, though. I was considering “horizontal” TDD to force the agent to implement constraints before coding to them.

Curious: in the repo you mention

> Several rules come from 2025-2026 industry research on agentic coding failure modes

What are some of the papers you read?

> execute → tdd

How are these separate steps?

TDD is how you execute, not something you tack on afterwards.

Let the guy have something. Free and open source developers work tirelessly for free for years supporting software that billion dollar companies use to make huge profits.

We don't question when scientists name stuff after themselves so why question this? At least he gets some recognition for his work.

I initially thought it was a pun on Pearl Jam's classic "Even Flow", then I read your comment and noticed the username... Sad.

1) Do you feel weird asking a question like this? What constructive benefit does it add to any dialogue?

Sometimes it’s helpful to ask oneself what’s the benefit of an answer. I cannot think of any for your question and the way you worded it is a bit cringe. People name things after themselves all the time. It does not matter in the slightest.

I feel like 1 is a self-correcting problem. If this goes nowhere, it will soon be forgotten.

I can think of one example that did go somewhere: Linux.

Ref 1, he should have called it Daughter.

"Evenflo is a hundred year old infant feeding brand." Probably named to market its baby bottles and accessories.

Everybody who grew up to listen to Pearl Jam had seen or used an Evenflo pacifier, baby bottle, or car seat. That's one reason the song already sounded so familiar.

I had always been hesitant to prescribe TDD for _everything_ until agentic coding agents came along. TDD is a great way to keep them on track.

It’s supposed to do this, but I’ve found it doesn’t always do it

He's even being cheeky by intentionally replacing the em-dash with a regular dash, haha

With no disrespect intended because this is also how I would do it (but I wouldn't publish and name it after myself!) - they didn't read the research. They had the AI that actually created this do that for them.

I was really hoping this was something I could find on CPAN from the author username perlJam.

TanStack was started by a guy named Tanner

Debian is a portmanteau of Debra (Ian's girlfriend) and Ian.

I don't mind it. It's just a name

ReiserFS is another one that comes to mind.

And djb (the djb) also wrote djbdns.

There are plenty of examples, usually when it coincides with someone’s first project.

Debian is an even better example

Linus did not name it Linux himself: https://en.wikipedia.org/wiki/Linux#Naming

Feels like a bonus to me.

EvanFlow

A TDD-driven iterative feedback loop for software development with Claude Code.

16 cohesive skills + 2 custom subagents walk an idea from brainstorm through implementation, with checkpoints throughout where you stay in control. One entry point: say "let's evanflow this" and the orchestrator runs the loop.

brainstorm → plan → execute (sequential or parallel) → tdd → iterate → STOP

The loop is conductor, not autopilot: real checkpoints at design approval, plan approval, and after iteration. The agent stops short of every git operation and waits for your direction. No auto-commits. No forced ceremony. No "must invoke a skill" tax.

Quick Install

The recommended path — Claude Code's plugin marketplace:

```
/plugin marketplace add evanklem/evanflow
/plugin install evanflow@evanflow
```

Restart, then try:

"Let's evanflow this — I want to add a small feature that does X."

evanflow-go fires and walks the loop. The git-guardrails hook auto-activates with the plugin (no settings.json edit needed). Skills appear under the evanflow: namespace (e.g., /evanflow:evanflow-go).

See Installation below for two alternative paths.

What Makes It a Feedback Loop

The loop is built around discipline that compounds across iterations, not single-shot generation. Every step has a checkpoint that gates the next:

Brainstorm clarifies intent, proposes 2–3 approaches with embedded grill (stress-test) → you approve the design
Plan maps file structure first (deep modules, deletion test) → you approve the plan
Execute runs task-by-task with inline verification → blockers stop the loop and surface to you
TDD is vertical-slice only: one failing test → minimal impl → repeat. Tests verify behavior through public interfaces, so they survive refactors
Iterate re-reads the diff with fresh eyes, runs quality checks, screenshots UI changes, and checks the result against a Five Failure Modes checklist (hallucinated actions, scope creep, cascading errors, context loss, tool misuse). Hard cap of 5 iterations
STOP. Report. Await your direction. The agent never auto-commits, never auto-stages, never proposes a PR
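The iterate step's hard cap behaves like a bounded loop: review, fix, and re-review until a round comes back clean, but never more than five times. The sketch below is a hypothetical shell rendering of that control flow, not EvanFlow's code (the skills express it as instructions to the agent). `iterate_until_clean` and the review command it takes are illustrative names, and the review command is assumed to fix what it can and exit 0 only when nothing is left.

```bash
# Hypothetical sketch of a bounded self-review loop (not EvanFlow's code).
# Runs the given review command up to MAX_ROUNDS times and stops early as
# soon as a round reports a clean result (exit status 0).

MAX_ROUNDS=5

iterate_until_clean() {
  local round
  for (( round = 1; round <= MAX_ROUNDS; round++ )); do
    if "$@"; then
      echo "clean after round $round"
      return 0
    fi
    echo "round $round: issues found" >&2
  done
  # Cap reached: stop and hand control back instead of looping forever.
  echo "cap of $MAX_ROUNDS rounds reached; stopping for human review"
  return 1
}
```

For example, `iterate_until_clean ./review-round.sh` (a hypothetical script) stops at the first clean round or after round 5, whichever comes first. Returning a non-zero status at the cap mirrors the loop's STOP: control goes back to you instead of the loop continuing.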

For plans with 3+ truly independent units, the loop forks into a parallel coder/overseer orchestration: one coder per unit (using vertical-slice TDD with a RED checkpoint), one overseer per coder (read-only review subagent that can't modify code), plus an integration overseer that runs named integration tests at every touchpoint. The integration tests are the executable contract — interfaces can't drift if both sides have to satisfy the same passing test.
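The RED checkpoint mentioned above boils down to confirming that every unit's new tests fail before any implementation exists. A minimal sketch, assuming each unit's tests can be launched with a single shell command; `red_checkpoint` is a hypothetical helper, not part of EvanFlow.

```bash
# Hypothetical RED checkpoint (not EvanFlow's code).
# Each argument is the test command for one unit. The checkpoint passes only
# if every command FAILS, showing the tests target behavior that does not
# exist yet. A test that already passes cannot guide the GREEN phase.

red_checkpoint() {
  local cmd status=0
  for cmd in "$@"; do
    if bash -c "$cmd" >/dev/null 2>&1; then
      echo "NOT RED: '$cmd' already passes"
      status=1
    else
      echo "red: '$cmd' fails as expected"
    fi
  done
  return "$status"
}
```

Called as, say, `red_checkpoint "npm test -- auth.test.ts" "npm test -- billing.test.ts"` (hypothetical commands), it passes only when both fail. A test that fails for the wrong reason, such as a typo or a missing import, also looks red here, which is why someone still has to read the failure messages rather than just count them.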

Hard Rules Baked Into the Loop

Several rules come from 2025-2026 industry research on agentic coding failure modes and are baked into every skill:

Never invent values — file paths, env vars, IDs, function names, library APIs. If unsure, the agent stops and asks. (Action-hallucination is the most dangerous agent failure.)
Assertion-correctness warning — research shows 62% of LLM-generated test assertions are wrong. Both evanflow-tdd and the overseer review explicitly check whether a one-character bug in the implementation would still let the assertion pass.
Watch for context drift — evanflow-compact triggers when symptoms appear (re-asking established questions, contradicting earlier decisions). Industry data: ~65% of enterprise AI coding failures trace to context drift, not raw token exhaustion.
Five Failure Modes pass in iterate + overseer review — explicit check against hallucinated actions, scope creep, cascading errors, context loss, tool misuse.
No skill tax — ad-hoc questions don't require a skill invocation. Skills are tools, not a tollbooth.
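The one-character-bug question from the assertion-correctness rule can be made mechanical: run the same assertion against the real code and against a copy with one character changed, and keep the assertion only if it rejects the copy. A minimal sketch with a hypothetical `add` function whose mutant flips `+` to `-`:

```bash
# Hypothetical illustration of the "would a one-character bug still pass?"
# check (not taken from EvanFlow's skills).

add()        { echo $(( $1 + $2 )); }   # real implementation
add_mutant() { echo $(( $1 - $2 )); }   # same code with '+' flipped to '-'

# Weak assertion: only checks that the output is an integer.
weak_assert()   { [[ "$("$1" 2 3)" =~ ^-?[0-9]+$ ]]; }
# Strong assertion: pins the exact expected value.
strong_assert() { [[ "$("$1" 2 3)" == "5" ]]; }

# An assertion is useful only if it passes on the real code
# AND fails on the mutant.
catches_mutant() {
  local assertion=$1
  "$assertion" add && ! "$assertion" add_mutant
}
```

Here the weak assertion passes on both versions (`-1` is still an integer), so it would not catch the bug; the strong one does. Mutation-testing tools automate the same idea across a whole codebase.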

The Skill Set

Default Loop (5 skills)

| Skill | Purpose |
| --- | --- |
| `evanflow-brainstorming` | Clarify intent, propose 2–3 approaches with embedded grill (stress-test). Mockup quick-mode for visual-only requests. |
| `evanflow-writing-plans` | File structure first, bite-sized tasks, embedded grill. Step 2.5 offers `evanflow-coder-overseer` if the plan is parallelizable. |
| `evanflow-executing-plans` | Task-by-task with inline verification. Step 0 re-offers parallel path. Hands off to iterate, then STOPS. |
| `evanflow-tdd` | Vertical-slice TDD. One test → one impl → repeat. Behavior through public interface. Assertion-correctness warning. |
| `evanflow-iterate` | Self-review loop after implementation. Re-read diff, fix issues, run quality checks, screenshot UI (via headless Chromium). Five Failure Modes checklist. Hard cap of 5 iterations. |

Special-Purpose (8 skills)

| Skill | Purpose |
| --- | --- |
| `evanflow-go` | Single entry point. Say "let's evanflow this" and it walks the whole loop. |
| `evanflow-glossary` | Extract canonical domain terms into `CONTEXT.md`. Flag ambiguities and synonyms. |
| `evanflow-improve-architecture` | Surface refactor opportunities via the deletion test + deep-modules vocabulary. |
| `evanflow-design-interface` | "Design it twice": spawn 3+ parallel sub-agents with radically different constraints, compare on depth/simplicity/efficiency. |
| `evanflow-debug` | Root-cause discipline. Hypothesis stated explicitly, embedded grill before fixing, failing test first. |
| `evanflow-review` | Both halves of code review (giving + receiving). Don't capitulate to feedback you can't justify. |
| `evanflow-prd` | Synthesize a PRD from existing context. For substantial new features. |
| `evanflow-qa` | Conversational bug discovery → issue draft. Asks before filing. |

Cross-Cutting (1 skill)

| Skill | Purpose |
| --- | --- |
| `evanflow-compact` | Long-session context management. Strategies for proactive summarization at clean boundaries. Drift symptoms checklist. |

Meta (1 skill)

| Skill | Purpose |
| --- | --- |
| `evanflow` | The index. Shared vocabulary + when to invoke each `evanflow-*` skill. |

Custom Subagents (2)

Defined in agents/ and invoked via the Agent tool's subagent_type parameter:

| Subagent | Tool restrictions | Purpose |
| --- | --- | --- |
| `evanflow-coder` | Read, Edit, Write, Glob, Grep, Bash, TodoWrite | Implementation subagent for `evanflow-coder-overseer`. Tools + system prompt prevent git ops, out-of-scope edits, value hallucination. |
| `evanflow-overseer` | Read, Grep, Glob (no Edit/Write/Bash) | Read-only review subagent. Tools physically enforce "report findings, never fix." |

Bundled Hook

hooks/block-dangerous-git.sh — PreToolUse hook that blocks destructive git ops (git push, git reset --hard, git clean -f, git branch -D, git checkout ., git restore .). Auto-activates with the plugin install path.
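To show the shape of such a guard, here is a minimal sketch of a PreToolUse hook. It is not the bundled script: the patterns simply mirror the commands listed above, and it assumes Claude Code's hook contract of passing the tool call as JSON on stdin (with the shell command at `tool_input.command`) and treating exit code 2 as a block. It only acts as a hook when `CLAUDE_PROJECT_DIR` is set, which Claude Code provides to hook commands; that keeps the functions testable when the file is run or sourced on its own. Unlike the behavior noted under Requirements, this sketch blocks rather than allows when jq is missing or the input cannot be parsed.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a destructive-git guard (NOT the bundled
# block-dangerous-git.sh). Patterns are extended regexes matched anywhere
# in the command string.

DANGEROUS_PATTERNS=(
  'git push'
  'git reset --hard'
  'git clean -f'
  'git branch -D'
  'git checkout \.'
  'git restore \.'
)

# Prints the first matching pattern and returns 0 if the command is dangerous.
is_dangerous_git() {
  local cmd=$1 pattern
  for pattern in "${DANGEROUS_PATTERNS[@]}"; do
    if grep -Eq -- "$pattern" <<< "$cmd"; then
      echo "$pattern"
      return 0
    fi
  done
  return 1
}

# Reads the hook JSON from stdin; returns 2 (block) or 0 (allow).
guard() {
  local cmd match
  if ! command -v jq >/dev/null 2>&1; then
    echo "BLOCKED: jq not found, refusing to run an unchecked command" >&2
    return 2
  fi
  cmd=$(jq -r '.tool_input.command // empty') || {
    echo "BLOCKED: could not parse hook input" >&2
    return 2
  }
  if match=$(is_dangerous_git "$cmd"); then
    echo "BLOCKED: '$cmd' matches dangerous pattern '$match'" >&2
    return 2
  fi
  return 0
}

# Act as a hook only when executed directly by Claude Code.
if [[ "${BASH_SOURCE[0]}" == "$0" && -n "${CLAUDE_PROJECT_DIR:-}" ]]; then
  guard
  exit $?
fi
```

Plain substring matching is coarse: it also blocks harmless text such as `echo "git push"` and misses variants like `git -C other-repo push`, so treat it as a safety net rather than a sandbox.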

Hard Rules (apply to every skill)

Never auto-commit, never auto-stage, never auto-finish. Every git write op requires you to explicitly ask in the current turn.
Never invent values. File paths, env vars, IDs, function names, library APIs — if unsure, the agent stops and asks.
No skill tax. Ad-hoc questions don't require a skill invocation. Skills are tools, not a tollbooth.
No forced spec/plan paths. Files live where you want them.
Verify before claiming done. Quality checks (typecheck, lint, test) run before any "done" report.
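The last rule amounts to a gate that runs every quality check and reports done only if all of them pass. A minimal sketch; `verify_before_done` is a hypothetical helper, and the commands in the usage line are placeholders for whatever your CLAUDE.md documents.

```bash
# Hypothetical "verify before done" gate (not EvanFlow's code).
# Runs every check, even after a failure, so the report lists all problems,
# and prints DONE only when all of them pass.

verify_before_done() {
  local check failed=()
  for check in "$@"; do
    if ! bash -c "$check" >/dev/null 2>&1; then
      failed+=("$check")
    fi
  done
  if (( ${#failed[@]} > 0 )); then
    printf 'NOT DONE, failing check: %s\n' "${failed[@]}"
    return 1
  fi
  echo "DONE: all $# checks passed"
  return 0
}
```

For example, `verify_before_done "npm run typecheck" "npm run lint" "npm test"`. It deliberately keeps going after the first failure, so one report lists every failing check instead of surfacing them one at a time.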

Requirements

Claude Code (any recent version)
Bash — for the bundled hook script (Linux, macOS, or Windows + WSL)
jq — used by the hook script to parse Claude's JSON tool input. Install via apt install jq, brew install jq, or your platform's package manager. If jq is missing, the guardrail hook fails silently and dangerous git ops are NOT blocked.
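Since a missing jq silently disables the guardrail, a quick check before relying on the hook is cheap insurance. A minimal sketch (the function name is hypothetical):

```bash
# Hypothetical preflight: report required tools that are missing from PATH.
check_requirements() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

`check_requirements bash jq` prints a `missing:` line for each absent tool and returns non-zero if any are missing.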

Optional but recommended:

chromium or google-chrome — for evanflow-iterate's visual verification of UI changes (chromium --headless --screenshot=...). Falls back gracefully if missing — the skill flags it and asks you to verify visually.
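The fallback can be sketched as picking the first browser binary found on PATH and asking for manual verification when there is none. The binary names are assumptions (some Linux distributions ship `chromium-browser`), as are the window size and the URL and output path you pass in; `--headless`, `--screenshot=` and `--window-size=` are standard Chromium command-line flags.

```bash
# Hypothetical helper: pick the first headless-capable browser on PATH.
pick_browser() {
  local candidate
  for candidate in chromium chromium-browser google-chrome; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Screenshot a URL, or ask for manual verification if no browser is found.
screenshot_or_ask() {
  local url=$1 out=$2 browser
  if ! browser=$(pick_browser); then
    echo "No headless browser found; please verify $url visually."
    return 1
  fi
  "$browser" --headless --screenshot="$out" --window-size=1280,800 "$url"
}
```

For instance, `screenshot_or_ask http://localhost:3000 /tmp/home.png` (placeholder URL and path).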

Installation

Three paths, in priority order. All three install the same skill set; the plugin path additionally auto-wires the guardrail hook.

Path 1 — Claude Code Plugin Marketplace (recommended)

This is the cleanest install. Skills, agents, AND the guardrail hook all activate automatically.

```
/plugin marketplace add evanklem/evanflow
/plugin install evanflow@evanflow
```

Restart Claude Code (or /reload-plugins). Skills appear namespaced as /evanflow:evanflow-go, /evanflow:evanflow-tdd, etc. Auto-invocation via "let's evanflow this" still works regardless of namespace.

To uninstall: /plugin uninstall evanflow@evanflow.

Path 2 — `npx skills@latest add` CLI

Works against any GitHub repo with SKILL.md-shaped folders. Installs skills only — does not install the guardrail hook or custom subagents (you'd add those manually if you want them).

```bash
# Install all 16 skills at once
npx skills@latest add evanklem/evanflow -s '*' -y

# Or install individual skills
npx skills@latest add evanklem/evanflow/evanflow-go
npx skills@latest add evanklem/evanflow/evanflow-tdd
# ...
```

This places skills under ~/.claude/skills/ (global) or .claude/skills/ (project, auto-detected).

Path 3 — Manual Copy

For users who want full control, no CLI dependencies.

```bash
# Clone outside your project (the location is up to you)
git clone https://github.com/evanklem/evanflow.git /tmp/evanflow
EVANFLOW=/tmp/evanflow

# Run the remaining commands from your project root, not from inside the clone.

# Skills (project-level; adjust to ~/.claude/skills/ for global)
mkdir -p .claude/skills
cp -r "$EVANFLOW"/skills/* .claude/skills/

# Agents (custom subagents used by evanflow-coder-overseer)
mkdir -p .claude/agents
cp "$EVANFLOW"/agents/*.md .claude/agents/

# Git guardrails hook (optional but recommended)
mkdir -p .claude/hooks
cp "$EVANFLOW"/hooks/block-dangerous-git.sh .claude/hooks/
chmod +x .claude/hooks/block-dangerous-git.sh
```

Then register the hook in your .claude/settings.json:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/block-dangerous-git.sh"
          }
        ]
      }
    ]
  }
}
```

Optionally, paste examples/CLAUDE.md.snippet into your project's CLAUDE.md to brief Claude about EvanFlow's conventions.

Verify Any Install Path

Restart Claude Code. Try saying:

"Let's evanflow this — I want to add a small feature that does X."

evanflow-go should fire and walk you through the loop. To verify the guardrail hook (paths 1 and 3 only), try git reset --hard HEAD from the Bash tool, ideally in a scratch repository, since the command discards uncommitted changes if the hook is not active. It should be blocked with "BLOCKED: ... matches dangerous pattern".

Customization

Every skill has a clear structure with a ## Hard Rules section. To adapt to your project:

Replace <frontend> and <backend> placeholders in skills like evanflow-writing-plans with your actual paths if you find yourself answering the same question repeatedly.
Document your project's quality checks in your CLAUDE.md — exact typecheck, lint, and test commands. The skills reference these abstractly.
Adapt the visual verification step in evanflow-iterate if you don't have chromium available — substitute google-chrome --headless or another tool.
Edit the cohesion contract template in evanflow-coder-overseer to match your project's conventions (your authentication middleware name, your DB write helper, etc.).
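Filling in the `<frontend>` and `<backend>` placeholders is a plain text substitution. A minimal sketch, assuming the skills were copied to `.claude/skills/`; `fill_placeholders` is a hypothetical helper and `web/src` and `api/src` are placeholder paths. It writes through a temporary file so it behaves the same with GNU and BSD sed, whose `-i` flags differ.

```bash
# Hypothetical helper: replace <frontend> and <backend> placeholders in one file.
fill_placeholders() {
  local file=$1 frontend=$2 backend=$3 tmp status
  tmp=$(mktemp) || return 1
  # Write the substituted text to a temp file, then copy it back over the
  # original (cat preserves the original file's permissions, unlike mv).
  sed -e "s|<frontend>|$frontend|g" -e "s|<backend>|$backend|g" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage: `fill_placeholders .claude/skills/evanflow-writing-plans/SKILL.md web/src api/src`. Paths containing `|` or `&` would need escaping, since both are special in this sed replacement.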

The skills are designed to be edited. Treat them as starting points, not gospel.

If you fork to make a vendor-specific variant (your-name-flow), great — that's the spirit.

How EvanFlow Works End-to-End

```
You say: "let's evanflow this — I want to add a feature that does X"
           │
           ▼
       evanflow-go (the conductor)
           │
           ├─ Phase 0: Restate idea, scope check
           ├─ Phase 1: evanflow-brainstorming (CHECKPOINT: design approval)
           ├─ Phase 2: evanflow-writing-plans (CHECKPOINT: plan approval)
           │            └─ Step 2.5: parallelization check
           ├─ Phase 3: evanflow-executing-plans (sequential)
           │            OR
           │            evanflow-coder-overseer (parallel)
           │              ├─ contract with named tests + integration tests
           │              ├─ RED checkpoint (all coders write failing tests, orchestrator verifies)
           │              ├─ GREEN phase (vertical-slice TDD per coder)
           │              ├─ per-coder overseers (review, never fix)
           │              └─ integration overseer (runs touchpoint tests)
           ├─ Phase 4: evanflow-iterate (5x cap, Five Failure Modes pass)
           └─ Phase 5: STOP. Report what was done. Await your direction.
```

Cross-cutting: evanflow-compact runs at clean boundaries when context gets heavy.

Special-purpose skills (evanflow-debug, evanflow-improve-architecture, evanflow-design-interface, evanflow-glossary, evanflow-prd, evanflow-qa, evanflow-review) are pulled in mid-flow when relevant.

Repository Structure

```
.
├── .claude-plugin/
│   ├── plugin.json          — plugin identity (name, description, version)
│   └── marketplace.json     — marketplace manifest (lists EvanFlow as one bundled plugin)
├── skills/                  — 16 SKILL.md folders
│   ├── evanflow/
│   ├── evanflow-go/
│   ├── evanflow-brainstorming/
│   ... (etc)
├── agents/                  — 2 custom subagent definitions
│   ├── evanflow-coder.md
│   └── evanflow-overseer.md
├── hooks/
│   ├── hooks.json           — auto-activated when plugin installs
│   └── block-dangerous-git.sh
├── examples/
│   └── CLAUDE.md.snippet    — for the manual-copy install path
├── docs/
│   └── skills-audit.md      — verdict on all 38 candidate skills considered
├── README.md
└── LICENSE                  — MIT
```

Credits

EvanFlow synthesizes ideas from:

mattpocock/skills by Matt Pocock — vertical-slice TDD, deep modules, deletion test, design-it-twice, ubiquitous language, grill-me, caveman.
superpowers by Jesse Vincent — verification-before-completion, code review patterns, parallel agent dispatch, finishing-a-development-branch (the 4-option presentation).
git-guardrails-claude-code — bundled in hooks/ (script copied verbatim). Original by Matt Pocock.

Industry research informing the design:

Anthropic's 2026 Agentic Coding Trends Report
9 Critical Failure Patterns of Coding Agents (DAPLab, Columbia)
Test-Driven Development for Code Generation (arXiv 2402.13521) — assertion-correctness findings

License

MIT. See LICENSE.

Contributing

Issues and pull requests welcome. EvanFlow is opinionated by design — proposals to add ceremony or auto-actions will be politely declined. Proposals to further reduce ceremony, sharpen rules, or add evidence-backed improvements are very welcome.

It's quite well done really, but the cadence...

No x. No y. No z. Just abc.

It's like nails on a chalkboard...

He merely laundered it through a coworker.

Hacker Times

EvanFlow – A TDD driven feedback loop for Claude Code