I turned Markdown into a protocol for generative UI

If you're still looking for a name let me suggest "hyper text".

It embodies the whole idea of having data, code and presentation at the same place.

If you're open for contributions I already have an idea for cascading styles system in mind.

The streamed execution idea is novel to me. Not sure what’s it significance ?

I have been working on something with a similar goal:

https://github.com/livetemplate/tinkerdown

Very cool. I'm imagining using this with Claude Code, allowing it to wire this up to MCP or to CLI commands somehow and using that whole system as an interactive dashboard for administering a kubernetes cluster or something like that - and the hypothetical first feature request is to be able to "freeze" one of these UI snippets and save it as some sort of a "view" that I can access later. Use case: it happens to build a particularly convenient way to do a bunch of calls to kubectl, parse results and present them in some interactive way - and I'd like to reuse that same widget later without explaining/iterating on it again.

I quite like this! I've been incrementally building similar tooling for a project I've been working on, and I really appreciate the ideas here.

I think the key decision for someone implementing a flexible UI system like this is the required level of expressiveness. To me, the chief problem with having agents build custom html pages (as another comment suggested) is far too unconstrained. I've been working with a system of pre-registered blocks and callbacks that are very constrained. I quite like this as a middleground, though it may still be too dynamic for my use case. Will explore a bit more!

I see potential to take over Notion's / Obsidian's business here. Imagine highly customizable notebooks people can generate on the fly with the right kind of UI they need. Compared to fixed blocks in Notion

In an agentic loop, the model can keep calling multiple tools for each specialized artifact (like how claude webapp renders HTML/SVG artifacts within a single turn). Models are already trained for this (tested this approach with qwen 3.5 27B and it was able to follow claude's lead from the previous turns).

OpenUI and JSON-render are some other players in this space.

I’m building an agentic commerce chat that uses MCP-UI and want to start using these new implementations instead of MCP-UI but can’t wrap my head around how button on click and actions work? MCP-UI allows onClick events to work since you’re “hard coding” the UI from the get-go vs relying on AI generating undertemistic JSON and turning that into UI that might be different on every use.

The nice thing about standards is that you have so many to choose from

The bots that read the instruction and yet add the emoji to the _beginning_ of the PR title though. Even bigger red flag I guess?

There’s definitely a lot of merit to this idea, and the gifs in the article look impressive. My strong opinion is that there’s a lot more to (good) UIs than what an LLM will ever be able to bring (happy to be proven wrong in a few years…), but for utilitarian and on-the-fly UIs there’s definitely a lot of promise

There seems to be a lot of movement in this direction, how do you feel about Markdown UI?

https://markdown-ui.com/

would be nice if it wasnt just ui but other form like voice narration, sounds ect

If you're still looking for a name let me suggest "hyper text".

It embodies the whole idea of having data, code and presentation at the same place.

If you're open for contributions I already have an idea for cascading styles system in mind.

Every turn of the wheel someone wants to make a new one.

Maybe one day someone will invent a rounder wheel.

If HTML happened again except this time it was markdown, maybe more non-nerds would be able to use it? XML just looks gnarly.

Ha, history does rhyme ;) Happy if you reach out via mail!

There seems to be a lot of movement in this direction, how do you feel about Markdown UI?

https://markdown-ui.com/

Markdown UI and my approach share the "markdown as the medium" insight, but they're fundamentally different bets:

Markdown UI is declarative — you embed predefined widget types in markdown. The LLM picks from a catalog. It's clean and safe, but limited to what the catalog supports.

My approach is code-based — the LLM writes executable TypeScript in markdown code fences, which runs on the server and can render any React UI. It also has server-side state, so the UI can do forms, callbacks, and streaming data — not just display widgets.

I'd much prefer MDX.

If HTML happened again except this time it was markdown, maybe more non-nerds would be able to use it? XML just looks gnarly.

Markdown UI and my approach share the "markdown as the medium" insight, but they're fundamentally different bets:

Markdown UI is declarative — you embed predefined widget types in markdown. The LLM picks from a catalog. It's clean and safe, but limited to what the catalog supports.

I'd much prefer MDX.

Every turn of the wheel someone wants to make a new one.

Maybe one day someone will invent a rounder wheel.

Personally I think we should move to Heptagons, they're round enough.

The wheel is what I would call, passé.

Ha, history does rhyme ;) Happy if you reach out via mail!

I think he's talking about CSS

Personally I think we should move to Heptagons, they're round enough.

The wheel is what I would call, passé.

I think he's talking about CSS

Thanks! I totally agree — this isn't about replacing carefully designed UIs. It's about ephemeral interfaces you need in the moment — the throwaway dashboard for this specific dataset, the one-off form for this exact workflow. Things that would normally mean opening 3 different apps.

The streamed execution idea is novel to me. Not sure what’s it significance ?

I have been working on something with a similar goal:

https://github.com/livetemplate/tinkerdown

The significance is responsiveness — instead of waiting for the LLM to finish generating the entire code block before anything happens, each statement executes as soon as it's complete. So API calls start, UIs render, and errors surface while the LLM is still streaming tokens.

Combined with a slot mechanism, complex UIs build up progressively — a skeleton appears first, then each section fills in as the LLM generates it.

I wrote a deeper dive on how the streaming execution works technically: https://fabian-kuebler.com/posts/streaming-ts-execution/

Add a video or a live demo, there's still too much friction on this readme.

Always Show then Ask.

and I meant to say: tinkerdown looks pretty cool!

would be nice if it wasnt just ui but other form like voice narration, sounds ect

The nice thing about standards is that you have so many to choose from

The bots that read the instruction and yet add the emoji to the _beginning_ of the PR title though. Even bigger red flag I guess?

That’s what I’m building, along with the invisible unified data model underneath, that is needed to tie everything together. Always glad for feedback, reach out in my profile if it sounds interesting!

OpenUI and JSON-render are some other players in this space.

In my approach, callbacks are first-class. The agent defines server-side functions and passes them to the UI:

  const onRefresh = async () => {
    data.loading = true;
    data.messages = await loadMessages();
    data.loading = false;
  };

  mount({
    data,
    callbacks: { onRefresh },
    ui: ({ data, callbacks }) => (
      <Button onClick={callbacks.onRefresh}>Refresh</Button>
    )
  });

When the user clicks the button, it invokes the server-side function. The callback fetches fresh data, updates state via reactive proxies, and the UI reflects it — all without triggering a new LLM turn.

So the UI is generated dynamically by the LLM, but the interactions are real server-side code, not just display. Forms work the same way — "await form.result" pauses execution until the user submits.

The article has a full walkthrough of the four data flow patterns (forms, live updates, streaming data, callbacks) with demos.

Exactly this!

Right now this uses React for Web but could also see it in the terminal via Ink.

And I love the "freeze" idea — maybe then you could even share the mini app.

I quite like this! I've been incrementally building similar tooling for a project I've been working on, and I really appreciate the ideas here.

Thanks! Really interesting to hear you're working on something similar.

You're right that the level of expressiveness is the key design decision. There's a real spectrum:

- pre-registered blocks (safe, predictable)

- code execution with a component library (middle ground)

- full arbitrary code (maximum flexibility).

My approach can slide along that spectrum: you could constrain the agent to only use a specific set of pre-imported components rather than writing arbitrary JSX. The mount() primitive and data flow patterns still work the same way, you just limit what the LLM is allowed to render.

Would love to hear what you learn if you explore it!

Add a video or a live demo, there's still too much friction on this readme.

Always Show then Ask.

and I meant to say: tinkerdown looks pretty cool!

Combined with a slot mechanism, complex UIs build up progressively — a skeleton appears first, then each section fills in as the LLM generates it.

I wrote a deeper dive on how the streaming execution works technically: https://fabian-kuebler.com/posts/streaming-ts-execution/

That is super cool. Sorry to be nitpicky but would really like to know your mental model: I didn’t understand from the blog why user waiting for a functional UI is a problem ? isn’t the partial streamed UI non-functional ?

I can see the value in early user verification and maybe interrupting the LLM to not proceed on an invalid path but I guess this is customer facing so not as valuable.

"In interactive assistants, that latency makes or breaks the experience." Why ? Because user might just jump off ?

(edited)

In my approach, callbacks are first-class. The agent defines server-side functions and passes them to the UI:

  const onRefresh = async () => {
    data.loading = true;
    data.messages = await loadMessages();
    data.loading = false;
  };

  mount({
    data,
    callbacks: { onRefresh },
    ui: ({ data, callbacks }) => (
      <Button onClick={callbacks.onRefresh}>Refresh</Button>
    )
  });

The article has a full walkthrough of the four data flow patterns (forms, live updates, streaming data, callbacks) with demos.

Exactly this!

Right now this uses React for Web but could also see it in the terminal via Ink.

And I love the "freeze" idea — maybe then you could even share the mini app.

Did you see that Claude Code just came out with "channels" that allows for messages to be injected into the session/sent out by claude via hooks and MCP server [1]? I had CC code an integration between fenced and CC using channels and it actually worked - a little clunky since there is no streaming, but very interesting nevertheless.

[1] https://code.claude.com/docs/en/channels-reference

Thanks! Really interesting to hear you're working on something similar.

You're right that the level of expressiveness is the key design decision. There's a real spectrum:

- pre-registered blocks (safe, predictable)

- code execution with a component library (middle ground)

- full arbitrary code (maximum flexibility).

Would love to hear what you learn if you explore it!

Will do! I'm using a JSON DSL currently, I wonder if there's a best choice for format that is both at the correct level of expressiveness and also easy enough for the LLM to generate in a valid way. I do think markdown has advantage of being very trivial for LLMs, but my current JSON blocks strategy might be better for more complex data.... will play around.

I can see the value in early user verification and maybe interrupting the LLM to not proceed on an invalid path but I guess this is customer facing so not as valuable.

"In interactive assistants, that latency makes or breaks the experience." Why ? Because user might just jump off ?

(edited)

Maybe I am a bit overdramatic ;) For me this is mostly about user experience. If the agent creates a complex mini app, the user might have to wait 30 seconds. That's 30 seconds without feedback. It's way nicer to already see information appearing - especially if that information is helpful. Also the UI can be functional already, even if it's not 100% complete!

[1] https://code.claude.com/docs/en/channels-reference

Whoa, I hadn't seen channels yet — and you already got fenced working with Claude Code?! That's awesome. Would love to see what you built!

15 December 2025·7 mins

“User interfaces are largely going to go away,” Eric Schmidt predicts. Agents will generate whatever UI you need on the fly. I built a prototype to explore the premise.

That’s an agentic AI assistant generating React UIs from scratch, with data flowing between client, server, and LLM. The prototype rests on three ideas:

Markdown as protocol — One stream carrying text, executable code, and data. The LLM already knows how to write it.
Streaming execution — The agent writes and executes code. Each statement executes as soon as it’s complete — no waiting for the full response.
A mount() primitive — One function that lets the agent create reactive UIs, with data flow patterns for client-server-LLM communication.

Check out the repo here.

The Protocol #

How do you combine code execution with text and data? All streamed and interleaved in arbitrary order? In a single protocol?

I kept coming back to markdown. LLMs know markdown cold — formatting, code fences, all of it. Why teach them something new?

So I settled on three block types:

Block	Syntax	Purpose
Text	`Plain markdown formatting`	Streams to the user
Code fence	```tsx agent.run	Executes on the server in a persistent context
Data fence	```json agent.data => "id"	Streams data into the UI

Here’s what this might look like:

Hey! I am the assistant. This text is streamed to the user token by token. But I can also run code...

```tsx agent.run const messages = await fetchMessages() ```

I can mount UIs

```tsx agent.run const fakeMovieData = new StreamedData("fake-movies");

const form = mount({ streamedData: fakeMovieData, ui: ({ streamedData }) => }) ```

I can stream data into these UIs [data appears one by one...]

```json agent.data => "fake-movies" [ { "name": "Blade Runner", "rating": 4.5 }, { "name": "Dune", "rating": 4.2 } ] ```

All within the same response...

Text, code, and data—interleaved, in any order, any number of times. The parser handles it incrementally as tokens arrive.

And the syntax is naturally extensible. Need a new block type? Just add a new fence header. tsx agent.run and json agent.data are just the first two.

The Feedback Loop #

The feedback loop is simple: console.log is how the agent talks to itself. It works like this:

LLM generates markdown with code blocks
Text streams to the user, code executes incrementally on the server
console.* output and exceptions feed back to the LLM as a new turn
If there’s no output or exceptions — done, wait for a new user query

This means the agent can react to its own execution:

How many messages did I get?

```tsx agent.run const messages = await fetchMessages(); console.log('messagesCount:', messages.length); ```

[runtime transcript] messagesCount: 4

You have four new messages.

Or it can pause and wait for user input:

```tsx agent.run const form = mount({ /* ... */ }); const answer = await form.result; // Blocks until user submits console.log("user:responded", answer); ```

Streaming Execution #

I wanted statements to execute as the LLM generated them, without waiting for the code fence to close. The result would be a more responsive user experience—API calls start, UI renders, errors surface, all while the LLM is still sending tokens.

The problem: streaming execution isn’t a standard primitive yet. No runtime lets you feed in tokens and execute statements as they complete, with shared context and top-level await.

I ended up building bun-streaming-exec to handle this, using vm.Script with some “creative” wrapping. I wrote a dedicated article about the approach if you want the deep dive.

Is it cursed? Yes. Works? Mostly.

Agentic UI #

With text, code, and data in one stream, you have most of the building blocks for agentic UI. The missing piece is a way to turn code into live interfaces. For UI, React is the obvious choice. LLMs have seen millions of React components. They know JSX.

The core primitive is mount():

```tsx agent.run mount({ ui: () => Hello from the agent! }); ```

Rendered UI

The LLM generates the code, the server executes it. mount() serializes the React component and sends it over the wire. The client renders it inside the chat.

The real power comes from data flow, though.

Four Ways Data Can Move #

Building this, I ended up with four distinct patterns for moving data between server, client, and LLM:

1. Client → Server (forms)

The agent can wait for user input:

```tsx agent.run const form = mount({ outputSchema: z.object({ name: z.string().min(1) }), ui: ({ output }) => ( <TextField {...output.name} label="Your name" /> <Button type="submit" {...output}>Submit ) }); const { name } = await form.result; // Blocks until submit console.log("user:responded", name); ```

{...output.name} wires up the field. await form.result pauses execution until the user submits. The result feeds back to the LLM via console.log.

2. Server → Client (live updates)

Server-side mutations transparently update the UI:

```tsx agent.run const data = new Data({ progress: 0 }); mount({ data, ui: ({ data }) => });

data.progress = 40; // UI updates immediately ```

Under the hood, Data objects are proxies. Mutations are detected, serialized as patches, sent over WebSocket, and applied on the client.

3. LLM → Client (streaming)

The LLM can stream JSON directly into the UI:

```tsx agent.run const movies = new StreamedData("movies-list"); mount({ streamedData: movies, ui: ({ streamedData }) => ( {streamedData?.map((movie, i) => ( {movie.name} )) ?? } ) }); ```

```json agent.data => "movies-list" [ { "name": "Blade Runner", "rating": 4.5 }, { "name": "Dune", "rating": 4.2 } ] ```

The JSON streams token-by-token. The client parses incrementally using jsonriver, updating the UI as data arrives. Once complete, the server can access it too via the StreamedData object.

4. Client → Server (callbacks)

For live interactions inside the UI:

```tsx agent.run const data = new Data({ messages, loading: false }); const onRefresh = async () => { data.loading = true; data.messages = await loadMessages(); data.loading = false; };

mount({ data, callbacks: { onRefresh }, ui: ({ data, callbacks }) => ( ) }); ```

Clicking the button invokes a server-side function. The callback fetches fresh data, updates state, and the UI reflects it — all in code, without triggering a new LLM turn.

Slots #

As UIs get more complex, the user has to wait longer for the LLM to generate the code. For more elaborate UIs, there’s a slot mechanism: the agent can mount a skeleton interface first and then inject the heavier sections later.

Combined with streaming execution, the skeleton appears the moment its mount() statement completes. Each mountSlot() call fills in a section as soon as the LLM finishes generating it:

```tsx agent.run const shell = mount({ data, callbacks: { onResolve }, ui: () => ( <Slot name="stats" fallback={} /> <Slot name="blockers" fallback={} /> ), });

shell.mountSlot("stats", ({ data }) => ); shell.mountSlot("blockers", ({ callbacks }) => ); ```

Slots share the same context as their parent: data, callbacks, streamed data. This means slots stay reactive across each other. A callback in one slot can mutate shared data, and every other slot that reads it updates automatically.

On Security #

Both Claude Code and ChatGPT’s Code Interpreter already execute LLM-generated code at scale — sandboxing, capability-based permissions, and static analysis are under active development across the industry. The hard unsolved problem is prompt injection, and that cuts across all agent architectures equally — tool calling, MCP, and code execution alike. This project doesn’t tackle any of that. It explores the layer above: what you can build once you assume security is reasonably solved. We’re not fully there yet.

Why It Works #

I built this prototype to see if markdown could actually work as a protocol for agentic UI without any finetuning. When I let it run the first time, I was surprised. The model picked it up immediately. It was not perfect. But the core idea just worked.

That’s because every design choice here optimizes for one thing: LLM ergonomics.

Markdown with code fences because LLMs have trained on billions of docs. TypeScript because it bridges server and client in the most-used language on GitHub. React because it’s the UI framework they know best. mount() because its building blocks — awaitable results, callbacks, Zod schemas — are patterns the model has seen millions of times.

The system doesn’t teach the model anything new. It arranges patterns the model already knows into a system that actually runs.

You could design a new protocol for agentic UI from scratch. Or you could just match the runtime to the model’s training data: markdown.