Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited.
I'm happy to see that tree-sitter highlighting on the web is finally a thing. This seems really solid, although the bundle size is large.
They also load 1 MB of fonts. In total, this page is close to 3 MB.
Also, when you select a language, the grammar file gets downloaded twice.
My favorite is nvim-treesitter-textobjects which gives you dozens of new targets for vim motions, such as a function call or the condition of a loop.
It’s not a new discovery, I just didn’t know docs.rs (intentionally) wasn’t blocking this. Cf https://docs.rs/pwnies/0.0.13/pwnies/
(This all makes more sense if you read the blog post instead of the direct arborium link: https://fasterthanli.me/articles/my-gift-to-the-rust-docs-te...)
But then, on the other hand, I had given up on a scratch code editor for a game development project I'm working on and just loosely wrapped the Monaco editor, which I'm guessing is going to be pretty bare when I actually get around to coding with it in the browser (I'm aware that it is robust, but from what I gather, a lot of its features come from third-party developers as a way to keep the core functionality pure). Given that I want to be able to script in C# (aside from just JS/TS), I was sure I was going to have to figure out something complicated.
But, honestly, I think this solves all of my most concerning issues! What a sweet little library!
[0] https://magnitce-code-example-e81613.gitlab.io/ (please excuse the unfinished-ness; I'm working on a JSDoc-to-documentation library that automates the documentation for me so there are minor issues, like the install text not changing on selection)
The ease of use to highlight static text via Arborium seems nice:
<script src="arborium.iife.js"></script>
<pre><code class="language-python">
def hello(name):
    print("Hello " + name)
</code></pre>
But does it support editing highlighted text? If not, one would have to do some trickery by hiding a textarea and updating the <code> element on each keypress, I guess, which probably has a thousand corner cases one would have to deal with.
And how would one add SCAD support?
You can use it as a normal Rust library, or you can use the JavaScript/WASM wrapper to highlight source code on a web page.
Or... website text editors which historically have had imperfect syntax highlighting.
Notice the Zed sponsorship.
The code isn't minified so you can see how they do it by looking at the `doHighlight()` function here https://arborium.bearcove.eu/pkg/app.generated.js
I also had a hard time understanding the context given just the link.
Just wanted to note that tree-sitter is lower-level and more general: it's an incremental parser that is specialised for gracefully and efficiently parsing partially-correct code snippets or code being edited live.
It's an important building block of the highlighter, but it needs more on top to complete the package. It can be used for anything that requires awareness of code structure in an editor.
That's the best one-sentence description there is, and it's at the top of the GitHub README. I think it would fit nicely at the top of https://arborium.bearcove.eu too.
Hmm... and the approach already shows its weaknesses when I play with it: when I search for something on the page, it gives me twice as many hits as there are, and it jumps to each hit twice when I use the "next" button.
I wonder if that is fixable.
I wonder if targeting the Tree-sitter ABI directly could be a viable way to write more accurate parsers in an actual programming language while piggybacking on the ecosystem. Could tree-sitter's runtime ABI be adapted for GLL parsers instead of GLR? I haven't looked deep into it yet.
https://developer.mozilla.org/en-US/docs/Web/API/Highlight
All the trickery vanishes and you get first-class CSS support.
https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
https://github.blog/engineering/architecture-optimization/cr...
On this side, I'm not a bot, and fortunately even in 2025 nobody on the internet knows that you are a dog.
Finding good tree-sitter grammars is hard. In arborium, every grammar:
We hand-picked grammars, added missing highlight queries, and updated them to the latest tree-sitter. Tree-sitter parsers compiled to WASM need libc symbols (especially a C allocator)—we provide arborium-sysroot which re-exports dlmalloc and other essentials for wasm32-unknown-unknown.
HTML — custom elements like <a-k> instead of <span class="keyword">. More compact markup. No JavaScript required.
Traditional: <span class="keyword">fn</span>
arborium:    <a-k>fn</a-k>
ANSI — 24-bit true color for terminal applications.
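As a rough illustration of the two output formats, here is a minimal Rust sketch. The function names (span_markup, custom_element_markup, ansi_truecolor) are made up for this example and are not arborium's API, and the sketch skips HTML escaping, which a real renderer would need.

```rust
/// Traditional markup: a <span> carrying a CSS class.
/// (Hypothetical helper, not part of arborium.)
fn span_markup(class: &str, text: &str) -> String {
    format!("<span class=\"{class}\">{text}</span>")
}

/// Compact markup: a short custom element such as <a-k>.
/// (Hypothetical helper, not part of arborium.)
fn custom_element_markup(tag: &str, text: &str) -> String {
    format!("<{tag}>{text}</{tag}>")
}

/// 24-bit foreground color using the SGR sequence ESC[38;2;R;G;Bm,
/// followed by ESC[0m to reset all attributes.
fn ansi_truecolor(r: u8, g: u8, b: u8, text: &str) -> String {
    format!("\x1b[38;2;{r};{g};{b}m{text}\x1b[0m")
}
```

For the keyword fn, the span form is 31 bytes and the custom-element form is 13 bytes, which is where the "more compact markup" saving comes from.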
macOS, Linux, Windows — tree-sitter handles generating native crates for these platforms. Just add the dependency and go.
WebAssembly — that one's hard. Compiling Rust to WASM with C code that assumes a standard library is tricky. We provide a sysroot that makes this work, enabling Rust-on-the-frontend scenarios like this demo.
Add to your Cargo.toml:
arborium = { version = "2", features = ["lang-rust"] }
Then highlight code:
let html = arborium::highlight("rust", source)?;
Add this to your HTML and all <pre><code> blocks get highlighted automatically:
<script src="https://cdn.jsdelivr.net/npm/@arborium/arborium@1/dist/arborium.iife.js"></script>
Your code blocks should look like this:
<pre><code class="language-rust">fn main() {}</code></pre>
<!-- or -->
<pre><code data-lang="rust">fn main() {}</code></pre>
<!-- or just let it auto-detect -->
<pre><code>fn main() {}</code></pre>
Configure via data attributes:
<!-- data-theme: theme name; data-selector: CSS selector;
     data-manual: disable auto-highlight; data-cdn: jsdelivr | unpkg | custom URL -->
<script src="..."
  data-theme="github-light"
  data-selector="pre code"
  data-manual
  data-cdn="unpkg"></script>
With data-manual, call window.arborium.highlightAll() when ready.
For bundlers or manual control:
import { loadGrammar, highlight } from '@arborium/arborium';
const html = await highlight('rust', sourceCode);
Grammars are loaded on-demand from jsDelivr (configurable).
Highlight TOML, shell, and other languages in your rustdoc. Create arborium-header.html:
<script defer src="https://cdn.jsdelivr.net/npm/@arborium/arborium@1/dist/arborium.iife.js"></script>
Then in Cargo.toml:
[package.metadata.docs.rs]
rustdoc-args = ["--html-in-header", "arborium-header.html"]
If you maintain docs.rs or rustdoc, you could integrate arborium directly! Either merge this PR for native rustdoc support, or use arborium-rustdoc as a post-processing step:
# Process rustdoc output in-place
arborium-rustdoc ./target/doc ./target/doc-highlighted
It streams through HTML, finds <pre class="language-*"> blocks, and highlights them in-place. Works with rustdoc's theme system.
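The scanning step can be pictured with a small Rust sketch. This is not arborium-rustdoc's implementation: language_blocks is a hypothetical helper that assumes the attribute is written exactly as class="language-..." directly after <pre, and it only reports the language names rather than rewriting the HTML.

```rust
/// Collect the language names of all `<pre class="language-*">` blocks.
/// Hypothetical helper for illustration; a real tool would use an HTML parser.
fn language_blocks(html: &str) -> Vec<&str> {
    const MARKER: &str = "<pre class=\"language-";
    let mut found = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find(MARKER) {
        // Everything after the marker begins with the language name.
        let after = &rest[start + MARKER.len()..];
        // The name ends at the closing quote or at the next class name.
        let end = after
            .find(|c: char| c == '"' || c.is_whitespace())
            .unwrap_or(after.len());
        found.push(&after[..end]);
        // Continue scanning behind the name we just consumed.
        rest = &after[end..];
    }
    found
}
```

Called on a page containing one language-toml block and one language-sh block, it returns the two names in document order.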
An incremental static site generator with zero-reload live updates via WASM DOM patching, Sass/SCSS, image processing, font subsetting, and arborium-powered syntax highlighting.
Nothing to configure—it just works. Arborium is built in and automatically highlights all code blocks.
101 languages included, each behind a feature flag. Enable only what you need, or use all-languages for everything.
Each feature flag comment includes the grammar's license, so you always know what you're shipping.
The highlighter supports themes for both HTML and ANSI output.
Bundled themes:
Alabaster
Ayu Dark
Ayu Light
Catppuccin Frappé
Catppuccin Latte
Catppuccin Macchiato
Catppuccin Mocha
Cobalt2
Dayfox
Desert256
Dracula
EF Melissa Dark
GitHub Dark
GitHub Light
Gruvbox Dark
Gruvbox Light
Kanagawa Dragon
Light Owl
Lucius Light
Melange Dark
Melange Light
Monokai
Nord
One Dark
Rosé Pine Moon
Rustdoc Ayu
Rustdoc Dark
Rustdoc Light
Solarized Dark
Solarized Light
Tokyo Night
Zenburn
Custom themes can be defined programmatically using RGB colors and style attributes (bold, italic, underline, strikethrough).
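To make "RGB colors and style attributes" concrete, here is a minimal Rust sketch of what such a style record could look like and how it might map to CSS for HTML output. The Rgb and Style types and the to_css method are assumptions made for this example; arborium's actual theme types may be shaped differently.

```rust
/// An RGB color, one byte per channel. (Hypothetical type.)
#[derive(Clone, Copy)]
struct Rgb(u8, u8, u8);

/// Hypothetical style record: an optional foreground color plus
/// the four attributes mentioned above.
#[derive(Clone, Copy, Default)]
struct Style {
    fg: Option<Rgb>,
    bold: bool,
    italic: bool,
    underline: bool,
    strikethrough: bool,
}

impl Style {
    /// Render the style as a `;`-separated list of CSS declarations.
    fn to_css(&self) -> String {
        let mut decls = Vec::new();
        if let Some(Rgb(r, g, b)) = self.fg {
            decls.push(format!("color: #{r:02x}{g:02x}{b:02x}"));
        }
        if self.bold {
            decls.push("font-weight: bold".to_string());
        }
        if self.italic {
            decls.push("font-style: italic".to_string());
        }
        // Underline and strikethrough share one CSS property.
        let mut lines = Vec::new();
        if self.underline {
            lines.push("underline");
        }
        if self.strikethrough {
            lines.push("line-through");
        }
        if !lines.is_empty() {
            decls.push(format!("text-decoration: {}", lines.join(" ")));
        }
        decls.join("; ")
    }
}
```

Keeping the attributes as plain booleans means the same record could also be rendered as ANSI escape codes without changing the theme definition.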
Each grammar includes the full tree-sitter runtime embedded in its WASM module. This adds a fixed overhead to every grammar bundle, on top of the grammar-specific parser tables.
Every grammar is compiled to WASM with aggressive size optimizations. Here's the complete build pipeline:
We compile with nightly Rust using -Zbuild-std to rebuild the standard library with our optimization flags:
-Cpanic=immediate-abort   Skip unwinding machinery
-Copt-level=s             Optimize for size, not speed
-Clto=fat                 Full link-time optimization across all crates
-Ccodegen-units=1         Single codegen unit for maximum optimization
-Cstrip=symbols           Remove debug symbols
Generate JavaScript bindings with --target web for ES module output.
Final size optimization pass with Binaryen's optimizer, wasm-opt:
-Oz                       Aggressive size optimization
--enable-bulk-memory      Faster memory operations
--enable-mutable-globals  Required for wasm-bindgen
--enable-simd             SIMD instructions where applicable
Despite all these optimizations, WASM bundles are still large because each one embeds the full tree-sitter runtime. We're exploring ways to share the runtime across grammars, but that's the architecture trade-off for now.
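The trade-off can be written down as simple arithmetic. The Rust sketch below compares the total size when every bundle embeds the runtime with the total if one runtime copy were shared. The function names and any numbers fed into them are placeholders for illustration, not measurements of arborium's bundles.

```rust
/// Total size when every grammar bundle embeds its own copy of the runtime.
/// `grammar_tables` holds the grammar-specific size of each bundle.
fn total_embedded(runtime: u64, grammar_tables: &[u64]) -> u64 {
    grammar_tables.iter().map(|tables| runtime + tables).sum()
}

/// Total size if a single runtime copy were shared by all grammars.
fn total_shared(runtime: u64, grammar_tables: &[u64]) -> u64 {
    if grammar_tables.is_empty() {
        0
    } else {
        runtime + grammar_tables.iter().sum::<u64>()
    }
}
```

With n grammars loaded, sharing would save (n - 1) copies of the runtime. The saving only matters for pages that load several grammars; a page that highlights a single language pays for the runtime once either way.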
Many other highlighters use regex-based tokenization (for example, TextMate grammars). Regexes can't count brackets, track scope, or understand structure; they just pattern-match.
Tree-sitter actually parses your code into a syntax tree, so it knows that fn is a keyword only in the right context, handles deeply nested structures correctly, and recovers gracefully from syntax errors.
IDEs with LSP support (like rust-analyzer) can do even better with semantic highlighting—they understand types and dependencies across files—but tree-sitter gets you 90% of the way there without needing a full language server.
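The "can't count brackets" point can be made concrete with a small Rust sketch. Checking nested brackets needs a stack, which is state that a single regular expression (in the formal sense) does not have. max_nesting_depth is a hypothetical helper written for this example; it is not how tree-sitter parses, and it ignores strings and comments.

```rust
/// Return the deepest bracket nesting in `src`, or `None` if the
/// brackets are unbalanced or mismatched.
fn max_nesting_depth(src: &str) -> Option<usize> {
    // Stack of the closing brackets we still expect to see.
    let mut stack: Vec<char> = Vec::new();
    let mut deepest = 0;
    for ch in src.chars() {
        match ch {
            '(' => stack.push(')'),
            '[' => stack.push(']'),
            '{' => stack.push('}'),
            ')' | ']' | '}' => {
                // A closer must match the most recent unclosed opener.
                if stack.pop() != Some(ch) {
                    return None;
                }
            }
            _ => {}
        }
        deepest = deepest.max(stack.len());
    }
    if stack.is_empty() {
        Some(deepest)
    } else {
        None
    }
}
```

A parser keeps this kind of state for every construct in the grammar, not just brackets, which is what lets it tell the same token apart in different contexts.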
Arbor is Latin for tree (as in tree-sitter), and -ium denotes a place or collection (like aquarium, arboretum).
It's a place where tree-sitter grammars live.
Yes! Open an issue on the repo with a link to the grammar.
We'll review it and add it if the grammar and highlight queries are in good shape.
When doing full-stack Rust, it's nice to have exactly the same code on the frontend and the backend.
Rust crates compile to both native and WASM, so you get one dependency that works everywhere.
Tree-sitter uses table-driven LR parsing. The grammar compiles down to massive state transition tables—every possible parser state and every possible token gets an entry.
These tables are optimized for O(1) lookup speed, not size. A complex grammar like TypeScript can have tens of thousands of states.
The tradeoff is worth it: you get real parsing (not regex hacks) that handles edge cases correctly and recovers gracefully from syntax errors.
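A minimal Rust sketch of a dense, table-driven lookup shows why such tables are fast and why they grow quickly. DenseTable is a hypothetical type for illustration only; the parsers that tree-sitter generates have their own table layout and may store entries more compactly.

```rust
/// A dense table with one entry per (state, symbol) pair, stored row-major.
/// The meaning of an entry is left abstract; 0 stands for "no action".
struct DenseTable {
    symbols: usize,
    entries: Vec<u16>,
}

impl DenseTable {
    fn new(states: usize, symbols: usize) -> Self {
        DenseTable { symbols, entries: vec![0; states * symbols] }
    }

    /// Constant-time lookup: one multiply, one add, one index.
    fn action(&self, state: usize, symbol: usize) -> u16 {
        self.entries[state * self.symbols + symbol]
    }

    fn set(&mut self, state: usize, symbol: usize, action: u16) {
        self.entries[state * self.symbols + symbol] = action;
    }

    /// Memory used by the entries alone.
    fn size_in_bytes(&self) -> usize {
        self.entries.len() * std::mem::size_of::<u16>()
    }
}
```

With illustrative figures of 10,000 states and 300 symbols (not measurements of any real grammar), the entries alone take 6,000,000 bytes at two bytes each, because the size is the product of the two counts.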