Intuiting Pratt Parsing

Love Pratt parsing! Not a compiler guy, but I've spent way too many hours reflecting on parsing. I remember trying to get through the Dragon Book so many times and reading all about formal grammars etc., until I landed on recursive descent parsing + Pratt for expressions. Super simple technique, and for me it's sufficient. I'm sure it doesn't cover all cases, but for toy languages it feels like we can usually do everything with 2-token lookahead.

Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)

I can recommend that anyone read Pratt's original paper. It's written in a very cool and badass style.

https://dl.acm.org/doi/epdf/10.1145/512927.512931

> I’ve read many articles on the same topic but never found it presented this way - hopefully N + 1 is of help to someone.

Can confirm; yes it was helpful! I've never thought seriously about parsing and I've read occasionally (casually) about Pratt parsing, but this is the first time it seemed like an intuitive idea I'll remember.

(Then I confused myself by following some references, remembering the term "precedence climbing", and reading e.g. https://www.engr.mun.ca/~theo/Misc/pratt_parsing.htm by the person who coined that term, but never mind: the original post here has still given me an idea I think I'll remember.)

The latest implementation of Picol has a Tcl-alike [expr] implemented in 40 lines of code that uses Pratt-style parsing: https://github.com/antirez/picol/blob/main/picol.c#L490

You can either use the stack in an intuitive way, or you can change the tree directly in a somewhat less intuitive way without recursion. Essentially either DF or BF. I don’t see how it matters much anymore with stacks that grow automatically, but it’s good to understand.

An even simpler way, IMO, is to use explicit functions instead of a precedence table; then the code has pretty much the same structure as the EBNF.

Need to parse * before +? Begin at add, have it call parse_mul for its left and right sides, and so on.

  parse_mul() {
    left = parse_literal()
    while(is_mul_token()) { // left associative
      right = parse_literal()
      left = make_mul_node(left, right)  // fold into left
    }
    return left
  }

  parse_add() {
    left = parse_mul()
    while(is_add_token()) { // left associative
      right = parse_mul()
      left = make_add_node(left, right)  // fold into left
    }
    return left
  }

Then just add more functions as you climb up the precedence levels.
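A runnable Python sketch of the two functions above, assuming tokens arrive as a pre-split list of strings and that nodes are plain (op, left, right) tuples; the helper names (peek, advance, parse_literal) are illustrative, not from any library:

```python
def parse(tokens):
    """Recursive descent with one function per precedence level."""
    pos = 0
    operators = {"+", "-", "*", "/"}

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_literal():
        tok = peek()
        if tok is None or tok in operators:
            raise SyntaxError(f"expected an operand, got {tok!r}")
        return advance()

    def parse_mul():
        left = parse_literal()
        while peek() in ("*", "/"):  # left associative
            op = advance()
            right = parse_literal()
            left = (op, left, right)  # fold the new node into `left`
        return left

    def parse_add():
        left = parse_mul()  # the operands of + are whole products
        while peek() in ("+", "-"):  # left associative
            op = advance()
            right = parse_mul()
            left = (op, left, right)
        return left

    tree = parse_add()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("a + b * c + d".split()))
# ('+', ('+', 'a', ('*', 'b', 'c')), 'd')
```

Adding a looser level (say, comparisons) would mean one more function that calls parse_add for its operands.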

Also if you're looking into this area you'll find there is another algorithm called "Precedence climbing", which is really the same thing with some insignificant differences in how precedence is encoded.

There's also the "shunting yard" algorithm, which is basically the iterative version of these algorithms (instead of recursive). It is usually presented with insufficient error checking, so it allows invalid input, but there's actually no reason you have to do it like that.
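A sketch of the iterative version with that missing check added, assuming binary, left-associative operators only. An `expect_operand` flag enforces operand/operator alternation, and the two stacks build (op, left, right) tuples directly instead of emitting RPN; the PREC table is an assumption for illustration:

```python
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}


def shunting_yard(tokens):
    operands, operators = [], []
    expect_operand = True  # the state many presentations leave out

    def reduce_top():
        # Pop one operator and its two operands, push the combined node.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for tok in tokens:
        if tok == "(":
            if not expect_operand:
                raise SyntaxError("unexpected '('")
            operators.append(tok)
        elif tok == ")":
            if expect_operand:
                raise SyntaxError("unexpected ')'")
            while operators and operators[-1] != "(":
                reduce_top()
            if not operators:
                raise SyntaxError("unmatched ')'")
            operators.pop()  # discard the matching "("
        elif tok in PREC:
            if expect_operand:
                raise SyntaxError(f"operator {tok!r} where an operand was expected")
            # Left associativity: reduce anything of greater or equal precedence first.
            while operators and operators[-1] != "(" and PREC[operators[-1]] >= PREC[tok]:
                reduce_top()
            operators.append(tok)
            expect_operand = True
        else:
            if not expect_operand:
                raise SyntaxError(f"operand {tok!r} where an operator was expected")
            operands.append(tok)
            expect_operand = False

    if expect_operand:
        raise SyntaxError("empty input or trailing operator")
    while operators:
        if operators[-1] == "(":
            raise SyntaxError("unmatched '('")
        reduce_top()
    return operands[0]
```

With the alternation check in place, inputs like "a b", "a + * b" or "( a + b" are rejected instead of silently producing a tree.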

The Dragon book is not very good, to be honest.

It was probably decent when all you had was something like Pascal and you wanted to write a C compiler.

Parsing and compiling and interpreting etc. are all much more at home in functional languages. They're much easier to understand there. And once you do understand them, you can translate back into imperative code.

For parsing: by default you should be using parser combinators.

Until you need to do more than all-or-nothing parsing :) see tree-sitter for example, or any other efficient LSP implementation of incremental parsing.

I am a compiler guy, and I completely agree. Parsing is not that hard and not that important. Recursive descent + Pratt expressions is almost always the practical choice.

It's not for toy languages. Most big compilers use recursive descent parsing.

> Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)

Exactly this! A thousand times this!

Well, it depends how formal you're talking about. I have to say that the standard approach you mention, recursive descent parsing + Pratt for expressions, actually requires you to understand what a formal language is: that it's a "thing" that can't (or shouldn't) be an object or a data structure, but exists abstractly before any objects created by the program.

Moreover, the standard way of producing a recursive descent parser is to begin with your language in Chomsky normal form or some human-understandable format, then convert it to Greibach normal form; that specification converts readily into your series of recursive functions. So all the language transforms are useful to know (though you can skip steps if you have a good intuition for your language).

Quick other one: to parse an infix expression such as "x·y | (z | w)", find the operator with the least binding power. In my example, I've given "|" less binding power than "·", so this visually breaks the expression into two halves: "x·y" and "(z | w)". Recursively parse those two subexpressions. Essentially, that's it.

The symbols "·" and "|" don't mean anything - I've chosen them to be visually intuitive: The "|" is supposed to look like a physical divider. Also, bracketed expressions "(...)" or "{...}" should be parsed first.

Wikipedia mentions that a variant of this got used in FORTRAN I. You could also speed up my naive O(n^2) approach by using Cartesian trees, which you can build using something suspiciously resembling precedence climbing.
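A sketch of that naive approach, assuming whitespace-separated tokens, only the two operators from the example ("·" binding tighter than "|"), and left associativity, so the rightmost of several equally weak top-level operators is chosen as the split point. The helper names are made up for illustration; each level rescans its slice, which is where the O(n^2) comes from:

```python
PREC = {"|": 1, "·": 2}  # "·" binds tighter than "|"


def matching_close(tokens, start):
    """Index of the ")" that closes the "(" at tokens[start]."""
    depth = 0
    for i in range(start, len(tokens)):
        if tokens[i] == "(":
            depth += 1
        elif tokens[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise SyntaxError("unmatched '('")


def split_parse(tokens):
    # Brackets wrapping the whole expression are dealt with first.
    while tokens and tokens[0] == "(" and matching_close(tokens, 0) == len(tokens) - 1:
        tokens = tokens[1:-1]
    if len(tokens) == 1:
        return tokens[0]
    depth, split = 0, None
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif depth == 0 and tok in PREC:
            # "<=" keeps the rightmost of equally weak operators: left associativity.
            if split is None or PREC[tok] <= PREC[tokens[split]]:
                split = i
    if split is None:
        raise SyntaxError(f"cannot parse {tokens!r}")
    left, right = tokens[:split], tokens[split + 1:]
    return (tokens[split], split_parse(left), split_parse(right))
```

For example, split_parse("x · y | ( z | w )".split()) splits at the top-level "|" and returns ("|", ("·", "x", "y"), ("|", "z", "w")).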

> Its written in a very cool and badass style.

Out of curiosity, what do you mean by this? Do you mean you like the prose, or the typesetting, or...?

For some reason I struggled to get my head around Pratt parsing. Then I read an offhand comment on Reddit that said to start with a recursive descent parser and add table parsing to that. Once I did that it all clicked.

Love Picol, and love this! When I first revisited Tcl, I was a bit miffed about needing [expr] but now really appreciate both it and the normal Tcl syntax.

Is there a production compiler out there that doesn't use recursive descent, preferably constructed from combinators? Table-driven parsers seem now to be a "tell" of an old compiler or a hobby project.

I was just going into the second quarter of compiler design when the Dragon Book came out. My copy was still literally “hot off the press”, still warm from the ink-baking ovens. It was worlds better than anything else available at the time.

The Dragon book wasn't good for me either, but I'd disagree about using parser combinators. The problem I see with the Dragon book is that it starts using concepts (phases of compilation) before it introduces and motivates them in the abstract. I can see how people who already know these concepts can look at the Dragon book and say "oh, that's a good treatment of this", so perhaps it's a good reference, but it's problematic for a class and terrible to pick up and read as a standalone text (which I did back in Berkeley in the 80s).

As far as I can tell, parser combinators are just one of the approaches that promise to let you "write a compiler without understanding abstract languages", but all these methods actually wind up being libraries that are far more complicated than GP's "recursive descent + Pratt parsing", which is easy once you understand the idea of an abstract language.

You lose versatility, though: you can't add user-defined operators, which is pretty easy with a Pratt parser.

With a couple of function pointers you can climb precedence with just functions:

  parse_left_to_right(with(), is_token()) {
    left = with()
    while(is_token()) {
      operator = advance()  // consume the operator token
      right = with()
      left = operate(left, right, operator)
    }
    ret left;
  }

  p0() { ret lex digit or ident; };
  p1() { ret parse_left_to_right(p0, is_mul); };
  p2() { ret parse_left_to_right(p1, is_add); };

... and so on for all operators
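The same idea as runnable Python, assuming integer literals and evaluating as it parses (so `operate` computes a value instead of building a node). `with` is a Python keyword, so the operand parser is named `with_`; the other names follow the pseudocode above, and the OPERATE table is an assumption:

```python
import operator

OPERATE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_left_to_right(with_, is_token):
        # One left-associative precedence level; operands come from `with_`.
        left = with_()
        while is_token(peek()):
            op = advance()
            right = with_()
            left = OPERATE[op](left, right)
        return left

    def p0():
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(advance())

    def p1():
        return parse_left_to_right(p0, lambda t: t == "*")

    def p2():
        return parse_left_to_right(p1, lambda t: t in ("+", "-"))

    result = p2()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("10 - 2 - 3 * 2".split()))  # (10 - 2) - (3 * 2) = 2
```

Each new precedence level is one more one-line function passing the previous level and a token predicate.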

I think even the theory of Regular Languages is somewhat overdone: You can get the essence of what NFAs are without really needing NFAs. You can get O(n) string matching without formally implementing NFAs, or using any other formal model like regex-derivatives. In fact, thinking in terms of NFAs makes it harder to see how to implement negation (or "complement" if you prefer to call it that) efficiently. It's still only linear time!

The need for NFA/DFA/derivative models is mostly unnecessary because ultimately, REG is just DSPACE(O(1)). That's it. Thinking in any other way is confusing the map with the territory. Furthermore, REG is extremely robust, because we also have REG = DSPACE(o(log log n)) = NSPACE(o(log log n)) = 1-DSPACE(o(log n)). For help with the notation, see here: https://en.wikipedia.org/wiki/DSPACE

Language design benefits from parser generators that can point out ambiguities and verify a language is easy to parse.

It is easily possible to parse at > 1MM lines per second with a well designed grammar and handwritten parser. If I'm editing a file with 100k+ lines, I likely have much bigger problems than the need for incremental parsing.

He is a bit offensive towards traditional academia, which favors BNF and parser generators. It's been a while since I read it, but I remember, e.g., a rhetorical question (paraphrased, not an exact quote): "Has anyone learned a programming language by reading the BNF?"

The style is very good and fun to read for someone who also reads other more boring papers.

I cannot say what this person means, and I have never read this paper before, but just the fourth paragraph of the paper has piqued my interest and I will read it all.

An even easier approach is to give all infix operators the same precedence and force the programmer to group subexpressions.

Oh, I was talking much more about how you can first learn how to write a compiler. I wasn't talking about how you write a production industry-strength compiler.

Btw, I mentioned parser combinators: those are basically just a front-end. Similar to regular expressions. The implementation can be all kinds of things, eg could be recursive descent or a table or backtracking or whatever. (Even finite automata, if your combinators are suitably restricted.)
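To make the "front-end" point concrete, a tiny hand-rolled combinator sketch (not any particular library's API). A parser is assumed to be a function from (tokens, pos) to (value, new_pos), or None on failure; `chain_left` is the usual left-folding combinator for one precedence level. Underneath, it runs as plain recursive descent:

```python
def token(pred):
    """Match one token satisfying `pred`."""
    def parse(tokens, pos):
        if pos < len(tokens) and pred(tokens[pos]):
            return tokens[pos], pos + 1
        return None
    return parse


def chain_left(operand, operator):
    """operand (operator operand)*, folded into left-leaning (op, left, right) tuples."""
    def parse(tokens, pos):
        result = operand(tokens, pos)
        if result is None:
            return None
        left, pos = result
        while True:
            op_result = operator(tokens, pos)
            if op_result is None:
                return left, pos
            op, after_op = op_result
            right_result = operand(tokens, after_op)
            if right_result is None:
                return left, pos  # leave a dangling operator unconsumed
            right, pos = right_result
            left = (op, left, right)
    return parse


atom = token(str.isalnum)
term = chain_left(atom, token(lambda t: t in ("*", "/")))
expr = chain_left(term, token(lambda t: t in ("+", "-")))

print(expr("a + b * c + d".split(), 0))
# (('+', ('+', 'a', ('*', 'b', 'c')), 'd'), 7)
```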

The thing about LR parsers is that since it is parsing bottom-up, you have no idea what larger syntactic structure is being built, so error recovery is ugly, and giving the user a sensible error message is a fool’s errand.

In the end, all the hard work in a compiler is in the back-end optimization phases. Put your mental energy there.

Some people appreciate that an LR/LALR parser generator can prove non-ambiguity and linear time parse-ability of a grammar. A couple of examples are the creator of the Oil shell, and one of the guys responsible for Rust.

It does make me wonder though about why grammars have to be so complicated that such high-powered tools are needed. Isn't the gist of LR/LALR that the states of an automaton that can parse CFGs can be serialised to strings, and the set of those strings forms a regular language? Once you have that, many desirable "infinitary" properties of a parsing automaton can be automatically checked in finite time. LR and LALR fall out of this, in some way.

You can have user-defined operators with plain old recursive descent.

Consider if you had functions called parse_user_ops_precedence_1, parse_user_ops_precedence_2, etc. These would simply take a table of user-defined operators as an argument (or reference some shared/global state), and participate in the same recursive callstack as all your other parsing functions.
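A sketch of that suggestion, assuming the user-extensible table is simply a list of operator sets ordered loosest-first, all left-associative. Here `parse_level(i)` plays the role of the hypothetical parse_user_ops_precedence_N functions, with the level index as its argument:

```python
def parse(tokens, levels):
    """levels: list of operator sets, loosest binding first; all left-associative."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_operand():
        tok = peek()
        if tok is None or any(tok in ops for ops in levels):
            raise SyntaxError(f"expected an operand, got {tok!r}")
        return advance()

    def parse_level(i):
        if i == len(levels):
            return parse_operand()
        left = parse_level(i + 1)  # operands bind tighter than this level
        while peek() in levels[i]:
            op = advance()
            right = parse_level(i + 1)
            left = (op, left, right)
        return left

    tree = parse_level(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


builtin = [{"+", "-"}, {"*", "/"}]
# A user declares "<>" looser than everything built in.
user = [{"<>"}] + builtin
print(parse("a <> b + c * d".split(), user))
# ('<>', 'a', ('+', 'b', ('*', 'c', 'd')))
```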

It's not just speed - incremental parsing allows for better error recovery. In practice, this means that your editor can highlight the code as-you-type, even though what you're typing has broken the parse tree (especially the code after your edit point).

It does not follow that a generated parser would make sense in production code.

I used a small custom parser combinator library to parse Fortran from raw characters (since tokenization is so context-dependent), and it's worked well.

Production compilers must have robust error recovery and great error messages, and those are pretty straightforward in recursive descent, even if ad hoc.

You can always write Lisp, but most people read code better when it doesn't have that many (((()))))))

I'm sure there's a middle ground which still gives you some of the metaprogramming power of Lisp. OTOH this: https://www.gingerbill.org/article/2026/02/21/does-syntax-ma...

2026-03-26

You already know that a + b * c + d is calculated as a + (b * c) + d. But how do you encode that knowledge precisely enough for a machine to act on it?

The most common solution employed by compilers is to make use of a tree known as an abstract syntax tree. In an AST, each operator sits above its operands, and evaluation works bottom-up: resolve the children, then apply the operation.

     +
    / \
   +   d
  / \
 a   *
    / \
   b   c

This tree encodes the desired ordering (a + (b * c)) + d in a format that is very convenient to work with programmatically.
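For concreteness, one possible in-memory form of that tree, assuming a small Node dataclass and string leaves looked up in an environment dict. Evaluation is exactly the bottom-up rule described above: resolve both children, then apply the operator.

```python
import operator
from dataclasses import dataclass
from typing import Union

APPLY = {"+": operator.add, "-": operator.sub, "*": operator.mul}


@dataclass
class Node:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Node, str]  # a leaf is a variable name


def evaluate(expr: Expr, env: dict) -> int:
    if isinstance(expr, str):
        return env[expr]
    # Resolve the children first, then apply this node's operator.
    return APPLY[expr.op](evaluate(expr.left, env), evaluate(expr.right, env))


# The tree for a + b * c + d, i.e. (a + (b * c)) + d.
tree = Node("+", Node("+", "a", Node("*", "b", "c")), "d")
print(evaluate(tree, {"a": 1, "b": 2, "c": 3, "d": 4}))  # 1 + 6 + 4 = 11
```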

But of course, people (for the most part) don’t write programs as trees. This means we face the problem of deriving this structure from flat text.

This is known as parsing. It has been the focus of decades of computer science research. It has also, in many cases, been wildly overcomplicated.

Simplifying

The difficulty in parsing lies with mixed precedence; more precisely, with cases where precedence changes direction. Let’s imagine our users only ever wrote programs of either increasing or decreasing precedence. What would that mean for our tree representation?

In the case of decreasing precedence, we repeatedly evaluate the leftmost operator as it is higher precedence - multiplication before addition, addition before comparison, and so on. The first operator sits deepest in the tree, the last the shallowest. The resulting tree is left-leaning.

Decreasing

       <
      / \
     +   d
    / \
   *   c
  / \
 a   b

You can imagine what happens when precedence is increasing. It’s the exact opposite: the leftmost operator is now the shallowest and the rightmost the deepest, as each operator depends on the result to its right. The tree is right-leaning.

Increasing

   >
  / \
 a   +
    / \
   b   *
      / \
     c   d

Now, what is the most reasonable encoding of equal precedence?

It depends on the operation. Convention favours a left-to-right evaluation for arithmetic, known as left associativity. Some language features favour the opposite: assignment in C, for instance, is right associative.

Let’s suppose (for now) that all operators are left associative. This is represented by a left-leaning tree, as the leftmost operator must be evaluated earlier and therefore rests deeper in the tree.

We should refine our definitions accordingly. Given a sequence of operators, let x_i be the precedence of the ith operator:

Decreasing: weakly decreasing - x_i >= x_{i+1}. This now includes the case of equal precedence.
Increasing: strictly increasing - x_i < x_{i+1}.

This means any two equal-precedence operators are encoded exactly like two decreasing ones.

Extending

A natural continuation is to consider expressions with exactly one change in direction. We’ll focus on the more interesting of the two resulting cases: a transition from increasing to decreasing precedence. The reverse simply continues a right-leaning tree from the tip of the existing left-leaning one.

Consider the expression I = (a > b + c * d), given by the tree below.

   >
  / \
 a   +
    / \
   b   *
      / \
     c   d

The tree is right-leaning, as expected for increasing precedence. Now, suppose we extend I with a new operator whose precedence is equal to or less than *. This would mean the increasing property no longer holds, and so continuing a right-leaning tree would produce an incorrect encoding.

We need a left-leaning tree somewhere, but where? The visualisation of each possible precedence level for the new operator begins to reveal a pattern:

[I] * e:

   >
  / \
 a   +
    / \
   b   *
      / \
     *   e
    / \
   c   d

[I] + e:

   >
  / \
 a   +
    / \
   +   e
  / \
 b   *
    / \
   c   d

[I] == e:

     ==
    /  \
   >    e
  / \
 a   +
    / \
   b   *
      / \
     c   d

The left-leaning tree starts at the first place it can: at an operator of equal or lesser precedence. This lower precedence operator must evaluate later than at least the current one, but there could be several prior operators that also must evaluate first. Those operators are found along the spine of the right-leaning tree.

The == case is the clearest example. All previous operators are higher precedence than ==, so the entire spine of the tree must evaluate first, and therefore must be its left child.

The observation is as follows: when we encounter a transition operator, we must walk back up the spine, collecting every operator that evaluates first. That collected chain - a right-leaning subtree - becomes the left child of the new operator, which starts a left-leaning tree of its own.

Since any expression is just a sequence of these transitions, this is all we need. This walk-back procedure is Pratt parsing.
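The walk-back can also be done without recursion. This is an iterative sketch, not code from the article, assuming alternating operand/operator tokens (no error handling), left-associative operators only, a made-up precedence table, and nodes stored as mutable [op, left, right] lists. `spine` holds the operator nodes on the right spine, root first, so its precedences are strictly increasing:

```python
PREC = {"==": 1, ">": 2, "+": 3, "*": 4}


def spine_parse(tokens):
    root = tokens[0]
    spine = []  # operator nodes along the right spine, root first
    for i in range(1, len(tokens), 2):
        op, operand = tokens[i], tokens[i + 1]
        # Walk back up: spine operators of equal or higher precedence evaluate first.
        while spine and PREC[spine[-1][0]] >= PREC[op]:
            spine.pop()
        # The collected chain is whatever hangs below the remaining spine
        # (just the last operand if nothing was popped, the whole tree if the spine emptied).
        left = spine[-1][2] if spine else root
        node = [op, left, operand]
        if spine:
            spine[-1][2] = node
        else:
            root = node
        spine.append(node)
    return to_tuple(root)


def to_tuple(node):
    if isinstance(node, list):
        return (node[0], to_tuple(node[1]), to_tuple(node[2]))
    return node


print(spine_parse("a > b + c * d * e".split()))
# ('>', 'a', ('+', 'b', ('*', ('*', 'c', 'd'), 'e')))
print(spine_parse("a > b + c * d == e".split()))
# ('==', ('>', 'a', ('+', 'b', ('*', 'c', 'd'))), 'e')
```

The two outputs match the [I] * e and [I] == e trees drawn earlier.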

Parsing

We can hopefully make things a little more concrete with some pseudocode, extending the right-leaning case to handle a transition as we did earlier.

Right Leaning

A right-leaning tree can be built by having parse() call itself on the rest of the input and then building the tree bottom up as the calls return.

def parse():
    left = leaf()

    if peek() is not None:
        op = advance()
        right = parse()
        return Node(op, left, right)

    return left

parse() first defers to leaf() to handle a literal such as a, consuming it from the token stream before checking the next token with peek(). The next token (an operator for a valid program) is consumed by advance() which moves the parser to the next token. parse() finally calls itself for the right-hand side.
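To experiment with this first version, here is a minimal harness, assuming whitespace-separated tokens and a module-level token list. The article leaves leaf(), peek(), advance() and Node undefined; these definitions are one possible choice:

```python
from dataclasses import dataclass


@dataclass
class Node:
    op: str
    left: object
    right: object


tokens, pos = [], 0


def peek():
    return tokens[pos] if pos < len(tokens) else None


def advance():
    global pos
    pos += 1
    return tokens[pos - 1]


def leaf():
    return advance()  # assumes a literal such as "a" is next


def parse():
    left = leaf()

    if peek() is not None:
        op = advance()
        right = parse()
        return Node(op, left, right)

    return left


def parse_text(text):
    global tokens, pos
    tokens, pos = text.split(), 0
    return parse()


print(parse_text("a > b + c * d"))
# Node(op='>', left='a', right=Node(op='+', left='b', right=Node(op='*', left='c', right='d')))
```

Every input comes out right-leaning: parse_text("a * b + c") yields a * (b + c), which is the decreasing-precedence problem.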

The parser advances through the tokens as it recurses, and constructs the tree as it returns. For our earlier tree [I]:

-- Down: advance

(1) parse(0)
    a [>] b  +  c  *  d

(2) parse(prec(>))
    a  >  b [+] c  *  d

(3) parse(prec(+))
    a  >  b  +  c [*] d

(4) parse(prec(*))
    a  >  b  +  c  *  d [None]


-- Up: build

(4)  d

(3) [*]
    / \
   c   d

(2) [+]
    / \
   b   *
      / \
     c   d

(1) [>]
    / \
   a   +
      / \
     b   *
        / \
       c   d

The use of recursion is how we will backtrack to find a continuation point.

Until then, our parser produces incorrect trees for decreasing precedence. To prevent parsing tokens we shouldn’t, we can forward the current precedence to the recursive child call:

def parse(prev_prec=0):
    left = leaf()

    if peek() is not None and prec(peek()) > prev_prec:
        op = advance()
        right = parse(prec(op))
        return Node(op, left, right)

    return left

This works, but we can simplify by giving end-of-file its own token with the lowest precedence, one no greater than the initial prev_prec of 0. peek() then returns that token at the end of input, so the condition fails naturally and we can drop the null check:

def parse(prev_prec=0):
    left = leaf()

    if prec(peek()) > prev_prec:
        op = advance()
        right = parse(prec(op))
        return Node(op, left, right)

    return left
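A runnable version of this step, assuming an EOF sentinel string appended by the tokenizer and a hypothetical PREC table in which EOF has precedence 0, so `prec(peek()) > prev_prec` can never hold for it:

```python
from dataclasses import dataclass

EOF = "<eof>"
PREC = {EOF: 0, ">": 1, "+": 2, "*": 3}


@dataclass
class Node:
    op: str
    left: object
    right: object


tokens, pos = [EOF], 0


def prec(tok):
    return PREC[tok]


def peek():
    return tokens[pos]  # the sentinel is last, so well-formed input never reads past it


def advance():
    global pos
    pos += 1
    return tokens[pos - 1]


def leaf():
    return advance()


def parse(prev_prec=0):
    left = leaf()

    if prec(peek()) > prev_prec:
        op = advance()
        right = parse(prec(op))
        return Node(op, left, right)

    return left


def parse_text(text):
    global tokens, pos
    tokens, pos = text.split() + [EOF], 0
    return parse()


print(parse_text("a * b + c"))  # Node(op='*', left='a', right='b')
print(peek())                   # + (never consumed)
```

Increasing precedence still parses correctly, but on decreasing precedence the single if stops early and leaves "+ c" unconsumed.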

Left Leaning

As parse() recurses, it pushes a frame onto the call stack with the left child and minimum precedence. The call stack, representing the yet-to-be-built spine, is always of increasing precedence.

This means that when unwinding, we visit each level in decreasing order. So, the first level where peek() can bind is also the correct level: every level above is strictly lower in precedence, and we already know it can’t go deeper because parse() wouldn’t have returned. The greedy choice is the only correct choice.

But an if statement can only take that greedy choice once; we need to take it every time, because anything else is incorrect. So, we replace if with while:

def parse(prev_prec=0):
    left = leaf()

    while prec(peek()) > prev_prec:
        op = advance()
        right = parse(prec(op))
        left = Node(op, left, right)

    return left

This is the complete Pratt parser. The while loop is the walkback procedure we described earlier: when a transition operator appears, parse() returns up the call stack until it finds the right level, then the loop consumes it and continues with the left-leaning subtree.

Here’s the trace for [I] * e, where I = a > b + c * d:

-- Down: advance
(1) a [>] b  +  c  *  d  *  e
(2) a  >  b [+] c  *  d  *  e
(3) a  >  b  +  c [*] d  *  e
(4) a  >  b  +  c  *  d [*] e  FAIL

-- Up: build

(4)  d

(3)  iteration 1:
     [*]
     / \
    c   d

     iteration 2:
     [*]
     / \
    *   e
   / \
  c   d

(2) [+]
    / \
   b   *
      / \
     *   e
    / \
   c   d

(1) [>]
    / \
   a   +
      / \
     b   *
        / \
       *   e
      / \
     c   d

The left-leaning subtree (c * d) * e is built entirely within frame 3’s while loop.
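The finished parser as a self-contained sketch, using the same kind of assumed helpers as before (an EOF sentinel of precedence 0, whitespace tokens) and a made-up precedence table in which == binds loosest, then >, +, *. Wrapping the state in a small Parser class is a design choice, not part of the article:

```python
from dataclasses import dataclass

EOF = "<eof>"
PREC = {EOF: 0, "==": 1, ">": 2, "+": 3, "*": 4}


@dataclass
class Node:
    op: str
    left: object
    right: object


class Parser:
    def __init__(self, text):
        self.tokens = text.split() + [EOF]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1
        return self.tokens[self.pos - 1]

    def prec(self, tok):
        if tok not in PREC:
            raise SyntaxError(f"expected an operator, got {tok!r}")
        return PREC[tok]

    def leaf(self):
        tok = self.advance()
        if tok in PREC:
            raise SyntaxError(f"expected an operand, got {tok!r}")
        return tok

    def parse(self, prev_prec=0):
        left = self.leaf()
        # Keep absorbing operators that bind tighter than the caller's level.
        while self.prec(self.peek()) > prev_prec:
            op = self.advance()
            right = self.parse(self.prec(op))
            left = Node(op, left, right)
        return left


def parse_text(text):
    return Parser(text).parse()


print(parse_text("a > b + c * d * e"))
# Node(op='>', left='a', right=Node(op='+', left='b', right=Node(op='*', left=Node(op='*', left='c', right='d'), right='e')))
```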

Right Associativity

In practice every operator has two types of precedence: left and right. Pratt refers to this as left and right binding power, or LBP and RBP. All of our operators so far have an equal LBP and RBP.

An operator’s LBP determines how strongly it attracts the expression to its left - this is what peek() checks in the while condition. Its RBP determines how strongly it attracts the expression to its right - this is what gets passed as prev_prec to the recursive call.

For left-associative operators, LBP and RBP are equal. When two * operators meet, the second *’s LBP is not greater than the first *’s RBP, so parse() does not recurse but instead loops at the same level to build left.

We want the opposite for right-associative operators. a = b = c should parse as a = (b = c). The second = should be consumed by the recursive call, not by the loop. We can achieve this by setting RBP lower than LBP - the recursive child’s precedence threshold is low enough that a consecutive operator still passes the > check and gets consumed deeper.

def parse(prev_prec=0):
    left = leaf()

    while lbp(peek()) > prev_prec:
        op = advance()
        right = parse(rbp(op))
        left = Node(op, left, right)

    return left

To ensure this, we set rbp = lbp for left-associative operators, and rbp = lbp - 1 for right-associative.
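A sketch of the binding-power version, assuming integer powers and a hypothetical table that marks = and ^ as right-associative (so their RBP is LBP - 1); "^" is not in the article and is added only to show a second right-associative level. The rest is the same EOF-sentinel setup as before:

```python
from dataclasses import dataclass

EOF = "<eof>"
# operator -> (left binding power, right-associative?)
OPS = {"=": (1, True), "+": (2, False), "-": (2, False), "*": (3, False), "^": (4, True)}


@dataclass
class Node:
    op: str
    left: object
    right: object


def lbp(tok):
    if tok == EOF:
        return 0
    if tok not in OPS:
        raise SyntaxError(f"expected an operator, got {tok!r}")
    return OPS[tok][0]


def rbp(tok):
    power, right_assoc = OPS[tok]
    return power - 1 if right_assoc else power


def parse_text(text):
    tokens = text.split() + [EOF]
    pos = 0

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def leaf():
        tok = advance()
        if tok == EOF or tok in OPS:
            raise SyntaxError(f"expected an operand, got {tok!r}")
        return tok

    def parse(prev_prec=0):
        left = leaf()
        while lbp(tokens[pos]) > prev_prec:
            op = advance()
            right = parse(rbp(op))  # lower RBP lets a repeated "=" recurse deeper
            left = Node(op, left, right)
        return left

    return parse()


print(parse_text("a = b = c"))
# Node(op='=', left='a', right=Node(op='=', left='b', right='c'))
```

With integer powers, lbp - 1 is always below the LBP of the same operator but never below any looser operator's LBP, so only the operator itself (or tighter ones) is pulled into the right operand.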

Summary

I’ve often found Pratt parsing to be presented as if it were a clever trick. Well, it is, but with a very simple geometric intuition: trees are either left-leaning or right-leaning depending on precedence. When precedence drops, walk back up the spine until you find where the new operator belongs.

I’ve read many articles on the same topic but never found it presented this way - hopefully N + 1 is of help to someone.

Hacker Times