Sunday August 31 2025

Hacker Times

How to safely escape JSON inside HTML SCRIPT elements

<script> tags follow unintuitive parsing rules that can break a webpage in surprising ways. Fortunately, it’s relatively straightforward to escape JSON for script tags.

Just do this

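Replace < with \u003C in JSON strings. (\x3C also works when the JSON is evaluated as JavaScript, but it is not valid JSON, so JSON.parse() rejects it.)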
In PHP, use json_encode($data, JSON_HEX_TAG | JSON_UNESCAPED_SLASHES) for safe JSON in <script> tags.
In WordPress, use [wp_json_encode](https://developer.wordpress.org/reference/functions/wp_json_encode/) with the same flags.
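If an encoder has no equivalent of JSON_HEX_TAG, the first rule can be applied after encoding. This is a minimal sketch that assumes the input is already valid JSON text. In JSON, “<” can only occur inside string literals, so replacing every “<” with \u003C keeps the JSON valid and decodes to the same value. The helper name `escape_json_for_script` is hypothetical.

```php
<?php
// Hypothetical helper: make already-encoded JSON safe to print inside <script>.
// "<" can only appear inside JSON string literals, so replacing it with the
// JSON escape \u003C changes the text but not the decoded value.
function escape_json_for_script(string $json): string
{
    return str_replace('<', '\u003C', $json);
}

$data = ['html' => '<!-- <script></script> -->'];
$json = escape_json_for_script(json_encode($data, JSON_UNESCAPED_SLASHES));

echo $json, "\n"; // {"html":"\u003C!-- \u003Cscript>\u003C/script> -->"}
```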

You don’t have to take my word for it; the HTML standard recommends this type of escaping:

The easiest and safest … is to always escape an ASCII case-insensitive match for “<!--” as “\x3C!--”, “<script” as “\x3Cscript”, and “</script” as “\x3C/script”…

This post will dive deep into the exotic script tag parsing rules in order to understand how they work and why this is the appropriate way to escape JSON.

What’s so gnarly about a script tag?

Script tags are used to embed other languages in HTML. The most common example is JavaScript:

This is great: JavaScript can be embedded directly. Imagine if script tags required HTML escaping:

In fact, script tags can contain any language (not necessarily JavaScript) or even arbitrary data. In order to support this behavior, script tags have special parsing rules. For the most part, the browser accepts whatever is inside the script tag until it finds the script close tag </script>[1].

So, what happens when we embed this perfectly valid JavaScript that contains a script close tag?

Oops! We can see that </script> was part of a JavaScript string, but the browser is just parsing the HTML. This script element closes prematurely, resulting in the following tree:

├─SCRIPT
│ └─#text console.log('
└─#text ')
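A rough PHP model of this rule, ignoring both the escaped states covered later and the tag-name check described in the footnote, is to take the text up to the first case-insensitive “</script”. The function name `script_text` is hypothetical. It also assumes the markup starts with a bare `<script>` tag.

```php
<?php
// Simplified model: a script element's text runs until the first
// case-insensitive "</script". This ignores the escaped states and the
// tag-name terminator check, so it is an illustration, not a parser.
function script_text(string $html): string
{
    $start = strlen('<script>'); // assumes $html starts with a bare <script>
    $end = stripos($html, '</script', $start);
    return $end === false
        ? substr($html, $start)
        : substr($html, $start, $end - $start);
}

$html = "<script>console.log('</script>')</script>";
echo script_text($html), "\n"; // console.log('
```

The JavaScript string is cut off exactly where the tree above shows the SCRIPT element ending.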

OK, let’s use json_encode() and we should be all set:

Now we’ve got this HTML:

</script> has become <\/script>. The JavaScript string value is preserved and the script element does not close prematurely. Perfect, right?
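The slash escaping comes from json_encode()’s defaults, and it is what JSON_UNESCAPED_SLASHES turns off. Here is a quick check, assuming the value is printed as a JavaScript function argument:

```php
<?php
// json_encode() escapes "/" as "\/" by default, so the printed text no longer
// contains "</script". Both JSON and JavaScript read "\/" as "/".
$message = '</script>';
$js = 'console.log(' . json_encode($message) . ')';

echo $js, "\n"; // console.log("<\/script>")
```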

Not so fast, things are about to get messy

Let’s expand with a more complex example. Here’s some data used by an imaginary HTML library. We’ll escape the JSON again with json_encode()[2]:

Our HTML page includes the following, with a safely escaped script close tag:

Lovely. We’re good at this. Let’s just ship that 🚀

🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥

Great. Production is now a blank page and we need to write a post-mortem. What happened? The HTML looks just fine. Let’s inspect the document tree:

└─SCRIPT
  └─#text {␊
              "closeComment": "-->",␊
              "closeScript": "<\/script>",␊
              "openComment": "<!--",␊
              "openScript": "<script>"␊
          }</script>␊
          <h1>Success! 🎉</h1>

The script tag did not close as expected at </script>. The script close tag and all of the subsequent HTML are part of the script tag contents.
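Escaping “/” only neutralizes “</script”. The other two sequences the standard warns about, “<!--” and “<script”, pass through json_encode() untouched. This sketch checks the same data (keys and values taken from the tree above, without JSON_PRETTY_PRINT):

```php
<?php
// The imaginary library's data, encoded with json_encode() defaults.
$data = [
    'closeComment' => '-->',
    'closeScript' => '</script>',
    'openComment' => '<!--',
    'openScript' => '<script>',
];
$json = json_encode($data);

echo $json, "\n";
// {"closeComment":"-->","closeScript":"<\/script>","openComment":"<!--","openScript":"<script>"}
```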

Wait, what???

We’ve just discovered some of those unintuitive parsing rules. In short, the HTML parser entered the script data double escaped state and got stuck. Yes, this does break real pages.

If you’re not steeped in HTML arcana, fear not, this handy chart should clarify things 🙃

This is a real and mostly accurate diagram of how script tag tokenization works. I’ve taken some liberties with things like end-of-file tokens and null bytes that aren’t relevant to the discussion.

You may be wondering, like I did, why HTML would work like this. Well, the web wasn’t always the mature platform we know and love today:

When JavaScript was first introduced, many browsers did not support it. So they would render the content of the script tag – the JavaScript code itself. The normal way to get around that was to put the script into a comment — things like

This kind of practice was commonplace on the web. As the web evolved, browsers continued to support the behavior so they wouldn’t break existing pages. Then, HTML5 came along and standardized the behavior so folks knew what to expect, even if it’s surprising. We can see other remnants of this practice in the HTML scripting specification:

for related historical reasons, the string “<!--” in classic scripts is actually treated as a line comment start, just like “//”.

Back to our script data double escaped state. We can simplify the diagram above to collapse some states and focus on the interesting transitions:

The diagram labels some transitions <script and </script. More precisely, the name is matched case-insensitively and only counts when it is followed by a character that terminates a tag name: a space, a tab, “/”, “>”, or one of \n, \f, \r. For example, <script-o-rama and </scripty do not transition.
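The tag-name rule fits in a few lines of PHP. This is a minimal sketch. The function name `is_script_tag_name_at` is hypothetical, and it only checks the name and the character after it, not the rest of the tag. \r is included as in the text above. The HTML parser normalizes \r to \n before tokenizing, so either way the result is the same.

```php
<?php
// Does $text contain $name (e.g. "<script" or "</script") at $offset as a
// complete tag name? Matching is ASCII case-insensitive, and the name must be
// followed by a space, tab, \n, \f, \r, "/" or ">".
function is_script_tag_name_at(string $text, int $offset, string $name): bool
{
    $len = strlen($name);
    if (strncasecmp(substr($text, $offset, $len), $name, $len) !== 0) {
        return false;
    }
    $next = substr($text, $offset + $len, 1);
    return $next !== '' && strpos(" \t\n\f\r/>", $next) !== false;
}

var_dump(is_script_tag_name_at('</SCRIPT/>', 0, '</script'));     // bool(true)
var_dump(is_script_tag_name_at('</scripty>', 0, '</script'));     // bool(false)
var_dump(is_script_tag_name_at('<script-o-rama>', 0, '<script')); // bool(false)
```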

To understand the problem with our example above, locate the three transitions for </script:

script data → close
script data escaped → close
‼️ script data double escaped → script data escaped ‼️

</script> does not close a script element from the script data double escaped state.
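To make the reduced diagram concrete, here is a simplified PHP model of its three states. It is a sketch under stated assumptions, not the standard’s tokenizer. It takes the text after the opening <script> tag, tracks only the <!--, -->, <script and </script transitions, and ignores null bytes, end-of-file handling, and edge cases such as <!--> and <!--->. The function names are hypothetical. Run on the imaginary library’s data, it never finds a closer, matching the tree above.

```php
<?php
// Simplified model of the reduced state diagram (script data, escaped,
// double escaped). Ignores null bytes, EOF, and "<!-->" style edge cases.
function tag_at(string $text, int $i, string $name): bool
{
    $len = strlen($name);
    if (strncasecmp(substr($text, $i, $len), $name, $len) !== 0) {
        return false;
    }
    $next = substr($text, $i + $len, 1);
    return $next !== '' && strpos(" \t\n\f\r/>", $next) !== false;
}

// Returns the offset of the "</script" that closes the element, or null if
// the element never closes.
function find_script_close(string $text): ?int
{
    $state = 'data';
    $n = strlen($text);
    for ($i = 0; $i < $n; $i++) {
        if ($state === 'data') {
            if (tag_at($text, $i, '</script')) {
                return $i;
            }
            if (substr($text, $i, 4) === '<!--') {
                $state = 'escaped';
                $i += 3; // skip the rest of "<!--"
            }
        } elseif ($state === 'escaped') {
            if (tag_at($text, $i, '</script')) {
                return $i;
            }
            if (substr($text, $i, 3) === '-->') {
                $state = 'data';
                $i += 2;
            } elseif (tag_at($text, $i, '<script')) {
                $state = 'double escaped';
                $i += 6; // the terminator is then read as plain text
            }
        } else { // double escaped: "</script" only returns to escaped
            if (substr($text, $i, 3) === '-->') {
                $state = 'data';
                $i += 2;
            } elseif (tag_at($text, $i, '</script')) {
                $state = 'escaped';
                $i += 7;
            }
        }
    }
    return null;
}

$json = json_encode([
    'closeComment' => '-->',
    'closeScript' => '</script>',
    'openComment' => '<!--',
    'openScript' => '<script>',
]);
$page = $json . "</script>\n<h1>Success! 🎉</h1>\n";

var_dump(find_script_close($page)); // NULL: the element never closes
```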

I encourage you to pause for a moment and play with this example to get a feel for how the script tag escaped states work.

Avoid the double escaped state

The complexity of script tag parsing and escaping comes from the escaped states. Avoid the script data double escaped state and script tags become simple. Everything until the tag closer </script> is inside the script element.

How can we avoid the double escaped state? Script tag parsing always starts in the script data state and there’s a pattern in its transitions:

</script: script data → close
<!--: script data → script data escaped

Both require “<” as their first character.[3] Everything will be handled predictably if < never appears inside the script tag. Remember what the HTML standard on scripting said? It recommends escaping < in specific places:

[Escape] “<!--” as “\x3C!--”, “<script” as “\x3Cscript”, and “</script” as “\x3C/script” [in literals.]

PHP has the [JSON_HEX_TAG](https://www.php.net/manual/en/json.constants.php#constant.json-hex-tag) flag, which escapes every < as \u003C and every > as \u003E. This escapes much more than is strictly necessary, but it’s sufficient and it’s built into the language. Perfect!
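The standard’s \x3C form is a JavaScript string escape. JSON only allows \uXXXX escapes, which is why JSON_HEX_TAG produces \u003C. This sketch checks both points with json_decode():

```php
<?php
$original = '<!-- <script> </script>';

// JSON_HEX_TAG rewrites every "<" and ">" as a \u escape...
$escaped = json_encode($original, JSON_HEX_TAG | JSON_UNESCAPED_SLASHES);
echo $escaped, "\n"; // "\u003C!-- \u003Cscript\u003E \u003C/script\u003E"

// ...which decodes back to the original string.
var_dump(json_decode($escaped) === $original); // bool(true)

// "\x3C" is valid in a JavaScript string literal but not in JSON.
var_dump(json_decode('"\x3C"')); // NULL
```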

How to escape JSON in PHP

For JSON that will be printed in a script tag, use the following flags:

[JSON_HEX_TAG](https://www.php.net/manual/en/json.constants.php#constant.json-hex-tag)
All < and > are converted to \u003C and \u003E.
[JSON_UNESCAPED_SLASHES](https://www.php.net/manual/en/json.constants.php#constant.json-unescaped-slashes)
Don’t escape /.

If everything is UTF-8 (both the data and the charset of the page) you can add these flags for cleaner and shorter JSON:

[JSON_UNESCAPED_UNICODE](https://www.php.net/manual/en/json.constants.php#constant.json-unescaped-unicode)
Encode multibyte Unicode characters literally (default is to escape as \uXXXX).
[JSON_UNESCAPED_LINE_TERMINATORS](https://www.php.net/manual/en/json.constants.php#constant.json-unescaped-line-terminators)
The line terminators are kept unescaped when [JSON_UNESCAPED_UNICODE](https://www.php.net/manual/en/json.constants.php#constant.json-unescaped-unicode) is supplied. It uses the same behaviour as it was before PHP 7.1 without this constant. Available as of PHP 7.1.0.
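The flags above can be wrapped in one helper. This is a minimal sketch. The name `json_for_script` is hypothetical. It assumes UTF-8 data and a UTF-8 page, so drop the last two escaping flags otherwise. JSON_THROW_ON_ERROR (PHP 7.3+) is not among the flags recommended here. It is added so that an encoding failure throws instead of returning false.

```php
<?php
// Hypothetical helper combining the recommended flags with the UTF-8 ones.
function json_for_script($value): string
{
    $flags = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES
        | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS
        | JSON_THROW_ON_ERROR; // our addition: fail loudly, never print false
    return json_encode($value, $flags);
}

echo json_for_script(['city' => 'Zürich', 'tag' => '</script>']), "\n";
// {"city":"Zürich","tag":"\u003C/script\u003E"}
```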

JSON_UNESCAPED_LINE_TERMINATORS is a fun one. Before ES2019, JavaScript string literals could not contain two characters that JSON strings allow: U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR). Some valid JSON was therefore invalid JavaScript. Since the “JavaScript is a superset of JSON” proposal landed in ES2019, that’s no longer the case, and those characters no longer require escaping. Phew! Browser support today is very good.
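This sketch shows the flag’s effect on U+2028. With JSON_UNESCAPED_UNICODE alone, PHP 7.1+ still escapes the line terminators. Adding JSON_UNESCAPED_LINE_TERMINATORS prints them raw.

```php
<?php
$text = "a\u{2028}b"; // contains U+2028 LINE SEPARATOR

// JSON_UNESCAPED_UNICODE alone still escapes U+2028.
echo json_encode($text, JSON_UNESCAPED_UNICODE), "\n"; // "a\u2028b"

// Adding JSON_UNESCAPED_LINE_TERMINATORS keeps the raw character.
$raw = json_encode($text, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS);
var_dump($raw === "\"$text\""); // bool(true)
```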

JSON escaping in action

Here’s the problematic example again, now with the recommended flags:

Let’s see the printed HTML and its resulting tree:

├─SCRIPT
│ └─#text {␊
│             "closeComment": "--\u003E",␊
│             "closeScript": "\u003C/script\u003E",␊
│             "openComment": "\u003C!--",␊
│             "openScript": "\u003Cscript\u003E"␊
│         }
├─#text ␊ 
└─H1
  └─#text Success! 🎉

“Success! 🎉” is displayed and the tree structure is exactly what we expected.
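The printed HTML can be reproduced with a short sketch. It assumes the JSON is printed directly between the script tags and adds JSON_PRETTY_PRINT only for legibility, as in the tree above. Because the JSON contains no “<”, the first “</script” in the page is the intended closer.

```php
<?php
$data = [
    'closeComment' => '-->',
    'closeScript' => '</script>',
    'openComment' => '<!--',
    'openScript' => '<script>',
];
// The recommended flags, plus JSON_PRETTY_PRINT for legibility.
$json = json_encode($data, JSON_HEX_TAG | JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT);

$page = '<script>' . $json . "</script>\n<h1>Success! 🎉</h1>\n";
echo $page;
```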

What about JavaScript?

The problems with JSON seem to be solved. But what about JavaScript source text? Or what if we decide to embed XML, Python, or Haskell in a script tag? All of those are permitted but bring different challenges.

Given what we learned here, see if you can find a general solution for escaping JavaScript safely. Remember that script data double escaped state is dangerous and should be avoided. We also can’t allow the script tag to close prematurely with </script>. The path from our entry state to double-escaped looks like this:

Script data state: “<!--” transition to
Script data escaped state: “<script>” transition to
Script data double escaped state: ‼️
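This is not a solution to the exercise, but a conservative check can tell whether some script text is safe to print as-is. Starting from the script data state, only “<!--” (entering the escaped state) and “</script” (closing) change anything. Text that contains neither, in any letter case, stays in script data until the real closer. The sketch also rejects “<script” to match the standard’s advice, so it flags some harmless text such as <script-o-rama. The function name `is_plain_script_text` is hypothetical.

```php
<?php
// Conservative check (not an escaper): true if the text contains none of
// "<!--", "<script" or "</script" in any letter case.
function is_plain_script_text(string $text): bool
{
    foreach (['<!--', '<script', '</script'] as $needle) {
        if (stripos($text, $needle) !== false) {
            return false;
        }
    }
    return true;
}

var_dump(is_plain_script_text('if (a < b) { run(); }')); // bool(true)
var_dump(is_plain_script_text('s = "<!--<script>";'));   // bool(false)
```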

The diagrams in this post were generated with Mermaid and Graphviz. Their source is available in this gist. Thanks to Dennis Snell for an improved version of the reduced state graph.

[1] It’s easiest to talk about </script> as the script close tag. Technically, it’s not strictly </script>, but a sequence of characters that looks like a script tag closer. For example, </SCRIPT/> closes a script element but </script-no> does not.
[2] Several examples include [JSON_PRETTY_PRINT](https://www.php.net/manual/en/json.constants.php#constant.json-pretty-print) in the output for legibility. This flag is omitted from the example code.
[3] Script data transitions to script data less-than sign state when the < character is encountered. That is the only transition from the script data state.