Nominal Types in WebAssembly

Good lord. WebAssembly was sold as "portable assembly for the web". It's in the fricking name. Web. Assembly. Assembly for the web.

It was supposed to solve the problem of: some computers run x86, some ARM, so we need something that is equivalent but portable across different CPUs.

What business does WebAssembly have knowing about complex types? What x86 instructions are there for `(type $t (struct i32))`? Or for doing garbage collection?

We would be better off standardizing on a subset of x86 and writing translators to ARM etc. Or standardize on ARM and translate to x86.

We know it can work. Apple did it with Rosetta. Microsoft did it with Prism. I don't think WebAssembly implementations generate faster code than Rosetta or Prism.

QEMU did it simply (albeit slowly).

WebAssembly is becoming another JVM. It's not simple. It's not fast. It's not easy to use.

But now we're stuck with it and the only path is to add and add and add.

Andy jests, but I would actually like to add nominal types to Wasm (along with type imports to make them usable). No proposal yet, but maybe later this year.

This blog post mentions that you can kind of emulate nominal types by putting all your types in one rec group, but then it brushes that off as inferior to using exceptions. (Which is hilarious! Good work, Andy.) What it doesn’t make clear is that people actually use this rec group trick in practice. There are two ways to do it: you can put literally all your types in one rec group, or you can emit minimal rec groups with additional “brand types” that serve no purpose but to ensure the groups have different structures. The former solution is better for code size when the entire application is one module, but the latter solution is better if there are multiple modules involved. You don’t want to repeat every type definition in every module, and using smaller rec groups lets you define only the types that are (transitively) used in each module.
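To make the brand-type trick concrete, here is a toy Python sketch. It assumes a hypothetical, flattened encoding in which a rec group's identity is simply the tuple of its type definitions (intra-group references are ignored), and it uses an arbitrary brand shape, a struct with k i8 fields. Real toolchains pick their own brands; this only shows the idea of appending otherwise-unused types until each group's structure is unique.

```python
def add_brands(groups):
    """Give each minimal rec group a structurally unique shape.

    groups: list of rec groups, each a list of hashable type definitions
    such as ("struct", ("i32",)). A group whose shape is already taken gets
    an extra, otherwise-unused brand type appended (hypothetical encoding).
    """
    seen = set()
    branded = []
    for group in groups:
        shape = tuple(group)
        k = 0
        while shape in seen:
            k += 1
            # Brand k: a struct with k i8 fields. The choice is arbitrary;
            # it only has to make this group's shape differ from the others.
            shape = tuple(group) + (("struct", ("i8",) * k),)
        seen.add(shape)
        branded.append(list(shape))
    return branded


t = ("struct", ("i32",))
for g in add_brands([[t], [t], [t]]):
    print(g)
```

With three identical single-type groups, the first is left alone, the second gets a one-field brand and the third a two-field brand, so an engine would no longer merge them.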

The Binaryen optimizer has to ensure that it does not accidentally give distinct types the same structural identity because that would generally be observable by casts. Most of its type optimizations therefore put all the types in one rec group. However, it does have a type merging optimization that takes the casts into account[0]. That optimization is fun because it reuses the DFA minimization code from the original equirecursive type system we were experimenting with for Wasm GC. We also have a rec group minimization optimization[1] that creates minimal rec groups (by finding strongly connected components of the type definition graph), then ensures the types remain distinct first by using different permutations of the types within a rec group and then only as necessary by adding brand types.

[0]: https://github.com/WebAssembly/binaryen/blob/main/src/passes...

[1]: https://github.com/WebAssembly/binaryen/blob/main/src/passes...
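The rec group minimization described above starts from the strongly connected components of the type definition graph. Below is a minimal Python sketch using Tarjan's algorithm, the textbook SCC method; Binaryen's actual pass is C++ and is not reproduced here, and the input encoding (a dict from type name to the names it references) is an assumption. Tarjan's algorithm emits each component only after every component reachable from it, so each group comes out after the groups it depends on, which is the order in which rec groups have to be declared.

```python
def rec_groups(types):
    """types: dict mapping a type name to the list of type names it references.

    Returns the minimal rec groups (strongly connected components) as sorted
    lists, each group listed after the groups it depends on. Recursive for
    brevity; a very deep type graph would need an iterative version.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    groups = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in types[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop everything above it.
            group = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                group.append(w)
                if w == v:
                    break
            groups.append(sorted(group))

    for v in types:
        if v not in index:
            visit(v)
    return groups


types = {
    "$list": ["$node"],          # mutually recursive with $node
    "$node": ["$list", "$val"],
    "$val": [],
    "$succ": ["$succ"],          # self-recursive
}
print(rec_groups(types))
```

Here `$list` and `$node` end up sharing a group, while `$val` and `$succ` each get their own; distinguishing groups that happen to have the same structure is the separate permutation/brand-type step.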

Not familiar with WebAssembly, but from the name I was expecting the syntax to resemble assembly.

He is showing S-expressions? That is its syntax? I am intrigued now.

webassembly adding nominal types is like watching a toddler slowly reinvent java. next up: checked exceptions and enterprise beans

> field access is a bit odd; unlike structs which have struct.get, nominal types receive all their values via a catch handler.

I know this is meant to be silly, and I am no expert, but I kinda do like this syntax. It's like shaking the struct and seeing what falls out.

Before the managed data types extension to WebAssembly was incorporated in the standard, there was a huge debate about type equality. The end result is that if you have two types in a Wasm module that look the same, like this:

    (type $t (struct (field i32)))
    (type $u (struct (field i32)))

Then they are for all intents and purposes equivalent. When a Wasm implementation loads up a module, it has to partition the module’s types into equivalence classes. When the Wasm program references a given type by name, as in (struct.get $t 0) which would get the first field of type $t, it maps $t to the equivalence class containing $t and $u. See the spec for more details.

This is a form of structural type equality. Sometimes this is what you want. But not always! Sometimes you want nominal types, in which no type declaration is equivalent to any other. WebAssembly doesn’t have that, but it has something close: recursive type groups. In fact, the type declarations above are equivalent to these:

    (rec (type $t (struct (field i32))))
    (rec (type $u (struct (field i32))))

Which is to say, each type is in a group containing just itself. One thing that this allows is self-recursion, as in:

    (type $succ (struct (field (ref null $succ))))

Here the struct’s field is itself a reference to a $succ struct, or null (because it’s ref null and not just ref).

To allow for mutual recursion between types, you put them in the same rec group, instead of each having its own:

    (rec
      (type $t (struct (field i32)))
      (type $u (struct (field i32))))

Between $t and $u we don’t have mutual recursion, though, so why bother? Well, rec groups have another role: they are the unit of structural type equivalence. In this case, types $t and $u are not in the same equivalence class, because they occupy different positions within the same rec group. Again, see the spec.

Within a Wasm module, rec gives you an approximation of nominal typing. But what about between modules? Let’s imagine that $t carries important capabilities, and you don’t want another module to be able to forge those capabilities. In this case, rec is not enough: the other module could define an equivalent rec group, construct a $t, and pass it to our module; because of isorecursive type equality, this would work just fine. What to do?
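The isorecursive rule can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not the spec's algorithm verbatim: only struct types, with mutability, subtyping and finality left out, and a hypothetical `Engine` whose registry is shared by every module it loads. A rec group's identity is its shape, where references to members of the same group are encoded by position and references to earlier groups by their canonical ids; a type's identity is (group id, position). The shared registry is exactly what lets a second module reproduce $t.

```python
class Engine:
    """Toy model of engine-wide isorecursive type canonicalization.

    Simplifications (assumptions): only struct types; no mutability,
    subtyping or finality. The registry is shared across all loaded modules.
    """

    def __init__(self):
        self.groups = {}  # rec group shape -> canonical group id

    def load(self, module):
        """module: list of rec groups. A rec group is a list of
        (name, fields) pairs; a field is a value type such as "i32" or a
        reference ("ref", target_name, nullable).
        Returns {type name: canonical id}, where an id is (group id, index)."""
        ids = {}
        for group in module:
            local = {name: i for i, (name, _) in enumerate(group)}
            shape = []
            for _, fields in group:
                encoded = []
                for f in fields:
                    if isinstance(f, tuple):
                        _, target, nullable = f
                        if target in local:
                            # Same rec group: refer to the member by position.
                            encoded.append(("ref", ("rec", local[target]), nullable))
                        else:
                            # Earlier group: refer to its canonical id.
                            encoded.append(("ref", ids[target], nullable))
                    else:
                        encoded.append(f)
                shape.append(("struct", tuple(encoded)))
            gid = self.groups.setdefault(tuple(shape), len(self.groups))
            for name, i in local.items():
                ids[name] = (gid, i)
        return ids


engine = Engine()
a = engine.load([[("$t", ["i32"])], [("$u", ["i32"])]])
b = engine.load([[("$t", ["i32"]), ("$u", ["i32"])]])
c = engine.load([[("$t", ["i32"]), ("$u", ["i32"])]])  # a second module
print(a["$t"] == a["$u"])  # singleton groups with equal shapes
print(b["$t"] == b["$u"])  # different positions in one rec group
print(c["$t"] == b["$t"])  # the other module reproduces $t
```

This prints True, False, True. A self-recursive type such as $succ works too, since its field is encoded as a reference to position 0 of its own group.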

cursèd nominal typing

I said before that Wasm doesn’t have nominal types. That was true in the past, but no more! The nominal typing proposal was incorporated in the standard last July. Its vocabulary is a bit odd, though. You have to define your data types with the tag keyword:

    (tag $v (param $secret i32))

Syntactically, these data types are a bit odd: you have to declare fields using param instead of field and you don’t have to wrap the fields in struct.

They also omit some features relative to isorecursive structs, namely subtyping and mutability. However, sometimes subtyping is not necessary, and one can always assignment-convert mutable fields, wrapping them in mutable structs as needed.

To construct a nominally-typed value, the mechanics are somewhat involved; instead of (struct.new $t (i32.const 42)), you use throw:

    (block $b (result (ref exn))
      (try_table (catch_all_ref $b)
        (throw $v (i32.const 42)))
      (unreachable))

Of course, as this is a new proposal, we don’t yet have precise type information on the Wasm side; the new instance instead is returned as the top type for nominally-typed values, exn.

To check if a value is a $v, you need to write a bit of code:

    (func $is-v? (param $x (ref exn)) (result i32)
      (block $yep (result i32 (ref exn))
        (block $nope
          (try_table (catch_ref $v $yep) (catch_all $nope)
            (throw_ref (local.get $x))))
        (return (i32.const 0)))
      (return (i32.const 1)))

Finally, field access is a bit odd; unlike structs which have struct.get, nominal types receive all their values via a catch handler.

    (func $v-fields (param $x (ref exn)) (result i32)
      (try_table (catch $v 0)
        (throw_ref (local.get $x)))
      (unreachable))

Here, the 0 in the (catch $v 0) refers to the function call itself: all fields of $v get returned from the function call. In this case there’s only one; otherwise a get-fields function would return multiple values. Happily, this accessor preserves type safety: if $x is not actually $v, an exception will be thrown.
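If the exception encoding is hard to follow, here is a hedged Python analogy (not Wasm, and not any engine's API). A hypothetical `Tag` object stands in for a Wasm tag, a Python exception carrying the tag and payload stands in for an exnref, and construction, the type test and field access all work by raising and catching, mirroring the three Wasm snippets. As with Wasm tags, identity comes from the `Tag` object itself, never from its parameter types.

```python
class Tag:
    """Hypothetical stand-in for a Wasm tag. Identity is the object itself;
    the parameter types are recorded but never used to compare tags."""

    def __init__(self, name, params):
        self.name = name
        self.params = params


class Exn(Exception):
    """Stand-in for an exnref: the tag that was thrown plus its payload."""

    def __init__(self, tag, values):
        super().__init__(tag.name)
        self.tag = tag
        self.values = values


def construct(tag, *values):
    # Mirrors (throw $v ...) caught by (catch_all_ref $b): throw, then keep
    # the exception object itself as the "instance".
    try:
        raise Exn(tag, values)
    except Exn as exn:
        return exn


def is_a(tag, x):
    # Mirrors $is-v?: rethrow x and check whether the handler for `tag` matches.
    try:
        raise x
    except Exn as caught:
        return 1 if caught.tag is tag else 0


def fields(tag, x):
    # Mirrors $v-fields: on a match the payload is the result; otherwise the
    # exception keeps propagating, like the rethrow in the Wasm version.
    try:
        raise x
    except Exn as caught:
        if caught.tag is not tag:
            raise
        return caught.values


v = Tag("$v", ("i32",))
w = Tag("$w", ("i32",))  # same params as $v, but a different tag
x = construct(v, 42)
print(is_a(v, x), is_a(w, x), fields(v, x))
```

This prints `1 0 (42,)`. In this analogy, exporting a tag corresponds to handing another module the same `Tag` object, while a module that merely declares a look-alike tag gets a distinct identity, as `w` does here.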

Now, sometimes you want to be quite strict about your nominal type identities; in that case, just define your tag in a module and don’t export it. But if you want to enable composition in a principled way, not just subject to the randomness of whether another module happens to implement a type structurally the same as your own, the nominal typing proposal also gives a preview of type imports. The facility is direct: you simply export your tag from your module, and allow other modules to import it. Everything will work as expected!

fin

Friends, as I am sure is abundantly clear, this is a troll post :) It’s not wrong, though! All of the facilities for nominally-typed structs without subtyping or field mutability are present in the exception-handling proposal.

The context for this work was that I was updating Hoot to use the newer version of Wasm exception handling, instead of the pre-standardization version. It was a nice change, but as it introduces the exnref type, it does open the door to some funny shenanigans, and I find it hilarious that the committee has been hemming and hawing about type imports for 7 years and then goes and ships it in this backward kind of way.

Next up, exception support in Wastrel, as soon as I can figure out where to allocate type tags for this new nominal typing facility. Onwards and upwards!

I've been using WASM via Emscripten almost since the beginning, but I have never encountered 'rec' or 'struct' (or generally any types beyond integers and floats). Why would WASM even need to know how structs are composed internally, instead of 'dissolving' them into offsets at compile time? Did this stuff come in via the GC feature?

This post is actually a joke, but it does bring about an important point: For an interpreter, having more information results in faster execution. WASM is much closer to Java bytecode than you might think, and SpiderMonkey/V8 are basically the JVM. WASM also undergoes multiple different stages and kinds of JIT compilation in most browsers, and detailed type and usage information helps that produce faster execution.

Also, don't forget that WASM is designed to replace JavaScript, so it must interoperate with it to smooth the transition. Rosetta and Prism also work to smooth the transition from x86 to ARM, and much of the difficult work they do involves translating between the calling conventions of the two architectures and making code work across binaries compiled both for and not for ARM, rather than the instruction translation itself. WebAssembly is designed to not have that limitation: it's much more closely aligned with JS. That's why it wouldn't make sense to use a subset of x86 or similar; it would simply create more work to get it to interface with JavaScript.

> It's not fast.

My emulators here have roughly the same performance as the same code compiled as native executable (e.g. within around 5%) - this is mostly integer bit twiddling code. Unless you hand-optimize your code beyond what portable C provides (like manually tuned SIMD intrinsics), WASM code pretty much runs at native speed these days:

https://floooh.github.io/tiny8bit/

The types are there for garbage collection, which is there for integration with the Web APIs which are all defined in terms of garbage collected objects.

> We would be better off standardizing on a subset of x86 and writing translators to arm etc. Or standardize on arm and translate to x86.

This is basically what Native Client (NaCl) was, and it was really hard to work with! We don't use it anymore and developed WASM instead.

> It's not fast.

Not disagreeing with you, but here’s an article from Akamai about how using WASM can minimize cold startup time for serverless functions.

https://www.akamai.com/blog/developers/build-serverless-func...

I guess you didn't read to the end?

> Friends, as I am sure is abundantly clear, this is a troll post :)

I mean, standardizing on an x86 subset would replace Wasm's native portability, which is one of its strengths, with a kind of 'emulated' compatibility. If we did that, non-x86 hardware (mobile etc.) would pay the translation tax. So keeping Wasm architecture-agnostic makes more sense anyway.

Real-world WAT (WASM text format) looks more like this (i.e. it looks like a 'structured assembly' type of thing):

    i32.const 27512
    i32.load
    local.tee $var1
    if
      i32.const 27404
      i32.load
      local.get $var1
      call_indirect (param i32)
    end

S-expressions are only used outside such instruction blocks for the 'program-structure' (e.g. see: https://developer.mozilla.org/en-US/docs/WebAssembly/Referen...). IIRC early pre-release-versions of WASM were entirely built from S-expressions and as a 'pure stack machine' (I may remember wrong though).

To see what a complete WASM blob looks like in WAT format, go here: https://floooh.github.io/sokol-html5/clear-sapp.html, open the browser devtools, go to the 'Sources' tab and click the `clear-sapp.wasm` file.

Yes, this is all part of Wasm GC. WebAssembly needs to know the structures of heap objects so that a GC can trace them and also to preserve type safety when accessing them. Treating the heap objects as uninterpreted bags of bytes wouldn't have worked because so many of their fields are references, which must remain opaque in Wasm.
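To illustrate why the engine needs field types, here is a toy mark phase in Python. `StructType` and `HeapObject` are hypothetical stand-ins, not engine internals. The point is that the collector consults each object's type to decide which fields are references to follow and which are plain numbers it must never interpret as pointers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructType:
    # Hypothetical type descriptor: one entry per field, either "ref"
    # (a GC reference) or a numeric type such as "i32".
    fields: tuple


@dataclass(eq=False)
class HeapObject:
    type: StructType
    values: list
    marked: bool = False


def mark(roots):
    """Mark every object reachable from roots (the mark phase of a
    mark-and-sweep collector). Only fields the type declares as "ref"
    are followed; numeric fields are never treated as pointers."""
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj is None or obj.marked:  # null reference or already visited
            continue
        obj.marked = True
        for kind, value in zip(obj.type.fields, obj.values):
            if kind == "ref":
                worklist.append(value)


pair = StructType(("i32", "ref"))
leaf = HeapObject(pair, [7, None])
root = HeapObject(pair, [1, leaf])
garbage = HeapObject(pair, [2, None])
mark([root])
print(root.marked, leaf.marked, garbage.marked)
```

This prints `True True False`: only objects reachable through declared reference fields survive, which a collector could not determine from an uninterpreted bag of bytes.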

Re the 'structured assembly' example: this is partially true, but the standard text format also allows the instructions to be nested as S-expressions, for example:

  (i32.add
    (i32.const 0)
    (i32.const 1))

Many projects, including the official spec test suite and the Binaryen test suite, primarily use this format.
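The folded form is sugar: a folded instruction `(op operand*)` means its operand instructions, unfolded in order, followed by `op` itself. Here is a minimal Python sketch of that rewrite, assuming a hypothetical tuple encoding (opcode, then immediates, then nested operand tuples) and covering only plain instructions. `block`, `loop`, `if` and `try_table` have their own folded forms and are not handled.

```python
def unfold(expr):
    """Flatten a folded plain instruction into linear stack-machine order.

    expr: a tuple (opcode, *items) where non-tuple items are immediates and
    tuple items are nested operand instructions (hypothetical encoding).
    """
    op, *rest = expr
    out = []
    immediates = []
    for item in rest:
        if isinstance(item, tuple):
            out.extend(unfold(item))  # operands are evaluated first
        else:
            immediates.append(str(item))
    out.append(" ".join([op, *immediates]))  # then the instruction itself
    return out


folded = ("i32.add", ("i32.const", 0), ("i32.const", 1))
print("\n".join(unfold(folded)))
```

For the `i32.add` example this yields `i32.const 0`, `i32.const 1`, `i32.add`, the flat form shown in the parent comment's style.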

> IIRC early pre-release-versions of WASM were entirely built from S-expressions and as a 'pure stack machine' (I may remember wrong though).

Yes, the S-expressions predate WebAssembly even being a stack machine. Originally the design was that it encoded an AST, so the folded S-expression format was the only option.

There was a lot of discussion back in the day (before my time) about creating a better text format, but no one could agree on what it should be, so they just defaulted to the S-expression idea and focused on getting WebAssembly out the door.

Hacker Times
