Compiler · Jun 22, 2020

Introducing string literal types in BuckleScript version 8.2

Highlights of our newest changes to the internal representation and how they will benefit our users.
Hongbo Zhang
Compiler Team

String literal types in BuckleScript

String literal types were introduced by TypeScript to model JavaScript behavior. They are a relatively new concept, since most type systems are agnostic about how values are encoded at runtime. To smooth the experience of writing bindings to existing JS APIs, we are introducing string literal types to BuckleScript. Ours differ from TypeScript's in several ways: they support type inference and pattern matching, and they can have data attached.

Vanilla string literal types

In Reason, a string literal type is written with a leading backtick, like `hello. It compiles to the JS string "hello"; the difference is that `hello has its own type, so you cannot mix it with arbitrary strings.

Take the following snippet as an example:

RE
let encoding = (enc) =>
  switch (enc) {
  | `utf8 => 0
  | `ascii => 1
  | `utf16 => 2
  };

It compiles to:

JS
function encoding(en) {
  if (en === "ascii") {
    return 1;
  } else if (en === "utf16") {
    return 2;
  } else {
    return 0;
  }
}

If you pass an unsupported encoding, e.g. encoding(`ucs32), you get a type error:

This expression has type [> `ucs32 ]
but an expression was expected of type [< `ascii | `utf16 | `utf8 ]
The second variant type does not allow tag(s) `ucs32

Notice also that, because the compiler guarantees the input can only be `utf8, `ascii or `utf16, the generated JS skips the comparison with "utf8": once the other two have been ruled out, the input must be "utf8".
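To see why the type error matters, here is the generated function from above called directly from plain JavaScript, where no type checker is involved. This is only an illustration of the compiled output: since the "utf8" comparison has been dropped, any string that is neither "ascii" nor "utf16" falls through to the final branch.

```javascript
// The compiled output shown above, reformatted.
// It relies on the type checker: only "utf8", "ascii" and "utf16" are expected.
function encoding(en) {
  if (en === "ascii") {
    return 1;
  } else if (en === "utf16") {
    return 2;
  } else {
    // "utf8" is never compared explicitly; it is the only remaining case.
    return 0;
  }
}

// Well-typed inputs behave as intended.
console.log(encoding("utf8"), encoding("ascii"), encoding("utf16")); // 0 1 2

// An untyped JS caller can still pass something else, and it silently maps to 0.
console.log(encoding("ucs32")); // 0
```

This is worth keeping in mind when such a function is called from hand-written JS: the guarantee lives in the Reason type checker, not in the emitted code.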

If we add a wildcard case to match any other encoding:

RE
let encoding = (enc) =>
  switch (enc) {
  | `utf8 => 0
  | `ascii => 1
  | `utf16 => 2
  | _ => 3
  };

This generates the following JS:

JS
function encoding(en) {
  if (en === "utf8") {
    return 0;
  } else if (en === "ascii") {
    return 1;
  } else if (en === "utf16") {
    return 2;
  } else {
    return 3;
  }
}
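With the wildcard case the function accepts any tag, so the compiler can no longer assume the input is one of the three known strings and has to keep the explicit "utf8" comparison. Calling the generated code (reproduced below) from JavaScript makes the difference visible: unknown strings now reach the dedicated fallback branch instead of being treated as "utf8".

```javascript
// Compiled output for the version with a wildcard case, reformatted.
function encoding(en) {
  if (en === "utf8") {
    return 0;
  } else if (en === "ascii") {
    return 1;
  } else if (en === "utf16") {
    return 2;
  } else {
    // Reached by any other string, e.g. "ucs32".
    return 3;
  }
}

console.log(encoding("utf8"), encoding("ascii"), encoding("utf16")); // 0 1 2
console.log(encoding("ucs32")); // 3
```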

Declaring types for string literal types

Note that all string literal types can be inferred, which makes for a great experience while you are developing. Once your code stabilizes, it is nice to give string literal types names:

RE
type utf = [ | `utf8 | `utf16 ];
type ascii = [ | `ascii ];

You can then create union types simply by combining them:

RE
type encoding = [ | utf | ascii ]
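One way to picture these declarations, as a mental model only and not a description of how the compiler works: at runtime each named type stands for a set of strings, and a union type such as encoding admits the union of those sets. The JavaScript sketch below spells this out; the Set values and the isEncoding helper are hypothetical and exist only for illustration.

```javascript
// Hypothetical model: the strings each declared type can hold at runtime.
const utf = new Set(["utf8", "utf16"]); // members of utf
const ascii = new Set(["ascii"]);       // members of ascii

// [ | utf | ascii ] admits every string from both component types.
const encoding = new Set([...utf, ...ascii]);

// Hypothetical runtime check, e.g. for validating input coming from plain JS.
function isEncoding(s) {
  return encoding.has(s);
}

console.log([...encoding]); // [ 'utf8', 'utf16', 'ascii' ]
console.log(isEncoding("utf16"), isEncoding("ucs32")); // true false
```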

The compiler also supports syntactic sugar for matching on named string literal types:

RE
let classify = (enc) =>
  switch (enc) {
  | #utf => "utf" // string literals belonging to utf
  | #ascii => "asci" // string literals belonging to ascii
  };

The compiler generates well-optimized code:

JS
function classify(enc) {
  if (enc === "ascii") {
    return "asci";
  } else {
    return "utf";
  }
}
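Conceptually, a pattern such as #utf stands for matching every tag in utf (here `utf8 and `utf16). The JavaScript below writes out that naive expansion by hand and checks it against the optimized output above; classifyNaive is a made-up name for this comparison. Because a well-typed input can only be one of the three encoding strings, a single test for "ascii" is enough to separate the two branches.

```javascript
// Naive expansion: #utf covers "utf8" and "utf16", #ascii covers "ascii".
function classifyNaive(enc) {
  if (enc === "utf8" || enc === "utf16") {
    return "utf";
  } else if (enc === "ascii") {
    return "asci";
  }
  // Unreachable for well-typed input; the type checker rules it out.
  throw new Error("unexpected encoding: " + enc);
}

// The optimized output shown above: one comparison suffices.
function classify(enc) {
  if (enc === "ascii") {
    return "asci";
  } else {
    return "utf";
  }
}

// Both agree on every value the encoding type admits.
for (const enc of ["utf8", "utf16", "ascii"]) {
  console.log(enc, classifyNaive(enc), classify(enc));
}
```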

String literal types attached to data

Since Reason is a statically typed language, you cannot mix data of different types in one collection. For example, writing [3, "3"] gives you a type error.

The underlying reason is that if the compiler allowed this, there would be no single type to give the collection once values of different types were boxed together, which would make it hard to process later.

With string literal types, you can do things like this:

RE
[ 3 -> `Int , "3" -> `String ]

The generated code for 3 -> `Int and "3" -> `String is:

JS
{ NAME: "Int", VAL: 3 }
{ NAME: "String", VAL: "3" }
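Each value becomes an ordinary object whose NAME field holds the tag and whose VAL field holds the payload, so it is easy to inspect or build from hand-written JavaScript. The sketch below is only an illustration: mkTagged is a hypothetical helper, not part of the BuckleScript runtime, and a JS array stands in for the Reason list. Being plain objects, the values also survive a JSON round trip unchanged.

```javascript
// Hypothetical helper producing the same shape as the compiled output above.
function mkTagged(name, val) {
  return { NAME: name, VAL: val };
}

// A JS array stands in for the Reason list [3 -> `Int, "3" -> `String].
const xs = [mkTagged("Int", 3), mkTagged("String", "3")];

// The tag is a plain string and the payload keeps its own JS type.
console.log(xs.map((x) => x.NAME + ":" + typeof x.VAL)); // [ 'Int:number', 'String:string' ]

// Plain objects, so they serialize and parse back without loss.
const roundTripped = JSON.parse(JSON.stringify(xs));
console.log(JSON.stringify(roundTripped)); // [{"NAME":"Int","VAL":3},{"NAME":"String","VAL":"3"}]
```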

You can then process such collections with pattern matching:

RE
let handle = (xs) =>
  Belt.List.map(
    xs,
    fun
    | `Int(n) => n
    | `String(s) => String.length(s),
  );

The generated code is:

JS
function handle(xs) {
  return Belt_List.map(xs, function (param) {
    if (param.NAME === "Int") {
      return param.VAL;
    } else {
      return param.VAL.length;
    }
  });
}
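The dispatch in the generated handle is just a check on the NAME field. To try that logic in isolation, the sketch below applies the same callback to a plain JavaScript array. Using Array.prototype.map instead of Belt_List.map is an assumption made to keep the example free of the BuckleScript runtime, and handleItem and handleArray are hypothetical names.

```javascript
// Same per-element logic as the generated handle above.
function handleItem(param) {
  if (param.NAME === "Int") {
    // `Int(n) => n
    return param.VAL;
  } else {
    // `String(s) => String.length(s)
    return param.VAL.length;
  }
}

// Assumption: a JS array stands in for the Reason list that Belt_List.map expects.
function handleArray(xs) {
  return xs.map(handleItem);
}

console.log(handleArray([{ NAME: "Int", VAL: 3 }, { NAME: "String", VAL: "3" }])); // [ 3, 1 ]
```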

In short, string literal types give you a convenient way to mix data of different types in one collection and process it later via pattern matching.

Declaring types for string literal types attached to data

Type inference is convenient during development, but you can also write down explicit types for string literal types with attached data:

RE
type number_or_string = [ | `Int(int) | `String(string) ];
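Inside Reason, the type checker enforces this declaration. When values of this type arrive from untyped JavaScript, though, a binding author may want to validate them first. The guard below is a hypothetical sketch (isNumberOrString is not a BuckleScript API) that checks a value against the runtime shape shown earlier: a NAME of "Int" with an integer VAL, or a NAME of "String" with a string VAL.

```javascript
// Hypothetical runtime guard for values of type number_or_string
// coming from untyped JS code.
function isNumberOrString(x) {
  if (x === null || typeof x !== "object") {
    return false;
  }
  switch (x.NAME) {
    case "Int":
      // Reason's int is an integer; this does not check the 32-bit range.
      return Number.isInteger(x.VAL);
    case "String":
      return typeof x.VAL === "string";
    default:
      return false;
  }
}

console.log(isNumberOrString({ NAME: "Int", VAL: 3 }));      // true
console.log(isNumberOrString({ NAME: "String", VAL: "3" })); // true
console.log(isNumberOrString({ NAME: "Int", VAL: "3" }));    // false
console.log(isNumberOrString("3"));                          // false
```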

Further reading

Here we only cover the basic usage of string literal types; refer here for more advanced features. The type theory is almost the same as that of OCaml's polymorphic variants; we adapt it so that values compile to string literals matching the JS runtime.
