Luau is the primary programming language to place the facility of semantic subtyping within the palms of thousands and thousands of creators.
Minimizing false positives
One of many points with kind error reporting in instruments just like the Script Evaluation widget in Roblox Studio is false positives. These are warnings which might be artifacts of the evaluation, and don’t correspond to errors which might happen at runtime. For instance, this system
native x = CFrame.new() native y if (math.random()) then y = CFrame.new() else y = Vector3.new() finish native z = x * y
reviews a sort error which can’t occur at runtime, since CFrame
helps multiplication by each Vector3
and CFrame
. (Its kind is ((CFrame, CFrame) -> CFrame) & ((CFrame, Vector3) -> Vector3)
.)
False positives are particularly poor for onboarding new customers. If a type-curious creator switches on typechecking and is instantly confronted with a wall of spurious pink squiggles, there’s a robust incentive to right away change it off once more.
Inaccuracies in kind errors are inevitable, since it’s not possible to determine forward of time whether or not a runtime error shall be triggered. Kind system designers have to decide on whether or not to reside with false positives or false negatives. In Luau that is decided by the mode: strict
mode errs on the facet of false positives, and nonstrict
mode errs on the facet of false negatives.
Whereas inaccuracies are inevitable, we attempt to take away them each time attainable, since they end in spurious errors, and imprecision in type-driven tooling like autocomplete or API documentation.
Subtyping as a supply of false positives
One of many sources of false positives in Luau (and plenty of different comparable languages like TypeScript or Move) is subtyping. Subtyping is used each time a variable is initialized or assigned to, and each time a operate is known as: the kind system checks that the kind of the expression is a subtype of the kind of the variable. For instance, if we add varieties to the above program
native x : CFrame = CFrame.new() native y : Vector3 | CFrame if (math.random()) then y = CFrame.new() else y = Vector3.new() finish native z : Vector3 | CFrame = x * y
then the kind system checks that the kind of CFrame
multiplication is a subtype of (CFrame, Vector3 | CFrame) -> (Vector3 | CFrame)
.
Subtyping is a really helpful function, and it helps wealthy kind constructs like kind union (T | U
) and intersection (T & U
). For instance, quantity?
is applied as a union kind (quantity | nil)
, inhabited by values which might be both numbers or nil
.
Sadly, the interplay of subtyping with intersection and union varieties can have odd outcomes. A easy (however moderately synthetic) case in older Luau was:
native x : (quantity?) & (string?) = nil native y : nil = nil y = x -- Kind '(quantity?) & (string?)' couldn't be transformed into 'nil' x = y
This error is attributable to a failure of subtyping, the outdated subtyping algorithm reviews that (quantity?) & (string?)
is just not a subtype of nil
. It is a false constructive, since quantity & string
is uninhabited, so the one attainable inhabitant of (quantity?) & (string?)
is nil
.
That is a synthetic instance, however there are actual points raised by creators attributable to the issues, for instance https://devforum.roblox.com/t/luau-recap-july-2021/1382101/5. Presently, these points principally have an effect on creators making use of subtle kind system options, however as we make kind inference extra correct, union and intersection varieties will develop into extra widespread, even in code with no kind annotations.
This class of false positives now not happens in Luau, as we’ve moved from our outdated method of syntactic subtyping to another known as semantic subtyping.
Syntactic subtyping
AKA “what we did earlier than.”
Syntactic subtyping is a syntax-directed recursive algorithm. The attention-grabbing circumstances to cope with intersection and union varieties are:
- Reflexivity:
T
is a subtype ofT
- Intersection L:
(T₁ & … & Tⱼ)
is a subtype ofU
each time among theTᵢ
are subtypes ofU
- Union L:
(T₁ | … | Tⱼ)
is a subtype ofU
each time the entireTᵢ
are subtypes ofU
- Intersection R:
T
is a subtype of(U₁ & … & Uⱼ)
each timeT
is a subtype of the entireUᵢ
- Union R:
T
is a subtype of(U₁ | … | Uⱼ)
each timeT
is a subtype of among theUᵢ
.
For instance:
- By Reflexivity:
nil
is a subtype ofnil
- so by Union R:
nil
is a subtype ofquantity?
- and:
nil
is a subtype ofstring?
- so by Intersection R:
nil
is a subtype of(quantity?) & (string?)
.
Yay! Sadly, utilizing these guidelines:
quantity
isn’t a subtype ofnil
- so by Union L:
(quantity?)
isn’t a subtype ofnil
- and:
string
isn’t a subtype ofnil
- so by Union L:
(string?)
isn’t a subtype ofnil
- so by Intersection L:
(quantity?) & (string?)
isn’t a subtype ofnil
.
That is typical of syntactic subtyping: when it returns a “sure” consequence, it’s right, however when it returns a “no” consequence, it is perhaps flawed. The algorithm is a conservative approximation, and since a “no” consequence can result in kind errors, this can be a supply of false positives.
Semantic subtyping
AKA “what we do now.”
Fairly than pondering of subtyping as being syntax-directed, we first take into account its semantics, and later return to how the semantics is applied. For this, we undertake semantic subtyping:
- The semantics of a sort is a set of values.
- Intersection varieties are considered intersections of units.
- Union varieties are considered unions of units.
- Subtyping is considered set inclusion.
For instance:
Kind | Semantics |
---|---|
quantity |
{ 1, 2, 3, … } |
string |
{ “foo”, “bar”, … } |
nil |
{ nil } |
quantity? |
{ nil, 1, 2, 3, … } |
string? |
{ nil, “foo”, “bar”, … } |
(quantity?) & (string?) |
{ nil, 1, 2, 3, … } ∩ { nil, “foo”, “bar”, … } = { nil } |
and since subtypes are interpreted as set inclusions:
Subtype | Supertype | As a result of |
---|---|---|
nil |
quantity? |
{ nil } ⊆ { nil, 1, 2, 3, … } |
nil |
string? |
{ nil } ⊆ { nil, “foo”, “bar”, … } |
nil |
(quantity?) & (string?) |
{ nil } ⊆ { nil } |
(quantity?) & (string?) |
nil |
{ nil } ⊆ { nil } |
So in accordance with semantic subtyping, (quantity?) & (string?)
is equal to nil
, however syntactic subtyping solely helps one path.
That is all effective and good, but when we need to use semantic subtyping in instruments, we’d like an algorithm, and it seems checking semantic subtyping is non-trivial.
Semantic subtyping is difficult
NP-hard to be exact.
We are able to cut back graph coloring to semantic subtyping by coding up a graph as a Luau kind such that checking subtyping on varieties has the identical consequence as checking for the impossibility of coloring the graph
For instance, coloring a three-node, two colour graph could be completed utilizing varieties:
kind Purple = "pink" kind Blue = "blue" kind Shade = Purple | Blue kind Coloring = (Shade) -> (Shade) -> (Shade) -> boolean kind Uncolorable = (Shade) -> (Shade) -> (Shade) -> false
Then a graph could be encoded as an overload operate kind with subtype Uncolorable
and supertype Coloring
, as an overloaded operate which returns false
when a constraint is violated. Every overload encodes one constraint. For instance a line has constraints saying that adjoining nodes can’t have the identical colour:
kind Line = Coloring & ((Purple) -> (Purple) -> (Shade) -> false) & ((Blue) -> (Blue) -> (Shade) -> false) & ((Shade) -> (Purple) -> (Purple) -> false) & ((Shade) -> (Blue) -> (Blue) -> false)
A triangle is analogous, however the finish factors additionally can’t have the identical colour:
kind Triangle = Line & ((Purple) -> (Shade) -> (Purple) -> false) & ((Blue) -> (Shade) -> (Blue) -> false)
Now, Triangle
is a subtype of Uncolorable
, however Line
is just not, because the line could be 2-colored. This may be generalized to any finite graph with any finite variety of colours, and so subtype checking is NP-hard.
We cope with this in two methods:
- we cache varieties to cut back reminiscence footprint, and
- hand over with a “Code Too Complicated” error if the cache of varieties will get too massive.
Hopefully this doesn’t come up in apply a lot. There’s good proof that points like this don’t come up in apply from expertise with kind methods like that of Normal ML, which is EXPTIME-complete, however in apply you need to exit of your technique to code up Turing Machine tapes as varieties.
Kind normalization
The algorithm used to determine semantic subtyping is kind normalization. Fairly than being directed by syntax, we first rewrite varieties to be normalized, then verify subtyping on normalized varieties.
A normalized kind is a union of:
- a normalized nil kind (both
by no means
ornil
) - a normalized quantity kind (both
by no means
orquantity
) - a normalized boolean kind (both
by no means
ortrue
orfalse
orboolean
) - a normalized operate kind (both
by no means
or an intersection of operate varieties) and many others
As soon as varieties are normalized, it’s easy to verify semantic subtyping.
Each kind could be normalized (sigh, with some technical restrictions round generic kind packs). The vital steps are:
- eradicating intersections of mismatched primitives, e.g.
quantity & bool
is changed byby no means
, and - eradicating unions of features, e.g.
((quantity?) -> quantity) | ((string?) -> string)
is changed by(nil) -> (quantity | string)
.
For instance, normalizing (quantity?) & (string?)
removes quantity & string
, so all that’s left is nil
.
Our first try at implementing kind normalization utilized it liberally, however this resulted in dreadful efficiency (complicated code went from typechecking in lower than a minute to working in a single day). The rationale for that is annoyingly easy: there may be an optimization in Luau’s subtyping algorithm to deal with reflexivity (T
is a subtype of T
) that performs an inexpensive pointer equality verify. Kind normalization can convert pointer-identical varieties into semantically-equivalent (however not pointer-identical) varieties, which considerably degrades efficiency.
Due to these efficiency points, we nonetheless use syntactic subtyping as our first verify for subtyping, and solely carry out kind normalization if the syntactic algorithm fails. That is sound, as a result of syntactic subtyping is a conservative approximation to semantic subtyping.
Pragmatic semantic subtyping
Off-the-shelf semantic subtyping is barely completely different from what’s applied in Luau, as a result of it requires fashions to be set-theoretic, which requires that inhabitants of operate varieties “act like features.” There are two explanation why we drop this requirement.
Firstly, we normalize operate varieties to an intersection of features, for instance a horrible mess of unions and intersections of features:
((quantity?) -> quantity?) | (((quantity) -> quantity) & ((string?) -> string?))
normalizes to an overloaded operate:
((quantity) -> quantity?) & ((nil) -> (quantity | string)?)
Set-theoretic semantic subtyping doesn’t assist this normalization, and as a substitute normalizes features to disjunctive regular type (unions of intersections of features). We don’t do that for ergonomic causes: overloaded features are idiomatic in Luau, however DNF is just not, and we don’t need to current customers with such non-idiomatic varieties.
Our normalization depends on rewriting away unions of operate varieties:
((A) -> B) | ((C) -> D) → (A & C) -> (B | D)
This normalization is sound in our mannequin, however not in set-theoretic fashions.
Secondly, in Luau, the kind of a operate software f(x)
is B
if f
has kind (A) -> B
and x
has kind A
. Unexpectedly, this isn’t at all times true in set-theoretic fashions, resulting from uninhabited varieties. In set-theoretic fashions, if x
has kind by no means
then f(x)
has kind by no means
. We don’t need to burden customers with the concept that operate software has a particular nook case, particularly since that nook case can solely come up in useless code.
In set-theoretic fashions, (by no means) -> A
is a subtype of (by no means) -> B
, it doesn’t matter what A
and B
are. This isn’t true in Luau.
For these two causes (that are largely about ergonomics moderately than something technical) we drop the set-theoretic requirement, and use pragmatic semantic subtyping.
Negation varieties
The opposite distinction between Luau’s kind system and off-the-shelf semantic subtyping is that Luau doesn’t assist all negated varieties.
The widespread case for wanting negated varieties is in typechecking conditionals:
-- initially x has kind T if (kind(x) == "string") then -- on this department x has kind T & string else -- on this department x has kind T & ~string finish
This makes use of a negated kind ~string
inhabited by values that aren’t strings.
In Luau, we solely permit this sort of typing refinement on check varieties like string
, operate
, Half
and so forth, and not on structural varieties like (A) -> B
, which avoids the widespread case of normal negated varieties.
Prototyping and verification
Through the design of Luau’s semantic subtyping algorithm, there have been modifications made (for instance initially we thought we had been going to have the ability to use set-theoretic subtyping). Throughout this time of speedy change, it was vital to have the ability to iterate rapidly, so we initially applied a prototype moderately than leaping straight to a manufacturing implementation.
Validating the prototype was vital, since subtyping algorithms can have sudden nook circumstances. Because of this, we adopted Agda because the prototyping language. In addition to supporting unit testing, Agda helps mechanized verification, so we’re assured within the design.
The prototype doesn’t implement all of Luau, simply the practical subset, however this was sufficient to find refined function interactions that may most likely have surfaced as difficult-to-fix bugs in manufacturing.
Prototyping is just not excellent, for instance the primary points that we hit in manufacturing had been about efficiency and the C++ commonplace library, that are by no means going to be caught by a prototype. However the manufacturing implementation was in any other case pretty easy (or not less than as easy as a 3kLOC change could be).
Subsequent steps
Semantic subtyping has eliminated one supply of false positives, however we nonetheless have others to trace down:
- Overloaded operate purposes and operators
- Property entry on expressions of complicated kind
- Learn-only properties of tables
- Variables that change kind over time (aka typestates)
The hunt to take away spurious pink squiggles continues!
Acknowledgments
Because of Giuseppe Castagna and Ben Greenman for useful feedback on drafts of this submit.
Alan coordinates the design and implementation of the Luau kind system, which helps drive lots of the options of improvement in Roblox Studio. Dr. Jeffrey has over 30 years of expertise with analysis in programming languages, has been an lively member of quite a few open-source software program tasks, and holds a DPhil from the College of Oxford, England.