Clashsoft - 7 months ago
Swift Question

How does Swift disambiguate Type Arguments in Expression Contexts?

Take a look at the following two expressions:

baz(Foo<Bar, Bar>(0))
baz(Foo < Bar, Bar > (0))


Without knowing what baz, Foo and Bar are (baz can be a type or a method, Foo and Bar can be types or variables), there is no way of disambiguating whether the < represents a type argument list or a less-than operator.

// two different outcomes, difference shown with parentheses
baz((Foo<Bar,Bar>(0))) // generics
baz((Foo < Bar), (Bar > 0)) // less-than
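
To make the two readings concrete, here is a small sketch with declarations under which each one type-checks. The names GenericReading, OperatorReading and run, as well as the returned strings, are made up for illustration; the second reading keeps the explicit parentheses from above so that it does not depend on how the parser resolves the ambiguity.

```swift
// Reading 1: Foo is a generic type, Bar is a type, baz takes one argument.
enum GenericReading {
    struct Bar {}
    struct Foo<A, B> {
        let value: Int
        init(_ value: Int) { self.value = value }
    }
    static func baz(_ foo: Foo<Bar, Bar>) -> String {
        return "one generic value: \(foo.value)"
    }
    static func run() -> String {
        return baz(Foo<Bar, Bar>(0))
    }
}

// Reading 2: Foo and Bar are integers, baz takes two Bool arguments.
enum OperatorReading {
    static let Foo = 1
    static let Bar = 2
    static func baz(_ a: Bool, _ b: Bool) -> String {
        return "two comparisons: \(a), \(b)"
    }
    static func run() -> String {
        return baz((Foo < Bar), (Bar > 0))
    }
}

print(GenericReading.run())   // one generic value: 0
print(OperatorReading.run())  // two comparisons: true, true
```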


Any sane programming language should not rely on what baz, Foo and Bar are when parsing an expression like this. Yet, Swift manages to disambiguate the expressions below no matter where I place whitespace:

println(Dictionary<String, String>(0))
println(Dictionary < String, String > (0))


How does the compiler manage this? And, more importantly, is there any place in the Swift language specification where the rules for this are described? Looking through the Language Reference part of the Swift book, I only found this section:


In certain constructs, operators with a leading < or > may be split into two or more tokens. The remainder is treated the same way and may be split again. As a result, there is no need to use whitespace to disambiguate between the closing > characters in constructs like Dictionary<String, Array<Int>>. In this example, the closing > characters are not treated as a single token that may then be misinterpreted as a bit shift >> operator.
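
The splitting that the quoted passage describes could look roughly like the sketch below. Both functions are hypothetical helpers written for illustration, not the compiler's actual code: one peels a single leading angle bracket off an operator token, the other keeps splitting while generic argument lists are still open.

```swift
// Hypothetical helper: peels one leading '<' or '>' off an operator token
// and returns that bracket together with the remainder of the token.
func splitLeadingAngle(_ op: String) -> (bracket: Character, remainder: String)? {
    guard let first = op.first, first == "<" || first == ">" else { return nil }
    return (first, String(op.dropFirst()))
}

// Closes up to `openLists` generic argument lists using the leading '>'
// characters of one operator token, splitting the remainder again each time.
func closeGenericLists(operatorToken: String, openLists: Int) -> (closed: Int, leftover: String) {
    var remaining = operatorToken
    var closed = 0
    while closed < openLists,
          let split = splitLeadingAngle(remaining),
          split.bracket == ">" {
        closed += 1
        remaining = split.remainder
    }
    return (closed, remaining)
}

// Dictionary<String, Array<Int>> has two open lists when ">>" is reached.
let result = closeGenericLists(operatorToken: ">>", openLists: 2)
print(result.closed, result.leftover.isEmpty)  // 2 true
```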


What does "certain constructs" refer to in this context? The actual grammar only contains one production rule that mentions type arguments:


explicit-member-expression → postfix-expression . identifier generic-argument-clause(opt)


Any explanation or resource would be greatly appreciated.

Answer

Thanks to @Martin R, I found the relevant part of the compiler source code, which contains a comment that explains how it resolves the ambiguity.

swift/ParseExpr.cpp, line 1533:

///   The generic-args case is ambiguous with an expression involving '<'
///   and '>' operators. The operator expression is favored unless a generic
///   argument list can be successfully parsed, and the closing bracket is
///   followed by one of these tokens:
///     lparen_following rparen lsquare_following rsquare lbrace rbrace
///     period_following comma semicolon

Basically, the compiler attempts to parse a list of types and then checks the token after the closing angle bracket. If that token is

  • a closing parenthesis, bracket or brace,
  • an opening parenthesis, an opening bracket or a period with no whitespace between it and the closing angle bracket (>(, >[, but not > (, > [),
  • an opening brace or
  • a comma or semicolon,

it parses the angle brackets as a generic argument list; otherwise it parses the expression as one or more relational expressions.
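
The rule above can be sketched as a toy checker. This is only an illustration under simplifying assumptions, not the compiler's implementation: Token, tokenize, startsGenericArguments and readsAsGeneric are hypothetical names, the tokenizer only knows words and single-character punctuation, and the type list must be flat (no nested generic arguments).

```swift
// Toy token: its text plus whether whitespace came directly before it.
struct Token {
    let text: String
    let afterWhitespace: Bool
}

// Splits the source into words (letters/digits) and single punctuation characters.
func tokenize(_ source: String) -> [Token] {
    var tokens: [Token] = []
    var pendingSpace = false
    for ch in source {
        if ch.isWhitespace {
            pendingSpace = true
        } else if (ch.isLetter || ch.isNumber), !pendingSpace, let last = tokens.last,
                  last.text.allSatisfy({ $0.isLetter || $0.isNumber }) {
            // Extend the word that is currently being read.
            tokens[tokens.count - 1] = Token(text: last.text + String(ch),
                                             afterWhitespace: last.afterWhitespace)
        } else {
            tokens.append(Token(text: String(ch), afterWhitespace: pendingSpace))
            pendingSpace = false
        }
    }
    return tokens
}

// A type name here is simply a word that starts with a letter.
func isTypeName(_ token: Token) -> Bool {
    return token.text.first?.isLetter == true
        && token.text.allSatisfy { $0.isLetter || $0.isNumber }
}

// Decides whether the '<' at lessIndex opens a generic argument list.
func startsGenericArguments(_ tokens: [Token], lessIndex: Int) -> Bool {
    guard tokens.indices.contains(lessIndex), tokens[lessIndex].text == "<" else { return false }
    var i = lessIndex + 1
    // Step 1: try to parse a flat, comma-separated list of type names up to '>'.
    while true {
        guard i < tokens.count, isTypeName(tokens[i]) else { return false }
        i += 1
        guard i < tokens.count else { return false }
        if tokens[i].text == "," { i += 1; continue }
        if tokens[i].text == ">" { i += 1; break }
        return false
    }
    // Step 2: check the token that follows the closing '>'.
    guard i < tokens.count else { return false }
    let next = tokens[i]
    switch next.text {
    case ")", "]", "}", "{", ",", ";":
        return true
    case "(", "[", ".":
        return !next.afterWhitespace  // must directly follow '>'
    default:
        return false                  // favor the operator expression
    }
}

// Convenience wrapper: applies the check to the first '<' in the source.
func readsAsGeneric(_ source: String) -> Bool {
    let tokens = tokenize(source)
    guard let lessIndex = tokens.firstIndex(where: { $0.text == "<" }) else { return false }
    return startsGenericArguments(tokens, lessIndex: lessIndex)
}

print(readsAsGeneric("baz(Foo<Bar, Bar>(0))"))      // true
print(readsAsGeneric("baz(Foo < Bar, Bar > (0))"))  // false
```

Under this sketch, the whitespace before the opening parenthesis is the only thing that separates the two expressions from the question.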

As described in the book Annotated C#, the problem is solved in a similar way in C#.