mailmindlin mailmindlin - 3 months ago 9
JavaScript Question

Spaces required between keyword and literal

Looking at the output of UglifyJS2, I noticed that no spaces are required between literals and the in operator (e.g., 'foo'in{foo:'bar'} is valid).

Playing around with Chrome's DevTools, however, I noticed that hex and binary number literals require a space before the in keyword:

[Image: Chrome DevTools console output]

Internet Explorer returned true for all three tests, while Firefox 48.0.1 threw a SyntaxError for the first one (1in foo); however, it is okay with string literals ('1'in foo==true).

It seems there should be no problem parsing JavaScript in which keywords sit directly next to numeric literals, but I can't find an explicit rule in any edition of the ECMAScript specification.

Further testing shows that statements like for(var i of[1,2,3])... are allowed in both Chrome and Firefox (IE11 doesn't support for..of loops), and typeof"string" works in all three.
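The checks above can be reproduced in any engine with a small parse-only helper. This is a minimal sketch: the name parses and the sample list are made up for illustration, and the three numeric samples are an assumption about what the lost screenshot showed. It relies on new Function, which compiles a function body without running it, so a page with a strict Content Security Policy may block it. No expected results are given for the numeric cases because they differ between engines.

```javascript
// Hypothetical helper: reports whether the current engine accepts `source`
// as a function body. Nothing in `source` is executed.
function parses(source) {
  try {
    new Function(source); // compile only
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else (e.g. a CSP violation) is not a parse result
  }
}

// Assumed test inputs; results for the numeric literals vary by engine.
const samples = [
  "return 'foo'in{foo:'bar'}",
  "return 1in foo",
  "return 0x1in foo",
  "return 0b1in foo",
  'return typeof"string"',
  "for(var i of[1,2,3]);",
];

for (const source of samples) {
  console.log(JSON.stringify(source), parses(source));
}
```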

Which behavior is correct? Is it, in fact, defined somewhere that I missed, or are all these effects a result of idiosyncrasies of each browser's parser?

EML EML
Answer

Not an expert - I haven't done a JS compiler, but have done others.

ecma-262.pdf is a bit vague, but it's clear that an expression such as 1 in foo should be parsed as three input elements, all of which are tokens. Each token is a CommonToken (11.5); in this case we get a NumericLiteral, an IdentifierName (yes, in is an IdentifierName), and another IdentifierName. Exactly the same is true when parsing 0b1 in foo (see 11.8.3).

So, what happens when you take out the whitespace? It's not covered explicitly (as far as I can see), but it's common practice in other languages for a lexer to scan the longest character sequence that matches something it could be looking for. The introduction to section 11 says pretty much exactly that:

The source text is scanned from left to right, repeatedly taking the longest possible sequence of code points as the next input element.

So, for 0b1in foo, the lexer goes through 0b1, which matches a numeric literal, and reaches i, giving 0b1i, which doesn't match anything. So it passes the longest match (0b1) to the rest of the parser as a token and starts again at i. It finds n, followed by whitespace, so it passes in as the second token, and so on.
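As a rough illustration of that scan, here is a minimal longest-match tokenizer. It is a sketch under stated assumptions, not a conforming ECMAScript lexer: RULES and tokenize are hypothetical names, and the patterns cover only whitespace, a few numeric forms, ASCII identifier names, simple strings, and single-character punctuators. It models only the longest-match rule quoted above and ignores any further restriction the specification places on the character that follows a numeric literal (see 11.8.3), so it shows how the argument works rather than settling which browser is right.

```javascript
// Simplified token patterns (assumptions for illustration only).
const RULES = [
  ["WhiteSpace", /^\s+/],
  ["NumericLiteral", /^(?:0[xX][0-9a-fA-F]+|0[bB][01]+|0[oO][0-7]+|\d+(?:\.\d*)?)/],
  ["IdentifierName", /^[A-Za-z_$][A-Za-z0-9_$]*/],
  ["StringLiteral", /^(?:'[^'\n]*'|"[^"\n]*")/],
  ["Punctuator", /^[{}()\[\]:;,.=<>+\-*\/!]/],
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    const rest = source.slice(pos);
    let best = null;
    // Try every rule and keep the longest match at this position.
    for (const [type, pattern] of RULES) {
      const match = pattern.exec(rest);
      if (match && (best === null || match[0].length > best.text.length)) {
        best = { type, text: match[0] };
      }
    }
    if (best === null) {
      throw new SyntaxError("Unexpected character at offset " + pos);
    }
    // Whitespace separates tokens but is not itself passed on.
    if (best.type !== "WhiteSpace") tokens.push(best);
    pos += best.text.length;
  }
  return tokens;
}

console.log(tokenize("0b1in foo").map((t) => t.text));
// [ '0b1', 'in', 'foo' ]
```

With these rules, 0b1in foo and 0b1 in foo produce the same three tokens, which is the behaviour the paragraph above describes.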

So, basically, and rather bizarrely, going by the longest-match rule it looks like IE is correct.
