Nikolas Charalambidis - 2 months ago
Java Question

Underscore as the name of variable

Considering the following code:

String _‎ = "One ";
String _‏ = "two ";
String _‎‏ = "three ";
System.out.println( _ ‎+ _ ‏+ _ ‎‏);


It surprisingly prints

One two three


I wonder how the underscore _ can be three different variables at once in the 4th line. I am aware of which characters are legal in variable names. The Oracle documentation says it clearly:


An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.

A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true.

The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024). The $ character should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems.


This means a variable named _ is allowed, and nothing is said against a standalone character. Naming all three variables with a letter (e.g. a) obviously gives me an error. What am I missing here?

Answer

Java source code supports Unicode characters for variable names (Java ain't Fortran 77 you know).

You actually have different underscore characters in your code. Verify that by using your favourite hexadecimal editor.
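
If no hex editor is at hand, a few lines of Java can do the same check. This is only a sketch: the names CodePointDump and dump are made up for illustration, and the identifiers are fed in as string literals with escapes, on the assumption that they match what is really in the source file.

```java
class CodePointDump {
    // Returns every code point of the input as "U+XXXX", separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed contents of the three names; paste the real ones to verify.
        System.out.println(dump("_\u200E"));        // prints U+005F U+200E
        System.out.println(dump("_\u200F"));        // prints U+005F U+200F
        System.out.println(dump("_\u200E\u200F"));  // prints U+005F U+200E U+200F
    }
}
```

Anything after U+005F in the output is an extra character hiding behind the underscore.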

There's nothing special going on here: just a use of a fancy obfuscation trick.


To clarify, the code appears to actually be:

String _\u200E = "One ";
String _\u200F = "two ";
String _\u200E\u200F = "three ";

And the println() doesn't compile as posted, because each mark is separated from its underscore by a space. It is actually:

System.out.println( _ \u200E+ _ \u200F+ _ \u200E\u200F);

And those special characters are:
200E: LEFT-TO-RIGHT MARK
200F: RIGHT-TO-LEFT MARK
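
To see how these two marks fit the identifier rules quoted in the question, the standard Character methods can be queried directly. A minimal sketch follows; the class name BidiMarkCheck and the helper describe are hypothetical. It only reports what the library says about each code point. Whether a particular compiler keeps or drops identifier-ignorable characters when comparing names is a separate matter that this does not settle.

```java
class BidiMarkCheck {
    // Summarises how java.lang.Character classifies a code point
    // with respect to Java identifiers.
    static String describe(int cp) {
        return String.format("U+%04X %s start=%b part=%b ignorable=%b",
                cp,
                Character.getName(cp),
                Character.isJavaIdentifierStart(cp),
                Character.isJavaIdentifierPart(cp),
                Character.isIdentifierIgnorable(cp));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x200E)); // LEFT-TO-RIGHT MARK
        System.out.println(describe(0x200F)); // RIGHT-TO-LEFT MARK
        System.out.println(describe('_'));    // the plain ASCII underscore
    }
}
```

Both marks should come back as start=false, part=true, ignorable=true: they cannot begin an identifier, but they are accepted after the leading underscore, which is what the trick relies on.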