I recently noticed that, String.replaceAll(regex,regex) behaves very weirdly when it comes to the escape-character "\"(slash)
For example consider there is a string with filepath -
String text = "E:\\dummypath"
@Peter Lawrey's answer describes the mechanics. Basically, the "problem" is that backslash is an escape character in both Java string literals, and in the mini-language of regexes. So when you use a string literal to represent a regex, there are two sets of escaping to consider ... depending on what you want the regex to mean.
But why is it like that?
Basically, it is a historical thing. Java originally didn't have regexes at all. The syntax rules for Java String literals were borrowed from C / C++, which also didn't have built-in regex support. Awkwardness of double escaping didn't become apparent in Java until they added regex support in the form of the
Pattern class ... in Java 1.4.
So how do other languages manage to avoid this?
text.replaceAll("\\\n","/")all give the same output?
The first case is a newline character at the String level. The Java regex language treats all non-special characters as matching themselves.
The second case is a backslash followed by an "n" at the String level. The Java regex language interprets a backslash followed by an "n" as a newline.
The final case is a backslash followed by a newline character at the String level. The Java regex language doesn't recognize this as a specific (regex) escape sequence. However a backslash followed by any non-alphabetic character means the latter character. So, a backslash followed by a newline character ... means the same thing as a newline.