Eric Citaire - 26 days ago
Java Question

Does Guava provide a method to unescape a String?

I need to escape special characters in a String.

Guava provides the Escaper class, which does exactly this:

Escaper escaper = Escapers.builder()
    .addEscape('[', "\\[")
    .addEscape(']', "\\]")
    .build();

String escapedStr = escaper.escape("This is a [test]");

System.out.println(escapedStr);
// -> prints "This is a \[test\]"


Now that I have an escaped String, I need to unescape it, and I can't find anything in Guava to do this.

I was expecting Escaper to have an unescape() method, but it doesn't.

Edit: I'm aware that unescaping can be tricky, even impossible in some nonsensical cases.

For example, this Escaper usage can lead to ambiguities:

Escaper escaper = Escapers.builder()
    .addEscape('@', " at ")
    .addEscape('.', " dot ")
    .build();


Unless the escaped data contains only email addresses and nothing more, you can't safely get your data back by unescaping it.
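To make the ambiguity concrete, here is a minimal sketch in plain Java. It does not use Guava: the hypothetical `AtDotAmbiguity` class mimics the mapping above with `String.replace`, and `naiveUnescape` is an assumed, deliberately simplistic inverse.

```java
final class AtDotAmbiguity {
    // Same mapping as the Escaper above, written with plain String.replace.
    static String escape(String s) {
        return s.replace("@", " at ").replace(".", " dot ");
    }

    // Naive inverse: every " at " and " dot " is turned back into a character,
    // whether or not it was produced by escape().
    static String naiveUnescape(String s) {
        return s.replace(" at ", "@").replace(" dot ", ".");
    }

    public static void main(String[] args) {
        // A bare email address survives the round trip.
        System.out.println(naiveUnescape(escape("john@example.com")));
        // prints "john@example.com"

        // Ordinary text that already contains " at " does not.
        System.out.println(naiveUnescape(escape("Meet me at noon")));
        // prints "Meet me@noon"
    }
}
```

The second call shows the problem: the escaped form cannot tell an original " at " from one that replaced an '@'.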

A good example of safe usage of the Escaper is HTML entities:

Escaper escaper = Escapers.builder()
    .addEscape('&', "&amp;")
    .addEscape('<', "&lt;")
    .addEscape('>', "&gt;")
    .build();


Here, you can safely escape any text, incorporate it into an HTML page, and unescape it at any time to display it, because you have covered every possible ambiguity.
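As a rough illustration of that round trip, here is a sketch in plain Java, without Guava. The `HtmlEntityRoundTrip` class and its methods are hypothetical names, and it only handles the three entities `&amp;`, `&lt;` and `&gt;`, not HTML entities in general. The one subtle point is ordering: `&` is escaped first and `&amp;` is decoded last.

```java
final class HtmlEntityRoundTrip {
    static String escape(String s) {
        // '&' goes first; otherwise the '&' introduced by "&lt;" and "&gt;"
        // would itself be escaped a second time.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    static String unescape(String s) {
        // "&amp;" goes last; otherwise "&amp;lt;" (an escaped literal "&lt;")
        // would wrongly collapse all the way down to "<".
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String original = "if (a < b && b > c) print \"&lt;\"";
        String escaped = escape(original);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(original)); // prints "true"
    }
}
```

This only works because the input to `unescape` is known to be nothing but escaped text. Picking escaped fragments out of a full HTML page is a different problem.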

In conclusion, I don't see why unescaping is so controversial. I think it is the developer's responsibility to use this class properly, knowing his data and avoiding ambiguities.
Escaping, by definition, means you will eventually need to unescape. Otherwise, it's obfuscation or some other concept.

Answer

No, it does not. And apparently, this is intentional. Quoting from this discussion where Chris Povirk answered:

The use case for unescaping is less clear to me. It's generally not possible to even identify the escaped source text without a parser that understands the language. For example, if I have the following input:

String s = "foo\n\"bar\"\n\\";

Then my parser has to already understand \n, \", and \\ in order to identify that...

foo\n\"bar\"\n\\

...is the text to be "unescaped." In other words, it has to do the unescaping already. The situation is similar with HTML and other formats: We don't need an unescaper so much as we need a parser.

So it looks like you'll have to do it yourself.
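For the bracket escaper from the question, doing it yourself could look something like the sketch below. It is plain Java, not a Guava API: `BracketUnescaper` is a hypothetical name, and the code assumes the whole input was produced by that exact escaper, where a backslash was added only in front of '[' and ']'.

```java
final class BracketUnescaper {
    // Drops the backslash that the escaper put in front of '[' or ']'.
    // Any other backslash is copied through unchanged.
    static String unescape(String escaped) {
        StringBuilder out = new StringBuilder(escaped.length());
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            boolean escapesBracket = c == '\\'
                    && i + 1 < escaped.length()
                    && (escaped.charAt(i + 1) == '[' || escaped.charAt(i + 1) == ']');
            if (escapesBracket) {
                out.append(escaped.charAt(i + 1)); // keep the bracket
                i++;                               // skip the character just consumed
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("This is a \\[test\\]"));
        // prints "This is a [test]"
    }
}
```

Note that this is exactly the situation described in the quote: the method receives a string that is already known to be escaped text in its entirety. Finding such text inside a larger document would still need a parser.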
