Huy Huy - 6 months ago 104
Ruby Question

Removing backslash (escape character) from a string

I am trying to work on my own JSON parser. I have an input string that I want to tokenize:

input = "{ \"foo\": \"bar\", \"num\": 3}"


How do I remove the escape character \ so that it is not a part of my tokens?

Currently, my solution using delete works:

tokens = input.delete('\\"').split("")


=> ["{", " ", "f", "o", "o", ":", " ", "b", "a", "r", ",", " ", "n", "u", "m", ":", " ", "3", "}"]


However, when I try to use gsub, it fails to find any \".

tokens = input.gsub('\\"', '').split("")


=> ["{", " ", "\"", "f", "o", "o", "\"", ":", " ", "\"", "b", "a", "r", "\"", ",", " ", "\"", "n", "u", "m", "\"", ":", " ", "3", "}"]


I have two questions:

1. Why does gsub not work in this case?

2. How do I remove the backslash (escape) character? I currently have to remove the backslash character with the quotes to make this work.

Answer

When you write:

input = "{ \"foo\": \"bar\", \"num\": 3}"

The actual string stored in input is:

{ "foo": "bar", "num": 3}

The escape \" here is interpreted by the Ruby parser, so that it can distinguish the delimiters of the string literal (the leftmost and rightmost ") from an ordinary " character inside the string (the escaped ones). The backslash exists only in the source code; it is never stored in the string.
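One way to confirm this is to inspect the string directly. The following is a minimal sketch using only core String methods; the variable names has_backslash, same and quote_count are illustrative and not part of the question:

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

# puts prints the characters actually stored, without literal syntax
puts input

# Check whether any backslash character is stored in the string
has_backslash = input.include?("\\")

# Compare with the same text written as a single-quoted literal,
# where " needs no escaping
same = (input == '{ "foo": "bar", "num": 3}')

# Count the double-quote characters stored in the string
quote_count = input.count('"')

puts has_backslash # false
puts same          # true
puts quote_count   # 6
```
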

String#delete removes a set of characters, specified by its first parameter, rather than a pattern. Every character in that set is removed wherever it occurs. So by writing

input.delete('\\"')

you get a string with individual characters stripped wherever they occur, rather than a string with only the two-character sequence \" removed. That is the wrong tool for your case, and it may cause unexpected behavior later.
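To make the set-versus-sequence distinction concrete, here is a small sketch on a hypothetical string, sample, that (unlike input) really does contain a backslash. The names sample, by_set and by_sequence are made up for illustration:

```ruby
# Single-quoted literal: the stored characters are  a \ " b " c
sample = 'a\\"b"c'

# delete treats its argument as a set of characters:
# every " is removed, wherever it appears
by_set = sample.delete('"')

# gsub with a String pattern matches the literal two-character
# sequence \" and leaves the lone " alone
by_sequence = sample.gsub('\\"', '')

# A set with several characters removes each of them individually
letters = "banana".delete("an")

puts by_set      # a\bc
puts by_sequence # ab"c
puts letters     # b
```
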

String#gsub, however, substitutes a pattern (either a regular expression or a plain string).

input.gsub('\\"', '')

means: find every \" (two characters in sequence) and replace it with an empty string. Since there is no \ in input, nothing is replaced. What you actually need is:

input.gsub('"', '')
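As a quick check, the sketch below runs both gsub calls on the input from the question. The variable names unchanged and tokens are chosen here for illustration:

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

# The two-character pattern \" never occurs, so gsub returns an equal string
unchanged = input.gsub('\\"', '')
puts unchanged == input # true

# Matching the single character " removes every quote
tokens = input.gsub('"', '').split("")
puts tokens.join # { foo: bar, num: 3}
```

Note that this drops the quotes entirely, which matches the output you got from delete. If your parser later needs to tell strings apart from other values, you may prefer to keep the " characters as tokens instead of removing them.
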