RareFever - 6 months ago 36
Ruby Question

Why does ruby's JSON parser eat my backslash?

The following example in JSON format contains one backslash, and if I run JSON.load, the backslash disappears:

JSON.load('{ "88694": { "regex": ".*?\. (CVE-2015-46055)" } }')
# => {"88694"=>{"regex"=>".*?. (CVE-2015-46055)"}}


How can I keep the backslash?

My goal is to keep this structure in a file and, whenever I need to, read the file, load the JSON into a Hash, and search with those regular expressions.
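One way to set up that workflow so the backslash survives is sketched below. It is not from the original post: it assumes the patterns file is written with JSON.generate (which escapes the backslash for you), and the file name patterns.json and the temporary directory are hypothetical, used only to keep the example self-contained.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical pattern table; the single-quoted '.*?\.' holds a real backslash before the period.
patterns = { "88694" => { "regex" => '.*?\.' } }

match = Dir.mktmpdir do |dir|
  path = File.join(dir, "patterns.json")   # hypothetical file name
  # JSON.generate writes the backslash as the JSON escape \\, so it survives parsing.
  File.write(path, JSON.generate(patterns))

  # Read the file back, load the JSON into a Hash, and build a Regexp from the stored pattern.
  loaded = JSON.parse(File.read(path))
  regex = Regexp.new(loaded["88694"]["regex"])
  "stack.overflow"[regex]   # the block's value is returned by Dir.mktmpdir
end

p match   # => "stack."
```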

UPDATE 1

Here is an example of what I want:

irb> "stack.overflow"[/.*?\./]
=> "stack."


I can't apply the regex from the JSON to my string to match that ".", because the "\." disappears.

Answer
str = '{ "88694": { "regex": ".*?\. (CVE-2015-46055)" } }'
  #=> "{ \"88694\": { \"regex\": \".*?\\. (CVE-2015-46055)\" } }"

str.chars
  #=> ["{", " ", "\"", "8", "8", "6", "9", "4", "\"", ":", " ", "{", " ",
  #   "\"", "r", "e", "g", "e", "x", "\"", ":", " ", "\"", ".", "*", "?",
  #   "\\", ".",
  #   ~~~   ~~                                        
  #   " ", "(",..., "}", " ", "}"]

This shows that str does indeed contain a backslash character followed by a period. The reason is that str is enclosed in single quotes, where \. is not an escape sequence. \. would only be treated as an escaped period if str were enclosed in double quotes (here with the inner double quotes changed to single quotes):

 "{ '88694': { 'regex': '.*?\. (CVE-2015-46055)' } }".chars[25,3]
   #=> ["?", ".", " "] 

The value irb displays for str is its inspect representation, which writes the single-quoted string as an equivalent double-quoted string literal:

"{ \"88694\": { \"regex\": \".*?\\. (CVE-2015-46055)\" } }"

In that representation, \\. is one backslash character followed by a period. Inside double quotes a period could be escaped, but here it is not preceded by an escaping backslash, only by a literal backslash character (written \\).

Now let's add another backslash and see what happens:

str1 = '{ "88694": { "regex": ".*?\\. (CVE-2015-46055)" } }' 
str1.chars == str.chars
  #=> true

The result is the same. That is because single-quoted strings recognize the escape sequence \\ (a single backslash) and only one other: \' (a single quote).
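A quick check of that rule in plain Ruby (not part of the original answer): only \\ and \' are escapes inside single quotes, and any other backslash is kept as a literal character.

```ruby
# Single-quoted Ruby strings recognize only two escapes: \\ and \'.
escaped_backslash = '\\'   # escaped backslash -> one character
escaped_quote     = '\''   # escaped single quote -> one character
backslash_period  = '\.'   # not an escape: backslash and period are both kept
backslash_n       = '\n'   # not an escape in single quotes either: two characters

p [escaped_backslash.length, escaped_quote.length,
   backslash_period.length, backslash_n.length]   # => [1, 1, 2, 2]
p backslash_period.chars                          # => ["\\", "."]
```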

Now let's add a third backslash:

str2 = '{ "88694": { "regex": ".*?\\\. (CVE-2015-46055)" } }'   
str2.chars
  #=> ["{", " ", "\"", "8", "8", "6", "9", "4", "\"", ":", " ", "{", " ",
  #   "\"", "r", "e", "g", "e", "x", "\"", ":", " ", "\"", ".", "*", "?",
  #   "\\", "\\", ".",
  #   ~~~~  ~~~~  ~~~                                        
  #   " ", "(",..., "}", " ", "}"]

Surprised? \\ produces one backslash character (an escaped backslash in single quotes), the next \ produces a second backslash character (since \. is not an escape sequence in single quotes, the backslash is kept as is), and . is a period.
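The same two backslash characters can also be written with four backslashes, where each \\ pair is one escaped backslash. That form does not rely on the "unknown escape keeps its backslash" rule, which some readers may find easier to follow; this comparison is an addition, not part of the original answer.

```ruby
# Three backslashes: \\ -> one backslash, then \. keeps its backslash.
three = '\\\.'
# Four backslashes: each \\ pair -> one backslash.
four = '\\\\.'

p three.chars     # => ["\\", "\\", "."]
p three == four   # => true
```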

We obtain:

JSON.parse(str)
  #=> {"88694"=>{"regex"=>".*?. (CVE-2015-46055)"}} 
JSON.parse(str1)
  #=> {"88694"=>{"regex"=>".*?. (CVE-2015-46055)"}} 
JSON.parse(str2)
  #=> {"88694"=>{"regex"=>".*?\\. (CVE-2015-46055)"}} 
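Counting backslashes by hand is error-prone. A sketch of an alternative not used in the original answer: keep the pattern as a Ruby string and let JSON.generate produce the JSON text, which escapes the backslash as \\ automatically. The variable names are illustrative.

```ruby
require 'json'

# The single-quoted Ruby string holds one real backslash before the period.
regex = '.*?\. (CVE-2015-46055)'
hash  = { "88694" => { "regex" => regex } }

text = JSON.generate(hash)
puts text   # prints {"88694":{"regex":".*?\\. (CVE-2015-46055)"}}

# Parsing the generated text gives back the original single backslash,
# the same result as parsing the hand-written str2.
str2 = '{ "88694": { "regex": ".*?\\\. (CVE-2015-46055)" } }'
p JSON.parse(text) == hash               # => true
p JSON.parse(text) == JSON.parse(str2)   # => true
```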

str2 is what we want, as

JSON.parse(str2)["88694"]["regex"].chars[2,4]   
  #=> ["?", "\\", ".", " "] 
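To connect this back to the question's goal, here is a small sketch (an addition, not from the original answer) that builds a Regexp from the parsed str2 pattern and compares it with the pattern that lost its backslash. Note that the parentheses in the pattern form a capture group, so it matches the text CVE-2015-46055 without literal parentheses; the sample input strings are made up for illustration.

```ruby
require 'json'

str2  = '{ "88694": { "regex": ".*?\\\. (CVE-2015-46055)" } }'
fixed = Regexp.new(JSON.parse(str2)["88694"]["regex"])   # pattern .*?\. (CVE-2015-46055)

# For comparison: the pattern without its backslash (the value JSON.parse(str) returned above).
broken = Regexp.new('.*?. (CVE-2015-46055)')

m = fixed.match("foo. CVE-2015-46055 bar")
p m[0]                                   # => "foo. CVE-2015-46055"
p m[1]                                   # => "CVE-2015-46055" (capture group)
p fixed.match?("fooX CVE-2015-46055")    # => false: \. requires a literal period
p broken.match?("fooX CVE-2015-46055")   # => true: . matches any character
```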

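It appears that JSON treats two successive backslash characters as one backslash character (see @Jordan's comment). That matches the JSON specification: inside a JSON string, \\ is the escape sequence for a single backslash, and the only other escapes it defines are \", \/, \b, \f, \n, \r, \t and \uXXXX. \. is not among them, so the text in str and str1 is not strictly valid JSON; the parser used here silently dropped the backslash instead of raising an error.

Perhaps a reader can elaborate further on what Ruby's JSON parser does with such invalid escapes.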
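A quick check of how Ruby's JSON parser handles a few standard JSON escapes (an illustration added here, limited to escapes the JSON specification defines). Remember that each JSON backslash has to be doubled inside the single-quoted Ruby literal.

```ruby
require 'json'

# In single quotes, '\\\\' is two backslash characters, i.e. the JSON escape \\.
one_backslash = JSON.parse('["\\\\"]').first
p one_backslash          # => "\\"
p one_backslash.length   # => 1

# A few other escapes defined by JSON: newline, tab, slash, and a \u escape for ".".
escapes = JSON.parse('["\\n", "\\t", "\\/", "\\u002e"]')
p escapes                # => ["\n", "\t", "/", "."]
```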

Comments