Stuart - 3 months ago
Ruby Question

How does Ruby's JSON.parse differ from Oj.load in allocating memory/object IDs?

This is my first question. I have looked everywhere for an answer (the Oj docs, the Ruby JSON docs and here) but haven't managed to find anything concrete.

Oj is a gem that aims to speed up serialization and deserialization. It can be found at: https://github.com/ohler55/oj

I noticed this difference when I dumped a hash containing a NaN, parsed the dump twice with each library, and compared the two results:

```ruby
require 'json'
require 'oj'

# Create the JSON dump
dump = JSON.dump({x: Float::NAN})

# Parse it twice with Ruby's JSON
json_load   = JSON.parse(dump, allow_nan: true)
json_load_2 = JSON.parse(dump, allow_nan: true)

# Parse it twice with Oj
oj_load   = Oj.load(dump, :mode => :compat)
oj_load_2 = Oj.load(dump, :mode => :compat)

json_load == json_load_2 # returns true
oj_load == oj_load_2     # returns false
```


I always thought NaN could not be equal to NaN, so this confused me for a while. Then I realised that the NaN values inside json_load and json_load_2 have the same object ID, while the NaN values inside oj_load and oj_load_2 do not.

Can anyone point me to where this memory/object ID allocation happens, or tell me how I can control this behaviour in Oj?

Thanks, and sorry if the answer is floating around somewhere on the internet and I just couldn't find it.

Additional info:
I am running Ruby 1.9.3.

Here's the output from my tests re object IDs:

```ruby
puts Float::NAN.object_id
puts JSON.parse(%q({"x":NaN}), allow_nan: true)["x"].object_id
puts JSON.parse(%q({"x":NaN}), allow_nan: true)["x"].object_id
```

```
70129392082680
70129387898880
70129387898880
```

```ruby
puts Float::NAN.object_id
puts Oj.load(%q({"x":NaN}), allow_nan: true)["x"].object_id
puts Oj.load(%q({"x":NaN}), allow_nan: true)["x"].object_id
```

```
70255410134280
70255410063100
70255410062620
```


Perhaps I am doing something wrong?

Answer

I believe that is a deep implementation detail. Oj does this:

```c
if (ni->nan) {
  rnum = rb_float_new(0.0/0.0);
}
```

I can't find a direct Ruby equivalent for that (Float.new doesn't exist). The effect is that Oj creates a new Float object every time, from a C NaN it computes on the spot, hence the different object_ids.
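Here is a rough Ruby-level analogue of that behaviour. It is a sketch that assumes plain `0.0 / 0.0`, evaluated at runtime, acts like Oj's `rb_float_new(0.0/0.0)`: each evaluation returns a NaN Float that is a separate object. `fresh_nan` is a hypothetical helper name.

```ruby
# Hypothetical helper: every call evaluates 0.0 / 0.0 again and so
# returns a newly allocated NaN Float, much like Oj does per parsed NaN.
def fresh_nan
  0.0 / 0.0
end

a = fresh_nan
b = fresh_nan

puts a.nan?                      # true
puts b.nan?                      # true
puts a.equal?(b)                 # false: two distinct objects
puts a.object_id == b.object_id  # false
```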

Ruby's JSON module, by contrast, uses its own single JSON::NaN Float object everywhere (this is also done in C):

```c
CNaN = rb_const_get(mJSON, rb_intern("NaN"));
```

That explains why the NaNs you get from Oj have different object_ids, while the ones from Ruby's JSON all share one.
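Below is a toy sketch of the shared-constant strategy. It is not the JSON extension's real code. `SHARED_NAN` and `convert_token` are hypothetical names, and `Float::NAN` stands in for `JSON::NaN`.

```ruby
# Hypothetical stand-in for JSON's CNaN: one NaN Float, created once
# and handed out for every "NaN" token.
SHARED_NAN = Float::NAN

# Toy converter for a single number token (not a real JSON parser).
def convert_token(token)
  token == "NaN" ? SHARED_NAN : Float(token)
end

x = convert_token("NaN")
y = convert_token("NaN")

puts x.equal?(y)  # true: the same object every time
puts x == y       # false: Float#== on its own still says NaN != NaN
```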


The object_ids of the resulting hashes don't matter. What decides the comparison is the NaNs inside them. If the NaNs share an object_id, the enclosing hashes are considered equal. If they don't, the hashes aren't.

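According to the docs, Hash#== compares each value "according to Object#==", which means each value's own == method. In MRI, however, that comparison goes through rb_equal. rb_equal returns true immediately when both sides are the very same object (same object_id) and only calls == otherwise. That identity shortcut overrides NaN's property of not being equal to itself.

Spectacular. A shortcut gone haywire.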
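Here is a quick check of that identity shortcut using only the standard library. It assumes MRI, where Array#== and Hash#== compare elements through rb_equal.

```ruby
nan = Float::NAN

puts nan == nan             # false: Float#== follows IEEE 754
puts [nan] == [nan]         # true: same object, the identity check wins
puts({x: nan} == {x: nan})  # true: Hash#== takes the same shortcut

# Two distinct NaN objects fail the identity check and fall through
# to Float#==, which returns false.
puts({x: 0.0 / 0.0} == {x: 0.0 / 0.0})  # false
```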


One could probably modify Oj's C code (and even submit a pull request) so that it uses a shared NaN constant, as Ruby's JSON module does. It's a subtle change, but arguably in the spirit of compat mode.
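Until such a change exists, a user-land workaround is to post-process whatever Oj returns and swap every NaN for one shared object. This is a minimal sketch: `canonicalize_nan` is a hypothetical helper and not part of Oj or JSON. The two hashes below are stand-ins for two `Oj.load` results, so the snippet runs without the gem.

```ruby
# Hypothetical helper (not part of Oj or JSON): walk a parsed structure
# and replace every NaN Float with the single Float::NAN object,
# mimicking what Ruby's JSON module does internally.
def canonicalize_nan(obj)
  case obj
  when Hash
    obj.each_with_object({}) { |(k, v), h| h[k] = canonicalize_nan(v) }
  when Array
    obj.map { |v| canonicalize_nan(v) }
  when Float
    obj.nan? ? Float::NAN : obj
  else
    obj
  end
end

# Stand-ins for two Oj.load results, each holding its own NaN objects.
first  = { "x" => 0.0 / 0.0, "list" => [0.0 / 0.0, 1.5] }
second = { "x" => 0.0 / 0.0, "list" => [0.0 / 0.0, 1.5] }

puts first == second                                      # false
puts canonicalize_nan(first) == canonicalize_nan(second)  # true
```

In real code this would wrap the parse, e.g. `canonicalize_nan(Oj.load(dump, :mode => :compat))`. It costs an extra pass over the data but avoids patching Oj. Only values are rewritten. NaN hash keys are left alone, which shouldn't matter for JSON input because its keys are strings.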
