unflores - 29 days ago
Ruby Question

Performance - ID for Mongo: BSON or String

Background



I was running some tests to see which type would be best for a primary key. I assumed that a BSON ObjectId would be faster than a string. When I ran the tests, though, I got about the same results. Am I doing something wrong here, or can someone confirm that this is correct?

About my tests



I created 200k records for each of 2 Mongoid models and ran everything with Ruby's Benchmark. I ran three main queries: a find(id) query, a where(id: id) query and a where(:id.in => array_of_ids) query. All of them gave me pretty similar response times.

require 'benchmark'

Benchmark.bm(10) do |x|
  x.report("String performance") { 100.times { ModelString.where(id: '58205ae41d41c81c5a0289e5').pluck(:id) } }
  x.report("BSON performance") { 100.times { ModelBson.where(id: '581a1d271d41c82fc3030a34').pluck(:id) } }
end


Here are my models in Mongoid:

class ModelBson
  include Mongoid::Document
end

class ModelString
  include Mongoid::Document
  field :_id, type: String, pre_processed: true, default: ->{ BSON::ObjectId.new.to_s }
end


Benchmark Results



ID miss "find" query
                         user     system      total        real
String performance   0.140000   0.070000   0.210000 (  2.187263)
BSON performance     0.280000   0.060000   0.340000 (  2.308928)

ID hit "find" query
                         user     system      total        real
String performance   0.280000   0.060000   0.340000 (  2.392995)
BSON performance     0.190000   0.060000   0.250000 (  2.245230)

100 IDs "in" query hit
                         user     system      total        real
String performance   0.850000   0.110000   0.960000 (  9.221822)
BSON performance     0.770000   0.060000   0.830000 (  8.055971)
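
To put these totals in per-query terms, the rough sketch below divides one row of each table by the 100 iterations used in the benchmark. The helper name per_query_summary is made up for illustration, and the numbers are copied from the 'ID hit "find" query' table. The "waiting" share is simply the part of wall-clock time not covered by CPU time in the Ruby process, which here is presumably mostly the round trip to the database.

```ruby
# Per-query cost from a Benchmark row: total CPU seconds, wall-clock seconds,
# and the number of iterations (100 in the benchmark above).
# per_query_summary is a hypothetical helper, not part of Benchmark.
def per_query_summary(cpu_total, real, iterations = 100)
  {
    cpu_ms: (cpu_total / iterations * 1000).round(2),
    real_ms: (real / iterations * 1000).round(2),
    # Fraction of wall-clock time not spent on CPU in the Ruby process.
    waiting_share: (1 - cpu_total / real).round(2)
  }
end

# Values copied from the 'ID hit "find" query' table: [total, real].
rows = {
  'String' => [0.34, 2.392995],
  'BSON'   => [0.25, 2.245230]
}

rows.each do |name, (cpu_total, real)|
  s = per_query_summary(cpu_total, real)
  puts format('%-6s cpu %.2f ms  real %.2f ms  waiting %d%%',
              name, s[:cpu_ms], s[:real_ms], (s[:waiting_share] * 100).round)
end
```

If most of each query's wall-clock time is spent waiting rather than computing, a small difference in key comparison cost would be hard to see in these totals.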


db.collection.stats



{
    "ns" : "model_bsons",
    "count" : 199221,
    "size" : 9562704,
    "avgObjSize" : 48,
    "numExtents" : 7,
    "storageSize" : 22507520,
    "lastExtentSize" : 11325440,
    "paddingFactor" : 1,
    "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
    "userFlags" : 1,
    "capped" : false,
    "nindexes" : 1,
    "indexDetails" : {},
    "totalIndexSize" : 6475392,
    "indexSizes" : {
        "_id_" : 6475392
    },
    "ok" : 1
}


{
    "ns" : "model_strings",
    "count" : 197680,
    "size" : 9488736,
    "avgObjSize" : 48,
    "numExtents" : 7,
    "storageSize" : 22507520,
    "lastExtentSize" : 11325440,
    "paddingFactor" : 1,
    "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
    "userFlags" : 1,
    "capped" : false,
    "nindexes" : 1,
    "indexDetails" : {},
    "totalIndexSize" : 9304288,
    "indexSizes" : {
        "_id_" : 9304288
    },
    "ok" : 1
}

Answer

This is correct.

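As the collection stats show, documents in both collections report the same average size (avgObjSize is 48 for both). The raw ID values are not the same size, though: a BSON ObjectId is 12 bytes, while its hex string form is 24 characters. The identical avgObjSize therefore most likely reflects how the storage engine pads records rather than identical field sizes.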
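
The raw sizes of the two ID representations can be checked without a database. The sketch below uses 12 random bytes as a stand-in for a real ObjectId and applies the BSON layout rules by hand; it assumes the documents contain only an _id field, as the models in the question suggest. All helper names are made up for illustration.

```ruby
require 'securerandom'

# An ObjectId is 12 raw bytes; SecureRandom stands in for a real one here.
def sample_object_id_bytes
  SecureRandom.random_bytes(12)
end

# Hex string form, comparable to what BSON::ObjectId#to_s yields in the model above.
def hex_form(raw_bytes)
  raw_bytes.unpack1('H*')
end

# A BSON string value is an int32 length prefix + the bytes + a NUL terminator.
def bson_string_value_size(str)
  4 + str.bytesize + 1
end

# Size in bytes of a BSON document holding only an _id element (assumption).
# Layout: int32 document length, one element, 0x00 terminator.
# Element: 1 type byte + key as a C string ("_id" + NUL) + value.
def id_only_document_size(value_size)
  4 + (1 + '_id'.bytesize + 1 + value_size) + 1
end

raw = sample_object_id_bytes
hex = hex_form(raw)

puts "ObjectId value: #{raw.bytesize} bytes"
puts "String value:   #{bson_string_value_size(hex)} bytes (#{hex.bytesize} characters)"
puts "Document with ObjectId _id: #{id_only_document_size(raw.bytesize)} bytes"
puts "Document with String _id:   #{id_only_document_size(bson_string_value_size(hex))} bytes"
```

Both computed document sizes are below the reported avgObjSize of 48, which is consistent with the reported figure including per-record overhead or padding rather than only the BSON payload.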

What really matters is the index size. The _id index on the BSON collection is about 30% smaller than the one on the String collection (6,475,392 bytes versus 9,304,288 bytes). The most likely reason is that each ObjectId key is smaller than the equivalent 24-character string key; on storage engines that support index prefix compression, ObjectId keys can take advantage of that as well. The difference is too small to produce a visible performance change with 200,000 documents, but I would expect a larger collection to show different results.
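
The index figures can be reproduced from the stats output with a little arithmetic. This is only a sketch: the helper names are invented here, and the counts and index sizes are copied from the two stats documents in the question.

```ruby
# Relative saving of the smaller index compared with the larger one, as a fraction.
def relative_saving(smaller, larger)
  1 - smaller.to_f / larger
end

# Average index bytes per document.
def index_bytes_per_document(index_size, count)
  index_size.to_f / count
end

# Values copied from the model_bsons and model_strings stats output.
bson   = { count: 199_221, index_size: 6_475_392 }
string = { count: 197_680, index_size: 9_304_288 }

puts format('BSON index is %.1f%% smaller',
            relative_saving(bson[:index_size], string[:index_size]) * 100)
puts format('BSON:   %.1f index bytes per document',
            index_bytes_per_document(bson[:index_size], bson[:count]))
puts format('String: %.1f index bytes per document',
            index_bytes_per_document(string[:index_size], string[:count]))
```

The per-document figures make the gap easier to extrapolate: multiplying them by a planned collection size gives a rough estimate of how much memory each _id index would need.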