Simon H - 4 months ago
Node.js Question

MongoDB: data versioning with search

Related to "Ways to implement data versioning in MongoDB" and "structure of documents for versioning of a time series on mongodb".

What data structure should I adopt for versioning when I also need to be able to handle queries?

Suppose I have 8500 documents of the form

{ _id: '12345-11',
noFTEs: 5
}


Each month I get details of a change to noFTEs in about 30 docs. I want to store the new data along with the previous value(s), together with a date.

That would seem to result in:

{ _id: '12345-11',
noFTEs: {
'2015-10-28T00:00:00+01:00': 5,
'2015-1-8T00:00:00+01:00': 3
}
}


But I also want to be able to search on the most recent data (e.g. noFTEs > 4, where the element should be treated as 5, not 3). At that stage all I know is that I want to use the most recent data; I will not know the key. So an alternative would be an array:

{ _id: '12345-11',
noFTEs: [
{date: '2015-10-28T00:00:00+01:00', val: 5},
{date: '2015-1-8T00:00:00+01:00', val: 3}
]
}
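
One way to make "most recent value" searchable with the array form is to keep the array newest-first, so the latest entry always sits at index 0. The following is only a sketch under that assumption: the helper names `prependReading` and `latestGreaterThan` and the collection name `docs` are hypothetical, and the helpers just build the update and filter documents (using the `$push` modifiers `$each` and `$position`, and dot notation on an array index).

```javascript
// Build an update that inserts the new reading at the front of the array,
// so index 0 always holds the most recent value.
function prependReading(date, val) {
  return {
    $push: { noFTEs: { $each: [{ date, val }], $position: 0 } }
  };
}

// Build a filter matching documents whose most recent value exceeds `min`.
// 'noFTEs.0.val' addresses the `val` field of the first array element.
function latestGreaterThan(min) {
  return { 'noFTEs.0.val': { $gt: min } };
}

// Hypothetical usage with a driver collection named `docs`:
//   await docs.updateOne({ _id: '12345-11' },
//                        prependReading('2015-11-30T00:00:00+01:00', 6));
//   await docs.find(latestGreaterThan(4)).toArray();
```

This only works if every write goes through the prepend path; an array that is appended to in some places and prepended to in others would silently return stale values.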


Another alternative - as suggested by @thomasbormans in the comments below - would be

{ _id: '12345-11',
versions: [
{noFTEs: 5, lastModified: '2015-10-28T00:00:00+01:00', other data...},
{noFTEs: 3, lastModified: '2015-1-8T00:00:00+01:00', other...}
]
}


I'd really appreciate some insight into what I need to consider before jumping all the way in. I fear I will end up with a query that is a pretty high workload for Mongo. (In practice there are 3 other fields that can be combined for searching, and one of these is also likely to see changes over time.)

Answer

To add versioning without compromising usability and speed of access for the most recent data, consider creating two collections: one with the most recent documents and one to archive the old versions of the documents when they get changed.

You can use currentVersionCollection.findAndModify to update a document while also receiving the previous (or new, depending on parameters) version of said document in one command. For archiving you want the previous version, which is what the command returns by default. You then just need to remove the _id of the returned document, add a timestamp and/or revision number (if it doesn't have these already), and insert it into the archive collection.

By storing each old version in its own document you also avoid document growth and prevent documents from exceeding the 16 MB document size limit when they get changed a lot.
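
The update-then-archive step described above could look roughly like this in Node.js. This is a minimal sketch, not a definitive implementation: `toArchiveDoc`, `updateAndArchive`, and the field names `docId` and `archivedAt` are hypothetical, and `current` and `archive` are assumed to be driver collection objects whose `findOneAndUpdate` resolves to the pre-update document itself (some driver versions wrap it in a result object instead).

```javascript
// Turn the pre-update document into an archive record.
// `docId` and `archivedAt` are hypothetical field names chosen for this sketch.
function toArchiveDoc(previous, archivedAt) {
  const { _id, ...rest } = previous; // drop _id so the archive assigns its own
  return { ...rest, docId: _id, archivedAt };
}

// Update the current document and archive the version it replaces.
async function updateAndArchive(current, archive, id, changes) {
  // Assumption: resolves to the document as it was before the update,
  // or null when no document matched the filter.
  const previous = await current.findOneAndUpdate(
    { _id: id },
    { $set: changes },
    { returnDocument: 'before' }
  );
  if (previous === null) return null;
  await archive.insertOne(toArchiveDoc(previous, new Date()));
  return previous;
}
```

With this layout, a search on the latest data stays an ordinary query against the current collection, for example `{ noFTEs: { $gt: 4 } }`, and the history of one document can be read from the archive by its `docId`. Note that the update and the archive insert are two separate writes, so a failure between them would lose that one old version unless the caller retries or wraps both in a transaction.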