selanac82 - 1 year ago
Javascript Question

Storing large array in MongoDB

I'm working on a little side project that has a search capability. I'm using typeahead.js attached to a REST API built with Express and MongoDB, and I'm wondering what the best approach is to two problems I have. I'm primarily a front-end guy just starting out with Node and MongoDB. First, a little background to better understand the issues.

The site I'm building allows you to upload videos and add tags to them. When searching for a video, I want to be able to search through these tags using typeahead.js, just like YouTube.

So here are the issues.

1 - I have a "tags" collection in MongoDB. When uploading a video, I take the tags for that video and add them to this collection, which I'll use for predictive searching. As time progresses, this collection should have plenty of tags to search through. The issue I'm having is how to insert only the unique tags (the ones that don't already exist). For example, say I want to insert the following document into MongoDB:

tags: "tag1, tag2, tag3, tag4, tag5, tag6, tag7, tag8"

The collection already has tag1, tag2, tag4 and tag7, so I only want to insert tag3, tag5, tag6 and tag8. My question is what the best approach would be. Should I first query the collection, parse through it and compare each tag, separate out the ones that don't exist, and then "append" them to the collection? The issue I see with this is that, again, as time progresses this will be a lot to parse through. So I'm not sure what the best approach here is.
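One way to avoid the query-compare-append cycle is to let the database enforce uniqueness: keep one document per tag and upsert each incoming tag. The following is only a minimal sketch under assumptions that are not in the question: tag documents shaped like { name: 'tag1' }, tags normalized to lowercase, and the hypothetical helper names parseTags and buildTagUpserts. It builds the operations without touching a database.

```javascript
// Split the comma-separated string from the upload form into clean, unique tags.
// Lowercasing is an assumption here; drop it if tags should stay case-sensitive.
function parseTags(raw) {
  const seen = new Set();
  for (const piece of raw.split(',')) {
    const tag = piece.trim().toLowerCase();
    if (tag) seen.add(tag);
  }
  return [...seen];
}

// Build one upsert per tag. $setOnInsert only writes when the tag is new,
// so tags that already exist in the collection are left untouched.
function buildTagUpserts(tags) {
  return tags.map((name) => ({
    updateOne: {
      filter: { name },
      update: { $setOnInsert: { name } },
      upsert: true,
    },
  }));
}

const ops = buildTagUpserts(
  parseTags('tag1, tag2, tag3, tag4, tag5, tag6, tag7, tag8')
);
console.log(ops.length); // 8 operations; only the missing tags create documents

// With a connected Node driver this would be used roughly like so (not run here):
//   await db.collection('tags').createIndex({ name: 1 }, { unique: true });
//   await db.collection('tags').bulkWrite(ops, { ordered: false });
```

The unique index is what guarantees no duplicates even when two uploads arrive at the same time; the upserts just keep the common case from raising duplicate-key errors.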

2 - Would storing all of the tags in a simple array in a collection be the best approach? In time this array will be EXTREMELY large. Again, I'm not a database guy, so I don't have a great understanding of how to approach an issue like this.

As always, any and all help is much appreciated.

Answer

Since MongoDB isn't built around joins, I would store the tags in each video document, a la myVideo.tags = ['sports', 'baseball', 'pitcher']. Then, to power your autosuggest, I would periodically map/reduce across the videos collection and output the set of active tags to a separate tags collection.

You could even compute a popularity score and store something like {tag: 'baseball', score: 156} for the case where the 'baseball' tag was used in 156 videos, then use that score to sort your autosuggest results so that more popular tags are shown earlier. For example, when the user types 'ba', 'baseball' is listed before 'baking' because it's the more likely completion, even though it comes second alphabetically.
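The counting and ranking idea can be sketched in plain JavaScript, without a database, just to show the intended result. This assumes video documents carry a tags array as above; countTags and suggest are hypothetical helper names, and in a real setup the counting would be done by the periodic job rather than in application memory.

```javascript
// Count how many videos use each tag, producing {tag, score} documents.
function countTags(videos) {
  const counts = new Map();
  for (const video of videos) {
    // A Set guards against a tag being listed twice on the same video.
    for (const tag of new Set(video.tags || [])) {
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  return [...counts].map(([tag, score]) => ({ tag, score }));
}

// Return tags starting with the typed prefix, most popular first.
// Ties fall back to alphabetical order.
function suggest(tagDocs, prefix, limit = 10) {
  return tagDocs
    .filter((doc) => doc.tag.startsWith(prefix))
    .sort((a, b) => b.score - a.score || a.tag.localeCompare(b.tag))
    .slice(0, limit)
    .map((doc) => doc.tag);
}

const videos = [
  { tags: ['sports', 'baseball', 'pitcher'] },
  { tags: ['baseball', 'sports'] },
  { tags: ['baking', 'bread'] },
];
console.log(suggest(countTags(videos), 'ba')); // [ 'baseball', 'baking' ]
```

On the database side, an aggregation pipeline with $unwind on tags, $group with a $sum of 1, and $out to the tags collection should produce roughly the same counts as a map/reduce job, with the tag name ending up in _id unless it is projected into another field.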

Here's an example of exactly this straight out of the mongodb cookbook.

To point 2 in your question: nope. Never store an unbounded-length set of data as an array within a MongoDB document. There's a maximum document size (currently 16 MB), so anything that will just grow and grow over time must be a collection of distinct documents.
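With one document per tag, the typeahead lookup becomes an ordinary query against the collection instead of a scan through one huge array. Here is a rough sketch, assuming tag documents like {tag: 'baseball', score: 156} and an index on tag; escapeRegex and buildSuggestQuery are illustrative names, not part of any library.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the filter and options for a prefix search on the tags collection.
// A case-sensitive regex anchored with ^ can make use of an index on `tag`.
function buildSuggestQuery(prefix, limit = 10) {
  return {
    filter: { tag: { $regex: '^' + escapeRegex(prefix) } },
    options: { sort: { score: -1 }, limit },
  };
}

const q = buildSuggestQuery('ba');
console.log(JSON.stringify(q.filter)); // {"tag":{"$regex":"^ba"}}

// With a connected Node driver this would be used roughly like so (not run here):
//   const docs = await db.collection('tags').find(q.filter, q.options).toArray();
```

Each tag document stays tiny, so the 16 MB limit never comes into play no matter how many tags accumulate.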