romain-nio - 11 days ago
JSON Question

MarkLogic - Best solution between collection and index

I have several JSON documents like this:

[{
"type": "car",
"field1": "test"
}, {
"type": "bike",
"field1": "test"
}]


I stored them in MarkLogic 8.4 and I want to run search queries on them to retrieve documents by type (e.g. find all documents with the type "car").

I see two potential solutions:


  • Assign MarkLogic collections to each document. For example, put the "car" and "bike" collections on the example document. The search query can then add a collection restriction.

  • Put an index on the "type" field of each JSON document.



Is one method better than the other in terms of performance and/or best practices?

Thanks,
Romain.

Answer

Try cts.jsonPropertyValueQuery:

cts.search(cts.jsonPropertyValueQuery("type", "car"))

The Universal Index should have the information you need.

Edit to expand on my answer: Both solutions you mentioned require storing additional information. In the case described, the Universal Index already has the information you need, which makes it the preferred solution. It would stop being my preferred choice if the jsonPropertyValueQuery became ambiguous, that is, if there were more than one type property per document. In that case, the query would match against any of the type properties.

If that were the case, putting a JSON property range index on the type property wouldn't help, as the range index would still contain all instances of the type property.
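To make the ambiguity concrete, here is a rough plain-JavaScript simulation. It is not how MarkLogic evaluates the query internally, and the helper names (propertyValues, matchesAnyProperty) and the nested sample document are my own, made up for illustration. It only mimics the "match any property with this name" behavior for scalar values:

```javascript
// Collect every value stored under `name`, at any depth of a parsed JSON value.
function propertyValues(node, name, found) {
  found = found || [];
  if (Array.isArray(node)) {
    node.forEach(function (item) { propertyValues(item, name, found); });
  } else if (node !== null && typeof node === 'object') {
    Object.keys(node).forEach(function (key) {
      if (key === name) { found.push(node[key]); }
      propertyValues(node[key], name, found);
    });
  }
  return found;
}

// True if any property called `name`, at any depth, equals `value`.
function matchesAnyProperty(doc, name, value) {
  return propertyValues(doc, name).indexOf(value) !== -1;
}

var single = { type: 'car', field1: 'test' };
// Hypothetical document with a second, nested "type" property.
var nested = { type: 'car', owner: { type: 'person', field1: 'test' } };

console.log(matchesAnyProperty(single, 'type', 'car'));    // true
console.log(matchesAnyProperty(nested, 'type', 'person')); // true, via the nested property
console.log(matchesAnyProperty(nested, 'type', 'bike'));   // false
```

The second call is the problematic one: a search for type "person" also returns a document whose top-level type is "car".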

To handle multiple types within a document you would have two choices:

  1. use collections
  2. use a path range index

Of the two, I like the first. It's flexible -- you can use it even if you have documents with different structures in your database. In that way, it may "future proof" your project. The tradeoff is that your code needs to manage your documents' collections when doing an insert. That's pretty simple to do though.
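A minimal sketch of the collections approach, assuming each document's collections are derived from its own "type" value. The helper names (collectionsFor, searchByType) are hypothetical. cts.search and cts.collectionQuery are MarkLogic built-ins; I pass cts in as a parameter only so the function can also be tried outside the server with a stub:

```javascript
// Derive collection names from the document's "type" value (a string or an array of strings).
function collectionsFor(doc) {
  var types = Array.isArray(doc.type) ? doc.type : [doc.type];
  return types.filter(function (t) {
    return typeof t === 'string' && t.length > 0;
  });
}

// Restrict a search to one collection. `cts` is MarkLogic's built-in object.
function searchByType(cts, typeName) {
  return cts.search(cts.collectionQuery(typeName));
}

console.log(collectionsFor({ type: 'car', field1: 'test' }));  // [ 'car' ]
console.log(collectionsFor({ type: ['car', 'bike'] }));        // [ 'car', 'bike' ]
```

Inside MarkLogic you would call searchByType(cts, "car") and hand the result of collectionsFor(doc) to xdmp.documentInsert as its collections argument. How that argument is passed differs between server versions, so check the signature for the version you run.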

In terms of performance, either of these approaches will do well with queries, but option two will have slightly more work to do during indexing. MarkLogic will need to check whether the configured path exists in a document, and if so, update the index accordingly. That's a minor difference, but it can add up.
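For comparison, the query side of option two might look like the sketch below. It assumes a string path range index has already been configured on the path; as far as I know the query fails if no matching index exists. The path "/vehicle/type" and the helper name searchByPath are made up for illustration, and fakeCts is just a stand-in that echoes the query so the snippet runs outside MarkLogic:

```javascript
// Search through a path range index. `cts` is MarkLogic's built-in object,
// passed in so the function can be exercised with a stub.
function searchByPath(cts, path, value) {
  return cts.search(cts.pathRangeQuery(path, '=', value));
}

// Stand-in for the real cts object: it records the query instead of running it.
var fakeCts = {
  pathRangeQuery: function (p, op, v) { return { path: p, operator: op, value: v }; },
  search: function (query) { return query; }
};

console.log(JSON.stringify(searchByPath(fakeCts, '/vehicle/type', 'car')));
// {"path":"/vehicle/type","operator":"=","value":"car"}
```

Inside MarkLogic the call would be searchByPath(cts, "/vehicle/type", "car"). Because the path pins down one specific type property, this avoids the ambiguity described earlier.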
