Lone Sloane - 1 year ago
JSON Question

extract-document-data comes as xml string element in json output

I am trying to enrich my search results with some elements taken from the matching documents, using the query option "extract-document-data", like this:

<options xmlns="http://marklogic.com/appservices/search">
<extract-document-data selected="include">

When I run the search and ask for JSON output (using the Accept header), I get a mix of JSON and XML serialized as strings in the result:

"snippet-format": "snippet",
"total": 564,
"start": 1,
"page-length": 10,
"selected": "include",
"results": [
"index": 1,
"uri": "ENV/CHEM/NANO(2015)22/ANN5/2",
"path": "fn:doc(\"ENV/CHEM/NANO(2015)22/ANN5/2\")",
"matches": [
"path": "fn:doc(\"ENV/CHEM/NANO(2015)22/ANN5/2\")/ns2:language-version/ns2:language-version-raw-data/*:document/*:page[22]",
"extracted": {
"kind": "element",
"content": [
"<title>ZINC OXIDE DOSSIERANNEX 5</title>",
"<subject label_en=\"media\" >media</subject>",
"<subject label_en=\"fish\" >fish</subject>",

The problem is with the "extracted" part. As you can see, the XML elements have simply been copied in as strings, when I would expect them to be converted to JSON.

Does anybody have an idea about this problem?

Answer Source

MarkLogic won't convert content, so XML remains XML even when you ask for a JSON-formatted search response. And since you can't embed XML directly inside JSON, it gets serialized as a string.

You could try applying a REST transform to your search results, and use something like json:transform-to-json (probably with the custom config) to convert the extracted elements on the fly. For instance, something like this Server-side JavaScript transform:

/* jshint node:true,esnext:true */
/* global xdmp */

var json = require('/MarkLogic/json/json.xqy');
var config = json.config('custom');

function toJson(context, params, content) {
  'use strict';

  var response = content.toObject();

  if (response.results) {
    response.results.forEach(function(result) {
      if (result.extracted && result.extracted.content) {
        result.extracted.content.forEach(function(item, index) {
          // Convert only strings that look like XML elements; skip comments, doctypes, and non-string items.
          if (typeof item === 'string' && item.match(/^</) && !item.match(/^<!/)) {
            result.extracted.content[index] = json.transformToJson(xdmp.unquote(item), config);
          }
        });
      }
    });
  }

  return response;
}

exports.transform = toJson;
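
A transform like this only runs once it is installed on the REST instance and named in the search request. Here is a minimal sketch of building such a request URL. It assumes the transform was installed under the hypothetical name to-json (as far as I know, via PUT /v1/config/transforms/to-json) and that the REST instance listens on localhost:8000. Both are placeholders, not taken from the question.

```javascript
// Builds the URL for a /v1/search request that applies a named REST transform.
// Assumption: 'base' and 'transformName' are placeholders chosen by the caller.
function buildSearchUrl(base, query, transformName) {
  const url = new URL('/v1/search', base);
  url.searchParams.set('q', query);                 // the search string
  url.searchParams.set('format', 'json');           // ask for a JSON response
  url.searchParams.set('transform', transformName); // transform applied to the response
  return url.toString();
}

// Hypothetical usage:
console.log(buildSearchUrl('http://localhost:8000', 'zinc oxide', 'to-json'));
```

The request still needs whatever authentication your app server is configured with, which this sketch leaves out.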

You could also convert client-side of course.
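
For the client-side route, here is a rough sketch of walking the search response and replacing the XML strings. The helper flatElementToObject and the _value key are made up for illustration. It handles only a single flat element with attributes and text (no nested elements, no entity decoding), so for anything beyond that you would want a real XML parser.

```javascript
// Hypothetical helper: converts one flat XML element string, such as
// '<subject label_en="fish">fish</subject>', into a plain object.
// Returns null for anything that is not a single flat element.
function flatElementToObject(xml) {
  const m = /^<([^\s\/>]+)((?:\s+[^\s=]+="[^"]*")*)\s*>([^<]*)<\/\1\s*>$/.exec(xml.trim());
  if (!m) return null;
  const [, name, attrText, text] = m;
  const attrs = {};
  const attrRe = /([^\s=]+)="([^"]*)"/g;
  let a;
  while ((a = attrRe.exec(attrText)) !== null) {
    attrs[a[1]] = a[2];
  }
  // Elements without attributes become { name: text }; with attributes,
  // the text is stored under the made-up key '_value'.
  return { [name]: Object.keys(attrs).length ? { ...attrs, _value: text } : text };
}

// Replaces XML strings in every result's extracted.content in place,
// leaving anything the helper cannot parse unchanged.
function convertExtracted(response) {
  for (const result of response.results || []) {
    const content = result.extracted && result.extracted.content;
    if (!Array.isArray(content)) continue;
    content.forEach((item, i) => {
      if (typeof item === 'string') {
        const obj = flatElementToObject(item);
        if (obj !== null) content[i] = obj;
      }
    });
  }
  return response;
}
```

Calling convertExtracted(response) on the parsed search response mutates it in place and returns it, so strings the helper does not recognize stay as they were.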