Mark Mark - 3 years ago 238
JSON Question

Jolt reference first element in array as target name

I have been looking at this for a few weeks (in the background) and am stumped on how to convert JSON data approximating a CSV into a tagged set using the NiFi JoltTransformJson processor. By that I mean using the values in the first row of the input array as the key names of the JSON objects in the output.

As an example I have this input data:

[
[
"Company",
"Retail Cost",
"Percentage"
],
[
"ABC",
"5,368.11",
"17.09%"
],
[
"DEF",
"101.47",
"0.32%"
],
[
"GHI",
"83.79",
"0.27%"
]
]


and what I am trying to get as output is:

[
{
"Company": "ABC",
"Retail Cost": "5,368.11",
"Percentage": "17.09%"
},
{
"Company": "DEF",
"Retail Cost": "101.47",
"Percentage": "0.32%"
},
{
"Company": "GHI",
"Retail Cost": "83.79",
"Percentage": "0.27%"
}
]


I see this as primarily two problems: getting access to the content of the first array, and then making sure that the output data does not include that first array.
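To pin down the mapping I am after, independent of Jolt, here is a minimal Python sketch. The function name `rows_to_records` is made up for illustration, and it assumes the table has at least a header row. It is not a Jolt spec, only a statement of the intended result.

```python
def rows_to_records(table):
    """Use the first row as the keys for every following row."""
    # Split off the header row; raises ValueError if the table is empty.
    header, *rows = table
    # Pair each header name with the value at the same position.
    return [dict(zip(header, row)) for row in rows]


table = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
    ["GHI", "83.79", "0.27%"],
]
records = rows_to_records(table)
```

The two problems show up as the two steps here: `header` is the access to the first array, and iterating only over `rows` keeps that first array out of the output.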

I would love to post a Jolt specification that gets somewhat close, but my best attempt only gives me the correct shape of output without the correct content. It looks like this:

[
{
"operation": "shift",
"spec": {
"*": {
"*": "[&1].&0"
}
}
}
]


But it results in an output like this:

[ {
"0" : "Company",
"1" : "Retail Cost",
"2" : "Percentage"
}, {
"0" : "ABC",
"1" : "5,368.11",
"2" : "17.09%"
}, {
"0" : "DEF",
"1" : "101.47",
"2" : "0.32%"
}, {
"0" : "GHI",
"1" : "83.79",
"2" : "0.27%"
} ]


This clearly has the wrong object keys (array indices instead of the header names), and the output has one element too many because the header row is included.
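As a rough analogy (not how Jolt is implemented), the spec above behaves like the following Python, where `index_keyed` is a hypothetical name. Each inner value is written under its own position, so nothing ever looks at the header row for a key.

```python
def index_keyed(table):
    """Mimic "[&1].&0": outer index picks the object, inner index is the key."""
    return [
        {str(inner): value for inner, value in enumerate(row)}
        for row in table
    ]


table = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
    ["GHI", "83.79", "0.27%"],
]
attempt = index_keyed(table)
```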

Answer Source

It can be done, but wow, the spec is hard to read and looks like a terrible regex.

Spec

[
  {
    // this does most of the work, but produces an output
    //  array with a null in the zeroth position.
    "operation": "shift",
    "spec": {
      // match the first item in the outer array and do 
      //  nothing with it
      "0": null,
      // 
      // loop over all the rest of the items in the outer array
      "*": {
        // this is rather confusing
        // "*" means match the array indices of the inner array
        // and we will write the value at that index ("ABC" etc.)
        // to "[&1].@(2,[0].[&])"
        // "[&1]" means make the output an array, and write at index
        //   &1, which is the index of the outer array we are
        //   currently in.
        // Then "look up the key" (Company, Retail Cost) using
        //  @(2,[0].[&])
        // which means: go back up the tree to the root, then
        //  come back down into the first item of the outer array
        //  and index it by the array index of the current
        //  inner array element that we are at.
        "*": "[&1].@(2,[0].[&])"
      }
    }
  },
  {
    // we know the first item in the array will be null, so
    // match it and do nothing with it, then accumulate
    // everything else into a new array
    "operation": "shift",
    "spec": {
      "0": null,
      "*": "[]"
    }
  }
]
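
If the two shift operations are hard to follow, the Python sketch below mirrors their logic step by step. It is an analogy under my own assumptions, not Jolt code: the function names `first_shift` and `second_shift` are invented, and `None` stands in for the JSON null left at index 0.

```python
def first_shift(table):
    """Mirror the first shift: key each value by the header at the same index."""
    out = [None] * len(table)  # index 0 is never written, so it stays None
    for outer, row in enumerate(table):
        if outer == 0:
            continue  # like "0": null, the header row itself writes nothing
        # table[0][inner] plays the role of @(2,[0].[&]): go to the root,
        # into the first row, and index it by the current inner position.
        out[outer] = {table[0][inner]: value for inner, value in enumerate(row)}
    return out


def second_shift(step1):
    """Mirror the second shift: skip index 0, collect the rest into a new list."""
    return [item for index, item in enumerate(step1) if index != 0]


table = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
    ["GHI", "83.79", "0.27%"],
]
step1 = first_shift(table)
result = second_shift(step1)
```

Here `step1` has the same length as the input with an empty slot at the front, which is why the second pass is needed to drop it.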