mwag mwag - 2 months ago 15
Javascript Question

using jq to break array of elements into multiple arrays of varying length

I have a JSON object that I'd like to transform using jq from one form to another (of course, I could use javascript or python and iterate, but jq would be preferable). The issue is that the input contains long arrays that needs to be broken into multiple smaller arrays whenever data stops repeating within the first array. I'm not really sure how to describe this problem so I'll just put an example out here which is hopefully more explanatory. The one safe assumption-- if it is of any help-- is that the input data is always pre-sorted on the first two elements (e.g. "row_x" and "col_y"):

input:

{
"headers": [ "col1", "col2", "col3" ],
"data": [
[ "row1","col1","b","src2" ],
[ "row1","col1","b","src1" ],
[ "row1","col1","b","src3" ],
[ "row1","col2","d","src4" ],
[ "row1","col2","e","src5" ],
[ "row1","col2","f","src6" ],
[ "row1","col3","j","src7" ],
[ "row1","col3","g","src8" ],
[ "row1","col3","h","src9" ],
[ "row1","col3","i","src10" ],
[ "row2","col1","l","src13" ],
[ "row2","col1","j","src11" ],
[ "row2","col1","k","src12" ],
[ "row2","col3","o","src15" ]
]
}


desired output:

{
"headers": [ "col1", "col2", "col3" ],
"values": [
[["b","b","b"],["d","e","f"],["g","h","i","j"]],
[["j","k","l"],null,["o"]]
],
"sources": [
[["src1","src2","src3"],["src4","src5","src6"],["src7","src8","src9","src10"]],
[["src11","src12","src13"],null,["src15"]]
]
}


Is this doable at all in jq?

Answer

Is it doable? Of course!

First you'll want to group the data by rows then columns. Then with the groups, build your values/sources arrays.

.headers as $headers | .data
    # make the data easier to access
    | map({ row: .[0], col: .[1], val: .[2], src: .[3] })
    # keep it sorted so they are in expected order in the end
    | sort_by([.row,.col,.src])
    # group by rows
    | group_by(.row)
    # create a map to each of the cols for easier access
    | map(group_by(.col)
        | reduce .[] as $col ({};
            .[$col[0].col] = [$col[] | {val,src}]
        )
    )
    # build the result
    | {
        headers: $headers,
        values: map([.[$headers[]] | [.[]?.val]]),
        sources: map([.[$headers[]] | [.[]?.src]])
    }

This will produce the following result:

{
  "headers": [ "col1", "col2", "col3" ],
  "values": [
    [
      [ "b", "b", "b" ],
      [ "d", "e", "f" ],
      [ "i", "j", "g", "h" ]
    ],
    [
      [ "j", "k", "l" ],
      [],
      [ "o" ]
    ]
  ],
  "sources": [
    [
      [ "src1", "src2", "src3" ],
      [ "src4", "src5", "src6" ],
      [ "src10", "src7", "src8", "src9" ]
    ],
    [
      [ "src11", "src12", "src13" ],
      [],
      [ "src15" ]
    ]
  ]
}