Martin Preusse - 1 year ago
JSON Question

Process large JSON stream with jq

I get a very large JSON stream (several GB) from curl and try to process it with jq.

The relevant output I want to parse with jq is packed in a document representing the result structure:

{
  "results": [
    {
      "columns": ["n"],

      // get this
      "data": [
        {"row": [{"key1": "row1", "key2": "row1"}], "meta": [{"key": "value"}]},
        {"row": [{"key1": "row2", "key2": "row2"}], "meta": [{"key": "value"}]}
        // ... millions of rows
      ]
    }
  ],
  "errors": []
}

I want to extract the row data with jq. This is simple:

curl XYZ | jq -r -c '.results[0].data[0].row[]'


{"key1": "row1", "key2": "row1"}
{"key1": "row2", "key2": "row2"}

However, this always waits until curl is completed.

I played with the --stream option, which is made for dealing with this. I tried the following command, but it also waits until the full object is returned from curl:

curl XYZ | jq -n --stream 'fromstream(1|truncate_stream(inputs)) | .[].data[].row[]'

Is there a way to 'jump' to the data field and start parsing row one by one without waiting for closing tags?

Answer

(1) The vanilla filter you would use would be as follows:

jq -r -c '.results[0].data[].row'

(2) One way to use the streaming parser here would be to use it to process the output of .results[0].data, but the combination of the two steps will probably be slower than the vanilla approach.
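
As a rough sketch of what (2) could look like: the first jq call extracts the data array with the regular parser, and the second uses --stream to emit each element of that array as soon as it is complete. The sample variable is a hypothetical stand-in for the real response (it assumes the shape shown in the question), and two_step is just an illustrative name.

```shell
# Hypothetical sample input; assumes the structure shown in the question.
sample='{"results":[{"columns":["n"],"data":[{"row":[{"key1":"row1","key2":"row1"}],"meta":[{"key":"value"}]},{"row":[{"key1":"row2","key2":"row2"}],"meta":[{"key":"value"}]}]}],"errors":[]}'

# Step 1: the regular parser pulls out the data array (this step reads the whole document).
# Step 2: the streaming parser drops the outer array level and rebuilds each element,
#         then .row[] extracts the row objects from it.
two_step=$(printf '%s\n' "$sample" \
  | jq -c '.results[0].data' \
  | jq -cn --stream 'fromstream(1 | truncate_stream(inputs)) | .row[]')

printf '%s\n' "$two_step"
```

Since the first step still has to read the entire input before it prints anything, this mostly illustrates the mechanics and is unlikely to help with latency.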

(3) You may wish to try something along these lines:

jq -n --stream 'inputs
      | select(length==2)
      | select( .[0]|[.[0],.[2],.[4]] == ["results", "data", "row"])
      | [ .[0][6], .[1]] '

For the illustrative input (modified to make it valid JSON), the output would be:

[ "key", "value1" ]
[ "key", "value2" ]
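
The filter in (3) prints key/value pairs rather than whole objects. If complete row objects are wanted, one possible variation is to keep only the events under results / data / row, cut the first six path components with truncate_stream, and let fromstream reassemble each object. This is a sketch, not a tested solution for multi-GB input: it assumes jq 1.6 or later, a flat object inside each row array, and the structure shown in the question. The sample and rows variables are hypothetical names used for illustration.

```shell
# Hypothetical sample input; assumes the structure shown in the question.
sample='{"results":[{"columns":["n"],"data":[{"row":[{"key1":"row1","key2":"row1"}],"meta":[{"key":"value"}]},{"row":[{"key1":"row2","key2":"row2"}],"meta":[{"key":"value"}]}]}],"errors":[]}'

# Event paths for row values look like ["results",0,"data",i,"row",j,"<key>"].
# Keeping only those events and truncating 6 levels leaves paths like ["<key>"],
# so fromstream can emit each row object when its closing event arrives.
rows=$(printf '%s\n' "$sample" | jq -cn --stream '
  fromstream(
    6 | truncate_stream(
      inputs
      | select(.[0][0] == "results" and .[0][2] == "data" and .[0][4] == "row")
    )
  )')

printf '%s\n' "$rows"
```

With real data, the printf would be replaced by the curl call from the question, and the jq filter would be left unchanged.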