SupposeXYZ - 4 months ago
JSON Question

Convert json using jq based on specific constraints

I have a JSON file, 'OpenEnded_mscoco_val2014.json'. The file contains 121,512 questions.

Here is a sample:

"questions": [
    {
        "question": "What is the table made of?",
        "image_id": 350623,
        "question_id": 3506232
    },
    {
        "question": "Is the food napping on the table?",
        "image_id": 350623,
        "question_id": 3506230
    },
    {
        "question": "What has been upcycled to make lights?",
        "image_id": 350623,
        "question_id": 3506231
    },
    {
        "question": "Is this an Spanish town?",
        "image_id": 8647,
        "question_id": 86472
    }


I used
jq -r '.questions | [map(.question), map(.image_id), map(.question_id)] | @csv' OpenEnded_mscoco_val2014_questions.json >> temp.csv
to convert the JSON into CSV.

But in the resulting CSV the questions are followed by the image_ids, which is what the code above does.

The expected output is:

"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230

Also, is it possible to keep only the results having image_id <= 10000, and to group questions having the same image_id? E.g. results 1, 2 and 3 of the JSON could be combined to have 3 questions, 1 image_id and 3 question_ids.

EDIT: The first problem is solved by the possible duplicate question. I would like to know whether it is possible to invoke a comparison operator on the command line in jq when converting a JSON file. In this case, get all fields from the JSON if image_id <= 10000.


1) Given your input (suitably elaborated to make it valid JSON), the following query generates the CSV output as shown:

$ jq -r '.questions[] | [.question, .image_id, .question_id] | @csv'

"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230
"What has been upcycled to make lights?",350623,3506231
"Is this an Spanish town?",8647,86472

The key thing to remember here is that @csv requires a flat array, but as with all jq filters, you can feed it a stream.
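To illustrate that point, here is a rough sketch that tries both shapes against a small inline sample. The `sample` variable and its two records are stand-ins for the real file, and the sketch assumes jq is installed and on the PATH.

```shell
# Stand-in sample: two records copied from the question, inlined so the sketch is self-contained.
sample='{"questions":[
  {"question":"What is the table made of?","image_id":350623,"question_id":3506232},
  {"question":"Is this an Spanish town?","image_id":8647,"question_id":86472}]}'

# Array of arrays: @csv rejects it, because each row must be a flat array of scalars.
if printf '%s' "$sample" | jq -r '.questions | map([.question, .image_id, .question_id]) | @csv' >/dev/null 2>&1
then nested_status=accepted
else nested_status=rejected
fi
echo "nested array: $nested_status"

# Stream of flat arrays: @csv runs once per array and emits one row each.
rows=$(printf '%s' "$sample" | jq -r '.questions[] | [.question, .image_id, .question_id] | @csv')
printf '%s\n' "$rows"
```

The first attempt should report "rejected", while the second prints one CSV row per question.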

2) To filter using the criterion .image_id <= 10000, just interpose the appropriate select/1 filter:

.questions[]
| select(.image_id <= 10000)
| [.question, .image_id, .question_id]
| @csv
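As a complete command line, the filter might look like the sketch below. Passing the threshold from the shell with `--argjson` is one option for the "comparison on the command line" part of the question. The variable name `max` and the inline `sample` are illustrative assumptions, not part of the original data.

```shell
# Stand-in sample: two records copied from the question.
sample='{"questions":[
  {"question":"What is the table made of?","image_id":350623,"question_id":3506232},
  {"question":"Is this an Spanish town?","image_id":8647,"question_id":86472}]}'

# --argjson binds $max to the JSON number 10000 (the name "max" is arbitrary).
# select/1 passes a record through only when the comparison is true.
filtered=$(printf '%s' "$sample" | jq -r --argjson max 10000 '
  .questions[]
  | select(.image_id <= $max)
  | [.question, .image_id, .question_id]
  | @csv')
printf '%s\n' "$filtered"
# prints: "Is this an Spanish town?",8647,86472
```

Against the real file you would replace the `printf ... |` part with the filename as the last argument to jq.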

3) To sort by image_id, apply sort_by(.image_id) to the .questions array, then unpack the sorted array with .[]:

.questions
| sort_by(.image_id)
| .[]
| [.question, .image_id, .question_id]
| @csv
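Note that sort_by takes an array and returns an array, so it has to be applied to `.questions` as a whole, and the result unpacked with `.[]` before building rows. A minimal sketch, again using a made-up two-record `sample` in place of the real file:

```shell
# Stand-in sample: two records from the question, deliberately listed with the larger image_id first.
sample='{"questions":[
  {"question":"What is the table made of?","image_id":350623,"question_id":3506232},
  {"question":"Is this an Spanish town?","image_id":8647,"question_id":86472}]}'

# Sort the whole array, then stream its elements into the row-building step.
sorted=$(printf '%s' "$sample" | jq -r '
  .questions
  | sort_by(.image_id)
  | .[]
  | [.question, .image_id, .question_id]
  | @csv')
printf '%s\n' "$sorted"
# prints:
# "Is this an Spanish town?",8647,86472
# "What is the table made of?",350623,3506232
```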

4) To group by .image_id you would pipe the output of the following pipeline into your own pipeline:

.questions | group_by(.image_id)

You will, however, have to decide exactly how you want to combine the grouped objects.
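One possible way to combine them, as a sketch only: emit one JSON object per image_id, holding all of its questions and question_ids. The output field names `questions` and `question_ids` and the three-record `sample` are assumptions chosen for illustration. You may well prefer a different shape.

```shell
# Stand-in sample: three records from the question, two of which share an image_id.
sample='{"questions":[
  {"question":"What is the table made of?","image_id":350623,"question_id":3506232},
  {"question":"Is the food napping on the table?","image_id":350623,"question_id":3506230},
  {"question":"Is this an Spanish town?","image_id":8647,"question_id":86472}]}'

# group_by returns an array of groups (each an array of records sharing an image_id),
# ordered by the grouping key. Each group is collapsed into a single object.
groups=$(printf '%s' "$sample" | jq -c '
  .questions
  | group_by(.image_id)
  | .[]
  | {image_id: .[0].image_id,
     questions: map(.question),
     question_ids: map(.question_id)}')
printf '%s\n' "$groups"
```

With this sample there are two output lines, the first for image_id 8647 and the second for 350623 with its two questions. Since @csv needs a flat array, turning these groups into CSV would take a further decision, for example joining the questions into one string.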