DGaffneyDC - 3 months ago
JSON Question

Python Script to filter arrays containing a specific value in JSON object

I have a JSON object that consists of one object with the key 'data', whose value is a list of arrays. I need to return all arrays that contain the value x, but the arrays themselves do not have keys. I'm trying to write a script that takes a source file (inFile) and defines an export file (outFile). Here is my data structure:

{ "data": [
["x", 1, 4, 6, 2, 7],
["y", 3, 2, 5, 8, 4],
["z", 5, 2, 5, 9, 9],
["x", 3, 7, 2, 6, 8]
]
}


And here is my current script:

import json

def jsonFilter( inFile, outFile ):
    out = None;

    with open( inFile, 'r') as jsonFile:
        d = json.loads(json_data)
        a = d['data']
        b = [b for b in a if b != 'x' ]
        del b
        out = a


    if out:
        with open( outFile, 'w' ) as jsonFile:
            jsonFile.write( json.dumps( out ) );

    else:
        print "Error creating new jsonFile!"


SOLUTION

Thanks to Rob and everyone for your help! Here's the final working command-line tool. It takes two arguments, inFile and outFile:

~$ python jsonFilter.py inFile.json outFile.json

import json

def jsonFilter( inFile, outFile ):
    # make a dictionary instead.
    out = {}

    with open( inFile, 'r') as jsonFile:
        # read the raw text first, then parse it
        json_data = jsonFile.read()
        d = json.loads(json_data)
        # build the data you want to save to look like the original
        # by taking the data in the d['data'] element filtering what you want
        # elements where b[0] is 'x'
        out['data'] = [b for b in d['data'] if b[0] == 'x' ]


    if out:
        with open( outFile, 'w' ) as jsonFile:
            jsonFile.write( json.dumps( out ) )

    else:
        # parenthesized form works on both Python 2 and Python 3
        print("Error creating new JSON file!")

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('inFile', nargs=1, help="Choose the in file to use")
    parser.add_argument('outFile', nargs=1, help="Choose the out file to use")
    args = parser.parse_args()
    # nargs=1 yields one-element lists, hence the [0]
    jsonFilter( args.inFile[0] , args.outFile[0] )

Rob
Answer

The first problem is that the filter condition is true for every row, so the comprehension returns the whole data set: you are comparing b (a list) to 'x' (a string).

  b = [b for b in a if b != 'x' ]

What you wanted to do was:

  b = [b for b in a if b[0] != 'x' ]
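To make the difference concrete, here is a small sketch using the sample data from the question. The variable names (rows, unfiltered, without_x, only_x) are only for illustration and are not part of the original script.

```python
import json

# Sample data copied from the question.
doc = json.loads(
    '{"data": [["x", 1, 4, 6, 2, 7], ["y", 3, 2, 5, 8, 4],'
    ' ["z", 5, 2, 5, 9, 9], ["x", 3, 7, 2, 6, 8]]}'
)
rows = doc["data"]

# A list is never equal to a string, so this condition is True for
# every row and nothing is filtered out.
unfiltered = [row for row in rows if row != "x"]

# Comparing the first element of each row is what does the filtering.
without_x = [row for row in rows if row[0] != "x"]
only_x = [row for row in rows if row[0] == "x"]

print(len(unfiltered), len(without_x), len(only_x))  # prints: 4 2 2
```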

The second problem is that you are trying to delete data by building a filtered list and then deleting that list. The list comprehension creates a new list, so 'del b' only removes the new list and leaves the original container a untouched.
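A quick way to see this, as a rough sketch with a shortened copy of the sample data (the names b_len, remaining and kept are made up for the demonstration):

```python
a = [["x", 1, 4, 6, 2, 7], ["y", 3, 2, 5, 8, 4]]

# The comprehension builds a brand-new list; a itself is not modified.
b = [row for row in a if row[0] != "x"]
b_len = len(b)

# del only unbinds the name b; every row is still present in a.
del b
remaining = len(a)

# To actually drop rows, keep the filtered list and use it from then on.
kept = [row for row in a if row[0] == "x"]

print(b_len, remaining, kept)  # prints: 1 2 [['x', 1, 4, 6, 2, 7]]
```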
Instead, build the new data with only the elements you want, and save that. Also, you were not recreating the 'data' element in your output, so the output did not have the same structure as the input data.

import json

def jsonFilter( inFile, outFile ):
    # make a dictionary instead.
    out = {}

    with open( inFile, 'r') as jsonFile:
        # read the file contents before parsing them
        json_data = jsonFile.read()
        d = json.loads(json_data)
        # build the data you want to save to look like the original
        # by taking the data in the d['data'] element filtering what you want
        # elements where b[0] is 'x'
        out['data'] = [b for b in d['data'] if b[0] == 'x' ]


    if out:
        with open( outFile, 'w' ) as jsonFile:
            jsonFile.write( json.dumps( out ) )

    else:
        # parenthesized form works on both Python 2 and Python 3
        print("Error creating new jsonFile!")

The output JSON data looks like:

 '{"data": [["x", 1, 4, 6, 2, 7], ["x", 3, 7, 2, 6, 8]]}'

If you do not want the output to have the 'data' root element, but just the array of rows that matched your filter, change the line:

 out['data'] = [b for b in d['data'] if b[0] == 'x' ]

to

 out = [b for b in d['data'] if b[0] == 'x' ]

With this change, the output JSON data looks like:

 '[["x", 1, 4, 6, 2, 7], ["x", 3, 7, 2, 6, 8]]'
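As a side note, the question asks for arrays that "contain" x, while the filter above only checks the first element. The following is a rough sketch of how both variants could be written, using json.load and json.dump to read and write the files directly. The function names filter_rows and json_filter and the value and first_only parameters are hypothetical, not part of the original script.

```python
import json


def filter_rows(doc, value="x", first_only=True):
    """Return a new dict holding only the rows of doc['data'] that match value."""
    if first_only:
        # Match on the first element only; "row and" skips empty rows.
        rows = [row for row in doc["data"] if row and row[0] == value]
    else:
        # Match when value appears anywhere in the row.
        rows = [row for row in doc["data"] if value in row]
    return {"data": rows}


def json_filter(in_path, out_path, value="x", first_only=True):
    """Read JSON from in_path, filter it, and write the result to out_path."""
    with open(in_path, "r") as src:
        doc = json.load(src)
    with open(out_path, "w") as dst:
        json.dump(filter_rows(doc, value, first_only), dst)


sample = {"data": [["x", 1, 4, 6, 2, 7], ["y", 3, 2, 5, 8, 4],
                   ["z", 5, 2, 5, 9, 9], ["x", 3, 7, 2, 6, 8]]}
print(json.dumps(filter_rows(sample)))
# prints: {"data": [["x", 1, 4, 6, 2, 7], ["x", 3, 7, 2, 6, 8]]}
```

Keeping the filtering in its own function makes it easy to check the result without touching any files; json_filter only adds the reading and writing around it.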