Scott Scott - 4 months ago 20
JSON Question

Trying to convert a CSV into JSON in python for posting to REST API

I've got the following data in a CSV file (a few hundred lines) that I'm trying to massage into sensible JSON to post into a rest api
I've gone with the bare minimum fields required, but here's what I've got:

dateAsked,author,title,body,answers.author,answers.body,topics.name,answers.accepted
13-Jan-16,Ben,Cant set a channel ,"Has anyone had any issues setting channels. it stays at �0�. It actually tells me there are �0� files.",Silvio,"I�m not sure. I think you can leave the cable out, because the control works. But you could try and switch two port and see if problem follows the serial port. maybe �extended� clip names over 32 characters.
Please let me know if you find out!
Best regards.",club_k,TRUE


Here's a sample of JSON that is roughly like where I need to get to:

json_test = """{
"title": "Can I answer a question?",
"body": "Some text for the question",
"author": "Silvio",
"topics": [
{
"name": "club_k"
}
],
"answers": [
{
"author": "john",
"body": "I\'m not sure. I think you can leave the cable out. Please let me know if you find out! Best regards.",
"accepted": "true"
}
]
}"""


Pandas seems to import it into a dataframe okay (ish) but keeps telling me I can't serialize it to json - also need to clean it and sanitise, but that should be fairly easy to achieve within the script.

There must also be a way to do this in Pandas, but I'm beating my head against a wall here - as the columns for both answers and topics can't easily be merged together into a dict or a list in python.

Answer

You can use a csv.DictReader to process the CSV file as a dictionary for each row. Using the field names as keys, a new dictionary can be constructed that groups common keys into a nested dictionary keyed by the part of the field name after the .. The nested dictionary is held within a list, although it is unclear whether that is really necessary - the nested dictionary could probably be placed immediately under the top-level without requiring a list. Here's the code to do it:

import csv
import json

json_data = []

for row in csv.DictReader(open('/tmp/data.csv')):
    data = {}

    for field in row:
        key, _, sub_key = field.partition('.')
        if not sub_key:
            data[key] = row[field]
        else:
            if key not in data:
                data[key] = [{}]
            data[key][0][sub_key] = row[field]

#    print(json.dumps(data, indent=True))
#    print('---------------------------')
    json_data.append(json.dumps(data))

For your data, with the print() statements enabled, the output would be:

{
 "body": "Has anyone had any issues setting channels. it stays at '0'. It actually tells me there are '0' files.", 
 "author": "Ben", 
 "topics": [
  {
   "name": "club_k"
  }
 ], 
 "title": "Cant set a channel ", 
 "answers": [
  {
   "body": "I'm not sure. I think you can leave the cable out, because the control works. But you could try and switch two port and see if problem follows the serial port.  maybe 'extended' clip names over 32 characters.    \nPlease let me know if you find out!\n    Best regards.", 
   "accepted ": "TRUE", 
   "author": "Silvio"
  }
 ], 
 "dateAsked": "13-Jan-16"
}
---------------------------
Comments