shad0w shad0w - 14 days ago 5
Python Question

Converting json to csv keeping non-ascii characters intact

I have a .json file which is in this form

{"contributors": null, "truncated": false, "text": "“cool jeans,” i tell a cute boy\nlittle did he know that im talking about his genes bc those chromosomes have combined beautifully ay papi", "is_quote_status": false, "in_reply_to_status_id": null, "id": 786650297116532736, "favorite_count": 631, "source": "<a href=\"http://bufferapp.com\" rel=\"nofollow\">Buffer</a>", "retweeted": false, "coordinates": null, "entities": {"symbols": [], "user_mentions": [], "hashtags": [], "urls": []}, "in_reply_to_screen_name": null, "in_reply_to_user_id": null, "retweet_count": 233, "id_str": "786650297116532736", "favorited": false, "user": {"follow_request_sent": false, "has_extended_profile": false, "profile_use_background_image": true, "default_profile_image": false, "id": 321445166, "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/378800000024702436/c56435230dd53432a3aa7685fdd98cf7.jpeg", "verified": false, "profile_text_color": "333333", "profile_image_url_https": "https://pbs.twimg.com/profile_images/782124224198631424/zuUAjl5o_normal.jpg", "profile_sidebar_fill_color": "EFEFEF", "entities": {"url": {"urls": [{"indices": [0, 23], "expanded_url": "https://youtu.be/1qjR-p_o3BE", "display_url": "youtu.be/1qjR-p_o3BE"}]}, "description": {"urls": []}}, "followers_count": 4992012, "profile_sidebar_border_color": "FFFFFF", "id_str": "321445166", "profile_background_color": "182D66", "listed_count": 5284, "is_translation_enabled": false, "utc_offset": 7200, "statuses_count": 28488, "description": "There's a fine line between being sassy and being an asshole and I cross it everyday. Youtuber and student. 
Link in bio thatssarcasmposts@gmail.com", "friends_count": 55420, "location": "Cape Town", "profile_link_color": "536BA7", "profile_image_url": "http://pbs.twimg.com/profile_images/782124224198631424/zuUAjl5o_normal.jpg", "following": false, "geo_enabled": false, "profile_banner_url": "https://pbs.twimg.com/profile_banners/321445166/1475082140", "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/378800000024702436/c56435230dd53432a3aa7685fdd98cf7.jpeg", "screen_name": "ThatsSarcasm", "lang": "en", "profile_background_tile": false, "favourites_count": 3931, "name": "joke", "notifications": false, "created_at": "Tue Jun 21 15:52:41 +0000 2011", "contributors_enabled": false, "time_zone": "Pretoria", "protected": false, "default_profile": false, "is_translator": false}, "geo": null, "in_reply_to_user_id_str": null, "lang": "en", "created_at": "Thu Oct 13 19:30:20 +0000 2016", "in_reply_to_status_id_str": null, "place": null}
{"contributors": null, "truncated": false, "text": "If you're not following @relatabIe for the most relatable tweets ever, then what are you doing?I love their posts

mx0 mx0
Answer

Update: the question didn't specify a Python version; the first answer was written for Python 3.

There are a few problems in your code.

Your JSON file is not valid (or you pasted it incorrectly here). The top level should be a single array of objects, like

[{"first":"object"}, {"second":"object"}] 
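
If the file really does hold one object per line, as the pasted sample suggests, an alternative to rewriting it as an array is to parse it line by line. This is only a sketch: the helper name load_json_lines is made up for illustration, and it assumes every object sits entirely on a single line (a raw line break inside a string value, like the one in the pasted "description" field, would break it).

```python
import json


def load_json_lines(path):
    """Parse a file holding one JSON object per line into a list of dicts.

    Hypothetical helper, not part of the original answer.
    """
    records = []
    with open(path, encoding='utf-8') as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                records.append(json.loads(line))
    return records
```

The returned list can then be used in place of the result of json.load in the examples that follow.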

When reading JSON, you have to specify the encoding; otherwise Python 3 falls back to a platform-dependent default

x = open('MyFile.json', encoding='utf-8')
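
To see why the encoding matters, here is a small sketch that decodes the same UTF-8 bytes two ways. The choice of cp1252 is an assumption for illustration only; it stands in for whatever non-UTF-8 default a given system might use.

```python
# The curly quotes from the sample tweet, as they are stored in a UTF-8 file.
original = '“cool jeans”'
raw = original.encode('utf-8')

# Decoding with the right codec gives the text back unchanged.
as_utf8 = raw.decode('utf-8')

# Decoding with a different codec (cp1252 here, chosen only as an example)
# turns each multi-byte character into several unrelated ones.
as_cp1252 = raw.decode('cp1252', errors='replace')

print(as_utf8 == original)
print(as_cp1252 == original)
```

Passing encoding='utf-8' to open makes the first behaviour explicit instead of leaving it to the system default.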

When writing, don't use binary mode, and specify the encoding. Also pass newline='', or else the output file can end up with doubled line endings (which show up as blank rows on Windows).

f = csv.writer(open('data.csv', 'w+', encoding='utf-8', newline=''))
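
The effect of newline='' can be checked by looking at the raw bytes that land on disk. A minimal sketch, using a scratch file in a temporary directory (the path is made up for this demonstration):

```python
import csv
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'data.csv')

# newline='' hands control of line endings to the csv module.
with open(path, 'w', encoding='utf-8', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(['“cool jeans”'])
    writer.writerow(['ay papi'])

# Read the file back as bytes to inspect the row terminators.
with open(path, 'rb') as raw:
    content = raw.read()

print(content)
```

The csv writer ends each row with \r\n by default. With newline='' those two characters are written as they are; without it, a platform that translates \n to \r\n on write would add an extra \r to every row.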

The writerow function takes a list and writes it as one row. A bare string is treated as a sequence of characters, so your text will end up as I,f, ,y,o,u,',r,e, ,n,o,t if you don't wrap it in another list

f.writerow([item["text"]])
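
The difference is easy to reproduce in memory. The sketch below writes the same short string twice, once bare and once wrapped in a list, into an io.StringIO buffer instead of a real file:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)

writer.writerow("cool")    # bare string: every character becomes its own field
writer.writerow(["cool"])  # one-element list: the whole string is a single field

lines = buffer.getvalue().splitlines()
print(lines[0])  # c,o,o,l
print(lines[1])  # cool
```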

Complete working example

Python 3.5

import json
import csv

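# Read the JSON array as UTF-8 text
with open('MyFile.json', encoding='utf-8') as x:
    data = json.load(x)

# newline='' lets the csv module control line endings;
# the with block makes sure the output file is flushed and closed
with open('data.csv', 'w', encoding='utf-8', newline='') as out:
    f = csv.writer(out)
    for item in data:
        # wrap the text in a list so it is written as a single column
        f.writerow([item["text"]])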

Python 2.7

For Python 2 you were very close to a working program. In Python 2 the csv module writes to files opened in binary mode and expects byte strings, but your item["text"] is a unicode string, so you have to encode it first. This will work as long as your JSON file is really UTF-8 encoded.

import json
import csv

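# json.load decodes the file's strings to unicode objects
with open('MyFile.json') as x:
    data = json.load(x)

# Python 2's csv module wants a file opened in binary mode
with open('data.csv', 'wb') as out:
    f = csv.writer(out)
    for item in data:
        # encode the unicode text to UTF-8 bytes before writing
        f.writerow([item["text"].encode('utf-8')])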