view raw
rcplusplus rcplusplus - 11 months ago 73
JSON Question

Python 3 not reading a JSON file right

I have some json files created by powershell using the

command. The content of the json file looks like

"Key1": "Value1",
"Key2": "Value2"

I ran the python interpreter to see if I could read the file but I get this weird output

>>> f=open('test.json', 'r')
'ÿ\xfe{\x00\n\x00\n\x00 \x00 \x00 \x00 \x00"\x00K\x00e\x00y\x001\x00"\x00:\x00 \x00 \x00"\x00V\x00a\x00l\x00u\x00e\x001\x00"\x00,\x00\n\x00\n\x00 \x00 \x00 \x00 \x00"\x00K\x00e\x00y\x002\x00"\x00:\x00 \x00 \x00"\x00V\x00a\x00l\x00u\x00e\x002\x00"\x00\n\x00\n\x00}\x00\n\x00\n\x00'

For some reason all the characters are escaped byte characters and there's the weird
at the begninning (powershell error?).

The weird thing is this:

>>> f=open('test.json', 'r')
>>> type(str)
<class 'str'>
>>> json.loads(str)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\", line 319, in loads
return _default_decoder.decode(s)
File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

So the input is a string, but the json module can't parse it (
return the same error). What is causing this error? Is it a python thing, a powershell thing, a json thing?


As pointed out by jwodder, PowerShell has encoded your json using UTF-16LE. To get this data into json correctly, you need to open the file using the correct encoding. eg.

with open("test.json", "r", encoding="utf16") as f:
    json_string =
my_dict = json.loads(json_string)

You don't need to tell Python which variant of UTF-16 is being used. This is the purpose of the first two bytes of the text file. It's called a Byte Order Mark (BOM). It lets a program know if UTF-16LE or UTF-16BE has been used to encode the text file.