rcplusplus - 6 months ago
JSON Question

Python 3 not reading a JSON file right

I have some JSON files created by PowerShell using the ConvertTo-Json command. The content of the JSON file looks like:

{
    "Key1": "Value1",
    "Key2": "Value2"
}


I ran the Python interpreter to see if I could read the file, but I get this weird output:

>>> f=open('test.json', 'r')
>>> f.read()
'ÿ\xfe{\x00\n\x00\n\x00 \x00 \x00 \x00 \x00"\x00K\x00e\x00y\x001\x00"\x00:\x00 \x00 \x00"\x00V\x00a\x00l\x00u\x00e\x001\x00"\x00,\x00\n\x00\n\x00 \x00 \x00 \x00 \x00"\x00K\x00e\x00y\x002\x00"\x00:\x00 \x00 \x00"\x00V\x00a\x00l\x00u\x00e\x002\x00"\x00\n\x00\n\x00}\x00\n\x00\n\x00'


For some reason all the characters are escaped byte characters, and there's a weird ÿ at the beginning (PowerShell error?).

The weird thing is this:

>>> f=open('test.json', 'r')
>>> str=f.read()
>>> type(str)
<class 'str'>
>>> json.loads(str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)


So the input is a string, but the json module can't parse it (json.load(f) returns the same error). What is causing this error? Is it a Python thing, a PowerShell thing, or a JSON thing?

Answer

As pointed out by jwodder, PowerShell has encoded your JSON using UTF-16LE. To parse this data correctly, you need to open the file using the correct encoding, e.g.:

import json

# "utf16" reads the BOM to pick the byte order, then drops it
with open("test.json", "r", encoding="utf16") as f:
    json_string = f.read()
my_dict = json.loads(json_string)
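
A slightly shorter variant hands the open file straight to json.load instead of reading it into a string first. The sketch below is only an illustration: the helper name load_utf16_json is made up, and the script writes its own small UTF-16 file into a temporary directory as a stand-in for the PowerShell output, so it can be run without the original test.json.

```python
import json
import os
import tempfile


def load_utf16_json(path):
    """Parse a UTF-16 JSON file; the BOM selects the byte order."""
    with open(path, "r", encoding="utf-16") as f:
        return json.load(f)


# Stand-in for the PowerShell output: Python writes a BOM when
# the encoding is given as plain "utf-16".
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test.json")
with open(path, "w", encoding="utf-16") as f:
    f.write('{\n    "Key1": "Value1",\n    "Key2": "Value2"\n}\n')

data = load_utf16_json(path)
print(data["Key1"])  # Value1
```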

You don't need to tell Python which variant of UTF-16 is being used. That is the purpose of the first two bytes of the text file (the ÿ\xfe at the start of your output, so not a PowerShell error). They are called a Byte Order Mark (BOM), and they let a program know whether UTF-16LE or UTF-16BE was used to encode the text file.
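
To see the BOM at work, here is a minimal sketch using a hand-written byte string rather than the real file. The bytes in raw are an assumption modelled on the start of the output in the question (FF FE followed by two-byte characters).

```python
import codecs

# Hypothetical sample: a little-endian BOM followed by "{}" in UTF-16LE.
raw = b'\xff\xfe{\x00}\x00'

# The first two bytes match the little-endian BOM constant.
print(raw[:2] == codecs.BOM_UTF16_LE)  # True

# "utf-16" reads the BOM, picks the byte order from it, and drops it.
print(raw.decode("utf-16"))  # {}

# "utf-16-le" assumes the byte order, so the BOM survives as U+FEFF.
print(repr(raw.decode("utf-16-le")))  # '\ufeff{}'

# The same text in big-endian order also decodes with plain "utf-16".
raw_be = codecs.BOM_UTF16_BE + "{}".encode("utf-16-be")
print(raw_be.decode("utf-16"))  # {}
```

This is why the plain "utf-16" codec is usually the convenient choice for files that carry a BOM: naming the endianness explicitly leaves a stray U+FEFF at the start of the string, which json.loads would then reject.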
