Henry Thornton - 2 months ago
JSON Question

Python 3, read/write compressed json objects from/to gzip file

For Python3, I followed @Martijn Pieters's code with this:

import gzip
import json

# writing
with gzip.GzipFile(jsonfilename, 'w') as fout:
    for i in range(N):
        uid = "whatever%i" % i
        dv = [1, 2, 3]
        data = json.dumps({
            'what': uid,
            'where': dv})

        fout.write(data + '\n')


but this results in an error:

Traceback (most recent call last):
  ...
  File "C:\Users\Think\my_json.py", line 118, in write_json
    fout.write(data + '\n')
  File "C:\Users\Think\Anaconda3\lib\gzip.py", line 258, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'


Any thoughts about what is going on?

Answer

You have four stages of representation here, with a transformation between each pair.

  1. a Python data structure
  2. a Python string containing a serialized representation of that data structure ("JSON")
  3. a bytes object containing an encoded representation of that string ("UTF-8")
  4. a bytes object containing a compressed representation of the previous bytes object ("gzip")
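
As a minimal in-memory sketch of these four stages (the sample dictionary is made up for illustration, and gzip.compress() stands in for writing to a file), each stage can be inspected by its type:

```python
import gzip
import json

data = {'what': 'whatever0', 'where': [1, 2, 3]}   # 1. Python data structure
json_str = json.dumps(data)                         # 2. str holding JSON
json_bytes = json_str.encode('utf-8')               # 3. bytes holding UTF-8
gz_bytes = gzip.compress(json_bytes)                # 4. bytes holding gzip data

# Stage 2 is text, stages 3 and 4 are binary; only bytes fit into a binary file.
print(type(json_str).__name__, type(json_bytes).__name__, type(gz_bytes).__name__)

# Undoing the stages in reverse order gives back the original structure.
restored = json.loads(gzip.decompress(gz_bytes).decode('utf-8'))
print(restored == data)
```

The point of the sketch is only that the str from stage 2 has to be encoded before anything that expects binary data will accept it.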

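Your code stops at step 2: it passes a str to fout.write(), but a GzipFile is a binary file object and accepts only bytes-like objects, which is why you get the TypeError. So let's take these steps one by one.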

import gzip
import json

with gzip.GzipFile(jsonfilename, 'w') as fout:
    for i in range(N):
        uid = "whatever%i" % i
        dv = [1, 2, 3]

        data = {
            'what': uid,
            'where': dv
        }                                            # 1. data

        json_str = json.dumps(data) + "\n"           # 2. string
        json_bytes = json_str.encode('utf-8')        # 3. bytes (i.e. UTF-8)

        fout.write(json_bytes)                       # 4. gzip

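Reading works exactly the other way around. Because the loop above writes one JSON object per line, read the file line by line; json.loads() accepts only a single JSON document.

with gzip.GzipFile(jsonfilename, 'r') as fin:        # 4. gzip
    for json_bytes in fin:                           # 3. bytes (i.e. UTF-8), one line each
        json_str = json_bytes.decode('utf-8')        # 2. string
        data = json.loads(json_str)                  # 1. data

        print(data)

Note that the "\n" is what makes this possible: it separates the objects in the file. It is superfluous only if you write a single object, in which case you can also decode the result of fin.read() in one go.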
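
An alternative sketch, assuming Python 3.3 or later: gzip.open() with the text modes 'wt' and 'rt' performs the UTF-8 step itself, so a str can be written directly. The helper names write_json_lines and read_json_lines, the sample records, and the file name are hypothetical and only serve the illustration. It writes one JSON object per line and reads them back line by line.

```python
import gzip
import json
import os
import tempfile


def write_json_lines(path, records):
    """Write one JSON document per line to a gzip file opened in text mode."""
    with gzip.open(path, 'wt', encoding='utf-8') as fout:
        for record in records:
            fout.write(json.dumps(record) + '\n')


def read_json_lines(path):
    """Read the file back, decoding one JSON document per line."""
    with gzip.open(path, 'rt', encoding='utf-8') as fin:
        return [json.loads(line) for line in fin]


# Hypothetical sample data and file name, for illustration only.
records = [{'what': 'whatever%i' % i, 'where': [1, 2, 3]} for i in range(3)]
path = os.path.join(tempfile.mkdtemp(), 'example.json.gz')

write_json_lines(path, records)
loaded = read_json_lines(path)
print(loaded == records)
```

Whether to encode by hand or let gzip.open() do it is mostly a matter of taste; the explicit version keeps the steps visible, while the text-mode version is shorter.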