keisuke keisuke - 6 months ago 14
JSON Question

Convert redundant array to dict (or JSON)?

Suppose I have an array:

[['a', 10, 1, 0.1],
['a', 10, 2, 0.2],
['a', 20, 2, 0.3],
['b', 10, 1, 0.4],
['b', 20, 2, 0.5]]


And I want a dict (or JSON):

{
    'a': {
        10: {1: 0.1, 2: 0.2},
        20: {2: 0.3}
    },
    'b': {
        10: {1: 0.4},
        20: {2: 0.5}
    }
}


Is there any good way or some library for this task?

In this example the array is just 4-column, but my original array is more complicated (7-column).

Currently I implement this naively:

import pandas as pd
df = pd.DataFrame(array)
grouped1 = df.groupby('column1')
for column1 in grouped1.groups:
    group1 = grouped1.get_group(column1)
    grouped2 = group1.groupby('column2')
    for column2 in grouped2.groups:
        group2 = grouped2.get_group(column2)
        ...


And the defaultdict way:

from collections import defaultdict

d = defaultdict(lambda: defaultdict(lambda: defaultdict ... ))
for row in array:
    d[row[0]][row[1]][row[2]]... = row[-1]


But I don't think either approach is elegant.

Answer

Introduction

Here is a recursive solution. The base case is a list of 2-element lists (or tuples), in which case the dict constructor does exactly what we want:

>>> dict([(1, 0.1), (2, 0.2)])
{1: 0.1, 2: 0.2}

For other cases, we will remove the first column and recurse down until we get to the base case.
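To make one step of that recursion concrete, here is a minimal sketch applied to the sample rows from the question. The name one_step is only for illustration. It groups the rows by their first column and strips that column from each group, leaving smaller lists that the next level of recursion would process the same way.

```python
from itertools import groupby

rows = [['a', 10, 1, 0.1],
        ['a', 10, 2, 0.2],
        ['a', 20, 2, 0.3],
        ['b', 10, 1, 0.4],
        ['b', 20, 2, 0.5]]

# One level of the recursion: group consecutive rows by the first column,
# then drop that column from every row in the group.
one_step = {}
for key, group in groupby(rows, key=lambda row: row[0]):
    one_step[key] = [row[1:] for row in group]

print(one_step)
```

Each value in one_step is again a list of rows, now one column narrower, so repeating the step eventually reaches the 2-column base case.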

The code:

from itertools import groupby

def rows2dict(rows):
    if len(rows[0]) == 2:
        # e.g. [(1, 0.1), (2, 0.2)] ==> {1: 0.1, 2: 0.2}
        return dict(rows)
    else:
        dict_object = dict()
        for column1, grouped_rows in groupby(rows, lambda x: x[0]):
            rows_without_first_column = [x[1:] for x in grouped_rows]
            dict_object[column1] = rows2dict(rows_without_first_column)
        return dict_object

if __name__ == '__main__':
    rows = [['a', 10, 1, 0.1],
            ['a', 10, 2, 0.2],
            ['a', 20, 2, 0.3],
            ['b', 10, 1, 0.4],
            ['b', 20, 2, 0.5]]
    dict_object = rows2dict(rows)
    print(dict_object)

Output

{'a': {10: {1: 0.1, 2: 0.2}, 20: {2: 0.3}}, 'b': {10: {1: 0.4}, 20: {2: 0.5}}}
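Since the question also mentions JSON, here is a short sketch of serializing this result with the standard json module. The variable names nested, text and restored are illustrative. One thing to be aware of is that JSON object keys are always strings, so the integer keys come back as '10', '20' and so on after a round trip.

```python
import json

# The nested dict from the output above, written out literally so the
# snippet stands alone.
nested = {'a': {10: {1: 0.1, 2: 0.2}, 20: {2: 0.3}},
          'b': {10: {1: 0.4}, 20: {2: 0.5}}}

# json.dumps turns the integer keys into strings, because JSON object
# keys must be strings.
text = json.dumps(nested, sort_keys=True)
print(text)

# After a round trip the keys are strings such as '10', not the integer 10.
restored = json.loads(text)
print(restored['a']['10'])
```

If the integer keys matter to the consumer, they would need to be converted back after loading.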

Notes

  • We use the itertools.groupby generator to simplify grouping of similar rows based on the first column
  • For each group of rows, we remove the first column and recurse down
  • This solution assumes that each row has 2 or more columns. The result is unpredictable for rows that have 0 or 1 columns.
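One further assumption is worth spelling out: itertools.groupby only merges consecutive rows with equal keys, so the recursive version relies on the rows arriving sorted by their key columns. If that cannot be guaranteed, an alternative is to walk each row and create the nested dicts on demand with dict.setdefault. The following is a minimal sketch of that different technique, not the method used above. The function name rows_to_nested_dict is hypothetical, and the sketch assumes the last column is always the value.

```python
def rows_to_nested_dict(rows):
    """Build a nested dict from rows; the last column of each row is the value."""
    result = {}
    for row in rows:
        if len(row) < 2:
            raise ValueError('each row needs at least one key and one value')
        keys, last_key, value = row[:-2], row[-2], row[-1]
        node = result
        # Descend one level per key column, creating inner dicts as needed.
        for key in keys:
            node = node.setdefault(key, {})
        node[last_key] = value
    return result


# The sample rows from the question, deliberately shuffled.
unsorted_rows = [['b', 20, 2, 0.5],
                 ['a', 10, 1, 0.1],
                 ['b', 10, 1, 0.4],
                 ['a', 20, 2, 0.3],
                 ['a', 10, 2, 0.2]]
nested = rows_to_nested_dict(unsorted_rows)
print(nested)
```

This works for any number of columns from 2 upward and does not depend on row order. If two rows share all of their key columns, the later row's value overwrites the earlier one.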