Ray Ray - 1 month ago 6
Python Question

Creating a multiindexed `DataFrame` with a nested dictionary

This question is related to this one. This time I want to go one step further. Given a dictionary like:

dd = {0: {"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
"cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
"godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}},

1: {"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
"cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
"godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}}}


or a list like:

ll = [{"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
"cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
"godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}},

{"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
"cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
"godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}}]


I would like to construct a
DataFrame
like:

russell godel cantor
score ping score ping score ping
0 0.17473916938994682 40 0.3443303845926545 47 0.43576522521017247 42
1 0.7341005512329682 22 0.14682222267827938 81 0.5662517436162526 59


where as we can see the column index is a
MultiIndex
. Is there a way of achieving that? If I try
pandas.DataFrame.from_dict(dd, orient="index")
or
pandas.DataFrame(ll)
then I get:

russell godel cantor
0 {'score': 0.17473916938994682, 'ping': 40} {'score': 0.3443303845926545, 'ping': 47} {'score': 0.43576522521017247, 'ping': 42}
1 {'score': 0.7341005512329682, 'ping': 22} {'score': 0.14682222267827938, 'ping': 81} {'score': 0.5662517436162526, 'ping': 59}


which is not what I want.

Answer

Now it is more complicated, but Panel with transpose, to_frame and unstack can help:

df = pd.Panel(dd).transpose(2,0,1).to_frame().unstack()
print (df)
      cantor           godel           russell          
minor   ping     score  ping     score    ping     score
major                                                   
0       69.0  0.050641  51.0  0.765994    20.0  0.935196
1       91.0  0.398624  33.0  0.408681    75.0  0.464876
Comments