user3821012 - 2 months ago

Convert a dict to a NumPy multi-dimensional array

I have a Python dictionary defined as follows, where the innermost items are two-element lists:

mydict = {1: {1: [1, 2], 2: [3, 4]}, 2: {1: [5, 6], 2: [7, 8]}}

What I need now is to collect all the 0th elements into a new array, i.e., to use a[:,:,0] or a[...,0] to get [1,3,5,7]. However, neither a[:,:,0] nor a[...,0] works in this case, as shown below.

import numpy as np
import pandas as pd
a = np.array(pd.DataFrame.from_dict(mydict))
print(a)

which gives the following output:

[[[1, 2] [5, 6]]
[[3, 4] [7, 8]]]

It seems that this is a 2x2x2 array. Accessing an element with separate brackets works fine, e.g., a[0][0][0] returns 1. However, a[0,0,0] raises an error:

IndexError Traceback (most recent call last)
<ipython-input-150-f68aba7de42a> in <module>()
----> 1 a[0,0,0]

IndexError: too many indices for array

It seems that the two-element lists are treated as single elements of a 2x2 array, but I need a true 2x2x2 array to achieve my goal. Is there any way to convert this to a 2x2x2 array?


Your issue comes from the fact that pandas stores your innermost entries (lists) as generic Python objects, so when you convert to a NumPy array you get a 2x2 array of dtype object whose elements are list objects. For example:

> type(a)
numpy.ndarray
> type(a[0])
numpy.ndarray
> type(a[0,0])
list
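A small self-contained sketch of the same diagnosis, in case it helps to see all the symptoms at once. It rebuilds `a` exactly as in the question and prints its shape and dtype, which shows that only two dimensions exist and the third "dimension" lives inside Python lists:

```python
import numpy as np
import pandas as pd

mydict = {1: {1: [1, 2], 2: [3, 4]}, 2: {1: [5, 6], 2: [7, 8]}}

# Outer keys become columns, inner keys become the index.
df = pd.DataFrame.from_dict(mydict)
a = np.array(df)

print(a.shape)        # (2, 2): only two real dimensions
print(a.dtype)        # object: each cell holds a Python object
print(type(a[0, 0]))  # <class 'list'>

# Chained indexing works because a[0][0] is a list, which can then be indexed...
print(a[0][0][0])     # 1

# ...but a single three-index lookup fails: the array has only 2 dimensions.
try:
    a[0, 0, 0]
except IndexError as exc:
    print("IndexError:", exc)
```

The exact wording of the `IndexError` message differs between NumPy versions, so it is printed rather than shown here.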

If you know the shape you ultimately want (2x2x2), you could always do:

> b = np.array(list(map(np.array, a.flat))).reshape(2,2,2)
> b.shape
(2, 2, 2)
> b[0,0,0]
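If hard-coding `(2,2,2)` is inconvenient, one option is to derive the target shape from `a` itself. This is a sketch under the assumption that every innermost list has the same length; it also shows why `list(...)` is needed around `map` in Python 3:

```python
import numpy as np
import pandas as pd

mydict = {1: {1: [1, 2], 2: [3, 4]}, 2: {1: [5, 6], 2: [7, 8]}}
a = np.array(pd.DataFrame.from_dict(mydict))  # 2x2 object array of lists

# Derive the target shape from the object array plus the length of one
# innermost list. Assumption: every innermost list has the same length.
inner_len = len(a.flat[0])
target_shape = a.shape + (inner_len,)

# In Python 3, map() returns an iterator, so wrap it in list() before
# handing it to np.array; otherwise NumPy builds a 0-d object array.
b = np.array(list(map(np.array, a.flat))).reshape(target_shape)

print(b.shape)     # (2, 2, 2)
print(b[0, 0, 0])  # 1
```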

Edit: Or even simpler:

> b = np.array(a.tolist())
> b
array([[[1, 2],
        [5, 6]],

       [[3, 4],
        [7, 8]]])
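The `tolist()` trick works because `tolist()` on an object array returns the stored objects as they are, so the cells (already lists) become the third level of a plain nested list, and `np.array` can then infer all three dimensions with a numeric dtype. A minimal sketch, assuming all inner lists have equal length (with ragged lists, recent NumPy releases raise a `ValueError` instead of building a 3-D array):

```python
import numpy as np
import pandas as pd

mydict = {1: {1: [1, 2], 2: [3, 4]}, 2: {1: [5, 6], 2: [7, 8]}}
a = np.array(pd.DataFrame.from_dict(mydict))  # 2x2 object array of lists

# tolist() turns the object array into nested Python lists; since each
# cell already holds a list, the result is nested three levels deep.
nested = a.tolist()
print(nested)  # [[[1, 2], [5, 6]], [[3, 4], [7, 8]]]

# np.array can now infer all three dimensions and an integer dtype.
b = np.array(nested)
print(b.shape, b.dtype.kind)  # (2, 2, 2) i
```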

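If you want the first item of each innermost list, you can use b[...,0], which gives [[1, 5], [3, 7]], or b[...,0].flatten(), which gives [1, 5, 3, 7]. Because pandas puts the outer dict keys in the columns, the values come out in that order; to get 1, 3, 5, 7 as in the question, transpose first: b[...,0].T.flatten().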
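To make the axis order explicit, here is a sketch that contrasts the pandas layout with building the array straight from the dict (skipping pandas entirely). The direct construction assumes the keys sort into the order you want and that every outer dict has the same inner keys:

```python
import numpy as np

mydict = {1: {1: [1, 2], 2: [3, 4]}, 2: {1: [5, 6], 2: [7, 8]}}

# Same values as the pandas result above: axis 0 = inner key (index),
# axis 1 = outer key (columns), axis 2 = position in the inner list.
b = np.array([[[1, 2], [5, 6]], [[3, 4], [7, 8]]])
print(b[..., 0].ravel())    # [1 5 3 7]
print(b[..., 0].T.ravel())  # [1 3 5 7]

# Building directly from the dict gives axis 0 = outer key,
# axis 1 = inner key. Assumption: keys sort into the desired order.
c = np.array([[mydict[i][j] for j in sorted(mydict[i])] for i in sorted(mydict)])
print(c.shape)              # (2, 2, 2)
print(c[..., 0].ravel())    # [1 3 5 7]
```

With the direct construction, `c[..., 0]` already matches the order in the question, so no transpose is needed.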