user2805751 - 1 month ago 15
Python Question

Pandas Dataframe or Panel to 3d numpy array

Setup:

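import numpy as np
import pandas as pd

pdf = pd.DataFrame(np.random.rand(4, 5), columns=list('abcde'))
pdf.loc[2:, 'a'] = pdf.loc[0, 'a']  # rows 2-3 share one 'a' value (.loc avoids chained assignment)
pdf.loc[:1, 'a'] = pdf.loc[1, 'a']  # rows 0-1 share another 'a' value
pdf.set_index(['a', 'b'])  # returns a new frame for display; pdf itself is unchanged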


Output:

                          c         d         e
a        b
0.439502 0.115087  0.832546  0.760513  0.776555
         0.609107  0.247642  0.031650  0.727773
0.995370 0.299640  0.053523  0.565753  0.857235
         0.392132  0.832560  0.774653  0.213692
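A MultiIndex like the one above only relabels the rows; the data underneath is still a 2-D table, which is why pdf.shape stays two-dimensional. The quick check below rebuilds the question's frame (using .loc, since chained assignment such as pdf['a'][2:] = ... may not write back on newer pandas) and compares shapes. The name indexed is just an illustrative variable.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame with .loc instead of chained assignment
pdf = pd.DataFrame(np.random.rand(4, 5), columns=list('abcde'))
pdf.loc[2:, 'a'] = pdf.loc[0, 'a']  # rows 2-3 share one 'a' value
pdf.loc[:1, 'a'] = pdf.loc[1, 'a']  # rows 0-1 share another

# set_index returns a new frame; pdf keeps all five columns
indexed = pdf.set_index(['a', 'b'])

print(pdf.shape)                # (4, 5)
print(indexed.shape)            # (4, 3): 'a' and 'b' now live in the index
print(indexed.index.nlevels)    # 2
print(indexed.to_numpy().ndim)  # 2: the MultiIndex adds no array axis
```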


Each data series is grouped by the index ID 'a', and 'b' represents a time index for the other features of 'a'. Is there a way to get pandas to produce a 3-D numpy array that reflects the 'a' groupings? Currently it treats the data as two-dimensional, so pdf.shape outputs (4, 5). What I would like is for the array to have this ragged form, with a variable number of rows per group:

array([[[-1.38655912, -0.90145951, -0.95106951,  0.76570984],
        [-0.21004144, -2.66498267, -0.29255182,  1.43411576],
        [-0.21004144, -2.66498267, -0.29255182,  1.43411576]],

       [[ 0.0768149 , -0.7566995 , -2.57770951,  0.70834656],
        [-0.99097395, -0.81592084, -1.21075386,  0.12361382]]])
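The desired structure is ragged (3 rows in the first group, 2 in the second), so it cannot be a regular rectangular ndarray: on recent NumPy (1.24 and later) np.array([group_1, group_2]) raises ValueError for such inputs. A minimal sketch of one way to hold it anyway is an object array whose elements are 2-D arrays; group_1 and group_2 are made-up placeholders, not the question's data.

```python
import numpy as np

# Two hypothetical 'a' groups: 3 and 2 time steps, 4 features each
group_1 = np.zeros((3, 4))
group_2 = np.ones((2, 4))

# Allocate the outer container first, then fill it, so NumPy never
# tries to broadcast the inner arrays into one rectangular block
ragged = np.empty(2, dtype=object)
ragged[0] = group_1
ragged[1] = group_2

print(ragged.shape)     # (2,)
print(ragged[0].shape)  # (3, 4)
print(ragged[1].shape)  # (2, 4)
```

A plain Python list of 2-D arrays is an equally valid container; the object array mainly helps when downstream code expects an ndarray.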


Is there a native pandas way to do this? Note that the number of rows per 'a' grouping in the actual data is variable, so I cannot just transpose or reshape pdf.values. If there isn't a native way, what's the best method for iteratively constructing the arrays from hundreds of thousands of rows and hundreds of columns?
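If a list of per-group 2-D arrays (the ragged form above) is acceptable, one approach for large frames is to sort once, convert the whole frame with a single to_numpy() call, and cut that array at the rows where 'a' changes, instead of appending rows in a Python loop. This is a sketch assuming 'a' is the group key and 'b' the time column; split_by_group is a hypothetical helper name. The simpler [g.drop(columns='a').to_numpy() for _, g in df.groupby('a')] gives the same result but builds a DataFrame per group, which is typically slower when there are very many groups.

```python
import numpy as np
import pandas as pd


def split_by_group(df, key='a', order='b'):
    """Hypothetical helper: one 2-D array per distinct `key`, rows sorted by `order`.

    Every column except `key` is kept, so `order` stays as the first feature.
    """
    df = df.sort_values([key, order])
    keys = df[key].to_numpy()
    values = df.drop(columns=[key]).to_numpy()  # one conversion for all rows
    # a new group starts wherever the key differs from the previous row
    boundaries = np.flatnonzero(keys[1:] != keys[:-1]) + 1
    return np.split(values, boundaries)  # list of views into `values`


# Usage with groups of different lengths (3 rows and 2 rows)
df = pd.DataFrame({'a': [1, 1, 1, 2, 2],
                   'b': [2, 1, 3, 1, 2],
                   'c': [10, 20, 30, 40, 50]})
groups = split_by_group(df)
print([g.shape for g in groups])  # [(3, 2), (2, 2)]
```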

Answer
panel.values will return a numpy array directly. By necessity it will have a single dtype wide enough for all the data (the common upcast type), since everything is combined into one 3-d numpy array. It will be a new array, not a view of the pandas data (no matter the dtype).
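Panel was deprecated in pandas 0.20 and removed in 0.25, so panel.values is not available on current versions. A rough substitute is sketched below, assuming all non-key columns are numeric: it builds a rectangular (groups x time steps x features) array with NumPy and pads shorter groups with NaN, roughly how a rectangular Panel held missing entries. to_padded_3d is a hypothetical helper name. As with panel.values, the result is a newly allocated array with one dtype (float64 here, so NaN can be stored).

```python
import numpy as np
import pandas as pd


def to_padded_3d(df, key='a', order='b'):
    """Hypothetical helper: (n_groups, max_rows, n_features) float array, NaN-padded.

    One slice per distinct `key`, rows ordered by `order`; every column
    except `key` becomes a feature.
    """
    df = df.sort_values([key, order])
    # integer code per row (0..n_groups-1), in sorted key order
    group_ids, uniques = pd.factorize(df[key], sort=True)
    # position of each row inside its group: 0, 1, 2, ...
    pos = df.groupby(key, sort=True).cumcount().to_numpy()
    values = df.drop(columns=[key]).to_numpy(dtype=float)
    out = np.full((len(uniques), pos.max() + 1, values.shape[1]), np.nan)
    out[group_ids, pos] = values  # scatter each row into its (group, step) slot
    return out


# Usage: group 1 has 3 time steps, group 2 has 2, so group 2 gets one NaN row
df = pd.DataFrame({'a': [1, 1, 1, 2, 2],
                   'b': [1, 2, 3, 1, 2],
                   'c': [0.1, 0.2, 0.3, 0.4, 0.5]})
arr = to_padded_3d(df)
print(arr.shape)  # (2, 3, 2)
```

The padding trades memory for a regular shape; if groups vary a lot in length, the ragged list approach avoids storing the NaN filler.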