ShanZhengYang - 7 months ago
Python Question

How to unpack a pandas Panel created with a dictionary?

I have several .txt files in a subdirectory, /subdirect/.


These files are

file1.txt
file2.txt
file3.txt
file4.txt
...


Using glob, I can put these into a three-dimensional Panel, using each filename as the key in a dictionary of DataFrames.

import glob
import pandas as pd

dataframe = {filename: pd.read_csv(filename) for filename in glob.glob('*.txt')}  # dictionary of DataFrames keyed by filename
data = pd.Panel.from_dict(dataframe)  # create panel
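Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so the two lines above only run on older versions. A minimal sketch of a common substitute, assuming a newer pandas: passing the same kind of dictionary to pd.concat gives a single DataFrame whose outer index level is the filename. The small in-memory frames and the names frames and combined are hypothetical stand-ins for the files.

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_csv("file1.txt") and pd.read_csv("file2.txt")
frames = {
    "file1.txt": pd.DataFrame({"column1": [1, 2], "column2": [3, 4], "column3": [5, 6]}),
    "file2.txt": pd.DataFrame({"column1": [7, 8], "column2": [9, 10], "column3": [11, 12]}),
}

# Concatenating a dict uses its keys as the outer level of a MultiIndex,
# which plays roughly the role of the Panel's "items" axis.
combined = pd.concat(frames)
print(combined.index.nlevels)  # 2
print(combined.shape)  # (4, 3)
```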


Now I would like to unpack the Panel so that I can manipulate each DataFrame individually and plot the data.

for fname in data:
    df = pd.read_csv(fname)
    df['total_sum'] = df[["column1", "column2", "column3"]].sum(axis=1)  # sum total reads
    df.plot(kind='bar')


However, I do not seem to be unpacking the Panel correctly, as the dimensions have completely changed.

How does one unpack a pandas Panel?
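For reference, a minimal sketch of two ways to get the individual DataFrames back without a Panel, assuming the data was first collected in a dictionary as above. The data and the names frames, combined and unpacked are hypothetical; grouping on the outer index level of a pd.concat result is one possible approach, not the only one.

```python
import pandas as pd

# Hypothetical data standing in for the files read with pd.read_csv
frames = {
    "file1.txt": pd.DataFrame({"column1": [1, 2], "column2": [3, 4], "column3": [5, 6]}),
    "file2.txt": pd.DataFrame({"column1": [7, 8], "column2": [9, 10], "column3": [11, 12]}),
}
combined = pd.concat(frames)

# Option 1: the dictionary already holds each DataFrame under its filename.
for fname, df in frames.items():
    print(fname, df.shape)

# Option 2: split the combined frame on the outer index level (the filename)
# and drop that level so each piece looks like the original DataFrame.
unpacked = {fname: group.droplevel(0) for fname, group in combined.groupby(level=0)}
```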

Answer

How about reading the data files individually instead, since you don't seem to be interested in the Panel structure per se:

import glob
import pandas as pd

for filename in glob.glob('*.txt'):
    df = pd.read_csv(filename)
    df['total_sum'] = df[["column1", "column2", "column3"]].sum(axis=1)  # sum total reads
    df.plot(kind='bar')
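A self-contained sketch of the loop above that can be run as-is: it first writes two small hypothetical CSV files to a temporary directory, then keeps each processed DataFrame in a dictionary keyed by filename instead of a Panel. The column values and the names tmpdir and totals are assumptions for illustration; the plotting call is left as a comment so the example runs without a display.

```python
import glob
import os
import tempfile

import pandas as pd

# Write two small hypothetical CSV files so the loop has something to read.
tmpdir = tempfile.mkdtemp()
for i in (1, 2):
    pd.DataFrame(
        {"column1": [i, i], "column2": [10, 20], "column3": [100, 200]}
    ).to_csv(os.path.join(tmpdir, f"file{i}.txt"), index=False)

totals = {}
for filename in sorted(glob.glob(os.path.join(tmpdir, "*.txt"))):
    df = pd.read_csv(filename)
    # Row-wise sum across the three columns
    df["total_sum"] = df[["column1", "column2", "column3"]].sum(axis=1)
    totals[os.path.basename(filename)] = df
    # df.plot(kind="bar") would draw one figure per file here
```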

Alternatively, take a look at pd.Panel.to_frame() to convert a Panel to a DataFrame. For instance, with a Panel built from a dict of two DataFrames:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(20, 10)))

panel = pd.Panel.from_dict({'1': df, '2': df.add(10)})
print(panel)

<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 20 (major_axis) x 10 (minor_axis)
Items axis: 1 to 2
Major_axis axis: 0 to 19
Minor_axis axis: 0 to 9
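To make the three Panel axes concrete, here is a NumPy analogy (not Panel itself, which is gone from recent pandas): stacking the two frames gives an array whose axes are items x major_axis x minor_axis. The names frames and cube are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(20, 10)))
frames = {"1": df, "2": df.add(10)}

# Axis 0 = items, axis 1 = major_axis (rows), axis 2 = minor_axis (columns)
cube = np.stack([frame.to_numpy() for frame in frames.values()])
print(cube.shape)  # (2, 20, 10)
```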

Using to_frame() gives you a long-format DataFrame with one column per item (two here) and a MultiIndex with one entry per (row, column) pair, i.e. 20 × 10 = 200 entries. To plot, you could iterate over the columns of data_frame using .items() and use .unstack() to convert each one back into a format suitable for plotting:

data_frame = panel.to_frame()
data_frame.info()

MultiIndex: 200 entries, (0, 0) to (19, 9)
Data columns (total 2 columns):
1    200 non-null float64
2    200 non-null float64
dtypes: float64(2)
memory usage: 4.7+ KB

for item, series in data_frame.items():
    series.unstack().plot()  # back to 20 rows x 10 columns, one figure per item
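Without a Panel, a similar long format can be built directly from the dictionary. A minimal sketch, assuming a recent pandas: stacking each frame gives a Series indexed by (row, column) pairs, which matches the (major, minor) MultiIndex that to_frame() produced, and unstacking a column restores the original 20 x 10 frame. The names frames, long_df and restored are hypothetical, and plotting is omitted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(20, 10)))
frames = {"1": df, "2": df.add(10)}

# One column per item, MultiIndex of (row, column) pairs: 200 entries
long_df = pd.DataFrame({key: frame.stack() for key, frame in frames.items()})

# Unstacking each column gives back a 20 x 10 frame, ready for .plot()
restored = {key: series.unstack() for key, series in long_df.items()}
```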

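On performance: if you start from a Panel, summing directly on the Panel is faster than converting it with to_frame(), grouping and unstacking. In the timings below it is also faster than summing a single item's DataFrame taken from the unstacked result.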

%timeit panel.sum(axis=1)
10000 loops, best of 3: 111 µs per loop

%timeit panel.to_frame().groupby(data_frame.columns, axis=1).apply(lambda x: x.unstack(0).sum(axis=1))
100 loops, best of 3: 3.63 ms per loop

df = data_frame.unstack(0)
%timeit df.loc[:, '1'].sum(axis=1)
1000 loops, best of 3: 409 µs per loop
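panel.sum(axis=1) reduces over the major_axis, i.e. it sums down the rows of each item's DataFrame. A minimal sketch of the same reduction without a Panel, assuming the frames are kept in a dictionary: once per item with DataFrame.sum, and once on a 3D NumPy array, with one column per item in both cases. The names are hypothetical, the exact row/column layout of the Panel result is an assumption, and no timings are claimed; %timeit as above is the way to compare them on your own data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(20, 10)))
frames = {"1": df, "2": df.add(10)}

# Sum over the 20 rows (the major axis) of each item: 10 rows x 2 item columns
sums_from_dict = pd.DataFrame({key: frame.sum(axis=0) for key, frame in frames.items()})

# Same reduction on an array shaped (items, major, minor), transposed to minor x items
cube = np.stack([frame.to_numpy() for frame in frames.values()])
sums_from_cube = pd.DataFrame(cube.sum(axis=1).T, columns=list(frames))
```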