OrangeSherbet OrangeSherbet - 4 years ago 136
Python Question

Grow a pandas panel along major_axis with a stream of new data?

My application has a stream of incoming data of the form

name, datetime, {x, y, z}


or in other words, I receive single rows of data, with columns
name, datetime, x, y, z
. I get bursts of data, every few minutes, some of which is new.

I want to store this data in a pandas
Panel
. The structure of this storage panel is such that

panel.items - [name_1,name_2,...,name_n]
panel.major_axis - [datetime1,datetime2,...datetime_m]
panel.minor_axis - [x, y, z]


It seems
Panel
objects are not very resizable for efficiency reasons. However, it is not possible to know what
datetimes
the
major_axis
will contain. It's even possible I receive a new
datetime
between previous ones.

I do know what
items
and
minor_axis
will contain (although I would be happier without this restriction).

I want to grow this storage panel as necessary. Speed is not an issue. Open to other using other container types convertible to a panel. Any suggestions?




Ideas

I am playing with temporarily storing the data into a dict-of-dict-of-dict structure, perhaps converting to a
Panel
after each burst, and then somehow combining it with the storage
Panel
.

old_panel = old_panel.set_value(item, major, minor, value)
didn't seem to work like it should in the documentation, but also it seems so drastic to create a new panel every time (since the old one could be huge).

Answer Source

I'd say scrap the panel for now and use a dataframe with a pd.Multi-Index

sample data
assuming 'x', 'y', 'z' come in a tuple

data = [
    ['a', pd.Timestamp('2016-03-31'), (1, 2, 3)],
    ['a', pd.Timestamp('2016-04-30'), (1, 2, 3)],
    ['a', pd.Timestamp('2016-07-31'), (1, 2, 3)],
    ['a', pd.Timestamp('2016-05-31'), (1, 2, 3)],
    ['b', pd.Timestamp('2016-03-31'), (1, 2, 3)],
    ['b', pd.Timestamp('2016-05-31'), (1, 2, 3)],
    ['b', pd.Timestamp('2016-08-31'), (1, 2, 3)],
]

Build empty dataframe

mux = pd.MultiIndex.from_product([list('abc'), list('xyz')])
df = pd.DataFrame(columns=mux)

Iterate over data to simulate streaming data

for d in data:
    df.loc[d[1], pd.IndexSlice[d[0], list('xyz')]] = d[2]

results

print(df)

              a              b              c          
              x    y    z    x    y    z    x    y    z
2016-03-31    1    2    3    1    2    3  NaN  NaN  NaN
2016-04-30    1    2    3  NaN  NaN  NaN  NaN  NaN  NaN
2016-07-31    1    2    3  NaN  NaN  NaN  NaN  NaN  NaN
2016-05-31    1    2    3    1    2    3  NaN  NaN  NaN
2016-08-31  NaN  NaN  NaN    1    2    3  NaN  NaN  NaN

You could always get the panel like this

df.stack().to_panel()

<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 5 (major_axis) x 3 (minor_axis)
Items axis: a to c
Major_axis axis: 2016-03-31 00:00:00 to 2016-08-31 00:00:00
Minor_axis axis: x to z
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download