piRSquared piRSquared - 26 days ago 7
Python Question

roll value within grouped regions in a dataframe

consider the

df


idx = map('first {}'.format, range(2)) + map('last {}'.format, range(3))
df = pd.DataFrame(np.arange(25).reshape(5, -1), idx, idx)
df


enter image description here

I want to group the dataframe into four quadrants based on the text in the row and column headers. Meaning that the upper left quadrant consists of columns with
'first'
and rows with
'first'
. The upper right quadrant consists of columns with
'last'
and rows with
'first'
and so on.

Then within each group, I want to


  • roll each element one to right if it can

  • otherwise start on next row at the beggining if it can

  • otherwise start at the very beginning



This should help illustrate

enter image description here

The expected output should look like this.

enter image description here

Answer

Using a nested groupby-apply pattern and np.roll. Perform a groupby on the columns, followed by a groupby on the index to get the desired subgroups to roll. Then use np.roll to perform the roll, wrapping the output in a DataFrame since np.roll only returns an array.

def roll_frame(df, shift):
    return pd.DataFrame(np.roll(df, shift), index=df.index, columns=df.columns)

# Groupers for the index and the columns.
idx_groups = df.index.map(lambda x: x.split()[0])
col_groups = df.columns.map(lambda x: x.split()[0])

# Nested groupby, then perform the roll..
df = df.groupby(col_groups, axis=1) \
       .apply(lambda grp: grp.groupby(idx_groups).apply(roll_frame, 1))

Kind of gross, but gets the job done. The order in which you perform the nested groupby doesn't really matter.

The resulting output:

         first 0  first 1  last 0  last 1  last 2
first 0        6        0       9       2       3
first 1        1        5       4       7       8
last 0        21       10      24      12      13
last 1        11       15      14      17      18
last 2        16       20      19      22      23
Comments