xdzzz xdzzz - 1 month ago 15
Python Question

pandas reorder subset of columns from a grouped data frame

I have forecast data that I grouped by month.
The original dataframe something like this:

>>clean_table_grouped[0:5]
STYLE COLOR SIZE FOR
MONTH 01/17 10/16 11/16 12/16
0 ####### ###### #### 0.0 15.0 15.0 15.0
1 ####### ###### #### 0.0 15.0 15.0 15.0
2 ####### ###### #### 0.0 15.0 15.0 15.0
3 ####### ###### #### 0.0 15.0 15.0 15.0
4 ####### ###### #### 0.0 15.0 15.0 15.0

>>clean_table_grouped.ix[0:,"FOR"][0:5]
MONTH 01/17 10/16 11/16 12/16
0 0.0 15.0 15.0 15.0
1 0.0 15.0 15.0 15.0
2 0.0 15.0 15.0 15.0
3 0.0 15.0 15.0 15.0
4 0.0 15.0 15.0 15.0


I simply want reorder these 4 columns in the follow way:

(keeping the rest of the dataframe untouched)

MONTH 10/16 11/16 12/16 01/17
0 15.0 15.0 15.0 0.0
1 15.0 15.0 15.0 0.0
2 15.0 15.0 15.0 0.0
3 15.0 15.0 15.0 0.0
4 15.0 15.0 15.0 0.0


My attempted solution was to reorder the columns of the subset following the post below:
How to change the order of DataFrame columns?

I went about it by grabbing the column list and sorting it first

>>for_cols = clean_table_grouped.ix[:,"FOR"].columns.tolist()
>>for_cols.sort(key = lambda x: x[0:2]) #sort by month ascending
>>for_cols.sort(key = lambda x: x[-2:]) #then sort by year ascending


Querying the dataframe works just fine

>>clean_table_grouped.ix[0:,"FOR"][for_cols]
MONTH 10/16 11/16 12/16 01/17
0 15.0 15.0 15.0 0.0
1 15.0 15.0 15.0 0.0
2 15.0 15.0 15.0 0.0
3 15.0 15.0 15.0 0.0
4 15.0 15.0 15.0 0.0


However, when I try to set values in the original table, I get a table of "NaN":

>>clean_table_grouped.ix[0:,"FOR"] = clean_table_grouped.ix[0:,"FOR"][for_cols]
>>clean_table_grouped.ix[0:,"FOR"]
MONTH 01/17 10/16 11/16 12/16
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
5 NaN NaN NaN NaN


I have also tried zipping to avoid chained syntax (.ix[][]).
This avoids the NaN, however, it doesn't change the dataframe -__-

>>for_cols = zip(["FOR", "FOR", "FOR", "FOR"], for_cols)
>>clean_table_grouped.ix[0:,"FOR"] = clean_table_grouped.ix[0:,for_cols]
>>clean_table_grouped.ix[0:,"FOR"]
MONTH 01/17 10/16 11/16 12/16
0 0.0 15.0 15.0 15.0
1 0.0 15.0 15.0 15.0
2 0.0 15.0 15.0 15.0
3 0.0 15.0 15.0 15.0
4 0.0 15.0 15.0 15.0


I realize I'm using ix to reassign values. However, I've used this technique in the past on dataframes that are not grouped and it has worked just fine.

If this question as already been answered in another post (in a CLEAR way), please provide the link. I searched but could not find anything similar.

EDIT:
I have found a solution. If you iterate through the columns
Thanks,

Answer

Sort the column names containing date strings and later use it as a subset to return the columns in that particular order:

from datetime import datetime
df[sorted(df.columns, key=lambda x: datetime.strptime(x, '%m/%y'))]

Image


Toy Data:

from datetime import datetime
np.random.seed(42)

cols = [['STYLE', 'COLOR', 'SIZE', 'FOR', 'FOR', 'FOR', 'FOR'],
        ['', '', '', '01/17', '10/16', '11/16', '12/16']]
tups = list(zip(*cols))
index = pd.MultiIndex.from_tuples(tups, names=[None, 'MONTH'])
clean_table_grouped = pd.DataFrame(np.random.randint(0, 100, (100, 7)), 
                                   index=np.arange(100), columns=index)
clean_table_grouped = clean_table_grouped.head()
clean_table_grouped

Image

Split the multi-index DF into two with the one containing the forecast values and the other the remaining DF.

for_df = clean_table_grouped[['FOR']]
clean_table_grouped = clean_table_grouped.drop(['FOR'], axis=1, level=0)

Forecast DF:

for_df

Image

Remaining DF:

clean_table_grouped

Image

Sorting the columns in the forecast DF by applying the same procedure as done in the pre-edited post.

order = sorted(for_df['FOR'].columns.tolist(), key=lambda x: datetime.strptime(x, '%m/%y'))

Making the DF in the same order by subsetting the sorted list of columns.

for_df = for_df['FOR'][order]

Concatenate the forecast DF with itself to create a multi-index like column.

for_df = pd.concat([for_df, for_df], axis=1, keys=['FOR'])

Finally, join them on the common index.

clean_table_grouped.join(for_df)

Image

Comments