E. V. E. V. - 1 year ago 67
Python Question

Extracting groups of 3 columns from dataframe based on partial names

I have a dataframe containing marker positions in xyz, for 24 markers, in every frame of a 10 second (100Hz) timeseries. Like this:

LHEE_x LHEE_y LHEE_z LTOE_x LTOE_y LTOE_z RHEE_x RHEE_y
0 -347.013 526.914 27.256 -138.267 578.015 39.859 -352.402 297.541
1 -347.035 526.934 27.261 -138.271 578.074 39.874 -352.409 297.548
2 -347.041 526.933 27.265 -138.260 578.073 39.898 -352.402 297.533
3 -347.030 526.928 27.251 -138.242 578.079 39.905 -352.427 297.535
4 -347.029 526.939 27.222 -138.244 578.072 39.915 -352.445 297.525
.....


I want to extract columns that contain positions from one marker at a time (i.e. LHEE_x, LHEE_y and LHEE_z) to new dataframes, and preferrably give the new dataframes the name of the marker (i.e. LHEE).

I know I can do this "manually" by

LHEE=pos_xyz.iloc[:,0:3]
or
LHEE=pos_xyz['LHEE_x','LHEE_y','LHEE_z']


but since I have a ton of markers this would be a hassle..

Also I would want to make a function that does this so I can use it on dataframes containing marker positions from different markers.

What would be the most efficient way to do this?
This question
How to select DataFrame columns based on partial matching?
seemed promising, but I need the result to be a dataframe.
I've tried using loops and grouping, but I can't seem to find a way that works properly.

Answer Source

I think you need create Multiindex by split column names first:

df.columns = df.columns.str.split('_', expand=True)
print (df)
      LHEE                      LTOE                      RHEE         
         x        y       z        x        y       z        x        y
0 -347.013  526.914  27.256 -138.267  578.015  39.859 -352.402  297.541
1 -347.035  526.934  27.261 -138.271  578.074  39.874 -352.409  297.548
2 -347.041  526.933  27.265 -138.260  578.073  39.898 -352.402  297.533
3 -347.030  526.928  27.251 -138.242  578.079  39.905 -352.427  297.535
4 -347.029  526.939  27.222 -138.244  578.072  39.915 -352.445  297.525

And then create dictionary of DataFrames by dict comprehension:

dfs = {x:df[x] for x in df.columns.levels[0]}
print (dfs)
{'RHEE':          x        y
0 -352.402  297.541
1 -352.409  297.548
2 -352.402  297.533
3 -352.427  297.535
4 -352.445  297.525, 'LTOE':          x        y       z
0 -138.267  578.015  39.859
1 -138.271  578.074  39.874
2 -138.260  578.073  39.898
3 -138.242  578.079  39.905
4 -138.244  578.072  39.915, 'LHEE':          x        y       z
0 -347.013  526.914  27.256
1 -347.035  526.934  27.261
2 -347.041  526.933  27.265
3 -347.030  526.928  27.251
4 -347.029  526.939  27.222}

print (dfs['LHEE'])
         x        y       z
0 -347.013  526.914  27.256
1 -347.035  526.934  27.261
2 -347.041  526.933  27.265
3 -347.030  526.928  27.251
4 -347.029  526.939  27.222
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download