dgikmo dgikmo - 3 months ago 12
Python Question

Split and merge pandas dataframe

I am trying to split and merge a Pandas dataframe.

The columns of the original data frame are arranged like so:

dataTime Record1Field1 ... Record1FieldN Record2Field1 ... Record1FieldN
time1 << record 1 data >> << record 2 data >>


I would like to take split the
Record2
fields into a separate data frame
tempdf
, indexed by the dataTime.
tempdf
will therefore look something like this:

dataTime Record2Field1 ... Record2FieldN
time1 << record 2 data >>


Once
tempdf
is populated, delete the Record2 columns from the original data frame. The first difficulty I'm having is in creating this
tempdf
which contains the record 2 data.

Then, I would like to rename the columns in
tempdf
so that they align with the
Record1
columns in the original data frame. (This portion I know how to do)

Finally I would like to merge
tempdf
back into the original data frame.

The end result should look something like this:

dataTime Record1Field1 ... Record1FieldN
time1 <<record 1 data>>
time1 <<record 2 data>>


So far I haven't determined a good method of doing this. Any help is appreciated! Thanks.

Answer

You could get all your Record2 values under the Record1 columns as follows:

Data Setup:

data = StringIO(
'''
dataTime Record1Field1 Record1Field2 Record1Field3 Record2Field1 Record2Field2 Record2Field3
01-01-2015 1 2 3 4 5 6 
''')

df = pd.read_csv(data, delim_whitespace=True, parse_dates=['dataTime'])
print (df)

    dataTime  Record1Field1  Record1Field2  Record1Field3  Record2Field1  \
0 2015-01-01              1              2              3              4   

   Record2Field2  Record2Field3  
0              5              6 

Operations:

df.set_index('dataTime', inplace=True)

# Filter column names corresponding to Record2
tempdf = df[[col for col in list(df) if col.startswith('Record2')]]

# Drop those columns after assigning to tempdf
df.drop(tempdf.columns, inplace=True, axis=1)

# Rename the column names for appending
tempdf.columns = [col for col in list(df) if col.startswith('Record1')]

# Concatenate row-wise
print (df.append(tempdf))

            Record1Field1  Record1Field2  Record1Field3
dataTime                                               
2015-01-01              1              2              3
2015-01-01              4              5              6