AaronDT AaronDT - 3 months ago 9
Python Question

Adding Columns from different dataframes (with matching date) stored in a python dictionary

I have columns with the same name in several dataframes. these dataframes are stored in a dictionary. now i want to add the columns with the same name (and the respective date) across those dataframes in the dictionary and store the result in a new dataframe.

The code I have so far doest get me very far...

#create a new empty dataframe
sum_df = pd.DataFrame()
# my dataframes are stored in the dictionary frames_dict
for tables in frames_dict:
df = frames_dict[tables]
df = df[(df['date'] >= '01.01.2010') & (df['date'] <'01.01.2011')]
#here I filter for all columns starting with "a4_"
filter_col = [col for col in list(df) if col.startswith('a4_')]
df2 = df[["date","filter_col"]
sum_df = sum_df + df2


Any suggestions on how to tackle such a problem?

Answer
# Initialize 'sum_df'
sum_df = pd.DataFrame(columns=['date'])

# Iterate over dataframes of dictionary
for i, tables in enumerate(frames_dict):

    # Create dataframe
    df = frames_dict[tables]

    # Filter rows by 'date'
    df = df[(df['date'] >= '01.01.2010') & (df['date'] <'01.01.2011')] 

    # Filter for all columns starting with "a4_"
    filter_col = [col for col in list(df) if col.startswith('a4_')]

    # Keep only proper cols
    df2 = df[['date'] + filter_col]

    # Join new columns from dictionary to old 'sum_df' dataframe
    if i == 0:
        sum_df = df2.rename(columns={i:'{}_{}'.format(i, tables) for i in filter_col}).copy()
    else:
        df2 = df2.rename(columns={i:'{}_{}'.format(i, tables) for i in filter_col})
        sum_df = df2.merge(sum_df, how='outer', on=['date']) #, suffixes=('_{}'.format(tables), '_y'))
        # Use either 'suffix' for renaming or 'df2 = df2.rename()' or both...
Comments