SharpObject - 3 months ago 23

Python Question

Given a dataframe with multiple columns of dictionaries, how can I sum and/or multiply the keys within the dataframes together to get one column

`A B`

{"ab":1, "b":2, "c":3} {"ab":1, "b":3, "c":5}

So added together you get another column

`C`

{"ab":2, "b":5, "c":8}

Or multiplied together you get

`C`

{"ab":1, "b":6, "c":15}

I know if they were just columns of numbers I could use sum/etc., but whats the best way if they're columns of dictionaries? A lambda function?

Answer

*Naive approach:*

Use `json_normalize`

which can handle *dict or list of dicts* as data input really well:

```
from pandas.io.json import json_normalize
processed_df = json_normalize(df.T.to_dict('list'), 0)
```

To find ** sum**:

```
processed_df.sum() # Append .to_dict() if you want to render it as a dictionary
ab 2
b 5
c 8
dtype: int64
```

To find ** product**:

```
processed_df.prod() # Append .to_dict() if you want to render it as a dictionary
ab 1
b 6
c 15
dtype: int64
```

It would be an overkill if the dataframe has too many columns/rows which could although be overcome by melting it into a long form having a single column header. But again, reshaping it back to align the matching rows and calculating would be too much of a deal.

*More generalized approach:*

Sample `DF`

:

```
df = pd.DataFrame({'A': [{"ab":1, "b":2, "c":3}, {'b':4, 'c':5, 'ab':6}],
'B': [{"ab":7, "b":8, "c":9}, {'b':10, 'c':11, 'ab':12}]})
df
```

Compute ** sum**:

```
df.stack().apply(pd.Series).sum(level=0)
```

Compute ** product**:

```
df.stack().apply(pd.Series).prod(level=0)
```

*Assigning* it back to a new column:

```
df['C'] = df.stack().apply(pd.Series).sum(level=0).to_dict('records')
df
```