Dhruv Ghulati - 1 year ago 51
Python Question

# Dictionary comprehension to calculate statistics across dict of dicts for each key in inner dicts

I have a dictionary like this:

``````property2region2value = {
'countryA':{
'a': 24,
'b': 56,
'c': 78
},
'countryB':{
'a': 3,
'b': 98
},
'countryC':{
'a': 121,
'b': 12121,
'c': 12989121,
'd':16171
},
'countryD':{
'a': 123,
'b': 1312,
'c': 1231
},
'countryE':{
'a': 1011,
'b': 1911
},
'countryF'{
'a': 1433,
'b': 19829,
'c': 1132,
'd':1791
}
}
``````

And I am trying to create multiple dictionaries that each contains certain statistics about the master dictionary (min, max, std etc) about all the values for that property (
`a','b','c`
etc) across all the countries (e.g.
`countryA, countryB`
) etc.

So for example:
`{'a': min of 'a' across all countries, 'b': min of 'b' across all countries....}`
might be one dictionary.

At the moment, I do a huge loop of code which a) isn't efficient and b) doesn't allow me to quickly calculate statistics e.g. using
`np.min()`
or
`np.max()`
as functions in numpy.

How can I use dictionary comprehension to achieve this? Current code for calculating
`min`
and
`max`
:

``````for country, property2value in property2region2value.items():
for property,value in property2value.items():
if property not in property2rangeMax:
property2rangeMax[property] = 0
if property2rangeMax[property]<value:
property2rangeMax[property]=value
if property not in property2rangeMin:
property2rangeMin[property] = 0
if property2rangeMin[property]>value:
property2rangeMin[property] = value
``````

You should use `pandas` for this task:

## Edit

``````In [3]: pd.DataFrame(property2region2value)
Out[3]:
countryA  countryB  countryC  countryD  countryE  countryF
a      24.0       3.0       121     123.0    1011.0      1433
b      56.0      98.0     12121    1312.0    1911.0     19829
c      78.0       NaN  12989121    1231.0       NaN      1132
d       NaN       NaN     16171       NaN       NaN      1791

In [4]: df.apply(np.min, axis=1)
Out[4]:
a       3.0
b      56.0
c      78.0
d    1791.0
dtype: float64

In [5]: df.apply(np.mean, axis=1)
Out[5]:
a    4.525000e+02
b    5.887833e+03
c    3.247890e+06
d    8.981000e+03
dtype: float64

In [6]: mean_dict = df.apply(np.mean, axis=1).to_dict()

In [7]: mean_dict
Out[7]: {'a': 452.5, 'b': 5887.833333333333, 'c': 3247890.5, 'd': 8981.0}
``````

Or, even more easily, you can transpose the DataFrame:

``````In [20]: df.T
Out[20]:
a        b           c        d
countryA    24.0     56.0        78.0      NaN
countryB     3.0     98.0         NaN      NaN
countryC   121.0  12121.0  12989121.0  16171.0
countryD   123.0   1312.0      1231.0      NaN
countryE  1011.0   1911.0         NaN      NaN
countryF  1433.0  19829.0      1132.0   1791.0

In [21]: df.T.describe()
Out[21]:
a             b             c             d
count     6.000000      6.000000  4.000000e+00      2.000000
mean    452.500000   5887.833333  3.247890e+06   8981.000000
std     612.768717   8215.770187  6.494154e+06  10168.195513
min       3.000000     56.000000  7.800000e+01   1791.000000
25%      48.250000    401.500000  8.685000e+02   5386.000000
50%     122.000000   1611.500000  1.181500e+03   8981.000000
75%     789.000000   9568.500000  3.248204e+06  12576.000000
max    1433.000000  19829.000000  1.298912e+07  16171.000000

In [22]: df.T.describe().to_dict()
Out[22]:
{'a': {'25%': 48.25,
'50%': 122.0,
'75%': 789.0,
'count': 6.0,
'max': 1433.0,
'mean': 452.5,
'min': 3.0,
'std': 612.76871656441472},
'b': {'25%': 401.5,
'50%': 1611.5,
'75%': 9568.5,
'count': 6.0,
'max': 19829.0,
'mean': 5887.833333333333,
'min': 56.0,
'std': 8215.770187065038},
'c': {'25%': 868.5,
'50%': 1181.5,
'75%': 3248203.5,
'count': 4.0,
'max': 12989121.0,
'mean': 3247890.5,
'min': 78.0,
'std': 6494153.687626767},
'd': {'25%': 5386.0,
'50%': 8981.0,
'75%': 12576.0,
'count': 2.0,
'max': 16171.0,
'mean': 8981.0,
'min': 1791.0,
'std': 10168.195513462553}}
``````

And if you want finer control, you can pick and choose:

``````In [24]: df.T.describe().loc[['mean','std','min','max'],:]
Out[24]:
a             b             c             d
mean   452.500000   5887.833333  3.247890e+06   8981.000000
std    612.768717   8215.770187  6.494154e+06  10168.195513
min      3.000000     56.000000  7.800000e+01   1791.000000
max   1433.000000  19829.000000  1.298912e+07  16171.000000

In [25]: df.T.describe().loc[['mean','std','min','max'],:].to_dict()
Out[25]:
{'a': {'max': 1433.0,
'mean': 452.5,
'min': 3.0,
'std': 612.76871656441472},
'b': {'max': 19829.0,
'mean': 5887.833333333333,
'min': 56.0,
'std': 8215.770187065038},
'c': {'max': 12989121.0,
'mean': 3247890.5,
'min': 78.0,
'std': 6494153.687626767},
'd': {'max': 16171.0,
'mean': 8981.0,
'min': 1791.0,
'std': 10168.195513462553}}
``````

Then you can very easily achieve whatever you want:

``````In [8]: df.apply(np.min)
Out[8]:
countryA      24.0
countryB       3.0
countryC     121.0
countryD     123.0
countryE    1011.0
countryF    1132.0
dtype: float64

In [9]: df.apply(np.max)
Out[9]:
countryA          78.0
countryB          98.0
countryC    12989121.0
countryD        1312.0
countryE        1911.0
countryF       19829.0
dtype: float64

In [10]: df.apply(np.std)
Out[10]:
countryA    2.217105e+01
countryB    4.750000e+01
countryC    5.620356e+06
countryD    5.424170e+02
countryE    4.500000e+02
countryF    7.960893e+03
dtype: float64
``````

You can even bring everything back to dictionaries with ease:

``````In [11]: df.apply(np.min).to_dict()
Out[11]:
{'countryA': 24.0,
'countryB': 3.0,
'countryC': 121.0,
'countryD': 123.0,
'countryE': 1011.0,
'countryF': 1132.0}
``````

Go nuts! All your data-processing needs will be easier:

``````In [12]: df.describe()
Out[12]:
countryA   countryB      countryC     countryD     countryE  \
count   3.000000   2.000000  4.000000e+00     3.000000     2.000000
mean   52.666667  50.500000  3.254384e+06   888.666667  1461.000000
std    27.153882  67.175144  6.489829e+06   664.322462   636.396103
min    24.000000   3.000000  1.210000e+02   123.000000  1011.000000
25%    40.000000  26.750000  9.121000e+03   677.000000  1236.000000
50%    56.000000  50.500000  1.414600e+04  1231.000000  1461.000000
75%    67.000000  74.250000  3.259408e+06  1271.500000  1686.000000
max    78.000000  98.000000  1.298912e+07  1312.000000  1911.000000

countryF
count      4.000000
mean    6046.250000
std     9192.447602
min     1132.000000
25%     1357.750000
50%     1612.000000
75%     6300.500000
max    19829.000000
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download