rojas - 14 days ago
Python Question

Identifying values that stick out

Given a dict:

data = {'18': [3.89, 1.28], '20': [1.39, 3.15], '15': [1.42, 3.10]}


I want to pick out items that clearly differ from the rest, such as '18'. Ideally I would specify ALLOWED_DISCREPANCY (set to 0.5 for this demo), a threshold that determines what does and does not stick out compared to the rest of the values.

The '18' entry with its 3.89 is clearly off here because the majority have values around 1.4 (comparing either value from each list is enough to conclude this) and the difference, abs(3.89 - 1.4), is greater than 0.5 (the maximum allowed).

Answer

Compute the mean of the values.

>>> from numpy import mean
>>> data = {'18': [3.89, 1.28], '20': [1.39, 3.15], '15': [1.42, 3.10]}
>>> avg = mean([x for sublist in data.values() for x in sublist])
>>> avg
2.3716666666666666

Set the threshold and build a new dictionary that maps each original key to the list of its values that differ from the mean by more than the threshold. Here are two examples:

>>> thresh = 0.5
>>> {k:[x for x in v if abs(x-avg) > thresh] for k, v in data.items()}
{'18': [3.89, 1.28], '15': [1.42, 3.1], '20': [1.39, 3.15]}
>>>
>>> thresh = 1
>>> {k:[x for x in v if abs(x-avg) > thresh] for k, v in data.items()}
{'18': [3.89, 1.28], '15': [], '20': []}
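
If you would rather follow the question's description more literally, here is a rough sketch that compares each value with the "majority" value at the same list position instead of with the overall mean. It assumes the median at each position is a reasonable stand-in for what the majority looks like, and that every list has the same length. The function name find_outliers is made up for illustration; ALLOWED_DISCREPANCY is the name from the question.

```python
from statistics import median

ALLOWED_DISCREPANCY = 0.5  # threshold named in the question

data = {'18': [3.89, 1.28], '20': [1.39, 3.15], '15': [1.42, 3.10]}


def find_outliers(data, allowed):
    """Return the keys whose values stray from the per-position median.

    Hypothetical helper: assumes all lists have equal length and that
    the median at each position represents the majority.
    """
    # zip(*values) groups the first elements together, then the second ones
    medians = [median(column) for column in zip(*data.values())]
    # A key sticks out if any of its values is further than `allowed`
    # from the median at the same position
    return sorted(
        key for key, values in data.items()
        if any(abs(x - m) > allowed for x, m in zip(values, medians))
    )


print(find_outliers(data, ALLOWED_DISCREPANCY))  # prints ['18']
```

With the sample data the per-position medians are 1.42 and 3.10, so only '18' exceeds the allowed discrepancy. The median is used here rather than the mean because a single extreme value such as 3.89 pulls the mean toward itself.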