shantanuo shantanuo - 1 year ago 96
Python Question

How to find values below (or above) average

As you can see from the following summary, the count for 1 Sep (1542677) is way below the average count per month.

from StringIO import StringIO

myst="""01/01/2016 8781262
01/02/2016 8958598
01/03/2016 8787628
01/04/2016 9770861
01/05/2016 8409410
01/06/2016 8924784
01/07/2016 8597500
01/08/2016 6436862
01/09/2016 1542677
u_cols=['month', 'count']

myf = StringIO(myst)
import pandas as pd
df = pd.read_csv(StringIO(myst), sep='\t', names = u_cols)

Is there a mathematical formula that can define this "way below or too high" (ambiguous) concept?

This is easy if I define a limit (for e.g. 9 or 10%). But I want the script to decide that for me and return the values if the difference between the lowest and second last lowest value is more than overall 5%. In this case the September month count should be returned.

Answer Source

A very common approach to filtering outliers is to use standard deviation. In this case, we will calculate a zscore which will quickly identify how many standard deviations away from the mean each observation is. We can then filter those observations that are greater than 2 standard deviations. For normally distributed random variables, this should happen approximately 5% of the time.

Define a zscore function

def zscore(s):
    return (s - np.mean(s)) / np.std(s)

Apply it to the count column


0    0.414005
1    0.488906
2    0.416694
3    0.831981
4    0.256946
5    0.474624
6    0.336390
7   -0.576197
8   -2.643349
Name: count, dtype: float64

Notice that the September observation is 2.6 standard deviations away.

Use abs and gt to identify outliers


0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8     True
Name: count, dtype: bool

Again, September comes back true.

Tie it all together to filter your original dataframe


enter image description here

filter the other way


enter image description here

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download