abagshaw - 1 year ago 108
Javascript Question

# JavaScript number data grouping and outlier removal

I have an array as follows:

`var myArray = [3, 6, 8, 9, 16, 17, 19, 37]`

I need to remove outliers and group the remaining data into whatever distinct groups appear. In this case,
`37`
would be removed as an outlier and
`[3, 6, 8, 9]`
would be returned as the first group and
`[16, 17, 19]`
would be returned as the second.

Here is a second example:

`var mySecondArray = [80, 90, 100, 200, 280, 281, 287, 500, 510, 520, 800]`

`200`
and
`800`
would be removed as outliers,
`[80, 90, 100]`
would be the first group,
`[280, 281, 287]`
would be the second and
`[500, 510, 520]`
would be the third.

I have already written code that removes outliers at the extremes, which is simple enough using the first and third quartiles. In other words, it would have no problem removing
`800`
from the
`mySecondArray`
as an outlier. But it would not remove
`200`
as an outlier.
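
For reference, a quartile-based filter of the kind described might look roughly like the sketch below. The original code is not shown, so the function names, the interpolation rule for the quartiles, and the fence multiplier `k` are all assumptions. It also illustrates why `200` survives: it lies between the first and third quartiles, so no fence multiplier can exclude it.

```javascript
// Linear-interpolation quantile of an ascending-sorted array
// (one of several common conventions; assumed here).
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Keep values inside [Q1 - k*IQR, Q3 + k*IQR].
// k = 1.5 is the conventional default; the asker's value is unknown.
function removeOuterOutliers(values, k = 1.5) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  return sorted.filter((v) => v >= q1 - k * iqr && v <= q3 + k * iqr);
}

// 37 lies above the upper fence (32.5) and is dropped.
const firstTrimmed = removeOuterOutliers([3, 6, 8, 9, 16, 17, 19, 37]);
console.log(firstTrimmed);

// With a tighter, purely illustrative k = 0.5, 800 is dropped but 200 is kept.
const secondTrimmed = removeOuterOutliers(
  [80, 90, 100, 200, 280, 281, 287, 500, 510, 520, 800], 0.5);
console.log(secondTrimmed);
```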

I suppose an outlier could then be defined as a group with fewer than
`n`
members, so the real question is: what is an efficient way to divide this data into an appropriate number of groups?
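
To make that definition concrete, here is a minimal sketch: split the sorted data wherever the gap between neighbours exceeds some threshold, then discard groups with fewer than `n` members. The names `splitByGap`, `dropSmallGroups`, `maxGap`, and `minSize` are hypothetical, and the threshold of 50 is picked by hand for this example; choosing it automatically is the open part of the question.

```javascript
// Split an ascending-sorted array wherever two consecutive values
// differ by more than maxGap.
function splitByGap(sorted, maxGap) {
  const groups = [];
  let current = [];
  sorted.forEach((value, i) => {
    if (i > 0 && value - sorted[i - 1] > maxGap) {
      groups.push(current);
      current = [];
    }
    current.push(value);
  });
  if (current.length > 0) groups.push(current);
  return groups;
}

// Treat any group with fewer than minSize members as outliers and drop it.
function dropSmallGroups(groups, minSize) {
  return groups.filter((group) => group.length >= minSize);
}

const data = [80, 90, 100, 200, 280, 281, 287, 500, 510, 520, 800];
const allGroups = splitByGap(data, 50);       // 200 and 800 end up alone
const keptGroups = dropSmallGroups(allGroups, 2);
console.log(keptGroups);
```

With these hand-picked parameters, `200` and `800` each form a one-element group and are discarded, leaving the three expected groups.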

Any help is much appreciated!

`jsFiddle Demo`

This is just a simple implementation. It may not be the perfect solution to this class of problem, but it should suffice for your examples, and it may work beyond them as well.

By taking the average distance between consecutive numbers and comparing it to the distance on either side of each number, it should be possible to remove outliers. It follows that the same metric can be used for grouping.
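
As a worked illustration of that metric on the first array, the sketch below lists the gaps, averages them, and flags the values whose neighbouring gaps all exceed the average. The variable names are illustrative, the input is assumed to be sorted ascending, and the average is taken over the seven actual gaps.

```javascript
const values = [3, 6, 8, 9, 16, 17, 19, 37];

// Gap between each value and its successor (one fewer entry than values).
const gaps = values.slice(1).map((v, i) => v - values[i]);

// Mean gap, used as the threshold: 34 / 7, roughly 4.86.
const meanGap = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;

// A value is flagged when every gap it touches exceeds the mean gap.
// End values touch only one gap.
const flagged = values.filter((v, i) => {
  const touching = [gaps[i - 1], gaps[i]].filter((g) => g !== undefined);
  return touching.every((g) => g > meanGap);
});

console.log(gaps);    // [3, 2, 1, 7, 1, 2, 18]
console.log(flagged); // [37]
```

Note that `16` is not flagged: the gap before it (7) exceeds the mean, but the gap after it (1) does not, so it is treated as the start of a new group instead.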

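```javascript
// Sum the numeric entries of an array, skipping undefined/NaN values.
function Sum(arr){
    return arr.filter(i => !isNaN(i)).reduce((p,c) => p+c,0);
}

// Average over the numeric entries only, so that the undefined
// placeholder at the end of a distance array does not skew the result.
function Avg(arr){
    var nums = arr.filter(i => !isNaN(i));
    return Sum(nums) / nums.length;
}

// Split a sorted array into groups wherever the gap between
// neighbouring values is larger than dist.
function groupby(arr,dist){
    var groups = [];
    var group = [];
    for(var i = 0; i < arr.length; i++){
        group.push(arr[i]);
        if(arr[i+1] == undefined) continue;   // last element: nothing to compare
        if(arr[i+1] - arr[i] > dist){
            groups.push(group);
            group = [];
        }
    }
    groups.push(group);
    return groups;
}

// Remove outliers, then group what is left.
// Assumes arr is sorted in ascending order.
function groupOutlier(arr){
    // Distance to the previous and to the next value (undefined at the ends).
    var distbefore = arr.map((c,i,a) => i == 0 ? undefined : c - a[i-1]);
    var distafter = arr.map((c,i,a) => i == a.length-1 ? undefined : a[i+1] - c);

    var avgdist = Avg(distafter);

    // A value is an outlier when every gap next to it exceeds the average gap.
    var result = arr.filter((c,i) => {
        if(distbefore[i] == undefined) return !(distafter[i] > avgdist);
        if(distafter[i] == undefined) return !(distbefore[i] > avgdist);
        return !(distbefore[i] > avgdist && distafter[i] > avgdist);
    });

    return groupby(result,avgdist);
}

var myArray = [3, 6, 8, 9, 16, 17, 19, 37];

console.log(groupOutlier(myArray));
// [[3, 6, 8, 9], [16, 17, 19]]

var mySecondArray = [80, 90, 100, 200, 280, 281, 287, 500, 510, 520, 800];

console.log(groupOutlier(mySecondArray));
// [[80, 90, 100], [280, 281, 287], [500, 510, 520]]
```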
