abagshaw - 3 months ago
Javascript Question

JavaScript number data grouping and outlier removal

I have an array as follows:

var myArray = [3, 6, 8, 9, 16, 17, 19, 37]


I need to remove outliers and group the remaining data into whatever distinct groups appear. In this case 37 would be removed as an outlier, [3, 6, 8, 9] would be returned as the first group, and [16, 17, 19] would be returned as the second.

Here is a second example:

var mySecondArray = [80, 90, 100, 200, 280, 281, 287, 500, 510, 520, 800]


200 and 800 would be removed as outliers, [80, 90, 100] would be the first group, [280, 281, 287] the second, and [500, 510, 520] the third.

I have already written code that removes outliers at the extremes, which is simple enough using the first and third quartiles. In other words, it would have no problem removing 800 from mySecondArray as an outlier, but it would not remove 200.
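For reference, a quartile-based filter of the kind described above might look like the following. This is only a sketch: the helper names `quantile` and `removeIqrOutliers`, the linear-interpolation quartile method, and the conventional multiplier `k = 1.5` are all assumptions, not the asker's actual code.

```javascript
// Quantile by linear interpolation between the two nearest ranks.
// `sorted` must already be in ascending order.
function quantile(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed default.
function removeIqrOutliers(values, k = 1.5) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  return sorted.filter(v => v >= q1 - k * iqr && v <= q3 + k * iqr);
}

// 37 lies above the upper fence and is dropped.
console.log(removeIqrOutliers([3, 6, 8, 9, 16, 17, 19, 37]));
```

Because the fences sit outside the quartiles, a value that falls between Q1 and Q3 can never be flagged, however isolated it is. That is why an interior value such as 200 survives this kind of filter. Whether a value at the extreme is flagged depends on the quartile method and on `k`.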

I suppose an outlier could then be defined as a group with fewer than n members, so the real issue is: what is an efficient method to divide this data into an appropriate number of groups?
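The "group with fewer than n members" definition can be sketched as a two-step process: split the sorted data at large gaps, then discard the small groups. The function names `groupAndPrune` and `meanGap` are hypothetical, and using the mean gap between neighbours as the split threshold is just one possible assumption.

```javascript
// Split sorted numbers wherever the gap to the next value exceeds `maxGap`,
// then drop groups with fewer than `minSize` members.
function groupAndPrune(sorted, maxGap, minSize) {
  const groups = [];
  let current = [];
  for (let i = 0; i < sorted.length; i++) {
    current.push(sorted[i]);
    const isLast = i === sorted.length - 1;
    if (isLast || sorted[i + 1] - sorted[i] > maxGap) {
      groups.push(current);
      current = [];
    }
  }
  return groups.filter(g => g.length >= minSize);
}

// Hypothetical threshold choice: the mean gap between neighbours.
// The gaps telescope, so this is (last - first) / (count - 1).
function meanGap(sorted) {
  if (sorted.length < 2) return 0;
  return (sorted[sorted.length - 1] - sorted[0]) / (sorted.length - 1);
}

const data = [80, 90, 100, 200, 280, 281, 287, 500, 510, 520, 800];
console.log(groupAndPrune(data, meanGap(data), 2));
// [[80, 90, 100], [280, 281, 287], [500, 510, 520]]
```

Here 200 and 800 each end up alone in a group and are discarded by the `minSize` of 2. Both the threshold and the minimum size would need tuning for other data.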

Any help is much appreciated!

Answer

jsFiddle Demo

This is just a simple implementation. It may not be the perfect solution to this class of problems, but it should suffice for your examples, and it may work beyond them as well.

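By looking at the average distance between adjacent numbers and comparing it to the distance on either side of each number, it should be possible to remove outliers: a number is treated as an outlier when the gap on each side of it is larger than the average. The same metric can then be used for grouping, by starting a new group wherever a gap exceeds the average.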
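To make the metric concrete, the gaps for the first example can be inspected directly. This is a small illustrative sketch, separate from the implementation; the variable names are made up, and it assumes the mean is taken over the gaps themselves.

```javascript
const sample = [3, 6, 8, 9, 16, 17, 19, 37];

// Gap between each element and its successor (one fewer gap than elements).
const gaps = sample.slice(1).map((v, i) => v - sample[i]);

// Mean of the gaps; dividing by gaps.length is a choice made for this sketch.
const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;

// Gaps wider than the mean mark candidate group boundaries or outliers.
const wide = gaps.filter(g => g > mean);

console.log(gaps); // [3, 2, 1, 7, 1, 2, 18]
console.log(wide); // [7, 18]
```

The gap of 7 separates 9 and 16, each of which has a small gap on its other side, so it acts as a group boundary. The gap of 18 is the only gap next to 37, so 37 is isolated and is treated as an outlier.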

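// Sum the numeric entries, skipping undefined/NaN placeholders.
function Sum(arr){
  return arr.filter(i => !isNaN(i)).reduce((p,c) => p+c,0);
}
// Average over the numeric entries only, so a placeholder does not dilute the mean.
function Avg(arr){
  var nums = arr.filter(i => !isNaN(i));
  return Sum(nums) / nums.length;
}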
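// Split a sorted array into groups, starting a new group wherever
// the gap to the next element is larger than dist.
function groupby(arr,dist){
  var groups = [];
  var group = [];
  for(var i = 0; i < arr.length; i++){
    group.push(arr[i]);
    // The last element has no successor to compare against.
    if(arr[i+1] === undefined) continue;
    if(arr[i+1] - arr[i] > dist){
      groups.push(group);
      group = [];
    }
  }
  // Push the final (still open) group.
  groups.push(group);
  return groups;
}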
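// Expects arr to be sorted in ascending order.
function groupOutlier(arr){
  // Gap to the previous and to the next neighbour; undefined at the array ends.
  var distbefore = arr.map((c,i,a) => i === 0 ? undefined : c - a[i-1]);
  var distafter = arr.map((c,i,a) => i === a.length-1 ? undefined : a[i+1] - c);

  var avgdist = Avg(distafter);
  var isFar = d => d > avgdist;

  // Drop an element when every gap it has is larger than the average gap.
  // End elements have only one gap to check.
  var result = arr.filter((c,i) => {
    if(distbefore[i] === undefined) return !isFar(distafter[i]);
    if(distafter[i] === undefined) return !isFar(distbefore[i]);
    return !(isFar(distbefore[i]) && isFar(distafter[i]));
  });

  return groupby(result,avgdist);
}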

var myArray = [3, 6, 8, 9, 16, 17, 19, 37];

console.log(groupOutlier(myArray));
// [[3, 6, 8, 9], [16, 17, 19]]

var mySecondArray = [80, 90, 100, 200, 280, 281, 287, 500, 510, 520, 800];

console.log(groupOutlier(mySecondArray));
// [[80, 90, 100], [280, 281, 287], [500, 510, 520]]