Nicholas Tong Nicholas Tong - 1 month ago 10
C# Question

Optimal solution to balancing out a set of numbers

So I'm writing for a side project and trying to optimise:

Given a set of n numbers (e.g. [4, 10, 15, 25, 3]), we want to make each number be roughly the same within a given tolerance (i.e. if we wanted exact, then it should be 11.4 in the above example).

We can add/remove from one and add to another. For example, we can -5 from [3] and +5 to [1] which would give us [9, 10, 10, 25, 3].

The constraint that I have is that we want the minimal number of "transfers" between each number (e.g. if we do -3.6 from [3], then it counts as one "transfer").

Not fussed about performance (most I can see it going to is a set of 50 numbers max) but really want to keep the transfers to a minimum.

We can assume the tolerance is +/- 1 to start but can dynamically change.

Answer Source

The goal of the algorithm is to make sure that each of the numbers in the list are roughly the same within a given tolerance. Thus, if the tolerance is zero, all the numbers must be equal to the average of all the values in the list (which will remain constant throughout the algorithm). Taking in account the tolerance, all numbers in the list must belong to the inclusive interval [average - 0.5*TOLERANCE, average + 0.5*TOLERANCE].

The main iteration of the algorithm involves retrieving the maximum and minimum values and "transferring" just enough from the maximum to the minimum so that the value furthest from the average (this can be either the minimum or the maximum) falls in the required interval. This process iterates till the maximum and minimum values are not more than TOLERANCE units away from each other.

Pseudocode for the algorithm will look as follows:

target = average of the values in the list

while dist(max, min) > TOLERANCE
    x = maximum of dist(max, target) and dist(min, target)
    transfer (x - 0.5*TOLERANCE) units from maximum into minimum

dist(a, b) can be defined simply as abs(a - b)

This algorithm runs in about O(n^2) time on average, requiring a bit more than n iterations, where n is the number of values.

This algorithm requires less than half the number of iterations the naive sub-optimal approach of averaging out only the minimum and maximum values in each iteration takes.