Michael Gruenstaeudl Michael Gruenstaeudl - 7 months ago 59
Python Question

Evaluation of nested integer intervals in Python

This question is about interval comparison of nested integer intervals.

Assume three ranges of integers, which I call target ranges for sake of simplicity. These target ranges never overlap, but may be of different length.

> target1 = range(1,10000)
> target2 = range(10001,20000)
> target3 = range(20001,25000)

Further assume another range, which I call test range, that always has a smaller length than any of the target ranges, but which may cross into an adjacent target range.

> test1 = range(900,5000) # entirely in target1
> test2 = range(9900,10500) # mostly in target2, but crosses into target1

Is there a Python function that helps identify which target ranges a test range falls into? In case the test range crosses into an adjacent target range, only that target range shall be given that hosts the largest proportion of the test range.

> sought_function(test1, [target1, target2, target3])
# 1
> sought_function(test2, [target1, target2, target3])
# 2


In absence of a standard Python function for interval comparison of nested integer intervals, what code would you use? Below is some quick and clunky Python code for a function entitled nested_in_which that can certainly be improved.

def nested_in_which(test, targets):
for n, t in enumerate(targets):
if test[0] in t and test[-1] in t:
if test[0] in t and n < len(targets) and test[-1] in targets[n+1]:
return(n+1) # Overlap comparison not yet implemented


If you think of each range as a set. You want the target range that gives you the largest intersection with the test set.

So if you calculate the length of the intersection between each target and the test and return the index of the max intersection you should have what you want.

Here is some rough code that does it:

def which_range( testRange, *targetRanges ):
    testRange = set( testRange )
    tests = [ ( i, len( set( targetRange ).intersection( testRange ) ) ) for i, targetRange in enumerate( targetRanges ) ]
    return max( tests, key=lambda x: x[1] )[0]

>>> which_range( range(9900,10500), range(1,10000), range(10001,20000), range(20001,25000) )
1 # the second target range
>>> which_range( range(900,5000), range(1,10000), range(10001,20000), range(20001,25000) )
0 # the first target range