user2958395 - 1 year ago 64
Python Question

# Determine if one range is within another

If there is a file with ranges sorted by the first column (no overlap of ranges):

``````1 10
12 15
18 19
``````

And another, sorted by the first column (can have overlaps):

``````1 5
2 10
12 13
13 20
``````

I would like to determine for each line (range)in the second file, if this line(range) intersects with any of the ranges in the first file. I did the following so far

``````df_1 = pd.read_csv('range1.txt',sep=' ')

for i in xrange(len(df_1)):
start_1 = df_1.iloc[i,0]
stop_1 = df_1.iloc[i, 1]
for j in xrange(len(df_2)):
start_2 = df_2.iloc[j,0]
stop_2 = df_2.iloc[j, 1]
if start_2 > stop_1:
break
elif stop_2 < start_1:
continue
else:
# add ranges from second file to list
``````

This I know can be terribly inefficient, so I was wondering if there is a more computationally efficient/faster way to solve this.

@Olivier Pellier-Cuit has provided a link to fast overlap test. If you need membership check instead of overlap test, use this algorithm.

So using this algorithm we can do the following:

``````df1['m'] = (df1.a + df1.b)
df1['d'] = (df1.b - df1.a)

df2['m'] = (df2.a + df2.b)
df2['d'] = (df2.b - df2.a)

df2[['m','d']].apply(lambda x: (np.abs(df1.m - x.m) < df1.d +x.d).any(), axis=1)
``````

PS i've slightly simplified the calculations of `m` and `d` by getting rid of `division by 2`, because it can be done eliminating common terms.

Output:

``````In [105]: df2[['m','d']].apply(lambda x: (np.abs(df1.m - x.m) < df1.d +x.d).any(), axis=1)
Out[105]:
0     True
1     True
2     True
3     True
4    False
dtype: bool
``````

setup:

``````df1 = pd.read_csv(io.StringIO("""
a b
1 10
12 15
18 19
"""), delim_whitespace=True)

a b
1 5
2 10
12 13
13 20
50 60
"""), delim_whitespace=True)
``````

NOTE: i've intentionally added a pair (50, 60) to the DF2, which doesn't overlap with any interval from DF1

Data frames with calculated `m` and `d` columns:

``````In [106]: df1
Out[106]:
a   b   m  d
0   1  10  11  9
1  12  15  27  3
2  18  19  37  1

In [107]: df2
Out[107]:
a   b    m   d
0   1   5    6   4
1   2  10   12   8
2  12  13   25   1
3  13  20   33   7
4  50  60  110  10
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download