Wilmar van Ommeren - 2 months ago
Python Question

Measure border overlap between numpy 2d regions

I have a large 2-D NumPy array (10000, 10000) with many regions (clusters of cells with the same value). What I want is to merge neighbouring regions that share more than 35% of their border. The overlap should be measured by dividing the length of the border a region shares with a neighbour by the total border length of that region.
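
Written as a formula, under one interpretation: "border size" is counted in unit cell edges, and edges on the outer array boundary count as border. The question's own example for regions 1 and 3 (1/8) only works out that way. For a region r and a neighbour n:

$$
\mathrm{overlap}(r, n) = \frac{\lvert E(r, n) \rvert}{\lvert E(r) \rvert}
$$

Here E(r, n) is the set of cell edges separating a cell of r from a cell of n, and E(r) is the set of all edges separating a cell of r from anything that is not r, including the array edge. The measure is asymmetric because the denominator depends only on r, so overlap(r, n) and overlap(n, r) generally differ.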

I know how to detect the neighbouring regions (look here), but I have no idea how to measure the border overlap.

Because I am working with large arrays, a vectorized solution would be preferable.

Example

#input
import numpy as np
region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])

The output of the neighbour-detection script is a 2-D NumPy array with the region in the first column and the neighbour in the second.

#result of neighbour detection
>>> region_neighbour=detect_neighbours(region_arr)
>>> region_neighbour
array([[1, 2],
       [1, 3],
       [2, 1],
       [2, 3],
       [2, 4],
       [2, 5],
       [3, 1],
       [3, 2],
       [3, 4],
       [4, 2],
       [4, 3],
       [4, 5],
       [5, 2],
       [5, 4]])
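
The neighbour-detection script from the linked question is not reproduced here. As a rough stand-in, this is a hypothetical vectorized `detect_neighbours`, assuming 4-connectivity (cells that touch only diagonally are not neighbours). Because `np.unique(..., axis=0)` sorts rows lexicographically, on the example array it should return the same 14 pairs in the same order as above.

```python
import numpy as np


def detect_neighbours(region_arr):
    """Hypothetical sketch: unique (region, neighbour) pairs under 4-connectivity."""
    # Pair every cell with the cell to its right and the cell below it.
    a = np.concatenate([region_arr[:, :-1].ravel(), region_arr[:-1, :].ravel()])
    b = np.concatenate([region_arr[:, 1:].ravel(), region_arr[1:, :].ravel()])
    # Keep only pairs that straddle a region border.
    differ = a != b
    a, b = a[differ], b[differ]
    # Add both directions, then drop duplicates (np.unique also sorts the rows).
    pairs = np.stack([np.concatenate([a, b]), np.concatenate([b, a])], axis=1)
    return np.unique(pairs, axis=0)


region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
print(detect_neighbours(region_arr))
```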


I would like to add a column to the result of the neighbour detection that contains the percentage overlap between the region and its neighbour. For example, the overlap of region 1 with region 3 is 1/8 = 0.125: the common border (1 cell edge) divided by the total border size of region 1 (8 cell edges).
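
A minimal vectorized sketch of `measure_border_overlap`, assuming non-negative integer labels, 4-connectivity, and that edges on the outer array boundary count towards a region's total border (as the 1/8 example implies). It pads the array with an unused label, builds directed (center, neighbour) label pairs for every cell edge, takes per-region border lengths with `np.bincount`, and counts shared edges per pair with `np.unique` on a single integer code per pair. This is one possible implementation, not the method from the linked question.

```python
import numpy as np


def measure_border_overlap(region_arr, region_neighbour):
    """Return (region, neighbour, overlap) rows; overlap = shared edges / total border edges."""
    # Pad with an unused label so edges on the array boundary count as border.
    pad = int(region_arr.max()) + 1
    p = np.pad(region_arr, 1, mode="constant", constant_values=pad)

    # Directed (center, neighbour) label pairs over all horizontal and vertical edges.
    centers = np.concatenate([p[:, :-1].ravel(), p[:, 1:].ravel(),
                              p[:-1, :].ravel(), p[1:, :].ravel()])
    neighbours = np.concatenate([p[:, 1:].ravel(), p[:, :-1].ravel(),
                                 p[1:, :].ravel(), p[:-1, :].ravel()])

    # Keep edges that cross a label change and start from a real region.
    keep = (centers != neighbours) & (centers != pad)
    centers, neighbours = centers[keep], neighbours[keep]

    # Total border length per label (array index = label).
    n = pad + 1
    perimeter = np.bincount(centers, minlength=n)

    # Shared border length per (center, neighbour) pair, encoded as one integer.
    codes = centers.astype(np.int64) * n + neighbours
    uniq, shared = np.unique(codes, return_counts=True)

    # Look up each requested pair; pairs that never touch get 0.
    regions = region_neighbour[:, 0].astype(np.int64)
    query = regions * n + region_neighbour[:, 1]
    idx = np.clip(np.searchsorted(uniq, query), 0, len(uniq) - 1)
    common = np.where(uniq[idx] == query, shared[idx], 0)

    overlap = common / perimeter[regions]
    return np.column_stack([region_neighbour, overlap])


region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
# Overlap column for these two rows: 0.125 and 0.375.
print(measure_border_overlap(region_arr, np.array([[1, 3], [1, 2]])))
```

The temporaries hold roughly four entries per cell, so on a 10000 × 10000 array they will be on the order of gigabytes; processing the array in overlapping row blocks would be one way to bound memory.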

In this example the desired output would look like this:

#output
>>> percentual_overlap=measure_border_overlap(region_arr,region_neighbour)
>>> percentual_overlap
array([[ 1. , 3. , 0.125 ],
       [ 1. , 2. , 0.375 ],
       [ 2. , 1. , 0.3 ],
       [ 2. , 3. , 0.2 ],
       [ 2. , 4. , 0.2 ],
       [ 2. , 5. , 0.2 ],
       [ 3. , 1. , 0.125 ],
       [ 3. , 2. , 0.25 ],
       [ 3. , 4. , 0.125 ],
       [ 4. , 2. , 0.25 ],
       [ 4. , 3. , 0.125 ],
       [ 4. , 5. , 0.125 ],
       [ 5. , 2. , 0.333333],
       [ 5. , 4. , 0.166667]])


With this output it is relatively easy to merge the regions whose overlap exceeds 35%; in this example that is regions 1 and 2 (region 1 shares 3/8 = 0.375 of its border with region 2). After merging, the new array will look like this:

[Image: the region array after merging]
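
One way to do the merge step is a small union-find over the (region, neighbour, overlap) rows. This sketch assumes non-negative integer labels and an overlap array shaped like the desired output above; the name `merge_regions` is hypothetical. It merges transitively in a single pass, using overlaps measured on the original regions, and keeps the smallest label of each merged group. Recomputing overlaps after every merge, as the answer hints, can give a different result.

```python
import numpy as np


def merge_regions(region_arr, overlap, threshold=0.35):
    """Relabel region_arr so regions linked by overlap > threshold share one label."""
    n = int(region_arr.max()) + 1
    parent = np.arange(n)

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, frac in overlap:
        if frac > threshold:
            ra, rb = find(int(a)), find(int(b))
            if ra != rb:
                # The smaller label becomes the representative of the union.
                parent[max(ra, rb)] = min(ra, rb)

    # Resolve every label to its root, then relabel the whole array in one lookup.
    roots = np.array([find(i) for i in range(n)])
    return roots[region_arr]


region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
overlap = np.array([[1, 2, 0.375], [2, 1, 0.3], [4, 2, 0.25], [4, 3, 0.125]])
print(merge_regions(region_arr, overlap))
```

The Python loop runs over neighbour pairs rather than cells, and the final relabelling is a single vectorized lookup.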

Edit



You can calculate the perimeter of each region by applying the function of pv..
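
That function is not shown here. As a hedged alternative, this minimal NumPy sketch (the name `region_perimeters` is hypothetical) counts, for every cell, how many of its four faces touch a different label and sums those counts per label with `np.bincount`. Padding with an unused label makes edges on the array boundary count as border, matching the question's definition; labels are assumed to be non-negative integers.

```python
import numpy as np


def region_perimeters(region_arr):
    """Border length in cell edges per label (index = label), array edge included."""
    # Pad with a label that does not occur so boundary edges count as border.
    pad = region_arr.max() + 1
    p = np.pad(region_arr, 1, mode="constant", constant_values=pad)
    core = p[1:-1, 1:-1]
    # Number of faces (up, down, left, right) of each cell that touch another label.
    faces = ((core != p[:-2, 1:-1]).astype(int) + (core != p[2:, 1:-1])
             + (core != p[1:-1, :-2]) + (core != p[1:-1, 2:]))
    # Sum the face counts of all cells belonging to each label.
    return np.bincount(region_arr.ravel(), weights=faces.ravel())


region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
print(region_perimeters(region_arr))
```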

Answer

Take a look at the question "Count cells of adjacent numpy regions" for inspiration. Deciding how to merge based on this information is a problem with more than one answer, I think: depending on the order in which you proceed, the result may not be unique.

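import numpy as np
import numpy_indexed as npi

x = region_arr  # the labelled 2-D array from the question

# Pair every cell with its left, right, lower and upper neighbour (4-connectivity).
neighbors = np.concatenate([x[:, :-1].flatten(), x[:, +1:].flatten(), x[+1:, :].flatten(), x[:-1, :].flatten()])
centers   = np.concatenate([x[:, +1:].flatten(), x[:, :-1].flatten(), x[:-1, :].flatten(), x[+1:, :].flatten()])
# Keep only pairs that straddle a region border.
border = neighbors != centers

# Number of shared cell edges for each unique (neighbor, center) pair.
(neighbors, centers), counts = npi.count((neighbors[border], centers[border]))
# Total border of each center region with other regions, broadcast back to the pairs.
region_group = npi.group_by(centers)
regions, neighbors_per_region = region_group.sum(counts)
fractions = counts / neighbors_per_region[region_group.inverse]
for result in zip(centers, neighbors, fractions):
    print(result)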
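
For readers without numpy_indexed, the `npi.count` and `npi.group_by(...).sum` calls above can be approximated with plain `np.unique` and `np.bincount`; this is a sketch under that assumption, and the function name `answer_fractions_numpy` is made up. One difference from the question's definition is worth noting: only pairs of adjacent cells are generated, so edges on the outer array boundary never count, and region 1 gets 1/4 for region 3 rather than the question's 1/8. Padding the array with an unused label, and dropping pairs whose center is that label, would include them.

```python
import numpy as np


def answer_fractions_numpy(x):
    """Pure-NumPy approximation of the answer's computation.

    Fractions are relative to the border shared with other regions only;
    edges on the outer array boundary are not counted.
    """
    # Same directed (neighbor, center) pairs as in the answer.
    neighbors = np.concatenate([x[:, :-1].ravel(), x[:, 1:].ravel(), x[1:, :].ravel(), x[:-1, :].ravel()])
    centers = np.concatenate([x[:, 1:].ravel(), x[:, :-1].ravel(), x[:-1, :].ravel(), x[1:, :].ravel()])
    border = neighbors != centers

    # Stand-in for npi.count: unique (neighbor, center) rows and their counts.
    pairs, counts = np.unique(np.stack([neighbors[border], centers[border]], axis=1),
                              axis=0, return_counts=True)
    neighbors, centers = pairs[:, 0], pairs[:, 1]

    # Stand-in for npi.group_by(centers).sum(counts) plus .inverse.
    _, inverse = np.unique(centers, return_inverse=True)
    per_region = np.bincount(inverse, weights=counts)
    fractions = counts / per_region[inverse]
    return centers, neighbors, fractions


region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
for result in zip(*answer_fractions_numpy(region_arr)):
    print(result)
```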