Wilmar van Ommeren - 2 months ago 25
Python Question

# Measure border overlap between numpy 2d regions

I have a large 2D numpy array (10000, 10000) containing many regions (clusters of cells with the same cell value). What I want is to merge neighbouring regions that show more than 35% border overlap. This overlap should be measured by dividing the size of the border shared with the neighbour by the total border size of the region.

I know how to detect the neighbouring regions (Look here), but I have no idea how to measure the border overlap.

Since I am working with large arrays, a vectorized solution would be ideal.

# Example

``````
# input
import numpy as np
region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
``````

The output of the neighbour detection script is a 2D numpy array with the region in the first column and its neighbour in the second column.

``````
# result of neighbour detection
>>> region_neighbour = detect_neighbours(region_arr)
>>> region_neighbour
array([[1, 2],
       [1, 3],
       [2, 1],
       [2, 3],
       [2, 4],
       [2, 5],
       [3, 1],
       [3, 2],
       [3, 4],
       [4, 2],
       [4, 3],
       [4, 5],
       [5, 2],
       [5, 4]])
``````

I would like to add a column to the result of the neighbour detection that contains the fractional overlap between the region and its neighbour. For example, the overlap between region 1 and region 3 = common border size / total border size of region 1 = 1/8 = 0.125.
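As a rough illustration of the measure described above, here is a minimal NumPy sketch. It assumes 4-connectivity (diagonal contact is not counted), integer labels greater than 0 so that 0 can serve as a padding sentinel, and that array edges count toward a region's total border, as the 1/8 example implies. `border_overlap` is a hypothetical name, not the `measure_border_overlap` function asked for.

```python
import numpy as np

def border_overlap(region_arr):
    """Rows of (region, neighbour, shared border / total border of region)."""
    # Pad with a sentinel label (0, assumed unused) so array edges count as border.
    padded = np.pad(region_arr, 1, mode="constant", constant_values=0)
    # Every horizontally and vertically adjacent cell pair, in both directions.
    a = np.concatenate([padded[:, :-1].ravel(), padded[:, 1:].ravel(),
                        padded[:-1, :].ravel(), padded[1:, :].ravel()])
    b = np.concatenate([padded[:, 1:].ravel(), padded[:, :-1].ravel(),
                        padded[1:, :].ravel(), padded[:-1, :].ravel()])
    # Keep edges that cross a region boundary and start in a real region.
    keep = (a != b) & (a != 0)
    pairs, counts = np.unique(np.stack([a[keep], b[keep]], axis=1),
                              axis=0, return_counts=True)
    # Total border per region: all of its boundary edges, array edge included.
    total = np.bincount(pairs[:, 0], weights=counts)
    fractions = counts / total[pairs[:, 0]]
    # Drop the rows whose "neighbour" is the padding sentinel.
    real = pairs[:, 1] != 0
    return np.column_stack([pairs[real], fractions[real]])

region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
overlap = border_overlap(region_arr)
print(overlap[overlap[:, 0] == 1])  # the rows for region 1
```

For region 1 this gives 3/8 = 0.375 with region 2 and 1/8 = 0.125 with region 3. Values for other pairs depend on whether diagonal contact is counted, which this sketch does not do. On a (10000, 10000) array the concatenated edge lists take several times the memory of the input, so processing in blocks may be necessary.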

In this example the desired output would look like this:

``````
# output
>>> percentual_overlap = measure_border_overlap(region_arr, region_neighbour)
>>> percentual_overlap
array([[ 1.       ,  3.       ,  0.125   ],
       [ 1.       ,  2.       ,  0.375   ],
       [ 2.       ,  1.       ,  0.3     ],
       [ 2.       ,  3.       ,  0.3     ],
       [ 2.       ,  4.       ,  0.2     ],
       [ 2.       ,  5.       ,  0.2     ],
       [ 3.       ,  1.       ,  0.125   ],
       [ 3.       ,  2.       ,  0.25    ],
       [ 3.       ,  4.       ,  0.125   ],
       [ 4.       ,  2.       ,  0.375   ],
       [ 4.       ,  3.       ,  0.125   ],
       [ 4.       ,  5.       ,  0.125   ],
       [ 5.       ,  2.       ,  0.333333],
       [ 5.       ,  4.       ,  0.166667]])
``````

With this output it is relatively easy to merge the regions that overlap by more than 35% (regions 1 and 2; regions 4 and 2). After merging the regions, the new array will look like this:

# Edit

You can calculate the perimeter of each region by applying the function posted by pv.
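The function by pv. is not reproduced here. As a stand-in, the following is a minimal sketch of one way to get per-region perimeters with plain NumPy, assuming non-negative integer labels, 4-connectivity, and that edges on the array boundary count toward the perimeter. `region_perimeters` is a hypothetical name.

```python
import numpy as np

def region_perimeters(region_arr):
    """Perimeter (number of boundary cell edges) per label; index = label."""
    # Surround the array with a sentinel label (-1) so outer edges are boundary.
    padded = np.pad(region_arr, 1, mode="constant", constant_values=-1)
    core = padded[1:-1, 1:-1]
    # For each cell, count how many of its four neighbours carry another label.
    edges = ((core != padded[:-2, 1:-1]).astype(np.int64)
             + (core != padded[2:, 1:-1])
             + (core != padded[1:-1, :-2])
             + (core != padded[1:-1, 2:]))
    # Sum the per-cell counts by label.
    return np.bincount(core.ravel(), weights=edges.ravel())

region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
perimeters = region_perimeters(region_arr)
print(perimeters[1])  # total border size of region 1
```

For the example array, region 1 comes out with a perimeter of 8, the denominator used in the 1/8 example.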

Take a look at "Count cells of adjacent numpy regions" for inspiration. Deciding how to merge based on such information is, I think, a problem with multiple answers; the result may not be unique, since it can depend on the order in which you proceed.

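``````
import numpy as np
import numpy_indexed as npi

# Assumption: x is the 2D label array; here, the example array from the question.
x = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])

# Pair every cell with each of its four direct neighbours.
neighbors = np.concatenate([x[:, :-1].flatten(), x[:, +1:].flatten(), x[+1:, :].flatten(), x[:-1, :].flatten()])
centers   = np.concatenate([x[:, +1:].flatten(), x[:, :-1].flatten(), x[:-1, :].flatten(), x[+1:, :].flatten()])
# Keep only the pairs that straddle a boundary between two regions.
border = neighbors != centers

# Count how often each (neighbour, center) combination occurs.
(neighbors, centers), counts = npi.count((neighbors[border], centers[border]))
# Sum the counts per region: the border shared with other regions (array edges are not included).
region_group = npi.group_by(centers)
regions, neighbors_per_region = region_group.sum(counts)
fractions = counts / neighbors_per_region[region_group.inverse]
for result in zip(centers, neighbors, fractions):
    print(result)
``````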
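To make the order-dependence point concrete, here is a minimal sketch of one possible merging policy: every pair whose fraction exceeds the threshold is joined transitively with a small union-find, in a single pass and without recomputing overlaps after each merge. Other policies are equally defensible. `merge_regions` is a hypothetical name, and the two input rows are taken from the question's output table.

```python
import numpy as np

def merge_regions(region_arr, overlap, threshold=0.35):
    """Relabel regions so pairs whose overlap exceeds threshold share a label."""
    n_labels = int(region_arr.max()) + 1
    parent = np.arange(n_labels)

    def find(i):
        # Follow parent links to the representative label, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for region, neighbour, fraction in overlap:
        if fraction > threshold:
            a, b = find(int(region)), find(int(neighbour))
            # Keep the smaller label as representative (an arbitrary choice).
            parent[max(a, b)] = min(a, b)

    # Map every label to its representative, then relabel the whole array at once.
    lookup = np.array([find(i) for i in range(n_labels)])
    return lookup[region_arr]

region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
# Rows (region, neighbour, fraction) quoted from the question's output table.
overlap = np.array([[1, 2, 0.375], [4, 2, 0.375]])
merged = merge_regions(region_arr, overlap)
print(merged)
```

With these two rows, regions 1, 2 and 4 end up under the single label 1, because both merges involve region 2. A policy that recomputes the fractions after each merge could give a different result.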