Wilmar van Ommeren - 9 months ago

Python Question

I have a large 2-D NumPy array (10000, 10000) with many regions (clusters of cells sharing the same cell value). What I want is to merge neighbouring regions that show more than 35% border overlap. The overlap is measured by dividing the length of the border shared with the neighbour by the total border length of the region.

I know how to detect the neighbouring regions (Look here), but I have no idea how to measure the border overlap.

As I am working with large arrays, a vectorized solution would be ideal.

`#input`

region_arr=np.array([[1,1,3,3],[1,2,2,3],[2,2,4,4],[5,5,4,4]])

Output of the neighbour detection script is a numpy 2-d array with the region in the first and the neighbour in the second column.

`#result of neighbour detection`

>>> region_neighbour=detect_neighbours(region_arr)

>>> region_neighbour

array([[1, 2],

[1, 3],

[2, 1],

[2, 3],

[2, 4],

[2, 5],

[3, 1],

[3, 2],

[3, 4],

[4, 2],

[4, 3],

[4, 5],

[5, 2],

[5, 4]])
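For readers without the linked script, here is a minimal sketch of what a `detect_neighbours` like this could look like. It is an assumption, not the script referred to above: it treats only edge-adjacent (4-connected) cells as neighbours and returns the pairs sorted by region, which is the layout shown here.

```python
import numpy as np

def detect_neighbours(region_arr):
    """Return sorted unique (region, neighbour) pairs for 4-connected cells."""
    a = np.asarray(region_arr)
    # Pair every cell with its right-hand and its lower neighbour.
    left, right = a[:, :-1].ravel(), a[:, 1:].ravel()
    top, bottom = a[:-1, :].ravel(), a[1:, :].ravel()
    first = np.concatenate([left, top])
    second = np.concatenate([right, bottom])
    # Keep only the pairs that cross a region boundary.
    differs = first != second
    pairs = np.column_stack([first[differs], second[differs]])
    # Add the mirrored pairs so each border is listed from both sides.
    pairs = np.vstack([pairs, pairs[:, ::-1]])
    # np.unique with axis=0 removes duplicates and sorts the rows.
    return np.unique(pairs, axis=0)

region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
region_neighbour = detect_neighbours(region_arr)
print(region_neighbour)
```

The four temporary slices are each about the size of the input, so for a (10000, 10000) array this trades memory for the absence of Python loops.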

I would like to add a column to the result of the neighbour detection that contains the fraction of the region's border that it shares with its neighbour.

In this example the desired output would look like this:

`#output`

>>> percentual_overlap=measure_border_overlap(region_arr,region_neighbour)

>>> percentual_overlap

array([[ 1. , 3. , 0.125 ],

[ 1. , 2. , 0.375 ],

[ 2. , 1. , 0.3 ],

[ 2. , 3. , 0.3 ],

[ 2. , 4. , 0.2 ],

[ 2. , 5. , 0.2 ],

[ 3. , 1. , 0.125 ],

[ 3. , 2. , 0.25 ],

[ 3. , 4. , 0.125 ],

[ 4. , 2. , 0.375 ],

[ 4. , 3. , 0.125 ],

[ 4. , 5. , 0.125 ],

[ 5. , 2. , 0.333333],

[ 5. , 4. , 0.166667]])

With this output it is relatively easy to merge the regions that overlap more than 35% (regions 1 and 2; regions 4 and 2). After the region merging the new array will look like this:

You can calculate the perimeter of each region by applying the function of pv..
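A plain NumPy sketch of the measure defined in the question (shared edges divided by the region's full perimeter) might look like the following. It is not the function mentioned above. The assumptions are mine: only edge-adjacent (4-connected) cells count, the outer edge of the array is part of a region's perimeter, and the sentinel label `outside` (hypothetical, default `-1`) is not used as a region value. The function name mirrors the question, but it takes only the label array, because the neighbour pairs fall out of the same pass.

```python
import numpy as np

def measure_border_overlap(region_arr, outside=-1):
    """Rows of (region, neighbour, shared edges / perimeter of region)."""
    # Pad with a sentinel label so the outer array edge counts as border.
    # Assumption: `outside` never occurs as a real region label.
    a = np.pad(np.asarray(region_arr), 1, mode="constant", constant_values=outside)
    # Every pair of edge-adjacent cells, listed in both directions.
    centers = np.concatenate([a[:, :-1].ravel(), a[:, 1:].ravel(),
                              a[:-1, :].ravel(), a[1:, :].ravel()])
    neighbours = np.concatenate([a[:, 1:].ravel(), a[:, :-1].ravel(),
                                 a[1:, :].ravel(), a[:-1, :].ravel()])
    # A border edge separates two labels; skip edges seen from the padding.
    keep = (centers != neighbours) & (centers != outside)
    pairs, counts = np.unique(
        np.column_stack([centers[keep], neighbours[keep]]),
        axis=0, return_counts=True)
    # Perimeter of a region = all of its border edges, outer ones included.
    regions, inverse = np.unique(pairs[:, 0], return_inverse=True)
    perimeter = np.bincount(inverse, weights=counts)
    fractions = counts / perimeter[inverse]
    # Report only real neighbours, not the padding.
    real = pairs[:, 1] != outside
    return np.column_stack([pairs[real], fractions[real]])

region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
overlap = measure_border_overlap(region_arr)
print(overlap)
```

Under these assumptions region 1 has a perimeter of 8 edges, 3 of them shared with region 2 and 1 with region 3, which gives 0.375 and 0.125 as in the table. Some other rows come out differently from the hand-written table (for instance 0.25 for region 4 toward region 2), so it is worth checking the edge counts for the pairs that sit near the 35% threshold.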

Answer

Take a look at the question "Count cells of adjacent numpy regions" for inspiration. Deciding how to merge based on such information is, I think, a problem with multiple answers; the result may not be unique and can depend on the order in which you proceed.

```
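import numpy as np
import numpy_indexed as npi

# x is the 2-D region label array (region_arr in the question)
x = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
# pair each cell with its left, right, lower and upper neighbour
neighbors = np.concatenate([x[:, :-1].flatten(), x[:, +1:].flatten(), x[+1:, :].flatten(), x[:-1, :].flatten()])
centers = np.concatenate([x[:, +1:].flatten(), x[:, :-1].flatten(), x[:-1, :].flatten(), x[+1:, :].flatten()])
# keep only the pairs that straddle a region border
border = neighbors != centers
# count the shared edges for every (neighbor, center) pair
(neighbors, centers), counts = npi.count((neighbors[border], centers[border]))
# total border length per region; the outer array edge is not counted here
region_group = npi.group_by(centers)
regions, neighbors_per_region = region_group.sum(counts)
fractions = counts / neighbors_per_region[region_group.inverse]
for result in zip(centers, neighbors, fractions):
    print(result)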
```
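Once the fractions are known, one possible merge step is a single pass with a union-find structure. This is only a sketch of one of the possible answers: it joins every pair above the threshold transitively, does not recompute the fractions after a merge, and keeps the smallest label of each group. The function name `merge_regions` and the `(region, neighbour, fraction)` row layout are assumptions taken from the question's table, not part of any library.

```python
import numpy as np

def merge_regions(region_arr, overlap_table, threshold=0.35):
    """Relabel regions joined by any pair whose overlap exceeds threshold."""
    region_arr = np.asarray(region_arr)
    labels = np.unique(region_arr)
    parent = {int(label): int(label) for label in labels}

    def find(label):
        # Follow parent links to the representative of the group.
        while parent[label] != label:
            parent[label] = parent[parent[label]]  # path halving
            label = parent[label]
        return label

    for region, neighbour, fraction in overlap_table:
        if fraction > threshold:
            a, b = find(int(region)), find(int(neighbour))
            if a != b:
                # Keep the smaller label as the representative.
                parent[max(a, b)] = min(a, b)

    # Vectorised relabelling through a lookup over the sorted unique labels.
    new_labels = np.array([find(int(label)) for label in labels])
    return new_labels[np.searchsorted(labels, region_arr)]

region_arr = np.array([[1, 1, 3, 3], [1, 2, 2, 3], [2, 2, 4, 4], [5, 5, 4, 4]])
# A few rows in the layout of the question's table, used here as sample input.
overlap_table = [(1, 2, 0.375), (2, 1, 0.3), (4, 2, 0.375), (5, 2, 0.333333)]
merged = merge_regions(region_arr, overlap_table)
print(merged)
```

With this sample input, regions 1, 2 and 4 end up sharing label 1, because 1 joins 2 and 4 joins 2. If a merge should instead be re-evaluated against the new, larger region, the fractions have to be recomputed after every step, and then the processing order starts to matter.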

Source (Stackoverflow)