Oscar Smith - 8 months ago 74

Python Question

I am making a music recognition program, and as part of it, I need to find the largest connected areas of a numpy array from a png (2200x1700 pixels). My current solution is the following.

```
labels, nlabels = ndimage.label(blobs)
cutoff = len(blobs)*len(blobs[0]) / nlabels
blobs_found = 0
x = []
t1 = time()
for n in range(1, nlabels+1):
    squares = np.where(labels==n)
    if len(squares[0]) < cutoff:
        blobs[squares] = 0
    else:
        blobs_found += 1
        blobs[squares] = blobs_found
        x.append(squares - np.amin(squares, axis=0, keepdims=True))
nlabels = blobs_found
print(time() - t1)
```

This works, but it takes ~6.5 seconds to run. Is there a way I could remove the loop from this code (or otherwise speed it up)?

Answer

You can get the size (in pixels) of each labelled region with:

```
import numpy
import scipy.ndimage
unique_labels = numpy.unique(labels)
label_sizes = scipy.ndimage.sum(numpy.ones_like(blobs), labels, unique_labels)
```
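As a quick check, the counts can be verified on a small hand-made array. The sketch below uses a made-up `blobs` array (not the image from the question), calls `scipy.ndimage.sum` directly, and also shows `numpy.bincount` as an alternative way to count pixels per label. Both the test array and the `sizes_by_bincount` name are illustrative assumptions.

```python
import numpy
import scipy.ndimage

# Hypothetical 4x5 test image: two separate regions of nonzero pixels.
blobs = numpy.array([[1, 1, 0, 0, 0],
                     [1, 0, 0, 1, 1],
                     [0, 0, 0, 1, 1],
                     [0, 0, 0, 1, 1]])

labels, nlabels = scipy.ndimage.label(blobs)

# Size of every label, including the background label 0.
unique_labels = numpy.unique(labels)
label_sizes = scipy.ndimage.sum(numpy.ones_like(blobs), labels, unique_labels)

# Alternative: bincount counts how often each label value occurs.
# Index 0 is the background, index n is region n.
sizes_by_bincount = numpy.bincount(labels.ravel())
```

Here the two regions have 3 and 6 pixels, and the remaining 11 pixels belong to the background label 0.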

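The largest label (or labels, if several tie) will then be the following. Note that `unique_labels` includes the background label 0 whenever the image has background pixels, so the background itself can come out as the largest: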

```
unique_labels[label_sizes == numpy.max(label_sizes)]
```
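
To remove the loop from the question entirely, one option is to compute all region sizes at once and relabel through a lookup table. This is a sketch rather than part of the original answer: `keep_large_blobs` is a hypothetical helper, the cutoff (image area divided by the number of regions) is copied from the question, and `scipy.ndimage.find_objects` is used here to get bounding boxes instead of the per-region coordinate lists built in the question.

```python
import numpy as np
from scipy import ndimage


def keep_large_blobs(blobs):
    """Zero out small regions and renumber the rest 1..k without a Python loop.

    Hypothetical helper; the cutoff mirrors the one used in the question.
    """
    labels, nlabels = ndimage.label(blobs)
    if nlabels == 0:
        return np.zeros_like(labels), 0, []

    # sizes[n] is the pixel count of label n; sizes[0] is the background.
    sizes = np.bincount(labels.ravel(), minlength=nlabels + 1)

    cutoff = blobs.size / nlabels   # image area / number of regions
    keep = sizes >= cutoff
    keep[0] = False                 # never keep the background

    # Lookup table: old label -> new label (0 for dropped regions).
    lookup = np.zeros(nlabels + 1, dtype=labels.dtype)
    lookup[keep] = np.arange(1, keep.sum() + 1)

    # One indexing pass over the image replaces the per-label loop.
    relabeled = lookup[labels]

    # Bounding-box slices of each kept region, in new-label order.
    boxes = ndimage.find_objects(relabeled)
    return relabeled, int(keep.sum()), boxes
```

Calling `relabeled, nlabels, boxes = keep_large_blobs(blobs)` returns a new array instead of modifying `blobs` in place, and `relabeled[boxes[i]] == i + 1` gives a cropped mask of region `i + 1`. The work is done in a few whole-array passes rather than one `np.where` scan per label, which is where the loop in the question spends its time.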