
Python Question

How do you optimize this code?

At the moment it is running too slow for the amount of data that goes through this loop. This code runs 1-nearest neighbor: it predicts the label of `training_element` from the points in `p_data_set` and their labels in `p_label_set`.

```
import numpy as np
from scipy.spatial import distance

# [x] , [[x1],[x2],[x3]], [l1, l2, l3]
def prediction(training_element, p_data_set, p_label_set):
    temp = np.array([], dtype=float)
    for p in p_data_set:
        temp = np.append(temp, distance.euclidean(training_element, p))
    minIndex = np.argmin(temp)
    return p_label_set[minIndex]
```

Answer

Use a *k*-D tree for fast nearest-neighbour lookups, e.g. `scipy.spatial.cKDTree`:

```
import numpy as np
from scipy.spatial import cKDTree
# I assume that p_data_set is (nsamples, ndims)
tree = cKDTree(p_data_set)
# training_elements is also assumed to be (nsamples, ndims)
dist, idx = tree.query(training_elements, k=1)
# np.asarray allows indexing with the idx array even if p_label_set is a plain list
predicted_labels = np.asarray(p_label_set)[idx]
```
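To check that the tree lookup gives the same answer as a brute-force search, something like the sketch below could be used. The data here is synthetic and purely illustrative (the shapes, the seed, and the number of label classes are assumptions, not taken from the question), and `scipy.spatial.distance.cdist` stands in for the original loop as the brute-force reference.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist

# Synthetic stand-in data (assumed shapes, not from the question)
rng = np.random.default_rng(0)
p_data_set = rng.random((1000, 3))            # (nsamples, ndims) reference points
p_label_set = rng.integers(0, 5, size=1000)   # one integer label per reference point
training_elements = rng.random((50, 3))       # points whose labels we want to predict

# Brute force: full (50, 1000) distance matrix, then the nearest index per row
brute_idx = cdist(training_elements, p_data_set).argmin(axis=1)

# k-D tree: build once, then query all points in a single call
tree = cKDTree(p_data_set)
dist, tree_idx = tree.query(training_elements, k=1)

labels_brute = p_label_set[brute_idx]
labels_tree = p_label_set[tree_idx]
print(np.array_equal(labels_brute, labels_tree))
```

Building the tree has an up-front cost, so the benefit shows up mainly when many points are queried against the same `p_data_set`; for a handful of queries the brute-force version may be just as fast.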

Source (Stackoverflow)
