user - 2 months ago 13

Python Question

How do you optimize this code?

At the moment it is running too slow for the amount of data that goes through this loop. This code runs 1-nearest neighbor: it predicts the label of `training_element` from the closest point in `p_data_set`.

    import numpy as np
    from scipy.spatial import distance

    # [x] , [[x1],[x2],[x3]], [l1, l2, l3]
    def prediction(training_element, p_data_set, p_label_set):
        temp = np.array([], dtype=float)
        for p in p_data_set:
            temp = np.append(temp, distance.euclidean(training_element, p))
        minIndex = np.argmin(temp)
        return p_label_set[minIndex]

Answer

Use a *k*-D tree for fast nearest-neighbour lookups, e.g. `scipy.spatial.cKDTree`:

```
import numpy as np
from scipy.spatial import cKDTree

# I assume that p_data_set is (nsamples, ndims)
tree = cKDTree(p_data_set)
# training_elements is also assumed to be (nsamples, ndims)
dist, idx = tree.query(training_elements, k=1)
# Convert the labels to an array so they can be indexed with the idx array
predicted_labels = np.asarray(p_label_set)[idx]
```
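
To sanity-check the tree-based lookup, here is a minimal self-contained sketch that compares it with a vectorized brute-force 1-NN. The synthetic data, the array sizes, and the helper names `predict_brute_force` and `predict_kdtree` are assumptions made up for illustration; they are not part of the original question.

```python
import numpy as np
from scipy.spatial import cKDTree


def predict_brute_force(queries, p_data_set, p_label_set):
    """1-NN via all pairwise squared distances (hypothetical baseline helper)."""
    # diff has shape (nqueries, nsamples, ndims)
    diff = queries[:, None, :] - p_data_set[None, :, :]
    # Sum of squares over the last axis gives squared Euclidean distances
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)
    return p_label_set[np.argmin(sq_dist, axis=1)]


def predict_kdtree(queries, p_data_set, p_label_set):
    """1-NN using a k-D tree built once over the labelled points."""
    tree = cKDTree(p_data_set)
    _, idx = tree.query(queries, k=1)
    return p_label_set[idx]


# Synthetic data, invented purely for this example
rng = np.random.default_rng(0)
p_data_set = rng.random((2000, 3))            # (nsamples, ndims)
p_label_set = rng.integers(0, 5, size=2000)   # one integer label per sample
training_elements = rng.random((200, 3))      # points to classify

labels_tree = predict_kdtree(training_elements, p_data_set, p_label_set)
labels_brute = predict_brute_force(training_elements, p_data_set, p_label_set)

# Both methods should agree unless two points are exactly equidistant
print(np.array_equal(labels_tree, labels_brute))
```

The main saving comes from building the tree once and querying all points in a single call, rather than growing an array with `np.append` inside a Python loop. Note that k-D trees tend to lose their advantage over brute force as the number of dimensions grows, so it is worth timing both on your own data.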

Source (Stackoverflow)
