In order to make clustering a more feasible task, I want to remove items from an array if they have another item which is within some threshold in n-dimensional euclidean space. The input data into this truncation is an array of pixel-wise feature vectors. My first thought was to compute the pairwise euclidean distance matrix between all the items and then operate on them as such:
indices = list(range(len(X)))
dist_matrix = euclidean_distances(X,X)
index = 0
deletion = np.where(dist_matrix[index]<=threshold)
indices = [i for i in indices if i==index or i not in deletion]
index = indices[indices.index(index) + 1]
dictionary = 
for index in indices:
Depending on the number of dimensions n, number of points N, the size of the problem in each dimension L, and your acceptable separation distance d, one option would be to grid your space into boxes of dimension d and retain at most one point within each grid box. Memory requirement would change from O(N^2) to O((L/d)^n), and running time would change from O(N^2) to O(N + (L/d)^n), so it might be more efficient if L/d and n are not too large.
Alternately, it might be practical to use the following algorithm
for each point p in points for each point q in points if p <> q and p.dist(q) < Dmin q.delete
This should be O(N^2) running time and O(0) extra memory.