user2996994 - 4 months ago 21

R Question

I have a data frame, nearest_neighbour, which lists the nearest neighbours of a point. So for point 1, the 1st nearest neighbour is point 2, the second nearest neighbour is point 3, and so on.

What is the quickest way to loop through this and check if 4 points all share the same nearest neighbours?

E.g., point 1's three nearest neighbours are 2, 3 and 4; point 2's nearest neighbours are 1, 3 and 4, etc.

```
  which.1 which.2 which.3
1       2       3       4
2       1       4       3
3       1       4       2
4       3       1       2
5       2       4       6
6       7       5       2
```

I can do it easily with if statements for just two neighbours:

```
count <- 0
for (j in 1:length(nearest_neighbour[[1]])) {
  if (nearest_neighbour[[1]][nearest_neighbour[[1]][j]] == j) {
    count <- count + 1
  }
}
```

However, this method seems silly for more than two, as it ends up needing a lot of if statements.

Answer

Here is a base R method using `factor` and `apply`:

```
# df is the data frame of neighbours (nearest_neighbour in the question)
groups <- factor(apply(cbind(df, seq_len(nrow(df))), 1,
                       function(i) paste(sort(i), collapse="_")))
groups
#       1       2       3       4       5       6
# 1_2_3_4 1_2_3_4 1_2_3_4 1_2_3_4 2_4_5_6 2_5_6_7
# Levels: 1_2_3_4 2_4_5_6 2_5_6_7
```

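The inner function sorts a vector and collapses the result into a single string separated by underscores. It is applied to each row of a modified copy of the data frame, in which the row number (the point's own ID) has been appended as an extra column. A point together with its neighbours therefore always produces the same string regardless of order, so points that share the same set end up with the same factor level.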
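To get from these labels to the count the question asks for, one option is to tabulate the labels and keep those that occur exactly as many times as the group has members. The following is a minimal sketch under some assumptions: the data frame is rebuilt from the example in the question, and the names `ids`, `k`, `group_sizes`, `complete`, `n_complete` and `members` are introduced here for illustration only.

```r
# Example data from the question (column names as shown there).
nearest_neighbour <- data.frame(
  which.1 = c(2, 1, 1, 3, 2, 7),
  which.2 = c(3, 4, 4, 1, 4, 5),
  which.3 = c(4, 3, 2, 2, 6, 2)
)

# Group size: a point plus its listed neighbours (4 in this example).
k <- ncol(nearest_neighbour) + 1

# Same labelling step as above, with the point IDs kept in a variable.
ids <- seq_len(nrow(nearest_neighbour))
groups <- factor(apply(cbind(nearest_neighbour, ids), 1,
                       function(i) paste(sort(i), collapse = "_")))

# A label that occurs k times means all k points in that set list each other.
group_sizes <- table(groups)
complete <- names(group_sizes)[group_sizes == k]

n_complete <- length(complete)           # number of fully mutual groups
members <- split(ids, groups)[complete]  # point IDs in each such group
```

Because each label already contains the point's own ID, a label can occur at most `k` times, so `group_sizes == k` picks out exactly the sets in which every point lists all the others. The same code should work for a different number of neighbour columns, since `k` is derived from `ncol()`.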