user2996994 - 24 days ago
R Question

R: checking whether the same numbers occur in multiple rows of a data frame

I have a data frame, nearest_neighbour, in which row i lists the nearest neighbours of point i. So for point 1, the first nearest neighbour is point 2, the second nearest neighbour is point 3, and so on.

What is the quickest way to loop through this and check whether groups of 4 points all share the same nearest neighbours, i.e. each point's three nearest neighbours are exactly the other three points in the group?
For example, point 1's three nearest neighbours are 2, 3 and 4; point 2's are 1, 3 and 4; and so on.

  which.1 which.2 which.3
1       2       3       4
2       1       4       3
3       1       4       2
4       3       1       2
5       2       4       6
6       7       5       2
6 7 5 2


For pairs (points that are each other's first nearest neighbour) this is easy with a single if statement:

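count <- 0
nn1 <- nearest_neighbour[[1]]  # first nearest neighbour of each point (which.1)
for (j in seq_along(nn1)) {
  # j is in a mutual pair if its first neighbour's first neighbour is j;
  # isTRUE() skips neighbours with no row here (e.g. point 7 gives NA)
  if (isTRUE(nn1[nn1[j]] == j)) {
    count <- count + 1  # each mutual pair is counted once per member
  }
}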


However, this approach becomes unwieldy for groups larger than two, because every extra member needs another layer of if statements.
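The pairwise check above can also be written without a loop by indexing the first-neighbour column with itself. A sketch, assuming points are numbered 1 to n so that a point number doubles as a row index; neighbours without a row in the excerpt (point 7) yield NA and are dropped.

```r
nearest_neighbour <- data.frame(
  which.1 = c(2, 1, 1, 3, 2, 7),
  which.2 = c(3, 4, 4, 1, 4, 5),
  which.3 = c(4, 3, 2, 2, 6, 2)
)

nn1 <- nearest_neighbour$which.1
# nn1[nn1] is the first neighbour of each point's first neighbour;
# point j is in a mutual pair when that comes back to j.
# Indexing past the last row (point 7) gives NA, hence na.rm = TRUE.
mutual <- nn1[nn1] == seq_along(nn1)
count <- sum(mutual, na.rm = TRUE)
count  # 2: points 1 and 2, each counted once
```

Like the loop, this still only handles pairs; larger groups need a set-based comparison such as the one in the answer.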

lmo
Answer

Here is a base R method using factor() and apply():

# df is the nearest_neighbour data frame from the question
groups <- factor(apply(cbind(df, seq_len(nrow(df))), 1,
                       function(i) paste(sort(i), collapse = "_")))

groups
      1       2       3       4       5       6 
1_2_3_4 1_2_3_4 1_2_3_4 1_2_3_4 2_4_5_6 2_5_6_7 
Levels: 1_2_3_4 2_4_5_6 2_5_6_7

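The anonymous function sorts a row and collapses it into a single string separated by underscores. It is applied to each row of the data frame after the row number (the point's own ID) has been appended as an extra column, so each key contains the point together with its neighbours. Points whose keys are identical share the same set of neighbours: in the example, points 1 to 4 all map to the level 1_2_3_4.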
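To pull out the groups of four from this result, one option is to count how often each key occurs with table(). Each key holds four point IDs, and every point's own ID is in its key, so a key shared by four points means those four points are exactly each other's three nearest neighbours. A sketch that rebuilds the data under the name df used in the answer:

```r
df <- data.frame(
  which.1 = c(2, 1, 1, 3, 2, 7),
  which.2 = c(3, 4, 4, 1, 4, 5),
  which.3 = c(4, 3, 2, 2, 6, 2)
)

# Same grouping as in the answer: each point plus its neighbours, sorted.
groups <- factor(apply(cbind(df, seq_len(nrow(df))), 1,
                       function(i) paste(sort(i), collapse = "_")))

# A key that occurs 4 times marks a closed group of 4 points.
counts <- table(groups)
closed <- names(counts)[counts == 4]
members <- which(groups %in% closed)

closed   # "1_2_3_4"
members  # rows (points) belonging to a closed group
```

For neighbour lists of a different length k, the same idea applies with counts == k + 1.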