DataTx - 3 months ago
R Question

Using dplyr to return rows where the first characters of two columns match but the two columns DO NOT match

I have the following data frame:

df <- structure(list(traffic_Count_Street = c("16th St", "17th St",
"Agnes St", "Ayers St", "Ayers St", "Ayers St", "Ayers St", "Baldwin Blvd",
"Baldwin Blvd", "Baldwin Blvd","S Brahma Blvd"),
unit_Street = c("Baldwin Blvd", "Baldwin Blvd", "Baldwin Blvd", "Baldwin Blvd", "Baldwin Blvd",
"Baldwin Blvd", "Baldwin Blvd", "Baldwin Blvd", "Baldwin Blvd",
"Baldwin Blvd","S 14th St")), .Names = c("traffic_Count_Street", "unit_Street"
), row.names = c(NA, 11L), class = "data.frame")

   traffic_Count_Street  unit_Street
1               16th St Baldwin Blvd
2               17th St Baldwin Blvd
3              Agnes St Baldwin Blvd
4              Ayers St Baldwin Blvd
5              Ayers St Baldwin Blvd
6              Ayers St Baldwin Blvd
7              Ayers St Baldwin Blvd
8          Baldwin Blvd Baldwin Blvd
9          Baldwin Blvd Baldwin Blvd
10         Baldwin Blvd Baldwin Blvd
11        S Brahma Blvd    S 14th St


and I would like to return the rows where the two columns do not match, but the first character of each column does match.

The result would look like:

  traffic_Count_Street unit_Street
1        S Brahma Blvd   S 14th St


I have the following, but I am not sure if it's correct.

library(dplyr)
result <- df %>%
  # note: [1] takes the first element of each column (the whole string), not its first character
  filter(traffic_Count_Street != unit_Street & traffic_Count_Street[1] == unit_Street[1])
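To see why that attempt does not do what was intended, here is a small base R sketch (the vector x is a hypothetical example, not part of the question). Indexing with [1] picks the first element of a vector, whereas substr(x, 1, 1) picks the first character of every element. With the question's data, traffic_Count_Street[1] is "16th St" and unit_Street[1] is "Baldwin Blvd", so the second condition is a single FALSE and the filter returns no rows.

```r
# Hypothetical example vector for illustration
x <- c("S Brahma Blvd", "Baldwin Blvd")

first_element <- x[1]            # whole first string: "S Brahma Blvd"
first_chars   <- substr(x, 1, 1) # first character of each string: "S" "B"

print(first_element)
print(first_chars)
```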

Answer

We can use substr to extract the first character of each column, compare them with ==, and combine that with the other comparison from the OP's code inside filter.

df %>% 
    filter(substr(traffic_Count_Street, 1, 1) == substr(unit_Street, 1, 1) & 
            traffic_Count_Street != unit_Street)
#  traffic_Count_Street unit_Street
#1        S Brahma Blvd   S 14th St
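The condition inside filter is just a logical vector with one value per row. A base R sketch that builds it step by step on the question's data (same_first, different and keep are hypothetical names chosen for illustration):

```r
# Recreate the question's data; stringsAsFactors = FALSE keeps older R versions
# from turning the columns into factors with different level sets
df <- data.frame(
  traffic_Count_Street = c("16th St", "17th St", "Agnes St", "Ayers St", "Ayers St",
                           "Ayers St", "Ayers St", "Baldwin Blvd", "Baldwin Blvd",
                           "Baldwin Blvd", "S Brahma Blvd"),
  unit_Street = c(rep("Baldwin Blvd", 10), "S 14th St"),
  stringsAsFactors = FALSE
)

# TRUE where the first characters agree (rows 8 to 11)
same_first <- substr(df$traffic_Count_Street, 1, 1) == substr(df$unit_Street, 1, 1)

# TRUE where the full street names differ (rows 1 to 7 and 11)
different <- df$traffic_Count_Street != df$unit_Street

# Both conditions must hold, which leaves only row 11
keep <- same_first & different
print(which(keep))
print(df[keep, ])
```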

Or using data.table

library(data.table)
# setDT() converts df to a data.table in place (by reference)
setDT(df)[df[, Reduce(`!=`, .SD) & substr(.SD[[1]], 1, 1) == substr(.SD[[2]], 1, 1)]]
#   traffic_Count_Street unit_Street
#1:        S Brahma Blvd   S 14th St
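In the data.table call, .SD holds the two columns as a list, and Reduce(`!=`, .SD) folds != over that list. With exactly two columns this is the same as comparing them directly, as this base R check with hypothetical vectors a and b shows:

```r
# Hypothetical vectors standing in for the two columns
a <- c("Ayers St", "Baldwin Blvd", "S Brahma Blvd")
b <- c("Baldwin Blvd", "Baldwin Blvd", "S 14th St")

# Reduce() with no init applies `!=` to the two list elements: a != b
via_reduce <- Reduce(`!=`, list(a, b))
direct <- a != b

print(identical(via_reduce, direct))  # TRUE
print(via_reduce)                     # TRUE FALSE TRUE
```

With more than two columns the fold would compare a logical result against the next character column, so this shortcut is specific to the two-column case here.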

Or using base R

subset(df, substr(traffic_Count_Street, 1, 1) == substr(unit_Street, 1, 1) &
           traffic_Count_Street != unit_Street)
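As a further base R option (not part of the original answer), startsWith(), available since R 3.3.0, can express the first-character test directly. A minimal sketch on a hypothetical four-row subset of the question's data:

```r
# Hypothetical four-row subset of the question's data
df <- data.frame(
  traffic_Count_Street = c("16th St", "Agnes St", "Baldwin Blvd", "S Brahma Blvd"),
  unit_Street = c("Baldwin Blvd", "Baldwin Blvd", "Baldwin Blvd", "S 14th St"),
  stringsAsFactors = FALSE
)

# startsWith() is vectorised over both arguments, so each street is checked
# against the first character of unit_Street in the same row
same_first <- startsWith(df$traffic_Count_Street, substr(df$unit_Street, 1, 1))

res <- df[same_first & df$traffic_Count_Street != df$unit_Street, ]
print(res)
```

One difference to keep in mind: if unit_Street were an empty string, substr() would return "" and startsWith(x, "") is TRUE, whereas the substr() == substr() comparison would be FALSE.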