Sharath Sharath - 3 months ago 14
R Question

Update a column in df2 by matching patterns in columns in df1 & df2 using R

I have 2 data frames like this

TEAM <- c("PE","PE","MPI","TDT","HPT","ATD")
CODE <- c(NA,"F","A","H","G","D")
df1 <- data.frame(TEAM,CODE)

CODE <- c(NA,"F100","A234","D664","H435","G123","A666","D345","G324",NA)
TEAM <- c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA)
df2 <- data.frame(CODE,TEAM)


I am trying to update the TEAM in df2 by matching the first letter in code column in df1 with the code column in df2

My desired output for df2

CODE TEAM
1 NA PE
2 F100 PE
3 A234 MPI
4 D664 ATD
5 H435 TDT
6 G123 HPT
7 A666 MPI
8 D345 ATD
9 G324 HPT
10 NA PE


I am trying this way with sqldf but it is not right

library(sqldf)
df2 <- sqldf(c("update df2 set TEAM =
case
when CODE like '%F%' then 'PE'
when CODE like '%A%' then 'MPI'
when CODE like '%D%' then 'ATD'
when CODE like '%G%' then 'HPT'
when CODE like '%H%' then 'TDT'
else 'NA'
end"))


Can someone help me provide some directions on achieving this without sqldf?

Answer

Using match and substr (both in base R):

df2$TEAM = df1$TEAM[match(substr(df2$CODE, 1, 1), df1$CODE)]

df2
#    CODE TEAM
# 1  <NA>   PE
# 2  F100   PE
# 3  A234  MPI
# 4  D664  ATD
# 5  H435  TDT
# 6  G123  HPT
# 7  A666  MPI
# 8  D345  ATD
# 9  G324  HPT
# 10 <NA>   PE

This is expedient for a single case - if you're doing things like this frequently I would encourage you to just extract the first letter of code into its own column, CODE_1, and then do a regular merge or join.