R question, asked by pyne 3 months ago

R: matching string data by column and rows

I'm trying to match two columns of string data where one column has more non-empty values than the other.

The current data look like this:

df <- data.frame("var1" = c('x','a', 'y','b','c','d', 'z'),
"var2" = c('x', 'y', 'z', '', '', '', ''))
df
  var1 var2
1    x    x
2    a    y
3    y    z
4    b
5    c
6    d
7    z


I would like var2 reordered so that each of its values sits in the same row as the matching value in var1, with 0 filling every row that has no match, as follows:

Desired output:

df
  var1 var2
1    x    x
2    a    0
3    y    y
4    b    0
5    c    0
6    d    0
7    z    z


What would be the most efficient way to do this? Thanks.

Answer

You can create a new variable that holds the var1 value wherever it also appears in var2, and "0" otherwise:

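library(data.table)
dt = setDT(df)  # converts df to a data.table in place (by reference)
# Copy var1 into var3 where it occurs in var2, then fill the remaining NAs with "0"
dt[var1 %in% var2, var3 := var1][is.na(var3), var3 := "0"]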

dt
#   var1 var2 var3
#1:    x    x    x
#2:    a    y    0
#3:    y    z    y
#4:    b         0
#5:    c         0
#6:    d         0
#7:    z         z
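For comparison, here is a minimal base R sketch of the same idea, in case you'd rather not depend on data.table. It also prints the intermediate logical vector so you can see what %in% does: it is TRUE for each var1 value that occurs anywhere in var2, regardless of row position.

```r
# Base R only: no data.table needed
df <- data.frame(var1 = c("x", "a", "y", "b", "c", "d", "z"),
                 var2 = c("x", "y", "z", "", "", "", ""),
                 stringsAsFactors = FALSE)

# TRUE where the var1 value occurs anywhere in var2
in_var2 <- df$var1 %in% df$var2
in_var2
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE

# Keep the matching var1 value, otherwise "0"
df$var2 <- ifelse(in_var2, df$var1, "0")
df$var2
# [1] "x" "0" "y" "0" "0" "0" "z"
```

Because %in% ignores position, the result does not depend on how var2 was ordered to begin with.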

Or, starting again from the original data, overwrite var2 directly with ifelse():

dt[, var2 := ifelse(var1 %in% var2, var1, "0")]
dt
#    var1 var2
# 1:    x    x
# 2:    a    0
# 3:    y    y
# 4:    b    0
# 5:    c    0
# 6:    d    0
# 7:    z    z
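On the efficiency question: recent data.table versions (1.12.4 and later, as far as I know) also provide fifelse(), a faster ifelse() implemented in C. A sketch, assuming such a version is installed:

```r
library(data.table)

dt <- data.table(var1 = c("x", "a", "y", "b", "c", "d", "z"),
                 var2 = c("x", "y", "z", "", "", "", ""))

# The RHS is evaluated before assignment, so var2 here is still the original column.
# fifelse() requires yes and no to have the same type (both character here).
dt[, var2 := fifelse(var1 %in% var2, var1, "0")]
dt$var2
# [1] "x" "0" "y" "0" "0" "0" "z"
```

Unlike ifelse(), fifelse() requires yes and no to share a type instead of silently coercing, which makes type mix-ups easier to spot.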

Data:

df <- data.frame("var1" = c('x','a', 'y','b','c','d', 'z'), 
                 "var2" = c('x', 'y', 'z', '', '', '', ''), stringsAsFactors = FALSE)
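The stringsAsFactors setting in the data above matters: before R 4.0, data.frame() converted strings to factors by default, and ifelse() with a factor yes argument can return the factor's integer codes instead of its labels. A small defensive sketch follows; align_to() is a hypothetical helper name, not something from a package, and it simply coerces both columns to character first.

```r
# Hypothetical helper: align_to() is not from any package
align_to <- function(x, lookup, fill = "0") {
  # Coerce first so factor columns (the pre-4.0 default) come back as text, not codes
  x <- as.character(x)
  lookup <- as.character(lookup)
  ifelse(x %in% lookup, x, fill)
}

# Works even if the columns were read in as factors
df_f <- data.frame(var1 = c("x", "a", "y", "b", "c", "d", "z"),
                   var2 = c("x", "y", "z", "", "", "", ""),
                   stringsAsFactors = TRUE)
df_f$var2 <- align_to(df_f$var1, df_f$var2)
df_f$var2
# [1] "x" "0" "y" "0" "0" "0" "z"
```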