pyne - 1 year ago 55

R Question

I'm trying to match two columns of string data where one column has more than the other.

Current data look like:

```
df <- data.frame("var1" = c('x','a', 'y','b','c','d', 'z'),
                 "var2" = c('x', 'y', 'z', '', '', '', ''))
df
  var1 var2
1    x    x
2    a    y
3    y    z
4    b
5    c
6    d
7    z
```

I would like the values in var2 reordered so that each one sits in the same row as its match in var1, with rows that have no match filled with `0`.

Desired output:

```
df
  var1 var2
1    x    x
2    a    0
3    y    y
4    b    0
5    c    0
6    d    0
7    z    z
```

What would be the most efficient way to go about doing this? Thanks.

Answer

You can create a new variable based on whether each value of `var1` appears in `var2`:

```
library(data.table)
dt = setDT(df)
dt[var1 %in% var2, var3 := var1][is.na(var3), var3 := "0"]
dt
#    var1 var2 var3
# 1:    x    x    x
# 2:    a    y    0
# 3:    y    z    y
# 4:    b         0
# 5:    c         0
# 6:    d         0
# 7:    z         z
```

Or use `ifelse` to overwrite `var2` in place:

```
dt[,var2 := ifelse(var1 %in% var2, var1, "0")]
dt
#    var1 var2
# 1:    x    x
# 2:    a    0
# 3:    y    y
# 4:    b    0
# 5:    c    0
# 6:    d    0
# 7:    z    z
```
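
If you would rather not load data.table, the same `%in%` test should work in base R. This is only a sketch of the idea, assuming both columns are character vectors rather than factors; it rebuilds the data frame from the question so it can be run on its own:

```r
# Same data as in the question, with strings kept as character vectors
df <- data.frame(var1 = c("x", "a", "y", "b", "c", "d", "z"),
                 var2 = c("x", "y", "z", "", "", "", ""),
                 stringsAsFactors = FALSE)

# Keep var1 where it also appears anywhere in var2; otherwise use "0"
df$var2 <- ifelse(df$var1 %in% df$var2, df$var1, "0")
df
```

The right-hand side is evaluated in full before the assignment, so `%in%` still sees the original `var2` even though the column is being overwritten.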

*Data*:

```
df <- data.frame("var1" = c('x','a', 'y','b','c','d', 'z'),
                 "var2" = c('x', 'y', 'z', '', '', '', ''),
                 stringsAsFactors = FALSE)
```