Geep Geep - 2 years ago 100
R Question

Picking LHS column and RHS column of data.table assignment using other column values in R

Here is the code to produce a sample dataset:

require(data.table)
testdata <- data.table(
X = rep(sample(1:3),5),
Y = rep(sample(1:3),5),
X1 = rnorm(15),
X2 = rnorm(15),
X3 = rnorm(15),
Y1 = NA_character_,
Y2 = NA_character_,
Y3 = NA_character_
)


Initial data table:

X Y X1 X2 X3 Y1 Y2 Y3
1: 3 3 -0.7098927 0.63342935 0.94470612 NA NA NA
2: 1 2 0.3008547 -1.40043977 1.53781754 NA NA NA
3: 2 1 0.3423140 0.34897695 -0.38402565 NA NA NA
4: 3 3 -0.5726456 -2.24526957 -1.10947867 NA NA NA
5: 1 2 -1.3239474 -0.53924617 -0.04103982 NA NA NA
6: 2 1 0.2493801 0.85806647 0.96488021 NA NA NA
7: 3 3 -2.0653505 0.05481703 1.75161043 NA NA NA
8: 1 2 -1.3919774 0.34282832 0.50834289 NA NA NA
9: 2 1 0.5928025 -1.11899399 0.35967102 NA NA NA
10: 3 3 -0.4704720 0.64004313 -0.17343794 NA NA NA
11: 1 2 0.3056093 2.14544631 0.43740447 NA NA NA
12: 2 1 -0.1568971 1.05091249 1.18884487 NA NA NA
13: 3 3 -1.3078670 1.07482123 -0.65367957 NA NA NA
14: 1 2 0.4622123 -0.60308532 -1.11104235 NA NA NA
15: 2 1 -0.7894978 0.33018926 -0.04700393 NA NA NA


Here is the action I want to perform:
In each row,

if X = 2 and Y = 3 then Y3 <- X2


Expected Output:

X Y X1 X2 X3 Y1 Y2 Y3
1: 3 3 -0.7098927 0.63342935 0.94470612 NA NA 0.94470612
2: 1 2 0.3008547 -1.40043977 1.53781754 NA 0.3008547 NA
3: 2 1 0.3423140 0.34897695 -0.38402565 0.34897695 NA NA
4: 3 3 -0.5726456 -2.24526957 -1.10947867 NA NA -1.10947867
5: 1 2 -1.3239474 -0.53924617 -0.04103982 NA -1.3239474 NA
6: 2 1 0.2493801 0.85806647 0.96488021 0.85806647 NA NA
7: 3 3 -2.0653505 0.05481703 1.75161043 NA NA 1.75161043
8: 1 2 -1.3919774 0.34282832 0.50834289 NA -1.3919774 NA
9: 2 1 0.5928025 -1.11899399 0.35967102 -1.11899399 NA NA
10: 3 3 -0.4704720 0.64004313 -0.17343794 NA NA -0.17343794
11: 1 2 0.3056093 2.14544631 0.43740447 NA 0.3056093 NA
12: 2 1 -0.1568971 1.05091249 1.18884487 1.05091249 NA NA
13: 3 3 -1.3078670 1.07482123 -0.65367957 NA NA -0.65367957
14: 1 2 0.4622123 -0.60308532 -1.11104235 NA 0.4622123 NA
15: 2 1 -0.7894978 0.33018926 -0.04700393 0.33018926 NA NA


How can I achieve this using simple data.table syntax? I have tried get, eval(parse) etc but running into trouble each time.

Note that my actual dataset is quite large(100 plus columns) so I require a solution that doesn't rely on column numbers. I can possible write a large number of if statements as well but it looks like a bad way to do this for about 30 odd columns that need to be assigned in a similar way.

data.table version is 1.10.4 and the R version is 3.3.2

Edit: I solved it using a function. Not sure if this is the best way though

populateY <- function(input_table) {

for(i in 1:nrow(input_table)) {
k <- X
j <- Y
tempX <- paste0("input_table$X",k,"[i]")
tempY <- paste0("input_table$Y",j,"[i]")
eval(parse(text = paste0(tempY," <- ",tempX)))
}
return(input_table)
}

Answer Source

If you're open to using the tidyverse and tibble data frames, I would do it this way.

require(tibble)
testdata <- as_tibble(testdata)

testdata <- testdata %>%
  mutate(Y3 = ifelse(X == 2 & Y == 3, X2, NA))

You can then add all the lines you need easily and legibly in the mutate function.

Else if you're going to use data.tables for sure, then I'd go with akrun's suggestion, though you'll need change the data type of column Y3 to double, or just not have it exist when you run that code.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download