hannes101 hannes101 - 3 months ago 9
R Question

Change values for a random selection of a data.table subset

Basically an extension to this question, because I noticed, that if you are subsetting for a second time it's not possible to change a value of a column.

random.length <- sample(x = 15:30, size = 1)
dt <- data.table(city=sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"), size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE))
set.seed(1)
dt[sample(.N,3), score :=9999]
set.seed(1)
dt[sample(.N,3),]


This works as expected and changes the score to 9999 for the three randomly selected cities. Although if you subset in a first step and then do the sampling and try to assign a new score value it's not possible.

set.seed(1)
dt[city == "New York",][sample(.N,1), score := 55555]
set.seed(1)
dt[city == "New York",][sample(.N,1)]


What I would like to achieve is that I can change a value of some column, which is part of certain subset and gets randomly selected from this subset.

Answer

You can also sample the index ( which can be calculated using which function ) besides all the suggestions above:

dt[sample(which(city == "New York"), 1), score:=555L]
dt
#           city score
#  1:   Tel Aviv     8
#  2:  Amsterdam     3
#  3:  Cape Town    10
#  4:   New York     1
#  5:  Cape Town    10
#  6: Pittsburgh     2
#  7: Pittsburgh     8
#  8:  Amsterdam    10
#  9:  Amsterdam     8
# 10:  Amsterdam     4
# 11:   Tel Aviv     7
# 12:  Amsterdam     2
# 13: Pittsburgh     1
# 14:  Amsterdam     3
# 15: Pittsburgh     2
# 16:   New York     7
# 17:   Tel Aviv    10
# 18:   New York    10
# 19:  Cape Town     1
# 20:  Amsterdam     7
# 21:  Amsterdam     3
# 22:   New York   555
# 23:  Cape Town     6
# 24:   New York     1
# 25:   Tel Aviv    10
#           city score
Comments