dracodoc dracodoc - 1 year ago 67
R Question

select data.table R rows based on row number and condition

In data.table, we can select rows based on row number or condition:

> x <- data.table(letters[1:4], 1:4)
> x
V1 V2
1: a 1
2: b 2
3: c 3
4: d 4
> x[2]
V1 V2
1: b 2
> x[V1 == "d"]
V1 V2
1: d 4

However I can't select with both row number and condition:

> x[!2 & V2 > 1]
Empty data.table (0 rows) of 2 cols: V1,V2

This is probably because the
is not interpreted as row number in this format. I knew I can chain the two conditions:

> x[!2][V2 > 1]
V1 V2
1: c 3
2: d 4

However I wanted to assign new column values for this subset

x[!2][V2 > 1, V3 := "more"]

Now it only created column for the intermediate chained data.table. I can save the intermediate table then merge back to the original table, but that will be cumbersome.

Actually I often feel
need a proper row number.
is a dynamic number depend on group, but I wanted a unique id that can identify every row, this unique id can be very useful in merge/join (often the data don't have a unique id). If
is the row number, I can use

x[(.i != 2) & (V2 >1), V3 := "more"]

I can simulate this by creating a row number column explicitly first.

Another method is to make the modifications on subset data.table apply back the original table. Suppose we have x as original table, x[!2] as a subset, then if modifications on
actually modify x, my question will be solved too. Of course this kind of subset need to be created in different way, for example
x[!2, refOriginal = TRUE]

Answer Source

Here are my two attempts toward the solution: The first one uses the summarize syntax of data.table to calculate a logical vector using row number .I and condition at position i for subsetting and updating columns; the second one uses which and setdiff to remove certain row numbers from the condition, if on the other hand you need an and operation of row number and condition, setdiff can be replaced with union:

x[x[, .I != 2 & V2 > 2], V3 := "more"]
#    V1 V2   V3
# 1:  a  1   NA
# 2:  b  2   NA
# 3:  c  3 more
# 4:  d  4 more

x[setdiff(which(V2 > 2), c(2)), V3 := "more"]
#    V1 V2   V3
# 1:  a  1   NA
# 2:  b  2   NA
# 3:  c  3 more
# 4:  d  4 more