dracodoc - 1 year ago 103
R Question

# select data.table R rows based on row number and condition

In data.table, we can select rows based on row number or condition:

``````> x <- data.table(letters[1:4], 1:4)
> x
V1 V2
1:  a  1
2:  b  2
3:  c  3
4:  d  4
> x[2]
V1 V2
1:  b  2
> x[V1 == "d"]
V1 V2
1:  d  4
``````

However I can't select with both row number and condition:

``````> x[!2 & V2 > 1]
Empty data.table (0 rows) of 2 cols: V1,V2
``````

This is probably because the
`!2`
is not interpreted as row number in this format. I knew I can chain the two conditions:

``````> x[!2][V2 > 1]
V1 V2
1:  c  3
2:  d  4
``````

However I wanted to assign new column values for this subset

``````x[!2][V2 > 1, V3 := "more"]
``````

Now it only created column for the intermediate chained data.table. I can save the intermediate table then merge back to the original table, but that will be cumbersome.

Actually I often feel
`data.table`
need a proper row number.
`.I`
is a dynamic number depend on group, but I wanted a unique id that can identify every row, this unique id can be very useful in merge/join (often the data don't have a unique id). If
`.i`
is the row number, I can use

``````x[(.i != 2) & (V2 >1), V3 := "more"]
``````

I can simulate this by creating a row number column explicitly first.

Another method is to make the modifications on subset data.table apply back the original table. Suppose we have x as original table, x[!2] as a subset, then if modifications on
`x[!2]`
actually modify x, my question will be solved too. Of course this kind of subset need to be created in different way, for example
`x[!2, refOriginal = TRUE]`
.

Here are my two attempts toward the solution: The first one uses the summarize syntax of `data.table` to calculate a logical vector using row number `.I` and condition at position `i` for subsetting and updating columns; the second one uses `which` and `setdiff` to remove certain row numbers from the condition, if on the other hand you need an `and` operation of row number and condition, `setdiff` can be replaced with `union`:

``````x[x[, .I != 2 & V2 > 2], V3 := "more"]
x
#    V1 V2   V3
# 1:  a  1   NA
# 2:  b  2   NA
# 3:  c  3 more
# 4:  d  4 more

x[setdiff(which(V2 > 2), c(2)), V3 := "more"]
x
#    V1 V2   V3
# 1:  a  1   NA
# 2:  b  2   NA
# 3:  c  3 more
# 4:  d  4 more
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download