dracodoc - 9 months ago 53

R Question

In data.table, we can select rows based on row number or condition:

`> x <- data.table(letters[1:4], 1:4)`

> x

V1 V2

1: a 1

2: b 2

3: c 3

4: d 4

> x[2]

V1 V2

1: b 2

> x[V1 == "d"]

V1 V2

1: d 4

However I can't select with both row number and condition:

`> x[!2 & V2 > 1]`

Empty data.table (0 rows) of 2 cols: V1,V2

This is probably because the

`!2`

`> x[!2][V2 > 1]`

V1 V2

1: c 3

2: d 4

However I wanted to assign new column values for this subset

`x[!2][V2 > 1, V3 := "more"]`

Now it only created column for the intermediate chained data.table. I can save the intermediate table then merge back to the original table, but that will be cumbersome.

Actually I often feel

`data.table`

`.I`

`.i`

`x[(.i != 2) & (V2 >1), V3 := "more"]`

I can simulate this by creating a row number column explicitly first.

Another method is to make the modifications on subset data.table apply back the original table. Suppose we have x as original table, x[!2] as a subset, then if modifications on

`x[!2]`

`x[!2, refOriginal = TRUE]`

Answer

Here are my two attempts toward the solution: The first one uses the summarize syntax of `data.table`

to calculate a logical vector using row number `.I`

and condition at position `i`

for subsetting and updating columns; the second one uses `which`

and `setdiff`

to remove certain row numbers from the condition, if on the other hand you need an `and`

operation of row number and condition, `setdiff`

can be replaced with `union`

:

```
x[x[, .I != 2 & V2 > 2], V3 := "more"]
x
# V1 V2 V3
# 1: a 1 NA
# 2: b 2 NA
# 3: c 3 more
# 4: d 4 more
x[setdiff(which(V2 > 2), c(2)), V3 := "more"]
x
# V1 V2 V3
# 1: a 1 NA
# 2: b 2 NA
# 3: c 3 more
# 4: d 4 more
```

Source (Stackoverflow)