M.Dimo - 11 months ago 79

R Question

Given a sortkey, is there a data.table shortcut to duplicate the

`first`

`last`

The pedestrian approach below flags the first record of a group.

Given the elegance of data.table (with which I'm slowly getting familiar), I'm assuming there's a shortcut using a self join &

`mult`

Here's the example:

`require(data.table)`

set.seed(123)

n <- 17

DT <- data.table(x=sample(letters[1:3],n,replace=T),

y=sample(LETTERS[1:3],n,replace=T))

sortkey <- c("x","y")

setkeyv(DT,sortkey)

key <- paste(DT$x,DT$y,sep="-")

nw <- c( T , key[2:n]!=key[1:(n-1)] )

DT$first <- 1*nw

DT

Answer Source

Here are couple of solutions using `data.table`

:

```
## Option 1 (cleaner solution, added 2016-11-29)
uDT <- unique(DT)
DT[, c("first","last"):=0L]
DT[uDT, first:=1L, mult="first"]
DT[uDT, last:=1L, mult="last"]
## Option 2 (original answer, retained for posterity)
DT <- cbind(DT, first=0L, last=0L)
DT[DT[unique(DT),,mult="first", which=TRUE], first:=1L]
DT[DT[unique(DT),,mult="last", which=TRUE], last:=1L]
head(DT)
# x y first last
# [1,] a A 1 1
# [2,] a B 1 1
# [3,] a C 1 0
# [4,] a C 0 1
# [5,] b A 1 1
# [6,] b B 1 1
```

There's obviously a lot packed into each of those lines. The key construct, though, is the following, which returns the row index of the first record in each group:

```
DT[unique(DT),,mult="first", which=TRUE]
# [1] 1 2 3 5 6 7 11 13 15
```