andrasz - 5 months ago 38

R Question

I have a data.frame / data.table in R as follows:

`df <- data.frame(`

ID=c(rep("A", 20)),

year=c(1968, 1971, 1972, 1973, 1974, 1976, 1978, 1980, 1982, 1984, 1985,

1986, 1987, 1988, 1990, 1991, 1992, 1993, 1994, 1995))

I'd like to subset the df in order to keep only those entries which have

Answer

A compact solution using `diff`

and `cumsum`

:

```
setDT(df)[, grp := cumsum(c(0, diff(year)) > 1), by = ID
][, if (.N > 4) .SD, by = grp][, grp := NULL][]
```

which gives the desired result:

```
ID year
1: A 1984
2: A 1985
3: A 1986
4: A 1987
5: A 1988
6: A 1990
7: A 1991
8: A 1992
9: A 1993
10: A 1994
11: A 1995
```

A compareble approach in base R:

```
i <- with(df, ave(year, ID, FUN = function(x) {
r <- rle(cumsum(c(0, diff(year)) > 1));
rep(r$lengths, r$lengths)
} ))
df[i > 4,] # or df[which(i > 4),]
```

which will get you the same result.