vino88 vino88 - 2 months ago 7
R Question

Sequence a column based on two other columns with a restarting sequence

I'm working a dataframe similar to the one below where I want to count or have an id based on the name and year, but that restarts when name changes. I've looked at a lot of posts (1, 2, 3), but most want a non-restarting identifier/counter.

I thought group_by and seq_along would work, but seq_along only takes one variable so this doesn't work

df1 <- df %>%
group_by(name, year) %>%
arrange(year) %>%
mutate(
exp = seq_along(c(name, year)))


I've also tried to use various data.table sequences with .GRP such as the following to no avail

df1 <- data.table(df, key="name,year")
df1 <- df1[, Year_id :=.GRP, by=key(df1)]


Here's a sample dataframe.

df <- data.frame(
name = rep(c("A","B"), each=5),
year=rep(2000:2001, times=5),
stringsAsFactors=FALSE)


Desired output:

name year Year_id
1 A 2000 1
2 A 2000 1
3 A 2000 1
4 A 2001 2
5 A 2001 2
6 B 2000 1
7 B 2000 1
8 B 2000 1
9 B 2001 2
10 B 2001 2

Answer

1) dplyr Create a factor and extract its levels:

library(dplyr)
df %>% 
   arrange(name, year) %>% 
   group_by(name) %>%
   mutate(Year_id = as.numeric(factor(year))) %>%
   ungroup()

giving:

# A tibble: 10 x 3
    name  year Year_id
   <chr> <int>   <dbl>
1      A  2000       1
2      A  2000       1
3      A  2000       1
4      A  2001       2
5      A  2001       2
6      B  2000       1
7      B  2000       1
8      B  2001       2
9      B  2001       2
10     B  2001       2

1a) The mutate could alternately be written as mutate(Year_id = match(year, unique(year))) as per @nicola's comment.

2) no packages Without package it could be written:

o <- with(df, order(name, year))
transform(df[o, ], Year_id = ave(year, name, FUN = function(x) as.numeric(factor(x))))

or using match.