Problem: I need to make a unique ID field for data that has two levels of grouping. In the example code here, it is
dat <- data.frame(Emp = c("A","A","A","B","B","C"),
Color = c("Red","Green","Green","Orange","Yellow","Brown"),
stringsAsFactors = FALSE)
ID <- c("A.01.001", "A.02.001", "A.02.002", "B.01.001", "B.02.001", "C.01.001")
 "A.01.001" "A.02.001" "A.02.002" "B.01.001" "B.02.001" "C.01.001"
group_by(dat, Emp, Color) %>%
mutate(suffix = str_pad(row_number(), width=3, side="left", pad="0"))
library(data.table) setDT(dat)[, ID := sprintf('%s.%02d.%03d', Emp, rleid(Color), rowid(rleid(Color))), by = Emp]
> dat Emp Color ID 1: A Red A.01.001 2: A Green A.02.001 3: A Green A.02.002 4: B Orange B.01.001 5: B Yellow B.02.001 6: C Brown C.01.001
How this works:
ID-variable with the
sprintfyou paste several vector easily together according to a specified format.
:=means that the
data.tableis updated by reference.
%sindicates that a string is to be used in the first part (which is
%03dindicates that a number needs to have two or three digits with a leading zero(s) when needed. The dots in between will taken literally and thus in cluded in the resulting string.
Adressing the comment of @jsta, if the values in the
Color-column are not sequential you can use:
setDT(dat)[, r := as.integer(factor(Color, levels = unique(Color))), by = Emp ][, ID := sprintf('%s.%02d.%03d', Emp, r, rowid(r)), by = Emp][, r:= NULL]
This will also maintain the order in which the
Color column is presented. Instead of
as.integer(factor(Color, levels = unique(Color))) you can also use
match(Color, unique(Color)) as shown by akrun.