Bonono Bonono - 2 months ago 5
R Question

Replace element in vector based on first letter of character string

Consider the vectors below:

ID <- c("A1","B1","C1","A2","B2","C2","Av1")

names <- c("ALPHA","BRAVO","CHARLIE","AVOCADO")


I want to replace the leading letter(s) of each element in vector ID with the matching element of vector names, based on the first letter of that element of names. I also want to add a _0 before each number.

Note that the elements Av1 and AVOCADO throw things off a bit, especially with the lowercase v in Av1.

The result should look like this:

res <- c("ALPHA_01","BRAVO_01","CHARLIE_01","ALPHA_02","BRAVO_02","CHARLIE_02", "AVOCADO_01")


I know it should be done with regex, but I've been trying for 2 days now and haven't got anywhere.

Answer

We can use the gsubfn function from the gsubfn package.

library(gsubfn)
# Strip the digits from 'ID' (using `sub`) and keep the unique letter prefixes;
# nm1 is "A" "B" "C" "Av"
nm1 <- unique(sub("\\d+", "", ID))
# Pair each prefix with the entry of 'names' in the same position, then use
# gsubfn to replace every run of non-digits in 'ID' with its matching value.
# Finally, use sub to insert "_0" before the trailing number.
sub("(\\d+)$", "_0\\1", gsubfn("(\\D+)", as.list(setNames(names, nm1)), ID))
#[1] "ALPHA_01"   "BRAVO_01"   "CHARLIE_01" "ALPHA_02" 
#[5] "BRAVO_02"   "CHARLIE_02" "AVOCADO_01"

The (\\d+) matches one or more digits, and (\\D+) matches one or more non-digit characters. Wrapping a pattern in parentheses captures it as a group, so the replacement can refer to it with a backreference (\\1, since it is the first captured group). Note that the lookup pairs nm1 with names by position, so names must be listed in the order in which the prefixes first appear in ID.
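For comparison, the same idea can be sketched in base R without gsubfn. This is only an illustrative variant, not the method from the answer itself: it splits each ID into its letter prefix and its number, builds a named lookup vector, and pastes the pieces back together. The variable names prefix, number and lookup are made up for this sketch, and it makes the same assumption as above, namely that names is listed in the order in which the prefixes first appear in ID.

```r
ID <- c("A1", "B1", "C1", "A2", "B2", "C2", "Av1")
names <- c("ALPHA", "BRAVO", "CHARLIE", "AVOCADO")

# Split each ID into its non-digit prefix and its trailing digits
prefix <- sub("\\d+$", "", ID)   # "A" "B" "C" "A" "B" "C" "Av"
number <- sub("^\\D+", "", ID)   # "1" "1" "1" "2" "2" "2" "1"

# Named lookup: each unique prefix maps to the name in the same position
# (assumption: 'names' follows the order of first appearance in 'ID')
lookup <- setNames(names, unique(prefix))

# Look up each prefix by name and append "_0" plus the number
res <- paste0(lookup[prefix], "_0", number)
res
```

Because the lookup is indexed by exact name, "A" and "Av" are treated as different keys, which is what keeps ALPHA and AVOCADO apart.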