Alexander - 1 year ago 67
R Question

# Mutate rows based on matching user-defined strings in a way that works universally

I have data like this:

``````
clas <- c("CD_1", "X.2_2", "K$2_3", "12k3_4", ".A_5", "xy_6")
df <- data.frame(clas)
df
#     clas
# 1   CD_1
# 2  X.2_2
# 3  K$2_3
# 4 12k3_4
# 5   .A_5
# 6   xy_6
``````

and I would like to change the rows that match this condition: if the number after `_` is 4, 5, or 6, replace the last character before the `_` with the string `B`. So the output should look like this:

``````
    clas
1   CD_1
2  X.2_2
3  K$2_3
4 12kB_4
5   .B_5
6   xB_6
``````

Thanks!

EDIT:

So if I have data like this:

``````
    clas
1   CD_1
2  X.2_2
3  K$2_3
4 12k3_4
5   .A_5
6  xy_11
``````

Then, applying your solution, I get:

``````
df %>% mutate(clas = str_replace(clas, "(.)(_[4511])", "B\\2"))

    clas
1   CB_1
2  X.2_2
3  K$2_3
4 12kB_4
5   .B_5
6  xB_11
``````

But I only want to match `11`, not `1`. How can we do that?

``````
library(dplyr)
library(stringr)

clas <- c("CD_1", "X.2_2", "K$2_3", "12k3_4", ".A_5", "xy_6")
df <- data.frame(clas)

# Replace the character just before `_4`, `_5` or `_6` with "B"
df %>% mutate(clas = str_replace(clas, "(.)(_[456])", "B\\2"))
``````

Here the pattern `(.)(_[456])` produces a match with 3 groups when counting from 0: group 0 is the whole match `._[456]`, group 1 is the single character matched by `.`, and group 2 is the `_[456]` part.

`\\2` refers to group 2 (the third one with 0 indexing), so the whole match `._[456]` is replaced with `B` followed by whatever matched `_[456]`, where `[456]` matches a single character that is any of the options inside the brackets.
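To see the groups described above, here is a small sketch that inspects them with `str_match()` from stringr before doing the replacement. The variable names `groups` and `replaced` are only illustrative and not part of the original answer.

```r
library(stringr)

clas <- c("CD_1", "X.2_2", "K$2_3", "12k3_4", ".A_5", "xy_6")

# str_match() returns a character matrix:
#   column 1 = whole match, column 2 = group 1 `(.)`, column 3 = group 2 `(_[456])`
# Rows whose string does not match are filled with NA.
groups <- str_match(clas, "(.)(_[456])")
print(groups)

# "B\\2" drops group 1 and keeps group 2, so only the character
# right before the underscore changes.
replaced <- str_replace(clas, "(.)(_[456])", "B\\2")
print(replaced)
```

Rows 1 to 3 come back as `NA` in `groups` because their suffix is not 4, 5 or 6, which is why `str_replace()` leaves them untouched.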

EDIT:

Each character inside `[]` is treated individually, so `[1111]` is no different from `[1]`, and your `[4511]` is just `[451]`: a character class only ever matches a single character. Instead you need to use `|`, so you have `(.)(_[45]|_11)`. This matches `_4` or `_5` or `_11` in the second pattern group. Also, if you want to match a single digit 1-9 but not a two-digit number such as 11 or 15, you need to anchor the pattern, as in `(.)(_[45])$`, where `$` is the end-of-string indicator. Go look at the cheatsheet and test these out on RegExr.
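As a rough sketch of the difference, the snippet below runs the character-class pattern and an alternation pattern on the data from the question's edit. Combining the alternation with `$` and writing it as a non-capturing group `(?:4|5|11)` is my own variation on the patterns above, and the names `with_class` and `with_alt` are illustrative.

```r
library(stringr)

clas <- c("CD_1", "X.2_2", "K$2_3", "12k3_4", ".A_5", "xy_11")

# Character class: `[4511]` behaves like `[451]`, so a lone `_1` matches too.
with_class <- str_replace(clas, "(.)(_[4511])", "B\\2")
print(with_class)

# Alternation plus `$`: the suffix must be exactly 4, 5 or 11.
# `(?:...)` groups the alternatives without creating an extra capture group,
# so `\\2` still refers to the `_...` part.
with_alt <- str_replace(clas, "(.)(_(?:4|5|11))$", "B\\2")
print(with_alt)
```

With the anchored alternation, `CD_1` is left alone while `xy_11` still becomes `xB_11`.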

