Alexander Alexander - 2 years ago 77
R Question

Mutate rows based on matching user-defined strings, in a way that works universally

I have data like this:

clas=c("CD_1","X.2_2","K$2_3","12k3_4",".A_5","xy_6")
df <- data.frame(clas)
> df
clas
1 CD_1
2 X.2_2
3 K$2_3
4 12k3_4
5 .A_5
6 xy_6


and I would like to change the rows that match this condition: if the string after the _ is 4, 5 or 6, replace the character immediately before the _ with the string B. So the output should look like this:

clas
1 CD_1
2 X.2_2
3 K$2_3
4 12kB_4
5 .B_5
6 xB_6


Thanks!

EDIT:

So if I have data like this:

clas
1 CD_1
2 X.2_2
3 K$2_3
4 12k3_4
5 .A_5
6 xy_11


Then applying your solution,

df %>% mutate(clas = str_replace(clas, "(.)(_[4511])", "B\\2"))

clas
1 CB_1
2 X.2_2
3 K$2_3
4 12kB_4
5 .B_5
6 xB_11


But I only want to match 11, not 1. How can we do that?

Answer Source
library(dplyr)
library(stringr)

clas <- c("CD_1","X.2_2","K$2_3","12k3_4",".A_5","xy_6")
df <- data.frame(clas)

df %>% mutate(clas = str_replace(clas, "(.)(_[456])", "B\\2"))

Here the pattern yields a match with three groups: the first (group 0) is the whole match ._[456], the second is the . part (any single character), and the third is the _[456] part.

\\2 refers to the third of these (counting from 0), so the whole match ._[456] is replaced with B followed by whatever matched _[456]. Here [456] is a character class that matches exactly one character: a 4, a 5 or a 6.
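To see the groups described above, a small sketch along these lines may help. It assumes stringr is installed and uses str_match only to inspect the match; the three-element vector is a shortened version of the question's data, chosen for illustration.

```r
library(stringr)

# Shortened sample: one value that should not match, two that should.
clas <- c("CD_1", "12k3_4", ".A_5")
pattern <- "(.)(_[456])"

# str_match returns one row per input string: column 1 is the whole match,
# columns 2 and 3 are the two capture groups. Rows without a match are NA.
groups <- str_match(clas, pattern)
print(groups)

# "\\2" in the replacement puts back the second capture group
# (the "_4" or "_5" part), so only the character before it changes.
replaced <- str_replace(clas, pattern, "B\\2")
print(replaced)
# "CD_1" "12kB_4" ".B_5"
```

The first row of the matrix is all NA because CD_1 has no _4, _5 or _6 in it, which is also why str_replace leaves that value untouched.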

EDIT:

Each character inside [] is treated individually, so [1111] is no different from [1], because that pattern only matches a single character that is either a 1 or 1 or 1 or 1. For the same reason, [4511] is just [451], which is why _1 matched. Instead you need to use |, giving (.)(_[45]|_11). This matches _4 or _5 or _11 in the second pattern group.

Also, if you want to match single-digit suffixes such as _4 or _5 but not longer ones that merely start with those digits, such as _45 or _51, you need to anchor the pattern: (.)(_[45])$, where $ is the end-of-string indicator. Go look at the cheatsheet and test these out on RegExr.
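Putting the alternation and the end-of-string anchor together on the data from the question's edit might look like the sketch below. Combining both into the single pattern (.)(_[45]|_11)$ is an assumption about what the asker wants, namely that only a suffix of exactly _4, _5 or _11 should trigger the replacement.

```r
library(dplyr)
library(stringr)

# Data from the question's edit, where the last value ends in _11.
df <- data.frame(clas = c("CD_1", "X.2_2", "K$2_3", "12k3_4", ".A_5", "xy_11"))

# The | sits inside the second group, so that group is either _4, _5 or _11.
# The trailing $ requires the suffix to end the string, so _1 alone
# (as in CD_1) is not enough to match.
result <- df %>%
  mutate(clas = str_replace(clas, "(.)(_[45]|_11)$", "B\\2"))

print(result$clas)
# "CD_1" "X.2_2" "K$2_3" "12kB_4" ".B_5" "xB_11"
```

Without the $, the same pattern would also rewrite a value such as xy_110, since _11 would still be found inside it.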
