Nitin - 7 months ago 42

R Question

I have a vector a as follows:

`a <- c("Rs. 360 Rs. 540 [-33% ]", "Rs. 213 Rs. 250 [-15% ]", "Rs. 430 Rs. 1030 [-58% ]")`

I need the result below; `a` should contain:

`Rs.360, Rs.213, Rs.430`

I have used:

`a <- gsub(" Rs*", "", a)`

Answer

You can use a regex with capturing groups to grab the parts you need, then use backreferences in the replacement pattern to insert them back into the result:

```
sub("^\\s*(Rs\\.)\\s*(\\d+).*", "\\1\\2", a)
```


The regex matches:

- `^` - start of string
- `\\s*` - zero or more whitespaces
- `(Rs\\.)` - Group 1 capturing the `Rs.` sequence
- `\\s*` - 0+ whitespaces
- `(\\d+)` - Group 2 capturing 1 or more digits
- `.*` - the rest of the string to its end
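To check the breakdown above, the two capture groups can be inspected directly. This is a minimal sketch using base R's `regexec()` and `regmatches()`; the variable names `pattern`, `m`, `groups`, and `result` are illustrative and not part of the original answer.

```r
a <- c("Rs. 360 Rs. 540 [-33% ]", "Rs. 213 Rs. 250 [-15% ]", "Rs. 430 Rs. 1030 [-58% ]")
pattern <- "^\\s*(Rs\\.)\\s*(\\d+).*"

# regexec() reports the position of the full match and of each capture group;
# regmatches() turns those positions into the matched substrings.
m <- regmatches(a, regexec(pattern, a))

# Each element of m holds: full match, Group 1, Group 2.
# Keep only the two groups, one row per input string.
groups <- t(sapply(m, function(x) x[2:3]))
colnames(groups) <- c("group1", "group2")

# Pasting the groups together reproduces what "\\1\\2" does in sub().
result <- paste0(groups[, "group1"], groups[, "group2"])
```

Here `result` should be the same vector that `sub(pattern, "\\1\\2", a)` returns, which makes it easy to see which part of each string ended up in which group.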

Tested code:

```
> a <- c("Rs. 360 Rs. 540 [-33% ]", "Rs. 213 Rs. 250 [-15% ]", "Rs. 430 Rs. 1030 [-58% ]")
> sub("^\\s*(Rs\\.)\\s*(\\d+).*", "\\1\\2", a)
[1] "Rs.360" "Rs.213" "Rs.430"
```
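For comparison, here is a sketch of what the attempt from the question does. In ` Rs*` the `*` applies only to `s`, so the pattern matches a space, an `R`, and zero or more `s` characters, and `gsub()` removes just that fragment. The variable names `attempt` and `fixed` are illustrative.

```r
a <- c("Rs. 360 Rs. 540 [-33% ]", "Rs. 213 Rs. 250 [-15% ]", "Rs. 430 Rs. 1030 [-58% ]")

# The attempt from the question: only the " Rs" before the second price is removed,
# so everything else (including the discount) stays in the string.
attempt <- gsub(" Rs*", "", a)

# The answer's pattern: consumes the whole string and keeps only the two groups.
fixed <- sub("^\\s*(Rs\\.)\\s*(\\d+).*", "\\1\\2", a)
```

With this input, the first element of `attempt` is `"Rs. 360. 540 [-33% ]"`, while `fixed` holds the desired values.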

**Update**

For an input like `a <- c(" 360 540", " 213 250")`, use `sub("^\\D*(\\d+).*", "\\1", a)`.

```
> a <- c(" 360 540", " 213 250")
> sub("^\\D*(\\d+).*", "\\1", a)
[1] "360" "213"
```

The `^\\D*(\\d+).*` pattern matches any number of non-digit characters at the start of the string, then captures 1+ digits into Group 1, and then `.*` matches the rest of the string.
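One thing to keep in mind: `sub()` returns a string unchanged when the pattern does not match, so an element without any digits would pass through as-is. Below is a hedged sketch of one way to guard against that, assuming `NA` is an acceptable marker for such elements and that a numeric result is wanted; the helper name `first_number` is hypothetical.

```r
# Hypothetical helper: first run of digits in each string as a number, NA if none.
first_number <- function(x) {
  pattern <- "^\\D*(\\d+).*"
  # grepl() tells us which elements actually contain a run of digits.
  has_digits <- grepl(pattern, x)
  out <- rep(NA_real_, length(x))
  # Only convert the elements where the pattern matched.
  out[has_digits] <- as.numeric(sub(pattern, "\\1", x[has_digits]))
  out
}

prices <- first_number(c(" 360 540", " 213 250", "no price"))
```

In this sketch the third element has no digits, so it comes back as `NA` instead of the original text.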