hans glick - 1 month ago
R Question

Map substring without the help of libraries and without using regex

I got a bunch of strings like

string_example="hi 5eme elephant 4eme dark I am"


I want to map those values

values_to_map=c("2eme","3eme","4eme","5eme")


to those ones

new_values=c("2e","3e","4e","5e")


Here is a solution that works well, but I find it a little tedious, and I wonder whether there is an easier way to do it without other libraries (for various reasons I can only use base R, plus the fastmatch library). I also do not want to use regex, because I have tens of millions of strings and regex is pretty slow.

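library(fastmatch)
string_example="hi 5eme elephant 4eme dark I am"
# split on a literal space with base R (str_split() would need stringr)
string_example=strsplit(string_example,split = " ",fixed = TRUE)[[1]]
# position of each token in values_to_map, NA when there is no match
to_change=fmatch(string_example,values_to_map)

index=which(!is.na(to_change))
values=new_values[to_change[!is.na(to_change)]]
string_example[index]=values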

Answer

I don't grok the need for fastmatch:

`%||%` <- function (x, y) { if (is.na(x)) { y } else { x } }

string_example <- "hi 5eme elephant 4eme dark I am"
values_to_map <- c("2eme","3eme","4eme","5eme")
new_values <- c("2e","3e","4e","5e")
new_values <- setNames(new_values, values_to_map)

spl <- strsplit(string_example, " ")[[1]] 
spl <- unname(sapply(spl, function(x) {
  new_values[x] %||% x
})) 
spl <- paste0(spl, collapse=" ")
spl
## [1] "hi 5e elephant 4e dark I am"

This is pretty fragile and makes quite a number of assumptions, mostly due to a very vague question with inane requirements. If it's homework, point your instructor here so they can see just how poor their instructing is.
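
One assumption that appears to be baked in, offered as a small illustration rather than an exhaustive list: tokens are matched whole after splitting on single spaces, so a value with punctuation attached is left alone. The input sentence below is made up for the example.

```r
new_values <- setNames(c("2e", "3e", "4e", "5e"),
                       c("2eme", "3eme", "4eme", "5eme"))

# Made-up input: the first value carries a trailing comma
tokens <- strsplit("le 5eme, puis 4eme", " ", fixed = TRUE)[[1]]

# Lookup by name; tokens that are not names of new_values come back as NA
looked_up <- unname(new_values[tokens])

# Keep the original token wherever the lookup failed
result <- ifelse(is.na(looked_up), tokens, looked_up)
paste(result, collapse = " ")
## [1] "le 5eme, puis 4e"
```

Whether "5eme," should have been rewritten depends on requirements the question does not state.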

This:

vapply(string_example, function(spl) {
  spl <- strsplit(spl, " ")[[1]] 
  spl <- unname(vapply(spl, function(x) {
    new_values[x] %||% x
  }, character(1))) 
  paste0(spl, collapse=" ")
}, character(1), USE.NAMES=FALSE)

will be marginally faster and work over a character vector.
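
For tens of millions of strings, a further variation that may be worth timing is to flatten all tokens into one vector and do a single lookup, instead of one lookup per token. This is only a sketch under the same whole-token, single-space assumptions; `map_tokens` is a hypothetical helper name, not something from the question, and no timings are claimed here.

```r
# Hypothetical helper: replace whole space-delimited tokens across a
# character vector using one match() call for all tokens at once.
map_tokens <- function(strings, from, to) {
  # fixed = TRUE splits on a literal space and bypasses the regex engine
  tokens <- strsplit(strings, " ", fixed = TRUE)
  flat <- unlist(tokens, use.names = FALSE)

  # One lookup for every token of every string; NA marks tokens to keep
  idx <- match(flat, from)
  hit <- !is.na(idx)
  flat[hit] <- to[idx[hit]]

  # Regroup the flat vector by source string; explicit factor levels keep
  # strings that produced zero tokens (such as "") in the result
  group <- factor(rep.int(seq_along(tokens), lengths(tokens)),
                  levels = seq_along(tokens))
  unname(vapply(split(flat, group), paste, character(1), collapse = " "))
}

values_to_map <- c("2eme", "3eme", "4eme", "5eme")
new_values <- c("2e", "3e", "4e", "5e")

res <- map_tokens(c("hi 5eme elephant 4eme dark I am", "2eme place", ""),
                  values_to_map, new_values)
res
## [1] "hi 5e elephant 4e dark I am" "2e place"
## [3] ""
```

Since fastmatch is allowed in the question, `match()` could be swapped for `fastmatch::fmatch()`, which takes the same first two arguments.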