hans glick - 1 year ago
R Question

Map substring without the help of libraries and without using regex

I have a bunch of strings like

string_example="hi 5eme elephant 4eme dark I am"

I want to map these values

values_to_map=c("2eme","3eme","4eme","5eme")

to these ones

new_values=c("2e","3e","4e","5e")

Here is a solution that works well, but I find it a little tedious, and I wonder whether there is an easier way to do it without help from other libraries (for various reasons, I can only use base R, with the exception of the fastmatch library). I do not want to use regex either, because I have tens of millions of strings and it is pretty slow.

library(fastmatch)
string_example="hi 5eme elephant 4eme dark I am"
string_example=strsplit(string_example,split = " ")[[1]]


Answer Source

I don't grok the need for fastmatch:

`%||%` <- function (x, y) { if (is.na(x)) { y } else { x } }

string_example <- "hi 5eme elephant 4eme dark I am"
values_to_map <- c("2eme","3eme","4eme","5eme")
new_values <- c("2e","3e","4e","5e")
new_values <- setNames(new_values, values_to_map)

spl <- strsplit(string_example, " ")[[1]] 
spl <- unname(sapply(spl, function(x) {
  new_values[x] %||% x
}))
spl <- paste0(spl, collapse=" ")
## [1] "hi 5e elephant 4e dark I am"
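
To make the lookup step less magical, here is a small sketch of what happens for a single token. Indexing a named character vector by a name it does not contain returns NA, and the `%||%` helper then falls back to the original token. The variable names `hit` and `miss` are only for illustration.

```r
# Named character vector: names are the lookup keys, values are the replacements.
new_values <- c("2eme" = "2e", "3eme" = "3e", "4eme" = "4e", "5eme" = "5e")

# Same fallback operator as in the answer: return y when x is NA, otherwise x.
`%||%` <- function(x, y) if (is.na(x)) y else x

hit  <- new_values["5eme"]  # known key: the replacement, with its name attached
miss <- new_values["hi"]    # unknown key: NA

unname(hit)                 # "5e"
unname(is.na(miss))         # TRUE
unname(hit %||% "5eme")     # "5e"
miss %||% "hi"              # "hi"
```

Because a token only matches when it equals a key exactly, something like "5eme," (with trailing punctuation) would pass through unchanged.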

This is pretty fragile and makes quite a number of assumptions, mostly due to a very vague question with inane requirements. If it's homework, point your instructor here so they can see just how poor their instructing is.


vapply(string_example, function(spl) {
  spl <- strsplit(spl, " ")[[1]] 
  spl <- unname(vapply(spl, function(x) {
    new_values[x] %||% x
  }, character(1))) 
  paste0(spl, collapse=" ")
}, character(1), USE.NAMES=FALSE)

This version will be marginally faster and will work over a character vector.
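
If the per-token function call turns out to be the bottleneck, one possible variation is to look up all tokens of a string at once with base R's `match()` instead of calling a closure per token. This is only a sketch under the same assumptions as above (single-space separators, exact token matches); `map_tokens` and its arguments are hypothetical names, not something from the question. `fixed = TRUE` tells `strsplit()` to treat the separator as a literal string rather than a regular expression.

```r
# Hypothetical helper: replace whole tokens found in `from` with the
# corresponding element of `to`; all other tokens are left untouched.
map_tokens <- function(strings, from, to) {
  vapply(strsplit(strings, " ", fixed = TRUE), function(tokens) {
    idx <- match(tokens, from)     # position of each token in `from`, NA if absent
    hit <- !is.na(idx)
    tokens[hit] <- to[idx[hit]]    # overwrite only the tokens that matched
    paste(tokens, collapse = " ")
  }, character(1))
}

values_to_map <- c("2eme", "3eme", "4eme", "5eme")
new_values    <- c("2e", "3e", "4e", "5e")

result <- map_tokens(
  c("hi 5eme elephant 4eme dark I am", "no match here 2eme"),
  values_to_map, new_values
)
result
## [1] "hi 5e elephant 4e dark I am" "no match here 2e"
```

Since fastmatch is allowed, `fmatch()` is documented as a drop-in replacement for `match()` and could be swapped in here; whether it actually helps on tens of millions of short strings is untested and would need benchmarking.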
