alwaysaskingquestions - 9 months ago
R Question

Keep only numbers before the FIRST hyphen AND the hyphen itself

I am trying to remove all the numbers/characters that come AFTER the FIRST hyphen.
Here are some examples:


My desired output:


I've read through posts like these:

  1. Using gsub to extract character string before white space in R

  2. truncate string from a certain character in R

  3. Truncating the end of a string in R after a character that can be present zero or more times

But they are not what I'm looking for, because the methods mentioned there also remove the hyphen (leaving me with only the first 2 or 3 digits).

Here's what I've tried so far:

gsub(pattern = '[0-9]*-$', replacement = "", x = data$id)
grep(pattern = '[0-9]*-', replacement = "", x = data$id)
regexpr(pattern = '[0-9]*-', text = data$id)

But none of these works as I expected.


There are several ways to achieve this; here is one:

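have <- c("15-103025-01", "800-40170-02", "68-4974-01")
# Keep the leading digits and the first hyphen (group 1); drop everything after them
want <- sub(pattern = "(^\\d+\\-).*", replacement = "\\1", x = have)
want  # "15-" "800-" "68-"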

The regular expression contains one capture group, created with (), which matches the start of the string (^) followed by one or more digits (\\d+) and the hyphen (\\-). Outside the group, .* matches whatever characters follow.
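To see what each piece of the pattern contributes, a small check along these lines may help. The vector `ids` is hypothetical: its last two entries are made-up values (not from the question) added to probe what happens when a string has no hyphen or does not start with digits.

```r
# Hypothetical inputs: the last two entries are made up to probe edge cases
ids <- c("15-103025-01", "800-40170-02", "68-4974-01", "12345", "AB-12-34")

# Same pattern as above: leading digits plus the first hyphen are captured
trimmed <- sub("(^\\d+\\-).*", "\\1", ids)

# Strings that do not start with digits followed by a hyphen do not match,
# so sub() returns them unchanged
print(trimmed)
```

If the values live in a data frame column such as the `data$id` from the question, the same call could be assigned back with `data$id <- sub("(^\\d+\\-).*", "\\1", data$id)`.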

In the replacement part, \\1 refers to the first (and only) group of the regular expression. Because the pattern matches the whole string and the replacement contains nothing but \\1, everything after the first hyphen is dropped.
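Since the question already experimented with regexpr(), another possible route is to extract the matched prefix rather than replace the remainder. This is only a sketch of an alternative, not the method used above; the names `m` and `want2` are made up for illustration.

```r
have <- c("15-103025-01", "800-40170-02", "68-4974-01")

# regexpr() locates the first match of "leading digits plus one hyphen" in each string
m <- regexpr("^[0-9]+-", have)

# regmatches() pulls out the matched text for every string that had a match
want2 <- regmatches(have, m)
print(want2)
```

One design difference to keep in mind: regmatches() returns only the elements that matched, so the result can be shorter than the input if some strings have no leading digits and hyphen, whereas sub() always returns a vector of the same length.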