user6340762 user6340762 - 1 month ago 10
R Question

Use R to extract numeric values from string in a column

I have a table with the following column structure :

Name Type
Urgent Care (Revenue Code: 0456) Per Case
IV Therapy (Revenue Codes 0260, 0269) Per Visit
Oncology Treatment (Revenue Codes: 0280, 0289) Per Visit


and I want to extract the numeric revenue codes from the names column so the table would look like :

Name Rev Code Type
Urgent Care 0456 Per Case
IV Therapy 0260, 0269 Per Visit
Oncology Treatment 0280, 0289 Per Visit


the original data is inconsistent in the Name column as the word "Code" is followed by a ";" , blank space, "-" etc. So I was trying to use regular expression to search for the first number and then split the column there.

I have tried using regex to search for the first digit and separate() from the tidyr package :

library(tidyr)
separate(mydata, Name, into = c("Name", "Rev Code"), sep = "[[:digit:]]")


which splits the column at the right location but the column "Rev Code" ends up blank?
I'm relatively new to R and would definitely appreciate any help!

Data:



structure(list(
Name = c("Urgent Care (Revenue Code: 0456)", "IV Therapy (Revenue Codes 0260, 0269)",
"Oncology Treatment (Revenue Codes: 0280, 0289)"),
Type = c("Per Case", "Per Visit", "Per Visit")),
.Names = c("Name", "Type"), row.names = 1:3, class = "data.frame")

Answer

We can try with extract

library(tidyr)
extract(df1, Name, into = c("Name", "RevCode"), "([^(]+)\\s*[^0-9]+([0-9].*).")

#               Name    RevCode      Type
#1        Urgent Care       0456  Per Case
#2         IV Therapy 0260, 0269 Per Visit
#3 Oncology Treatment 0280, 0289 Per Visit

As the OP commented that other patterns exist,

extract(df2, Name, into = c("Name", "RevCode"), "([^(]+)\\s*[^0-9]+([0-9].*).")
#                 Name         RevCode      Type
#1        Urgent Care             0456  Per Case
#2         IV Therapy       0260, 0269 Per Visit
#3 Oncology Treatment       0280, 0289 Per Visit
#4     Speech Therapy  0440-0444, 0449 Per Visit

data

df2 <- structure(list(Name = c("Urgent Care (Revenue Code: 0456)", 
 "IV Therapy (Revenue Codes 0260, 0269)", 
"Oncology Treatment (Revenue Codes: 0280, 0289)", 
"Speech Therapy (Revenue Codes: 0440-0444, 0449)"
), Type = c("Per Case", "Per Visit", "Per Visit", "Per Visit"
)), .Names = c("Name", "Type"), class = "data.frame", row.names = c(NA, 
-4L))
Comments