user3084100 - 1 month ago
R Question

Regular Expression Matching Issue (using R)

I have string vectors, each consisting of a product name followed by 6 numbers that correspond to 6 different columns of data.

I want to be able to extract either the product name or the 6 final numbers (which can then be split, as they are always separated by whitespace). Each of the final 6 numbers always contains a decimal point.

I have tested a regular expression (below) that extracts 6 numbers in succession from the string; however, it fails when the product name also contains a number (e.g. example 1 below has "£2.00" in its name).

I need to change this so that the match is anchored at the end of the string and only the last 6 numbers are selected.

Current Regular Expression:

([\d]+[.][\d]+[\s]?){6}


Example strings:

[1] "white coffee tall £2.00 6.00 6.00 6.00 0.00 1.00 5.00"
[2] "product group:0 Total 6.00 6.00 6.00 0.00 1.00 5.00"
[3] "£1.45 CAKE 12.00 17.40 17.40 0.00 2.90 14.50"
[4] "95P CAKE £5.00 32.00 30.40 30.40 0.00 5.07 25.33"
[5] "Complementary hot beverage 11.00 0.00 0.00 0.00 0.00 0.00"
[6] "Flat white Large 5.00 5.00 5.00 0.00 0.83 4.17"
[7] "Flat white Small 8.00 5.20 5.20 0.00 0.87 4.33"
[8] "Go ahead Bar 5.00 4.25 4.25 0.00 0.71 3.54"
[9] "Graze Box 2.00 3.20 3.20 0.00 0.00 3.20"
[10] "Joe & Seph popcorn 2.00 1.90 1.90 0.00 0.32 1.58"
[11] "Kit kat 4 finger 6.00 3.00 3.00 0.00 0.50 2.50"


String to extract (first two examples):

[1] "6.00 6.00 6.00 0.00 1.00 5.00"
[2] "6.00 6.00 6.00 0.00 1.00 5.00"

Answer

1) It seems you want the last 6 fields of the input character vector s pasted together:

sapply(strsplit(s, " "), function(x) paste(tail(x, 6), collapse = " "))
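As a rough sketch of how the same split could also give the product name and numeric columns, the following uses two strings copied from the question. The names s_demo, fields, nums and product are assumptions made for illustration, and the approach assumes every string ends in exactly six space-separated numeric fields.

```r
# Two sample strings copied from the question (s_demo is an illustrative name).
s_demo <- c("white coffee tall £2.00 6.00 6.00 6.00 0.00 1.00 5.00",
            "£1.45 CAKE 12.00 17.40 17.40 0.00 2.90 14.50")

# Split every string on single spaces.
fields <- strsplit(s_demo, " ", fixed = TRUE)

# Last six fields converted to numbers: one row per input string.
nums <- t(vapply(fields, function(x) as.numeric(tail(x, 6)), numeric(6)))

# Everything before the last six fields is treated as the product name.
product <- vapply(fields,
                  function(x) paste(head(x, -6), collapse = " "),
                  character(1))

print(product)
print(nums)
```

Because the split counts fields from the end, a price such as "£2.00" inside the product name stays with the name rather than being picked up as data.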

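2) A minor tweak to the regular expression in the question also works: anchor it to the end of the string with $ and put a lazy .*? in front of it. A greedy .* would swallow the leading digits of a multi-digit first number (for example, "12.00" in the third string would come back as "2.00").

sub("^.*?((\\d+[.]\\d+\\s?){6})$", "\\1", s, perl = TRUE)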
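Another way to apply the end-anchored pattern, sketched here under the assumption that every string really does end in six decimal numbers, is to extract the match directly with regexpr() and regmatches() instead of rewriting the whole string. The names s_demo, pat, last6, product and greedy are illustrative only. The last line shows, for comparison, what a greedy .* prefix does to a multi-digit first number.

```r
# Two sample strings copied from the question (s_demo is an illustrative name).
s_demo <- c("white coffee tall £2.00 6.00 6.00 6.00 0.00 1.00 5.00",
            "£1.45 CAKE 12.00 17.40 17.40 0.00 2.90 14.50")

# Six decimal numbers, each optionally followed by whitespace,
# with the last one ending at the end of the string.
pat <- "(\\d+[.]\\d+\\s?){6}$"

# regexpr() locates the leftmost match; regmatches() extracts its text.
# Note: regmatches() silently drops elements that do not match at all.
last6 <- regmatches(s_demo, regexpr(pat, s_demo, perl = TRUE))

# Deleting the same match and trimming leaves the product name.
product <- trimws(sub(pat, "", s_demo, perl = TRUE))

# For comparison: a greedy prefix eats the "1" of "12.00".
greedy <- sub(".*((\\d+[.]\\d+\\s?){6})$", "\\1", s_demo[2], perl = TRUE)

print(last6)
print(product)
print(greedy)
```

Since regmatches() returns only the elements that matched, checking that length(last6) equals length(s_demo) is a cheap way to spot malformed rows before splitting further.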

Note: The input s in reproducible form is:

s <- c("white coffee tall £2.00 6.00 6.00 6.00 0.00 1.00 5.00",
       "product group:0 Total 6.00 6.00 6.00 0.00 1.00 5.00",
       "£1.45 CAKE 12.00 17.40 17.40 0.00 2.90 14.50",
       "95P CAKE £5.00 32.00 30.40 30.40 0.00 5.07 25.33",
       "Complementary hot beverage 11.00 0.00 0.00 0.00 0.00 0.00",
       "Flat white Large 5.00 5.00 5.00 0.00 0.83 4.17",
       "Flat white Small 8.00 5.20 5.20 0.00 0.87 4.33",
       "Go ahead Bar 5.00 4.25 4.25 0.00 0.71 3.54",
       "Graze Box 2.00 3.20 3.20 0.00 0.00 3.20",
       "Joe & Seph popcorn 2.00 1.90 1.90 0.00 0.32 1.58",
       "Kit kat 4 finger 6.00 3.00 3.00 0.00 0.50 2.50")