DeltaIV - 11 months ago
R Question

Associate each element of a numeric vector with the "most similar" level of a factor vector

I have a numeric vector:

x <- c(-18.695, -18.695, 19.477, 0.000, 55.000, 19.477, -18.695, 48.476, 55.000, 37.798, -18.695, 19.477, 37.798, 0.000, -18.695)

and a factor vector whose levels, as returned by the levels function, are:

y <- c("IV-18_7", "IV00", "IV00orig", "IV19_5", "IV37_8", "IV37_8_yp", "IV48_5", "IV48_5_yp", "IV55")

I need to build a new factor vector z, of the same length as x but with the levels listed in y, such that the i-th element of z is the element of y "most similar" to the corresponding element of x. In other words:

z <- factor(c("IV-18_7", "IV-18_7", "IV19_5", "IV00", "IV55", "IV19_5", "IV-18_7", "IV48_5", "IV55", "IV37_8", "IV-18_7", "IV19_5", "IV37_8", "IV00", "IV-18_7"), levels = y)

The example should make the meaning of "most similar" fairly obvious. Anyway, the idea is to take an element x[i] and look for the element of y that is obtained by adding an "IV" prefix, then a string that is "similar" to the round-off of x[i] (but, unfortunately, not exactly equal to it), and finally no suffix after the numeric part. I don't know how to code this efficiently in R. Can you help me?
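One way to make "most similar" concrete is to skip reproducing the label format altogether and instead pick, for each x[i], the unsuffixed level of y whose numeric part is closest to x[i]. This nearest-value interpretation is an assumption, not the asker's stated definition, and the names cand, vals and z_near are purely illustrative. A minimal sketch:

```r
x <- c(-18.695, -18.695, 19.477, 0.000, 55.000, 19.477, -18.695, 48.476,
       55.000, 37.798, -18.695, 19.477, 37.798, 0.000, -18.695)
y <- c("IV-18_7", "IV00", "IV00orig", "IV19_5", "IV37_8", "IV37_8_yp",
       "IV48_5", "IV48_5_yp", "IV55")

# Assumption: candidate levels are those with no suffix after the numeric part,
# i.e. "IV", an optional minus sign, digits, and optionally "_" plus digits.
cand <- y[grepl("^IV-?[0-9]+(_[0-9]+)?$", y)]

# Recover the number each candidate encodes: drop "IV", turn "_" into ".".
vals <- as.numeric(chartr("_", ".", sub("^IV", "", cand)))

# For each element of x, take the index of the closest candidate value
# (which.min returns the first index on ties).
idx <- vapply(x, function(v) which.min(abs(v - vals)), integer(1))

# Build the factor with the full set of levels from y.
z_near <- factor(cand[idx], levels = y)
```

With the data above, z_near reproduces z. Note that every x[i] is mapped to some level however far away it is, so a distance threshold may be worth adding if unmatched values are possible.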

Answer by lmo

Here is a one-liner that should get you pretty close.

paste0("IV", sub(".", "_", sub("\\.0$", "", sprintf("%04.1f", round(x, 1))), fixed=TRUE))

[1] "IV-18_7" "IV-18_7" "IV19_5"  "IV00"    "IV55"    "IV19_5"  "IV-18_7" "IV48_5"  "IV55" 
[10] "IV37_8"  "IV-18_7" "IV19_5"  "IV37_8"  "IV00"    "IV-18_7"

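It works as follows. The original vector x is rounded to one decimal place with round(x, 1). Then sprintf with the format "%04.1f" prints each value with exactly one decimal digit and a minimum width of four characters, padding with leading zeros where needed, so 0 becomes "00.0" while 55 becomes "55.0". This result is fed to the inner sub, which drops a trailing ".0" (the pattern "\\.0$" is anchored at the end of the string). Finally, the outer sub, called with fixed = TRUE so that "." is matched literally rather than as a regex wildcard, replaces the remaining dot with an underscore, and paste0 adds the "IV" prefix.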
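The same pipeline, unrolled into one step per line so each intermediate value can be inspected, is sketched below. It also wraps the result in factor(..., levels = y), since the question asks for a factor rather than a character vector; the intermediate names r, s, lab and z2 are illustrative.

```r
x <- c(-18.695, -18.695, 19.477, 0.000, 55.000, 19.477, -18.695, 48.476,
       55.000, 37.798, -18.695, 19.477, 37.798, 0.000, -18.695)
y <- c("IV-18_7", "IV00", "IV00orig", "IV19_5", "IV37_8", "IV37_8_yp",
       "IV48_5", "IV48_5_yp", "IV55")

r <- round(x, 1)                      # one decimal place: -18.7, 19.5, 0, 55, ...
s <- sprintf("%04.1f", r)             # width >= 4, zero-padded: "-18.7", "00.0", "55.0"
s <- sub("\\.0$", "", s)              # drop a trailing ".0": "00.0" -> "00", "55.0" -> "55"
s <- sub(".", "_", s, fixed = TRUE)   # literal dot to underscore: "-18.7" -> "-18_7"
lab <- paste0("IV", s)                # add the prefix: "IV-18_7", "IV00", ...

# Convert to a factor carrying all levels of y, as the question requires.
z2 <- factor(lab, levels = y)
```

Because factor() maps any label that is not among the supplied levels to NA, anyNA(z2) is a cheap check that every rounded value produced a label that actually exists in y.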