DeltaIV - 21 days ago
R Question

Associate each element of a numeric vector with the "most similar" level of a factor vector

I have a numeric vector:

x <- c(-18.695, -18.695, 19.477, 0.000, 55.000, 19.477, -18.695, 48.476, 55.000, 37.798, -18.695, 19.477, 37.798, 0.000, -18.695)


and a factor vector whose levels, as returned by the levels function, are:

y <- c("IV-18_7", "IV00", "IV00orig", "IV19_5", "IV37_8", "IV37_8_yp", "IV48_5", "IV48_5_yp", "IV55")


I need to build a new factor vector z, of the same length as x, but with the levels listed in y, such that the i-th element of z, z[i], is the "most similar" element of y to the corresponding element of x, x[i]. In other words:

z <- factor(c("IV-18_7", "IV-18_7", "IV19_5", "IV00", "IV55", "IV19_5", "IV-18_7", "IV48_5", "IV55", "IV37_8", "IV-18_7", "IV19_5", "IV37_8", "IV00", "IV-18_7"), levels = y)


The example should make the meaning of "most similar" fairly obvious. The idea is to take an element x[i] and look for the element of y that consists of the "IV" prefix followed by a string that is "similar" to x[i] rounded off (but unfortunately not exactly equal to it), with no suffix after the numeric part. I don't know how to code this efficiently in R. Can you help me?

lmo
Answer

Here is a one-liner that should get you pretty close.

paste0("IV", sub(".", "_", sub("\\.0$", "", sprintf("%04.1f", round(x, 1))), fixed=TRUE))

 [1] "IV-18_7" "IV-18_7" "IV19_5"  "IV00"    "IV55"    "IV19_5"  "IV-18_7" "IV48_5"  "IV55"
[10] "IV37_8"  "IV-18_7" "IV19_5"  "IV37_8"  "IV00"    "IV-18_7"

It works as follows. The original vector, x, is rounded to one decimal place. Then sprintf with the format "%04.1f" prints one digit after the decimal point and pads the result with a leading "0" if it has fewer than 4 characters, so 0 becomes "00.0". The inner sub drops a trailing ".0" (a dot followed by "0" at the end of the string), the outer sub replaces the remaining dot with an underscore, and paste0 adds the "IV" prefix. The result is a character vector; to get the factor z, pass it to factor with levels = y.
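
For reference, the one-liner can be unrolled into its individual steps and the result converted to a factor. This is only a sketch under the assumption that every rounded value of x has an exact counterpart in y; the intermediate names (rounded, padded, trimmed, underscored, labels) are made up for illustration and do not appear in the original answer.

```r
x <- c(-18.695, -18.695, 19.477, 0.000, 55.000, 19.477, -18.695, 48.476,
       55.000, 37.798, -18.695, 19.477, 37.798, 0.000, -18.695)
y <- c("IV-18_7", "IV00", "IV00orig", "IV19_5", "IV37_8", "IV37_8_yp",
       "IV48_5", "IV48_5_yp", "IV55")

# Step 1: round to one decimal place, e.g. -18.695 -> -18.7
rounded <- round(x, 1)
# Step 2: one decimal, minimum width 4, zero-padded, e.g. 0 -> "00.0"
padded <- sprintf("%04.1f", rounded)
# Step 3: drop a trailing ".0", e.g. "55.0" -> "55"
trimmed <- sub("\\.0$", "", padded)
# Step 4: replace the remaining decimal point with an underscore
underscored <- sub(".", "_", trimmed, fixed = TRUE)
# Step 5: add the "IV" prefix
labels <- paste0("IV", underscored)

# Labels missing from y would silently become NA in the factor,
# so fail early if the assumption above does not hold.
stopifnot(all(labels %in% y))

z <- factor(labels, levels = y)
```

The stopifnot check is there because factor() turns any label that is not among the levels into NA without a warning. If x ever contains values whose rounded form is not encoded this way in y, the formatting rules would need to be adjusted.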