agenis - 2 months ago
R Question

# How to check equality between number and string converted to number (vectorized)

I want to find the index of the outlier spotted by the `grubbs.test` function of the `outliers` package:

```r
library(outliers)
where = function(x) which(x == as.numeric(strsplit(grubbs.test(x)$alternative, " ")[[1]][3]))
```

It works by parsing the outlier value out of the `alternative` text of the Grubbs test result. It's a bit of a hack, but it works well for round numbers:

```r
df = c(0, 3, rnorm(10))
where(df) # [1] 2
```
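To see the parsing step on its own, here is a minimal sketch that runs without the `outliers` package. It assumes the `alternative` text has the form `"highest value <number> is an outlier"`; the string and the vector `x` are made up for illustration.

```r
# Assumed shape of grubbs.test(x)$alternative for a round outlier value
alt <- "highest value 3 is an outlier"

# Split on spaces and convert the third word back to a number
value <- as.numeric(strsplit(alt, " ")[[1]][3])

# Made-up data: exact equality works here because 3 survives the text round trip
x <- c(0, 3, -0.4, 1.2)
which(x == value)  # [1] 2
```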

With non-integer values, however, the printed text does not always match the actual number exactly:

```r
df = c(0, sqrt(10), rnorm(10))
where(df) # integer(0)
```

Does anyone have an idea how to fix this? Or is there another way to find the index of the largest outlier found by the Grubbs test? I'm trying to use this in a loop.

The problem is that `strsplit` works on the printed text, so it returns strings instead of the original numbers. In your second example I get:

```
[1] "highest"          "value"            "3.16227766016838" "is"               "an"               "outlier"
```

The third element, however, is only a rounded text version of the value: the number that `grubbs.test` actually tested (here `sqrt(10)`) carries more digits than the printed string, so converting the string back with `as.numeric` gives a slightly different double, and the `==` operator does not 'catch' it as an equality. This can be seen clearly here:

```
> a <- sqrt(10)
> a == as.numeric(as.character(a))
[1] FALSE
```
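The mismatch is tiny, though. A short base-R check, reusing the `a <- sqrt(10)` example, shows that the round-tripped value differs from the original by far less than the tolerance `all.equal` uses by default, `sqrt(.Machine$double.eps)`:

```r
a <- sqrt(10)
parsed <- as.numeric(as.character(a))  # round trip through the printed text

gap <- abs(parsed - a)
gap                              # tiny: only the digits lost in printing
gap < sqrt(.Machine$double.eps)  # TRUE: well inside a tolerance of about 1.5e-8
isTRUE(all.equal(parsed, a))     # TRUE: approximate equality holds
```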

Is there a solution to this?

Yes, there is.

To tackle this problem, use the `almost.equal` function, which I took the liberty of copying from an R-help post:

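```r
almost.equal <- function(x, y, tolerance = .Machine$double.eps^0.5,
                         na.value = TRUE)
{
  answer <- rep(na.value, length(x))            # NA elements of x get na.value
  test <- !is.na(x)
  answer[test] <- abs(x[test] - y) < tolerance  # absolute difference within tolerance
  answer
}
```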

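The function above is a vectorized variant of the idea behind `all.equal`: instead of testing exact equality, it checks whether each element of `x` lies within `tolerance` of `y` (by default `sqrt(.Machine$double.eps)`, about `1.5e-8`, the same default tolerance `all.equal` uses), so it captures cases like yours. Unlike `all.equal`, which by default compares the mean relative difference, it uses the absolute difference.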
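As a side note, if you would rather keep `all.equal`'s own semantics (relative difference for non-zero targets), a minimal alternative sketch is to apply it element by element with `vapply`; `isTRUE` turns its "TRUE or a descriptive string" result into a plain logical. The helper name `almost_equal_rel` is hypothetical:

```r
# Hypothetical helper: element-wise all.equal() against a single value y
almost_equal_rel <- function(x, y, tolerance = sqrt(.Machine$double.eps)) {
  vapply(x, function(v) isTRUE(all.equal(v, y, tolerance = tolerance)),
         logical(1))
}

y <- as.numeric(as.character(sqrt(10)))  # value recovered from the printed text
almost_equal_rel(c(0, sqrt(10), 1.5), y) # FALSE TRUE FALSE
```

This loops in R and is slower than the vectorized `abs(x - y) < tolerance` test, but a relative tolerance scales with the magnitude of the values, which may matter for very large numbers, where the digits lost in printing can exceed a fixed absolute tolerance.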

```r
where = function(x) {
  value = as.numeric(strsplit(grubbs.test(x)$alternative, " ")[[1]][3])
  # na.value = FALSE so that NA entries of x are not reported as matches
  which(almost.equal(x, value, na.value = FALSE))
}
```

Let's check it now:

```
> df=c(0, 3, rnorm(10))
> where(df)
[1] 2
```

And:

```
> df=c(0, sqrt(10), rnorm(10))
> where(df)
[1] 2
```

Now you have a solution that works with non-integer values too!
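Alternatively, you can skip the text parsing entirely. As far as I can tell from the `outliers` source, the default `grubbs.test` (`type = 10`, `opposite = FALSE`) examines whichever of the minimum or the maximum lies farther from the mean, i.e. the value with the largest absolute deviation from the mean. Under that assumption, a minimal sketch (the name `grubbs_candidate` is hypothetical) is:

```r
# Hypothetical helper: index of the value the default grubbs.test() examines,
# assuming it is the one farthest from the mean (the test itself drops NAs).
grubbs_candidate <- function(x) {
  dev <- abs(x - mean(x, na.rm = TRUE))  # distance of each value from the mean
  which.max(dev)                         # skips NA and returns an index into x
}

df <- c(0, sqrt(10), 1, -1, 0.5)
grubbs_candidate(df)  # [1] 2
```

Because it never compares floating-point values for equality, this works for any outlier value. On an exact tie between the minimum's and the maximum's distance from the mean, `which.max` returns the first position, which may not be the value `grubbs.test` reports.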