mina - 8 months ago 49

R Question

I am trying to detect outliers in my dataframe and replace the outliers by NAs.

I have slightly modified the function provided here: How to repeat the Grubbs test and flag the outliers. The function works well on a vector, but my problem is using it on a dataframe. It detects the outliers, but I do not know how to get the results back as a dataframe.

What I want as a result is my original dataframe with the outliers replaced by `NA`.

This is what I have tried so far:

```
library(outliers)
data("rock")

# Function to detect outliers with Grubbs test in a vector
grubbs.flag <- function(vector) {
  outliers <- NULL
  test <- vector
  grubbs.result <- grubbs.test(test)
  pv <- grubbs.result$p.value
  # throw an error if there are too few values for the Grubb's test
  if (length(test) < 3 ) stop("Grubb's test requires > 2 input values")
  while(pv < 0.05) {
    outliers <- c(outliers,as.numeric(strsplit(grubbs.result$alternative," ")[[1]][3]))
    test <- vector[!vector %in% outliers]
    # stop if all but two values are flagged as outliers
    if (length(test) < 3 ) {
      warning("All but two values flagged as outliers")
      break
    }
    grubbs.result <- grubbs.test(test)
    pv <- grubbs.result$p.value
    idx.outlier <- which(vector %in% outliers)
    na.vect <- replace(vector, idx.outlier, NA)
  }
  return(na.vect)
}

# Function to detect outliers with Grubbs test in a dataframe
Grubbs.df <- function(data){
  grubbs.data <- (as.vector(unlist(apply(data, grubbs.flag))))
  return(grubbs.data)
}
```

Any idea how to make this work?

Answer

You should add this before the while loop:

```
na.vect <- test
```

Otherwise, if the loop body never runs (no outlier is detected) or the loop breaks on its first pass, `na.vect` is never assigned and `return(na.vect)` throws an error. Then run the function on your dataframe like this:

```
apply(rock,2,grubbs.flag)
```

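The second argument, `2`, tells `apply` to run the function on each column of the dataframe; use `1` for rows. Note that `apply` returns a matrix, so wrap the call in `as.data.frame()` if you need a dataframe back.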
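Putting the pieces together, here is a minimal sketch of the whole thing, assuming the `outliers` package is installed. It differs from the question's code in three ways that are my own choices rather than part of the original answer: the length check runs before the first test, `na.vect` starts out as the unmodified input, and the replacement happens before the `break` check so the last flagged value is not lost. The names `rock.clean` and `rock.clean2` are only illustrative.

```r
library(outliers)  # provides grubbs.test()

data("rock")

# Repeat the Grubbs test on a numeric vector and return the vector
# with every flagged value replaced by NA.
grubbs.flag <- function(vector) {
  # check the length first, before the test is called at all
  if (length(vector) < 3) stop("Grubbs' test requires > 2 input values")

  outliers <- NULL
  test <- vector
  na.vect <- vector  # default result when nothing is flagged

  grubbs.result <- grubbs.test(test)
  pv <- grubbs.result$p.value

  while (pv < 0.05) {
    # the flagged value is the third word of the alternative hypothesis text
    flagged <- as.numeric(strsplit(grubbs.result$alternative, " ")[[1]][3])
    outliers <- c(outliers, flagged)

    # update the result before any break, so no flagged value is skipped
    na.vect <- replace(vector, which(vector %in% outliers), NA)
    test <- vector[!vector %in% outliers]

    # stop if all but two values are flagged as outliers
    if (length(test) < 3) {
      warning("All but two values flagged as outliers")
      break
    }

    grubbs.result <- grubbs.test(test)
    pv <- grubbs.result$p.value
  }

  na.vect
}

# apply() returns a matrix, so convert the result back to a dataframe
rock.clean <- as.data.frame(apply(rock, 2, grubbs.flag))

# alternative: assign column by column into a copy, which keeps the
# dataframe structure without going through a matrix
rock.clean2 <- rock
rock.clean2[] <- lapply(rock, grubbs.flag)

# number of values replaced by NA in each column
sapply(rock.clean, function(col) sum(is.na(col)))
```

The `lapply` variant is probably the safer choice when the dataframe mixes column types, since `apply` first coerces the whole dataframe to a single-type matrix.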

Source (Stackoverflow)