Sid Sid - 11 days ago 6
R Question

I am trying to solve a Coursera assignment called "best hospital"

I am trying to solve a Coursera assignment. The task is to find the best hospital in a US state.
I made a small dataset based on the original dataset.
When I run the function on the small dataset it gives the correct answer, but when I run it on the bigger dataset, I get a result for some values and the following output for others:

[1]
Levels:
Warning message:
In best("TX", "heart attack") : NAs introduced by coercion

Here is my code:

##THE best hospital problem
best <- function(state, outcome) {
setwd("C:/Users/Praveen/Documents/R/COURSERA/R-programming/Week 3/Programming_Assignment")

###Reading the dataset
x <- read.csv("outcome-of-care-measures.csv" , header =TRUE)
##vector of unique states in the data set
statevector <- unique(x$State)
## vector of outcomes
outcomevector <- c("heart attack" , "heart failure" , "pneumonia")
## checking validity of arguments
if(!(state %in% statevector)){
stop("Invalid State")
} else if(!(outcome %in% outcomevector)){
stop("Invalid Outcome")
} else {
message("OK") }
## Sub setting the data and getting relevant data set
X <- subset(x, x$State== state)
## if outcome is "heart attack", then calculate minimum value in 11th column in ##the data subset; Again sub setting the data on the basis of minimum value in ##11th column
if(outcome == outcomevector[1]){
y <- as.numeric(as.character(X[,11]))
z <- min(y, na.rm=TRUE)
z
subsetx <- subset(X, X[,11]==z)
answer <- subsetx[2]
answer
## if outcome is "heart failure", then calculate minimum value in 17th column ##in the data subset; Again sub setting the data on the basis of minimum value ##in 17th column
} else if (outcome == outcomevector[2]){
y <- as.numeric(as.character(X[,17]))
z <- min(y, na.rm=TRUE)
z
subsetx <- subset(X, X[,17]==z)
answer <- subsetx[2]
answer
## if outcome is "pneumonia", then calculate minimum value in 23rd column in ##the data subset; Again sub setting the data on the basis of minimum value in ##23rd column
} else {
y <- as.numeric(as.character(X[,23]))
z <- min(y, na.rm=TRUE)
z
subsetx <- subset(X, X[,23]==z)
answer <- subsetx[2]
answer}
##if there are two or more equal minimum values, then sort alphabetically and ##select the hospital which comes first alphabetically
FA <- answer[with(answer, order(Hospital.Name)), ]
FFA <- FA[1]
FFANS <- droplevels(FFA)
FFANS
}

Answer

There are multiple issues with the code, but the direct problem is that you are being bitten by the factor bug. Compare these values:

class(z)
#[1] "numeric"

class(X[,11])
#[1] "factor"

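So when you run this command, subsetx <- subset(X, X[,11]==z), the factor's labels are compared as text, and a label such as "12.0" does not equal the number 12, so you can get no match even though one exists. Convert the column the same way you did for y instead:

subset(X, as.numeric(as.character(X[,11]))==z)

The column is converted to character first and then to numeric, because calling as.numeric directly on a factor returns its internal level codes rather than the rates. That gives this output: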

best("TX", "heart attack")
#OK
#[1] CYPRESS FAIRBANKS MEDICAL CENTER
#Levels: CYPRESS FAIRBANKS MEDICAL CENTER
#Warning message:
#In best("TX", "heart attack") : NAs introduced by coercion

You will still get a warning, because the "Not Available" entries become NA during the conversion and the factors were never eliminated at the source. It's difficult to tell where to start fixing the approach, but this change may get you through the assignment.
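To see the factor pitfall in isolation, here is a minimal sketch on a made-up vector (the values are assumptions for illustration, not taken from the assignment data). It compares three ways of matching a numeric minimum against a factor column:

```r
# Toy rates stored as a factor, which is how read.csv stored text columns
# by default before R 4.0.0. The values are made up.
rates <- factor(c("12.0", "Not Available", "14.3", "12.0"))

# Convert labels to numbers; "Not Available" becomes NA (warning suppressed here)
values <- suppressWarnings(as.numeric(as.character(rates)))
z <- min(values, na.rm = TRUE)

direct <- rates == z        # compares the label "12.0" with the text "12"
codes  <- as.numeric(rates) # internal level codes, not the rates
proper <- values == z       # compares the actual numbers

print(direct)
print(codes)
print(proper)
```

Only the last comparison flags the two rows that hold the minimum; the direct comparison finds nothing, and the level codes are unrelated to the rates.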

Update

You can start on the right foot by adding two arguments to read.csv. Set stringsAsFactors to FALSE so strings remain as characters, and set na.strings to "Not Available" so R knows which text in the file marks a missing value.

x <- read.csv(file, header = TRUE, stringsAsFactors = FALSE, na.strings = "Not Available")
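A small sketch of what na.strings changes, using an inline CSV with made-up hospital names and rates (all of it is illustrative, not the assignment file):

```r
# Inline toy CSV; names and rates are made up for illustration
csv_text <- "Hospital.Name,State,Rate
ALPHA HOSPITAL,TX,12.0
BETA HOSPITAL,TX,Not Available
GAMMA HOSPITAL,TX,14.3"

# Without na.strings, "Not Available" forces the whole column to text
as_text <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With na.strings, the entry becomes NA and the column is read as numeric
as_num <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                   na.strings = "Not Available")

print(class(as_text$Rate))
print(class(as_num$Rate))
print(min(as_num$Rate, na.rm = TRUE))
```

Once the column is numeric, min with na.rm = TRUE works on it directly and no conversion step is needed.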

With this corrective step added, you can now take out all of the as.numeric and as.character parts. Look what I did with the heart attack section:

if(outcome == outcomevector[1]){
      y <- X[,11]
      z <- min(y, na.rm=TRUE)

      subsetx <- subset(X, X[,11] %in% z)
      answer <- subsetx[2]
      answer

Now y can just take the value of X[,11] directly. And subsetx doesn't need any special treatment either.

Towards the bottom, you can now take out the last lines that drop the factor levels. I changed the ending to:

FA <- answer[with(answer, order(Hospital.Name)), ]
FA[1]

Now when the code is run it works without warnings:

best("TX", "heart attack")
#OK
#[1] "CYPRESS FAIRBANKS MEDICAL CENTER"

Update 2

Here is a shortened version that should work:

best2 <- function(state, outcome) {
      setwd("C:/Users/Praveen/Documents/R/COURSERA/R-programming/Week 3/Programming_Assignment")    
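      # na.strings makes the rate columns numeric, with NA for missing values
      x <- read.csv("outcome-of-care-measures.csv", header = TRUE,
                    stringsAsFactors = FALSE, na.strings = "Not Available")
      outcomevector <- c("heart attack" , "heart failure" , "pneumonia")
      if(!(state %in% unique(x$State))) stop("Invalid State")
      if(!(outcome %in% outcomevector)) stop("Invalid Outcome")

      X <- x[x$State== state,]
      names(X)[c(11, 17, 23)] <- outcomevector
      # which() drops NA comparisons so rows with missing rates are not returned
      answer <- X[which(X[,outcome] == min(X[,outcome], na.rm=TRUE)), ][2]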
      FA <- answer[with(answer, order(Hospital.Name)), ]
      FA[1]   
    }
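
To try the logic without the CSV file or setwd, the same steps can be sketched as a function that takes the data frame as an argument. The name best_from_df, the rate_cols argument, and the toy data below are assumptions for illustration, not part of the assignment:

```r
# Hypothetical helper: same logic as best2, but the data frame is passed in.
# rate_cols maps each outcome name to its column position (assumed layout).
best_from_df <- function(x, state, outcome,
                         rate_cols = c("heart attack" = 11,
                                       "heart failure" = 17,
                                       "pneumonia" = 23)) {
  if (!(state %in% x$State)) stop("Invalid State")
  if (!(outcome %in% names(rate_cols))) stop("Invalid Outcome")

  X <- x[x$State == state, ]
  rate <- X[[rate_cols[[outcome]]]]

  # Keep every hospital tied for the minimum, ignoring missing rates
  tied <- X$Hospital.Name[which(rate == min(rate, na.rm = TRUE))]

  # Break ties alphabetically
  sort(tied)[1]
}

# Toy data with made-up hospitals; the rate sits in column 3 here
toy <- data.frame(
  Hospital.Name = c("ZETA CLINIC", "ALPHA HOSPITAL", "BETA HOSPITAL", "OMEGA MEDICAL"),
  State = c("TX", "TX", "TX", "CA"),
  Rate = c(12.0, 12.0, NA, 9.5),
  stringsAsFactors = FALSE
)

result <- best_from_df(toy, "TX", "heart attack", rate_cols = c("heart attack" = 3))
print(result)
```

Passing the data in keeps the file path out of the function, so it can be checked on a small data frame before running it on the full dataset.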