Sid - 7 months ago 44
R Question

I am trying to solve a Coursera assignment called "best hospital"

I was trying to solve an assignment on Coursera. It is to find the best hospital in the USA.
I made a small dataset based on the original dataset.
When I run the function on the small dataset it gives the correct answer, but when I run it on the bigger dataset I get a result for some values and the following error for others:

[1]
Levels:
Warning message:
In best("TX", "heart attack") : NAs introduced by coercion

Here is my code:

`````` ##THE best hospital problem
best <- function(state, outcome) {
setwd("C:/Users/Praveen/Documents/R/COURSERA/R-programming/Week 3/Programming_Assignment")

##vector of unique states in the data set
statevector <- unique(x$State)
## vector of outcomes
outcomevector <- c("heart attack" , "heart failure" , "pneumonia")
## checking validity of arguments
if(!(state %in% statevector)){
stop("Invalid State")
} else if(!(outcome %in% outcomevector)){
stop("Invalid Outcome")
} else {
message("OK")  }
## Sub setting the data and getting relevant data set
X <- subset(x, x$State== state)
## if outcome is "heart attack", then calculate minimum value in 11th column in ##the data subset; Again sub setting  the data on the basis of minimum value in ##11th column
if(outcome == outcomevector[1]){
y <- as.numeric(as.character(X[,11]))
z <- min(y, na.rm=TRUE)
z
subsetx <- subset(X, X[,11]==z)
## if outcome is "heart failure", then calculate minimum value in 17th column ##in the data subset; Again sub setting  the data on the basis of minimum value ##in 17th column
} else if (outcome == outcomevector[2]){
y <- as.numeric(as.character(X[,17]))
z <- min(y, na.rm=TRUE)
z
subsetx <- subset(X, X[,17]==z)
## if outcome is "pneumonia", then calculate minimum value in 23rd column in ##the data subset; Again sub setting  the data on the basis of minimum value in ##23rd column
} else {
y <- as.numeric(as.character(X[,23]))
z <- min(y, na.rm=TRUE)
z
subsetx <- subset(X, X[,23]==z)
##if there are two or more equal minimum values, then sort alphabetically and ##select the hospital which comes first alphabetically
FFA <- FA[1]
FFANS <- droplevels(FFA)
FFANS
}
``````

There are multiple issues with the code, but the direct problem is that you are being bitten by the factor bug. Compare these values:

``````class(z)
#[1] "numeric"

class(X[,11])
#[1] "factor"
``````

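So when you run `subsetx <- subset(X, X[,11]==z)`, the factor is compared with the number through its level labels, which are text. A label such as `"12.0"` does not equal `12`, so you can get no match even though one exists. Try this instead:

``````subset(X, as.numeric(as.character(X[,11]))==z)
``````

The column is converted with `as.character` and then `as.numeric` to give this output. Calling `as.numeric` directly on a factor would return its internal level codes rather than the values.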

``````best("TX", "heart attack")
#OK
#[1] CYPRESS FAIRBANKS MEDICAL CENTER
#Levels: CYPRESS FAIRBANKS MEDICAL CENTER
#Warning message:
#In best("TX", "heart attack") : NAs introduced by coercion
``````

You will still get a warning because the factors were not eliminated at the start, so the conversion to numeric still turns the non-numeric entries into `NA`. It is hard to say where to start fixing the overall approach, but this change may get you through the assignment.
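The factor behaviour behind the failed match can be reproduced on a toy vector. This is only a sketch: the rate strings are made up and the names `rates`, `codes`, `values` and `lowest` are illustrative, not taken from the original code.

```r
# Made-up rate strings stored as a factor, as read.csv did by default
# in R versions before 4.0.0.
rates <- factor(c("14.1", "12.0", "Not Available", "13.5"))

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(rates)

# Going through as.character() recovers the numbers; "Not Available"
# becomes NA (the coercion warning is suppressed here).
values <- suppressWarnings(as.numeric(as.character(rates)))
lowest <- min(values, na.rm = TRUE)

# Comparing the factor itself with the number compares level labels as text,
# so the label "12.0" is not equal to "12" and nothing matches.
direct_match <- rates == lowest

# Comparing the converted values finds the row holding the minimum.
converted_match <- values == lowest
which(converted_match)
```

The design point is that the conversion has to happen on both sides of the comparison, or better, once when the file is read.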

Update

You can start on the right foot by adding two arguments to `read.csv`. Set `stringsAsFactors` to `FALSE` so strings remain characters, and set `na.strings` to `"Not Available"`, which tells R which text in the file marks a missing value.

``````x <- read.csv(file, header = TRUE, stringsAsFactors = FALSE, na.strings = "Not Available")
``````
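To see what these two arguments change, here is a small sketch that reads a made-up three-row CSV from a string instead of the real file. The hospital names, the `Rate` column and the variable `toy` are assumptions for illustration only.

```r
# Made-up CSV text standing in for the real data file.
csv_text <- "Hospital.Name,State,Rate
ALPHA HOSPITAL,TX,14.1
BETA HOSPITAL,TX,Not Available
GAMMA HOSPITAL,TX,12.0"

# read.csv() accepts literal text through its `text` argument.
toy <- read.csv(text = csv_text, header = TRUE,
                stringsAsFactors = FALSE, na.strings = "Not Available")

class(toy$Rate)           # numeric: the only non-number was read as NA
class(toy$Hospital.Name)  # character rather than factor
min(toy$Rate, na.rm = TRUE)
```

Because the missing-value marker is handled at read time, the rate column arrives as numeric and no later `as.numeric` conversion is needed.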

With this corrective step added, you can now take out all of the `as.numeric` and `as.character` parts. Look what I did with the heart attack section:

``````if(outcome == outcomevector[1]){
y <- X[,11]
z <- min(y, na.rm=TRUE)

subsetx <- subset(X, X[,11] %in% z)
``````

Now `y` can just take the value of `X[,11]` directly. And `subsetx` doesn't need any special treatment either.

Towards the bottom, you can now take out the last lines that drop the factor levels. I changed the ending to:

``````FA <- answer[with(answer, order(Hospital.Name)), ]
FA[1]
``````

Now when the code is run it works without warnings:

``````best("TX", "heart attack")
#OK
#[1] "CYPRESS FAIRBANKS MEDICAL CENTER"
``````
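The tie-breaking rule (pick the alphabetically first hospital when several share the lowest rate) can be sketched on toy data. The data frame `ties` and its values are hypothetical and only loosely mirror the column names used above.

```r
# Hypothetical state subset in which two hospitals share the lowest rate.
ties <- data.frame(
  Hospital.Name = c("ZETA CLINIC", "ALPHA HOSPITAL", "MID HOSPITAL"),
  Rate = c(10.2, 10.2, NA),
  stringsAsFactors = FALSE
)

# which() drops the NA comparison, keeping only the rows at the minimum.
tied <- ties[which(ties$Rate == min(ties$Rate, na.rm = TRUE)), ]

# Sort the tied names alphabetically and keep the first one.
winner <- sort(tied$Hospital.Name)[1]
winner
```

Using `which()` here is a design choice: a plain logical index would carry the `NA` through and add an all-`NA` row to the result.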

Update 2

Here is a shorter version that will work:

``````best2 <- function(state, outcome) {
setwd("C:/Users/Praveen/Documents/R/COURSERA/R-programming/Week 3/Programming_Assignment")
## read the rates as numbers: "Not Available" entries become NA
x <- read.csv("outcome-of-care-measures.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = "Not Available")
outcomevector <- c("heart attack" , "heart failure" , "pneumonia")
if(!(state %in% unique(x$State))) stop("Invalid State")
if(!(outcome %in% outcomevector)) stop("Invalid Outcome")

## keep only the requested state and name the three rate columns after the outcomes
X <- x[x$State== state,]
names(X)[c(11, 17, 23)] <- outcomevector
## which() skips rows whose rate is NA
answer <- X[which(X[,outcome] == min(X[,outcome], na.rm = TRUE)), ][2]