dnsko dnsko - 4 months ago 24
R Question

Getting average rating for certain metropolitan area

I have data about several metropolitan area's, with other data applied to them, one of the rows is the rating for that area. Only problem I'm having is NA values in a rows.

The data looks somewhat like this:

"ID", "Name", "Type", "Amount", "Rating", "Date"
1,"Location A", "SomeType", 8000, 9.2, "2015-04-10"
2,"Location B", "SomeType", 2300, 7.4, "2015-04-10"
3,"Location C", "SomeType", 5400, NA, "2015-04-10"
4,"Location A", "SomeType", 4300, 8.5, "2015-04-10"
5,"Location B", "SomeType", 8670, 6.9, "2015-04-10"
6,"Location A", "SomeType", 7600, NA, "2015-04-10"
7,"Location A", "SomeType", 3400, 8.2, "2015-04-10"
8,"Location B", "SomeType", 6500, NA, "2015-04-10"
9,"Location C", "SomeType", 7800, 9.2, "2015-04-10"

Ultimately I want to have it like this

Name Average Rating
Location A {average rating}
Location B {average rating}
Location C {average rating}

Obviously with the ratings for each location, but it keeps going NULL with the NA values. The data is directly read from a CSV. How would I approach this to get a average rating for each location excluding the NA values?

I have tried it with
, but it returns NULL now:

mean_ratings = ddply(data, .(Name), summarize, Rating=mean(Rating))

dt = data.table("Name"=c("Location A","Location B","Location C","Location A","Location B",
                     "Location A","Location A","Location B","Location C"), 
            "Rating"=c(9.2, 7.4, NA, 8.5,6.9,NA,8.2,NA,9.2))

> dt
         Name Rating
1: Location A    9.2
2: Location B    7.4
3: Location C     NA
4: Location A    8.5
5: Location B    6.9
6: Location A     NA
7: Location A    8.2
8: Location B     NA
9: Location C    9.2

dt[, mean(Rating, na.rm = T),by = "Name"]
        Name       V1
1: Location A 8.633333
2: Location B 7.150000
3: Location C 9.200000

the plyr solution :

ddply(dt, "Name", function(x) mean(x$Rating,na.rm = T))