Vikash B Vikash B - 3 months ago 8
R Question

R summarizing by group of two variables

Consider a simplified dataset (the real one has more columns and rows):

df
tp tf weight
1 FWD RF 78.86166
2 MF LF 81.04566
3 DEF LF 80.70527
4 DEF LF 82.96071
5 DEF RF 78.42544
6 GK LF 79.37686
7 DEF RF 78.79928
8 MF RF NA
9 MF RF 78.93815
10 DEF RF 80.00284


I want to fill the missing values in weight by the grouped median of tp and tf combined

What i have tried till now is the following (I have used dlpyr)

temp <- df %>% group_by(tp,tf) %>% summarise(mvalue = median(weight,na. rm = TRUE))


This gives temp to be :

temp
Source: local data frame [6 x 3]
Groups: tp [?]

tp tf mvalue
<fctr> <fctr> <dbl>
1 DEF LF 81.83299
2 DEF RF 78.79928
3 FWD RF 78.86166
4 GK LF 79.37686
5 MF LF 81.04566
6 MF RF 78.93815


Now i am unable to figure out how to fill the missing values in df with the corresponding group median.

In my simple case there is only one NA corresponding to tp = MF and tf = RF,
the median value if you look up at temp is 78.93815

How do i do this in general? Do suggest if you have a better approach than my initial one.

EDIT:
The actual dataframe has a unique Id variable if that makes any difference or can help.

Answer

You can try,

library(dplyr)
df %>% 
   group_by(tp, tf) %>% 
   mutate(weight = replace(weight, is.na(weight), median(weight, na.rm = TRUE)))
Comments