88 88 88 88 - 2 months ago 30
R Question

Is there any code using apply to optimize this?

Load library



library(engsoccerdata)
library(dplyr)
library(lubridate)


Extract Liverpool data from England league data



england$Date <- ymd(england$Date)
Liverpool.home <- england %>% filter(Date > '2001-08-01', home == 'Liverpool')
Liverpool.away <- england %>% filter(Date > '2001-08-01', visitor == 'Liverpool')


Make variable points



Liverpool.home$points = 0

for(i in 1:nrow(Liverpool.home)){

if(Liverpool.home[i,]$result == 'H'){
Liverpool.home[i,]$points = 3
}
else if(Liverpool.home[i,]$result == 'D'){
Liverpool.home[i,]$points = 1
}

}


I know How to use apply function is really bored and common question in stackoverflow, however I cannot solve this problem using apply function.
Is there any method? :)

Answer

dplyr

The function case_when ("a vectorised set of if and else ifs") from dplyr equivalent of the SQL CASE WHEN statement. We need to use .$ inside mutate.

library(dplyr)
Liverpool.home %>% 
  mutate(points = case_when(.$result == 'H' ~ 3,
                            .$result == 'D' ~ 1,
                            TRUE ~ 0))

sqldf

The CASE WHEN statement in SQL from sqldf:

library(sqldf)
df <- sqldf('SELECT result, 
                     CASE WHEN result = "H" THEN 3 
                          WHEN result = "D" THEN 1
                          ELSE 0
                     END AS points
             FROM [Liverpool.home]')
head(df)

Output:

  result points
1      A      0
2      A      0
3      H      3
4      D      1
5      H      3
6      H      3
Comments