Viitama Viitama - 3 months ago 21
R Question

Football form table with R

I would like to create last 5 (x) games form table in R from football results. Here is example where I'm currently at.

Load data

#Last seasons premier league results
S1516 <- read.csv("http://www.football-data.co.uk/mmz4281/1516/E0.csv")

#transform Date to date value
S1516$Date <- as.Date(S1516$Date, format="%d/%m/%y")


Here I have created function, to create league table for certain date, using these two posts as example. http://opisthokonta.net/?p=18 and https://learnr.wordpress.com/2009/03/19/data-manipulation-football-league-table/

Here is example, what my current
league.table
function will give. All the 19 games played until "2015-31-12" are calculated, but I would like to get only last 5 games for each team. Any help on modifying current function is apreciated.

> head(league.table(S1516, "2015-12-31"), 10)
PLD HW HD HL AW AD AL GF GA GD PTS
Arsenal 19 6 2 1 6 1 3 33 18 15 39
Leicester 19 5 3 1 6 3 1 37 25 12 39
Man City 19 8 0 2 3 3 3 37 20 17 36
Tottenham 19 5 4 1 4 4 1 33 15 18 35
Crystal Palace 19 4 2 4 5 2 2 23 16 7 31
Man United 19 4 4 1 4 2 4 22 16 6 30
Liverpool 19 4 3 2 4 3 3 22 22 0 30
West Ham 19 3 4 2 4 4 2 28 23 5 29
Watford 19 4 2 4 4 3 2 24 20 4 29
Stoke 19 4 1 4 4 4 2 20 19 1 29


My current league table function:

league.table <- function(df, fdate="2100-01-01"){

df <- subset(df, Date <= fdate)

#points awarded for a match outcome
winPts <- 3
drawPts <- 1
loosePts <- 0

numMatches <- length(df$HomeTeam)

teams <- levels(factor(c(as.character(df$HomeTeam), as.character(df$AwayTeam))))
numTeams <- length(teams)

#vector with outcome of a match (H, D or A)
home <- df$FTHG > df$FTAG
away <- df$FTAG > df$FTHG
draws <- df$FTHG == df$FTAG

results <- character(length(df$FTHG))
results[draws] <- "D"
results[home] <- "H"
results[away] <- "A"

#for output
homeWins <- numeric(numTeams)
homeDraws <- numeric(numTeams)
homeLoss <- numeric(numTeams)
awayWins <- numeric(numTeams)
awayDraws <- numeric(numTeams)
awayLoss <- numeric(numTeams)
goalsFor <- numeric(numTeams)
goalsAgainst <- numeric(numTeams)
goalsDifference <- numeric(numTeams)
playedMatches <- numeric(numTeams)
pts <- numeric(numTeams)

for (t in 1:numTeams) {
#mathc results for a given team
homeResults <- results[df$HomeTeam == teams[t]]
awayResults <- results[df$AwayTeam == teams[t]]

playedMatches[t] <- length(homeResults) + length(awayResults)

goalsForH <- sum(df$FTHG[df$HomeTeam == teams[t]])
goalsForA <- sum(df$FTAG[df$AwayTeam == teams[t]])
goalsFor[t] <- goalsForA + goalsForH
goalsAgainstH <- sum(df$FTAG[df$HomeTeam == teams[t]])
goalsAgainstA <- sum(df$FTHG[df$AwayTeam == teams[t]])
goalsAgainst[t] <- goalsAgainstA + goalsAgainstH
goalsDifference[t] <- goalsFor[t] - goalsAgainst[t]

homeWins[t] <- sum(homeResults == "H")
homeDraws[t] <- sum(homeResults == "D")
homeLoss[t] <- sum(homeResults == "A")
awayWins[t] <- sum(awayResults == "A")
awayDraws[t] <- sum(awayResults == "D")
awayLoss[t] <- sum(awayResults == "H")

totWins <- homeWins[t] + awayWins[t]
totDraws <- homeDraws[t] + awayDraws[t]
totLoss <- homeLoss[t] + awayLoss[t]

pts[t] <- (winPts * totWins) + (drawPts * totDraws) + (loosePts * totLoss)

}

table <- data.frame(cbind(playedMatches, homeWins, homeDraws,
homeLoss, awayWins, awayDraws, awayLoss,
goalsFor, goalsAgainst, goalsDifference, pts),
row.names=teams)


names(table) <- c("PLD", "HW", "HD", "HL", "AW", "AD", "AL", "GF", "GA", "GD", "PTS")
ord <- order(-table$PTS, -table$GD, -table$GF)
table <- table[ord, ]

return(table)

}

Answer

Give this a try.

I made one change to your code:

S1516 <- read.csv("http://www.football-data.co.uk/mmz4281/1516/E0.csv", stringsAsFactors = TRUE)

The following function will build a new data set based on the last date and the last number of games. It can probably be made more efficient but it solves the problem.

leagueInfo <- function(data, lastDate, lastNumberOfGames){
    # Create a list of teams
    teams <- sort(unique(data$HomeTeam))

    # Find the last N dates that each team played.
    bFirst <- TRUE
    for (team in teams)
    {
        tempSubset <- subset(data, data$HomeTeam == team | data$AwayTeam == team)

        theDates <- sort(unique(tempSubset$Date))
        lastNDates <- tail(subset(theDates, theDates<=lastDate), n=lastNumberOfGames)

        # Data for this team for the last N dates.
        tempSubset2 <- subset(tempSubset, tempSubset$Date %in% lastNDates)

        # Row bind each subset to a new data set.
        if (bFirst)
        {
            newData <- tempSubset2
            bFirst <- FALSE
        }
        else
        {
            newData <- rbind(tempSubset2, newData)
        }
    }
    # Remove duplicates
    newData<-unique(newData)
    return(newData)
}

Run the following:

# Last 5 games played until 12/31/2015
lastDate <-"2015-12-31"
lastNumberOfGames <- 5
newData <- leagueInfo(S1516, lastDate, lastNumberOfGames)
league.table(newData, lastDate )

Results:

               PLD HW HD HL AW AD AL GF GA GD PTS
Arsenal          5  3  0  0  1  0  1  9  6  3  12
Tottenham        5  1  0  1  2  1  0  9  4  5  10
Watford          5  2  0  1  1  1  0  9  4  5  10
Stoke            5  2  0  1  1  1  0  9  5  4  10
Leicester        5  1  1  0  2  0  1  8  4  4  10
Bournemouth      5  1  1  0  2  0  1  5  4  1  10
Crystal Palace   5  1  1  0  1  2  0  4  2  2   9
Man City         5  2  0  0  0  1  2  7  6  1   7
Newcastle        5  1  1  1  1  0  1  5  4  1   7
West Ham         5  1  1  0  0  3  0  3  2  1   7
Norwich          5  1  1  0  1  0  2  5  7 -2   7
Liverpool        5  1  1  0  1  0  2  4  7 -3   7
Chelsea          5  1  1  1  0  1  1  6  6  0   5
Everton          5  0  1  2  1  1  0  8  9 -1   5
West Brom        5  1  1  1  0  1  1  5  6 -1   5
Swansea          5  1  1  1  0  1  1  2  5 -3   5
Southampton      5  1  1  1  0  0  2  6  6  0   4
Aston Villa      5  0  1  1  0  2  1  3  7 -4   3
Man United       5  0  2  1  0  0  2  2  6 -4   2
Sunderland       5  0  0  2  0  0  3  3 12 -9   0
Comments