Matthew Atherton Matthew Atherton - 1 year ago 76
R Question

Cumulative count of names from two separate columns

I have a data set in chronological order which I have imported to R using:

mydata <- read.csv(file="test.csv",stringsAsFactors=FALSE)


Two of the columns in the data set are 'winner' and loser'. Each row in the data is a tennis match.

What I am looking to do is to add two columns which give me a cumulative count of the total matches the player in the 'winner' column has played up to and including the match on that row. And the same count for the 'loser' in that row.

So for example it would look like this:

winner loser winner_matches loser_matches
tom andy 1 1
andy greg 2 1
greg tom 2 2


I hope that makes sense.

I have tried using the following code but can't get it to work across both columns:

ave(mydata$winner_name==mydata$winner_name, mydata$winner_name, FUN=cumsum)


So the data below is the first 10 rows of around 20,000.

this is how mydata looks

Answer Source

You're really close to getting ave to work. The cumsum function doesn't know how to handle text so I created a dummy column that's equal to 1 for each row. That gives cumsum something to count.

Here's a sample dataframe.

mydata <-
  data.frame(
    winner = c("tom", "andy", "greg", "tom", "gary"),
    loser = c("andy", "greg", "tom", "gary", "tom"),
    stringsAsFactors = FALSE
  )

And here's the code to add the two new columns.

library(tidyverse)

mydata <- mutate(mydata, one = 1) # Add dummy column

# Use ave() to calculate both the wins and losses
mydata$winner_matches <- ave(x = mydata$one, mydata$winner, FUN = cumsum)
mydata$loser_matches  <- ave(x = mydata$one, mydata$loser, FUN = cumsum)

mydata <- select(mydata, -one) # Remove dummy column
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download