Matthew Atherton - 2 years ago 109
R Question

# Cumulative count of names from two separate columns

I have a data set in chronological order which I have imported to R using:

``````mydata <- read.csv(file="test.csv",stringsAsFactors=FALSE)
``````

Two of the columns in the data set are 'winner' and loser'. Each row in the data is a tennis match.

What I am looking to do is to add two columns which give me a cumulative count of the total matches the player in the 'winner' column has played up to and including the match on that row. And the same count for the 'loser' in that row.

So for example it would look like this:

``````winner loser winner_matches loser_matches
tom    andy        1             1
andy   greg        2             1
greg   tom         2             2
``````

I hope that makes sense.

I have tried using the following code but can't get it to work across both columns:

``````ave(mydata\$winner_name==mydata\$winner_name, mydata\$winner_name, FUN=cumsum)
``````

So the data below is the first 10 rows of around 20,000.

Answer Source

You're really close to getting `ave` to work. The `cumsum` function doesn't know how to handle text so I created a dummy column that's equal to 1 for each row. That gives `cumsum` something to count.

Here's a sample dataframe.

``````mydata <-
data.frame(
winner = c("tom", "andy", "greg", "tom", "gary"),
loser = c("andy", "greg", "tom", "gary", "tom"),
stringsAsFactors = FALSE
)
``````

And here's the code to add the two new columns.

``````library(tidyverse)

mydata <- mutate(mydata, one = 1) # Add dummy column

# Use ave() to calculate both the wins and losses
mydata\$winner_matches <- ave(x = mydata\$one, mydata\$winner, FUN = cumsum)
mydata\$loser_matches  <- ave(x = mydata\$one, mydata\$loser, FUN = cumsum)

mydata <- select(mydata, -one) # Remove dummy column
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download