Ali Zia Ali Zia - 29 days ago 11
R Question

Widening a dataframe to get monthly sums of revenue for all unique values of catogorical columns in R

I have a df which has data like this:

sub = c("X001","X002", "X001","X003","X002","X001","X001","X003","X002","X003","X003","X002")
month = c("201506", "201507", "201506","201507","201507","201508", "201508","201507","201508","201508", "201508", "201508")
tech = c("mobile", "tablet", "PC","mobile","mobile","tablet", "PC","tablet","PC","PC", "mobile", "tablet")
brand = c("apple", "samsung", "dell","apple","samsung","apple", "samsung","dell","samsung","dell", "dell", "dell")

revenue = c(20, 15, 10,25,20,20, 17,9,14,12, 9, 11)

df = data.frame(sub, month, brand, tech, revenue)


I want to use sub and month as key and get one row for every subscriber per month which displays the sum of revenues for the unique values in tech and brand for that subscriber for that month. This example is simplified and with less columns as I have a huge data set I decided to try doing it with
data.table
.

I have managed to do this for one catagorical column, either tech or brand using this:

df1 <- dcast(df, sub + month ~ tech, fun=sum, value.var = "revenue")


but I want to do it for two or more caqtogorical columns, so far I've tried this:

df2 <- dcast(df, sub + month ~ tech+brand, fun=sum, value.var = "revenue")


and it just concatenates the unique values of both catogorical columns and sums for that but I do not want that. I wan seperate columns for each unique value of all catogorical columns.

I'm new to R and would really appreciate any help.

Answer

(I will assume that df is a data.table rather a data.frame like in your example.)

One possible solution for this is to first melt the data while keeping sub, month and revenue as keys. This way, brand and tech will be converted to a single variable with a value corresponding to each existing combination of the keys. This way we will be able to easily dcast it back as we will be operating against a single column- like in your first example

dcast(melt(df, c(1:2, 5)), sub + month ~ value, sum, value.var = "revenue")
#     sub  month PC apple dell mobile samsung tablet
# 1: X001 201506 10    20   10     20       0      0
# 2: X001 201508 17    20    0      0      17     20
# 3: X002 201507  0     0    0     20      35     15
# 4: X002 201508 14     0   11      0      14     11
# 5: X003 201507  0    25    9     25       0      9
# 6: X003 201508 12     0   21      9       0      0