J Kang J Kang - 17 days ago 6
R Question

Ragged rowSums in R

I am trying to do a rowsum for the actuals columns. However, I would like to include the valuese up to the UpTo date for certain observations. Here is the data frame:

Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4
1 ABC NA 100 50 80 90
2 DEF Q2 80 75 50 80
3 XYZ Q3 100 50 80 100


For company ABC, since there is no UpTo date, it will just be Actual.Q1 + Actual.Q2, which is 150.

For company DEF, since the UpTo date is Q2, it will be Actual.Q1 + Actual.Q2, which is 155.

For company XYZ, since the UpTo date is Q3, it will be Actual.Q1 + Actual.Q2 + Forecast.Q3, which is 230.

The resulting data frame would look like this:

Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4 SumRecent
1 ABC NA 100 50 80 90 150
2 DEF Q2 80 75 50 80 155
3 XYZ Q3 100 50 80 100 230


I have tried to use the
rowSums
function. However, it does not take into effect the variable
UpTo
. Any help is appreciated. Thanks!

Answer

Here is a possibility:

df$SumRecent <- sapply(1:nrow(df), function(x) {sum(df[x,3:ifelse(is.na(grep(df[x,2], colnames(df))[1]), 4, grep(df[x,2], colnames(df))[1])])})


#   Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4 SumRecent
# 1     ABC <NA>       100        50          80          90       150
# 2     DEF   Q2        80        75          50          80       155
# 3     XYZ   Q3       100        50          80         100       230

We are looking with the use of grep for a match of the value in the column UpTo (df[x,2]) in the column names of df (colnames(df)). If we find it we get the sum, if we don't find it we just sum the values in columns 3 and 4.