Chandan Kumar Chandan Kumar -3 years ago 131
R Question

How to apply a function row-wise having if-else condition using data.table?

I need to apply a function on a data frame below. 'day' is sales values. I need to shift the sales to right based on offset value. For example in row 1 offset is 1, I need to shift sales by 1 day, if 0 then no shift and henceforth.

id <- c('a', 'b', 'c', 'd', 'e', 'f')
offset <- c(1,2,3,0,0,2)
day1 <- c(1,2,3,4,5, 0)
day2 <- c(1,2,3,4,5, 2)
day3 <- c(1,2,3,4,5, 6)
day4 <- c(1,2,3,4,5, 6)
day5 <- c(1,2,0,4,5, 0)
day6 <- c(1,0,0,0,5, 0)
day7 <- c(0,0,0,0,0, 0)
df <- data.frame(id, offset, day1, day2, day3, day4, day5, day6, day7)

> df
id offset day1 day2 day3 day4 day5 day6 day7
a 1 1 1 1 1 1 1 0
b 2 2 2 2 2 2 0 0
c 3 3 3 3 3 0 0 0
d 0 4 4 4 4 4 0 0
e 0 5 5 5 5 5 5 0
f 2 0 2 6 6 0 0 0


The result should be like below:

> result
id offset day1 day2 day3 day4 day5 day6 day7
a 1 0 1 1 1 1 1 1
b 2 0 0 2 2 2 2 2
c 3 0 0 0 3 3 3 3
d 0 4 4 4 4 4 0 0
e 0 5 5 5 5 5 5 0
f 2 0 0 0 2 6 6 0


I intended to use following pseudo function row-wise in data.table:

shiftSales = function(df){
if (start > 0)
{
then_no_shift
}
else
{
offset_by_offset_value
}
return(shift_df)
}
result <- df(,shiftSales(df), by = "id")


Note: if it is possible without data.table, I am ok. But my data is large so I thought data.table approach would be faster.

Please help me out. Thanks in advance.

Answer Source

The OP has requested to shift data row-wise across columns by an offset which is specific for each row. (Unfortunately, the title mentioning if-else is somewhat misleading).

All solutions posted so far, are using the t() function (matrix transpose) which indicates that the way the data are stored is not particularly well suited for this type of operation.

The solution below uses melt() to reshape the data from wide to long form before applying a shift() operation on columns:

library(data.table)
# reshape from wide to long
melt(setDT(df), measure.vars = patterns("^day"))[
  # shift values for each id by its individual offset
  , value := shift(value, offset, fill = 0), by = id][
    # reshape to wide format agian for comparison
    , dcast(.SD, id + offset ~ variable)]
   id offset day1 day2 day3 day4 day5 day6 day7
1:  a      1    0    1    1    1    1    1    1
2:  b      2    0    0    2    2    2    2    2
3:  c      3    0    0    0    3    3    3    3
4:  d      0    4    4    4    4    4    0    0
5:  e      0    5    5    5    5    5    5    0
6:  f      2    0    0    0    2    6    6    0

Caveat: It is assumed that id is unique. Otherwise, an additional row number needs to be introduced.


As already mentioned, I suggest to reconsider the way the data is stored.

Currently, longitudinal data, i.e., day1, day2, day3, ..., are stored in a data.frame in wide format, i.e., in separate columns. This is not ideal as it requires row-wise operations to be performed across columns.

Instead, the data could be stored as a matrix (with id as row names and offset stored in a separate vector). Or, if there are other, undisclosed columns, in a data.frame in long format.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download