Chandan Kumar -3 years ago 131
R Question

# How to apply a function row-wise having if-else condition using data.table?

I need to apply a function on a data frame below. 'day' is sales values. I need to shift the sales to right based on offset value. For example in row 1 offset is 1, I need to shift sales by 1 day, if 0 then no shift and henceforth.

``````id <- c('a', 'b', 'c', 'd', 'e', 'f')
offset <- c(1,2,3,0,0,2)
day1 <-   c(1,2,3,4,5, 0)
day2 <-   c(1,2,3,4,5, 2)
day3 <-   c(1,2,3,4,5, 6)
day4 <-   c(1,2,3,4,5, 6)
day5 <-   c(1,2,0,4,5, 0)
day6 <-   c(1,0,0,0,5, 0)
day7 <-   c(0,0,0,0,0, 0)
df <- data.frame(id, offset, day1, day2, day3, day4, day5, day6, day7)

> df
id offset day1 day2 day3 day4 day5 day6 day7
a      1    1    1    1    1    1    1    0
b      2    2    2    2    2    2    0    0
c      3    3    3    3    3    0    0    0
d      0    4    4    4    4    4    0    0
e      0    5    5    5    5    5    5    0
f      2    0    2    6    6    0    0    0
``````

The result should be like below:

``````> result
id offset day1 day2 day3 day4 day5 day6 day7
a      1    0    1    1    1    1    1    1
b      2    0    0    2    2    2    2    2
c      3    0    0    0    3    3    3    3
d      0    4    4    4    4    4    0    0
e      0    5    5    5    5    5    5    0
f      2    0    0    0    2    6    6    0
``````

I intended to use following pseudo function row-wise in data.table:

``````shiftSales = function(df){
if (start > 0)
{
then_no_shift
}
else
{
offset_by_offset_value
}
return(shift_df)
}
result <- df(,shiftSales(df), by = "id")
``````

Note: if it is possible without data.table, I am ok. But my data is large so I thought data.table approach would be faster.

The OP has requested to shift data row-wise across columns by an offset which is specific for each row. (Unfortunately, the title mentioning if-else is somewhat misleading).

All solutions posted so far, are using the `t()` function (matrix transpose) which indicates that the way the data are stored is not particularly well suited for this type of operation.

The solution below uses `melt()` to reshape the data from wide to long form before applying a `shift()` operation on columns:

``````library(data.table)
# reshape from wide to long
melt(setDT(df), measure.vars = patterns("^day"))[
# shift values for each id by its individual offset
, value := shift(value, offset, fill = 0), by = id][
# reshape to wide format agian for comparison
, dcast(.SD, id + offset ~ variable)]
``````
``````   id offset day1 day2 day3 day4 day5 day6 day7
1:  a      1    0    1    1    1    1    1    1
2:  b      2    0    0    2    2    2    2    2
3:  c      3    0    0    0    3    3    3    3
4:  d      0    4    4    4    4    4    0    0
5:  e      0    5    5    5    5    5    5    0
6:  f      2    0    0    0    2    6    6    0
``````

Caveat: It is assumed that `id` is unique. Otherwise, an additional row number needs to be introduced.

As already mentioned, I suggest to reconsider the way the data is stored.

Currently, longitudinal data, i.e., `day1`, `day2`, `day3`, ..., are stored in a data.frame in wide format, i.e., in separate columns. This is not ideal as it requires row-wise operations to be performed across columns.

Instead, the data could be stored as a matrix (with `id` as row names and `offset` stored in a separate vector). Or, if there are other, undisclosed columns, in a data.frame in long format.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download