Tareva - 1 month ago
R Question

Implementing sequential counter of decreasing values in R

I need to implement a counter, dec_cnt, that decrements by 1 based on certain conditions.

Below is my data frame, df:

ID A
1 0
2 0
3 0
4 1
5 1
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 -1
17 1
18 0
19 1
20 0
21 -1
22 0
23 0
24 -1
25 0
26 0
27 0
28 0
29 0
30 0
31 0
32 0
33 0
34 0
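For anyone who wants to run the code in this thread, the table can be typed into R as follows. This is only a transcription of the values above; the construction itself is not from the original post.

```r
# Recreate the example data frame df from the table above
df <- data.frame(
  ID = 1:34,
  A  = c(0, 0, 0, 1, 1, 0, 0, 0, 0, 0,      # IDs 1-10
         0, 0, 0, 0, 0, -1, 1, 0, 1, 0,     # IDs 11-20
         -1, 0, 0, -1, 0, 0, 0, 0, 0, 0,    # IDs 21-30
         0, 0, 0, 0)                        # IDs 31-34
)
which(df$A != 0)   # rows holding 1 or -1
```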


The conditions are:

a. The counter should start at the row where A == 1 or A == -1 and count down over 16 rows, starting at that row. For example, A == 1 at ID 4, so from ID == 4 to ID == 19 the counter should decrement from 15 down to 0. Any A == 1/-1 that falls within this range should be ignored.
b. I also need a retain_A column that retains the value of A throughout the countdown.
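A quick way to see the window arithmetic in rule (a): a countdown that starts at row s covers rows s to s + 15, paired with 15:0. When the data ends early (the second countdown starts at ID 21 but there are only 34 rows), the counter is simply cut short, which is why the expected output below stops at 2. The variable names here are illustrative only.

```r
# Rule (a) for the first countdown: it starts at ID 4 and spans 16 rows
start <- 4
countdown <- data.frame(ID = start:(start + 15), dec_cnt = 15:0)
countdown[c(1, 16), ]   # ID 4 gets 15, ID 19 gets 0

# The second countdown starts at ID 21, but the data ends at ID 34
n_rows  <- 34
window2 <- 21:min(21 + 15, n_rows)   # 21:34, only 14 rows left
head(15:0, n = length(window2))      # 15 down to 2
```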

Below is my expected output.

ID A retain_A dec_cnt
1 0 NA NA
2 0 NA NA
3 0 NA NA
4 1 1 15
5 1 1 14
6 0 1 13
7 0 1 12
8 0 1 11
9 0 1 10
10 0 1 9
11 0 1 8
12 0 1 7
13 0 1 6
14 0 1 5
15 0 1 4
16 -1 1 3
17 1 1 2
18 0 1 1
19 1 1 0
20 0 NA NA
21 -1 -1 15
22 0 -1 14
23 0 -1 13
24 -1 -1 12
25 0 -1 11
26 0 -1 10
27 0 -1 9
28 0 -1 8
29 0 -1 7
30 0 -1 6
31 0 -1 5
32 0 -1 4
33 0 -1 3
34 0 -1 2


A similar question was posted a couple of days ago, and its solution uses a for loop. That loop also fails to execute when there are more than 35 data points. I want to avoid a for loop because its execution time grows when dealing with large amounts of data.

The data frame is taken from the question posted here.

Below is the script I tried, based on the post referenced above:

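# Create the output columns first; assigning df$Retain_A[i] into a column
# that does not exist yet fails unless the row count happens to divide evenly
df$Retain_A <- NA
df$dec_cnt  <- NA

dec_cnt  <- 0    # steps remaining in the current countdown
Retain_A <- NA   # value of A that started the current countdown
for (i in seq_along(df$A)) {
  if (dec_cnt == 0) {
    # no countdown active: skip zeros, start a new countdown on -1/1
    if (df$A[i] == 0) next
    dec_cnt  <- 15
    Retain_A <- df$A[i]
    df$Retain_A[i] <- df$A[i]
    df$dec_cnt[i]  <- dec_cnt
  } else {
    # countdown active: decrement and carry the starting A forward
    dec_cnt <- dec_cnt - 1
    df$Retain_A[i] <- Retain_A
    df$dec_cnt[i]  <- dec_cnt
  }
}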

Answer

I don't think it's realistic to avoid any kind of loop, for or otherwise. Perhaps a more realistic goal would be to avoid loops that iterate over every single value, regardless of whether it is relevant.

Starting from your 2-column input (called dat here rather than df), let's pre-set the empty columns:

dat$retain_A <- NA
dat$dec_cnt  <- NA
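Pre-setting the columns is not just tidiness. A small illustration (the toy data frame d and column x are made up for the example) of what happens when you assign element-wise into a column that does not exist yet, which is what a loop writing df$Retain_A[i] before the column exists runs into:

```r
# Element-wise assignment into a missing column: R first builds the short
# vector c(NA, NA, 1), and `$<-.data.frame` rejects it because 5 rows is
# not a multiple of 3
d <- data.frame(A = c(0, 0, 1, 0, 0))
res <- tryCatch({
  d$x[3] <- 1
  "ok"
}, error = function(e) conditionMessage(e))
print(res)   # an error message along the lines of "replacement has 3 rows, data has 5"

# Pre-setting the column to NA first makes the same assignment work
d$x <- NA
d$x[3] <- 1
print(d$x)   # NA NA 1 NA NA
```

If the row count happens to be a multiple of that short vector's length, R instead recycles it silently, which is arguably worse because it gives wrong values without any error.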

Here's where we can gain some efficiency: instead of repeatedly making comparisons, we can find every row that matches -1/1 up front:

ind <- which(dat$A %in% c(-1,1))
last_match <- 0
ind
# [1]  4  5 16 17 19 21 24

The trick is to keep track of last_match (the last row covered by the current countdown) and discard any candidate indices that fall inside that window, i.e. at or before last_match.

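ind <- ind[ind > last_match]
while (length(ind) > 0) {
  # rows covered by the countdown that starts at the first remaining match
  i <- seq(ind[1], min(ind[1] + 15, nrow(dat)))
  # 15, 14, ..., 0, truncated if the data ends early
  dat$dec_cnt[i] <- head(15:0, n = length(i))
  dat$retain_A[i] <- dat$A[ ind[1] ]
  # drop every match that falls inside this window
  last_match <- ind[1] + 15
  ind <- ind[ind > last_match]
}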
dat
#    ID  A retain_A dec_cnt
# 1   1  0       NA      NA
# 2   2  0       NA      NA
# 3   3  0       NA      NA
# 4   4  1        1      15
# 5   5  1        1      14
# 6   6  0        1      13
# 7   7  0        1      12
# 8   8  0        1      11
# 9   9  0        1      10
# 10 10  0        1       9
# 11 11  0        1       8
# 12 12  0        1       7
# 13 13  0        1       6
# 14 14  0        1       5
# 15 15  0        1       4
# 16 16 -1        1       3
# 17 17  1        1       2
# 18 18  0        1       1
# 19 19  1        1       0
# 20 20  0       NA      NA
# 21 21 -1       -1      15
# 22 22  0       -1      14
# 23 23  0       -1      13
# 24 24 -1       -1      12
# 25 25  0       -1      11
# 26 26  0       -1      10
# 27 27  0       -1       9
# 28 28  0       -1       8
# 29 29  0       -1       7
# 30 30  0       -1       6
# 31 31  0       -1       5
# 32 32  0       -1       4
# 33 33  0       -1       3
# 34 34  0       -1       2

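You'll find that your initial loop iterates once per row, whereas this solution iterates only once per countdown (at most once per non-zero value, since matches inside a window are discarded without a loop iteration).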
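To make that concrete, here is a sketch that wraps the same while-loop idea in a function and counts how often its body runs. The function name countdown_cols, its width argument and the iterations counter are hypothetical additions for illustration; the logic follows the loop above (a window of width rows, counter from width - 1 down to 0).

```r
# Hypothetical wrapper around the while-loop approach; width = 16 gives the
# 15:0 countdown from the question
countdown_cols <- function(A, width = 16) {
  n <- length(A)
  retain_A <- rep(NA_real_, n)
  dec_cnt  <- rep(NA_integer_, n)
  ind <- which(A %in% c(-1, 1))   # candidate start rows, found once up front
  iterations <- 0L
  while (length(ind) > 0) {
    iterations <- iterations + 1L
    start <- ind[1]
    rows  <- seq(start, min(start + width - 1, n))
    dec_cnt[rows]  <- head((width - 1):0, n = length(rows))
    retain_A[rows] <- A[start]
    # drop every candidate that falls inside the window just filled
    ind <- ind[ind > start + width - 1]
  }
  list(retain_A = retain_A, dec_cnt = dec_cnt, iterations = iterations)
}

A <- c(0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 1,
       0, 1, 0, -1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
res <- countdown_cols(A)
res$iterations                     # 2
res$dec_cnt[c(4, 19, 20, 21, 34)]  # 15 0 NA 15 2
```

On the example data the loop body runs twice, once per countdown, while a row-by-row loop over seq_along(A) runs 34 times. The remaining cost is the vectorised filtering of ind, which is cheap compared with per-row R-level work.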