Connor J - 1 year ago 66
R Question

# How to remove rows matching criteria and rows adjacent to them

I have the following sample data:

``````data <- data.table(ID = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4),
date = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6),
score = c(4,3,2,2,4,1,5,5,5,2,1,4,2,1,5,5,5,3,5,5,5,2,4,5))

ID date score
1:  1    1     4
2:  1    2     3
3:  1    3     2
4:  1    4     2
5:  1    5     4
6:  1    6     1
7:  2    1     5
8:  2    2     5
9:  2    3     5
10:  2    4     2
11:  2    5     1
12:  2    6     4
13:  3    1     2
14:  3    2     1
15:  3    3     5
16:  3    4     5
17:  3    5     5
18:  3    6     3
19:  4    1     5
20:  4    2     5
21:  4    3     5
22:  4    4     2
23:  4    5     4
24:  4    6     5
ID date score
``````

I want to eliminate certain rows, and change others, based in part on their position in the table. I have two criteria, for each
`ID`
:

1. If a row has
`date == 1`
and
`score == 5`
, I want to remove that row AND all subsequent rows that have
`score==5`
that follow immediately after that row, until
`score`
is not 5. (so, for example, for
`I == 4`
, I want to keep the data for dates 4,5,6).

2. For all other dates where
`score == 5`
, I want to replace their score with the average of their preceding two scores (or just their previous score, if they only have one prior score).

So, the table I want to end up with is:

``````   ID date score
1:  1    1   4.0
2:  1    2   3.0
3:  1    3   2.0
4:  1    4   2.0
5:  1    5   4.0
6:  1    6   1.0
7:  2    4   2.0
8:  2    5   1.0
9:  2    6   4.0
10:  3    1   2.0
11:  3    2   1.0
12:  3    3   1.5
13:  3    4   1.5
14:  3    5   1.5
15:  3    6   3.0
16:  4    4   2.0
17:  4    5   4.0
18:  4    6   3.0
``````

What is the best way to go about this? I imagine it's some combination of
`shift`
and
`.I`
but I haven't been able to put it together.

Answer Source
``````# find rows satisfying 1st condition
torm = data[, if(score[1] == 5 & date[1] == 1) .I
, by = .(ID, rleid(score), cumsum(date == 1))]\$V1

library(caTools) # for running mean

data[-torm    # remove the extra rows
# add a running mean
][, mn := runmean(score, 2, endrule = 'keep', align = 'right'), by = ID
# compute the new score - a little care needed here in case we only have 5's in a group
][, new.score := ifelse(score == 5, mn[which(score != 5)[1]], score)
, by = .(ID, cumsum(score != 5))][]
#    ID date score  mn new.score
# 1:  1    1     4 4.0       4.0
# 2:  1    2     3 3.5       3.0
# 3:  1    3     2 2.5       2.0
# 4:  1    4     2 2.0       2.0
# 5:  1    5     4 3.0       4.0
# 6:  1    6     1 2.5       1.0
# 7:  2    4     2 2.0       2.0
# 8:  2    5     1 1.5       1.0
# 9:  2    6     4 2.5       4.0
#10:  3    1     2 2.0       2.0
#11:  3    2     1 1.5       1.0
#12:  3    3     5 3.0       1.5
#13:  3    4     5 5.0       1.5
#14:  3    5     5 5.0       1.5
#15:  3    6     3 4.0       3.0
#16:  4    4     2 2.0       2.0
#17:  4    5     4 3.0       4.0
#18:  4    6     5 4.5       3.0
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download