user2716568 - 3 months ago
R Question

How can I delete every n-th row from a data.frame in R, according to a grouping variable?

I wish to take every second measurement from a data.frame according to a grouping variable. For example, in the data.frame Input, take every second Sample for each ID:

head(Input, 10)
      Sample         X      ID
15918      1 -1.326285 EABE_D5
15919      2 -1.315783 EABE_D5
15920      3 -1.313245 EABE_D5
15921      4 -1.304670 EABE_D5
15922      5 -1.309060 EABE_D5
15923      1 -1.292412 EABE_D4
15924      2 -1.294728 EABE_D4
15925      3 -1.282006 EABE_D4
15926      4 -1.287245 EABE_D4
15927      5 -1.278444 EABE_D4


and create a new data.frame named Output:

Output
      Sample         X      ID
15919      2 -1.315783 EABE_D5
15921      4 -1.304670 EABE_D5
15924      2 -1.294728 EABE_D4
15926      4 -1.287245 EABE_D4


Is this possible? Thank you.

Answer

We can use dplyr. After grouping by 'ID', slice keeps the rows at the even positions (2, 4, ...) within each group, which seq(2, n(), by = 2) generates:

library(dplyr)
Input %>%
   group_by(ID) %>%
   slice(seq(2, n(), by = 2))
#  Sample         X      ID
#   <int>     <dbl>   <chr>
#1      2 -1.294728 EABE_D4
#2      4 -1.287245 EABE_D4
#3      2 -1.315783 EABE_D5
#4      4 -1.304670 EABE_D5
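
One caveat with the slice call above: seq(2, n(), by = 2) raises an error when a group contains a single row, because seq cannot count upward from 2 to 1. A minimal sketch of a variant that tolerates such groups, assuming dplyr is installed, filters on row_number() instead. The data below rebuilds the question's Input; the extra EABE_D3 row and its X value are hypothetical, added only to exercise the one-row case.

```r
library(dplyr)

# Rebuild the question's data; the last row (ID "EABE_D3") is a
# hypothetical single-row group, not part of the original data.
Input <- data.frame(
  Sample = c(1:5, 1:5, 1L),
  X = c(-1.326285, -1.315783, -1.313245, -1.304670, -1.309060,
        -1.292412, -1.294728, -1.282006, -1.287245, -1.278444,
        -1.250000),
  ID = c(rep("EABE_D5", 5), rep("EABE_D4", 5), "EABE_D3"),
  stringsAsFactors = FALSE
)

Output <- Input %>%
  group_by(ID) %>%
  # row_number() is the position of each row within its group;
  # keep the even positions only
  filter(row_number() %% 2 == 0) %>%
  ungroup()
```

Groups with fewer than two rows simply contribute nothing to Output. Replacing 2 with another integer keeps every n-th row instead.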

Or we can use data.table for efficiency. Note that setDT converts Input to a data.table by reference:

library(data.table)
setDT(Input)[Input[, .I[seq(2, .N, by = 2)], by = ID]$V1]

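Or with ave from base R: we group by 'ID', apply the modulo operator %% with a divisor of 2 to Sample, and negate the result with !, so that a remainder of 0 (an even Sample) becomes TRUE. We then subset the rows with this logical vector. Note that this tests the value of Sample rather than the row position, so it relies on Sample counting 1, 2, 3, ... within each ID.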

Input[with(Input, !ave(Sample, ID, FUN = function(x) x %% 2)),]
#      Sample         X      ID
#15919      2 -1.315783 EABE_D5
#15921      4 -1.304670 EABE_D5
#15924      2 -1.294728 EABE_D4
#15926      4 -1.287245 EABE_D4
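
If Sample is not a clean 1, 2, 3, ... counter within each ID, a position-based variant may be safer. The following is a minimal base R sketch, not part of the original answer: it numbers the rows within each ID using ave and seq_along, then keeps every n-th position. The names n and pos are introduced here for illustration only.

```r
# Rebuild the question's data, including its row names
Input <- data.frame(
  Sample = rep(1:5, 2),
  X = c(-1.326285, -1.315783, -1.313245, -1.304670, -1.309060,
        -1.292412, -1.294728, -1.282006, -1.287245, -1.278444),
  ID = rep(c("EABE_D5", "EABE_D4"), each = 5),
  row.names = 15918:15927,
  stringsAsFactors = FALSE
)

n <- 2  # keep every n-th row within each group (illustrative parameter)

# pos is the 1-based position of each row within its ID group
pos <- ave(seq_len(nrow(Input)), Input$ID, FUN = seq_along)

# keep rows whose within-group position is a multiple of n
Output <- Input[pos %% n == 0, ]
```

Because the subset uses plain data.frame indexing, the original row names and row order are preserved.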