Daniel Falbel - 20 days ago
R Question

filtering data.frame based on row_number()

UPDATE: dplyr has been updated since this question was asked and now performs as the OP wanted

I'm trying to get the second through seventh rows of a data.frame using dplyr.

I'm doing this:

require(dplyr)
df <- data.frame(id = 1:10, var = runif(10))
df <- df %>% filter(row_number() <= 7, row_number() >= 2)


But this throws an error.

Error in rank(x, ties.method = "first") :
argument "x" is missing, with no default


I know I could easily do:

df <- df %>% mutate(rn = row_number()) %>% filter(rn <= 7, rn >= 2)


But I would like to understand why my first try is not working.

Answer

The row_number() function does not simply return the row number of each element, so it can't be used the way you want:

• ‘row_number’: equivalent to ‘rank(ties.method = "first")’

You're not actually saying what you want the row_number of. In your case:

df %>% filter(row_number(id) <= 7, row_number(id) >= 2)

works because id is sorted and so row_number(id) is 1:10. I don't know what row_number() evaluates to in this context, but when called a second time dplyr has run out of things to feed it and you get the equivalent of:

> row_number()
Error in rank(x, ties.method = "first") : 
  argument "x" is missing, with no default

That's your error right there.
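
To see why row_number(id) only happens to match the row positions here, it may help to look at the base R call that the help text says it is equivalent to. This is a minimal sketch using a made-up unsorted vector x, which is not from the question:

```r
# Hypothetical unsorted values, chosen only for illustration.
x <- c(30, 10, 20)

# The help text quoted above says row_number(x) is equivalent to this call.
ranks <- rank(x, ties.method = "first")
print(ranks)  # ranks of the values, not their positions

# When the input is already sorted, as id = 1:10 is, ranks and positions agree.
id <- 1:10
sorted_ranks <- rank(id, ties.method = "first")
print(all(sorted_ranks == seq_along(id)))
```

So filtering on row_number(id) would pick different rows if df were not sorted by id.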

Anyway, that's not the way to select rows.

You simply need to subscript: df[2:7,]. Or, if you insist on pipes everywhere:

> df %>% "["(.,2:7,)
  id        var
2  2 0.52352994
3  3 0.02994982
4  4 0.90074801
5  5 0.68935493
6  6 0.57012344
7  7 0.01489950
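
For comparison, here is a rough sketch of a few ways to pick rows 2 to 7 by position. The base R lines need no packages. The slice() line assumes dplyr is installed and is skipped otherwise. The seed is arbitrary and only makes the example reproducible, so the var values will differ from the output above.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(id = 1:10, var = runif(10))

# Base R: subscript by position.
by_index <- df[2:7, ]

# Base R: build the row positions explicitly, then filter on them.
pos <- seq_len(nrow(df))
by_position <- df[pos >= 2 & pos <= 7, ]

print(by_index$id)
print(by_position$id)

# dplyr, if available: slice() selects rows by position.
if (requireNamespace("dplyr", quietly = TRUE)) {
  by_slice <- dplyr::slice(df, 2:7)
  print(by_slice$id)
}
```

Building pos explicitly is the base R counterpart of the mutate(rn = row_number()) workaround in the question: the filter is applied to actual row positions, not to ranks of some column.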