R.S. - 1 year ago 74

# R: rationale for recycling boolean indices for selection

The title is self-explanatory. I would like to know why R has chosen to recycle boolean values for selection/subsetting. The documentation for `"["` states, of logical index vectors: `Such vectors are recycled if necessary to match the corresponding extent.`

Are there any advantages of doing this? I could think of one as mentioned below, but I'd think the disadvantages might outweigh the benefits of ease of use.

``````
df <- data.frame(C1 = 1:10, c2 = 101:110)
class(unclass(df)[1])  # "list": a data frame is a list of columns
df
df[1]  # selects the first list element (i.e., the first column)
df[2]

# However, indices are recycled if we use logical indices
df[TRUE]                    # selects both columns
df[c(TRUE, TRUE), ]         # recycled row indices: all rows
df[c(TRUE, TRUE, FALSE), ]  # recycled row indices: drops every third row
df[FALSE]                   # selects no columns

# For example, this has only 7 index elements instead of 10,
# but it is easy to miss the fact that they are being recycled
df[c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE), ]
``````
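One way to guard against the silent recycling shown above is to check the index length before subsetting. The following is only a minimal sketch: `subset_rows_strict` is a hypothetical helper written for this illustration, not a base R function.

```r
# Hypothetical helper (not part of base R): subset rows with a logical
# index, but refuse an index whose length differs from the row count.
subset_rows_strict <- function(data, keep) {
  stopifnot(is.logical(keep), length(keep) == nrow(data))
  data[keep, , drop = FALSE]
}

df <- data.frame(C1 = 1:10, c2 = 101:110)

# A full-length index works as usual (keeps the odd values of C1)
full <- subset_rows_strict(df, df$C1 %% 2 == 1)

# A 7-element index is rejected instead of being silently recycled
short <- tryCatch(
  subset_rows_strict(df, c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)),
  error = function(e) "rejected"
)
```

The trade-off is that deliberate recycling, such as `c(TRUE, FALSE)` to skip alternate rows, is rejected too, so a check like this only suits code where the index is expected to be full length.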

The only use of this recycling feature that I could think of was skipping alternate rows:

``````
df[c(TRUE, FALSE), ]
``````

The context for asking this question is another one I saw on SO yesterday. It was later deleted after someone pointed out the difference between `|` and `||`. I wonder if they realised they were also dealing with recycling here.

``````
# An erroneous use of && can land you in the soup too
df[df$C1 > 0 && df$c2 < 102, ]  # && yields a single TRUE, which is recycled to select all rows
                                # (R >= 4.3.0 instead raises an error for operands longer than 1)
``````

Are there any other well known pitfalls of this nature that one should be wary of?

It lets you select every nth row or column in a vector, data.frame, or matrix:

``````
> m <- matrix(1:20, 4)
> m[c(TRUE,FALSE), ]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    3    7   11   15   19
> m[, c(TRUE,FALSE) ]
     [,1] [,2] [,3]
[1,]    1    9   17
[2,]    2   10   18
[3,]    3   11   19
[4,]    4   12   20
``````

Every third column:

``````
> m[, c(TRUE,FALSE,FALSE) ]
     [,1] [,2]
[1,]    1   13
[2,]    2   14
[3,]    3   15
[4,]    4   16
``````
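For comparison, the same every-nth selection can be written with explicit integer positions. This is only a sketch of the equivalence, using the same `m` as above; the variable names are chosen for illustration.

```r
m <- matrix(1:20, 4)

# Recycled logical index: keep a column, skip two, and repeat
by_logical <- m[, c(TRUE, FALSE, FALSE)]

# Explicit integer positions for the same columns (1 and 4)
by_position <- m[, seq(1, ncol(m), by = 3)]

same_result <- identical(by_logical, by_position)
```

The logical form is shorter, while the `seq()` form states the selected positions outright and does not rely on recycling.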

The cited disadvantage is really an incorrect use of the `&&` operator (which I think you do actually realize). That operator only ever returns a length-1 vector and is generally inappropriate when trying to do indexing. That was probably the confusion exhibited by the questioner who used the `||` operator.
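A short sketch of the difference, using the data frame from the question: the vectorised `&` returns one logical value per row, so nothing needs to be recycled when it is used as a row index.

```r
df <- data.frame(C1 = 1:10, c2 = 101:110)

# Element-wise AND: one logical value per row
keep <- df$C1 > 0 & df$c2 < 102

# Only the first row has c2 < 102, so only that row is selected
result <- df[keep, ]
```

Here `length(keep)` equals `nrow(df)`, so the index lines up with the rows one to one.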

Ultimately, the answer is that the authors liked it that way. R is, in most of its semantics, a clone of S, which was built around the dawn of high-level languages in the AT&T think tank.
