Vincent - 7 months ago 164

R Question

Normally I wonder where mysterious errors come from but now my question is where a mysterious lack of error comes from.

Let

`numbers <- c(1, 2, 3)`

`frame <- as.data.frame(numbers)`

If I type

`subset(numbers, )`

(so I want to take some subset but forget to specify the subset-argument of the subset function) then R reminds me (as it should):

Error in subset.default(numbers, ) :

argument "subset" is missing, with no default

However when I type

`subset(frame,)`

(so the same thing with a `data.frame`), R raises no error.

What is going on here? Why don't I get my well deserved error message?

Answer

R has several object-oriented systems built in. The simplest and most common is called S3. This OO programming style implements what Wickham calls "generic-function OO." Under this style of OO, a generic function looks at the class of its argument and then dispatches to the method written for that class. (This is only a brief sketch of S3. To get a better idea of how it works, you might check out the relevant portion of the Advanced R site.)
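To make the dispatch idea concrete, here is a minimal sketch. The generic `describe()` and its two methods are hypothetical names invented for illustration; they are not part of base R. The point is only that the same call picks a different method depending on the class of its first argument.

```r
# describe() is a made-up generic for illustration; it is not part of base R.
describe <- function(x, ...) UseMethod("describe")

# Fallback method, used when no class-specific method exists.
describe.default <- function(x, ...) "something else"

# Method chosen when the object has class "data.frame".
describe.data.frame <- function(x, ...) "a data.frame"

numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)

describe(numbers)  # dispatches to describe.default
describe(frame)    # dispatches to describe.data.frame
```

`subset()` follows the same pattern, with `subset.default` and `subset.data.frame` playing the roles of the two methods.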

The `subset` function works on this principle. If the first argument to `subset` is an object with the `data.frame` class, then R uses the function `subset.data.frame`. It is defined as below:

```
subset.data.frame
function (x, subset, select, drop = FALSE, ...)
{
    r <- if (missing(subset))
        rep_len(TRUE, nrow(x))
    else {
        e <- substitute(subset)
        r <- eval(e, x, parent.frame())
        if (!is.logical(r))
            stop("'subset' must be logical")
        r & !is.na(r)
    }
    vars <- if (missing(select))
        TRUE
    else {
        nl <- as.list(seq_along(x))
        names(nl) <- names(x)
        eval(substitute(select), nl, parent.frame())
    }
    x[r, vars, drop = drop]
}
```

Note that if the subset argument is missing, the first lines

```
r <- if (missing(subset))
    rep_len(TRUE, nrow(x))
```

produce a vector of `TRUE`s with one element per row of the data.frame, and the last line

```
x[r, vars, drop = drop]
```

feeds this vector in as the row index. This means that if you do not supply a `subset` argument, `subset` returns all of the rows of the data.frame.
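The `missing()` guard can be tried in isolation. The sketch below uses `keep_rows()`, a hypothetical helper (not part of base R) that copies only the row-selection idea from `subset.data.frame` and leaves out the `substitute`/`eval` machinery and the `select` argument.

```r
# keep_rows() is a hypothetical, stripped-down imitation of the row logic
# in subset.data.frame; it takes a ready-made logical vector, not an expression.
keep_rows <- function(x, subset) {
  r <- if (missing(subset)) {
    rep_len(TRUE, nrow(x))   # no subset given: select every row
  } else {
    subset & !is.na(subset)  # treat NA as "do not keep"
  }
  x[r, , drop = FALSE]
}

frame <- data.frame(numbers = c(1, 2, 3))

all_rows  <- keep_rows(frame, )                      # subset is missing
some_rows <- keep_rows(frame, c(TRUE, FALSE, TRUE))  # keeps rows 1 and 3

nrow(all_rows)
nrow(some_rows)
nrow(subset(frame, )) == nrow(frame)  # the real method behaves the same way
```

Because the missing argument is only ever passed to `missing()`, it is never evaluated, so R has no reason to complain.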

As your error

Error in subset.default(numbers, )

shows, when you apply `subset` to a vector, R calls the `subset.default` method, which is defined as

```
subset.default
function (x, subset, ...)
{
    if (!is.logical(subset))
        stop("'subset' must be logical")
    x[subset & !is.na(subset)]
}
```

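Here there is no `missing(subset)` check. The first use of the argument, `is.logical(subset)`, forces R to evaluate it, and because nothing was supplied and there is no default, R throws the error.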
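The same behaviour can be reproduced with a small function of your own. In this sketch `strict()` is a hypothetical name, not a base R function; its body mirrors `subset.default`, and `tryCatch()` is used only to capture the error so the script keeps running.

```r
# strict() is a hypothetical copy of the subset.default logic:
# it uses its argument directly, with no missing() guard.
strict <- function(x, subset) {
  if (!is.logical(subset))
    stop("'subset' must be logical")
  x[subset & !is.na(subset)]
}

numbers <- c(1, 2, 3)

# With a logical vector supplied, the function works as expected.
kept <- strict(numbers, c(TRUE, FALSE, TRUE))

# With the argument left out, evaluating `subset` fails; capture the condition.
res_own  <- tryCatch(strict(numbers, ), error = function(e) e)
res_base <- tryCatch(subset(numbers, ), error = function(e) e)

inherits(res_own, "error")
inherits(res_base, "error")
conditionMessage(res_base)  # the message quoted in the question
```

Adding an `if (missing(subset))` branch to `strict()`, as `subset.data.frame` does, would make the error disappear in the same way.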