Erdogan CEVHER - 23 days ago
R Question

Pass the additional argument "nrow" (no. of rows) to the as.data.frame function in R?

(reproducible example given) How to pass the additional argument nrow to as.data.frame in R?

In ?as.data.frame, it is given:

as.data.frame(x, row.names = NULL, optional = FALSE, ...)

... additional arguments to be passed to or from methods.


With the helper matrix(..., nrow), it is:

set.seed(1)
df <- as.data.frame(matrix(c(rnorm(5),rnorm(5), rnorm(5)), nrow=5, byrow=TRUE))
df
# V1 V2 V3
# 1 -0.6264538 0.1836433 -0.8356286
# 2 1.5952808 0.3295078 -0.8204684
# 3 0.4874291 0.7383247 0.5757814
# 4 -0.3053884 1.5117812 0.3898432
# 5 -0.6212406 -2.2146999 1.1249309


Without the matrix(..., nrow) wrapper, it is:

set.seed(1)
df <- as.data.frame(c(rnorm(5),rnorm(5), rnorm(5)))
df
# c(rnorm(5), rnorm(5), rnorm(5))
# 1 -0.6264538
# 2 0.1836433
# ..................................
# 15 1.1249309


I want to pass nrow as an argument to as.data.frame so that it does the job of matrix(..., nrow). The help file of as.data.frame seems to say this is achievable. But how?

Answer

c(rnorm(5),rnorm(5), rnorm(5)) is just a vector. (And, btw, would be simpler to write as rnorm(15).) When you call as.data.frame on a vector, S3 dispatch will end up using as.data.frame.vector. Your question assumes that internally as.data.frame.vector converts the input to a matrix before putting it into a data frame. This is an incorrect assumption.

Because as.data.frame.vector would only ever be called on a single vector, it knows it only has one column to deal with so it has a relatively simple job. You can look at the code by typing as.data.frame.vector and you will see that no matrices are used and that, in this method, ... is also not used in the function body.
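As a rough check of the point above, the sketch below lists the formal arguments of the vector method and then passes nrow anyway. The names x and d are only for illustration. Since nrow is not a formal argument, it lands in ... and is dropped.

```r
# Formal arguments of the vector method: nrow is not among them
names(formals(as.data.frame.vector))

set.seed(1)
x <- rnorm(15)

# nrow is swallowed by ... and never used, so the result is still one column
d <- as.data.frame(x, nrow = 5)
dim(d)   # 15 rows, 1 column
```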

You have code that works, as.data.frame(matrix(your_vector, nrow = your_nrow)). It's a good solution. Be content.
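If typing the nested call feels repetitive, one option is to wrap it in a small function of your own. The sketch below is only an illustration: vec_to_df is a hypothetical name, not something base R provides, and it simply forwards nrow and byrow to matrix.

```r
# Hypothetical helper (not part of base R): reshape a vector, then convert
vec_to_df <- function(x, nrow, byrow = FALSE) {
  as.data.frame(matrix(x, nrow = nrow, byrow = byrow))
}

set.seed(1)
df <- vec_to_df(rnorm(15), nrow = 5, byrow = TRUE)
dim(df)   # 5 rows, 3 columns (V1, V2, V3)
```

With byrow = TRUE this fills the values row by row, as in the question's first example. The default, byrow = FALSE, fills column by column.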

It makes sense for matrix to have an nrow argument because all elements of a matrix must have the same type. Thus it is common for a vector (in which all elements must also have the same type) to be turned into a matrix with rows and columns. A data.frame allows each column to be of a different type, so "wrapping" input data from one column to the next is unusual - it's not assumed that the next column is a continuation of the previous. Given your example, it's worth asking if you even want a data frame - computations with matrices are generally faster because a matrix is a simpler data structure.
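To make the type difference concrete, here is a minimal sketch. The names m and df_mixed are made up for the example, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# A matrix has a single type: mixing a number and a string gives character
m <- matrix(c(1, "a"), nrow = 1)
typeof(m)   # "character"

# A data frame keeps one type per column
df_mixed <- data.frame(x = 1, y = "a", stringsAsFactors = FALSE)
sapply(df_mixed, class)
```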


There are many ways to create the data frame you want. The following will all work (only the column names will differ; the data values are the same). How you generate the input vector is up to you.

set.seed(1)
d1 = as.data.frame(matrix(rnorm(15), nrow = 5))

set.seed(1)
d2 = data.frame(replicate(3, rnorm(5)))

set.seed(1)
d3 = data.frame(rnorm(5), rnorm(5), rnorm(5))

set.seed(1)
my_vectors = list(rnorm(5), rnorm(5), rnorm(5))
d4 = as.data.frame(do.call(cbind, my_vectors))
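To confirm that only the names differ, you can compare the underlying values with the names stripped off. This is a sketch using the first two variants; vals is a hypothetical helper defined here for the comparison.

```r
set.seed(1)
d1 <- as.data.frame(matrix(rnorm(15), nrow = 5))

set.seed(1)
d2 <- data.frame(replicate(3, rnorm(5)))

# Strip row and column names and compare the values only
vals <- function(d) unname(as.matrix(d))
all.equal(vals(d1), vals(d2))

# The column names are what differ
names(d1)
names(d2)
```

The same comparison can be run against d3 and d4.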