microbe microbe - 28 days ago 5
R Question

Should I use a data.frame or a matrix?

When should one use a

data.frame
, and when is it better to use a
matrix
?

Both keep data in a rectangular format, so sometimes it's unclear.

Are there any general rules of thumb for when to use which data type?

Answer Source

Part of the answer is contained already in your question: You use data frames if columns (variables) can be expected to be of different types (numeric/character/logical etc.). Matrices are for data of the same type.

Consequently, the choice matrix/data.frame is only problematic if you have data of the same type.

The answer depends on what you are going to do with the data in data.frame/matrix. If it is going to be passed to other functions then the expected type of the arguments of these functions determines the choice.

Also:

Matrices are more memory efficient:

m = matrix(1:4, 2, 2)
d = as.data.frame(m)
object.size(m)
# 216 bytes
object.size(d)
# 792 bytes

Matrices are a necessity if you plan to do any linear algebra-type of operations.

Data frames are more convenient if you frequently refer to their columns by name (via the compact $ operator).

Data frames are also IMHO better for reporting (printing) tabular information as you can apply formatting to each column separately.