Gregor Sturm - 2 months ago
R Question

How are columns named when creating a dataframe from different dataframe columns?

Assume I have a data frame:

df1 = data.frame(df1.a=1:3, df1.b=1:3, df1.c=1:3)

  df1.a df1.b df1.c
1     1     1     1
2     2     2     2
3     3     3     3

And create a second one from the first one using different selectors:

df2 = data.frame(df2.a=df1$df1.a, df2.b=df1[,"df1.b"], df2.c=df1["df1.c"])

Why is the name of the third column overridden by the original column name, while the names of the other two are not?

  df2.a df2.b df1.c   <-- why is this not df2.c?
1     1     1     1
2     2     2     2
3     3     3     3


That is because df1["df1.a"] is a data.frame of one column, whereas df1[,"df1.a"] is a vector.


> class(df1[,"df1.a"])
[1] "integer"
> class(df1["df1.a"])
[1] "data.frame"
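
As a quick check of this distinction, the sketch below compares the usual ways of selecting a single column from the question's df1. The variable name `classes` is only illustrative; everything else is base R.

```r
# Same data frame as in the question
df1 <- data.frame(df1.a = 1:3, df1.b = 1:3, df1.c = 1:3)

# Collect the class returned by each way of selecting one column
classes <- c(
  dollar         = class(df1$df1.a),                    # vector
  double_bracket = class(df1[["df1.a"]]),               # vector
  matrix_style   = class(df1[, "df1.a"]),               # vector (drop = TRUE by default)
  no_drop        = class(df1[, "df1.a", drop = FALSE]), # one-column data frame
  single_bracket = class(df1["df1.a"])                  # one-column data frame
)
print(classes)
```

The first three selectors return an integer vector; the last two return a one-column data frame, which is the case where data.frame() takes the column name from the argument.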

According to the documentation:

For a named or unnamed matrix/list/data frame argument that contains a single column, the column name in the result is the column name in the argument.

Therefore, the argument name in

data.frame(…, df2.c=df1["df1.c"])

is "ignored" and the call is treated as

data.frame(…, df1.c=df1$df1.c)

Of course, the argument name is technically not ignored.
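
If the goal is simply to end up with a column called df2.c, two possible approaches are sketched below: pass a vector instead of a one-column data frame, or rename the column before passing it. The name `df2_alt` is made up for this illustration.

```r
# Same data frame as in the question
df1 <- data.frame(df1.a = 1:3, df1.b = 1:3, df1.c = 1:3)

# Option 1: select with [[ ]] so a vector is passed;
# the argument name then becomes the column name
df2 <- data.frame(df2.a = df1$df1.a,
                  df2.b = df1[, "df1.b"],
                  df2.c = df1[["df1.c"]])
print(names(df2))

# Option 2: keep the one-column data frame, but rename its column first
# and pass it unnamed (df2_alt is a hypothetical name)
df2_alt <- data.frame(df2.a = df1$df1.a,
                      df2.b = df1[, "df1.b"],
                      setNames(df1["df1.c"], "df2.c"))
print(names(df2_alt))
```

Both calls give the column names df2.a, df2.b and df2.c.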

As to why the argument name is handled this way: the column naming is complex. From the documentation:

How the names of the data frame are created is complex, and the rest of this paragraph is only the basic story.

For example, try

data.frame(df2.x = df1[c("df1.a", "df1.b")])
  df2.x.df1.a df2.x.df1.b
1           1           1
2           2           2
3           3           3
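
To put the cases side by side, here is a small sketch that compares the resulting names for a named one-column argument, a named two-column argument, and an unnamed two-column argument. The argument name `x` and the three variable names are placeholders chosen for the illustration.

```r
# Same data frame as in the question
df1 <- data.frame(df1.a = 1:3, df1.b = 1:3, df1.c = 1:3)

# Named argument holding ONE column: the column's own name is used
one_col_named <- names(data.frame(x = df1["df1.a"]))

# Named argument holding SEVERAL columns: argument name is prepended
multi_col_named <- names(data.frame(x = df1[c("df1.a", "df1.b")]))

# Unnamed argument holding several columns: column names pass through
multi_col_unnamed <- names(data.frame(df1[c("df1.a", "df1.b")]))

print(one_col_named)      # "df1.a"
print(multi_col_named)    # "x.df1.a" "x.df1.b"
print(multi_col_unnamed)  # "df1.a" "df1.b"
```

So the argument name only shows up in the result when the argument contributes more than one column.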

(Thanks to Roman for pointing to a better part of the documentation.)