Amit Kohli Amit Kohli - 4 months ago 14
R Question

Import excel with duplicate columns that end with a number (r xlsx)

I have an xlsx file that contains let's say 4 columns:

Thingie1
Thingie1
Thingie2
Thingie2


(I know it's stupid... it's because I am combining data from 2 different databases, and the
1
and
2
denote a difference between survey 1 and survey 2 and chose to dump out into a xlsx file.

When I import this into R using the xlsx package, it changes the column titles to:

Thingie1
Thingie3
Thingie2
Thingie4


Obviously my real-world example is much more complex and it then becomes very difficult to identify what is what. I would like to have something like:

Thingie1
Thingie11
Thingie2
Thingie21


Now, I can resolve this by opening the file in excel and changing the columns there... so I can solve my immediate problem, but I hate breaking my workflow... would there be some way to import this data in a better way into R itself?

Answer

I think you should read the first line (containing column names) alone of your file with header=FALSE for proper naming of columns. Then use make.unique function upon it. Later read the whole file with header=TRUE and set the column names using colnames function.

x <- c("Thingie1", "Thingie1", "Thingie2", "Thingie2")
x
#[1] "Thingie1" "Thingie1" "Thingie2" "Thingie2"
make.unique(x)
#[1] "Thingie1"   "Thingie1.1" "Thingie2"   "Thingie2.1"