Waldir Leoncio - 1 month ago
Linux Question

Reading Rdata file with different encoding

I have an .RData file to read on my Linux (UTF-8) machine, but I know the file is in Latin1 because I created it myself on Windows. Unfortunately, I don't have access to the original files or a Windows machine, and I need to read it on my Linux machine.

To read an Rdata file, the normal procedure is to run load("file.Rdata"). Functions such as read.csv have an encoding argument that you can use to solve this kind of issue, but load has no such argument. If I try load("file.Rdata", encoding = "latin1"), I just get this (expected) error:


Error in load("file.Rdata", encoding = "latin1") :
unused argument (encoding = "latin1")


What else can I do? My files are loaded with text variables containing accents that get corrupted when opened in a UTF-8 environment.

Answer

Thanks to 42's comment, I've managed to write a function to recode the file:

fix.encoding <- function(df, originalEncoding = "latin1") {
  for (col in seq_len(ncol(df))) {
    # Encoding<- only accepts character vectors, so skip other column types
    if (is.character(df[, col])) Encoding(df[, col]) <- originalEncoding
  }
  return(df)
}

The meat here is the command Encoding(df[, col]) <- "latin1", which takes column col of dataframe df and declares that its strings are stored as latin1. It does not convert anything: the bytes stay exactly as they are, and R is simply told how to interpret them. Unfortunately, Encoding only accepts character vectors, not whole dataframes, so I had to create a function to sweep all columns of a dataframe object and apply it to each one.
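To make the effect of that assignment concrete, here is a minimal sketch that round-trips a small dataframe through an .RData file and then sets the encoding on one column. The dataframe, the column name and the temporary file are all made up for illustration; the text is built from raw latin1 bytes so that it starts out with no declared encoding, which is assumed to mimic what a file saved on a latin1 Windows session looks like after load.

```r
# Hypothetical data: "José" as raw latin1 bytes, carrying no encoding mark
latin1_bytes <- as.raw(c(0x4a, 0x6f, 0x73, 0xe9))
df <- data.frame(name = rawToChar(latin1_bytes), stringsAsFactors = FALSE)

# Round-trip through an .RData file, loading into a separate environment
path <- tempfile(fileext = ".RData")
save(df, file = path)
loaded <- new.env()
load(path, envir = loaded)
df2 <- loaded$df

before <- Encoding(df2$name)      # encoding mark right after load()
Encoding(df2$name) <- "latin1"    # declare how the bytes should be read
after <- Encoding(df2$name)       # encoding mark after the assignment

# The stored bytes are untouched; only the mark has changed
same_bytes <- identical(charToRaw(df2$name), latin1_bytes)

# enc2utf8() does an actual conversion once the mark is correct
utf8_name <- enc2utf8(df2$name)
```

Loading into a separate environment is optional; it just keeps the loaded objects from overwriting anything already in the workspace.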

Of course, if your problem is in just a couple of columns, you're better off just applying the Encoding to those columns instead of the whole dataframe (you can modify the function above to take a set of columns as input). Also, if you're facing the inverse problem, i.e. reading an R object created in Linux or Mac OS into Windows, you should use originalEncoding = "UTF-8".
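One possible way to restrict the sweep to a set of columns is sketched below. The name fix.encoding.cols and its cols argument are hypothetical, not part of the original answer, and the sample dataframe is invented for the demonstration.

```r
# Hypothetical variant of fix.encoding: `cols` (names or indices) limits the sweep.
# By default every column is visited.
fix.encoding.cols <- function(df, cols = seq_len(ncol(df)), originalEncoding = "latin1") {
  for (col in cols) {
    # Encoding<- only accepts character vectors, so skip other column types
    if (is.character(df[, col])) Encoding(df[, col]) <- originalEncoding
  }
  df
}

# Invented example: two text columns holding unmarked latin1 bytes
bytes <- rawToChar(as.raw(c(0x4a, 0x6f, 0x73, 0xe9)))
people <- data.frame(id = 1L, name = bytes, city = bytes, stringsAsFactors = FALSE)

# Only "name" is marked; "city" and "id" are left exactly as they were
fixed <- fix.encoding.cols(people, cols = "name")
name_mark <- Encoding(fixed$name)
city_mark <- Encoding(fixed$city)
```

For the inverse case mentioned above, the same call with originalEncoding = "UTF-8" should apply.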