mropa mropa - 2 months ago 11
R Question

How to trim leading and trailing whitespace in R?

I am having some trouble with leading and trailing whitespace in a data.frame. For example, I would like to look at a specific row in a data.frame based on a certain condition:

> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]

[1] codeHelper country dummyLI dummyLMI dummyUMI
[6] dummyHInonOECD dummyHIOECD dummyOECD
<0 rows> (or 0-length row.names)


I was wondering why I didn't get the expected output, since the country Austria obviously exists in my data.frame. After looking through my code history and trying to figure out what went wrong, I tried:

> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
codeHelper country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18 AUT Austria 0 0 0 0 1
dummyOECD
18 1


All I changed in the command was to add a whitespace character after Austria.

Further annoying problems obviously arise, for example when I want to merge two frames based on the country column. One data.frame uses "Austria " while the other has "Austria", so the matching doesn't work.


  1. Is there a nice way to 'show' the whitespace on my screen so that I am aware of the problem?

  2. And can I remove the leading and trailing whitespace in R?



So far I have used a simple Perl script to remove the whitespace, but it would be nice if I could somehow do it inside R.

Answer

Probably the best way is to handle the leading and trailing whitespace when you read your data file. If you use read.csv or read.table you can set the parameter strip.white=TRUE.
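
As a rough illustration of the strip.white approach, the sketch below reads a small inline CSV instead of a file. The sample data and the names csv_text and df are made up for this example; only read.csv and its strip.white argument come from the answer above.

```r
# Hypothetical sample data: "Austria " has a trailing space, " Belgium" a leading one
csv_text <- "codeHelper,country\nAUT,Austria \nBEL, Belgium"

# strip.white = TRUE removes leading and trailing whitespace from unquoted fields
df <- read.csv(text = csv_text, strip.white = TRUE, stringsAsFactors = FALSE)

# The comparison now matches without the extra space
df[df$country == "Austria", ]
```

Note that strip.white only affects unquoted fields, so values wrapped in quotes in the file may still need cleaning afterwards.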

If you want to clean strings afterwards you could use one of these functions:

# returns string w/o leading whitespace
trim.leading <- function (x)  sub("^\\s+", "", x)

# returns string w/o trailing whitespace
trim.trailing <- function (x) sub("\\s+$", "", x)

# returns string w/o leading or trailing whitespace
trim <- function (x) gsub("^\\s+|\\s+$", "", x)

To use one of these functions on myDummy$country:

 myDummy$country <- trim(myDummy$country)
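
To connect this to the merge problem from the question, here is a minimal sketch using two made-up data frames (left and right are hypothetical stand-ins, not the asker's data). It also compares the result with trimws(), which base R has provided since version 3.2.0.

```r
# Same helper as above: strip leading and trailing whitespace
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

# Hypothetical frames: one uses "Austria ", the other "Austria"
left  <- data.frame(country = c("Austria ", "Belgium"),
                    dummyOECD = c(1, 1), stringsAsFactors = FALSE)
right <- data.frame(country = c("Austria", "Belgium"),
                    codeHelper = c("AUT", "BEL"), stringsAsFactors = FALSE)

# Before trimming, only Belgium matches
rows_before <- nrow(merge(left, right, by = "country"))

# After trimming, both countries match
left$country <- trim(left$country)
merged <- merge(left, right, by = "country")
rows_after <- nrow(merged)

# Base R's trimws() gives the same result for these inputs
same_as_trimws <- identical(trimws(c("Austria ", " Belgium")),
                            trim(c("Austria ", " Belgium")))
```

If the column is a factor, gsub() returns a character vector, so the trimmed column will no longer be a factor unless you convert it back.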

To 'show' the whitespace you could use:

 paste(myDummy$country)

paste() converts the column to a character vector, which R prints with each string surrounded by quotation marks ("), making whitespace easier to spot.
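
If the quotation marks alone are easy to overlook in a long column, another option is to flag the affected values directly. The sketch below is one possible way to do that; the country vector is made-up sample data, and x and has_padding are names chosen for this example.

```r
# Hypothetical column stored as a factor, as read.csv may produce
country <- factor(c("Austria ", "Belgium", " Chile"))
x <- as.character(country)

# Wrap each value in brackets so that padding is visible
sprintf("[%s]", x)

# Flag the values that start or end with whitespace
has_padding <- grepl("^\\s+|\\s+$", x)
x[has_padding]
```

Here has_padding is a logical vector, so sum(has_padding) gives a quick count of how many values need trimming.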