user2006697 - 10 months ago
R Question

R: combine several gsub() functions in a pipe

To clean some messy data I would like to start using pipes (%>%), but I fail to get the R code working if gsub() is not at the beginning of the pipe but occurs later. (Note: this question is not concerned with proper import, but with data cleansing.)

Simple example:

df <- data.frame(A = c("2.187,78 ", "5.491,28 ", "7.000,32 "), B = c("A","B","C"), stringsAsFactors = FALSE)

Column A contains character data (in this case numbers stored as text, but it could also be arbitrary strings) and needs to be cleaned.
The steps are:

library(stringr)  # provides str_trim()
df$D <- gsub("\\.","",df$A)
df$D <- str_trim(df$D)
df$D <- as.numeric(gsub(",", ".",df$D))

One could easily pipe this:

df$D <- gsub("\\.","",df$A) %>%
  str_trim() %>%
  as.numeric(gsub(",", "."))  # fails: the inner gsub() has no input argument

The problem is the second gsub(), because it asks for its input, which is actually the result of the previous line.

Please, could anyone explain how to use functions like gsub() further down the pipeline?
Thanks a lot!

system: R 3.2.3, Windows

Answer

Try this:


df$D <- df$A %>%
  { gsub("\\.","", .) } %>%
  str_trim() %>%
  { as.numeric(gsub(",", ".", .)) }

With the pipe, your data is passed as the first argument to the next function, so if you want to use it somewhere else you need to wrap the next call in {} and use . as a placeholder for the data.
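As an alternative sketch (not part of the original answer), the nested call can be split into one step per function. This assumes the magrittr `%>%` pipe: when `.` appears as a direct argument of the call on the right-hand side, magrittr places the data there instead of in the first position, so braces should only be needed when `.` is nested inside another call, as in `as.numeric(gsub(",", ".", .))`. The `data.frame()` construction below is an assumed reconstruction of the example data.

```r
library(magrittr)  # provides %>%
library(stringr)   # provides str_trim()

# Assumed reconstruction of the example data from the question
df <- data.frame(A = c("2.187,78 ", "5.491,28 ", "7.000,32 "),
                 B = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

df$D <- df$A %>%
  gsub("\\.", "", .) %>%   # drop the thousands separator; `.` marks where the data goes
  str_trim() %>%           # data is passed as the first argument, no placeholder needed
  gsub(",", ".", .) %>%    # turn the decimal comma into a decimal point
  as.numeric()             # convert the cleaned strings to numbers

# df$D is now numeric: 2187.78, 5491.28, 7000.32
```

Keeping each step to a single function call also makes the pipeline easier to read and to debug, since any step can be commented out on its own.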