Jubbles Jubbles - 3 months ago 23
R Question

R - concatenate row-wise across specific columns of dataframe

I have a data frame with columns that, when concatenated (row-wise) as a string, would allow me to partition the data frame into a desired form.

> str(data)
'data.frame': 680420 obs. of 10 variables:
$ A : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
$ B : chr "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
$ C : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
$ D : chr "AAA" "AAA" "BCB" "CCC" ...
$ E : chr "A00001" "A00002" "B00002" "B00001" ...
$ F : int 9 9 37 37 37 37 191 191 191 191 ...
$ G : int NA NA NA NA NA NA NA NA NA NA ...
$ H : int 4 4 4 4 4 4 4 4 4 4 ...


For each row, I would like to concatenate the data in columns F, E, D, and C into a string (with the underscore character as separator). Below is my unsuccessful attempt at this:

data$id <- sapply(as.data.frame(cbind(data$F,data$E,data$D,data$C)), paste, sep="_")


And below is the undesired result:

> str(data)
'data.frame': 680420 obs. of 10 variables:
$ A : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
$ B : chr "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
$ C : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
$ D : chr "AAA" "AAA" "BCB" "CCC" ...
$ E : chr "A00001" "A00002" "B00002" "B00001" ...
$ F : int 9 9 37 37 37 37 191 191 191 191 ...
$ G : int NA NA NA NA NA NA NA NA NA NA ...
$ H : int 4 4 4 4 4 4 4 4 4 4 ...
$ id : chr [1:680420, 1:4] "9" "9" "37" "37" ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "V1" "V2" "V3" "V4"


Any help would be greatly appreciated.

Answer

Try

 data$id <- paste(data$F, data$E, data$D, data$C, sep="_")

instead. The beauty of vectorized code is that you do not need row-by-row loops, or loop-equivalent *apply functions.

Edit Even better is

 data <- within(data,  id <- paste(F, E, D, C, sep="")
Comments