user9292 - 1 month ago
R Question

Create a new data frame by aggregating rows in R

The data frame I have contains two columns: ID and type (character). See below:

set.seed(123)
ID <- seq(1,25)
type <- sample(letters[1:26], 25, replace=TRUE)

df <- data.frame(ID, type)


I need to create a new data frame that contains only one column. The first observation will be the first
three letters in column type pasted together, the second observation will be the next three letters, and so on.

The new data frame should look like this:

ndf <- data.frame(ntype=c("huk", "wyb", "nxo", "lyl", "roc", "xgb", "iyx", "sqz", "r"))

Answer Source

We can create a grouping variable with gl and then use tapply to paste the elements of each group together

n <- 3 
ndf <- data.frame(ntype = with(df, unname(tapply(type, as.integer(gl(nrow(df), n, 
         nrow(df))), FUN =paste, collapse=""))), stringsAsFactors= FALSE)
ndf$ntype
#[1] "huk" "wyb" "nxo" "lyl" "roc" "xgb" "iyx" "sqz" "r"  
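
To make the grouping step easier to follow, here is a minimal sketch that builds the group index separately and prints it before pasting. It assumes the 25 letters shown in the expected output and spells them out explicitly, because set.seed(123) may draw different letters on newer R versions (the default sampling method changed in R 3.6.0). The names grp and letters25 are illustrative only.

```r
# Letters copied from the expected output, so the result does not
# depend on which random sampler the installed R version uses.
letters25 <- strsplit("hukwybnxolylrocxgbiyxsqzr", "")[[1]]
df <- data.frame(ID = seq_along(letters25), type = letters25,
                 stringsAsFactors = FALSE)

n <- 3
# gl() repeats each level n times; the third argument cuts the result
# to nrow(df) values, so the last group may hold fewer than n rows.
grp <- as.integer(gl(nrow(df), n, nrow(df)))
print(grp)

# Paste the letters inside each group; as.vector() drops the names
# and dimensions that tapply() attaches to its result.
ntype <- as.vector(tapply(df$type, grp, paste, collapse = ""))
ndf <- data.frame(ntype = ntype, stringsAsFactors = FALSE)
print(ndf$ntype)
```

The group index is 1 for rows 1 to 3, 2 for rows 4 to 6, and so on, with the single leftover row forming group 9.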

Another option is to paste the whole column into a single string and then split it after every three characters

strsplit(paste(df$type, collapse=""), "(?<=.{3})", perl = TRUE)[[1]]
#[1] "huk" "wyb" "nxo" "lyl" "roc" "xgb" "iyx" "sqz" "r"  
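
The pattern "(?<=.{3})" is a lookbehind: it matches the empty position that is preceded by three characters. As a hedged sketch, the chunk width can be made a parameter by building the pattern with sprintf. The input string below is assumed from the expected output, and the result relies on how R's strsplit continues from the remainder of the string after each split, so other regex engines may behave differently.

```r
# The 25 letters from the expected output, already pasted together
s <- "hukwybnxolylrocxgbiyxsqzr"
n <- 3

# Build the lookbehind for a chunk width of n: "(?<=.{3})" when n is 3
pattern <- sprintf("(?<=.{%d})", n)

chunks <- strsplit(s, pattern, perl = TRUE)[[1]]
print(chunks)
```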

A third option is substring with paste

substring(paste(df$type, collapse=""), seq(1, nrow(df), by = 3),
        c(seq(3, nrow(df), by = 3), nrow(df)))
#[1] "huk" "wyb" "nxo" "lyl" "roc" "xgb" "iyx" "sqz" "r"  
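
The substring idea can also be wrapped in a small reusable function. The sketch below is one possible way to do it; chunk_string is a hypothetical helper name, not a base R function. It computes each end position as start + n - 1, so the final position does not have to be appended by hand, and the result stays correct when the number of letters is an exact multiple of n. It relies on substring returning only the characters that exist when the end position runs past the end of the string.

```r
# chunk_string() is a hypothetical helper, not part of base R.
# x: character vector of letters; n: number of letters per chunk.
chunk_string <- function(x, n = 3) {
  s <- paste(x, collapse = "")
  if (!nzchar(s)) return(character(0))   # nothing to split
  starts <- seq(1, nchar(s), by = n)
  # An end position beyond nchar(s) is harmless: the last chunk is
  # simply shorter than n.
  substring(s, starts, starts + n - 1)
}

# 25 letters: eight full chunks plus one leftover letter
print(chunk_string(strsplit("hukwybnxolylrocxgbiyxsqzr", "")[[1]]))

# 6 letters: an exact multiple of n, so there is no leftover chunk
print(chunk_string(letters[1:6]))
```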

Note: all of the above are base R solutions.