David Z David Z - 3 months ago 9
R Question

Merge columns within a dataframe that is very wide in R

Suppose I have a data frame such like:

set.seed(123)
df<-data.frame(ID=letters[1:10],
A1=rnorm(10),A2=rnorm(10,1,3),
B1=rnorm(10),B2=rnorm(10,1,3),
C1=rnorm(10),C2=rnorm(10,1,3),
D1=rnorm(10),D2=rnorm(10,1,3),
E1=rnorm(10),E2=rnorm(10,1,3))


What I wanted is to add up
*1
and
*2
columns from A-E. It seems easy to realize but the key issue for my case is the df is very wide and I prefer to not type A-E with a desired output with column names
c("ID", "A", "B", "C", "D", "E")
.

Answer

We can split the column names of 'df' except the first one ('ID') by the substring of the column names created by removing the numeric part (using sub), loop through the list elements with sapply, subset the columns of 'df' based on the index, get the rowSums and cbind with the first column (df[1])

cbind(df[1], sapply(split(names(df)[-1], sub("\\d+", "", names(df)[-1])),
                              function(x) rowSums(df[x])))
# ID          A          B          C           D           E
#1   a  4.1117697  1.2115690  1.0652486 -0.09345402  3.98627575
#2   b  1.8492640 -0.1031894  0.7064425 -6.42983008  3.03047128
#3   c  3.7610227  2.6593725 -0.3940077  3.68400819  1.34553517
#4   d  1.4025565  2.9055092  7.2747628 -2.14617767 -0.23934168
#5   e -0.5382357  2.8397040  1.5306490 -2.13581708  4.86147078
#6   f  8.0758044  1.3792275  4.4263032  4.38024275 -0.46899680
#7   g  2.9544676  3.4995400 -4.0491432  0.59389076  8.65883799
#8   h -6.1649127  0.9676380  2.2871859 -2.60914891  6.03301337
#9   i  2.4172149 -1.0560249  2.1515278  2.46617791 -0.03303266
#10  j -0.8640362  1.1124019  1.5644556  2.63341060 -0.93045508