Neil Neil - 3 years ago 185
R Question

how to combine multiple columns with grep and sum the values in r

I have following dataframe in r

Engine General Ladder.winch engine.phe subm.gear.box aux.engine pipeline.maintain pipeline pipe.line engine.mpd
1 12 22 2 4 2 4 5 6 7


and so on with more than 10000 rows.

Now,I want to combine columns and add values to reduce the columns into broader categories. e.g
Engine,engine.phe,aux.engine,engine.mpd
should be combined into
Engine
category and all the values to be added. likewise
pipeline.maintain,pipeline,pipe.line
to be combined into
Pipeline
And rest columns to be added under
General
Category.

Desired dataframe would be

Engine Pipeline General
12 15 38


How can I do it in r?

Answer Source

Here is an option by extracting the concerned words from the names of the column, and using tapply to get the sum. The str_extract_all returns a list ('lst'). Replace those elements which are having zero length with 'GENERAL', Then, using a group by function i.e. tapply, unlist the dataset, and use the grouping variables i.e replicated 'lst' and the row of 'df1' get the sum

library(stringr)
lst <- str_extract_all(toupper(sub("(pipe)\\.", "\\1", names(df1))),
          "ENGINE|PIPELINE|GENERAL")
lst[lengths(lst)==0] <- "GENERAL"
t(tapply(unlist(df1), list(unlist(lst)[col(df1)], row(df1)), FUN = sum))
#   ENGINE  GENERAL PIPELINE 
#1      12       38       15 
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download