Bhabani Mohapatra - 1 month ago
R Question

Subset Columns based on partial matching of column names in the same data frame

I would like to subset multiple columns of a data frame by comparing the first 5 letters of the column names with each other: columns whose names share the same first 5 letters should be subset together and stored in a new variable.

Here is a small example of the required setup. Let's say the data frame is eatable:

fruits_area  fruits_production  vegetables_area  vegetable_production
         12                100               26                   324
         33                250               40                   580
        660                510               43                   581

eatable <- data.frame(c(12,33,660),c(100,250,510),c(26,40,43),c(324,580,581))
names(eatable) <- c("fruits_area", "fruits_production", "vegetables_area",
"vegetable_production")


I was trying to write a function that loops over the column names, matches their first 5 letters, and stores each matching group of columns as a subset.

checkExpression <- function(dataset, str) {
  dataset[grepl(str, names(dataset), ignore.case = TRUE)]
}

checkExpression(eatable,"your_string")


The above function matches a given string correctly, but I am unsure how to match the column names in the dataset against each other.

Edit: I think regular expressions would work here.

Answer

You could try:

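# Unique 5-character prefixes of the column names (substr() is 1-indexed)
v <- unique(substr(names(eatable), 1, 5))
# One data frame per prefix; "^" anchors the match to the start of the name
lapply(v, function(x) eatable[grepl(paste0("^", x), names(eatable))])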

Or, using dplyr:

library(dplyr)
# select_() is deprecated; select() with starts_with() matches the prefix directly
lapply(v, function(x) select(eatable, starts_with(x)))

Which gives:

#[[1]]
#  fruits_area fruits_production
#1          12               100
#2          33               250
#3         660               510
#
#[[2]]
#  vegetables_area vegetable_production
#1              26                  324
#2              40                  580
#3              43                  581
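
Since the question asks for each subset to be stored in a variable, it may help to name the list elements after their prefixes so they can be retrieved by name. The following is a minimal sketch under that assumption, using the same eatable data as in the question. The name subsets is only illustrative, and base R's startsWith() (available from R 3.3.0) is swapped in for grepl() so that the prefix is matched literally rather than as a regular expression.

```r
# Same example data as in the question
eatable <- data.frame(c(12, 33, 660), c(100, 250, 510), c(26, 40, 43), c(324, 580, 581))
names(eatable) <- c("fruits_area", "fruits_production", "vegetables_area",
                    "vegetable_production")

# Unique 5-character prefixes of the column names
v <- unique(substr(names(eatable), 1, 5))

# `subsets` is an illustrative name: one data frame per prefix, named by that prefix
subsets <- setNames(
  lapply(v, function(x) eatable[startsWith(names(eatable), x)]),
  v
)

names(subsets)   # "fruit" "veget"
subsets$fruit    # data frame with columns fruits_area and fruits_production
```

Each group is then available as subsets$fruit or subsets[["veget"]] instead of by position.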

Should you want to make it into a function:

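checkExpression <- function(df, l = 5) {
  # Unique prefixes of length l (substr() is 1-indexed)
  v <- unique(substr(names(df), 1, l))
  # One data frame per prefix, matched only at the start of the column name
  lapply(v, function(x) df[grepl(paste0("^", x), names(df))])
}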

Then simply use:

checkExpression(eatable, 5)
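
As a possible alternative to the lapply() approach, base R's split.default() can group the columns in one step, because it treats a data frame as a list of columns. This is a sketch rather than part of the original answer; the name by_prefix is hypothetical, and the data is the same eatable example from the question.

```r
# Same example data as in the question
eatable <- data.frame(c(12, 33, 660), c(100, 250, 510), c(26, 40, 43), c(324, 580, 581))
names(eatable) <- c("fruits_area", "fruits_production", "vegetables_area",
                    "vegetable_production")

# Group columns by the first 5 characters of their names.
# split.default() splits the columns (not the rows) of a data frame.
# `by_prefix` is an illustrative name.
by_prefix <- split.default(eatable, substr(names(eatable), 1, 5))

names(by_prefix)   # "fruit" "veget"
by_prefix$veget    # data frame with columns vegetables_area and vegetable_production
```

The result is a named list of data frames, with the groups ordered alphabetically by prefix rather than by first appearance.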