bikhaab - 1 month ago
R Question

Changing column names in dataframe using gsub

I have an atomic vector like:

col_names_to_be_changed <- c("PRODUCTIONDATE", "SPEEDRPM", "PERCENTLOADATCURRENTSPEED", sprintf("SENSOR%02d", 1:18))


I'd like to have _ between words and have them all lower case except the first letter of each word (following the R style for data frames from Advanced R). I'd like to have something like this:

new_col_names <- c("Production_Date", "Percent_Load_At_Current_Speed", sprintf("Sensor_%02d", 1:18))


Assume that my words are limited to this list:

list_of_words <- c('production', 'speed', 'percent', 'load', 'at', 'current', 'sensor')


I am thinking of an algorithm that uses gsub, puts _ wherever it finds a word from the above list, and then capitalizes the first letter of each word. Although I can do this manually, I'd like to learn how this can be done more beautifully using gsub. Thanks.

Answer

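You can paste the list of words together with | and wrap the result in a look-behind ((?<=...)), so the pattern matches the empty position right after any word from the list. On its own that would also match after the "AT" in "DATE", since "at" is in the list of words, so I added the look-ahead (?=.{2,}): a listed word is only split off with an underscore when it is followed by 2 or more characters. This is a heuristic that works for these names; a name like "DATETIME" would still be split after "AT".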
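
As a quick check of that reasoning, here is a minimal sketch that compares the pattern with and without the look-ahead on two of the names. The variable names (words, alternation, no_lookahead, with_lookahead, x) are made up for illustration and are not part of the answer's code.

```r
words <- c("production", "speed", "percent", "load", "at", "current", "sensor")
alternation <- paste(words, collapse = "|")

# Look-behind only: match the position after every listed word, wherever it occurs
no_lookahead <- sprintf("(?i)(?<=%s)", alternation)

# Look-behind plus look-ahead: additionally require at least 2 characters to follow
with_lookahead <- sprintf("(?i)(?<=%s)(?=.{2,})", alternation)

x <- c("productiondate", "sensor01")

gsub(no_lookahead, "_", x, perl = TRUE)
# [1] "production_dat_e" "sensor_01"

gsub(with_lookahead, "_", x, perl = TRUE)
# [1] "production_date" "sensor_01"
```

Without the look-ahead, the "at" inside "date" is treated as a word boundary; with it, only one character follows "at", so no underscore is inserted there.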

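The second gsub in the code below just does the capitalization: \\U upper-cases the captured letter at the start of the string or right after each underscore.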

list_of_words <- c('production', 'speed', 'percent', 'load', 'at', 'current', 'sensor')
col_names_to_be_changed <- c("PRODUCTIONDATE", "SPEEDRPM", "PERCENTLOADATCURRENTSPEED", sprintf("SENSOR%02d", 1:18))


(pattern <- sprintf('(?i)(?<=%s)(?=.{2,})', paste(list_of_words, collapse = '|')))
# [1] "(?i)(?<=production|speed|percent|load|at|current|sensor)(?=.{2,})"

(split_words <- gsub(pattern, '_', tolower(col_names_to_be_changed), perl = TRUE))
# [1] "production_date"               "speed_rpm"                     "percent_load_at_current_speed"
# [4] "sensor_01"                     "sensor_02"                     "sensor_03"                    

gsub('(?<=^|_)([a-z])', '\\U\\1', split_words, perl = TRUE)
# [1] "Production_Date"               "Speed_Rpm"                     "Percent_Load_At_Current_Speed"
# [4] "Sensor_01"                     "Sensor_02"                     "Sensor_03"
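
Since the question is about column names in a data frame, the two calls could be wrapped in a small helper and assigned back through names(). This is only a sketch: the function name rename_columns and the toy data frame df are assumptions made for illustration, not something from the question or the answer.

```r
# Hypothetical helper: the same two gsub() calls as above, wrapped in a function
rename_columns <- function(x, words) {
  pattern <- sprintf("(?i)(?<=%s)(?=.{2,})", paste(words, collapse = "|"))
  # Step 1: lower-case everything and insert "_" after each listed word
  split_words <- gsub(pattern, "_", tolower(x), perl = TRUE)
  # Step 2: upper-case the first letter of the string and after each "_"
  gsub("(?<=^|_)([a-z])", "\\U\\1", split_words, perl = TRUE)
}

list_of_words <- c("production", "speed", "percent", "load", "at", "current", "sensor")

# Toy data frame with made-up values, only to show the renaming
df <- data.frame(PRODUCTIONDATE = Sys.Date(), SPEEDRPM = 1500, SENSOR01 = 0.5)

names(df) <- rename_columns(names(df), list_of_words)
names(df)
# [1] "Production_Date" "Speed_Rpm"       "Sensor_01"
```

Keeping the word list as an argument means the same helper can be reused if more words are added later, though the "followed by 2 or more characters" rule should be rechecked against any new names.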