matandked - 19 days ago
R Question

Sort vector of strings based on part of string (date) in R

I have a vector of strings in R that stores file names.

The file names contain a date in the format 'YYYYMMDD'.
Sample file names are as follows:

"ext-SM_OPER_MIR_CLF31A_20150506T000000_20150506T235959_300_002_7_1.DBL.nc"
"ext-SM_RE04_MIR_CLF31A_20150505T000000_20150505T235959_300_001_7_1.DBL.nc"

I would like to sort the vector by the date in each file name, so that the files with the earliest date come first.
Unfortunately, the sort function in R has no parameter for sorting by a regex match. How should I do that?

My sample data:

files <- c("ext-SM_OPER_MIR_CLF31A_20150506T000000_20150506T235959_300_002_7_1.DBL.nc",
"SMAP_L3_SM_AP_20150422_R13080_001.h5.tif","SMAP_L3_SM_AP_20150606_R13080_001.h5.tif",
"ext-SM_OPER_MIR_CLF31A_20150530T000000_20150530T235959_300_003_7_1.DBL.nc",
"ext-SM_RE04_MIR_CLF31A_20150418T000000_20150418T235959_300_001_7_1.DBL.nc",
"ext-SM_RE04_MIR_CLF31A_20150419T000000_20150419T235959_300_001_7_1.DBL.nc")

Answer

This should work:

dates <- as.Date(regmatches(files, regexpr("(?<=_)[0-9]{8}", files, perl = TRUE)), format = "%Y%m%d")
files[order(dates)]

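Edit: this is the same approach as the other answers: extract the dates, convert them to Date objects, then use order() on them to reorder files.
The regex extracts the first run of 8 digits ([0-9]{8}) that is preceded by an underscore ((?<=_)). That part is a lookbehind, which R's default regex engine does not support, so perl = TRUE is required.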
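One caveat: regmatches() returns only the successful matches, so if some file name contains no 8-digit date, the extracted vector is shorter than files and the one-liner silently drops or misaligns entries. A minimal sketch of a more defensive variant, reusing the same regex and assuming that names without a date should sort last (the extra "readme.txt" entry is hypothetical, added only to show that case):

```r
# Sample file names from the question
files <- c("ext-SM_OPER_MIR_CLF31A_20150506T000000_20150506T235959_300_002_7_1.DBL.nc",
           "SMAP_L3_SM_AP_20150422_R13080_001.h5.tif",
           "SMAP_L3_SM_AP_20150606_R13080_001.h5.tif",
           "ext-SM_OPER_MIR_CLF31A_20150530T000000_20150530T235959_300_003_7_1.DBL.nc",
           "ext-SM_RE04_MIR_CLF31A_20150418T000000_20150418T235959_300_001_7_1.DBL.nc",
           "ext-SM_RE04_MIR_CLF31A_20150419T000000_20150419T235959_300_001_7_1.DBL.nc")

# Hypothetical extra name with no date, to show how it is handled
files_all <- c(files, "readme.txt")

# Same regex as the answer: first 8 digits preceded by "_" (lookbehind needs perl = TRUE)
m <- regexpr("(?<=_)[0-9]{8}", files_all, perl = TRUE)

# regmatches() returns only the matches, so write them back into a
# full-length vector; names without a match (m == -1) stay NA
date_str <- rep(NA_character_, length(files_all))
date_str[m != -1] <- regmatches(files_all, m)

dates <- as.Date(date_str, format = "%Y%m%d")

# order() places NA dates last by default (na.last = TRUE)
sorted_files <- files_all[order(dates)]
print(sorted_files)
```

With the sample data, the 20150418 file comes first, the 20150606 file is the last dated entry, and "readme.txt" ends up at the end.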