Christoph - 3 months ago
R Question

How can I transform a character vector into a data.frame with a few lines of code?

I have the following character vector:

my_list <- c("Jan-01--Dec-31|00:00--24:00", "Jan-01--Jun-30|12:00--18:00",
"Jul-06--Dec-31|09:00--19:00")


What is the shortest code which results in:

      x1     x2     x3
1 Jan-01 Jan-01 Jul-06
2 Dec-31 Jun-30 Dec-31


and

     x1    x2    x3
1 00:00 12:00 09:00
2 24:00 18:00 19:00


At the moment I have the (not very nice) code

# Split each entry into its date part (row 1) and its time part (row 2)
df <- as.data.frame(strsplit(my_list, split = "|", fixed = TRUE),
                    stringsAsFactors = FALSE)
# Split the date ranges into start and end
date_list <- strsplit(as.character(df[1, ]), split = "--", fixed = TRUE)
date_df <- as.data.frame(date_list, col.names = seq_along(date_list),
                         stringsAsFactors = FALSE)
# Split the time ranges into start and end
time_list <- strsplit(as.character(df[2, ]), split = "--", fixed = TRUE)
time_df <- as.data.frame(time_list, col.names = seq_along(time_list),
                         stringsAsFactors = FALSE)


The best thing I have up to now is

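# Keep only the date part (the piece before "|") of each entry
date_list <- sapply(strsplit(my_list, split = "|", fixed = TRUE), "[", 1)
# Row x1 holds the start dates, row x2 the end dates
date_df <- t(data.frame(x1 = sapply(strsplit(date_list, split = "--", fixed = TRUE), "[", 1),
                        x2 = sapply(strsplit(date_list, split = "--", fixed = TRUE), "[", 2),
                        stringsAsFactors = FALSE))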
# and similarly for time_list and time_df.


Is there something more elegant?

Answer

tstrsplit from the data.table package and str_split_fixed from stringr are useful for getting correctly shaped data when splitting vectors of strings. The former returns the transpose of the split result, so the date parts and the time parts each come back as a single vector, with no need for sapply. The latter splits each string into a matrix with a fixed number of columns:

library(data.table); library(stringr)
lapply(tstrsplit(my_list, "\\|"), function(s) t(str_split_fixed(s, "--", 2)))

#[[1]]
#     [,1]     [,2]     [,3]    
#[1,] "Jan-01" "Jan-01" "Jul-06"
#[2,] "Dec-31" "Jun-30" "Dec-31"

#[[2]]
#     [,1]    [,2]    [,3]   
#[1,] "00:00" "12:00" "09:00"
#[2,] "24:00" "18:00" "19:00"
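
If you would rather avoid extra packages and want data.frames with the x1, x2, x3 column names from the question, here is a minimal base R sketch. It assumes every entry has exactly the form start--end|start--end; the names parts and to_df are only illustrative and do not come from the question.

```r
my_list <- c("Jan-01--Dec-31|00:00--24:00", "Jan-01--Jun-30|12:00--18:00",
             "Jul-06--Dec-31|09:00--19:00")

# Split on "|" or "--" in one pass; each entry yields four pieces:
# start date, end date, start time, end time (one row per entry).
parts <- do.call(rbind, strsplit(my_list, "\\||--"))

# Hypothetical helper: transpose so each entry becomes a column,
# then name the columns x1, x2, ...
to_df <- function(m) {
  setNames(as.data.frame(t(m), stringsAsFactors = FALSE),
           paste0("x", seq_len(nrow(m))))
}

date_df <- to_df(parts[, 1:2, drop = FALSE])  # start and end dates
time_df <- to_df(parts[, 3:4, drop = FALSE])  # start and end times
```

Unlike the matrices above, date_df and time_df are ordinary data.frames with character columns. The single regular expression only works because neither "|" nor "--" occurs inside the date or time values themselves.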