Agustin Indaco - 6 months ago 34
R Question

R: extracting characters between space and first ":"

I am trying to extract only the hour from a variable that holds a date and time. Several questions explain how to extract "%H:%M" from "%m/%d/%Y %H:%M", but my data is structured as "%m/%d/%Y %H:%M" for some observations and as "%m/%d/%Y %H:%M:%S" for others. Furthermore, the hour does not always have two digits (single-digit hours are written without a leading zero). So the following will not work:

df$hour <- format(as.POSIXct(citistation$starttime, format="%m/%d/%Y %H:%M:%S"), format="%H")

Sample of my data:

date <- c("1/1/2013 0:01","12/31/2013 21:49:19")

I am leaning towards something that extracts numbers between space and first ":". Any suggestions? Thanks.


We can use sub. Match one or more non-whitespace characters (\\S+) followed by one or more whitespace characters (\\s+), capture one or more characters that are not a : (([^:]+)), then match a : and everything up to the end of the string. Replace the whole match with the backreference (\\1) to the capture group.

sub("\\S+\\s+([^:]+):.*", "\\1", date)
#[1] "0"  "21"
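
Note that sub returns its input unchanged when the pattern does not match, and the result is a character vector. If a numeric hour is needed, a minimal sketch along these lines may help; the names pattern, hour_chr and hour_num are illustrative only, and the grepl guard is an added assumption about how unmatched rows should be handled (here they become NA):

```r
date <- c("1/1/2013 0:01", "12/31/2013 21:49:19")

# Same pattern as above, anchored at both ends
pattern <- "^\\S+\\s+([^:]+):.*$"

# Keep the captured hour only where the pattern matches; otherwise use NA
hour_chr <- ifelse(grepl(pattern, date), sub(pattern, "\\1", date), NA_character_)

# Convert the captured text to integer
hour_num <- as.integer(hour_chr)
hour_num
#[1]  0 21
```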

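A more robust option is to parse the strings into a date-time class and then extract the hour, for example with parse_date_time and hour from the lubridate package, which accept several candidate formats at once:

library(lubridate)
hour(parse_date_time(date, c('mdy_HM', 'mdy_HMS')))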
#[1]  0 21
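
If adding a package is not an option, the same idea can be sketched in base R by trying the longer format first and falling back to the shorter one for the values that fail to parse. This is only an illustration, not part of the original answer: the names parsed, missing and hours are made up here, and tz = "UTC" is an assumption used to keep the result independent of the local time zone.

```r
date <- c("1/1/2013 0:01", "12/31/2013 21:49:19")

# First pass: format with seconds; strings without seconds become NA
parsed <- as.POSIXct(date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Second pass: re-parse only the failures with the format without seconds
missing <- is.na(parsed)
parsed[missing] <- as.POSIXct(date[missing], format = "%m/%d/%Y %H:%M", tz = "UTC")

# Extract the hour as an integer
hours <- as.integer(format(parsed, "%H"))
hours
#[1]  0 21
```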


date <- c('1/1/2013 0:01','12/31/2013 21:49:19')