Agustin Indaco - 1 month ago
R Question

R: extracting characters between space and first ":"

I am trying to extract only the hour from a variable that contains both date and time. There are several questions explaining how to extract "%H:%M" from "%m/%d/%Y %H:%M", but my data is formatted as "%m/%d/%Y %H:%M" for some observations and as "%m/%d/%Y %H:%M:%S" for others. Furthermore, hours do not always have two digits (single-digit hours have no leading zero). So the following will not work:

df$hour <- format(as.POSIXct(citistation$starttime, format="%m/%d/%Y %H:%M:%S"), format="%H")


Sample of my data:

date <- c("1/1/2013 0:01","12/31/2013 21:49:19")
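A quick check on this sample shows where the single-format call breaks: the entry without seconds cannot be parsed with "%S" in the format, so it becomes NA. This is a minimal sketch; the `tz = "UTC"` argument is an assumption added only so the result does not depend on the local time zone.

```r
date <- c("1/1/2013 0:01", "12/31/2013 21:49:19")

# One fixed format that requires seconds ("%S"); tz = "UTC" is an assumption
parsed <- as.POSIXct(date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# The "H:M" entry has no seconds, so strptime() fails on it and returns NA
is.na(parsed)
#[1]  TRUE FALSE

# Formatting the hour therefore gives NA for the first entry and "21" for the second
format(parsed, format = "%H")
```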


I am leaning towards something that extracts the digits between the space and the first ":". Any suggestions? Thanks.

Answer

We can use sub. Match one or more non-whitespace characters (\\S+) followed by one or more spaces (\\s+), capture one or more characters that are not a : (([^:]+)), then match the : and everything up to the end of the string (:.*). Replacing the whole match with the backreference (\\1) to the capture group leaves only the hour.

sub("\\S+\\s+([^:]+):.*", "\\1", date)
#[1] "0"  "21"
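One caveat: sub() returns a string unchanged when the pattern does not match, so a malformed value would pass through as-is rather than becoming NA. The sketch below wraps an anchored variant of the same idea in a hypothetical helper, extract_hour(), that restricts the hour to one or two digits, returns NA for non-matching strings, and converts the result to integer. The helper name and the stricter pattern are illustrative assumptions, not part of the answer above.

```r
date <- c("1/1/2013 0:01", "12/31/2013 21:49:19")

# Hypothetical helper: hour as an integer, NA when the string does not
# look like "<date> <hour>:..."
extract_hour <- function(x) {
  # Anchored pattern: non-space date token, whitespace, 1-2 digit hour, ":"
  pattern <- "^\\S+\\s+([0-9]{1,2}):.*$"
  # sub() leaves non-matching strings unchanged, so mask them out first
  out <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
  as.integer(out)
}

extract_hour(c(date, "not a timestamp"))
#[1]  0 21 NA
```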

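It is more robust to convert to a date-time (POSIXct) class and extract the hour: parse_date_time() accepts several formats at once, and hour() returns a number rather than a string.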

library(lubridate)
hour(parse_date_time(date, c('mdy_HM', 'mdy_HMS')))
#[1]  0 21
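Without lubridate, a base-R sketch of the same idea parses with the longer format first and re-parses only the failures with the shorter one. Order matters: strptime() ignores trailing characters, so "%m/%d/%Y %H:%M" would also accept "12/31/2013 21:49:19" and silently drop the seconds. (For the hour alone that single format would already be enough; the two-step version also keeps the seconds.) The `tz = "UTC"` argument is an assumption to keep the result independent of the local time zone.

```r
date <- c("1/1/2013 0:01", "12/31/2013 21:49:19")

# Try the format with seconds first; strings without seconds become NA
parsed <- as.POSIXct(date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Re-parse only the failures with the shorter "%H:%M" format
missing <- is.na(parsed)
parsed[missing] <- as.POSIXct(date[missing], format = "%m/%d/%Y %H:%M", tz = "UTC")

# Extract the hour as an integer
as.integer(format(parsed, "%H"))
#[1]  0 21
```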

data

date <- c('1/1/2013 0:01','12/31/2013 21:49:19')