Ravisara Ravisara - 1 year ago 57
R Question

Time series analysis with R statistical software

I have a problem when analysing a dataset.

786650417, 3197478917, 264665794, **2016-12-01T00:00:00Z**, 1, 4, 94717580778


Above is one column in my data set.I want to extract the date and time (which is highlighted) and create a new column with that.

I have tried the following code:

sms <- sms %>%
mutate(date =str_extract_all(data , "[0-9]+\\-"))


From this code, I got the date which is separated with (-), but not correctly.

sms <- sms %>%
mutate(time =str_extract_all(data , "[0-9]+\\:"))


From this, I got the time.

Then, I removed the inverted commas using the below R code.

del <- colwise(function(sms) str_replace_all(sms, '\"', ""))
sms <- del(sms)


So, finally, it looked like this.

786650417, 3197478917, 264665794, 2016-12-01T00:00:00Z, 1, 4, 94717580778


I made two columns from the above row. Those two rows are below.

date time
c(2016-, 12-) c(00:, 00:)


But now, I cannot perform any forecasting methods since the date and time are not in the correct format.

Please help me to extract the date and time as the right type from this record.

786650417, 3197478917, 264665794, 2016-12-01T00:00:00Z, 1, 4, 94717580778


In my dataset there are 2 million records like this.

Answer Source

Let's try this -

library(stringr)
library(chron)
df <- data.frame(raw_data=c("786650417,3197478917,264665794,2016-12-01T00:00:00Z,1,4,94717580778",
                            "786650417,3197478917,264665794,2016-12-01T00:02:17Z,1,4,94717580778",
                            "786650417,3197478917,264665794,2016-12-01T00:28:10Z,1,4,94717580778"))
df$date <- as.Date(str_extract(df$raw_data, "\\d{4}-\\d{2}-\\d{2}"))
df$time <- chron(times=str_extract(df$raw_data, "\\d{2}:\\d{2}:\\d{2}"))
df$datetime <- strptime(paste(str_extract(df$raw_data, "\\d{4}-\\d{2}-\\d{2}"), str_extract(df$raw_data, "\\d{2}:\\d{2}:\\d{2}"), sep = " "), format="%Y-%m-%d %H:%M:%S")
df

note: Time can be formatted as per the real timezone of collected data- you just need to give one more parameter to strptime() e.g. tz = "GMT" if timestamp is in GMT format or else by default your local system's timezone will be selected (as in this case it's IST).


Don't forget to let us know if it solved your problem by marking it as the right answer :)

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download