Noobie Noobie - 1 month ago 18
R Question

xts: how to keep control of data types after as.xts?

Consider the following dataframe

time <-c('2016-04-13 23:07:45','2016-04-13 23:07:55','2016-04-13 23:08:45','2016-04-13 23:08:45'
,'2016-04-13 23:08:45','2016-04-13 23:07:50','2016-04-13 23:07:51')
group <-c('A','A','A','B','B','B','B')
value<- c(5,10,2,2,NA,1,4)
df=data.frame(time,group,value)

> df
time group value
1 2016-04-13 23:07:45 A 5
2 2016-04-13 23:07:55 A 10
3 2016-04-13 23:08:45 A 2
4 2016-04-13 23:08:45 B 2
5 2016-04-13 23:08:45 B NA
6 2016-04-13 23:07:50 B 1
7 2016-04-13 23:07:51 B 4


Note the missing value row
5
. Now I convert to
xts
after using
lubridate
to convert my timestamps to proper Posix types.

> df$time = ymd_hms(df$time)
> df<-as.xts(df,order.by=df$time)
> df
time group value
2016-04-13 23:07:45 "2016-04-13 23:07:45" "A" " 5"
2016-04-13 23:07:50 "2016-04-13 23:07:50" "B" " 1"
2016-04-13 23:07:51 "2016-04-13 23:07:51" "B" " 4"
2016-04-13 23:07:55 "2016-04-13 23:07:55" "A" "10"
2016-04-13 23:08:45 "2016-04-13 23:08:45" "A" " 2"
2016-04-13 23:08:45 "2016-04-13 23:08:45" "B" " 2"
2016-04-13 23:08:45 "2016-04-13 23:08:45" "B" NA


and my nice
numeric
column
value
is now a
character
!

How can I avoid that?

Thanks!

Answer

The underlying xts object for the data is a matrix which can be numeric or character type, but not both (unlike data.frame which is a list where each column can be of any atomic type in R). A rough check of seeing this happen is to try this:

> as.matrix(df)
     time                  group value
[1,] "2016-04-13 23:07:45" "A"   " 5" 
[2,] "2016-04-13 23:07:55" "A"   "10" 
[3,] "2016-04-13 23:08:45" "A"   " 2" 
[4,] "2016-04-13 23:08:45" "B"   " 2" 
[5,] "2016-04-13 23:08:45" "B"   NA   
[6,] "2016-04-13 23:07:50" "B"   " 1" 
[7,] "2016-04-13 23:07:51" "B"   " 4"

which is what coredata returns when creating an xts object:

x.df<- xts(df,order.by=df$time)
> coredata(x.df)
     time                  group value
[1,] "2016-04-13 23:07:45" "A"   " 5" 
[2,] "2016-04-13 23:07:50" "B"   " 1" 
[3,] "2016-04-13 23:07:51" "B"   " 4" 
[4,] "2016-04-13 23:07:55" "A"   "10" 
[5,] "2016-04-13 23:08:45" "A"   " 2" 
[6,] "2016-04-13 23:08:45" "B"   " 2" 
[7,] "2016-04-13 23:08:45" "B"   NA   

Remove the time and group columns when creating the xts object to get numeric data as you expect. You could map the group column types to integers. You also shouldn't include time in the xts object creation for the x argument, as your order.by is already containing the time information.

e.g.

df$group_idx <- as.numeric(as.factor(df$group))
x.df<- xts(df[, c("group_idx", "value")],order.by=df$time)
> x.df
                    group_idx value
2016-04-13 23:07:45         1     5
2016-04-13 23:07:50         2     1
2016-04-13 23:07:51         2     4
2016-04-13 23:07:55         1    10
2016-04-13 23:08:45         1     2
2016-04-13 23:08:45         2     2
2016-04-13 23:08:45         2    NA
Comments