stephan stephan - 1 month ago 6
R Question

R: How to query for date format in output from scan?

I have hear a really silly output format from observations which I've to read in with scan.

Here's a snipplet from (data.dat), where I've marked header and data blocks:

06.02.2014 # header
PNP
-0,005
00:05#587 # values
00:15#591
23:50#587
23:55#587
07.02.2014 # header
PNP
-0,005
00:10#587 # values
00:15#590
23:55#590
24:00#593
08.02.2014 # header
PNP
-0,005
00:05#590 # value
00:10#595
00:15#600
23:50#600
23:55#607


The problems are:


  • I've got date for several years in 5min resolution,

  • each day has is own header (constant length), beginning with the date and two additional entries,

  • the length of the time series (format HH:MM#value)for each day is not constant, data gaps exists (not shown in the example)



My aim is a data.frame of the form date, time, value.

So, I need a loop or something, which analyses the single list elements (output from scan(file=data.dat, what=" ") as character). Since the time blocks have different lengths, I'd like to subsetting my daily data beginning with the date, skipping some further header elements, and than strsplit the time#value elements of the list, which has been outputted by

crap <- scan(file = data.dat, what=" ") # import as list


the strsplit works well with

tmp <- strsplit(crap[4:8], split="#")
df <- data.frame(date=as.Date(crap[1],format = "%d.%m.%Y"), time=sapply(tmp, "[[", 1), W=sapply(tmp, "[[", 2))


However, I've no idea how to analyse the elements from the list (as characters), if they have an valid date format.

Cheers!

Answer

I have a solution but it may be very specific to the question you asked and what I interpreted.

First read the data and remove the PNP and -0,005 from the data.

crap     <- read.table(file = "data.dat",comment.char = " ")
a        <- as.vector(crap$V1)
a        <- a[-grep("PNP|-0,005",x = a)]

Now I extract the dates contained in the vector a

dateId   <- grep(".",x=a,fixed=T)
uniquedate    <- as.matrix(a[dateId])
> uniquedate
     [,1]        
[1,] "06.02.2014"
[2,] "07.02.2014"
[3,] "08.02.2014"

Now I create a vector of dates of same length as no. of values in the dataset by repeating the dates for the number of values present in the corresponding date.

len      <- length(dateId)
dateRepVal    <- c(diff(dateId)-1,(length(a) - dateId[len]))
dates     <- unlist(sapply(1:len,FUN = function(x){rep(uniquedate[x],dateRepVal[x])}))

All other elements expect the date in our dataset "a" are time-value pair.using this information now I get the time and val by using the strsplit function and then create the dataframe.

timeVal   <- strsplit(a[-dateId],split = "#")
time      <- sapply(timeVal, "[[", 1)
val       <- sapply(timeVal, "[[", 2)
DF        <- data.frame(date = dates,time=time,val=val)

The final required output looks like below.

>DF
         date  time val
1  06.02.2014 00:05 587
2  06.02.2014 00:15 591
3  06.02.2014 23:50 587
4  06.02.2014 23:55 587
5  07.02.2014 00:10 587
6  07.02.2014 00:15 590
7  07.02.2014 23:55 590
8  07.02.2014 24:00 593
9  08.02.2014 00:05 590
10 08.02.2014 00:10 595
11 08.02.2014 00:15 600
12 08.02.2014 23:50 600
13 08.02.2014 23:55 607

Hope this solves the problem.

Comments