sc73 - 3 months ago
R Question

Importing irregular data in R

I'm hoping someone can help me with a data import question. I think it may be an easy fix, but I haven't found the answer. I have a large number of txt files containing antenna scans, and I need to import them in a uniform configuration. The problem is that each file starts with an irregular number of lines of diagnostic data about the antenna before the actual data begins. I need a function that can identify where the actual data begins, so I can import it with the correct data in the correct columns. Basically, for each file, I need to identify the number of diagnostic lines so I can pass it as the skip argument when reading the file with read.delim or something similar.

Here's an example of one of the files I'm talking about:

Power OFF @ 12:05:50 02/15/13
Power ON @ 12:06:03 02/15/13
Reader #1 12:06:03 02/15/13

Reader #2 12:06:03 02/15/13

Battery Voltage = 13.35 @ 13:00:00 02/15/13
Battery Voltage = 13.42 @ 14:00:00 02/15/13
Battery Voltage = 13.32 @ 15:00:00 02/15/13
Battery Voltage = 13.55 @ 16:00:00 02/15/13

Reader #2 02:57:40 02/17/13 LA 900 226000012999

Reader #2 02:57:40 02/17/13 LA 900 226000012999

Reader #2 02:57:40 02/17/13 LA 900 226000012999

Reader #2 02:57:40 02/17/13 LA 900 226000012999

Answer

You could read the file as a block of text and use grep to identify the lines you want to get rid of. Here, I stored your block of text in test.txt. Assuming your header runs through the Battery Voltage lines, you can find the line numbers that contain Battery and take the largest one, i.e. the last occurrence. That is the number of lines to skip.

con = file('test.txt', 'r')
text = readLines(con)
close(con)

lines_to_skip = max(grep('Battery',text))    
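The last-Battery rule only holds if no Battery Voltage line is logged after the data starts, which the sample doesn't rule out. As a rough alternative, here is a sketch that looks for the first line shaped like a data row instead. The pattern is my assumption based on the sample (a Reader line with extra fields after the date), and the names data_pattern, first_data_line and lines_to_skip_alt are made up for illustration.

```r
# Sample file contents, copied from the question (blank lines included).
text <- c(
  "Power OFF @ 12:05:50 02/15/13",
  "Power ON @ 12:06:03 02/15/13",
  "Reader #1 12:06:03 02/15/13",
  "",
  "Reader #2 12:06:03 02/15/13",
  "",
  "Battery Voltage = 13.35 @ 13:00:00 02/15/13",
  "Battery Voltage = 13.42 @ 14:00:00 02/15/13",
  "Battery Voltage = 13.32 @ 15:00:00 02/15/13",
  "Battery Voltage = 13.55 @ 16:00:00 02/15/13",
  "",
  "Reader #2 02:57:40 02/17/13 LA 900 226000012999",
  "",
  "Reader #2 02:57:40 02/17/13 LA 900 226000012999"
)

# Assumed shape of a data row: "Reader #n hh:mm:ss mm/dd/yy" followed by
# at least one more field. Header Reader lines stop after the date.
data_pattern <- "^Reader #[0-9]+ [0-9:]{8} [0-9/]{8} \\S"

# Index of the first matching line; everything before it is header.
first_data_line <- grep(data_pattern, text, perl = TRUE)[1]
lines_to_skip_alt <- first_data_line - 1
```

On the sample this gives 11 rather than 10, because it also counts the blank line after the last Battery line. Either value works as skip, since blank lines are dropped on import anyway.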

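You should then be able to read your data just fine. read.table drops the blank lines by default (blank.lines.skip = TRUE), and comment.char='' stops the # in Reader #2 from being treated as the start of a comment.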

> x = read.table('test.txt', skip=lines_to_skip, sep=' ', comment.char='')
> x
  V1     V2       V3       V4 V5  V6       V7
1 Reader #2 02:57:40 02/17/13 LA 900 2.26e+11
2 Reader #2 02:57:40 02/17/13 LA 900 2.26e+11
3 Reader #2 02:57:40 02/17/13 LA 900 2.26e+11
4 Reader #2 02:57:40 02/17/13 LA 900 2.26e+11
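Note that V7 prints as 2.26e+11 because read.table guessed a numeric type for the tag ID. Since you have many files, you may want to wrap the steps in a function and keep every column as text. This is only a sketch: read_scan is a hypothetical name, and treating a file with no Battery line as having no header is my assumption, not something the sample confirms.

```r
# Hypothetical helper: find the header length, then read the data rows.
read_scan <- function(path) {
  text <- readLines(path)
  battery <- grep("Battery", text)
  # Assumption: a file without any Battery line has no header to skip.
  lines_to_skip <- if (length(battery) > 0) max(battery) else 0
  read.table(path, skip = lines_to_skip, sep = " ", comment.char = "",
             colClasses = "character")  # keep IDs like 226000012999 as text
}

# Demo on a temporary copy of the sample from the question.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Power OFF @ 12:05:50 02/15/13",
  "Power ON @ 12:06:03 02/15/13",
  "Reader #1 12:06:03 02/15/13",
  "",
  "Reader #2 12:06:03 02/15/13",
  "",
  "Battery Voltage = 13.35 @ 13:00:00 02/15/13",
  "Battery Voltage = 13.42 @ 14:00:00 02/15/13",
  "Battery Voltage = 13.32 @ 15:00:00 02/15/13",
  "Battery Voltage = 13.55 @ 16:00:00 02/15/13",
  "",
  "Reader #2 02:57:40 02/17/13 LA 900 226000012999",
  "",
  "Reader #2 02:57:40 02/17/13 LA 900 226000012999",
  "",
  "Reader #2 02:57:40 02/17/13 LA 900 226000012999",
  "",
  "Reader #2 02:57:40 02/17/13 LA 900 226000012999"
), tmp)

x <- read_scan(tmp)
```

For a whole folder, something like lapply(list.files("scans", pattern = "\\.txt$", full.names = TRUE), read_scan) should give you a list of data frames with the same seven columns, where "scans" stands in for wherever your files actually live.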