I'm still very new to R, so I apologize if I'm not using the proper terminology. I'm interested in pulling a large amount of Unemployment Insurance Trust Fund data from the Treasury Direct online report query system (http://www.treasurydirect.gov/govt/reports/tfmp/tfmp_utf.htm), and I've successfully pulled the information with:
ESAA_OCT15 <- readLines('http://www.treasurydirect.gov/govt/reports/tfmp/utf/es/dfiw01015tses.txt')
This will help you parse the information:
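ESAA_OCT15 <- readLines('http://www.treasurydirect.gov/govt/reports/tfmp/utf/es/dfiw01015tses.txt')
# Select lines that contain a "/" character
z <- grepl(pattern = "/", x = ESAA_OCT15)
# Drop leading and trailing whitespace from the selected lines
d <- trimws(ESAA_OCT15[z])
# Cut fixed-width columns by character position
dates <- substr(d, 1, 10)
sharesPar <- substr(d, 11, 41)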
This first selects every line that contains a "/" character. The column-title line is selected too, because it contains "Shares/Par". The selected lines, with surrounding whitespace removed by trimws, are stored in d.
If you examine d:
"Effective Date Shares/Par Description Code Memo Number Code Account Number"
"10/01/2015 2,313,000.0000 12-10 FUTA RECEIPTS 3305617 ESAA"
"10/01/2015 3,663,000.0000 12-10 FUTA RECEIPTS 3305618 ESAA"
"10/02/2015 4,314,000.0000 12-10 FUTA RECEIPTS 3305640 ESAA"
"10/05/2015 3,512,000.0000 12-10 FUTA RECEIPTS 3305662 ESAA"
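Since the "/" test also keeps the header row, a stricter filter may be handy. This is a minimal sketch, assuming that every data row starts with an mm/dd/yyyy date (true of the rows shown here, but not checked against every report). The vector raw_lines is a stand-in for the output of readLines and holds a few of the rows shown above:

```r
# A few lines from the sample output; the real vector comes from readLines()
raw_lines <- c(
  "Effective Date Shares/Par Description Code Memo Number Code Account Number",
  "10/01/2015 2,313,000.0000 12-10 FUTA RECEIPTS 3305617 ESAA",
  "10/01/2015 3,663,000.0000 12-10 FUTA RECEIPTS 3305618 ESAA"
)

trimmed <- trimws(raw_lines)
# Keep only lines that begin with an mm/dd/yyyy date; this drops the header row
is_data <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}", trimmed)
d <- trimmed[is_data]
print(d)
```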
The file is fixed-width: each column starts and ends at the same character position on every line. To parse it, you can use
substr with start and stop positions, as shown in my script.
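The same substr step can be applied to every column in one pass. The sketch below uses a hypothetical helper, parse_fixed, and hypothetical column names. Its start/stop positions fit only the sample rows above, where the padding has been collapsed to single spaces. In the real file the columns are wider (the script reads Shares/Par from positions 11 to 41), so measure the actual positions on the raw text before reusing it.

```r
# Hypothetical helper: cut each line into columns at fixed start/stop positions
parse_fixed <- function(lines, starts, stops, col_names) {
  cols <- lapply(seq_along(starts), function(i) {
    # trimws removes padding spaces left inside each column
    trimws(substr(lines, starts[i], stops[i]))
  })
  names(cols) <- col_names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Two data rows from the sample output, with single-space separators
rows <- c(
  "10/01/2015 2,313,000.0000 12-10 FUTA RECEIPTS 3305617 ESAA",
  "10/02/2015 4,314,000.0000 12-10 FUTA RECEIPTS 3305640 ESAA"
)

# Positions fit these sample rows only; measure them on the real file
esaa <- parse_fixed(
  rows,
  starts = c(1, 12, 27, 33, 47, 55),
  stops = c(10, 25, 31, 45, 53, 58),
  col_names = c("date", "shares_par", "code", "description", "memo", "account")
)
print(esaa)
```

Keeping the positions in two vectors means that adjusting them for the real layout is a one-line change. Base R's read.fwf, which takes column widths instead of start/stop positions, is another option.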
I haven't parsed every column; I'll leave the rest to you. Once each column is parsed, combine them into a data frame:
data.frame(dates, sharesPar, ...)
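substr returns character vectors, so the dates and amounts remain text until they are converted. This is a sketch of that conversion. It assumes that dates are mm/dd/yyyy and that amounts use commas as thousands separators, as in the rows shown above. The two input vectors, padding included, are stand-ins for the parsed columns:

```r
# Stand-ins for the character columns produced by substr()
dates <- c("10/01/2015", "10/01/2015", "10/02/2015")
sharesPar <- c(" 2,313,000.0000", "3,663,000.0000 ", "4,314,000.0000")

esaa <- data.frame(
  # Parse mm/dd/yyyy text into Date values
  date = as.Date(dates, format = "%m/%d/%Y"),
  # Strip padding and thousands separators before converting to numbers
  shares_par = as.numeric(gsub(",", "", trimws(sharesPar), fixed = TRUE))
)
str(esaa)
```

as.Date and as.numeric return NA for values they cannot parse, so checking for NA after conversion is a quick way to catch a wrong start/stop position.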