Bradley Bradley - 1 month ago 6
R Question

Reading text file: read.table versus read_table

I'm reading a text file from this webpage into R. If I read this data with read.table, the data is parsed correctly and I get data for all 12 months:

url <- "http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt"

temp_df1 <- read.table(url,
                       col.names = c("Month", "Day", "Year", "Avg_Temp"),
                       na.strings = "-99")  # full argument name; `na` only worked via partial matching

head(temp_df1)
Month Day Year Avg_Temp
1 1 1 1995 41.1
2 1 2 1995 22.2
3 1 3 1995 22.8
4 1 4 1995 14.9
5 1 5 1995 9.5
6 1 6 1995 23.8

unique(temp_df1$Month)
[1] 1 2 3 4 5 6 7 8 9 10 11 12


However, if I read this data in with read_table, it appears at first to be parsed correctly, but the double-digit month codes (10, 11, 12) are truncated so that only the first digit is parsed.

library(readr)  # provides read_table

temp_df2 <- read_table(url,
                       col_names = c("Month", "Day", "Year", "Avg_Temp"),
                       na = "-99")

head(temp_df2)
# A tibble: 6 × 4
Month Day Year Avg_Temp
<int> <int> <int> <dbl>
1 1 1 1995 41.1
2 1 2 1995 22.2
3 1 3 1995 22.8
4 1 4 1995 14.9
5 1 5 1995 9.5
6 1 6 1995 23.8

unique(temp_df2$Month)
[1] 1 2 3 4 5 6 7 8 9


The dimensions of the data are the same; however, I cannot figure out how to import the data with read_table so that the full Month coding is preserved.

dim(temp_df1)
[1] 7963 4

dim(temp_df2)
[1] 7963 4

Answer

read_table doesn't work as expected here because of the issue LukeA mentions in the comments. Instead, use the read_fwf function and specify the field widths explicitly, which avoids the problem.

temp_df2 <- read_fwf(url, 
    col_positions = fwf_widths(c(14, 14, 13, 4), col_names = c("Month", "Day", "Year", "Avg_Temp")))

Keep in mind that for read_fwf, col_names is passed as an argument to fwf_widths and not to read_fwf itself.
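
To see the idea without downloading anything, here is a minimal sketch on a small made-up sample. The sample values, the widths c(4, 4, 6, 5), the left-aligned layout, and the names lines, path and sample_df are all assumptions for illustration; they are not the layout of the real file.

```r
library(readr)

# Hypothetical fixed-width sample (NOT the real file): left-aligned
# Month, Day and Year fields followed by a right-aligned temperature.
lines <- sprintf("%-4d%-4d%-6d%5.1f",
                 c(1L, 9L, 10L, 12L),         # Month
                 c(1L, 15L, 3L, 31L),         # Day
                 rep(1995L, 4),               # Year
                 c(41.1, 70.3, 55.0, 30.2))   # Avg_Temp
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# col_names goes to fwf_widths(), not to read_fwf() itself.
sample_df <- read_fwf(
  path,
  col_positions = fwf_widths(c(4, 4, 6, 5),
                             col_names = c("Month", "Day", "Year", "Avg_Temp")),
  col_types = cols(Month = col_integer(), Day = col_integer(),
                   Year = col_integer(), Avg_Temp = col_double())
)

# Two-digit months should survive because the field width is stated explicitly.
unique(sample_df$Month)
```

Because the widths are given up front, nothing has to be guessed from the first rows of the file, which is why the two-digit months are kept intact.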

Additionally, with read_fwf you can skip a step and parse the date as a Date object while reading the file:

temp_df2 <- read_fwf(url,
                  col_positions = fwf_widths(c(41, 4),
                                             col_names = c("date", "Avg_Temp")), 
                  col_types = cols(col_date("%m %d %Y"), col_number()))
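
To check how the "%m %d %Y" format behaves on its own, a small sketch with readr's parse_date may help. The strings in raw_dates are made up for illustration; the point is that whitespace in the format should match a run of spaces between the fields.

```r
library(readr)

# Hypothetical month/day/year strings with irregular spacing (not from the real file).
raw_dates <- c("1   1   1995", "10  3   1995", "12  31  1995")

# Same format string as passed to col_date() above.
parsed <- parse_date(raw_dates, format = "%m %d %Y")
parsed
```

If the result is a Date vector with no NA values, the same format should be safe to pass to col_date() inside read_fwf.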