BigQuery has NOAA's gsod data loaded as a public dataset - starting in 1929: https://www.reddit.com/r/bigquery/comments/2ts9wo/noaa_gsod_weather_data_loaded_into_bigquery/
How can I retrieve the historical data for any city?
To retrieve the historical weather for any city, first we need to find what station reports in that city. The table
[fh-bigquery:weather_gsod.stations] contains the name of known stations, their state (if in the US), country, and other details.
So to find all the stations in Austin, TX, we would use a query like this:
SELECT state, name, lat, lon FROM [fh-bigquery:weather_gsod.stations] WHERE country='US' AND state='TX' AND name CONTAINS 'AUST' LIMIT 10
This approach has 2 problems that need to be solved:
To solve the second problem, we need to join the stations table with the actual data we are looking for. The following query looks for stations around Austin, and the column
c looks at how many days during 2015 have actual data:
SELECT state, name, FIRST(a.wban) wban, FIRST(a.stn) stn, COUNT(*) c, INTEGER(SUM(IF(prcp=99.99,0,prcp))) rain, FIRST(lat) lat, FIRST(lon) long FROM [fh-bigquery:weather_gsod.gsod2015] a JOIN [fh-bigquery:weather_gsod.stations] b ON a.wban=b.wban AND a.stn=b.usaf WHERE country='US' AND state='TX' AND name CONTAINS 'AUST' GROUP BY 1,2 LIMIT 10
That's good! We found 4 stations with data for Austin during 2015.
Note that we had to treat "rain" in a special way: When a station doesn't monitor for rain, instead of
null, it marks it as 99.99. Our query filters those values out.
Now that we know the stn and wban numbers for these stations, we can pick any of them and visualize the results:
SELECT TIMESTAMP('2015'+mo+da) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp FROM [fh-bigquery:weather_gsod.gsod2015] WHERE stn='722540' AND wban='13904' GROUP BY 1 ORDER BY day