The Answer to Everything - 1 month ago
JSON Question

How to use Python to extract data from the Met Office JSON download

I am using Python 3.4.

I have started a project to download the UK Met Office forecast data (in JSON format) and use the information as a weather compensator for my home heating system. I have succeeded in downloading the JSON data file from the Met Office, and now I want to extract the information I need. I can do this by converting the file to a string and using .find and int to extract the data, but this seems crude, if effective. As JSON is said to be a widely used data interchange format, there must be a better way to do this. I have found things like json.load and json.loads, and also json.JSONDecoder.decode, but I haven't had any success in using these, and I really have little idea of what I am doing!

My code is:

import urllib.request
import json

#Comment: THIS IS THE CALL TO GET THE MET OFFICE FILE FROM THE INTERNET
#Comment: **** = my personal met office API key, which I had better keep to myself

response = urllib.request.urlopen('http://datapoint.metoffice.gov.uk/public/data/val/wxfcs/all/json/354037?res=3hourly&key=****')

FCData = response.read()
FCDataStr = str(FCData)

#Comment: END OF THE CALL TO GET MET OFFICE FILE FROM THE INTERNET
#Comment: Example of data extraction

ChPos = FCDataStr.find('"DV"') #Find "DV"
ChPos = FCDataStr.find('"dataDate"', ChPos, ChPos+50) #Find "dataDate"

FileDataDate = FCDataStr[ChPos+12:ChPos+22] #Extract the date of the file

#Comment: And so on


When using json.loads(FCDataStr) I get the following error message:


"ValueError: Expecting value: line 1 column 1 (char 0)"


By deleting the b' at the start and the ' at the end, this error goes away (see below). Printing the JSON file in string format, using print(FCDataStr), gives:

b'{"SiteRep":{"Wx":{"Param":[{"name":"F","units":"C","$":"Feels Like Temperature"},{"name":"G","units":"mph","$":"Wind Gust"},{"name":"H","units":"%","$":"Screen Relative Humidity"},{"name":"T","units":"C","$":"Temperature"},{"name":"V","units":"","$":"Visibility"},{"name":"D","units":"compass","$":"Wind Direction"},{"name":"S","units":"mph","$":"Wind Speed"},{"name":"U","units":"","$":"Max UV Index"},{"name":"W","units":"","$":"Weather Type"},{"name":"Pp","units":"%","$":"Precipitation Probability"}]},"DV":{"dataDate":"2014-07-29T20:00:00Z","type":"Forecast","Location":{"i":"354037","lat":"51.7049","lon":"-2.9022","name":"USK","country":"WALES","continent":"EUROPE","elevation":"43.0","Period":[{"type":"Day","value":"2014-07-29Z","Rep":[{"D":"NNW","F":"22","G":"11","H":"51","Pp":"4","S":"9","T":"24","V":"VG","W":"7","U":"7","$":"900"},{"D":"NW","F":"19","G":"16","H":"61","Pp":"8","S":"11","T":"22","V":"EX","W":"8","U":"1","$":"1080"},{"D":"NW","F":"16","G":"20","H":"70","Pp":"1","S":"11","T":"18","V":"VG","W":"2","U":"0","$":"1260"}]},{"type":"Day","value":"2014-07-30Z","Rep":[{"D":"NW","F":"13","G":"16","H":"84","Pp":"0","S":"7","T":"14","V":"VG","W":"0","U":"0","$":"0"},{"D":"WNW","F":"12","G":"13","H":"90","Pp":"0","S":"7","T":"13","V":"VG","W":"0","U":"0","$":"180"},{"D":"WNW","F":"13","G":"11","H":"87","Pp":"0","S":"7","T":"14","V":"GO","W":"1","U":"1","$":"360"},{"D":"SW","F":"18","G":"9","H":"67","Pp":"0","S":"4","T":"19","V":"VG","W":"1","U":"2","$":"540"},{"D":"WNW","F":"21","G":"13","H":"56","Pp":"0","S":"9","T":"22","V":"VG","W":"3","U":"6","$":"720"},{"D":"W","F":"21","G":"20","H":"55","Pp":"0","S":"11","T":"23","V":"VG","W":"3","U":"6","$":"900"},{"D":"W","F":"18","G":"22","H":"57","Pp":"0","S":"11","T":"21","V":"VG","W":"1","U":"2","$":"1080"},{"D":"WSW","F":"16","G":"13","H":"80","Pp":"0","S":"7","T":"16","V":"VG","W":"0","U":"0","$":"1260"}]},{"type":"Day","value":"2014-07-31Z","Rep":[{"D":"SW","F":"14","G":"11","H":"91","Pp":"0","S":"4","T":"15","V"
:"GO","W":"0","U":"0","$":"0"},{"D":"SW","F":"14","G":"11","H":"92","Pp":"0","S":"4","T":"14","V":"GO","W":"0","U":"0","$":"180"},{"D":"SW","F":"15","G":"11","H":"89","Pp":"3","S":"7","T":"16","V":"GO","W":"3","U":"1","$":"360"},{"D":"WSW","F":"17","G":"20","H":"79","Pp":"28","S":"11","T":"18","V":"GO","W":"3","U":"2","$":"540"},{"D":"WSW","F":"18","G":"22","H":"72","Pp":"34","S":"11","T":"20","V":"GO","W":"10","U":"5","$":"720"},{"D":"WSW","F":"18","G":"22","H":"66","Pp":"13","S":"11","T":"20","V":"VG","W":"7","U":"5","$":"900"},{"D":"WSW","F":"17","G":"22","H":"69","Pp":"36","S":"11","T":"19","V":"VG","W":"10","U":"2","$":"1080"},{"D":"WSW","F":"16","G":"16","H":"84","Pp":"6","S":"9","T":"17","V":"GO","W":"2","U":"0","$":"1260"}]},{"type":"Day","value":"2014-08-01Z","Rep":[{"D":"SW","F":"16","G":"13","H":"91","Pp":"4","S":"7","T":"16","V":"GO","W":"7","U":"0","$":"0"},{"D":"SW","F":"15","G":"11","H":"93","Pp":"5","S":"7","T":"16","V":"GO","W":"7","U":"0","$":"180"},{"D":"SSW","F":"15","G":"11","H":"93","Pp":"7","S":"7","T":"16","V":"GO","W":"7","U":"1","$":"360"},{"D":"SSW","F":"17","G":"18","H":"79","Pp":"14","S":"9","T":"18","V":"GO","W":"7","U":"2","$":"540"},{"D":"SSW","F":"17","G":"22","H":"74","Pp":"43","S":"11","T":"19","V":"GO","W":"10","U":"5","$":"720"},{"D":"SW","F":"16","G":"22","H":"81","Pp":"48","S":"11","T":"18","V":"GO","W":"10","U":"5","$":"900"},{"D":"SW","F":"16","G":"18","H":"80","Pp":"55","S":"9","T":"17","V":"GO","W":"12","U":"1","$":"1080"},{"D":"SSW","F":"15","G":"16","H":"89","Pp":"38","S":"7","T":"16","V":"GO","W":"9","U":"0","$":"1260"}]},{"type":"Day","value":"2014-08-02Z","Rep":[{"D":"S","F":"14","G":"11","H":"94","Pp":"15","S":"7","T":"15","V":"GO","W":"7","U":"0","$":"0"},{"D":"SSE","F":"14","G":"11","H":"94","Pp":"16","S":"7","T":"15","V":"GO","W":"7","U":"0","$":"180"},{"D":"S","F":"14","G":"13","H":"93","Pp":"36","S":"7","T":"15","V":"GO","W":"10","U":"1","$":"360"},{"D":"S","F":"15","G":"20","H":"84","Pp":"62","S":"11","T":"17","
V":"GO","W":"14","U":"2","$":"540"},{"D":"SSW","F":"16","G":"22","H":"78","Pp":"63","S":"11","T":"18","V":"GO","W":"14","U":"5","$":"720"},{"D":"WSW","F":"16","G":"27","H":"66","Pp":"59","S":"13","T":"19","V":"VG","W":"14","U":"5","$":"900"},{"D":"WSW","F":"15","G":"25","H":"68","Pp":"39","S":"13","T":"18","V":"VG","W":"10","U":"2","$":"1080"},{"D":"SW","F":"14","G":"16","H":"80","Pp":"28","S":"9","T":"15","V":"VG","W":"0","U":"0","$":"1260"}]}]}}}}'


The result of using:

DecodedJSON = json.loads(FCDataStr)
print(DecodedJSON)


gives a very similar result to the original FCDataStr file.

How do I proceed to extract the data (such as temperature, wind speed, etc. for each 3-hourly forecast) from the file?

Answer

This is the problem:

FCDataStr = str(FCData)

When you call str on a bytes object, what you get is the string representation of a bytes object—in quotes, with a b prefix, and with special characters backslash-escaped.
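A minimal sketch of that behaviour, using a short hard-coded bytes literal as a stand-in for what response.read() returns (the literal is an assumption for illustration, not the real download):

```python
import json

# Stand-in for response.read(); the real payload is much longer.
raw = b'{"SiteRep": {"DV": {"dataDate": "2014-07-29T20:00:00Z"}}}'

# str() on bytes gives the repr of the object, quotes and b prefix included.
as_repr = str(raw)
print(as_repr[:2])  # b'

# The leading b' is not valid JSON, so parsing fails at the first character.
try:
    json.loads(as_repr)
    parsed_ok = True
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    parsed_ok = False
    print("json.loads failed:", exc)
```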

If you want to decode the binary data to text, you have to do that with the decode method:

FCDataStr = FCData.decode('utf-8')

(I'm guessing UTF-8 because JSON is always supposed to be in UTF-8 unless otherwise specified.)
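Rather than guessing, you can usually ask the response which charset it declares. The sketch below assumes the server sends a Content-Type header; pick_encoding is a hypothetical helper name, and email.message.Message objects stand in for response.headers (which is a subclass of that class), so the example runs without a network call.

```python
from email.message import Message


def pick_encoding(headers, default="utf-8"):
    """Return the charset declared in Content-Type, or `default` if none."""
    # get_content_charset() returns None when no charset parameter is present.
    return headers.get_content_charset() or default


# Stand-ins for response.headers in two situations.
with_charset = Message()
with_charset["Content-Type"] = "application/json; charset=ISO-8859-1"

without_charset = Message()
without_charset["Content-Type"] = "application/json"

print(pick_encoding(with_charset))
print(pick_encoding(without_charset))
```

With a real response, this would be used as response.read().decode(pick_encoding(response.headers)).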


In more detail:

urllib.request.urlopen returns an http.client.HTTPResponse, which is a binary file-like object (in Python 3.4 it implements io.RawIOBase).

On Python 3.4, you can't pass that to json.load because it wants a text-file-like object: something with a read method that returns str, not bytes. You could wrap your HTTPResponse in an io.BufferedReader, then wrap that in an io.TextIOWrapper (with encoding='utf-8'), then pass that to json.load, but that's probably more work than you want to do.
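For reference, the wrapping approach looks roughly like this. It is only a sketch: an io.BytesIO holding a tiny hard-coded payload stands in for the HTTP response, and io.TextIOWrapper is the concrete class that does the decoding.

```python
import io
import json

# Stand-in for the binary HTTP response. io.BytesIO is already buffered;
# a raw stream would need io.BufferedReader(raw_stream) around it first.
binary_stream = io.BytesIO(b'{"SiteRep": {"DV": {"type": "Forecast"}}}')

# The text layer decodes bytes to str as json.load reads from it.
text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8")

data = json.load(text_stream)
print(data["SiteRep"]["DV"]["type"])
```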

So, the simplest thing to do is exactly what you were trying to do, just using decode instead of str:

data_bytes = response.read()
data_str = data_bytes.decode('utf-8')
data_dict = json.loads(data_str)


Then, don't try to access the data in data_str—that's just a string, representing the JSON encoding of your data; data_dict is the actual data.

For example, to find the dataDate of the DV of the SiteRep, you just do this:

data_dict['SiteRep']['DV']['dataDate']

That will get you a string such as '2014-07-29T20:00:00Z' (the value in the sample output above). You'll still probably want to convert that to a datetime.datetime object (because JSON only understands a few basic types: strings, numbers, lists, and dicts). But it's still a lot better than trying to pick it out of data_str by find-ing or guessing at the offsets.
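To get at the per-forecast values the question asks about, you can walk the nested lists the same way. The sketch below uses a trimmed copy of the sample output from the question rather than a live download. It assumes that the "$" field is the number of minutes after midnight for each forecast, which matches the 0, 180, ..., 1260 pattern in the data but is not stated in the question; the variable names are illustrative.

```python
import json
from datetime import datetime, timedelta

# Trimmed copy of the sample response shown in the question.
sample = (
    '{"SiteRep":{"DV":{"dataDate":"2014-07-29T20:00:00Z","type":"Forecast",'
    '"Location":{"name":"USK","Period":['
    '{"type":"Day","value":"2014-07-29Z","Rep":['
    '{"D":"NNW","S":"9","T":"24","$":"900"},'
    '{"D":"NW","S":"11","T":"22","$":"1080"}]},'
    '{"type":"Day","value":"2014-07-30Z","Rep":['
    '{"D":"NW","S":"7","T":"14","$":"0"}]}]}}}}'
)

data_dict = json.loads(sample)
dv = data_dict["SiteRep"]["DV"]

# Timestamp of the data set itself.
data_date = datetime.strptime(dv["dataDate"], "%Y-%m-%dT%H:%M:%SZ")

forecasts = []
for period in dv["Location"]["Period"]:          # one entry per day
    day = datetime.strptime(period["value"], "%Y-%m-%dZ")
    for rep in period["Rep"]:                    # one entry per 3-hour slot
        # Assumption: "$" holds minutes after midnight for this slot.
        when = day + timedelta(minutes=int(rep["$"]))
        # All values arrive as strings, so convert the numeric ones.
        forecasts.append((when, int(rep["T"]), int(rep["S"]), rep["D"]))

for when, temp_c, wind_mph, direction in forecasts:
    print(when, temp_c, "C", wind_mph, "mph", direction)
```

The units (C for "T", mph for "S") come from the "Param" list at the top of the response, so they could also be looked up from data_dict instead of being hard-coded.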


My guess is that you've found some sample code written for Python 2.x, where you can convert between byte strings and Unicode strings just by calling the appropriate constructors, without specifying an encoding. The encoding then defaults to sys.getdefaultencoding(), which is normally ASCII in Python 2, so pure-ASCII data just happens to work despite the approach being wrong. In which case, you may want to find some better sample code to learn from.