njfrazie - 2 months ago
Python Question

Python 3.5: How to read a db of JSON objects

I'm new to working with JSON and I'm trying to work with the openrecipe database from here. The db dump you get looks like this:

{ "_id" : { "$oid" : "5160756d96cc62079cc2db16" }, "name" : "Hot Roast Beef Sandwiches", "ingredients" : "12 whole Dinner Rolls Or Small Sandwich Buns (I Used Whole Wheat)\n1 pound Thinly Shaved Roast Beef Or Ham (or Both!)\n1 pound Cheese (Provolone, Swiss, Mozzarella, Even Cheez Whiz!)\n1/4 cup Mayonnaise\n3 Tablespoons Grated Onion (or 1 Tbsp Dried Onion Flakes))\n1 Tablespoon Poppy Seeds\n1 Tablespoon Spicy Mustard\n1 Tablespoon Horseradish Mayo Or Straight Prepared Horseradish\n Dash Of Worcestershire\n Optional Dressing Ingredients: Sriracha, Hot Sauce, Dried Onion Flakes Instead Of Fresh, Garlic Powder, Pepper, Etc.)", "url" : "http://thepioneerwoman.com/cooking/2013/03/hot-roast-beef-sandwiches/", "image" : "http://static.thepioneerwoman.com/cooking/files/2013/03/sandwiches.jpg", "ts" : { "$date" : 1365276013902 }, "cookTime" : "PT20M", "source" : "thepioneerwoman", "recipeYield" : "12", "datePublished" : "2013-03-13", "prepTime" : "PT20M", "description" : "When I was growing up, I participated in my Episcopal church's youth group, and I have lots of memories of weekly meetings wh..." }
{ "_id" : { "$oid" : "5160756f96cc6207a37ff777" }, "name" : "Morrocan Carrot and Chickpea Salad", "ingredients" : "Dressing:\n1 tablespoon cumin seeds\n1/3 cup / 80 ml extra virgin olive oil\n2 tablespoons fresh lemon juice\n1 tablespoon honey\n1/2 teaspoon fine sea salt, plus more to taste\n1/8 teaspoon cayenne pepper\n10 ounces carrots, shredded on a box grater or sliced whisper thin on a mandolin\n2 cups cooked chickpeas (or one 15- ounce can, drained and rinsed)\n2/3 cup / 100 g dried pluots, plums, or dates cut into chickpea-sized pieces\n1/3 cup / 30 g fresh mint, torn\nFor serving: lots of toasted almond slices, dried or fresh rose petals - all optional (but great additions!)", "url" : "http://www.101cookbooks.com/archives/moroccan-carrot-and-chickpea-salad-recipe.html", "image" : "http://www.101cookbooks.com/mt-static/images/food/moroccan_carrot_salad_recipe.jpg", "ts" : { "$date" : 1365276015332 }, "datePublished" : "2013-01-07", "source" : "101cookbooks", "prepTime" : "PT15M", "description" : "A beauty of a carrot salad - tricked out with chickpeas, chunks of dried pluots, sliced almonds, and a toasted cumin dressing. Thank you Diane Morgan." }
{ "_id" : { "$oid" : "5160757096cc62079cc2db17" }, "name" : "Mixed Berry Shortcake", "ingredients" : "Biscuits\n3 cups All-purpose Flour\n2 Tablespoons Baking Powder\n3 Tablespoons Sugar\n1/2 teaspoon Salt\n1-1/2 stick (3/4 Cup) Cold Butter, Cut Into Pieces\n1-1/4 cup Buttermilk\n1/2 teaspoon Almond Extract (optional)\n Berries\n2 pints Mixed Berries And/or Sliced Strawberries\n1/3 cup Sugar\n Zest And Juice Of 1 Small Orange\n SWEET YOGURT CREAM\n1 package (7 Ounces) Plain Greek Yogurt\n1 cup Cold Heavy Cream\n1/2 cup Sugar\n2 Tablespoons Brown Sugar", "url" : "http://thepioneerwoman.com/cooking/2013/03/mixed-berry-shortcake/", "image" : "http://static.thepioneerwoman.com/cooking/files/2013/03/shortcake.jpg", "ts" : { "$date" : 1365276016700 }, "cookTime" : "PT15M", "source" : "thepioneerwoman", "recipeYield" : "8", "datePublished" : "2013-03-18", "prepTime" : "PT15M", "description" : "It's Monday! It's a brand new week! The birds are chirping! The coffee's brewing! Everything has such hope and promise! A..." }


I tried the following code to read in the database:

import json

f = r'<file_path>\recipeitems-latest.json'

with open(f) as dfile:
    data = json.load(dfile)

print(data)


With this I received the following traceback:

Traceback (most recent call last):
  File "C:/Users/<redacted>/Documents/<redacted>/project/test_json.py", line 7, in <module>
    data = json.load(dfile)
  File "C:\Users\<redacted>\AppData\Local\Continuum\Anaconda3\Lib\json\__init__.py", line 265, in load
    return loads(fp.read(),
  File "C:\Users\<redacted>\AppData\Local\Continuum\Anaconda3\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 101915: character maps to <undefined>


The only way I could find around this error was to have only one entry in the JSON file. Is the db formatted incorrectly, or am I reading in the data wrong?

Thanks for any help!

Answer

The file is not a JSON array. Each line of the file is a JSON document, but the file as a whole is not valid JSON, so json.load() cannot parse it in one call.

Read the file by lines, and use json.loads:

import json

with open('some_file') as f:
    for line in f:
        doc = json.loads(line)  # one JSON document per line
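To see the difference between the two approaches without touching the real dump, here is a minimal sketch. The two-line `sample` string is made-up data that only mimics the one-object-per-line layout of the file.

```python
import json

# Hypothetical sample in the same layout as the dump: one JSON object per line.
sample = '{"name": "A", "recipeYield": "12"}\n{"name": "B", "recipeYield": "8"}\n'

# Parsing the whole text in one call fails, because a second object
# follows the first one.
try:
    json.loads(sample)
    whole_file_ok = True
except json.JSONDecodeError as exc:
    whole_file_ok = False
    print("whole-file parse failed:", exc.msg)

# Parsing each non-blank line on its own works.
docs = [json.loads(line) for line in sample.splitlines() if line.strip()]
print([d["name"] for d in docs])
```

This would also explain why a file with a single entry loaded fine: one object on its own is a valid JSON document.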

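You may also need to pass the encoding parameter to open(). The traceback shows the file being decoded with cp1252, the Windows default, which has no character for byte 0x9d. If the dump is UTF-8 (likely, but an assumption), open(f, encoding='utf-8') should get past that error. See here.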
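Putting the two points together, a sketch might look like the following. It assumes the file is UTF-8, which the question does not confirm. The helper name `read_json_lines` and the sample data are made up for illustration. The demo writes a small temporary file containing a curly quote (U+201D), whose UTF-8 encoding includes the byte 0x9d from the traceback.

```python
import json
import os
import tempfile


def read_json_lines(path, encoding="utf-8"):
    """Return a list of dicts, one per non-blank line of the file.

    Hypothetical helper; the utf-8 default is an assumption about the dump.
    """
    docs = []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing newline
                docs.append(json.loads(line))
    return docs


# Made-up sample data containing non-ASCII characters.
sample = (
    '{"name": "Caf\u00e9 \u201cspecial\u201d", "source": "demo"}\n'
    '{"name": "Plain", "source": "demo"}\n'
)

# The UTF-8 bytes of this text cannot be decoded as cp1252,
# which reproduces the kind of error shown in the question.
try:
    sample.encode("utf-8").decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError as exc:
    cp1252_ok = False
    print("cp1252 cannot decode it:", exc.reason)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as out:
        out.write(sample)
    recipes = read_json_lines(path)

print(len(recipes), "documents read")
```

With the real dump you would call `read_json_lines(f)` with your own path. If it still raises a UnicodeDecodeError, the file is probably in a different encoding and the `encoding` argument needs to change.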
