simongraham simongraham - 5 months ago 21
JSON Question

Data exploration JSON nested data in pandas

How can I get my JSON data into a reasonable data frame? I have a deeply nested file which I aim to get into a large data frame. All is described in the Github repository below:

http://www.github.com/simongraham/dataExplore.git

Answer

With nested jsons, you will need to walk through the levels, extracting needed segments. For the nutrition segment of the larger json, consider iterating through every nutritionPortions level and each time running the pandas normalization and concatenating to final dataframe:

import pandas as pd
import json

with open('/Users/simongraham/Desktop/Kaido/Data/kaidoData.json') as f:
    data = json.load(f)

# INITIALIZE DF
nutrition = pd.DataFrame()

# ITERATIVELY CONCATENATE
for item in data[0]["nutritionPortions"]:    
    if 'ftEnergyKcal' in item.keys():      # MISSING IN 3 OF 53 LEVELS
        temp = (pd.io
            .json
            .json_normalize(item, 'nutritionNutrients',
                ['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
                 'ftEnergyKcal', 'vcPortionUnit','dtConsumedDate'])
            )
        nutrition = pd.concat([nutrition, temp])

nutrition.head()

Output

   ftValue  nPercentRI    vcNutrient                  vcNutritionPortionId  \
0     0.00         0.0       alcohol  c993ac30-ecb4-4154-a2ea-d51dbb293f66   
1     0.00         0.0          bcfa  c993ac30-ecb4-4154-a2ea-d51dbb293f66   
2     7.80         6.0        biotin  c993ac30-ecb4-4154-a2ea-d51dbb293f66   
3    49.40         2.0       calcium  c993ac30-ecb4-4154-a2ea-d51dbb293f66   
4     1.82         0.0  carbohydrate  c993ac30-ecb4-4154-a2ea-d51dbb293f66   

  vcTrafficLight vcUnit       dtConsumedDate  \
0                     g  2016-04-12T00:00:00   
1                     g  2016-04-12T00:00:00   
2                   µg  2016-04-12T00:00:00   
3                    mg  2016-04-12T00:00:00   
4                     g  2016-04-12T00:00:00   

                          vcNutritionId  ftEnergyKcal  \
0  070b97a4-d562-427d-94a8-1de1481df5d1          18.2   
1  070b97a4-d562-427d-94a8-1de1481df5d1          18.2   
2  070b97a4-d562-427d-94a8-1de1481df5d1          18.2   
3  070b97a4-d562-427d-94a8-1de1481df5d1          18.2   
4  070b97a4-d562-427d-94a8-1de1481df5d1          18.2   

                               vcUserId vcPortionName vcPortionSize  \
0  fe585e3d-2863-46fe-a41f-290bf58ad169         1 mug           260   
1  fe585e3d-2863-46fe-a41f-290bf58ad169         1 mug           260   
2  fe585e3d-2863-46fe-a41f-290bf58ad169         1 mug           260   
3  fe585e3d-2863-46fe-a41f-290bf58ad169         1 mug           260   
4  fe585e3d-2863-46fe-a41f-290bf58ad169         1 mug           260   

  vcPortionId vcPortionUnit  
0           2            ml  
1           2            ml  
2           2            ml  
3           2            ml  
4           2            ml  
Comments