simongraham simongraham - 5 months ago 21
JSON Question

viewing nested JSON data into a pandas dataframe

I am currently working with nutritional data for a project, where the data is in raw JSON format, and I want to use python and pandas to obtain an understandable data frame. I understand that this is an easy task when the JSON is not nested. Here I would use:

nutrition = pd.read_json('data')


However I have nested information and I am finding it very difficult to get it into a reasonable data frame. The JSON format is as follows, where the nutritionNutrients element itself is a nested element. The nest for this element will describe the nutritional contents for a variety of different things such as alcohol and bcfa, as included. I have only included a sample because this is a large data file.

[
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutritionId": "2476378b-79ee-4857-a81d-489661a039a1",
"vcUserId": "cc51145b-5a70-4344-9b55-1a4455f0a9d2",
"vcPortionId": "1",
"vcPortionName": "1 average pepper",
"vcPortionSize": "20",
"ftEnergyKcal": 5.2,
"vcPortionUnit": "g",
"dtConsumedDate": "2016-05-04T00:00:00",
"nutritionNutrients": [
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "alcohol",
"ftValue": 0,
"vcUnit": "g",
"nPercentRI": 0,
"vcTrafficLight": ""
},
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "bcfa",
"ftValue": 0,
"vcUnit": "g",
"nPercentRI": 0,
"vcTrafficLight": ""
},
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "biotin",
"ftValue": 0,
"vcUnit": "µg",
"nPercentRI": 0,
"vcTrafficLight": ""
},
...
]
}
]


Any help would be appreciated.

Thanks.

....
....

Now that I have found out how to solve this problem using json_normalize, I return the same problem, but this time my code is nested twice. Ie:

[
{
...
}
[,
"nutritionPortions": [
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutritionId": "2476378b-79ee-4857-a81d-489661a039a1",
"vcUserId": "cc51145b-5a70-4344-9b55-1a4455f0a9d2",
"vcPortionId": "1",
"vcPortionName": "1 average pepper",
"vcPortionSize": "20",
"ftEnergyKcal": 5.2,
"vcPortionUnit": "g",
"dtConsumedDate": "2016-05-04T00:00:00",
"nutritionNutrients": [
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "alcohol",
"ftValue": 0,
"vcUnit": "g",
"nPercentRI": 0,
"vcTrafficLight": ""
},
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "bcfa",
"ftValue": 0,
"vcUnit": "g",
"nPercentRI": 0,
"vcTrafficLight": ""
},
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "biotin",
"ftValue": 0,
"vcUnit": "µg",
"nPercentRI": 0,
"vcTrafficLight": ""
},
...
}
]
}
]


When I have a JSON consisting of only nutrition data I can use:

nutrition = (pd.io
.json
.json_normalize((data, ['nutritionPortions']), 'nutritionNutrients',
['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
'ftEnergyKcal','vcPortionUnit','dtConsumedDate'])
)


However, my data does not only contain nutrition information. For example it will contain activity information, and therefore the nutrition information is nested with "nutrtitionPortions" at the start. Let's assume that all other columns are not nested and they are represented by "Activity" and "Wellbeing".

If I use the code:

nutrition = (pd.io
.json
.json_normalize(data, ['nutritionPortions'])
)


I will return to the original problem where "nutritionNutrients" is nested, but I am having no success then obtaining the corresponding data frame.

Thanks

Answer

use json_normalize():

(pd.io
   .json
   .json_normalize(l, 'nutritionNutrients',
        ['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
         'ftEnergyKcal','vcPortionUnit','dtConsumedDate'])
)

demo:

In [107]: (pd.io
   .....:    .json
   .....:    .json_normalize(l, 'nutritionNutrients',
   .....:         ['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
   .....:          'ftEnergyKcal','vcPortionUnit','dtConsumedDate'])
   .....: )
Out[107]:
   ftValue  nPercentRI vcNutrient vcNutritionPortionId vcTrafficLight        ...        vcPortionSize  \
0        0           0    alcohol  478d1905-f264-4d...                       ...                   20
1        0           0       bcfa  478d1905-f264-4d...                       ...                   20
2        0           0     biotin  478d1905-f264-4d...                       ...                   20

         vcNutritionId vcPortionId ftEnergyKcal     vcPortionName
0  2476378b-79ee-48...           1          5.2  1 average pepper
1  2476378b-79ee-48...           1          5.2  1 average pepper
2  2476378b-79ee-48...           1          5.2  1 average pepper

[3 rows x 14 columns]

where l is your list (parsed JSON)