user1796346 - 4 months ago

Flatten Pandas DataFrame from nested json list

Perhaps somebody could help me. I tried to flatten the following list into a pandas DataFrame:

[{u'_id': u'2',
u'_index': u'list',
u'_score': 1.4142135,
u'_source': {u'name': u'name3'},
u'_type': u'doc'},
{u'_id': u'5',
u'_index': u'list',
u'_score': 1.4142135,
u'_source': {u'dat': u'2016-12-12', u'name': u'name2'},
u'_type': u'doc'},
{u'_id': u'1',
u'_index': u'list',
u'_score': 1.4142135,
u'_source': {u'name': u'name1'},
u'_type': u'doc'}]


The result should look like:

| _id | _index | _score    | name  | dat        | _type |
|-----|--------|-----------|-------|------------|-------|
| 1   | list   | 1.4142... | name1 | NaN        | doc   |
| 2   | list   | 1.4142... | name3 | NaN        | doc   |
| 5   | list   | 1.4142... | name2 | 2016-12-12 | doc   |


But nothing I tried produced the desired result.
I used something like this:

df = pd.concat(map(pd.DataFrame.from_dict, res['hits']['hits']), axis=1)['_source'].T


But then I lose the keys that sit outside the _source field, such as _type.
I also tried working with:
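Selecting ['_source'] after the concat keeps only the _source columns, which is why the outer keys disappear. A minimal sketch that keeps them, assuming `data` holds the list shown above, merges each hit's top-level keys with its _source dict in plain Python before building the DataFrame:

```python
import pandas as pd

# The hits from the question (the u'' prefixes are not needed in Python 3).
data = [
    {'_id': '2', '_index': 'list', '_score': 1.4142135,
     '_source': {'name': 'name3'}, '_type': 'doc'},
    {'_id': '5', '_index': 'list', '_score': 1.4142135,
     '_source': {'dat': '2016-12-12', 'name': 'name2'}, '_type': 'doc'},
    {'_id': '1', '_index': 'list', '_score': 1.4142135,
     '_source': {'name': 'name1'}, '_type': 'doc'},
]

# Keep every top-level key except '_source', then add the keys found inside '_source'.
rows = [
    {**{k: v for k, v in hit.items() if k != '_source'}, **hit['_source']}
    for hit in data
]

# Keys missing from a hit (e.g. 'dat') become NaN in that row.
df = pd.DataFrame(rows)
print(df)
```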

test = pd.DataFrame(list)
for index, row in test.iterrows():
    test.loc[index,'d'] =


But I have no idea how to take the _source field apart and append its contents to the original DataFrame.
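Rather than filling cells one row at a time with iterrows, the _source column can be expanded in a single step and joined back. This is a sketch, assuming the list is stored in a variable named `data` (the name `result` is illustrative only):

```python
import pandas as pd

data = [
    {'_id': '2', '_index': 'list', '_score': 1.4142135,
     '_source': {'name': 'name3'}, '_type': 'doc'},
    {'_id': '5', '_index': 'list', '_score': 1.4142135,
     '_source': {'dat': '2016-12-12', 'name': 'name2'}, '_type': 'doc'},
    {'_id': '1', '_index': 'list', '_score': 1.4142135,
     '_source': {'name': 'name1'}, '_type': 'doc'},
]

test = pd.DataFrame(data)

# Turn the dicts in the '_source' column into their own columns,
# reusing test's index so the rows line up when joining.
source = pd.DataFrame(test['_source'].tolist(), index=test.index)

# Drop the nested column and append the expanded columns in its place.
result = test.drop(columns='_source').join(source)
print(result)
```

Joining on the shared index avoids the per-cell .loc assignments, which are slow on large frames and can trigger dtype warnings in recent pandas versions.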

Does anybody have an idea how to do that and get the desired outcome?

Answer

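Use json_normalize, which flattens nested dicts into columns whose names are joined with a dot (here _source.dat and _source.name):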

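import pandas as pd

# pandas >= 1.0; older versions used: from pandas.io.json import json_normalize
df = pd.json_normalize(data)
print(df)  # column order may differ between pandas versions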
  _id _index    _score _source.dat _source.name _type
0   2   list  1.414214         NaN        name3   doc
1   5   list  1.414214  2016-12-12        name2   doc
2   1   list  1.414214         NaN        name1   doc
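To get exactly the layout from the question, the _source. prefix can be stripped from the column names and the rows sorted by id. A sketch, assuming pandas >= 1.1 (for the `key` argument of `sort_values`) and that `_id` always holds integer strings:

```python
import pandas as pd

data = [
    {'_id': '2', '_index': 'list', '_score': 1.4142135,
     '_source': {'name': 'name3'}, '_type': 'doc'},
    {'_id': '5', '_index': 'list', '_score': 1.4142135,
     '_source': {'dat': '2016-12-12', 'name': 'name2'}, '_type': 'doc'},
    {'_id': '1', '_index': 'list', '_score': 1.4142135,
     '_source': {'name': 'name1'}, '_type': 'doc'},
]

df = pd.json_normalize(data)

# Strip the '_source.' prefix so the nested keys become plain 'name' and 'dat'.
df.columns = df.columns.str.replace(r'^_source\.', '', regex=True)

# Sort numerically (a plain string sort would put '10' before '2'),
# then order the columns as in the desired table.
df = df.sort_values('_id', key=lambda s: s.astype(int)).reset_index(drop=True)
df = df[['_id', '_index', '_score', 'name', 'dat', '_type']]
print(df)
```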