Pylander - 3 months ago
JSON Question

Python Pandas Create Records from Complex Dictionary

I have processed some very complex nested JSON objects into the following general dictionary format:

{'key1':'value1',
'key2':'value2',
'key3':'value3',
'key4':'value4',
'key5':[['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']],
'key6':[['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']]}


In each list of lists, every inner list represents the equivalent of an "individual transaction". Every transaction shares the key1, key2, key3, and key4 pairs. There can be an arbitrary number of inner lists. I am trying to efficiently turn these into records in a pandas DataFrame like the following:

key1_field, key2_field, key3_field, key4_field, key5_or_key6_field_1, key5_or_key6_field_2, key5_or_key6_field_3, key5_or_key6_indicator
value1, value2, value3, value4, value5, value6, value7, key5
value1, value2, value3, value4, value5, value6, value7, key6
value1, value2, value3, value4, value8, value9, value10, key5
value1, value2, value3, value4, value8, value9, value10, key6


Any assistance would be sincerely appreciated! It has been challenging enough to get the data to this point. Thanks!

EDIT:

As asked, I can post how I have been trying to approach this:

import pandas as pd
import numpy as np

d = {'key1':'value1',
'key2':'value2',
'key3':'value3',
'key4':'value4',
'key5':[['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']],
'key6':[['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']]}

# d.items() works on Python 2 and 3; d.iteritems() exists only on Python 2
df = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})


My remaining issue is that the columns for the single-value keys (key1 through key4) are NaN after the first row.
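
A minimal sketch of the symptom, assuming Python 3 (hence d.items() rather than d.iteritems()): each single string becomes a one-element Series, so once the columns are aligned on the shared index, every row after the first has nothing to hold.

```python
import pandas as pd

d = {'key1': 'value1',
     'key2': 'value2',
     'key3': 'value3',
     'key4': 'value4',
     'key5': [['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']],
     'key6': [['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']]}

# pd.Series('value1') has length 1; pd.Series(list_of_lists) has one
# element per inner list, so the list columns are longer.
df = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})

# The scalar columns are only populated in row 0.
print(df['key1'].isna().tolist())
```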


Answer

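Try forward-filling, which copies each single value down into the rows below it (d.items() is used here so that it also runs on Python 3):

pd.DataFrame({k: pd.Series(v) for k, v in d.items()}).ffill()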
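
Note that forward-filling leaves key5 and key6 as columns that each hold a whole list. If the goal is the flat layout from the question (one row per inner list, plus an indicator column), something along these lines might work. This is only a sketch: the names list_keys, shared, records, field_N and indicator are made up for illustration, and the rows come out grouped by key rather than interleaved as in the question.

```python
import pandas as pd

d = {'key1': 'value1',
     'key2': 'value2',
     'key3': 'value3',
     'key4': 'value4',
     'key5': [['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']],
     'key6': [['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']]}

# Assumption: any value that is a list holds transactions; everything
# else is a field shared by all transactions.
list_keys = [k for k, v in d.items() if isinstance(v, list)]
shared = {k: v for k, v in d.items() if not isinstance(v, list)}

records = []
for key in list_keys:
    for transaction in d[key]:
        row = dict(shared)                      # copy the shared fields
        for i, value in enumerate(transaction, start=1):
            row['field_%d' % i] = value         # hypothetical column names
        row['indicator'] = key                  # which key the row came from
        records.append(row)

records_df = pd.DataFrame(records)
print(records_df)
```

Building a plain list of dicts first and calling pd.DataFrame once at the end avoids growing the DataFrame row by row, which tends to be slow.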