mitsi mitsi - 2 months ago 5
JSON Question

Convert a dict to a pandas DataFrame

My data look like this :

{u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}


I want to convert it to a pandas Dataframe. But when I try

df = pd.DataFrame(response.items())


I get a DataFrame with two columns, the first with the first key, and the second with the values of the key:

0 1
0 "57e01311817bc367c030b390" {"ad_since": 2016, "indoor_swimming_pool": "No...
1 "57e01311817bc367c030b3a8" {"ad_since": 2012, "indoor_swimming_pool": "No...


How can I get a single column for each key :
"ad_since"
,
"indoor_swimming_pool"
,
"indoor_swimming_pool"
? And keep the first column, or get the id as index.

Answer

You need convert column of type str to dict by .apply(literal_eval) or .apply(json.loads) and then use DataFrame.from_records:

import pandas as pd
from ast import literal_eval

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
           u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}

df = pd.DataFrame.from_dict(response, orient='index')

print (type(df.iloc[0,0]))
<class 'str'>

df.iloc[:,0] = df.iloc[:,0].apply(literal_eval)

print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))
                            ad_since handicapped_access indoor_swimming_pool  \
"57e01311817bc367c030b3a8"      2012                Yes                   No   
"57e01311817bc367c030b390"      2016                Yes                   No   

                           seaside  
"57e01311817bc367c030b3a8"      No  
"57e01311817bc367c030b390"      No  

import pandas as pd
import json

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
           u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}


df = pd.DataFrame.from_dict(response, orient='index')
df.iloc[:,0] = df.iloc[:,0].apply(json.loads)


print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))
                            ad_since handicapped_access indoor_swimming_pool  \
"57e01311817bc367c030b3a8"      2012                Yes                   No   
"57e01311817bc367c030b390"      2016                Yes                   No   

                           seaside  
"57e01311817bc367c030b3a8"      No  
"57e01311817bc367c030b390"      No