makambi - 6 months ago
Python Question

Python: Pandas dataframe from Series of dict

I have a pandas DataFrame:

type(original)
pandas.core.frame.DataFrame


The original['user'] column of original is a Series object:

type(original['user'])
pandas.core.series.Series


Now, original['user'] is actually a Series of dict objects:

type(original['user'].ix[0])
dict


Each of these dicts has many keys (the same set of keys in every dict):

original['user'].ix[0].keys()

[u'follow_request_sent',
u'profile_use_background_image',
u'profile_text_color',
u'id',
u'verified',
u'profile_location',
u'profile_image_url_https',
u'profile_sidebar_fill_color',
u'is_translator',
u'geo_enabled',
u'entities',
u'followers_count',
u'protected',
u'location',
u'default_profile_image',
u'id_str',


The other keys are removed for clarity. Basically, each dict is the user field of a tweet returned by the Twitter API.

Now I want to make a data frame from it.

When I try to make a data frame from it directly, I get only one column, and each row of that column contains the whole dictionary.

pd.DataFrame(original['user'][:2])
user
0 {u'follow_request_sent': False, u'profile_use_...
1 {u'follow_request_sent': False, u'profile_use_..


When I try to create the data frame with from_dict, the result is the same:

pd.DataFrame.from_dict(original['user'][:2])

user
0 {u'follow_request_sent': False, u'profile_use_...
1 {u'follow_request_sent': False, u'profile_use_..


The next thing I tried was a list comprehension:

item = [[k, v] for (k,v) in users]
ValueError: too many values to unpack


When I create a data frame from a single row:

df = pd.DataFrame.from_dict(original['user'].ix[0])
df.reset_index()

index contributors_enabled created_at default_profile default_profile_image description entities favourites_count follow_request_sent followers_count following friends_count geo_enabled id id_str is_translation_enabled is_translator lang listed_count location name notifications profile_background_color profile_background_image_url profile_background_image_url_https profile_background_tile profile_image_url profile_image_url_https profile_link_color profile_location profile_sidebar_border_color profile_sidebar_fill_color profile_text_color profile_use_background_image protected screen_name statuses_count time_zone url utc_offset verified
0 description False Mon May 26 11:58:40 +0000 2014 True False {u'urls': []} 0 False 157


It works almost like I want it to, except that it sets the description field as the default index.

Each dictionary has 40 keys, and I don't need them all; 10 at most.
The data frame has 28734 rows.

How can I do it?

Answer

What I would try is the following:

new_df = pd.DataFrame(list(original['user']))

This converts the Series to a list of dicts and passes that list to the DataFrame constructor, which should take care of the rest: each dict becomes one row and each key becomes one column.
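Since only about 10 of the 40 keys are needed, the same idea can be combined with a column selection. The following is a minimal sketch, not a tested solution for the real data: the users Series is a hypothetical stand-in for original['user'], and the names in wanted are assumed examples of the fields to keep.

```python
import pandas as pd

# Hypothetical stand-in for original['user']: a Series of dicts
# that all share the same keys.
users = pd.Series([
    {"id": 1, "screen_name": "alice", "followers_count": 157, "verified": False},
    {"id": 2, "screen_name": "bob", "followers_count": 42, "verified": True},
])

# Assumed subset of keys to keep; replace with the fields actually needed.
wanted = ["id", "screen_name", "followers_count"]

# A list of dicts becomes one row per dict and one column per key.
# Passing index=users.index keeps the row labels of the original Series,
# so the result stays aligned with the original frame.
new_df = pd.DataFrame(users.tolist(), index=users.index)[wanted]

print(new_df)
```

If the new columns need to sit next to the rest of the original frame, something like original.join(new_df.add_prefix('user_')) is one option; the prefix is only there to avoid name clashes with existing columns.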