user3177938 - 1 month ago
Python Question

python, dictionary in a data frame, sorting

I have a pandas data frame called wiki, with the Wikipedia information for some people.
Each row is a different person, and the columns are: 'name', 'text' and 'word_count'. The words in 'text' have been counted and stored in dictionary form (keys, values) to create the column 'word_count'.

If I want to extract the row related to Barack Obama, then:

row = wiki[wiki['name'] == 'Barack Obama']


Now, I would like the most popular word. When I do:

adf=row[['word_count']]


I get another data frame because I see that:

type(adf)=<class 'pandas.core.frame.DataFrame'>


and if I do

adf.values


I get:

array([[{u'operations': 1, u'represent': 1, u'office': 2, ..., u'began': 1}]], dtype=object)


However, what is very confusing to me is that the size is 1

adf.size=1


Therefore, I do not know how to actually extract the keys and values. Things like
adf.values[1]
do not work

Ultimately, what I need to do is sort the information in word_count so that the most frequent words appear first.
But I would like to understand how to access the information that is inside a dictionary inside a data frame. I am lost about the types here. I am not new to programming, but I am relatively new to Python.

Any help would be very very much appreciated

Answer

If the name column is unique, you can make it the index of the DataFrame object: wiki.set_index("name", inplace=True). You can then get the value with: wiki.at['Barack Obama', 'word_count'].
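As a rough sketch of that approach, the snippet below builds a small stand-in for wiki (the second person and all counts are made up for illustration), looks up the cell with .at, and sorts the resulting dictionary by count. It uses the non-inplace form of set_index and a new name, indexed, purely so the original frame stays untouched; both choices are assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical stand-in for the wiki frame; the counts are invented.
wiki = pd.DataFrame({
    "name": ["Barack Obama", "Ada Lovelace"],
    "word_count": [
        {"the": 40, "office": 2, "began": 1},
        {"the": 12, "engine": 3},
    ],
})

# Assumes names are unique, so they can serve as the index.
indexed = wiki.set_index("name")

# .at returns the single cell, which here is a plain Python dict.
counts = indexed.at["Barack Obama", "word_count"]

# Sort (word, count) pairs by count, most frequent first.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Once the cell is a plain dict, ordinary Python tools such as sorted apply; nothing pandas-specific is needed for the sorting step.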

With your code:

row = wiki[wiki['name'] == 'Barack Obama']
adf = row[['word_count']]

The first line uses a boolean array to select the rows; here is the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing

wiki is a DataFrame object, and row is also a DataFrame object. It has only one row if the name column is unique.
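To make the boolean selection concrete, here is a minimal sketch on a made-up two-row frame (the data is hypothetical). It shows that the comparison produces one True/False per row and that indexing with it gives back a DataFrame, not a single value.

```python
import pandas as pd

# Hypothetical two-row stand-in for the wiki frame.
wiki = pd.DataFrame({
    "name": ["Barack Obama", "Ada Lovelace"],
    "word_count": [{"the": 40}, {"the": 12}],
})

# The comparison yields a boolean Series with one entry per row.
mask = wiki["name"] == "Barack Obama"
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True.
row = wiki[mask]
print(type(row).__name__, row.shape)
```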

The second line selects a list of columns from row; here is the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics

You get a DataFrame with only one row and one column.
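That is why the size is 1: the frame holds a single cell, and that cell happens to contain the whole dictionary. The sketch below, on invented data, pulls the dictionary out by position with .iloc[0, 0] and ranks the words with collections.Counter. Using Counter here is just one convenient option, not something the original code requires.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for the wiki frame; the counts are invented.
wiki = pd.DataFrame({
    "name": ["Barack Obama", "Ada Lovelace"],
    "word_count": [
        {"office": 2, "the": 40, "began": 1},
        {"the": 12, "engine": 3},
    ],
})

row = wiki[wiki["name"] == "Barack Obama"]
adf = row[["word_count"]]

# One row and one column, so exactly one cell.
print(adf.shape, adf.size)

# adf.values has shape (1, 1), so index 1 does not exist along the rows.
try:
    adf.values[1]
except IndexError as exc:
    print("IndexError:", exc)

# Position (0, 0) is the only valid cell; it holds the dict itself.
counts = adf.iloc[0, 0]

# most_common returns (word, count) pairs, highest count first.
top = Counter(counts).most_common(2)
print(top)
```

The same dictionary is also reachable as adf.values[0, 0], since NumPy indexing starts at 0.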

And here is the documentation for .at[]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#fast-scalar-value-getting-and-setting
