Jane Sully Jane Sully -4 years ago 92
Python Question

How to keep a dictionary of dictionaries (or something with similar functionality) in pandas?

So I have a large dataframe with many columns. Let us say the two main columns I am interested are messages and names. Each message will be something like a personal status and will be accompanied by the person's name. Let's say I have a word bank of emotion/feeling words which is fairly huge but a condensed version could look like ['sad', 'happy', 'relieved', 'annoyed', 'angry', 'ecstatic', 'proud', 'disappointed']. For each name (and names could be repeated since they could have multiple messages), I want to keep track of the emotion words they have used and their accompanying counts. For example (I have made this up since I have a really large dataset):

Message Name
I am really happy with my progress. Alice
I was annoyed by his inconsideration. John
I felt proud after seeing her performance. Lisa
I am ecstatic after hearing the good news. Alice
I felt disappointed by her dishonesty Lisa


If I wanted something that would essentially tell me the following (I don't necessarily care about the format as long as it's clean): {Alice: happy:1, ecstatic: 1; John: annoyed:1; Lisa: proud:1, disappointed: 1}.

What is the best/easiest way of doing this? Is a dictionary of dictionaries the best way to do this? Secondly (this is sort of an additional related question), after computing above, what if I wanted to find the most frequently used emotion word for each name. In this example everything is pretty much tied at 1, but imagine the counts were not the same.

Answer Source

If you are looking for a very easy way of accomplishing what you are looking for, I would suggest using the pandas groupby functionality in combination with the collections utility.

Initial Situation

df = pd.read_csv('messages.csv')
df['Emotion Words'] = df.Message.apply(extract_emotion_words)
print(df)

Output:

                                      Message   Name   Emotion Words
0         I am really happy with my progress.  Alice         [happy]
1      I am really happy with johns progress.  Alice         [happy]
2       I was annoyed by his inconsideration.   John       [annoyed]
3  I felt proud after seeing her performance.   Lisa         [proud]
4  I am ecstatic after hearing the good news.  Alice      [ecstatic]
5       I felt disappointed by her dishonesty   Lisa  [disappointed]

Group By Name and Apply Counter

from collections import Counter
df.groupby('Name')['Emotion Words'].sum().apply(Counter)

Output:

Name
Alice        {'happy': 2, 'ecstatic': 1}
John                      {'annoyed': 1}
Lisa     {'disappointed': 1, 'proud': 1}
dtype: object

Get Most Frequent Word

Since you also indicated that you want to get the most common emotion word, you can use the Counter.most_common(n) function where n is the number of most common words to extract (in your case 1):

df.groupby('Name')['Emotion Words'].sum().apply(
   lambda words: Counter(words).most_common(1)
)

Output:

Name
Alice           [(happy, 2)]
John          [(annoyed, 1)]
Lisa     [(disappointed, 1)]
dtype: object

For this little example, I used the following function to extract the emotion words. There sure might be better ways of doing it (perhaps you are already using the nltk, if not I would suggest you look into it).

import re

emotionWords = set(['sad', 'happy', 'relieved', 'annoyed', 'angry', 'ecstatic', 'proud', 'disappointed'])

def extract_emotion_words(message):
    tokenized = re.split(r'\W+', message.lower())
    return list(set(tokenized) & emotionWords)
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download