I'm attempting to perform sentiment analysis on a pile of text data I have scraped from the internet. I have reached a point where my Pandas DataFrame has the following columns I wish to analyse: "post_date" (in the format dd-mm-yyyy ie. 01-10-2017) & "Sentiment" (in the format "positive","neutral" or "negative").
I wish to be able to count the number of posts by each day/month/year and also the number of positive/neutral/negative posts for each day.
for example like those produced by:
print pd.value_counts(df.Sentiment)
df.groupby(df.post_date.dt.year)
Date, Postive_Posts, Negative_Posts, Neutral_Posts, Total_Posts
01/10/2017, 10, 5, 8, 23
02/10/2017, 5, 20, 5, 30
post_date, Sentiment
19/09/2017, positive
19/09/2017, positive
19/09/2017, positive
20/09/2017, negative
20/09/2017, neutral
You can use groupby
+ size
+ unstack
+ add_suffix
+ sum
:
df1 = df.groupby(['post_date','Sentiment']).size().unstack(fill_value=0).add_suffix('_Posts')
df1['Total_Posts'] = df1.sum(axis=1)
print (df1)
Sentiment negative_Posts neutral_Posts positive_Posts Total_Posts
post_date
19/09/2017 0 0 3 3
20/09/2017 1 1 0 2
One line solution is very similar - only need assign
:
df1 = (df.groupby(['post_date','Sentiment'])
.size()
.unstack(fill_value=0)
.add_suffix('_Posts')
.assign(Total_Posts=lambda x: x.sum(axis=1)))
print (df1)
Sentiment negative_Posts neutral_Posts positive_Posts Total_Posts
post_date
19/09/2017 0 0 3 3
20/09/2017 1 1 0 2
And for column from index
:
df1 = (df.groupby(['post_date','Sentiment'])
.size()
.unstack(fill_value=0)
.add_suffix('_Posts')
.assign(Total_Posts=lambda x: x.sum(axis=1))
.reset_index()
.rename_axis(None, axis=1))
print (df1)
post_date negative_Posts neutral_Posts positive_Posts Total_Posts
0 19/09/2017 0 0 3 3
1 20/09/2017 1 1 0 2