user2950747 - 1 month ago
Python Question

How can I ignore empty series when using value_counts on a Pandas groupby?

I've got a DataFrame with the metadata for a newspaper article in each row. I'd like to group these into monthly chunks, then count the values of one column (called type):

monthly_articles = articles.groupby(pd.Grouper(freq="M"))
monthly_articles = monthly_articles["type"].value_counts().unstack()


This works fine with an annual group but fails when I try to group by month:

ValueError: operands could not be broadcast together with shape (141,) (139,)


I think this is because there are some month groups in which there are no articles. If I iterate the groups and print value_counts on each group:

for name, group in monthly_articles:
    print(name, group["type"].value_counts())


I get empty series in the groups for Jan and Feb of 2006:

2005-12-31 00:00:00 positive 1
Name: type, dtype: int64
2006-01-31 00:00:00 Series([], Name: type, dtype: int64)
2006-02-28 00:00:00 Series([], Name: type, dtype: int64)
2006-03-31 00:00:00 negative 6
positive 5
neutral 1
Name: type, dtype: int64
2006-04-30 00:00:00 negative 11
positive 6
neutral 3
Name: type, dtype: int64


How can I ignore the empty groups when using value_counts()?

I've tried dropna=False without success. I think this is the same issue as this question.
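
To make the situation concrete, here is a minimal sketch with made-up data (the dates and labels are assumptions, not the real articles). It uses size() to show that a time-based pd.Grouper creates a bin for every month in the range, including months that contain no rows. The "MS" (month start) alias is used here only because it is accepted across pandas versions; the bins cover the same calendar months as freq="M".

```python
import pandas as pd

# Hypothetical stand-in for the articles frame: nothing in Jan or Feb 2006
idx = pd.to_datetime(["2005-12-15", "2006-03-02", "2006-03-20", "2006-04-11"])
articles = pd.DataFrame(
    {"type": ["positive", "negative", "positive", "neutral"]}, index=idx
)

# One bin per calendar month between the first and last date
sizes = articles.groupby(pd.Grouper(freq="MS")).size()

# Months with no articles still appear, with a size of 0
empty_months = sizes[sizes == 0]
print(sizes)
print(empty_months.index.tolist())
```

The two zero-sized bins correspond to the empty series printed for Jan and Feb 2006 above.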

Answer Source

It would help to include a data sample; without one it is hard to pinpoint the problem. From your code snippet, it looks as though the type data is missing for some months. You can use apply on the grouped object to call value_counts on each group, and then call unstack on the result. Here is sample code that works for me, using randomly generated data:

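import numpy as np
import pandas as pd

# Randomly pick one of three labels for each of 150 consecutive days
s = pd.Series(['positive', 'negtive', 'neutral'], index=[0, 1, 2])
atype = s.loc[np.random.randint(3, size=(150,))]

df = pd.DataFrame(dict(atype=atype.values), index=pd.date_range('2017-01-01', periods=150))

# Count the labels within each month, then pivot the labels into columns
gp = df.groupby(pd.Grouper(freq='M'))
dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()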

In [75]: dfx
Out[75]: 
            negtive  neutral  positive
2017-01-31       13        9         9
2017-02-28       11       11         6
2017-03-31       12        6        13
2017-04-30        8       12        10
2017-05-31        9       10        11

If some months contain only null values, those months simply drop out of the result:

In [76]: df.loc['2017-02-01':'2017-04-01', 'atype'] = np.nan
    ...: gp = df.groupby(pd.Grouper(freq='M'))
    ...: dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()
    ...: 

In [77]: dfx
Out[77]: 
            negtive  neutral  positive
2017-01-31       13        9         9
2017-04-30        8       12         9
2017-05-31        9       10        11
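
Another way to avoid the empty month groups, sketched here under assumptions rather than taken from the question's data: group on the calendar month of each row via the index's to_period("M") instead of a time-based Grouper. Only months that actually occur become groups, so value_counts never sees an empty series. The articles frame below is made up for illustration, and the final reindex step is optional, for when the empty months should come back as rows of zeros.

```python
import pandas as pd

# Hypothetical sample data: no articles in Jan or Feb 2006
dates = pd.to_datetime(
    ["2005-12-15", "2006-03-02", "2006-03-20", "2006-03-25", "2006-04-11"]
)
articles = pd.DataFrame(
    {"type": ["positive", "negative", "positive", "negative", "neutral"]},
    index=dates,
)

# Group by the month each row falls in; months without rows form no group
months = articles.index.to_period("M")
counts = articles.groupby(months)["type"].value_counts().unstack(fill_value=0)

# Optional: put the empty months back as all-zero rows
full_range = pd.period_range(months.min(), months.max(), freq="M")
counts_full = counts.reindex(full_range, fill_value=0)

print(counts)
print(counts_full)
```

Here counts has one row per month that contains articles, while counts_full covers every month from the first to the last.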

Thanks.