MMcLaughlin MMcLaughlin - 3 years ago 150
Python Question

dataframe value_counts() Shape Error

I am trying to go through a time-series dataset and count how many times each unique clothing category appears on each day. Every year in my dataset works fine except 2012. When I run my code for 2012, I get a shape broadcasting error, and I cannot figure out why that year fails when none of the others do.

counts = test.groupby(pd.Grouper(freq='D')).value_counts()

Here is the error the code produces

ValueError Traceback (most recent call last)
<ipython-input-127-bc2dbf569e47> in <module>()
1 test=orders['Category']['2012']
----> 2 counts = test.groupby(pd.Grouper(freq='D')).value_counts()

c:\users\matthew mclaughlin\miniconda3\envs\cseclass\lib\site-packages\pandas\core\ in value_counts(self, normalize, sort, ascending, bins, dropna)
3016 # multi-index components
-> 3017 labels = list(map(rep, self.grouper.recons_labels)) + [lab[inc]]
3018 levels = [ping.group_index for ping in self.grouper.groupings] + [lev]
3019 names = self.grouper.names + []

c:\users\matthew mclaughlin\miniconda3\envs\cseclass\lib\site-packages\numpy\core\ in repeat(a, repeats, axis)
394 except AttributeError:
395 return _wrapit(a, 'repeat', repeats, axis)
--> 396 return repeat(repeats, axis)

ValueError: operands could not be broadcast together with shape (366,) (363,)

A sample of my data looks like this:

Order Date
2013-01-01 Outerwear
2013-01-01 Accessories
2013-01-01 First Layer Tops
2013-01-01 First Layer Tops
2013-01-01 Accessories
2013-01-01 First Layer Bottoms
2013-01-01 Kid's Sets
2013-01-01 Outerwear

2013-01-01 Outerwear

And this is what the code is supposed to produce after it runs:

Order Date  Category
2013-01-01  Outerwear              289
            First Layer Tops       230
            Accessories            190
            First Layer Bottoms    155
            Footwear                10
            Kid's Sets               3

Ultimately, I unstack this result and insert it into new columns for each category.
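A minimal sketch of that unstacking step, assuming the counts are a Series with a (date, category) MultiIndex. The names `counts` and `wide` and the second day's value are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical per-day counts with a (date, category) MultiIndex,
# shaped like the desired output above.
idx = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2013-01-01"), "Outerwear"),
        (pd.Timestamp("2013-01-01"), "Footwear"),
        (pd.Timestamp("2013-01-02"), "Outerwear"),
    ],
    names=["Order Date", "Category"],
)
counts = pd.Series([289, 10, 5], index=idx)

# Pivot the category level into columns; a day that lacks a category gets 0.
wide = counts.unstack(level="Category", fill_value=0)
print(wide)
```

Passing `fill_value=0` avoids NaN entries for categories that had no orders on a given day.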

Answer Source

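A DataFrame groupby object has no .value_counts() method in older pandas versions; only the Series groupby has one, and that is the call failing in your traceback. The shape mismatch (366 vs. 363) suggests that three of the 366 days in 2012 have no orders, which that method does not seem to handle. To get the counts per day, use apply + stack, i.e.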

df.groupby(pd.Grouper(freq='D')).apply(lambda x : x.Category.value_counts()).stack()

Output for your sample data, extended with an additional date:

Order Date  Category           
2013-01-01  Outerwear              3
            First Layer Tops       2
            Accessories            2
            Kid's Sets             1
            First Layer Bottoms    1
2013-01-02  Outerwear              3
            First Layer Tops       2
            Accessories            2
            Kid's Sets             1
            First Layer Bottoms    1
dtype: int64
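Another way to get per-day counts is to group on both the daily bin and the category and take the group sizes. This is a different technique from the apply call above, sketched here under assumptions: the frame `df` and its values are made up, and it assumes a DatetimeIndex plus a plain (non-categorical) "Category" column.

```python
import pandas as pd

# Hypothetical sample data: a DatetimeIndex named "Order Date" and a
# "Category" column. Note that 2013-01-02 has no orders at all.
df = pd.DataFrame(
    {"Category": ["Outerwear", "Accessories", "Outerwear", "Footwear", "Outerwear"]},
    index=pd.DatetimeIndex(
        ["2013-01-01", "2013-01-01", "2013-01-01", "2013-01-03", "2013-01-03"],
        name="Order Date",
    ),
)

# Group by calendar day and by category, then count the rows in each group.
counts = df.groupby([pd.Grouper(freq="D"), "Category"]).size()
print(counts)
```

The result is a Series indexed by (Order Date, Category), so it can be unstacked into one column per category.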

If you are trying to select categories by year, try boolean indexing, e.g. df[df.index.year == 2012]
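A small sketch of selecting one year by its index, assuming the frame has a DatetimeIndex. The `orders` data here is invented for illustration and is not the asker's dataset.

```python
import pandas as pd

# Hypothetical orders spanning two years, indexed by order date.
orders = pd.DataFrame(
    {"Category": ["Outerwear", "Footwear", "Accessories"]},
    index=pd.DatetimeIndex(
        ["2011-12-31", "2012-02-29", "2012-07-04"], name="Order Date"
    ),
)

# Build a boolean mask from the index's year and keep only the 2012 rows.
orders_2012 = orders[orders.index.year == 2012]
test = orders_2012["Category"]
print(test)
```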
