JohnE - 1 year ago
Python Question

Distribution-type graphs (histogram/kde) with weighted data

In a nutshell, what is my best option for distribution-type graphs (histogram or kde) when my data is weighted?

import pandas as pd
df = pd.DataFrame({ 'x':[1,2,3,4], 'wt':[7,5,3,1] })


The basic pandas histogram works fine with this, but seaborn won't accept a weights kwarg, i.e.

sns.distplot( df.x, bins=4,              # doesn't work like this
              weights=df.wt.values )     # or with kde=False added

It would also be nice if kde would accept weights but neither pandas nor seaborn seems to allow it.

I realize, btw, that the data could be expanded to fake the weighting. That's easy here, but not of much use with my real data, where the weights are in the hundreds or thousands, so I'm not looking for a workaround like that.
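For reference, the expansion mentioned above can be sketched as follows. This is only an illustration on the toy data, using NumPy directly (an assumption, since the question itself only uses pandas and seaborn). It checks that a weighted histogram gives the same bin counts as a histogram of the expanded data; `np.repeat` only works here because the weights are non-negative integers.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
wt = np.array([7, 5, 3, 1])

# Weighted histogram: each x value contributes its weight instead of 1.
weighted_counts, edges = np.histogram(x, bins=4, weights=wt)

# "Expanded" data: repeat each x value wt times (integer weights only).
expanded = np.repeat(x, wt)
expanded_counts, _ = np.histogram(expanded, bins=edges)

# Both approaches should give the same counts per bin.
print(weighted_counts)
print(expanded_counts)
```

The expanded array has one element per unit of weight, which is why this stops being practical once the weights get large.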

Anyway, that's all. I'm just trying to find out what (if anything) I can do with weighted data besides the basic pandas histogram. I haven't fooled around with bokeh yet, but bokeh suggestions are also welcome.

Answer Source

You have to understand that seaborn uses the same matplotlib plotting functions that pandas uses.

As the documentation states, sns.distplot does not accept a weights argument. However, it does take a hist_kws argument, which is passed on to the underlying call to plt.hist. Thus, this should do what you want:

sns.distplot(df.x, bins=4, hist_kws={'weights':df.wt.values}) 
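Note that in the call above the weights only reach the histogram; the KDE curve is still computed from the unweighted x. If a weighted KDE is needed, one option is to compute it by hand and plot the result with matplotlib. The following is a minimal sketch, not seaborn's or pandas' own method: the function name `weighted_kde` and the bandwidth `bw=0.5` are assumptions chosen for illustration, and it uses NumPy only.

```python
import numpy as np

def weighted_kde(x, weights, grid, bw):
    """Gaussian KDE on `grid`; each x[i] contributes in proportion to weights[i]."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalise weights to sum to 1
    z = (grid[:, None] - x[None, :]) / bw    # shape (len(grid), len(x))
    kernels = np.exp(-0.5 * z**2) / (bw * np.sqrt(2 * np.pi))
    return kernels @ w                       # weighted sum of Gaussian kernels

x = [1, 2, 3, 4]
wt = [7, 5, 3, 1]
grid = np.linspace(-2, 7, 901)
density = weighted_kde(x, wt, grid, bw=0.5)  # bw is a hand-picked assumption

# Sanity check: a density should integrate to roughly 1 (simple Riemann sum).
area = density.sum() * (grid[1] - grid[0])
print(area)
```

The result can be drawn with plt.plot(grid, density) on the same axes as a weighted, density-normalised histogram. If SciPy 1.2 or later is available, scipy.stats.gaussian_kde should also accept a weights argument, which avoids picking the bandwidth by hand.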