Python Question

Creating histograms in pandas with columns with equidistant base, not proportional to the range

I am creating a histogram in pandas simply using:

train_data.hist("MY_VARIABLE", bins=[0,5, 10,50,100,500,1000,5000,10000,50000,100000])


(train_data is a pandas df).

The problem is that, since the range [50000,100000] is so large, I can barely see the small ranges [0,5] or [5,10] etc. I would like the histogram to have equidistant bars on the x-axis, not proportional to the range. Is this possible?

Answer

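Bin the values with pd.cut and plot the per-bin counts as a bar chart. A bar chart spaces its bars evenly, one per bin, whatever the numeric width of each bin (here df is the DataFrame and a is the column to plot):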

bins = [0, 5, 10,50,100,500,1000,5000,10000,50000,100000]
df.groupby(pd.cut(df.a, bins=bins, labels=bins[1:])).size().plot.bar(rot=0)

Demo:

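import numpy as np
import pandas as pd
# 10,000 rows of random integers in [0, 100000), in columns 'a' and 'b'
df = pd.DataFrame(np.random.randint(0,10**5,(10**4,2)),columns=list('ab'))
bins = [0, 5, 10,50,100,500,1000,5000,10000,50000,100000]
df.groupby(pd.cut(df.a, bins=bins, labels=bins[1:])).size().plot.bar(rot=0)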
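
Why the bars come out evenly spaced: `pd.cut` replaces each number with the label of the bin it falls in, and a bar plot draws one bar per label, regardless of how wide the bin is numerically. Below is a minimal sketch of just the counting step, under a few assumptions that are not in the answer: a seeded random series stands in for the real column, the default interval labels are kept so each bar shows its full range, and `include_lowest=True` is added because `pd.cut` bins are right-closed by default, so an exact 0 would otherwise fall outside the first bin.

```python
import numpy as np
import pandas as pd

bins = [0, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000]

# Hypothetical stand-in for the real column: 10,000 integers in [0, 100000)
rng = np.random.default_rng(0)
values = pd.Series(rng.integers(0, 10**5, size=10**4), name="a")

# Map every value to the interval (bin) that contains it.
# include_lowest=True closes the first bin on the left so zeros are counted.
binned = pd.cut(values, bins=bins, include_lowest=True)

# Count rows per bin, then order the result by bin rather than by frequency
counts = binned.value_counts().sort_index()
print(counts)
```

Chaining `.plot.bar(rot=0)` onto `counts` should then draw one bar per interval, as in the demo above.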


Filtering the result (keeping only the bins whose count exceeds a threshold):

threshold = 100
(df.groupby(pd.cut(df.a,
                   bins=bins, 
                   labels=bins[1:]))
   .size()
   .to_frame('count')
   .query('count > @threshold')
)

Out[84]:
        count
a
5000      396
10000     492
50000    4044
100000   4961
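
The same filter can also be expressed with a boolean mask on the counts, which some readers find easier to follow than `.query` with `@threshold`. This is only a sketch on seeded random data (an assumption, so the numbers will differ from the output above), and it uses `value_counts().sort_index()` in place of `groupby(...).size()`.

```python
import numpy as np
import pandas as pd

bins = [0, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000]
threshold = 100

# Hypothetical demo data, seeded so the run is repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10**5, size=(10**4, 2)), columns=list("ab"))

# Per-bin counts, labelled by each bin's upper edge and ordered by bin
counts = pd.cut(df["a"], bins=bins, labels=bins[1:]).value_counts().sort_index()

# Keep only the bins that hold more than `threshold` rows
filtered = counts[counts > threshold]
print(filtered)
```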

Plotting the filtered result:

(df.groupby(pd.cut(df.a,
                   bins=bins, 
                   labels=bins[1:]))
   .size()
   .to_frame('count')
   .query('count > @threshold')
   .plot.bar(rot=0)
)
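
For comparison, here is a rough sketch of the same idea without `groupby`, using NumPy for the counting and Matplotlib for the drawing. It is a different technique from the one in the answer: the bars are placed at positions 0, 1, 2, ... and labelled with their ranges. The helper names `bin_counts` and `draw_equal_width` and the sample data are made up for illustration. Note that `np.histogram` bins are closed on the left (only the last bin includes its right edge), unlike the right-closed bins of `pd.cut`.

```python
import numpy as np

def bin_counts(values, bins):
    """Return per-bin counts and a 'lo-hi' text label for every bin."""
    counts, _ = np.histogram(values, bins=bins)
    labels = [f"{lo}-{hi}" for lo, hi in zip(bins[:-1], bins[1:])]
    return counts, labels

def draw_equal_width(counts, labels):
    """Draw one bar per bin at evenly spaced x positions."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    positions = np.arange(len(counts))
    _, ax = plt.subplots()
    ax.bar(positions, counts, width=0.9)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=45)
    return ax

bins = [0, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000]

# Hypothetical data: 10,000 integers in [0, 100000)
rng = np.random.default_rng(0)
values = rng.integers(0, 10**5, size=10**4)

counts, labels = bin_counts(values, bins)
```

Calling `draw_equal_width(counts, labels)` and then `plt.show()` (or saving the figure) would render the chart.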
