vasek1 vasek1 - 4 months ago 69
Python Question

How to give a pandas/matplotlib bar graph custom colors

I just started using pandas/matplotlib as a replacement for Excel to generate stacked bar charts. I am running into an issue

(1) there are only 5 colors in the default colormap, so if I have more than 5 categories then the colors repeat. How can I specify more colors? Ideally, a gradient with a start color and an end color, and a way to dynamically generate n colors in between?

(2) the colors are not very visually pleasing. How do I specify a custom set of n colors? Or, a gradient would also work.

An example which illustrates both of the above points is below:

4 from matplotlib import pyplot
5 from pandas import *
6 import random
7
8 x = [{i:random.randint(1,5)} for i in range(10)]
9 df = DataFrame(x)
10
11 df.plot(kind='bar', stacked=True)


And the output is this:

enter image description here

Sorry, I literally just started using matplotlib today, and couldn't figure out from the docs how to specify custom colors

Answer

You can specify the color option as a list directly to the plot function.

from matplotlib import pyplot as plt
from itertools import cycle, islice
import pandas, numpy as np  # I find np.random.randint to be better

# Make the data
x = [{i:np.random.randint(1,5)} for i in range(10)]
df = pandas.DataFrame(x)

# Make a list by cycling through the colors you care about
# to match the length of your data.
my_colors = list(islice(cycle(['b', 'r', 'g', 'y', 'k']), None, len(df)))

# Specify this list of colors as the `color` option to `plot`.
df.plot(kind='bar', stacked=True, color=my_colors)

To define your own custom list, you can do a few of the following, or just look up the Matplotlib techniques for defining a color item by its RGB values, etc. You can get as complicated as you want with this.

my_colors = ['g', 'b']*5 # <-- this concatenates the list to itself 5 times.
my_colors = [(0.5,0.4,0.5), (0.75, 0.75, 0.25)]*5 # <-- make two custom RGBs and repeat/alternate them over all the bar elements.
my_colors = [(x/10.0, x/20.0, 0.75) for x in range(len(df))] # <-- Quick gradient example along the Red/Green dimensions.

The last example yields the follow simple gradient of colors for me:

enter image description here

I didn't play with it long enough to figure out how to force the legend to pick up the defined colors, but I'm sure you can do it.

In general, though, a big piece of advice is to just use the functions from Matplotlib directly. Calling them from Pandas is OK, but I find you get better options and performance calling them straight from Matplotlib.