ScientistGirl - 5 months ago
Python Question

Plotting Multiple Histograms in Matplotlib - Colors or side-by-side bars

Problem: When plotting multiple histograms in Matplotlib, I cannot tell one plot from another.

Problem as image: [image: Problem]
Minor problem: The left label 'Count' is partially cut off at the edge of the image. Why?


Description

I want to plot the histograms of 3 different sets. Each set is an array of 0s and 1s. I want the histogram of each so I can detect imbalances in the dataset.

I have them plotted separately, but I want a single graphic with all of them together.

It would be okay to have a different graphic with bars side by side. I even googled plotting it in 3D, but I don't know how easy it would be to read such a graphic and understand it.

Right now, I want to plot the [train], [validation] and [test] bars side by side on the same graphic, something like this:

I want it like this

PS: My googling didn't return any code that was understandable to me.
Also, I would appreciate it if someone could check whether I'm doing anything insane in my code.

Thanks a lot, guys!

Code :

# Python 2 code (xrange, cStringIO)
import base64
import cStringIO

import matplotlib.pyplot as plt
import numpy as np


def generate_histogram_from_array_of_labels(Y=[], labels=[], xLabel="Class/Label", yLabel="Count", title="Histogram of Trainset"):
    plt.figure()
    plt.clf()

    colors = ["b", "r", "m", "w", "k", "g", "c", "y"]

    information = []
    for index in xrange(0, len(Y)):
        y = Y[index]

        # fall back to the first color when there are more datasets than colors
        if index >= len(colors):
            color = colors[0]
        else:
            color = colors[index]

        if labels is None:
            label = "?"
        else:
            if index < len(labels):
                label = labels[index]
            else:
                label = "?"

        # (value, count) pairs for this dataset
        unique, counts = np.unique(y, return_counts=True)
        unique_count = np.empty(shape=(unique.shape[0], 2), dtype=np.uint32)

        for x in xrange(0, unique.shape[0]):
            unique_count[x, 0] = unique[x]
            unique_count[x, 1] = counts[x]

        information.append(unique_count)

        # the histogram of the data (one plt.hist call per dataset, drawn on top of each other)
        n, bins, patches = plt.hist(y, unique.shape[0], normed=False, facecolor=color, alpha=0.75, range=[np.min(unique), np.max(unique) + 1], label=label)

        xticks_pos = [0.5 * patch.get_width() + patch.get_xy()[0] for patch in patches]

        plt.xticks(xticks_pos, unique)

    plt.xlabel(xLabel)
    plt.ylabel(yLabel)
    plt.title(title)
    plt.grid(True)
    plt.legend()
    # plt.show()

    # save the figure to an in-memory PNG and return it base64-encoded
    string_of_graphic_image = cStringIO.StringIO()

    plt.savefig(string_of_graphic_image, format='png')
    string_of_graphic_image.seek(0)

    return base64.b64encode(string_of_graphic_image.read()), information
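
On the minor problem of the clipped 'Count' label: a common cause is that the axis label falls outside the default figure area when the tick labels are wide, and saving with bbox_inches="tight" usually brings it back. The following is only a minimal Python 3 sketch of the save-to-base64 step under that assumption. The helper name figure_to_base64_png, the Agg backend and the bar values are illustrative choices, not part of the original code.

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering, no GUI window needed
import matplotlib.pyplot as plt


def figure_to_base64_png(fig):
    """Render a figure to PNG in memory and return it base64-encoded (hypothetical helper)."""
    buffer = io.BytesIO()
    # bbox_inches="tight" expands the saved area so axis labels are included
    fig.savefig(buffer, format="png", bbox_inches="tight")
    plt.close(fig)
    return base64.b64encode(buffer.getvalue())


# Made-up counts, large enough to produce wide tick labels on the y axis
fig, ax = plt.subplots()
ax.bar([0, 1], [700000, 300000])
ax.set_xlabel("Class/Label")
ax.set_ylabel("Count")
encoded_png = figure_to_base64_png(fig)
```

Calling fig.tight_layout() before saving is another option that tends to have a similar effect.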


Edit

Following hashcode's answer, I now have this new code:

def generate_histogram_from_array_of_labels(Y=[], labels=[], xLabel="Class/Label", yLabel="Count", title="Histogram of Trainset"):
    plt.figure()
    plt.clf()

    colors = ["b", "r", "m", "w", "k", "g", "c", "y"]
    to_use_colors = []
    information = []

    for index in xrange(0, len(Y)):
        y = Y[index]

        # fall back to the first color when there are more datasets than colors
        if index >= len(colors):
            to_use_colors.append(colors[0])
        else:
            to_use_colors.append(colors[index])

        # (value, count) pairs for this dataset
        unique, counts = np.unique(y, return_counts=True)
        unique_count = np.empty(shape=(unique.shape[0], 2), dtype=np.uint32)

        for x in xrange(0, unique.shape[0]):
            unique_count[x, 0] = unique[x]
            unique_count[x, 1] = counts[x]

        information.append(unique_count)

    unique, counts = np.unique(Y[0], return_counts=True)
    histrange = [np.min(unique), np.max(unique) + 1]
    # the histogram of the data (all datasets in one call, 1000 bins over histrange)
    n, bins, patches = plt.hist(Y, 1000, normed=False, alpha=0.75, range=histrange, label=labels)

    #xticks_pos = [0.5 * patch.get_width() + patch.get_xy()[0] for patch in patches]

    #plt.xticks(xticks_pos, unique)

    plt.xlabel(xLabel)
    plt.ylabel(yLabel)
    plt.title(title)
    plt.grid(True)
    plt.legend()


It produces this:

Result
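
A possible explanation for a result like this, offered as a guess since it depends on the data: with 1000 bins spread over the range [0, 2], data that only contains 0s and 1s fills just two extremely narrow bins, so the bars become hard to see. plt.hist bins data with np.histogram, so the effect can be checked without plotting. The array labels_array below is made-up sample data.

```python
import numpy as np

rng = np.random.default_rng(0)
labels_array = rng.integers(0, 2, size=1000)  # hypothetical 0/1 labels

# Same binning as in the edited code: 1000 bins over [0, 2]
counts_1000, edges_1000 = np.histogram(labels_array, bins=1000, range=(0, 2))

# One bin per class over the same range
counts_2, edges_2 = np.histogram(labels_array, bins=2, range=(0, 2))

print(np.count_nonzero(counts_1000))   # how many of the 1000 bins hold any data
print(edges_1000[1] - edges_1000[0])   # width of a single bin
print(counts_2)                        # counts for class 0 and class 1
```

Using one bin per class keeps each bar wide enough to read.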

Answer

I tried this and came up with the following; you can change the xticks positions. All you have to do is pass a tuple of arrays to plt.hist, and it draws the bars side by side. Suppose you have two lists of 0s and 1s; then what you need is:

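import matplotlib.pyplot as plt
import numpy as np

a = np.random.randint(2, size=1000)
b = np.random.randint(2, size=1000)
# passing a tuple of arrays draws one group of side-by-side bars per bin
plt.hist((a, b), 2, label=("data1", "data2"))
plt.legend()
# the 2 bins span [0, 0.5) and [0.5, 1], so their centers are 0.25 and 0.75
plt.xticks((0.25, 0.75), (0, 1))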
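
The same idea should extend to the three sets from the question. This is only a sketch under assumptions: the arrays train, validation and test are random stand-ins for the real label arrays, the colors are arbitrary, and the bin edges [-0.5, 0.5, 1.5] are chosen so that each group of bars is centered on its class value.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the real label arrays
train = rng.integers(0, 2, size=1000)
validation = rng.integers(0, 2, size=300)
test = rng.integers(0, 2, size=200)

# One bin per class, centered on 0 and on 1
bin_edges = [-0.5, 0.5, 1.5]

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(
    [train, validation, test],
    bins=bin_edges,
    color=["b", "r", "g"],
    label=["train", "validation", "test"],
)
ax.set_xticks([0, 1])
ax.set_xlabel("Class/Label")
ax.set_ylabel("Count")
ax.legend()
fig.tight_layout()  # helps keep the axis labels inside the figure
```

Here counts holds one row per dataset and one column per class, which gives the same numbers as the information list in the question.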
