Justin - 3 months ago
Python Question

Using Matplotlib to plot over a subset of data

I am using matplotlib to plot bar charts of data in my DataFrame. I use this construction to first plot over the whole dataset:

import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt

Temp_Counts = Counter(weatherDFConcat['TEMPBIN_CONS'])
df = pd.DataFrame.from_dict(Temp_Counts, orient = 'index').sort_index()
df.plot(kind = 'bar', title = '1969-2015 National Temp Bins', legend = False, color = ['r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b','r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g' ] )


Now I would like to plot the same column, but over a particular subset of the data: for each region in 'REGION_NAME' I would like to generate the bar plot. Here is an example of my DataFrame.

[Image: sample rows of the DataFrame]

My attempted solution is to write:

if weatherDFConcat['REGION_NAME'].any() == 'South':
    Temp_Counts = Counter(weatherDFConcat['TEMPBIN_CONS'])
    df = pd.DataFrame.from_dict(Temp_Counts, orient = 'index').sort_index()
    df.plot(kind = 'bar', title = '1969-2015 National Temp Bins', legend = False, color = ['r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b','r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g' ] )
    plt.show()


When I run this code, it oddly only works for the 'South' region. For 'South' the plot is generated, but for any other region I try, the code runs without an error message and the plot never shows up. Running my code for any region other than 'South' produces this result in the console.

[Image: console output for a region other than 'South']

The South region is the first part in my DataFrame, which is 40 million lines long, with other regions being further down. Could the size of the DataFrame I'm trying to plot have anything to do with this?

Answer

If I'm understanding your question correctly, you are trying to do two things prior to plotting:

  1. Filter based on REGION_NAME.

  2. Within that filtered dataframe, count how many times each value in the TEMPBIN_CONS column appears.
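On the first step, note that the `if ... .any() == 'South'` test in the question does not filter anything. `.any()` collapses the whole column into a single value, so the `if` either runs or skips a block that still counts every row of `weatherDFConcat`. In older pandas versions, `.any()` on a column of strings could likely return the first truthy element rather than a boolean, which would be 'South' here since those rows come first; that would explain why only 'South' appeared to work. Here is a minimal sketch of the difference, using a few made-up rows in place of the real data (the rows are an assumption for illustration):

```python
import pandas as pd

# Hypothetical rows standing in for weatherDFConcat.
df = pd.DataFrame({
    'REGION_NAME': ['South', 'South', 'Northeast', 'Northeast', 'Northeast'],
    'TEMPBIN_CONS': ['-3 to 0', '-3 to 0', '0 to 3', '-3 to 0', '0 to 3'],
})

# .any() reduces the whole column to one value; it selects no rows.
reduced = df['REGION_NAME'].any()

# An element-wise comparison gives one True/False per row: a boolean mask.
mask = df['REGION_NAME'] == 'Northeast'

# Indexing with the mask keeps only the matching rows.
subset = df[mask]
print(len(df), len(subset))
```

The mask is what actually selects rows; the single value from `.any()` can only switch the whole block on or off.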

You can do both of those things right within pandas:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'STATE_NAME': ['Alabama', 'Florida', 'Maine', 'Delaware', 'New Jersey'],
                        'GEOID': [1, 2, 3, 4, 5],
                 'TEMPBIN_CONS': ['-3 to 0', '-3 to 0', '0 to 3', '-3 to 0', '0 to 3'],
                  'REGION_NAME': ['South', 'South', 'Northeast', 'Northeast', 'Northeast']},
                         columns=['STATE_NAME', 'GEOID', 'TEMPBIN_CONS', 'REGION_NAME'])

df_northeast = df[df['REGION_NAME'] == 'Northeast']
northeast_count = df_northeast.groupby('TEMPBIN_CONS').size()

print(df)
print(df_northeast)
print(northeast_count)

northeast_count.plot(kind='bar')
plt.show()

output:

   STATE_NAME  GEOID TEMPBIN_CONS REGION_NAME
0     Alabama      1      -3 to 0       South
1     Florida      2      -3 to 0       South
2       Maine      3       0 to 3   Northeast
3    Delaware      4      -3 to 0   Northeast
4  New Jersey      5       0 to 3   Northeast

   STATE_NAME  GEOID TEMPBIN_CONS REGION_NAME
2       Maine      3       0 to 3   Northeast
3    Delaware      4      -3 to 0   Northeast
4  New Jersey      5       0 to 3   Northeast

TEMPBIN_CONS
-3 to 0    1
0 to 3     2
dtype: int64

[Image: bar chart of the Northeast counts]
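Since the question asks for one plot per region, the same idea can be extended with a loop over `groupby`. This is only a sketch under a few assumptions: the helper name `plot_bins_by_region` is made up for illustration, the sample rows are the same toy data as above, and `value_counts()` is swapped in for `groupby(...).size()` to do the counting.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_bins_by_region(frame, region_col='REGION_NAME', bin_col='TEMPBIN_CONS'):
    """Draw one bar chart of bin counts per region and return the counts.

    Hypothetical helper, not part of pandas or matplotlib.
    """
    counts_by_region = {}
    # groupby yields (region, sub-frame) pairs, so no manual filtering is needed.
    for region, sub in frame.groupby(region_col):
        # Count how often each bin appears within this region only.
        counts = sub[bin_col].value_counts().sort_index()
        fig, ax = plt.subplots()
        counts.plot(kind='bar', ax=ax, title='{} Temp Bins'.format(region))
        counts_by_region[region] = counts
    return counts_by_region

# Same toy rows as in the example above.
df = pd.DataFrame({'STATE_NAME': ['Alabama', 'Florida', 'Maine', 'Delaware', 'New Jersey'],
                   'TEMPBIN_CONS': ['-3 to 0', '-3 to 0', '0 to 3', '-3 to 0', '0 to 3'],
                   'REGION_NAME': ['South', 'South', 'Northeast', 'Northeast', 'Northeast']})

counts_by_region = plot_bins_by_region(df)
```

Call `plt.show()` afterwards to display the figures, one per region. Note that `sort_index()` orders the bin labels as strings, just as in the question's code, so the bars may not be in numeric order on the real data.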