equanimity equanimity - 6 months ago 50
Python Question

Subset Pandas Data Frame Based on Some Row Value

I have a Pandas data frame with columns that are 'dynamic' (meaning that I don't know what the column names will be until I retrieve the data from the various databases).

The data frame is a single row and looks something like this:

Make Date Red Blue Green Black Yellow Pink Silver
89 BMW 2016-10-28 300.0 240.0 2.0 500.0 1.0 1.0 750.0

Note that '89' is that particular row in the data frame.

I have the following code:

cars_bar_plot = df_cars.loc[(df_cars.Make == 'BMW') & (df_cars.Date == as_of_date)]

cars_bar_plot = cars_bar_plot.replace(0, value=np.nan)

cars_bar_plot = cars_bar_plot.dropna(axis=1, how='all')

This works fine in helping me to create the above-mentioned single-row data frame, BUT some of the values in each column are very small (e.g. 1.0 and 2.0) relative to the other values and they are distorting a horizontal bar chart that I'm creating with Matplotlib. I'd like to get rid of numbers that are smaller than some minimum threshold value (e.g. 3.0).

Any idea how I can do that?



The following line of code helps, but does not fully solve the problem.

cars_bar_plot = cars_bar_plot.loc[:, (cars_bar_plot >= 3.0).any(axis=0)]

The problem is that it's eliminating unintended columns. For example, referencing the original data frame, is it possible to modify this code such that it only removes columns with a value less than 3.0 to the right of the "Black" column (under the assumption that we actually want to retain the value of 2.0 in the "Green" column)?



Here is the answer to my question:

lower_threshold = 3.0
start_column = 5
df = df.loc[start_column:, (df >= lower_threshold).any(axis=0)]