JimLohse - 10 months ago
Python Question

Find a sum of matching values of [a Python list] in a Pandas Dataframe cell?

Bottom line question:

This is ugly, right? For a pandas dataframe:

happyBarFrame['Sunday'].apply(lambda x : x == [0]).value_counts()[0]

Long version: I am trying to count the "empty" values in a Pandas DataFrame. A cell is empty when its entry is [0] -- a list.

I guess the best solution is not to use a list [0] as a value in a dataframe cell. Assuming I stick with lists anyway, I'd like to understand why so many functions don't work and instead give an "unhashable type: 'list'" error.

Yes, this is a homework assignment for a data science class. We scrape a webpage that lists bars and their happy hours. I built this dataframe using lists of hours, where the last entry in each list is when the happy hour ends. Ignoring a couple of bugs in the data:
[Image: DataFrame of happy hours by day of week]

I am coming to believe that storing lists inside a dataframe is not the best idea, because many functions I would expect to work tell me they can't hash a list:

In [173]: happyBarFrame['Sunday'].value_counts()

... skipping long error ...

TypeError: unhashable type: 'list'
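
For context on the error above: a Python list is mutable and therefore has no hash, and value_counts() needs to hash the values it counts. The following is a minimal sketch, assuming a small made-up Series named sunday (not the real scraped data), that reproduces the failure and shows one possible workaround: converting each list to a tuple before counting.

```python
import pandas as pd

# Hypothetical stand-in for one day's column; the real frame is not shown.
sunday = pd.Series([[0], [16, 17, 18], [0], [15, 16]])

# Lists are mutable, so Python refuses to hash them.
try:
    hash([0])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# value_counts() on a column of lists is expected to fail for the same reason.
try:
    sunday.value_counts()
    value_counts_failed = False
except TypeError:
    value_counts_failed = True

# Tuples are immutable and hashable, so converting first lets value_counts() run.
counts = sunday.map(tuple).value_counts().to_dict()
print(list_is_hashable, value_counts_failed, counts)
```

With tuples as cell values, the count for the "empty" entry is the one stored under the key (0,).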

Question is: I have tried the following, among many other things. What's the best approach to count the cells by day that have a list with a single 0 (cell value = [0])?

-- Doesn't work
-- Doesn't work

Both give
TypeError: unhashable type: 'list'

happyBarFrame['Sunday'].apply(lambda x : x == [0]).value_counts()[0]

Does work but just feels wrong! I can iterate through the days of the week, getting the count by day and then build the histogram I need to build. But there must be a better way, this seems really inefficient, true?
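
One way to avoid indexing into value_counts() at all: the comparison already yields a boolean Series, and summing booleans counts the True entries. The sketch below assumes a small hypothetical frame with the same column-per-day shape as happyBarFrame (its rows and hours are invented for illustration), and collects one count per day:

```python
import pandas as pd

# Hypothetical data shaped like happyBarFrame: one column per day, one list per bar.
happyBarFrame = pd.DataFrame({
    "Sunday": [[0], [16, 17, 18], [0]],
    "Monday": [[15, 16], [0], [17, 18, 19]],
})

# Boolean Series: True where the cell is exactly the list [0].
is_empty_sunday = happyBarFrame["Sunday"].apply(lambda x: x == [0])

# True counts as 1 and False as 0, so sum() is the number of matching cells.
sunday_empty = int(is_empty_sunday.sum())

# The same count for every day, collected into a plain dict.
empty_by_day = {
    day: int(happyBarFrame[day].apply(lambda x: x == [0]).sum())
    for day in happyBarFrame.columns
}
print(sunday_empty, empty_by_day)
```

Unlike a lookup into value_counts(), sum() still returns 0 for a day on which no cell matches, because it does not depend on a True entry being present in the result.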

EDIT: Fixed a bug: I need to count True, not False, in value_counts()[0].

Answer

If you simply want to apply a function to every cell in a dataframe, you can use applymap; then to get the column sums, just call sum:

import pandas as pd
df = pd.DataFrame({"Sunday": [[0],[0,1]], "Monday": [[0],[0]]})
# True where a cell equals [0]; sum() then counts the True values per column
df.applymap(lambda x: x == [0]).sum()

Monday    2
Sunday    1
dtype: int64
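
A note for newer pandas: DataFrame.applymap is deprecated as of pandas 2.1 in favor of DataFrame.map. A sketch that sidesteps the rename, assuming the same example frame as above, applies Series.map column by column. The helper name is_empty is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Sunday": [[0], [0, 1]], "Monday": [[0], [0]]})

def is_empty(cell):
    """Return True when the cell is exactly the list [0]."""
    return cell == [0]

# apply() visits each column (a Series); Series.map applies is_empty per cell.
mask = df.apply(lambda col: col.map(is_empty))

# Column-wise sum of the boolean mask gives the count of [0] cells per day.
empty_counts = mask.sum()
print(empty_counts.to_dict())
```

The resulting Series has one count per day, so it can be passed straight to a bar plot for the histogram.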