Chris T. - 11 months ago
Python Question

Filter tokens in a list based on frequency

I am working on a list object containing several tokens with different frequencies:

from collections import Counter

s = {'book',
'car',
'bird',
'cup',
'book',
'cup',
'river'}

print(Counter(s))

[('book': 2), ('cup': 2), ('river': 1), ('car': 1), ('bird': 1)]


I want to set a filter so that only tokens that appear at least twice are selected. My current attempt uses the following code:

select = [word for word in s if list(s).count(word) >= 2]
select


I thought it was very straightforward, but I didn't get any output from 'select'. What went wrong with my code, and how can I fix it?

Answer Source

The s in your example code is a set, not a list as you wrote in your question. A set keeps only one copy of each element, so every count is 1 and your filter selects nothing. If s is a list, you can use the most_common method of the Counter object to get the top X elements in your list:

In [67]: s = ['book',
    ...:  'car',
    ...:  'bird',
    ...:  'cup',
    ...:  'book',
    ...:  'cup',
    ...:  'river']

In [68]: s
Out[68]: ['book', 'car', 'bird', 'cup', 'book', 'cup', 'river']

In [69]: c = Counter(s)

In [70]: c.most_common(2)
Out[70]: [('book', 2), ('cup', 2)]
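As a quick check of why the question's code selected nothing, the sketch below runs the same comprehension on a set and on a list. The variable names are made up for illustration and are not part of the original code.

```python
# A set literal discards duplicates, so each token survives only once.
tokens_as_set = {'book', 'car', 'bird', 'cup', 'book', 'cup', 'river'}
tokens_as_list = ['book', 'car', 'bird', 'cup', 'book', 'cup', 'river']

print(len(tokens_as_set))   # 5 distinct tokens
print(len(tokens_as_list))  # 7 tokens, duplicates kept

# With the set, every count is 1, so the filter keeps nothing.
select_from_set = [word for word in tokens_as_set
                   if list(tokens_as_set).count(word) >= 2]
print(select_from_set)  # []

# With the list, the duplicates are still there to be counted.
# Note that each repeated token appears once per occurrence.
select_from_list = [word for word in tokens_as_list
                    if tokens_as_list.count(word) >= 2]
print(select_from_list)  # ['book', 'cup', 'book', 'cup']
```

The list version works but reports each token once per occurrence and rescans the list for every word, which is one reason to prefer Counter here.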

Note that most_common(n) returns the n most frequent elements regardless of their counts. If you instead want the elements that appear at least Y times, you can use:

In [71]: [x[0] for x in c.items() if x[1] >= 2]
Out[71]: ['book', 'cup']

Here x[0] is the item (from the list) and x[1] is its frequency.
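The same filter can be wrapped in a small reusable function. This is only a sketch: the name tokens_with_min_count and its min_count parameter are hypothetical, and the first-seen ordering of the result assumes Python 3.7 or later, where Counter keeps insertion order.

```python
from collections import Counter


def tokens_with_min_count(tokens, min_count=2):
    """Return each distinct token that occurs at least min_count times.

    Hypothetical helper for illustration; tokens must be a list (or other
    sequence that keeps duplicates), not a set.
    """
    counts = Counter(tokens)
    # Unpack each (token, frequency) pair instead of indexing x[0] / x[1].
    return [token for token, freq in counts.items() if freq >= min_count]


tokens = ['book', 'car', 'bird', 'cup', 'book', 'cup', 'river']
print(tokens_with_min_count(tokens))               # ['book', 'cup']
print(tokens_with_min_count(tokens, min_count=3))  # []
```

Counting once with Counter and then filtering avoids calling list.count for every word, which matters as the token list grows.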
