I am having trouble with the following exercise:
Implement a function count_words() in Python that takes as input a string s and a number n, and returns the n most frequently occurring words in s. The return value should be a list of tuples - the top n words paired with their respective counts [(<word>, <count>), (<word>, <count>), ...], sorted in descending count order.
You can assume that all input will be in lowercase and that there will be no punctuation or other characters (only letters and single separating spaces). In case of a tie (equal count), order the tied words alphabetically.
print(count_words("betty bought a bit of butter but the butter was bitter", 3))
[('butter', 2), ('a', 1), ('betty', 1)]
This is my solution:
from operator import itemgetter
from collections import Counter


def count_words(s, n):
    """Return the n most frequently occurring words in s."""
    # TODO: Count the number of occurrences of each word in s
    words = s.split(" ")
    words = Counter(words)
    # TODO: Sort the occurrences in descending order (alphabetically in case of ties)
    # TODO: Return the top n words as a list of tuples (<word>, <count>)
    top_n = words.most_common(n)
    return top_n


if __name__ == '__main__':
    # Test count_words() with some inputs.
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))
You can sort them by the number of occurrences (in reverse order) and then by lexicographical order:
>>> lst = [('meat', 2), ('butter', 2), ('a', 1), ('betty', 1)]
>>>
>>> sorted(lst, key=lambda x: (-x[1], x[0]))
#                              ^ reverse order
[('butter', 2), ('meat', 2), ('a', 1), ('betty', 1)]
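The number of occurrences takes precedence over the lexicographical order, because tuples are compared element by element and the negated count comes first in the key.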
In your case, use words.items() in place of the list I have used. You will no longer need to use most_common, as sorted already does the same; slice the result with [:n] to keep only the top n entries.
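Putting the pieces together, here is a minimal sketch of how count_words() might look with this approach. The local names counts, ranked, and item are my own choices, and s.split() without an argument is an assumption on my part (it also tolerates repeated spaces, which the exercise says will not occur):

```python
from collections import Counter


def count_words(s, n):
    """Return the n most frequent words in s as (word, count) tuples."""
    # Count how many times each word occurs.
    counts = Counter(s.split())
    # Sort by descending count first, then alphabetically to break ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Keep only the top n entries.
    return ranked[:n]


if __name__ == '__main__':
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))
```

For the second input this should give [('butter', 2), ('a', 1), ('betty', 1)], matching the expected output in the question. Sorting the whole dictionary is O(m log m) in the number of distinct words, which is fine for inputs of this size.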