Omkar - 3 months ago
Python Question

(key, value) pair using Python Lambdas

I am working on a simple word count problem and trying to figure out whether it can be done using map, filter and reduce exclusively.

Following is an example of a wordRDD (the list used for Spark):

myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']


All I need is to count the words and present the result in tuple format:

counts = [('cats', 1), ('elephants', 1), ('rats', 1), ('rats', 1), ('cats', 1), ('cats', 1)]


I tried a simple map() with a lambda:

counts = myLst.map(lambda x: (x, <HERE IS THE PROBLEM>))


I might have the syntax wrong, or I may just be confused.
P.S.: This isn't a duplicate question, as the other answers suggest using if/else or list comprehensions.

Thanks for the help.

Answer

You don't need map(..) at all. You can do it with just reduce(..):

>>> from collections import defaultdict
>>> from functools import reduce
>>> def function(obj, x):
...     obj[x] += 1  # missing keys start at 0 in a defaultdict(int)
...     return obj
...
>>> reduce(function, myLst, defaultdict(int)).items()
dict_items([('cats', 3), ('elephants', 1), ('rats', 2)])

You can then iterate over the result.
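If the (word, 1) pairs from the question are still wanted as an intermediate step, the whole flow might look roughly like the sketch below. It assumes myLst is a plain Python list rather than a Spark RDD, and the names pairs, add_pair and totals are only illustrative:

```python
from collections import defaultdict
from functools import reduce

myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

# map step: pair every word with 1, as in the question.
pairs = list(map(lambda word: (word, 1), myLst))

def add_pair(totals, pair):
    # Fold one (word, n) pair into the running totals.
    word, n = pair
    totals[word] += n
    return totals

# reduce step: start from an empty defaultdict so missing words count from 0.
totals = reduce(add_pair, pairs, defaultdict(int))

# Iterate over the result as (word, count) tuples.
counts = list(totals.items())
for word, n in counts:
    print(word, n)
```

Here counts ends up as a list of (word, total) tuples, which is close to the tuple format asked for in the question.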


However, there's a better way of doing it: look into collections.Counter.
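A minimal sketch of the Counter approach, again assuming myLst is a plain Python list (the name word_counts is just for illustration):

```python
from collections import Counter

myLst = ['cats', 'elephants', 'rats', 'rats', 'cats', 'cats']

# Counter tallies each distinct element of the iterable in one pass.
word_counts = Counter(myLst)

# most_common() returns (word, count) tuples, highest count first.
print(word_counts.most_common())
# [('cats', 3), ('rats', 2), ('elephants', 1)]
```

Counter is a dict subclass, so word_counts['cats'] and word_counts.items() work the same way as with the defaultdict above.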