Ace Haidrey - 2 months ago
Python Question

Getting the first item of each tuple for each row's list in PySpark

I'm a bit new to Spark and am trying to do a simple mapping. My data is like the following:

RDD((0, list(tuples)), ..., (19, list(tuples)))


What I want to do is grab the first item in each tuple, so that I ultimately end up with something like this:

RDD((0, list(first item of each tuple)), ..., (19, list(first item of each tuple)))


Can someone help me with how to map this? I appreciate it!

Answer

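You can use mapValues, which transforms only the value of each (key, value) pair and leaves the key unchanged, to convert each list of tuples into a list of each tuple's first element (t[0]):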

rdd.mapValues(lambda x: [t[0] for t in x])
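
To see what the lambda does to each value, here is a minimal sketch that runs without a Spark cluster. The sample data and the helper name first_items are made up for illustration, and the list comprehension over (key, value) pairs is only a local stand-in for mapValues, assuming your RDD holds pairs of the same shape:

```python
# Hypothetical sample data: (key, list of tuples) pairs, mirroring the RDD's layout.
sample = [
    (0, [("a", 1), ("b", 2)]),
    (1, [("c", 3)]),
    (2, []),
]


def first_items(tuples):
    """Return the first element of each tuple in the list."""
    return [t[0] for t in tuples]


# Local stand-in for rdd.mapValues(first_items):
# the key is kept as-is, only the value is transformed.
result = [(key, first_items(value)) for key, value in sample]
print(result)  # [(0, ['a', 'b']), (1, ['c']), (2, [])]

# With a real RDD of the same shape (assumed here to be named rdd),
# the equivalent call would be:
#   rdd.mapValues(first_items).collect()
```

Note that an empty list simply maps to an empty list, but an empty tuple inside a list would raise an IndexError on t[0], so you may want to filter those out first if your data can contain them.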