José María Luna - 3 months ago
Scala Question

Filter repeated elements RDD

I'm working with a huge RDD and I'd like to filter it according to a rule. The RDD contains pairs (two-element tuples), and the order of the elements within a pair doesn't matter to me, so I'd like to remove every pair that duplicates another pair once order is ignored.

My input data is something like this:


And the output filtered RDD should be this one:


Thank you in advance.


I'd apply a .map step to the RDD that sorts the elements within each tuple, so that [(A,C), (C,A)] turns into [(A,C), (A,C)].

After that, you can call .distinct to keep only the unique pairs.
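
A minimal sketch of the map-then-distinct approach, assuming the pairs are tuples of strings. The sample data, the `normalize` helper, and the object name are hypothetical and only for illustration; the original input was not shown in the question.

```scala
import org.apache.spark.sql.SparkSession

object DistinctUnorderedPairs {
  // Hypothetical helper: put the smaller element first, so that
  // (C, A) and (A, C) both become (A, C).
  def normalize(pair: (String, String)): (String, String) =
    if (pair._1.compareTo(pair._2) <= 0) pair else pair.swap

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell, `sc` already exists.
    val spark = SparkSession.builder()
      .appName("DistinctUnorderedPairs")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Assumed sample data, not taken from the question.
    val pairs = sc.parallelize(Seq(
      ("A", "C"), ("C", "A"), ("A", "B"), ("B", "A"), ("B", "C")
    ))

    // Step 1: sort the elements inside each tuple.
    // Step 2: drop the tuples that are now identical.
    val unique = pairs.map(normalize).distinct()

    unique.collect().sorted.foreach(println)

    spark.stop()
  }
}
```

Note that .distinct shuffles the data, so on a huge RDD it may be worth passing a partition count, e.g. .distinct(numPartitions). For element types other than String, the same idea works with any type that has an ordering.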