José María Luna - 20 days ago
Scala Question

Filter repeated elements RDD

I'm working with a huge RDD and I'd like to filter it according to a rule. The RDD contains two-element pairs, and the order of the elements within a pair doesn't matter to me, so I'd like to remove the pairs that repeat an earlier one with its elements swapped.

My input data is something like this:

{{A,B},{A,C},{B,A},{B,C},{C,A},{C,B}}


And the output filtered RDD should be this one:

{{A,B},{A,C},{B,C}}


Thank you in advance.

Answer

I'd apply a .map step to the RDD that sorts the elements within each tuple, so that [(A,C), (C,A)] turns into [(A,C), (A,C)].

After that you can call .distinct to keep only the unique pairs.
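A minimal sketch of those two steps, assuming the pairs are (String, String) tuples and that Spark runs in local mode. The object name DedupPairs, the helper normalize and the sample data are made up for illustration and are not from the question:

```scala
import org.apache.spark.sql.SparkSession

object DedupPairs {
  // Put the two elements of a pair into a canonical (sorted) order,
  // so that (C,A) and (A,C) both become (A,C).
  // Assumption: the elements are Strings; any type with an Ordering would do.
  def normalize(pair: (String, String)): (String, String) =
    if (pair._1 <= pair._2) pair else pair.swap

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; use your own cluster settings.
    val spark = SparkSession.builder()
      .appName("dedup-pairs")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Sample input matching the question.
    val pairs = sc.parallelize(Seq(
      ("A", "B"), ("A", "C"), ("B", "A"),
      ("B", "C"), ("C", "A"), ("C", "B")
    ))

    // Step 1: sort inside each tuple. Step 2: drop the duplicates.
    val unique = pairs.map(normalize).distinct()

    // distinct() does not guarantee any order, so sort after collecting
    // to get a stable printout: (A,B), (A,C), (B,C)
    unique.collect().sorted.foreach(println)

    spark.stop()
  }
}
```

Note that distinct involves a shuffle, so on a huge RDD it is the expensive part; the map step itself is cheap and runs per partition.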
