I'm working with a huge RDD and I'd like to filter it according to a rule. The RDD contains two-element tuples, and the order of the elements within a tuple doesn't matter to me, so I'd like to filter out pairs that are repeats of each other in a different order.
My input data is something like this:
I'd apply a .map step to the RDD that sorts the elements within each tuple, so that [(A, C), (C, A)] turns into [(A, C), (A, C)].
After that you can call .distinct to keep only the unique pairs.
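
Here is a minimal PySpark sketch of that map-then-distinct idea. It assumes the RDD holds 2-tuples whose elements can be compared with each other (all strings, for example). The names `normalize_pair`, `dedupe_unordered_pairs` and `demo` are made up for illustration, and the sample pairs are placeholders rather than your actual data.

```python
def normalize_pair(pair):
    """Return the pair with its two elements in sorted order."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def dedupe_unordered_pairs(rdd):
    """Drop pairs that differ only in the order of their elements.

    `rdd` is assumed to be an RDD of 2-tuples with mutually comparable elements.
    """
    # After normalization (C, A) and (A, C) are the same tuple,
    # so distinct() keeps just one of them.
    return rdd.map(normalize_pair).distinct()


def demo():
    # Hypothetical usage; requires a working PySpark installation.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    pairs = sc.parallelize([("A", "C"), ("C", "A"), ("A", "B")])
    print(dedupe_unordered_pairs(pairs).collect())
```

Note that .distinct shuffles the data, so on a huge RDD it will probably be the expensive part of the job; the .map step itself is cheap.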