The general contract for
This integer need not remain consistent from one execution of an application to another execution of the same application.
In my experience, I use things with deterministic hashes, so it hasn't been a problem.
That is indeed the way to go: Spark can't compensate for objects with non-deterministic hash codes.
Java enums are a particularly notorious example of how this can go wrong; see http://dev.bizo.com/2014/02/beware-enums-in-spark.html. Quoting that post:
... the hashCode method on Java's enum type is based on the memory address of the object. So while yes, we're guaranteed that the same enum value have a stable hashCode inside a particular JVM (since the enum will be a static object) - we don't have this guarantee when you try to compare hashCodes of Java enums with identical values living in different JVMs
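To make the difference concrete, here is a minimal sketch of one possible workaround: derive the key from the enum's name rather than from the enum object itself. The class `StableEnumKey`, the enum `Color`, and both helper methods are hypothetical names invented for this illustration; they are not part of Spark or of the quoted post. It relies on two facts: `Enum.hashCode()` is final and falls back to the identity hash from `Object`, while `String.hashCode()` is defined by a fixed formula over the characters.

```java
public class StableEnumKey {
    // Hypothetical enum, used only for illustration.
    public enum Color { RED, GREEN, BLUE }

    // Identity-based hash inherited from Object: stable inside one JVM,
    // but not guaranteed to match the value seen in another JVM.
    public static int identityBasedHash(Color c) {
        return c.hashCode();
    }

    // Hash of the constant's name: String.hashCode() is specified by a
    // formula over the characters, so every JVM computes the same value.
    public static int stableHash(Color c) {
        return c.name().hashCode();
    }

    public static void main(String[] args) {
        for (Color c : Color.values()) {
            // The first number may change between runs; the second should not.
            System.out.println(c + " identity=" + identityBasedHash(c)
                    + " stable=" + stableHash(c));
        }
    }
}
```

In a Spark job, the same idea would mean keying records by something like `color.name()` (or another value with a deterministic hash) before any shuffle, instead of by the enum constant itself. Whether that fits depends on how the keys are used downstream.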