Is it possible with pyspark to initialize some variable x and define some function f(q) that makes use of x (and returns an RDD) before entering the interactive shell? I want to give another user access to this function f(q) in the shell, but I don't want to expose the variable x to them. Would a possible solution be to attach this function to the spark context variable? If that is not possible, how could one do it?
It is perfectly possible, but it won't serve the intended purpose. You could, for example, use a modified shell script and further obfuscate the data by using native extensions, but that will protect you only from accidental exposure.
As long as you give the user access to a fully functional Python environment, they can inspect existing objects, analyze closures, access the source, or invoke the debugger. So if we assume malicious intentions, this is simply not the way to go. And this is only the tip of the iceberg. A user with direct access to the Spark shell can execute arbitrary commands on the cluster, effectively limited only by the permissions granted to the Spark user.
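To make the point concrete, here is a minimal sketch of the closure approach (the names and the startup-script mechanism are hypothetical; in a real session `f` would use `sc` to build an RDD). It also shows why this only guards against accidental exposure: the captured value is trivially recoverable through introspection.

```python
# Hypothetical startup script run before handing over the shell
# (e.g. via PYTHONSTARTUP or a modified pyspark launcher).

def _make_f():
    x = "secret-lookup-table"  # the value we want to keep out of the user's namespace
    def f(q):
        # In a real pyspark session this would return an RDD, e.g.:
        #   return sc.parallelize(data).filter(lambda r: q in r and x in r)
        return f"{q}:{x}"
    return f

f = _make_f()
del _make_f  # only `f` is left visible in the shell

# ...but any user can still dig the "hidden" value out of the closure:
leaked = f.__closure__[0].cell_contents
print(leaked)  # the supposedly private x
```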