Mnemosyne - 1 month ago
Scala Question

How to query the presence of an element inside a Spark Dataframe Column that contains a set?

I have a Spark dataframe where one column has the type Set<text>. This column contains a set of strings, for example ["eenie","meenie","mo"]. How do I filter the contents of the whole dataframe so that I only get those rows that (for example) contain the value eenie in the set?

I'm looking for something similar to

dataframe.where($"list".contains("eenie"))


The example above is only valid when the content of the column list is a string, not a Set. What alternatives are there that fit my circumstances?

Edit: My question is not a duplicate. The user in that question has a set of values and wants to know which ones are located inside a specific column. I have a column that contains a set, and I want to know if a specific value is part of the set. My approach is the opposite of that.

Answer

Try:

import org.apache.spark.sql.functions.array_contains

dataframe.where(array_contains($"list", "eenie"))
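
For anyone who wants to try this outside the shell, here is a minimal self-contained sketch. It assumes the set column reaches Spark as an array column (Spark SQL has no dedicated set type), and it builds a small made-up dataframe with hypothetical columns id and list in place of your real data. The object name, the helper idsContaining, and the local master setting are all illustrative, not part of your setup.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.array_contains

object ArrayContainsExample {

  // Hypothetical helper: returns the ids of the rows whose "list" column
  // contains `value`. The sample data stands in for the real dataframe.
  def idsContaining(spark: SparkSession, value: String): Seq[Int] = {
    import spark.implicits._ // enables toDF, the $"col" syntax and .as[Int]

    val dataframe = Seq(
      (1, Seq("eenie", "meenie", "mo")),
      (2, Seq("catch", "a", "tiger"))
    ).toDF("id", "list")

    dataframe
      .where(array_contains($"list", value)) // keep rows whose array holds `value`
      .select("id")
      .as[Int]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-contains-example")
      .getOrCreate()
    try {
      // With the sample data above, only the row with id 1 contains "eenie".
      println(idsContaining(spark, "eenie").mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
```

array_contains returns a boolean Column, so it can be combined with other conditions using && or || inside the same where call.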