hdy hdy - 1 year ago 109
SQL Question

PySpark SQL DataFrame: keep only null rows

I have a SQL DataFrame df with a column user_id. How do I filter the DataFrame and keep only the rows where user_id is actually null, for further analysis? The pyspark module page shows how to drop NA rows easily, but it does not say how to do the opposite.

I tried
df.filter(df.user_id == 'null')
but the result is empty (0 rows). Maybe it is looking for the string "null". Also,
df.filter(df.user_id == null)
won't work, as Python looks for a variable named null.

df.filter(df.user_id == None)
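
For what it's worth, the == None attempt most likely returns no rows either: Spark follows SQL semantics, where comparing a value to NULL yields NULL rather than true, and filter drops such rows. A minimal sketch of one way to keep only the null rows, using Column.isNull(), is below. The helper name keep_null_rows, the demo function, and the sample data are made up for illustration; only df and user_id come from the question.

```python
def keep_null_rows(df, column_name):
    """Return only the rows of df whose column_name value is null.

    Uses Column.isNull(), which translates to SQL's IS NULL, instead of
    an equality comparison against None.
    """
    return df.filter(df[column_name].isNull())


def demo():
    # pyspark is imported here so the helper above can be read on its own.
    from pyspark.sql import SparkSession

    # Hypothetical local session and sample data, for illustration only.
    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("null-filter-demo")
        .getOrCreate()
    )
    df = spark.createDataFrame(
        [(1, "a"), (None, "b"), (3, "c")],
        schema="user_id INT, name STRING",
    )

    # Keeps only the rows where user_id is null.
    keep_null_rows(df, "user_id").show()

    spark.stop()
```

Calling demo() runs the example against a local Spark session. The same filter can also be written as a SQL expression string, df.filter("user_id IS NULL"), if that reads better in your code.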
