hdy hdy - 5 months ago
SQL Question

pyspark sql dataframe: keep only rows where a column is null

I have a SQL dataframe df with a column user_id. How do I filter the dataframe and keep only the rows where user_id is actually null, for further analysis? The pyspark module page linked here shows how to drop NA rows easily, but not how to do the opposite.

I tried

df.filter(df.user_id == 'null')

but the result is 0 rows; it is presumably comparing against the literal string "null". And

df.filter(df.user_id == null)

won't work either, since Python raises a NameError looking for a variable named null.

Answer

Use Column.isNull, which compiles to a SQL IS NULL predicate:

df.filter(df.user_id.isNull())

Note that df.filter(df.user_id == None) builds the SQL comparison user_id = NULL, which under SQL three-valued logic evaluates to NULL for every row, so the filter returns nothing. IS NULL (and its counterpart isNotNull) is the correct null test.