Abhishek Shankhadhar - 5 months ago
Python Question

How to delete columns with at least 20% missing values

Is there an efficient way to delete the columns that have more than 20% missing values?

Suppose my dataframe looks like this:


    A   B  C   D
0  sg  hh  1   7
1  gf          9
2  hh         10
3  dd          8
4              6
5   y          8


After removing those columns, the dataframe should look like this:

    A   D
0  sg   7
1  gf   9
2  hh  10
3  dd   8
4       6
5   y   8

Answer

You can use boolean indexing on the columns, keeping those where the count of non-null values is larger than 80% of the number of rows:

df.loc[:, pd.notnull(df).sum()>len(df)*.8]
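As a quick check of the boolean-indexing approach, here is a minimal sketch on the frame from the question. It assumes the blank cells are stored as NaN, which the question does not state explicitly; the names keep and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "A": ["sg", "gf", "hh", "dd", np.nan, "y"],
    "B": ["hh", np.nan, np.nan, np.nan, np.nan, np.nan],
    "C": [1, np.nan, np.nan, np.nan, np.nan, np.nan],
    "D": [7, 9, 10, 8, 6, 8],
})

# Per-column count of non-null values, compared with 80% of the row count.
keep = pd.notnull(df).sum() > len(df) * 0.8

# Select all rows and only the columns flagged True in the mask.
result = df.loc[:, keep]
print(result.columns.tolist())  # ['A', 'D']
```

Column A survives because only 1 of its 6 values is missing (about 17%), while B and C each have 5 of 6 missing.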

Alternatively, you can pass the thresh keyword to .dropna(), as illustrated by @EdChum. Here thresh is the minimum number of non-null values a column needs in order to be kept:

df.dropna(thresh=0.8*len(df), axis=1)
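One thing to watch for with .dropna(): it only treats NaN/None as missing. The sketch below assumes a hypothetical variant of the question's frame in which the blank cells are empty strings, converts them to NaN first, and then applies thresh. Rounding the threshold up to a whole number is a choice made here (thresh is documented as an integer count), so the rule becomes "at least 80% non-null" rather than "more than 80%".

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which missing cells are empty strings, not NaN.
df = pd.DataFrame({
    "A": ["sg", "gf", "hh", "dd", "", "y"],
    "B": ["hh", "", "", "", "", ""],
    "C": ["1", "", "", "", "", ""],
    "D": [7, 9, 10, 8, 6, 8],
})

# Convert empty strings to NaN so that dropna() recognises them as missing.
cleaned = df.replace("", np.nan)

# Minimum number of non-null values a column needs to be kept (80% of rows,
# rounded up to an integer).
min_count = int(np.ceil(0.8 * len(cleaned)))

# axis=1 drops columns rather than rows.
result = cleaned.dropna(axis=1, thresh=min_count)
print(result.columns.tolist())  # ['A', 'D']
```

Without the replace step, every column in this variant would count as fully populated and nothing would be dropped.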