Galaffer - 2 months ago
R Question

Deleting columns from a dataframe in R based on number of values >0

I have a dataframe where some columns contain very few values above 0, and I would like to delete those columns.

For example, how would I delete all columns with fewer than 5 values above 0 from the following data.frame?

Name,Col1,Col2,Col3,Col4,Col5,Col6
Row1,0,0,0,0,10,400
Row2,125.0,2.50,0,0,5,0
Row3,10,0,0,10,100,70
Row4,10,0,50,0,0,10
Row5,489,0,30,0,35,50
Row6,0,0,450.5,0,10,400
Row7,125.5,0,2.50,0,0,5
Row8,10,0,0.50,10,100,70
Row9,10,0,50,0,0,10
Row10,489,0,0,0,35,50

Answer

You can combine colSums() with logical subsetting to keep only the columns that have at least 5 values above 0:

df1[, colSums(df1 > 0) >= 5]
#       Col1  Col3 Col5 Col6
#Row1    0.0   0.0   10  400
#Row2  125.0   0.0    5    0
#Row3   10.0   0.0  100   70
#Row4   10.0  50.0    0   10
#Row5  489.0  30.0   35   50
#Row6    0.0 450.5   10  400
#Row7  125.5   2.5    0    5
#Row8   10.0   0.5  100   70
#Row9   10.0  50.0    0   10
#Row10 489.0   0.0   35   50
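
To see why the one-liner works, it may help to split it into its intermediate steps. The sketch below uses a small made-up data frame (toy, with columns a, b and c, none of which come from the question) and a threshold of 3 instead of 5, and assumes all columns are numeric:

```r
# Small illustrative data frame (hypothetical, not the question's data)
toy <- data.frame(a = c(0, 1, 2, 3),
                  b = c(0, 0, 0, 4),
                  c = c(5, 6, 0, 7))

is_pos <- toy > 0          # logical matrix with the same shape as toy
n_pos  <- colSums(is_pos)  # TRUE counts as 1, so this counts positives per column
keep   <- n_pos >= 3       # logical vector with one entry per column

result <- toy[, keep]      # keeps a and c, which each have 3 values above 0
```

Here b has only one value above 0, so it is the only column removed.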

data

df1 <- structure(list(Col1 = c(0, 125, 10, 10, 489, 0, 125.5, 10, 10, 
       489), Col2 = c(0, 2.5, 0, 0, 0, 0, 0, 0, 0, 0), Col3 = c(0, 0, 
        0, 50, 30, 450.5, 2.5, 0.5, 50, 0), Col4 = c(0L, 0L, 10L, 0L, 
        0L, 0L, 0L, 10L, 0L, 0L), Col5 = c(10L, 5L, 100L, 0L, 35L, 10L, 
        0L, 100L, 0L, 35L), Col6 = c(400L, 0L, 70L, 10L, 50L, 400L, 5L, 
       70L, 10L, 50L)), .Names = c("Col1", "Col2", "Col3", "Col4", "Col5", 
       "Col6"), class = "data.frame", row.names = c("Row1", "Row2", 
       "Row3", "Row4", "Row5", "Row6", "Row7", "Row8", "Row9", "Row10"))
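
Two edge cases are worth keeping in mind with this approach: if only one column survives, df[, cond] returns a plain vector rather than a data.frame, and any NA in a column makes its colSums() count NA. A minimal sketch of a wrapper that guards against both is shown below; the function name drop_sparse_cols and its arguments are hypothetical, and it assumes every column is numeric:

```r
# Hypothetical helper; the name and arguments are illustrative only
drop_sparse_cols <- function(df, min_count = 5, threshold = 0) {
  # Count values above the threshold in each column, ignoring NA entries
  n_above <- colSums(df > threshold, na.rm = TRUE)
  # drop = FALSE keeps the result a data.frame even if one column remains
  df[, n_above >= min_count, drop = FALSE]
}

# Made-up example data containing missing values
toy <- data.frame(a = c(1, NA, 2, 3),
                  b = c(0, 0, NA, 4))
res <- drop_sparse_cols(toy, min_count = 3)
```

With the question's data, drop_sparse_cols(df1) should select the same columns as the one-liner above, since df1 contains no NA values.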