Zach Eisner - 3 months ago
R Question

Selecting the first non-zero value from each column in a data frame

I have a 10x100 data frame called CoeNIST. The rows are in order of significance (i.e. the value in row 1 is more important than the value in row 2) and each column represents a different sample. I would like to extract only the most significant non-zero value, i.e. the first non-zero value, for each sample.

Here is a sample from the first 9 columns of CoeNIST.

> CoeNIST[,1:9]
        1      2      3      4 5 6 7      8 9
1       0 352232      0      0 0 0 0  28733 0
2  332829      0      0 380109 0 0 0 380343 0
3       0      0      0 380111 0 0 0 380409 0
4       0      0      0 380101 0 0 0      0 0
5       0      0 299211 380112 0 0 0      0 0
6       0      0      0 380103 0 0 0      0 0
7       0      0      0 380100 0 0 0  71899 0
8       0      0      0  24812 0 0 0      0 0
9       0      0      0      0 0 0 0 380410 0
10      0 332958      0      0 0 0 0 380440 0


And here is what I would like the outcome to look like

> NIST
[1] 332829 352232 299211 380109 NA NA NA 28733 NA


Or, as a list:

> NIST
[[1]]
[1] 332829

[[2]]
[1] 352232

[[3]]
[1] 299211

[[4]]
[1] 380109

[[5]]
integer(0)

[[6]]
integer(0)

[[7]]
integer(0)

[[8]]
[1] 28733

[[9]]
integer(0)

Answer
CoeNIST <- read.table(header=TRUE,text="
1      2      3      4 5 6 7      8 9
1       0 352232      0      0 0 0 0  28733 0
2  332829      0      0 380109 0 0 0 380343 0
3       0      0      0 380111 0 0 0 380409 0
4       0      0      0 380101 0 0 0      0 0
5       0      0 299211 380112 0 0 0      0 0
6       0      0      0 380103 0 0 0      0 0
7       0      0      0 380100 0 0 0  71899 0
8       0      0      0  24812 0 0 0      0 0
9       0      0      0      0 0 0 0 380410 0
10      0 332958      0      0 0 0 0 380440 0")

I would describe your problem as "selecting the first non-zero value in each column." My solution returns NA for any column that contains only zeros. Note that it tests x>0, so it assumes the values are never negative:

apply(CoeNIST,2,function(x) (x[x>0])[1])
##     X1     X2     X3     X4     X5     X6     X7     X8     X9 
## 332829 352232 299211 380109     NA     NA     NA  28733     NA 
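
If the data could contain negative or missing values, a slightly more defensive variant may be worth considering. The sketch below is not part of the original answer: it uses a small made-up data frame (df) in place of CoeNIST, and first_nonzero is a hypothetical helper name. It tests x != 0 rather than x > 0 and shows both output shapes asked for in the question, a vector with NA and a list with zero-length entries.

```r
# Made-up stand-in for CoeNIST: one ordinary column, one all-zero column,
# and one column whose first non-zero value is negative.
df <- data.frame(a = c(0, 5, 7), b = c(0, 0, 0), c = c(-3, 0, 2))

# Hypothetical helper: first non-zero, non-missing value of x, or NA if none.
first_nonzero <- function(x) {
  nz <- x[!is.na(x) & x != 0]
  if (length(nz) > 0) nz[1] else NA_real_
}

# Vector form: one value per column, NA for all-zero columns.
# vapply() checks that every result is a single number.
as_vector <- vapply(df, first_nonzero, numeric(1))
print(as_vector)

# List form: head(..., 1) keeps at most one value, so all-zero columns
# give a zero-length vector instead of NA.
as_list <- lapply(df, function(x) head(x[!is.na(x) & x != 0], 1))
print(as_list)
```

Applied to the real data, the same calls would be vapply(CoeNIST, first_nonzero, numeric(1)) and the matching lapply(). Iterating over the data frame's columns directly also avoids the conversion to a matrix that apply() performs.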