Toni - 1 year ago 125
R Question

Rank() in R excluding zeros

I am trying to reproduce "manually" the example in this Wikipedia article using R.

Here is the data:

``````after = c(125, 115, 130, 140, 140, 115, 140, 125, 140, 135)
before = c(110, 122, 125, 120, 140, 124, 123, 137, 135, 145)
sgn = sign(after-before)
abs = abs(after - before)
d = data.frame(after,before,sgn,abs)

after before sgn abs
1    125    110   1  15
2    115    122  -1   7
3    130    125   1   5
4    140    120   1  20
5    140    140   0   0
6    115    124  -1   9
7    140    123   1  17
8    125    137  -1  12
9    140    135   1   5
10   135    145  -1  10
``````

If I try to rank the rows based on the `abs` column, the `0` entry is naturally ranked as `1`:

``````rank = rank(abs)
(d = data.frame(after,before,sgn,abs,rank))

after before sgn abs rank
1    125    110   1  15  8.0
2    115    122  -1   7  4.0
3    130    125   1   5  2.5
4    140    120   1  20 10.0
5    140    140   0   0  1.0
6    115    124  -1   9  5.0
7    140    123   1  17  9.0
8    125    137  -1  12  7.0
9    140    135   1   5  2.5
10   135    145  -1  10  6.0
``````

However, zeros are ignored in the Wilcoxon signed-rank test.

How can I get R to ignore that row, so as to end up with:

``````   after before sgn abs rank
1    125    110   1  15  7.0
2    115    122  -1   7  3.0
3    130    125   1   5  1.5
4    140    120   1  20  9.0
5    140    140   0   0    0
6    115    124  -1   9  4.0
7    140    123   1  17  8.0
8    125    137  -1  12  6.0
9    140    135   1   5  1.5
10   135    145  -1  10  5.0
``````

``````after = c(125, 115, 130, 140, 140, 115, 140, 125, 140, 135)
before = c(110, 122, 125, 120, 140, 124, 123, 137, 135, 145)
sgn = sign(after-before)
abs = abs(after - before)
d = data.frame(after,before,sgn,abs)
d$rank = rank(replace(abs, abs == 0, NA), na.last = 'keep')  # zero differences get rank NA
d$multi = d$sgn * d$rank  # signed ranks

(W = abs(sum(d$multi, na.rm = TRUE)))
9
``````
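As a sanity check on the value above, the same statistic can be split into the rank sums of the positive and negative differences. This is only a minimal sketch that assumes the `after` and `before` vectors from the question; the names `diffs`, `abs_diff`, `r`, `n_r`, `w_plus`, and `w_minus` are introduced here purely for illustration.

```r
after  <- c(125, 115, 130, 140, 140, 115, 140, 125, 140, 135)
before <- c(110, 122, 125, 120, 140, 124, 123, 137, 135, 145)

diffs    <- after - before
abs_diff <- abs(diffs)

# Rank only the non-zero absolute differences; zero differences stay NA.
r <- rank(replace(abs_diff, abs_diff == 0, NA), na.last = "keep")

n_r     <- sum(abs_diff != 0)               # reduced sample size (non-zero pairs)
w_plus  <- sum(r[diffs > 0], na.rm = TRUE)  # rank sum of the positive differences
w_minus <- sum(r[diffs < 0], na.rm = TRUE)  # rank sum of the negative differences

# The two rank sums together must equal n_r * (n_r + 1) / 2.
stopifnot(w_plus + w_minus == n_r * (n_r + 1) / 2)

W <- abs(w_plus - w_minus)
print(c(n_r = n_r, w_plus = w_plus, w_minus = w_minus, W = W))
```

If the zero row had been ranked, the two sums would no longer add up to `n_r * (n_r + 1) / 2`, so the `stopifnot()` line doubles as a check that the zero was really excluded. As far as I know, base R's `wilcox.test(after, before, paired = TRUE)` reports the positive rank sum as its statistic `V`, so it should be compared with `w_plus` rather than with `W`.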

From the Wikipedia article:

1. Exclude pairs with $|x_{2,i} - x_{1,i}| = 0$. Let $N_r$ be the reduced sample size.

We need to exclude zeroes. One way is to replace the zeroes with `NA` and then tell `rank()` to leave `NA`s out of the ranking. Since the result must be a vector of the same length as the input, pass `'keep'` as the `na.last` argument, which leaves each `NA` in place with rank `NA`:

``````d$rank <- rank(replace(abs, abs == 0, NA), na.last = 'keep')
d
##    after before sgn abs rank
## 1    125    110   1  15  7.0
## 2    115    122  -1   7  3.0
## 3    130    125   1   5  1.5
## 4    140    120   1  20  9.0
## 5    140    140   0   0   NA
## 6    115    124  -1   9  4.0
## 7    140    123   1  17  8.0
## 8    125    137  -1  12  6.0
## 9    140    135   1   5  1.5
## 10   135    145  -1  10  5.0
``````

The subtraction-based solutions will not work if the input vector contains no zeroes or more than one zero, because they assume that exactly one zero occupies rank 1.
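To make that concrete, here is a small sketch comparing the two approaches. The function `rank_subtract` is only my guess at what a subtraction-based answer looks like (rank everything, then subtract one); `rank_nonzero` wraps the `NA` approach from above. Both names and both test vectors are hypothetical.

```r
# Assumed form of a subtraction-based solution: rank everything, then shift
# down by one on the assumption that exactly one zero holds rank 1.
rank_subtract <- function(x) rank(x) - 1

# NA-based approach: zeros are left out of the ranking and reported as NA.
rank_nonzero <- function(x) rank(replace(x, x == 0, NA), na.last = "keep")

no_zero   <- c(15, 7, 5)
two_zeros <- c(0, 0, 5, 7)

rank_subtract(no_zero)    # smallest non-zero value wrongly ends up with rank 0
rank_nonzero(no_zero)     # ranks start at 1, as they should

rank_subtract(two_zeros)  # non-zero values are shifted by one instead of two
rank_nonzero(two_zeros)   # zeros are NA, non-zero values are ranked from 1
```

The `NA` version does not need to know how many zeroes there are, which is why it handles both edge cases.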
