WAF - 4 months ago 30

R Question

Say I have a vector **x** which defines the number of samples in a given class:

`x <- c(250,362,10,246,30)`

In this case, there are five classes.

The resulting class proportion vector is given by:

`p <- x/sum(x)`

[1] 0.27839644 0.40311804 0.01113586 0.27394209 0.03340757

How can I update this class proportion vector so that every class has a proportion of at least 0.05?

An additional constraint is that the proportion removed from the classes above 0.05 should be distributed evenly across all classes whose proportion is greater than or equal to 0.05.

Answer

If I understand you correctly, the following lines of code may do what you are after.

```
# Your vector
x <- c(250, 362, 10, 246, 30)
# Add a name tag so the surviving classes can be identified later
names(x) <- seq_along(x)
print(x)
#   1   2   3   4   5
# 250 362  10 246  30
# The new "collapsed" vector (the names are copied along with the values)
x.new <- x
# Remove entries with less than 5% of sum, and distribute them
# evenly (as per request) across the other entries.
# Note that the order in which you do this obviously matters:
# Omitting a sub-5% entry from position i may lead to an original
# sub-5% entry at j>i that is now above the 5% threshold and won't
# get omitted.
# Caveat: after a removal, the entry that slides into position i is
# not re-checked, because the loop moves straight on to i + 1.
for (i in seq_along(x.new)) {
  # x.new shrinks inside the loop, so x.new[i] can be NA for late i
  if (!is.na(x.new[i]) && x.new[i] / sum(x.new) < 0.05) {
    x.new[-i] <- x.new[-i] + x.new[i] / length(x.new[-i])
    x.new <- x.new[-i]
  }
}
print(x.new)
#        1        2        4
# 263.3333 375.3333 259.3333
# Or print as fraction
print(x.new / sum(x.new))
#         1         2         4
# 0.2932442 0.4179659 0.2887899
```

Note that the order matters (see my comment in the code). You can check that `sum(x) == sum(x.new)` (use `all.equal` if floating-point rounding gets in the way) and that `x.new/sum(x.new) > 0.05` holds for every remaining entry.
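Since the result depends on the order of removal, one way to make it reproducible is to always drop the smallest remaining class first. The following is only a sketch of that variant, not the loop from the code above: `collapse_small` and its `threshold` argument are hypothetical names introduced here for illustration, and "smallest first" is an assumption about which order is wanted.

```r
# Hypothetical helper: repeatedly drop the smallest class while its share of
# the total is below `threshold`, sharing its count evenly among the classes
# that remain. The total count is preserved at every step.
collapse_small <- function(x, threshold = 0.05) {
  if (is.null(names(x))) names(x) <- seq_along(x)
  while (length(x) > 1) {
    i <- which.min(x)                          # position of the smallest class
    if (x[i] / sum(x) >= threshold) break      # every class now meets the threshold
    x[-i] <- x[-i] + x[i] / (length(x) - 1)    # spread its count evenly
    x <- x[-i]                                 # then drop it
  }
  x
}

x <- c(250, 362, 10, 246, 30)
x.new <- collapse_small(x)
print(x.new)

# The two checks mentioned above, written to tolerate floating-point rounding
total_preserved <- isTRUE(all.equal(sum(x), sum(x.new)))
all_above_5pct  <- all(x.new / sum(x.new) >= 0.05)
```

For the example vector, this removes classes 3 and 5, just as the positional loop does, but the outcome no longer depends on where the small classes sit in the vector.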