user451151 - 1 year ago
R Question

# Apply-family on two lists (to avoid nested for-loops)

Let's say I have the following:

``````
myseq <- seq(0, 1, by = 0.1)
scores <- sample(seq(0, 1, by = 0.01), 10)
var1 <- sample(c(0, 1), 10, replace = TRUE)
var2 <- sample(c(0, 1), 10, replace = TRUE)
mydf <- data.frame(scores = scores, var1 = var1, var2 = var2)

myseq
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

mydf
  scores var1 var2
1   0.10    1    0
2   0.06    1    0
3   0.74    0    0
4   0.15    1    0
5   0.40    1    1
6   0.96    0    0
7   0.04    1    0
8   0.71    0    1
9   0.94    1    1
10  0.38    0    0
``````

For each value in `myseq`, I want to sum `var1` and `var2` for the subset of records where `scores` is greater than the value in `myseq`.

I want to do this using only apply-family functions (apply, lapply, tapply, sapply, mapply, etc.). In other words, no nested for-loops.

So, for example:

The first value in `myseq` is `0.0`. All `scores` are greater than `0.0`, so I want to return `var1` = `6` and `var2` = `3`.

The second value in `myseq` is `0.1`. Only 7 of the 10 `scores` are greater than `0.1`, so I want to return `var1` = `3` and `var2` = `3`.

...so on and so forth...
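
The two worked cases can be checked by hand with plain logical subsetting. This is only a sketch: the data frame is typed in from the printed `mydf` above so the result does not depend on `sample()`, and `sums_above` is a hypothetical helper name, not something from the original code.

```r
# Data copied from the printed mydf above, so the check is reproducible
mydf <- data.frame(
  scores = c(0.10, 0.06, 0.74, 0.15, 0.40, 0.96, 0.04, 0.71, 0.94, 0.38),
  var1   = c(1, 1, 0, 1, 1, 0, 1, 0, 1, 0),
  var2   = c(0, 0, 0, 0, 1, 0, 0, 1, 1, 0)
)

# Hypothetical helper: column sums over the rows whose score exceeds the threshold.
# The comparison is strict, so a score equal to the threshold is excluded.
sums_above <- function(threshold) {
  colSums(mydf[mydf$scores > threshold, c("var1", "var2")])
}

sum(mydf$scores > 0.1)  # 7 rows remain
sums_above(0.0)         # var1 = 6, var2 = 3
sums_above(0.1)         # var1 = 3, var2 = 3
```

Note that the row with `scores` equal to `0.10` drops out at the second threshold because the comparison is "greater than", not "greater than or equal to".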

In the end, I'd like the final output to be an 11(r) x 2(c) matrix (or data frame or list) containing the sums for each var.

``````
var1 var2
6    3
3    3
...
...
``````

Note: there are 11 rows because the length of `myseq` is 11, and 2 columns because there are two vars, `var1` and `var2`.
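
To make the 11 x 2 shape concrete, here is a minimal sketch that uses only apply-family calls plus `colSums`. The `set.seed(42)` call is an assumption added purely for reproducibility, so the values will differ from the printed table above. Using `vapply` with `FUN.VALUE = numeric(2)` is one possible way to guarantee exactly two sums per threshold.

```r
set.seed(42)  # assumption: arbitrary seed, only so the sketch is repeatable

myseq <- seq(0, 1, by = 0.1)
mydf <- data.frame(
  scores = sample(seq(0, 1, by = 0.01), 10),
  var1   = sample(c(0, 1), 10, replace = TRUE),
  var2   = sample(c(0, 1), 10, replace = TRUE)
)

# One numeric(2) result per threshold; vapply stops with an error
# if any call returns a result of a different length or type.
sums <- vapply(
  myseq,
  function(x) colSums(mydf[mydf$scores > x, c("var1", "var2")]),
  FUN.VALUE = numeric(2)
)

res <- t(sums)          # vapply gives 2 x 11, so transpose to 11 x 2
rownames(res) <- myseq  # label each row with its threshold
dim(res)                # 11 rows, 2 columns
```

Because no score can exceed `1.0`, the last row is always `0 0`; `colSums` on the empty subset returns zeros rather than failing.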

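``````
# For each threshold x in myseq, sum var1 and var2 (columns 2:3) over rows with scores > x
res <- t(sapply(myseq, function(x) apply(mydf[mydf$scores > x, 2:3], 2, sum)))
``````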