hfisch - 26 days ago
R Question

Assigning a value to each range of consecutive numbers with same sign in R

I'm trying to create a data frame with a column that holds the length of each run of positive or negative numbers, like so:

Time V Length
0.5 -2 1.5
1.0 -1 1.5
1.5 0 0.0
2.0 2 1.0
2.5 0 0.0
3.0 1 1.75
3.5 2 1.75
4.0 1 1.75
4.5 -1 0.75
5.0 -3 0.75


The Length column sums the length of time that the value has been positive or negative. Zeros are given a 0 since they are an inflection point. If there is no zero separating the sign change, the times on either side of the inflection are averaged.

I am trying to approximate the amount of time that these values spend either positive or negative. I've tried this with a for loop with varying degrees of success, but I would like to avoid looping because I am working with extremely large data sets.

I've spent some time looking at sign and diff as they are used in this question about sign changes. I've also looked at this question that uses transform and aggregate to sum consecutive duplicate values. I feel like I could use this in combination with sign and/or diff, but I'm not sure how to retroactively assign these sums to the ranges that created them, or how to deal with spots where I'm taking the average across the inflection.

Any suggestions would be appreciated. Here is the sample dataset:

dat <- data.frame(Time = seq(0.5, 5, 0.5), V = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))

Answer

First, find the indices of "Time" that need to be interpolated, i.e. consecutive "V" values that switch between positive and negative without a zero in between; they have an abs(diff(sign(V))) larger than one.

id <- which(abs(c(0, diff(sign(dat$V)))) > 1)
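To see why the threshold is one, it may help to look at the intermediate vectors for the sample data. This is only an illustrative sketch; the names s and jump are introduced here and are not part of the answer.

```r
dat <- data.frame(Time = seq(0.5, 5, 0.5), V = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))

# sign() maps each value to -1, 0 or 1
s <- sign(dat$V)
# -1 -1  0  1  0  1  1  1 -1 -1

# A step of 1 means the series passed through an explicit zero;
# a step of 2 means it jumped straight from one sign to the other.
jump <- abs(c(0, diff(s)))
#  0  0  1  1  1  1  0  0  2  0

id <- which(jump > 1)
# 9  (the change between Time 4.0 and Time 4.5)
```

Only row 9 needs an interpolated zero, placed halfway between Time 4.0 and Time 4.5.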

To the original data, add rows with V = 0 at Time = 0 and at the last time step (according to the assumptions mentioned by @Gregor). At each index found above, also add a row with V = 0 at the mean of the two neighbouring "Time" values. Then order by "Time".

d2 <- rbind(dat,
            data.frame(Time = c(0, max(dat$Time)), V = c(0, 0)),
            data.frame(Time = (dat$Time[id] + dat$Time[id - 1])/2, V = 0))
d2 <- d2[order(d2$Time), ]

Calculate the time differences between consecutive rows where "V" is zero, and replicate them over the rows of each run using "zero-group indices" (the cumulative count of zeros).

d2$Length <- diff(d2$Time[d2$V == 0])[cumsum(d2$V == 0)]
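The one-liner above packs two steps together. The sketch below unpacks it, using the values that d2 should hold for the sample data after ordering, typed in by hand so the block stands alone. The names Time, V, is_zero, run_len and grp are introduced here for illustration only.

```r
# Contents of d2 after ordering: the original rows plus the added zeros at
# Time 0, Time 4.25 (interpolated) and Time 5 (end of series)
Time <- c(0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.25, 4.5, 5, 5)
V    <- c(0, -2, -1, 0, 2, 0, 1, 2, 1, 0, -1, -3, 0)

is_zero <- V == 0

# Time elapsed between successive zeros: one value per run
run_len <- diff(Time[is_zero])
# 1.50 1.00 1.75 0.75

# Each zero starts a new group, so cumsum() labels every row with its run
grp <- cumsum(is_zero)
# 1 1 1 2 2 3 3 3 3 4 4 4 5

# Indexing by group copies each run length onto all rows of that run.
# Group 5 (the appended end row) has no run after it and gets NA.
Length <- run_len[grp]
```

Note that each zero row is labelled with the group it opens, so it receives the length of the following run. That is why the zeros still need to be reset to 0 at the end.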

Add values to original data:

merge(dat, d2)

#    Time  V Length
# 1   0.5 -2   1.50
# 2   1.0 -1   1.50
# 3   1.5  0   1.00
# 4   2.0  2   1.00
# 5   2.5  0   1.75
# 6   3.0  1   1.75
# 7   3.5  2   1.75
# 8   4.0  1   1.75
# 9   4.5 -1   0.75
# 10  5.0 -3   0.75

Set "Length" to 0 where V == 0.
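The answer states this last step in prose only. A minimal sketch of the whole sequence, including that step, could look like the following. It assumes the merged result is stored in a data frame called res (a name introduced here), and it uses rep(0, length(id)) for the interpolated rows, a small deviation from the code above intended to keep data.frame() from failing when no sign change needs interpolating.

```r
dat <- data.frame(Time = seq(0.5, 5, 0.5), V = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))

# Rows where the sign flips without passing through zero
id <- which(abs(c(0, diff(sign(dat$V)))) > 1)

# Add boundary zeros and interpolated zeros, then sort by time
d2 <- rbind(dat,
            data.frame(Time = c(0, max(dat$Time)), V = c(0, 0)),
            data.frame(Time = (dat$Time[id] + dat$Time[id - 1]) / 2,
                       V = rep(0, length(id))))  # assumption: guards the empty case
d2 <- d2[order(d2$Time), ]

# Run lengths between zeros, replicated over each run
d2$Length <- diff(d2$Time[d2$V == 0])[cumsum(d2$V == 0)]

# Keep only the original rows, then zero out the inflection points
res <- merge(dat, d2)
res$Length[res$V == 0] <- 0

res$Length
# 1.50 1.50 0.00 1.00 0.00 1.75 1.75 1.75 0.75 0.75
```

For the sample data this reproduces the Length column requested in the question.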