Dylan K - 3 months ago
R Question

R diff() function for bigz data?

I want to obtain the difference between consecutive rows in a data frame, which is what the built-in diff() function does. But my data is of the bigz class (gmp package), so I cannot use the existing function.

> class(MyData$IntIndex)
[1] "bigz"
> diff(MyData$IntIndex)
Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] :
  non-numeric argument to binary operator


Perhaps there is a package with a function that could solve my problem? Or something else I could do?

Answer

Since diff is an S3 generic, and pretty straightforward to implement, you can just add your own diff.bigz method on the fly. Here is a very basic example for the default case of lag = 1, differences = 1:

library(gmp)

z <- as.bigz(
    c("1000000000000000000000000000",
      "1000000000000000000000000010",
      "1000000000000000000000000021",
      "1000000000000000000000000033",
      "1000000000000000000000000047")
)

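diff.bigz <- function(x, ...) {
    # `...` keeps the signature compatible with the diff(x, ...) generic.
    # Lag-1 first difference: each element minus its predecessor.
    x[-1] - x[-length(x)]
}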

diff(z)
#Big Integer ('bigz') object of length 4:
#[1] 10 11 12 14 

If you want something more elaborate, translating diff.default shouldn't be too difficult:

diff.default
# function (x, lag = 1L, differences = 1L, ...) 
# {
#     ismat <- is.matrix(x)
#     xlen <- if (ismat) 
#         dim(x)[1L]
#     else length(x)
#     if (length(lag) != 1L || length(differences) > 1L || lag < 
#         1L || differences < 1L) 
#         stop("'lag' and 'differences' must be integers >= 1")
#     if (lag * differences >= xlen) 
#         return(x[0L])
#     r <- unclass(x)
#     i1 <- -seq_len(lag)
#     if (ismat) 
#         for (i in seq_len(differences)) r <- r[i1, , drop = FALSE] - 
#             r[-nrow(r):-(nrow(r) - lag + 1L), , drop = FALSE]
#     else for (i in seq_len(differences)) r <- r[i1] - r[-length(r):-(length(r) - 
#         lag + 1L)]
#     class(r) <- oldClass(x)
#     r
# }
# <bytecode: 0x62f5c78>
# <environment: namespace:base>
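
As a rough illustration of that translation, here is a minimal sketch of a diff.bigz that supports lag and differences for plain bigz vectors. It is one possible adaptation, not code from the gmp package. It assumes that positive integer indexing and length() work on bigz vectors, and that as.bigz(integer(0)) returns an empty bigz. It skips the matrix branch and the unclass()/class<- steps of diff.default, since the subtraction must stay in bigz arithmetic.

```r
library(gmp)

# Sketch of a diff() method for bigz vectors, modelled on diff.default.
# Vector-only: the matrix branch of diff.default is deliberately left out.
diff.bigz <- function(x, lag = 1L, differences = 1L, ...) {
    if (length(lag) != 1L || length(differences) != 1L ||
        lag < 1L || differences < 1L)
        stop("'lag' and 'differences' must be integers >= 1")

    # Not enough elements to form a single difference: return an empty bigz.
    # (Assumption: as.bigz(integer(0)) yields a zero-length bigz.)
    if (lag * differences >= length(x))
        return(as.bigz(integer(0)))

    r <- x
    for (i in seq_len(differences)) {
        n <- length(r)
        # Elements (lag + 1)..n minus elements 1..(n - lag), all in bigz arithmetic.
        r <- r[(lag + 1L):n] - r[seq_len(n - lag)]
    }
    r
}

z <- as.bigz(
    c("1000000000000000000000000000",
      "1000000000000000000000000010",
      "1000000000000000000000000021",
      "1000000000000000000000000033",
      "1000000000000000000000000047")
)

diff(z)                    # first differences:   10 11 12 14
diff(z, lag = 2L)          # lag-2 differences:   21 23 26
diff(z, differences = 2L)  # second differences:  1 1 2
```

Positive index ranges are used here instead of the negative ranges in diff.default only to make the two slices easier to read. The result should be the same either way.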