agenis - 1 year ago 88
R Question

# Avoid propagation of NA in matrix multiplication

I have some difficulties with propagation of missing values in the context of matrix multiplication.
My first matrix `X` holds the hourly gas flow measurements for 5 flowmeters:

``````
X=structure(c(16, 19, 28, 32, 30, 22, 16, 13, 8, 6, 5, 3, 5, 5, 6, 13, 7, 10, 4, 2, 1, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8, 8, 7, 7, 6, 6, 5, 5, 4, 4, 4, -16, -17, -20, -31, -25, -25, -16, -12, -13, -15, -9, -7), .Dim = c(12L, 5L), .Dimnames = list(NULL, c("meter1", "meter2", "meter3", "meter4", "meter5")))
####      meter1 meter2 meter3 meter4 meter5
#### [1,]     16      5      0      7    -16
#### [2,]     19      5      0      8    -17
#### ...
``````

My second matrix `Z` says how these gas flows are distributed to feed 4 cities. For instance, the first column of `Z` defines the total net flow for city1 as `(1)*meter1 + (-1)*meter2 + (1)*meter5`.

``````
Z=structure(c(1, -1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0), .Dim = c(5L, 4L), .Dimnames = list(NULL, c("city1", "city2", "city3", "city4")))
####      city1 city2 city3 city4
#### [1,]     1     0     0     0
#### [2,]    -1     1     0     0
#### [3,]     0     1     0     0
#### [4,]     0     0     1     0
#### [5,]     1     0     0     0
``````

So to calculate the net flow per city I just have to do a matrix multiplication:

``````
X %*% Z
####      city1 city2 city3 city4
#### [1,]    -5     5     7     0
#### [2,]    -3     5     8     0
#### ...
``````

My problem is that there are lots of missing values in my `X` matrix (here 9 `NA`):

``````
set.seed(3); for (i in 1:10) X[sample.int(nrow(X), 1), sample.int(ncol(X), 1)] <- NA
``````

When I do the matrix multiplication, each `NA` propagates to its whole row of the result, even for cities where that meter has a zero coefficient (and so does not affect the sum). I end up with 24 `NA` after the multiplication. However, if I do the calculation city by city, using only the meters with nonzero coefficients, I only get 11 `NA`:

``````
sum(is.na(cbind(X[, 1] - X[, 2] + X[, 5], X[, 2] + X[, 3], X[, 4], 0)))
#### [1] 11
``````
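The row-wide propagation comes from R's arithmetic rule that `NA * 0` is `NA` rather than 0, so a single missing meter reading contaminates every dot product for that row. A minimal illustration, using made-up numbers rather than the actual rows of `X`:

```r
# In R, a missing value times zero is still missing
NA * 0

# Toy row with one missing reading, and weights that ignore that reading
readings <- c(16, NA, 7)
weights  <- c(1, 0, 1)

# The ordinary dot product is NA even though the NA has weight 0
plain <- sum(readings * weights)

# Summing only the entries with nonzero weight gives a usable value
used <- weights != 0
kept <- sum(readings[used] * weights[used])
```

Here `plain` is `NA`, while `kept` is a finite number because the missing reading is never touched.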

I would like to know if there is a way to calculate the flows for each city that doesn't propagate my `NA` so much. In reality my matrices are much bigger, but a city is never fed by more than 4 meters (`Z` is quite sparse). I'd like to avoid coding each column by hand, because the script would stop working after any change in the network.
Thanks,

Yes, I am sure this is what you need:

``````
library(Matrix)
ZZ <- Matrix(Z, sparse = TRUE)
X %*% ZZ

#12 x 4 Matrix of class "dgeMatrix"
#      city1 city2 city3 city4
# [1,]    -5     5     7     0
# [2,]    NA    NA    NA     0
# [3,]    NA     6     8     0
# [4,]   -12    13     7     0
# [5,]    NA    NA     7     0
# [6,]   -13    10     6     0
# [7,]    -4    NA    NA     0
# [8,]    -1     2    NA     0
# [9,]    -6     1     5     0
#[10,]   -11     2     4     0
#[11,]    NA    NA     4     0
#[12,]    -5     1     4     0
``````

As you expected, there are only 11 `NA`.
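If you would rather not depend on how the `Matrix` package treats the zeros of `Z`, roughly the same result can be obtained in base R. The sketch below assumes `Z` itself contains no `NA`. The name `na_safe_mult` is a hypothetical helper, and the small `X` is the first three rows of the question's data with two `NA` placed by hand for illustration.

```r
# Hypothetical helper, not part of base R or the Matrix package.
# An NA in X only affects the columns of Z that give that meter
# a nonzero weight. Assumes Z has no NA.
na_safe_mult <- function(X, Z) {
  X0 <- X
  X0[is.na(X0)] <- 0                  # missing readings contribute nothing to the sums
  res <- X0 %*% Z                     # ordinary dense product, no NA left in X0
  # Count, per result cell, the missing meters that carry a nonzero weight
  hit <- ((is.na(X) * 1) %*% ((Z != 0) * 1)) > 0
  res[hit] <- NA                      # those cells really are unknown
  res
}

Z <- matrix(c(1, -1, 0, 0, 1,
              0,  1, 1, 0, 0,
              0,  0, 0, 1, 0,
              0,  0, 0, 0, 0),
            nrow = 5, dimnames = list(NULL, paste0("city", 1:4)))

# First three rows of the question's X, with two hand-placed NA
X <- rbind(c(16,  5, 0,  7, -16),
           c(19, NA, 0,  8, -17),
           c(28,  6, 0, NA, -20))

res <- na_safe_mult(X, Z)
n_plain <- sum(is.na(X %*% Z))        # NA count with the ordinary product
n_safe  <- sum(is.na(res))            # NA count with the helper
```

On this toy input the ordinary product loses two whole rows, while the helper marks only the cells whose city actually uses a missing meter. Unlike the hand-written `cbind` version, it adapts automatically when `Z` changes.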

Follow-up

It throws an error when I try to convert the result to a data frame: `data.frame(X %*% ZZ)`. How can I do it?

Use `data.frame(as.matrix(X %*% ZZ))`.
