agenis - 2 months ago
R Question

Avoid propagation of NA in matrix multiplication

I have some difficulties with propagation of missing values in the context of matrix multiplication.
My first matrix, X, holds the hourly gas flow measurements for 5 flowmeters:

X=structure(c(16, 19, 28, 32, 30, 22, 16, 13, 8, 6, 5, 3, 5, 5, 6, 13, 7, 10, 4, 2, 1, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8, 8, 7, 7, 6, 6, 5, 5, 4, 4, 4, -16, -17, -20, -31, -25, -25, -16, -12, -13, -15, -9, -7), .Dim = c(12L, 5L), .Dimnames = list(NULL, c("meter1", "meter2", "meter3", "meter4", "meter5")))
####      meter1 meter2 meter3 meter4 meter5
#### [1,]     16      5      0      7    -16
#### [2,]     19      5      0      8    -17
#### ...


My second matrix, Z, says how these gas flows are distributed to feed 4 cities. For instance (first column of Z), the total net flow for city1 is defined as the sum (1)*Meter1 + (-1)*Meter2 + (1)*Meter5.

Z=structure(c(1, -1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0), .Dim = c(5L, 4L), .Dimnames = list(NULL, c("city1", "city2", "city3", "city4")))
####      city1 city2 city3 city4
#### [1,]     1     0     0     0
#### [2,]    -1     1     0     0
#### [3,]     0     1     0     0
#### [4,]     0     0     1     0
#### [5,]     1     0     0     0


So to calculate the net flow per city I just have to do a matrix multiplication:

X %*% Z
####      city1 city2 city3 city4
#### [1,]    -5     5     7     0
#### [2,]    -3     5     8     0
#### ...


My problem is that there are lots of missing values in my X matrix (here 9 NA):

set.seed(3); for (i in 1:10) X[sample.int(nrow(X), 1), sample.int(ncol(X), 1)] <- NA


When I do the matrix multiplication, each NA propagates to the whole row of the result, even when it sits in a column whose coefficient is zero (so it doesn't affect the sum). So I get 24 NA after the multiplication. However, if I do the calculation city by city, using only the meters with nonzero coefficients, I only get 11 NA:

sum(is.na(cbind(X[, 1] - X[, 2] + X[, 5], X[, 2] + X[, 3], X[, 4], 0)))
#### [1] 11
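
The propagation comes from R's arithmetic rule that NA * 0 is NA rather than 0. A minimal sketch on made-up data (x and z are toy objects for illustration, not the matrices above):

```r
# One row of readings with a missing value on the second meter
x <- matrix(c(2, NA, 5), nrow = 1)

# Toy weights: column 1 ignores the second meter, column 2 uses only it
z <- matrix(c(1, 0, 1,
              0, 1, 0), nrow = 3)

na_times_zero <- NA * 0   # evaluates to NA, not 0
res <- x %*% z            # 1 x 2 result

is.na(na_times_zero)      # TRUE
is.na(res)                # TRUE for both columns, even the one with weight 0
```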


I would like to know if there is a way to calculate the flows for each city that doesn't propagate my NA so much. In reality my matrices are much bigger, but a city is never fed by more than 4 meters (Z is quite sparse). I'd like to avoid coding each column by hand, because the script would stop working as soon as the network changes.
Thanks,

Answer

Yes. Convert Z to a sparse matrix from the Matrix package before multiplying:

library(Matrix)
ZZ <- Matrix(Z, sparse = TRUE)
X %*% ZZ

#12 x 4 Matrix of class "dgeMatrix"
#      city1 city2 city3 city4
# [1,]    -5     5     7     0
# [2,]    NA    NA    NA     0
# [3,]    NA     6     8     0
# [4,]   -12    13     7     0
# [5,]    NA    NA     7     0
# [6,]   -13    10     6     0
# [7,]    -4    NA    NA     0
# [8,]    -1     2    NA     0
# [9,]    -6     1     5     0
#[10,]   -11     2     4     0
#[11,]    NA    NA     4     0
#[12,]    -5     1     4     0

As you expected, there are only 11 NA.
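
If you would rather not rely on how a package handles the zeros, the same idea can be sketched in base R: for each city, multiply only the meters whose weight in Z is nonzero. This is a rough sketch under assumptions, not a tested replacement: sparse_prod is a made-up helper name, and the 2-row X below is toy data rather than your measurements.

```r
# Hypothetical helper (not from any package): like X %*% Z, but each column
# of the result only touches the columns of X with a nonzero weight in Z.
sparse_prod <- function(X, Z) {
  out <- matrix(0, nrow(X), ncol(Z),
                dimnames = list(rownames(X), colnames(Z)))
  for (j in seq_len(ncol(Z))) {
    used <- which(Z[, j] != 0)          # meters that feed city j
    if (length(used) > 0) {
      out[, j] <- X[, used, drop = FALSE] %*% Z[used, j]
    }
  }
  out
}

# Toy data: two hours of readings, with meter1 and meter3 missing in hour 2
X <- matrix(c(16, NA, 5, 5, 0, NA, 7, 8, -16, -17), nrow = 2,
            dimnames = list(NULL, paste0("meter", 1:5)))
Z <- matrix(c(1, -1, 0, 0, 1,  0, 1, 1, 0, 0,  0, 0, 0, 1, 0,  0, 0, 0, 0, 0),
            nrow = 5, dimnames = list(NULL, paste0("city", 1:4)))

res <- sparse_prod(X, Z)
sum(is.na(res))       # NA only where a missing meter really feeds the city
sum(is.na(X %*% Z))   # the plain product loses the whole second row
```

A city with no meter at all (city4 here) simply stays at 0, and nothing has to be rewritten when Z changes.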


Follow-up

It throws an error when I try to convert the result to a data frame: data.frame(X %*% ZZ). How can I do it?

Use data.frame(as.matrix(X %*% ZZ)).