Floni - 3 months ago
R Question

Replace 0 with the previous value within each factor level in R, including a 0 at the first observation

I have this sample:

data <- structure(list(mmsi = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("a", "b"), class = "factor"),
tr = c(1, 1, 1, 0, 2, 2, 0, 4, 4, 0, 5, 5)), .Names = c("mmsi",
"tr"), row.names = c(NA, -12L), class = "data.frame")


I want to replace each 0 in the column tr with the previous value of tr, for each mmsi.

This code works well on the sample (na.locf is from the zoo package):

for ( i in levels(data$mmsi) ) {
data$test <- na.locf(with(data, { is.na(tr) <- tr == 0; tr }), fromLast = FALSE)}


But when I try a bigger sample, one issue appears: if the first value is 0, I get an error (because there is no previous value to carry forward).

For example, if I edit the small sample to:

data <- structure(list(mmsi = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("a", "b"), class = "factor"),
tr = c(0, 1, 1, 0, 2, 2, 0, 4, 4, 0, 5, 5)), .Names = c("mmsi",
"tr"), row.names = c(NA, -12L), class = "data.frame")


Here the column tr begins with 0 instead of the 1 in the previous sample. If I apply the same code

for ( i in levels(data$mmsi) ) {
data$test <- na.locf(with(data, { is.na(tr) <- tr == 0; tr }), fromLast = FALSE)}

then of course I get the error

Error in `$<-.data.frame`(`*tmp*`, "test", value = c(1, 1, 1, 2, 2, 2, :
replacement has 11 rows, data has 12


So the code could not replace the value I changed (the first value in the column tr), and the result is one row short.

I guess I need one more step in my code that first handles a 0 when it occurs as the first value of tr. That step should replace the 0 with the following non-zero value. The rest of the code is then fine.

The output I am looking for in this new column is:

data$test
[1] 1 1 1 1 2 2 2 4 4 4 5 5


Any idea how to get this?

Answer

We can do this with one of the group-by functions. Convert the 'data.frame' to a 'data.table' (setDT(data)) and, grouped by 'mmsi', replace the '0' values with NA and apply na.locf (from zoo) with the option na.rm = FALSE, so that a leading NA is kept rather than dropped. A second na.locf with fromLast = TRUE then replaces the starting 0 (now NA) with the next value.

library(data.table)
library(zoo)
setDT(data)[, test := na.locf(na.locf(replace(tr, tr==0, NA), 
                   na.rm=FALSE), fromLast=TRUE), by = mmsi]
data
#    mmsi tr test
# 1:    a  0    1
# 2:    a  1    1
# 3:    a  1    1
# 4:    a  0    1
# 5:    a  2    2
# 6:    a  2    2
# 7:    a  0    2
# 8:    b  4    4
# 9:    b  4    4
#10:    b  0    4
#11:    b  5    5
#12:    b  5    5
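
To see why two passes are needed, the steps can be traced on the tr values of group 'a' alone. This is only an illustrative sketch; the intermediate names x, fwd and res are not part of the answer's code. Note that a group consisting only of zeros would presumably leave the second pass with nothing to fill from, so that case may need separate handling.

```r
library(zoo)

# tr values for mmsi "a" in the second sample
tr <- c(0, 1, 1, 0, 2, 2, 0)

# Step 1: mark the zeros as missing
x <- replace(tr, tr == 0, NA)          # NA 1 1 NA 2 2 NA

# With the default na.rm = TRUE, na.locf drops the leading NA,
# which is why the original assignment came up one row short
length(na.locf(x))                     # 6

# Step 2: na.rm = FALSE keeps the leading NA in place
fwd <- na.locf(x, na.rm = FALSE)       # NA 1 1 1 2 2 2

# Step 3: a pass from the end fills the leading NA with the next value
res <- na.locf(fwd, fromLast = TRUE)   # 1 1 1 1 2 2 2
res
```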

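We could also do this without using na.locf. Note that this version is tailored to the sample: it relies on the values not decreasing within each 'mmsi', on zeros not occurring consecutively, and on a leading 0 becoming 1.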

setDT(data)[, test := pmax(pmax(tr, shift((NA^!tr) * tr), na.rm = TRUE),1), mmsi]
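
If neither zoo nor data.table is available, the same fill can be sketched in base R. The helper name fill_zeros is hypothetical, and the approach (findInterval over the positions of the non-zero values, combined with ave for the grouping) is a swapped-in technique rather than the method used above. It assumes zeros are the only values to fill and leaves an all-zero group unchanged.

```r
# Second sample from the question, rebuilt with data.frame()
data <- data.frame(
  mmsi = factor(rep(c("a", "b"), times = c(7, 5))),
  tr   = c(0, 1, 1, 0, 2, 2, 0, 4, 4, 0, 5, 5)
)

# Hypothetical helper: replace each 0 with the previous non-zero value,
# falling back to the next non-zero value for leading zeros
fill_zeros <- function(x) {
  nz <- which(x != 0)                    # positions of non-zero values
  if (length(nz) == 0) return(x)         # nothing to fill from
  idx <- findInterval(seq_along(x), nz)  # last non-zero position at or before each element
  idx[idx == 0] <- 1                     # leading zeros take the first non-zero value
  x[nz[idx]]
}

# Apply the helper within each level of mmsi
data$test <- ave(data$tr, data$mmsi, FUN = fill_zeros)
data$test
# [1] 1 1 1 1 2 2 2 4 4 4 5 5
```

Because the helper indexes by position rather than comparing magnitudes, it does not depend on the values increasing within a group.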