Floni - 3 months ago 13
R Question

# Replace 0 when it is the first observation for a factor level in R

I have this sample:

``````
data <- structure(list(mmsi = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
    2L, 2L, 2L, 2L, 2L), .Label = c("a", "b"), class = "factor"),
    tr = c(1, 1, 1, 0, 2, 2, 0, 4, 4, 0, 5, 5)), .Names = c("mmsi",
    "tr"), row.names = c(NA, -12L), class = "data.frame")
``````

I want to replace each 0 in the column `tr` with the previous value of `tr`, for each `mmsi`.

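This code works well on the sample:

``````
library(zoo)  # for na.locf()
for (i in levels(data$mmsi)) {
  # note: `i` is never used, so each pass fills the whole column rather than one mmsi group
  data$test <- na.locf(with(data, { is.na(tr) <- tr == 0; tr }), fromLast = FALSE)
}
``````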

But when I try a bigger sample, one issue appears: if the first value is 0, then I get an error (because it cannot find a previous value...).

For example, if I edit the small sample to:

``````
data <- structure(list(mmsi = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
    2L, 2L, 2L, 2L, 2L), .Label = c("a", "b"), class = "factor"),
    tr = c(0, 1, 1, 0, 2, 2, 0, 4, 4, 0, 5, 5)), .Names = c("mmsi",
    "tr"), row.names = c(NA, -12L), class = "data.frame")
``````

Here the column `tr` begins with 0 instead of 1 as in the previous sample. If I apply the same loop as above, I of course get the error:

``````
Error in `$<-.data.frame`(`*tmp*`, "test", value = c(1, 1, 1, 2, 2, 2,  :
  replacement has 11 rows, data has 12
``````

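That is, the code cannot replace the value I changed (the first value in the column `tr`): with its default `na.rm = TRUE`, `na.locf` drops leading `NA`s, so it returns 11 values for a 12-row data frame.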
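The 11-versus-12 mismatch can be reproduced without `zoo`. The base-R sketch below is not `na.locf`'s actual implementation; it only mimics its default behaviour of dropping leading `NA`s, by looking up the position of the last non-`NA` value seen so far. A position of 0 means there is no earlier value, and R silently drops 0 indices, which is exactly where the missing row goes.

```r
# tr from the edited sample, with 0 marking a missing value
tr <- c(0, 1, 1, 0, 2, 2, 0, 4, 4, 0, 5, 5)
is.na(tr) <- tr == 0  # same step as in the loop: 0 -> NA

# Position of the last non-NA value up to each row (0 if there is none yet).
# This mimics last-observation-carried-forward; it is not zoo's code.
idx <- cummax(seq_along(tr) * !is.na(tr))

# Indexing with 0 returns nothing, so the leading NA is dropped,
# much like na.locf(tr) with its default na.rm = TRUE
filled <- tr[idx]
length(filled)  # 11, one short of nrow(data)
filled          # 1 1 1 2 2 2 4 4 4 5 5
```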

I guess my code needs one more step that first handles a 0 occurring as the first value of `tr` for a level. That step should replace the 0 with the following non-zero value; the rest of the code is then fine.
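The two-step idea can be sketched in base R with no extra packages. `fill_zeros()` is a hypothetical helper, not part of any package: it turns 0s into `NA`, carries the last value forward, and then gives any leading `NA`s the first value that follows them. `ave()` applies it separately within each `mmsi` level and returns the results in the original row order.

```r
# Edited sample: the first value of tr is 0
data <- data.frame(
  mmsi = factor(rep(c("a", "b"), times = c(7, 5))),
  tr   = c(0, 1, 1, 0, 2, 2, 0, 4, 4, 0, 5, 5)
)

# Hypothetical helper: fill the 0s within one group
fill_zeros <- function(x) {
  x[x == 0] <- NA
  idx <- cummax(seq_along(x) * !is.na(x))  # last non-NA position so far, 0 if none
  leading <- idx == 0                      # NAs with no previous value
  x[!leading] <- x[idx[!leading]]          # step 1: carry the previous value forward
  x[leading] <- x[which(!leading)[1]]      # step 2: leading NAs take the next value
  x
}

data$test <- ave(data$tr, data$mmsi, FUN = fill_zeros)
data$test  # 1 1 1 1 2 2 2 4 4 4 5 5
```

A group made up only of 0s stays `NA`, since there is no value to borrow in either direction.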

The output I am looking for in this new column is:

``````
data$test
[1] 1 1 1 1 2 2 2 4 4 4 5 5
``````

Any idea how to get this?

Answer

We can do this with a group-by operation. Convert the 'data.frame' to a 'data.table' (`setDT(data)`) and, grouped by 'mmsi', replace the 0 values with `NA` and apply `na.locf` (from `zoo`) with the option `na.rm = FALSE`, so that leading `NA`s are kept. A second `na.locf` with `fromLast = TRUE` then replaces a leading 0 (now `NA`) with the next value.

``````
library(data.table)
library(zoo)
# within each mmsi: 0 -> NA, carry the last value forward (keeping leading NAs),
# then carry the next value backward to fill any leading NAs
setDT(data)[, test := na.locf(na.locf(replace(tr, tr == 0, NA),
                                      na.rm = FALSE), fromLast = TRUE), by = mmsi]
data
#    mmsi tr test
# 1:    a  0    1
# 2:    a  1    1
# 3:    a  1    1
# 4:    a  0    1
# 5:    a  2    2
# 6:    a  2    2
# 7:    a  0    2
# 8:    b  4    4
# 9:    b  4    4
#10:    b  0    4
#11:    b  5    5
#12:    b  5    5
``````

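We could also do this without `na.locf`. Note, however, that this shortcut depends on the shape of the sample data: it looks back only one row (so it assumes there are no runs of consecutive 0s), `pmax` assumes `tr` never decreases within a group, and `pmax(..., 1)` turns a leading 0 into 1 rather than into the next value: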

``````
setDT(data)[, test := pmax(pmax(tr, shift((NA^!tr) * tr), na.rm = TRUE), 1), mmsi]
``````
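To see how far the one-row-back `pmax` shortcut generalises, it can be transcribed into base R and run on other inputs. In this sketch `c(NA, head(v, -1))` stands in for `data.table::shift(v)` (a lag of one row, padded with `NA`), and `pmax_fill()` is a hypothetical name that handles a single group. It reproduces the sample's group `a`, but not a leading 0 followed by a value other than 1, a run of 0s, or a decrease.

```r
# Hypothetical base-R transcription of the pmax shortcut for one group
pmax_fill <- function(tr) {
  kept <- (NA^!tr) * tr          # 0 -> NA, other values unchanged (NA^0 is 1 in R)
  prev <- c(NA, head(kept, -1))  # lag by one row, like data.table::shift()
  pmax(pmax(tr, prev, na.rm = TRUE), 1)
}

pmax_fill(c(0, 1, 1, 0, 2, 2, 0))  # 1 1 1 1 2 2 2  (matches the two-pass fill)
pmax_fill(c(0, 3, 3, 0, 0))        # 1 3 3 3 1      (two-pass fill gives 3 3 3 3 3)
pmax_fill(c(5, 3))                 # 5 5            (two-pass fill gives 5 3)
```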