Thomas Browne - 3 months ago
R Question

Find the index position of the first non-NA value in an R vector?

I have a problem where a vector has a run of NAs at the beginning and data thereafter. The peculiarity of my data is that the first n non-NA values are probably unreliable, so I would like to replace them with NA.

For example, if I have a vector of length 20, and non-NAs start at index position 4:

> z
[1] NA NA NA -1.64801942 -0.57209233 0.65137286 0.13324344 -2.28339326
[9] 1.29968050 0.10420776 0.54140323 0.64418164 -1.00949072 -1.16504423 1.33588892 1.63253646
[17] 2.41181291 0.38499825 -0.04869589 0.04798073


I would like to remove the first 3 non-NA values, which I believe to be unreliable, to give this:

> z
[1] NA NA NA NA NA NA 0.13324344 -2.28339326
[9] 1.29968050 0.10420776 0.54140323 0.64418164 -1.00949072 -1.16504423 1.33588892 1.63253646
[17] 2.41181291 0.38499825 -0.04869589 0.04798073


Of course I need a general solution, since I never know where the first non-NA value starts. How would I go about doing this? That is, how do I find the index position of the first non-NA value?

For completeness, my data is actually arranged in a data frame with many of these vectors as columns, and each vector can have a different non-NA starting position. Also, once the data starts, there may be sporadic NAs further down, which prevents me from simply counting the NAs as a solution.

Answer

Use a combination of is.na and which to find the non-NA index locations.

NonNAindex <- which(!is.na(z))
firstNonNA <- min(NonNAindex)

# set the 3 observations starting at the first non-NA position to NA
is.na(z) <- seq(firstNonNA, length.out=3)
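
For the data frame case mentioned in the question, the same idea can be wrapped in a function and applied to each column. The following is only a sketch under some assumptions: the helper name drop_first_non_na and the example data frame are made up for illustration, and every column is assumed to be an atomic vector. Note one deliberate difference from the snippet above: it blanks the first n non-NA values themselves rather than n consecutive positions, so a sporadic NA inside that range is not counted, and an all-NA column is left unchanged without a warning.

```r
# Hypothetical helper: replace the first n non-NA values of x with NA.
drop_first_non_na <- function(x, n = 3) {
  # which() gives the positions of all non-NA values; head() keeps the
  # first n of them (or fewer, or none, if x has fewer non-NA values).
  idx <- head(which(!is.na(x)), n)
  x[idx] <- NA
  x
}

# Made-up example: each column starts at a different position,
# column b has a sporadic NA, and column c is entirely NA.
df <- data.frame(
  a = c(NA, NA, 1, 2, 3, 4, 5),
  b = c(NA, 10, 20, NA, 40, 50, 60),
  c = rep(NA_real_, 7)
)

# Apply the helper to every column while keeping the data frame structure.
df[] <- lapply(df, drop_first_non_na, n = 3)
```

If only the position of the first non-NA value is needed, which(!is.na(x))[1] returns it directly, and returns NA (rather than Inf with a warning, as min() does) when the vector contains no data.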