Rich Pauloo - 1 month ago 13

R Question

I have some timeseries data with values for some time indices but not others. **I need a way to filter for all observations that occur in both time indices**.

Here's a reproducible example that illustrates my problem. In the final graph I only want observations of

`Type == a`

`set.seed(1005)`

mydat <- data.frame(

ID = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),

Year = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),

Result = rnorm(10, mean = 20, sd = 10),

Type = c('a','a','b','b','a', 'a', 'a', 'a', 'b', 'b'))

mydat %>%

ggplot(aes(x = Year, y = Result)) +

geom_point(aes(color = Type)) +

geom_line(aes(group = ID))

`Type`

`Type`

Solutions should be independent of the

`Type`

`Type`

Answer Source

You can find the, okay let's called them repeated, repeated IDs for two time entries and mark them as `type == a`

.

`reshape`

:You can reshape the data to *wide* format and remove the ones with `NA`

which means they don't have data for both time entries. Look below:

```
mydat_a <- reshape(mydat, idvar = "ID", timevar = "Year", direction = "wide")
mydat_a #Those with NA are the ones that you set them as Type == b
# ID Result.2000 Result.2001
# 1 a1 14.39524 37.150650
# 2 a2 17.69823 24.609162
# 3 a3 35.58708 NA
# 4 a4 20.70508 NA
# 5 a5 21.29288 7.349388
# 9 a12 NA 13.131471
# 10 a13 NA 15.543380
#Add the types again
mydat_a$Type <- "a"
mydat_a[which(is.na(mydat_a), arr.ind=TRUE)[,1],]$Type <- "b"
#go back to long format
mydat_a <- reshape(mydat_a, direction="long",
varying=list(names(mydat_a)[2:3]), v.names="Result",
idvar="ID", timevar="Year", times=2000:2001)
#remove NA
mydat_a <- na.omit(mydat_a)
```

You can look for the final plotting solution below (use `mydat_a`

instead of `mydat`

in `ggplot`

syntax).

```
mydat$Type <- "b" #make all of them "b" later change the repeated ones to "a"
mydat[ mydat$ID %in% mydat[mydat$Year==2000,]$ID
& mydat$ID %in% mydat[mydat$Year==2001,]$ID,]$Type <- "a"
mydat$Type <- as.factor(mydat$Type)
mydat
# ID Year Result type
# 1 a1 2000 17.67485 a
# 2 a2 2000 15.16812 a
# 3 a3 2000 27.18261 b
# 4 a4 2000 14.18510 b
# 5 a5 2000 32.91164 a
# 6 a1 2001 13.30867 a
# 7 a2 2001 20.15258 a
# 8 a5 2001 31.21311 a
# 9 a12 2001 32.62673 b
# 10 a13 2001 6.85111 b
```

It gives you the types that you entered manually here.

Then you can use @d.b's solution:

```
ggplot(data = split(mydat, mydat$Type)$a, aes(x = Year, y = Result)) +
geom_point(aes(color = Type)) + geom_line(aes(group = ID))
```

*Data:*

```
set.seed(123)
mydat <- data.frame(ID = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
Year = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
Result = rnorm(10, mean = 20, sd = 10))
```