iembry - 2 months ago 12

R Question

I am trying to determine which Species have a count of 0 (no value for N) by unique Site.

For example, Species5 through Species33 are not present in Site1.d1; Species2 through Species33 are not present in Site1.d2; Species1 through Species4 and Species6 through Species33 are not present in Site1.d3; and so forth.

There are 33 Species in total (Species1 to Species33) and 4 sets of Sites (Site1 to Site4), with d suffixes ranging from d1 up to d45 (for example, Site1.d1 to Site1.d45 and Site2.d1 to Site2.d35).

I want to add those missing Species, with an N of 0, to the existing data.table named testit.

```
#  Species     Site  N
# Species1 Site1.d1 17
# Species2 Site1.d1  1
# Species3 Site1.d1  4
# Species4 Site1.d1  1
# Species1 Site1.d2 14
# Species5 Site1.d3  1
# Species6 Site2.d2  1
# Species6 Site2.d4 12
# Species7 Site3.d3  9
# Species6 Site3.d5  1
```

```
testit <- structure(list(Species = structure(c(1L, 2L, 3L, 4L, 1L, 5L,
6L, 6L, 7L, 6L), .Label = c("Species1", "Species2", "Species3",
"Species4", "Species5", "Species6", "Species7"), class = "factor"),
    Site = structure(c(1L, 1L, 1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L
    ), .Label = c("Site1.d1", "Site1.d2", "Site1.d3", "Site2.d2",
    "Site2.d4", "Site3.d3", "Site3.d5"), class = "factor"), N = c(17L,
    1L, 4L, 1L, 14L, 1L, 1L, 12L, 9L, 1L)), .Names = c("Species",
"Site", "N"), class = "data.frame", row.names = c(NA, -10L))

species <- sprintf("Species%d", 1:33)
fullsites1 <- sprintf("Site1.d%d", 1:45)
fullsites2 <- sprintf("Site2.d%d", 1:35)
fullsites3 <- sprintf("Site3.d%d", 1:40)
fullsites4 <- sprintf("Site4.d%d", 1:42)
fullsites <- c(fullsites1, fullsites2, fullsites3, fullsites4)
```

This is what I have tried thus far:

`testit[, which(species %chin% testit$Species) == FALSE, by = list(Species, Site)]`

This does not get me what I'm looking for.

What suggestions do you have?

Thank you.

Answer

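Try this using tidyr:

```
library(tidyr)

# Spread to wide form (one column per Site), then gather back to long form
xx <- testit %>%
  spread(Site, N) %>%
  gather(Sites, N, Site1.d1:Site3.d5)

# Species/Site pairs that had no row come back as NA; set them to 0
xx$N[is.na(xx$N)] <- 0
```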

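First step: spread reshapes the data to wide form, with one row per Species and one column per Site, so every Species/Site combination gets a cell (NA where the pair had no row):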

```
testit %>% spread(Site,N)
   Species Site1.d1 Site1.d2 Site1.d3 Site2.d2 Site2.d4 Site3.d3 Site3.d5
1 Species1       17       14       NA       NA       NA       NA       NA
2 Species2        1       NA       NA       NA       NA       NA       NA
3 Species3        4       NA       NA       NA       NA       NA       NA
4 Species4        1       NA       NA       NA       NA       NA       NA
5 Species5       NA       NA        1       NA       NA       NA       NA
6 Species6       NA       NA       NA        1       12       NA        1
7 Species7       NA       NA       NA       NA       NA        9       NA
```
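As a side note on this step: spread also accepts a `fill` argument, so the empty cells can be written as 0 directly rather than as NA. A minimal sketch, assuming the same data as in the question (rebuilt here as a plain data frame so the snippet runs on its own; `wide` and `long` are just illustrative names):

```r
library(tidyr)

# The question's data, rebuilt with character columns (assumption: the
# original factor columns are not needed for this illustration).
testit <- data.frame(
  Species = c("Species1", "Species2", "Species3", "Species4", "Species1",
              "Species5", "Species6", "Species6", "Species7", "Species6"),
  Site = c("Site1.d1", "Site1.d1", "Site1.d1", "Site1.d1", "Site1.d2",
           "Site1.d3", "Site2.d2", "Site2.d4", "Site3.d3", "Site3.d5"),
  N = c(17L, 1L, 4L, 1L, 14L, 1L, 1L, 12L, 9L, 1L),
  stringsAsFactors = FALSE
)

# fill = 0 puts 0 in every Species/Site cell that has no row in testit.
wide <- spread(testit, Site, N, fill = 0)

# Back to long form: every column except Species is a Site column.
long <- gather(wide, Sites, N, -Species)
```

With this variant the separate NA replacement is not needed. One thing to keep in mind is that `fill` does not distinguish between cells that were absent and values that were already NA in the input, so it fits best when N has no genuine missing values.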

Second step: gather the data back into long form and replace the NAs with zeros.

```
    Species    Sites  N
1  Species1 Site1.d1 17
2  Species2 Site1.d1  1
3  Species3 Site1.d1  4
4  Species4 Site1.d1  1
5  Species5 Site1.d1  0
6  Species6 Site1.d1  0
7  Species7 Site1.d1  0
8  Species1 Site1.d2 14
9  Species2 Site1.d2  0
10 Species3 Site1.d2  0
11 Species4 Site1.d2  0
12 Species5 Site1.d2  0
13 Species6 Site1.d2  0
14 Species7 Site1.d2  0
........
```
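
Note that the spread/gather round trip only produces combinations of the Species and Sites that already occur in testit. If zero rows are wanted for all 33 Species and every Site in fullsites, as the question describes, one possible approach is tidyr's `complete()`, which can be given the full set of values for each column. A rough sketch, assuming Species and Site are character columns (the original factors would probably need `as.character()` first) and using `full` as a made-up name for the result:

```r
library(tidyr)

# The question's data, rebuilt with character columns so that it joins
# cleanly against the character vectors below (an assumption of this sketch).
testit <- data.frame(
  Species = c("Species1", "Species2", "Species3", "Species4", "Species1",
              "Species5", "Species6", "Species6", "Species7", "Species6"),
  Site = c("Site1.d1", "Site1.d1", "Site1.d1", "Site1.d1", "Site1.d2",
           "Site1.d3", "Site2.d2", "Site2.d4", "Site3.d3", "Site3.d5"),
  N = c(17L, 1L, 4L, 1L, 14L, 1L, 1L, 12L, 9L, 1L),
  stringsAsFactors = FALSE
)

# Full sets of Species and Sites, as defined in the question.
species <- sprintf("Species%d", 1:33)
fullsites <- c(sprintf("Site1.d%d", 1:45), sprintf("Site2.d%d", 1:35),
               sprintf("Site3.d%d", 1:40), sprintf("Site4.d%d", 1:42))

# Add a row for every Species/Site pair that is missing from testit,
# with N set to 0 for the added rows.
full <- complete(testit, Species = species, Site = fullsites,
                 fill = list(N = 0L))
```

The result should have one row per Species/Site pair (33 Species times 162 Sites), with the ten original counts kept and every other N equal to 0.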