iembry - 1 year ago 73
R Question

r - Determine which values are absent from a given column thus have a 0 count

I am trying to determine which Species have a count of 0 (no value for N) by unique Site.

For example, Species5 through Species33 are not present in Site1.d1; Species2 through Species33 are not present in Site1.d2; Species1 through Species4 and Species6 through Species33 are not present in Site1.d3; and so forth.

There are 33 Species in total (Species1 to Species33) and 4 sets of Sites (Site1 to Site4), each numbered from d1 up to as many as d45 (for example, Site1.d1 to Site1.d45, Site2.d1 to Site2.d35, etc.).

I want to add those missing Species, with an N of 0, to the existing data.table named testit.

``````
# Species       Site    N
# Species1  Site1.d1    17
# Species2  Site1.d1    1
# Species3  Site1.d1    4
# Species4  Site1.d1    1
# Species1  Site1.d2    14
# Species5  Site1.d3    1
# Species6  Site2.d2    1
# Species6  Site2.d4    12
# Species7  Site3.d3    9
# Species6  Site3.d5    1

testit <- structure(list(Species = structure(c(1L, 2L, 3L, 4L, 1L, 5L,
6L, 6L, 7L, 6L), .Label = c("Species1", "Species2", "Species3",
"Species4", "Species5", "Species6", "Species7"), class = "factor"),
Site = structure(c(1L, 1L, 1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L
), .Label = c("Site1.d1", "Site1.d2", "Site1.d3", "Site2.d2",
"Site2.d4", "Site3.d3", "Site3.d5"), class = "factor"), N = c(17L,
1L, 4L, 1L, 14L, 1L, 1L, 12L, 9L, 1L)), .Names = c("Species",
"Site", "N"), class = "data.frame", row.names = c(NA, -10L))

species <- sprintf("Species%d", 1:33)

fullsites1 <- sprintf("Site1.d%d", 1:45)

fullsites2 <- sprintf("Site2.d%d", 1:35)

fullsites3 <- sprintf("Site3.d%d", 1:40)

fullsites4 <- sprintf("Site4.d%d", 1:42)

fullsites <- c(fullsites1, fullsites2, fullsites3, fullsites4)
``````

This is what I have tried thus far:

``````
testit[, which(species %chin% testit$Species) == FALSE, by = list(Species, Site)]
``````

This does not get me what I'm looking for.

What suggestions do you have?

Thank you.

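Try this using tidyr:

``````
library(tidyr)

# Spread to wide form (one column per Site), then gather back to long form
xx <- testit %>%
  spread(Site, N) %>%
  gather(Sites, N, Site1.d1:Site3.d5)

# Species/Site combinations missing from testit come back as NA; set them to 0
xx$N[is.na(xx$N)] <- 0
``````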

First step: spread will give all the combinations

``````
testit %>% spread(Site,N)

   Species Site1.d1 Site1.d2 Site1.d3 Site2.d2 Site2.d4 Site3.d3 Site3.d5
1 Species1       17       14       NA       NA       NA       NA       NA
2 Species2        1       NA       NA       NA       NA       NA       NA
3 Species3        4       NA       NA       NA       NA       NA       NA
4 Species4        1       NA       NA       NA       NA       NA       NA
5 Species5       NA       NA        1       NA       NA       NA       NA
6 Species6       NA       NA       NA        1       12       NA        1
7 Species7       NA       NA       NA       NA       NA        9       NA
``````

Second step: gather data in long form again, and replace NA with zeros.

``````
    Species    Sites  N
1  Species1 Site1.d1 17
2  Species2 Site1.d1  1
3  Species3 Site1.d1  4
4  Species4 Site1.d1  1
5  Species5 Site1.d1  0
6  Species6 Site1.d1  0
7  Species7 Site1.d1  0
8  Species1 Site1.d2 14
9  Species2 Site1.d2  0
10 Species3 Site1.d2  0
11 Species4 Site1.d2  0
12 Species5 Site1.d2  0
13 Species6 Site1.d2  0
14 Species7 Site1.d2  0
........
``````
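
Note that the spread/gather approach only covers the Species and Sites that already occur in testit. To also cover all 33 Species and every Site in fullsites, as the question asks, one possible base R sketch is to build the full grid of combinations and left-join the observed counts onto it. The small `testit` below is a stand-in for the question's data (character columns rather than factors, fewer rows), and the names `grid`, `full` and `absent` are made up for illustration:

``````r
# Small stand-in for the question's `testit` (assumption: character columns)
testit <- data.frame(
  Species = c("Species1", "Species2", "Species1", "Species5"),
  Site    = c("Site1.d1", "Site1.d1", "Site1.d2", "Site1.d3"),
  N       = c(17L, 1L, 14L, 1L),
  stringsAsFactors = FALSE
)

# Full lists of species and sites, built as in the question
species   <- sprintf("Species%d", 1:33)
fullsites <- c(sprintf("Site1.d%d", 1:45), sprintf("Site2.d%d", 1:35),
               sprintf("Site3.d%d", 1:40), sprintf("Site4.d%d", 1:42))

# Every Species/Site combination, whether or not it was observed
grid <- expand.grid(Species = species, Site = fullsites,
                    stringsAsFactors = FALSE)

# Left join: observed counts are kept, unobserved combinations get NA
full <- merge(grid, testit, by = c("Species", "Site"), all.x = TRUE)

# Unobserved combinations become a count of 0
full$N[is.na(full$N)] <- 0L

# Species absent from a single site, e.g. Site1.d2
absent <- full$Species[full$Site == "Site1.d2" & full$N == 0L]
``````

Here `full` has one row per Species/Site pair, and `absent` lists the Species with a zero count at the chosen Site. If testit stores Species and Site as factors, as in the question's `dput` output, converting them with `as.character()` before the merge keeps the key columns consistent.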