iembry - 19 days ago
R Question

r - Determine which values are absent from a given column and thus have a count of 0

I am trying to determine which Species have a count of 0 (no value for N) for each unique Site.

For example, Species5 through Species33 are not present in Site1.d1; Species2 through Species33 are not present in Site1.d2; Species1 through Species4 and Species6 through Species33 are not present in Site1.d3; and so forth.

There are 33 Species in total (Species1 to Species33) and 4 sets of Sites (Site1 to Site4), each numbered from d1 up to at most d45 (for example, Site1.d1 to Site1.d45 and Site2.d1 to Site2.d35).

I want to add those missing Species, with an N of 0, to the existing data.table named testit (the dput() output below has class data.frame).

# Species Site N
# Species1 Site1.d1 17
# Species2 Site1.d1 1
# Species3 Site1.d1 4
# Species4 Site1.d1 1
# Species1 Site1.d2 14
# Species5 Site1.d3 1
# Species6 Site2.d2 1
# Species6 Site2.d4 12
# Species7 Site3.d3 9
# Species6 Site3.d5 1


testit <- structure(list(Species = structure(c(1L, 2L, 3L, 4L, 1L, 5L,
6L, 6L, 7L, 6L), .Label = c("Species1", "Species2", "Species3",
"Species4", "Species5", "Species6", "Species7"), class = "factor"),
Site = structure(c(1L, 1L, 1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L
), .Label = c("Site1.d1", "Site1.d2", "Site1.d3", "Site2.d2",
"Site2.d4", "Site3.d3", "Site3.d5"), class = "factor"), N = c(17L,
1L, 4L, 1L, 14L, 1L, 1L, 12L, 9L, 1L)), .Names = c("Species",
"Site", "N"), class = "data.frame", row.names = c(NA, -10L))


species <- sprintf("Species%d", 1:33)

fullsites1 <- sprintf("Site1.d%d", 1:45)

fullsites2 <- sprintf("Site2.d%d", 1:35)

fullsites3 <- sprintf("Site3.d%d", 1:40)

fullsites4 <- sprintf("Site4.d%d", 1:42)

fullsites <- c(fullsites1, fullsites2, fullsites3, fullsites4)


This is what I have tried thus far:

testit[, which(species %chin% testit$Species) == FALSE, by = list(Species, Site)]


This does not get me what I'm looking for.

What suggestions do you have?

Thank you.

Answer

Try this using tidyr:

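library(tidyr)

# Spread to wide form (one column per Site), then gather back to long form.
# Species/Site pairs that are absent from testit come back as NA.
xx <- testit %>%
  spread(Site, N) %>%
  gather(Sites, N, Site1.d1:Site3.d5)

# An absent pair means a count of 0.
xx$N[is.na(xx$N)] <- 0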

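First step: spread() reshapes the data to wide form, with one row per Species and one column per Site, so every combination of the Species and Sites that occur in testit gets a cell (NA where there was no row):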

testit %>% spread(Site,N)

   Species Site1.d1 Site1.d2 Site1.d3 Site2.d2 Site2.d4 Site3.d3 Site3.d5
1 Species1       17       14       NA       NA       NA       NA       NA
2 Species2        1       NA       NA       NA       NA       NA       NA
3 Species3        4       NA       NA       NA       NA       NA       NA
4 Species4        1       NA       NA       NA       NA       NA       NA
5 Species5       NA       NA        1       NA       NA       NA       NA
6 Species6       NA       NA       NA        1       12       NA        1
7 Species7       NA       NA       NA       NA       NA        9       NA
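
As a cross-check on this step, the same wide layout can be sketched in base R with xtabs(), which fills absent combinations with 0 directly instead of NA. This is only an illustration: testit is rebuilt here from the rows in the question, and the names wide and long are made up for the example.

```r
# Rebuild the 10 example rows from the question.
testit <- data.frame(
  Species = c("Species1", "Species2", "Species3", "Species4", "Species1",
              "Species5", "Species6", "Species6", "Species7", "Species6"),
  Site = c("Site1.d1", "Site1.d1", "Site1.d1", "Site1.d1", "Site1.d2",
           "Site1.d3", "Site2.d2", "Site2.d4", "Site3.d3", "Site3.d5"),
  N = c(17L, 1L, 4L, 1L, 14L, 1L, 1L, 12L, 9L, 1L),
  stringsAsFactors = FALSE
)

# Cross-tabulate N by Species (rows) and Site (columns).
# Pairs with no row in testit sum to 0.
wide <- xtabs(N ~ Species + Site, data = testit)

# Back to long form: one row per Species/Site pair, counts in column N.
long <- as.data.frame(wide, responseName = "N")
```

Like spread(), this only covers the Species and Sites that actually occur in testit.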

Second step: gather() the data back into long form and replace the NAs with zeros:

    Species    Sites  N
1  Species1 Site1.d1 17
2  Species2 Site1.d1  1
3  Species3 Site1.d1  4
4  Species4 Site1.d1  1
5  Species5 Site1.d1  0
6  Species6 Site1.d1  0
7  Species7 Site1.d1  0
8  Species1 Site1.d2 14
9  Species2 Site1.d2  0
10 Species3 Site1.d2  0
11 Species4 Site1.d2  0
12 Species5 Site1.d2  0
13 Species6 Site1.d2  0
14 Species7 Site1.d2  0
........
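
Note that the result above only contains the 7 Species and 7 Sites that occur in testit, whereas the question asks about all 33 Species and every Site in fullsites. One possible way to cover the full set, sketched here in base R rather than tidyr, is to build every Species/Site combination with expand.grid() and left-join the observed counts onto it. The names grid and full are made up for this example, and testit is rebuilt from the rows in the question.

```r
# Rebuild the 10 example rows from the question.
testit <- data.frame(
  Species = c("Species1", "Species2", "Species3", "Species4", "Species1",
              "Species5", "Species6", "Species6", "Species7", "Species6"),
  Site = c("Site1.d1", "Site1.d1", "Site1.d1", "Site1.d1", "Site1.d2",
           "Site1.d3", "Site2.d2", "Site2.d4", "Site3.d3", "Site3.d5"),
  N = c(17L, 1L, 4L, 1L, 14L, 1L, 1L, 12L, 9L, 1L),
  stringsAsFactors = FALSE
)

# Full sets of Species and Sites, as defined in the question.
species <- sprintf("Species%d", 1:33)
fullsites <- c(sprintf("Site1.d%d", 1:45), sprintf("Site2.d%d", 1:35),
               sprintf("Site3.d%d", 1:40), sprintf("Site4.d%d", 1:42))

# Every Species x Site combination: 33 * 162 = 5346 rows.
grid <- expand.grid(Species = species, Site = fullsites,
                    stringsAsFactors = FALSE)

# Left join: keep every grid row; pairs without an observation get N = NA.
full <- merge(grid, testit, by = c("Species", "Site"), all.x = TRUE)

# An absent pair means a count of 0.
full$N[is.na(full$N)] <- 0L

# The rows the question asks about: combinations with a count of 0.
absent <- full[full$N == 0L, c("Species", "Site")]
```

merge() sorts the result by Species and Site, so the row order differs from the tidyr output above.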