eastafri - 1 year ago 106
R Question

Insert values (rows) into a data frame

I have a dataframe of this nature generated with a dplyr summary function.

``````pos nuc sample total
23 A 10028_1#2 3
23 C 10028_1#2 1
23 G 10028_1#2 5129
23 T 10028_1#2 128
231 C 10028_1#2 4
231 T 10028_1#2 3123
.
.
``````

A bar plot of this data with ggplot2 gives 'uneven' bars because pos 231 is missing its A and G totals for the corresponding sample name. Those rows are absent because the counts are generated by a program outside of R.

What would be an idiomatic way of inserting 0 totals for each missing value of A, T, G, C at each position for the corresponding sample? In other words, how do I get this data frame?

``````pos nuc sample total
23 A 10028_1#2 3
23 C 10028_1#2 1
23 G 10028_1#2 5129
23 T 10028_1#2 128
231 C 10028_1#2 4
231 T 10028_1#2 3123
231 G 10028_1#2 0
231 A 10028_1#2 0
``````

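We can use `complete` from `tidyr`. It expands the data to every combination of `pos`, `nuc` and `sample`, and the `fill` argument replaces the resulting `NA` totals with 0.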

``````library(dplyr)
library(tidyr)
df1 %>%
  complete(pos, nuc, nesting(sample), fill = list(total = 0))
#  pos   nuc    sample total
#  <int> <chr>     <chr> <dbl>
#1    23     A 10028_1#2     3
#2    23     C 10028_1#2     1
#3    23     G 10028_1#2  5129
#4    23     T 10028_1#2   128
#5   231     A 10028_1#2     0
#6   231     C 10028_1#2     4
#7   231     G 10028_1#2     0
#8   231     T 10028_1#2  3123
``````
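If the data held several samples, or if a nucleotide were absent at every position, the call above would behave differently: it only uses the `nuc` values that occur in the data, and it crosses every `pos` with every `sample`. A minimal sketch of one way to handle that, assuming a made-up second sample (`hypothetical_s2` and the data frame `df2` are illustrative and not part of the question's data):

```r
library(tidyr)

# Illustrative data: a subset of the question's rows plus one row for a
# hypothetical second sample (an assumption, not from the original data)
df2 <- data.frame(
  pos    = c(23L, 23L, 231L, 231L, 23L),
  nuc    = c("A", "G", "C", "T", "T"),
  sample = c("10028_1#2", "10028_1#2", "10028_1#2", "10028_1#2",
             "hypothetical_s2"),
  total  = c(3L, 5129L, 4L, 3123L, 7L),
  stringsAsFactors = FALSE
)

filled <- complete(
  df2,
  nesting(sample, pos),          # keep only (sample, pos) pairs seen in the data
  nuc = c("A", "C", "G", "T"),   # always expand to all four nucleotides
  fill = list(total = 0L)        # integer zero to match the integer column
)

filled
```

Here `nesting(sample, pos)` avoids creating positions for a sample in which they were never observed, while the explicit `nuc` vector guarantees four bars per position even when one nucleotide never appears anywhere.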

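Or we can use `expand.grid` and `merge` from base R: build every combination of the first three columns, left-join the original data onto it, and replace the resulting `NA` totals with 0.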

``````# all combinations of pos, nuc and sample, left-joined to the observed totals
transform(merge(expand.grid(lapply(df1[1:3], unique)),
                df1, all.x = TRUE),
          total = replace(total, is.na(total), 0))
``````
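Another base R route, not part of the answer above and offered only as a sketch, is to cross-tabulate with `xtabs` and convert the table back to a data frame. Combinations that never occur come out with a count of 0. This assumes each `pos`/`nuc`/`sample` combination appears at most once, because `xtabs` sums duplicates; note also that the grouping columns come back as factors.

```r
# Same data as in the "data" section, rebuilt here so the block is self-contained
df1 <- data.frame(
  pos    = c(23L, 23L, 23L, 23L, 231L, 231L),
  nuc    = c("A", "C", "G", "T", "C", "T"),
  sample = rep("10028_1#2", 6),
  total  = c(3L, 1L, 5129L, 128L, 4L, 3123L),
  stringsAsFactors = FALSE
)

# Cross-tabulate the totals; cells with no matching row are 0
tab <- xtabs(total ~ pos + nuc + sample, data = df1)

# Back to long format, naming the count column "total" instead of "Freq"
out <- as.data.frame(tab, responseName = "total")

# pos is returned as a factor; convert it back to integer
out$pos <- as.integer(as.character(out$pos))

out
```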

data

``````df1 <- structure(list(pos = c(23L, 23L, 23L, 23L, 231L, 231L),
nuc = c("A",
"C", "G", "T", "C", "T"), sample = c("10028_1#2", "10028_1#2",
"10028_1#2", "10028_1#2", "10028_1#2", "10028_1#2"), total = c(3L,
1L, 5129L, 128L, 4L, 3123L)), .Names = c("pos", "nuc", "sample",
"total"), class = "data.frame", row.names = c(NA, -6L))
``````