hsteini hsteini - 3 years ago 95
R Question

aggregate multiple rows in dataframe

I have a dataframe that looks like this:

id=c(3, 3, 4, 5, 5)
a_2015 =c("abc", NA, NA, "abc", NA)
a_2016 = c("NA", "def", "abc", NA, "abc")
df = data.frame(id, a_2015, a_2016)

df

id a_2015 a_2016
1 3 abc NA
2 3 NA def
3 4 NA abc
4 5 abc NA
5 5 NA abc


that means that if in column a_2015 is an entry than there is an NA in a_2016 or viceversa. So you can never have in the same row an valid entry in both columns a_2015 and a_2016.

I would like to aggregate the dataframe like

id a_2015 a_2016
3 abc def
4 NA abc
5 abc abc


I tried to solve it with aggregate but now I think I need something like apply, or? I am thankful for any hints!

Answer Source

We can use

library(data.table)
setDT(df)[, lapply(.SD, function(x) x[!is.na(x)]), id][]
setDT(df)[, lapply(.SD, function(x) x[!is.na(x)]), id][]
#   id a_2015 a_2016
#1:  3    abc    def
#2:  4     NA    abc
#3:  5    abc    abc

data

id=c(3, 3, 4, 5, 5)
a_2015 =c("abc", NA, NA, "abc", NA)
a_2016 = c(NA, "def", "abc", NA, "abc")
df = data.frame(id, a_2015, a_2016)
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download