MaruB - 3 months ago
R Question

Grouping by row: data.table type change

This is related to the question "Group by in data.table in R which only keep non NA values from columns".

Example:
I have

df <- data.frame(x = c('a', 'a', 'b', 'b' ), y = c(1,NA,2,NA), z = c(NA, 3, NA, 4))

df

  x  y  z
1 a  1 NA
2 a NA  3
3 b  2 NA
4 b NA  4


and I want

df2 <- data.frame(x = c('a', 'b' ), y = c(1,2), z = c(3,4))

df2

  x y z
1 a 1 3
2 b 2 4


I am having the same issue as in the above question. I tried the accepted answer and it worked, but it changed the type of the contents in my data frame. I need them to stay as numeric values for downstream analysis, and using as.numeric afterwards did not work. I also tried solving the initial question using dplyr group_by, but that didn't work either, so I guess I am misunderstanding the function (still a beginner in R and data analysis in general!).

Sorry for the very basic question but I have been stuck trying to solve this for a while! Any suggestions are welcome.

Thanks!

Answer

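We can do this with data.table by keeping only the non-NA values of each column within each group of 'x'. The columns stay numeric: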

library(data.table)
dt1 <- setDT(df)[, lapply(.SD, function(x) x[!is.na(x)]), x]
str(dt1)
#Classes ‘data.table’ and 'data.frame':  2 obs. of  3 variables:
#$ x: Factor w/ 2 levels "a","b": 1 2
#$ y: num  1 2
#$ z: num  3 4

str(df)
#Classes ‘data.table’ and 'data.frame':  4 obs. of  3 variables:
#$ x: Factor w/ 2 levels "a","b": 1 1 2 2
#$ y: num  1 NA 2 NA
#$ z: num  NA 3 NA 4
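
The str(df) output above reports the data.table class because setDT converts its argument by reference. As a minimal sketch, assuming the original data frame should stay untouched, as.data.table (which makes a copy) can be used instead. The name 'dt2' and the check at the end are only for illustration:

```r
library(data.table)

df <- data.frame(x = c('a', 'a', 'b', 'b'), y = c(1, NA, 2, NA), z = c(NA, 3, NA, 4))

# as.data.table() works on a copy, so df itself remains a plain data.frame
dt2 <- as.data.table(df)[, lapply(.SD, function(v) v[!is.na(v)]), by = x]

# check that the value columns are still numeric
sapply(dt2[, .(y, z)], is.numeric)

# check that the original object was not converted
class(df)
```

Like the one-liner above, this assumes each group has exactly one non-NA value per column.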

If needed, we can convert 'dt1' back to a 'data.frame' with setDF

setDF(dt1)
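
Since the question also mentions trying dplyr's group_by, here is a rough sketch of the same idea in dplyr. It assumes dplyr 1.0.0 or later (for across and the .groups argument) and at most one non-NA value per group and column. It takes the first non-NA value, which is a slightly different rule from the data.table code above:

```r
library(dplyr)

df <- data.frame(x = c('a', 'a', 'b', 'b'), y = c(1, NA, 2, NA), z = c(NA, 3, NA, 4))

df2 <- df %>%
  group_by(x) %>%
  # for each group, keep the first non-NA value of y and z
  # (gives NA if a group has no non-NA value in that column)
  summarise(across(c(y, z), ~ .x[!is.na(.x)][1]), .groups = "drop")

str(df2)
```

The result is a tibble with one row per value of 'x', and 'y' and 'z' remain numeric because only subsetting is applied to them.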