Rambo Rambo - 1 month ago 6
R Question

R - Combining duplicate rows within dataframe in R :

I have a dataframe as below: please note that

COL1
is having duplicate entries

COL1 COL2 COL3
10 hai 2
10 hai 3
10 pal 1


I want the output to be like this as shown below: i.e
COL1
should have the unique entry alone(10),
COL2
should contain the merged entries under it without duplicates(hai pal), and
COL3
should contain the sum of entries(2+3+1=6)

OUTPUT:

COL1 COL2 COL3
10 hai pal 6

Answer

Perhaps we need to aggregate by group. Convert the 'data.frame' to 'data.table' (setDT(df1), grouped by 'COL1', paste the unique elements in 'COL2' together as well as get the sum of 'COL3'.

library(data.table)
setDT(df1)[,.(COL2  = paste(unique(COL2), collapse=" "), COL3= sum(COL3)) , by = COL1]
#     COL1    COL2 COL3
#1:    10 hai pal    6
Comments