Floni Floni - 11 months ago 45

R: find the closest observation to averages

I have a table of descriptive statistics (the means of a, b and c) for each type:

### stats
type <- c("a","b","c","d","e","f","g","h","i","j","k","l")
mean_a <- c(0,1,1,0,2,2,0,4,4,0,5,5)
mean_b <- c(4,7,8,0,3,10,5,4,7,0,1,6)
mean_c <- c(1,2,0,3,4,5,1,24,3,0,4,5)
stats <- data.frame(type, mean_a, mean_b, mean_c)


I also have a dataset of specimens with observed values for the parameters a, b and c.
Each specimen belongs to one of the types:

# data
Id <- c("ted","bert","test","john","elf","fea","goul","houl","ili","jok","ko","lol")
type <- c("a","a","b","d","f","f","c","d","a","b","k","l")
a <- c(2,1,3,2,1,2,0,1,2,1,5,5)
b <- c(1,3,4,7,5,4,5,6,5,0,1,6)
c <- c(3,5,2,6,8,5,1,5,3,1,6,6)
data <- data.frame(Id, type, a, b, c)


Using these two tables, I would like to extract from the data table the specimen that is most representative of each type, according to the statistics in the stats table.
By most representative, I mean the specimen whose values for a, b and c are closest to the respective averages for its type.
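One way to make "closest" precise, offered as an illustration rather than as the asker's own definition: write x_ij for the value of parameter j (a, b or c) for specimen i, t(i) for its type, and mu_{t,j} for the mean of parameter j for type t (the mean_a, mean_b and mean_c columns of stats). Two common distances are the mean absolute difference and the Euclidean distance; the representative of type t is then the specimen of that type with the smallest distance.

```latex
d_i^{\mathrm{MAD}} = \frac{1}{3} \sum_{j \in \{a,b,c\}} \left| x_{ij} - \mu_{t(i),j} \right|,
\qquad
d_i^{\mathrm{E}} = \sqrt{ \sum_{j \in \{a,b,c\}} \left( x_{ij} - \mu_{t(i),j} \right)^2 },
\qquad
\hat{\imath}(t) = \operatorname*{arg\,min}_{i \,:\, t(i) = t} d_i
```

Both distances treat a deviation above the mean and one below it the same way, so they cannot cancel each other out.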

I could not find any ideas online for matching on three averages (a, b and c) at once. Any help is welcome! Desired output (though I am not sure that ted, test and john really are the closest to the averages for types a, b and c):

# output wanted
Id <- c("ted","test","john")
type <- c("a","b","c")
a <- c(2,3,2)
b <- c(1,4,7)
c <- c(3,2,6)
data2 <- data.frame(Id, type, a, b, c)

Answer

"Most representative" on its own is rather vague, but here is one attempt. It joins data with stats by type, computes for each specimen the differences between its values of a, b and c and the corresponding means for its type, and within each type keeps the specimen for which the absolute value of the average of those differences is smallest. Because the two data frames are joined beforehand, you can adjust the select() call at the end of the pipeline to keep or drop whichever columns you need.
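For a single specimen, the score described above works out as follows. This small base-R sketch uses ted (type a) with the type-a averages copied by hand from stats; the variable names are introduced here for illustration only. It also shows that the signed differences partly cancel, so a specimen that is 2, 3 and 2 units away from the averages still gets a score of only 1/3.

```r
# Type "a" averages (mean_a, mean_b, mean_c in stats) and ted's observed values
means_type_a <- c(a = 0, b = 4, c = 1)
ted          <- c(a = 2, b = 1, c = 3)

diffs <- ted - means_type_a      # signed differences: 2, -3, 2

abs(mean(diffs))                 # score used in the answer: |(2 - 3 + 2) / 3| = 1/3
mean(abs(diffs))                 # mean absolute difference: (2 + 3 + 2) / 3 = 7/3
```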

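library(dplyr)

# Join each specimen with the averages for its type
df1 <- merge(data, stats, by = 'type')
df1 %>% 
  # columns 3:5 are a, b, c and columns 6:8 are mean_a, mean_b, mean_c;
  # score = absolute value of the mean of the signed differences
  mutate(new = abs(rowMeans(mapply(`-`, df1[,(3:5)], df1[,(6:8)])))) %>% 
  group_by(type) %>% 
  filter(new == min(new)) %>%   # keep the lowest-scoring specimen(s) per type
  select(-new)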

#Source: local data frame [7 x 8]
#Groups: type [7]

#    type     Id     a     b     c mean_a mean_b mean_c
#  <fctr> <fctr> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#1      a    ted     2     1     3      0      4      1
#2      b   test     3     4     2      1      7      2
#3      c   goul     0     5     1      1      8      0
#4      d   houl     1     6     5      0      0      3
#5      f    elf     1     5     8      2     10      5
#6      k     ko     5     1     6      5      1      4
#7      l    lol     5     6     6      5      6      5
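Because the answer averages signed differences before taking the absolute value, a specimen that is far above the mean on one parameter and far below it on another can still score close to zero. Below is a minimal base-R sketch of two alternatives that avoid this, the mean absolute difference and the Euclidean distance. It recreates the question's tables so it runs on its own and does not need dplyr; the helper closest() and the score column names mad and dist are hypothetical names introduced here for illustration.

```r
# Recreate the example tables from the question
stats <- data.frame(
  type   = c("a","b","c","d","e","f","g","h","i","j","k","l"),
  mean_a = c(0,1,1,0,2,2,0,4,4,0,5,5),
  mean_b = c(4,7,8,0,3,10,5,4,7,0,1,6),
  mean_c = c(1,2,0,3,4,5,1,24,3,0,4,5)
)
data <- data.frame(
  Id   = c("ted","bert","test","john","elf","fea","goul","houl","ili","jok","ko","lol"),
  type = c("a","a","b","d","f","f","c","d","a","b","k","l"),
  a    = c(2,1,3,2,1,2,0,1,2,1,5,5),
  b    = c(1,3,4,7,5,4,5,6,5,0,1,6),
  c    = c(3,5,2,6,8,5,1,5,3,1,6,6)
)

# Attach each specimen's type averages to its row (inner join on type)
df1   <- merge(data, stats, by = "type")
obs   <- as.matrix(df1[, c("a", "b", "c")])
means <- as.matrix(df1[, c("mean_a", "mean_b", "mean_c")])
diffs <- obs - means

# Two distances in which positive and negative deviations cannot cancel
df1$mad  <- rowMeans(abs(diffs))     # mean absolute difference
df1$dist <- sqrt(rowSums(diffs^2))   # Euclidean distance

# Hypothetical helper: within each type, keep the row(s) with the smallest score
# (ties are all kept, as with filter(new == min(new)) in the answer)
closest <- function(df, score) {
  keep <- df[[score]] == ave(df[[score]], df$type, FUN = min)
  df[keep, c("type", "Id", "a", "b", "c")]
}

closest(df1, "mad")
closest(df1, "dist")
```

With the question's data, both distances pick ili rather than ted for type a (ili is 2, 1 and 2 units from the type-a averages, ted is 2, 3 and 2), and test, goul, houl, ko and lol for the other types. They disagree on type f: the mean absolute difference picks fea, while the Euclidean distance picks elf. Which distance fits best depends on how "representative" should be defined.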