Floni - 1 year ago

R Question

I have a table with descriptive statistics (means of a, b and c) per type:

```
### stats
type <- c("a","b","c","d","e","f","g","h","i","j","k","l")
mean_a <- c(0,1,1,0,2,2,0,4,4,0,5,5)
mean_b <- c(4,7,8,0,3,10,5,4,7,0,1,6)
mean_c <- c(1,2,0,3,4,5,1,24,3,0,4,5)
stats <- data.frame(type, mean_a, mean_b, mean_c)
```

I also have a dataset with observations of specimens for the parameters a, b and c. Each specimen has a particular type:

```
# data
Id <- c("ted","bert","test","john","elf","fea","goul","houl","ili","jok","ko","lol")
type <- c("a","a","b","d","f","f","c","d","a","b","k","l")
a <- c(2,1,3,2,1,2,0,1,2,1,5,5)
b <- c(1,3,4,7,5,4,5,6,5,0,1,6)
c <- c(3,5,2,6,8,5,1,5,3,1,6,6)
data <- data.frame(Id, type, a, b, c)
```

Using these two tables, I would like to get from `data` the most representative specimen for each type in `stats`.

By most representative, I mean the specimen whose values for a, b and c are closest to the respective averages for its type.
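One possible way to make "closest" precise (an interpretation, not something the question specifies): for a specimen $i$ of type $t$, with measured values $x_{i,a}, x_{i,b}, x_{i,c}$ and the type means $\mu_{t,a}, \mu_{t,b}, \mu_{t,c}$ from `stats`, compute the Euclidean distance below and, for each type, keep the specimen with the smallest $d_i$:

$$
d_i = \sqrt{(x_{i,a}-\mu_{t,a})^2 + (x_{i,b}-\mu_{t,b})^2 + (x_{i,c}-\mu_{t,c})^2}
$$

For example, ted (type a, values 2, 1, 3) against the type-a means (0, 4, 1) gives $d = \sqrt{4 + 9 + 4} = \sqrt{17} \approx 4.12$.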

I cannot find ideas online for matching on three averages (a, b and c) at once. Help is welcome! Output wanted (though I am not sure that ted, test and john really are the closest to the averages for types a, b and c):

```
# output wanted
Id <- c("ted","test","john")
type <- c("a","b","c")
a <- c(2,3,2)
b <- c(1,4,7)
c <- c(3,2,6)
data2 <- data.frame(Id, type, a, b, c)
```


Answer

"Most representative", as you put it, is quite vague on its own, but here is an attempt. For each specimen it computes the differences between its values in `data` and the corresponding mean values in `stats`, averages those three differences, and, within each type, keeps the specimen whose average difference is smallest in absolute value.

Since I joined the data frames beforehand, you can use the `select()` call at the end of the code to keep or drop variables as needed.

```
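library(dplyr)
# Join each specimen to the means of its type; merge() puts the key first,
# so the columns are: type, Id, a, b, c, mean_a, mean_b, mean_c
df1 <- merge(data, stats, by = 'type')
df1 %>%
  # signed differences (a - mean_a, b - mean_b, c - mean_c), averaged per row,
  # then taken in absolute value
  mutate(new = abs(rowMeans(mapply(`-`, df1[, 3:5], df1[, 6:8])))) %>%
  group_by(type) %>%
  # keep the specimen(s) with the smallest score within each type
  filter(new == min(new)) %>%
  select(-new)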
#Source: local data frame [7 x 8]
#Groups: type [7]
# type Id a b c mean_a mean_b mean_c
# <fctr> <fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 a ted 2 1 3 0 4 1
#2 b test 3 4 2 1 7 2
#3 c goul 0 5 1 1 8 0
#4 d houl 1 6 5 0 0 3
#5 f elf 1 5 8 2 10 5
#6 k ko 5 1 6 5 1 4
#7 l lol 5 6 6 5 6 5
```
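The score above averages *signed* differences before taking the absolute value, so a positive deviation in one parameter can be offset by a negative one in another (for ted, the differences 2, -3 and 2 average to about 0.33). Below is a minimal base-R sketch of one alternative, assuming Euclidean distance is an acceptable meaning of "closest" and that a, b and c are on comparable scales (otherwise standardise them first). The names `merged`, `dist` and `best` are illustrative only, and the question's data is re-created so the snippet runs on its own.

```r
# Hypothetical variant (not the answer's method): rank specimens by the
# Euclidean distance between their (a, b, c) values and their type's means.
stats <- data.frame(
  type   = c("a","b","c","d","e","f","g","h","i","j","k","l"),
  mean_a = c(0,1,1,0,2,2,0,4,4,0,5,5),
  mean_b = c(4,7,8,0,3,10,5,4,7,0,1,6),
  mean_c = c(1,2,0,3,4,5,1,24,3,0,4,5),
  stringsAsFactors = FALSE
)
data <- data.frame(
  Id   = c("ted","bert","test","john","elf","fea","goul","houl","ili","jok","ko","lol"),
  type = c("a","a","b","d","f","f","c","d","a","b","k","l"),
  a    = c(2,1,3,2,1,2,0,1,2,1,5,5),
  b    = c(1,3,4,7,5,4,5,6,5,0,1,6),
  c    = c(3,5,2,6,8,5,1,5,3,1,6,6),
  stringsAsFactors = FALSE
)

# Attach each specimen's type means (inner join on type)
merged <- merge(data, stats, by = "type")

# Squared differences are never negative, so deviations in opposite
# directions cannot cancel each other out
merged$dist <- sqrt((merged$a - merged$mean_a)^2 +
                    (merged$b - merged$mean_b)^2 +
                    (merged$c - merged$mean_c)^2)

# Within each type keep the first row with the smallest distance
best <- do.call(rbind, lapply(split(merged, merged$type),
                              function(g) g[which.min(g$dist), ]))
rownames(best) <- NULL
print(best[, c("type", "Id", "a", "b", "c", "dist")])
```

With this data it selects the same specimens as the answer except for type a, where ili (distance 3) replaces ted (about 4.12). Other measures can change the picks again: the mean absolute difference, for instance, prefers fea over elf for type f. So it is worth choosing the measure that matches what "representative" should mean for your specimens. Note also that `which.min()` keeps only the first of tied rows, whereas `filter(new == min(new))` keeps all of them.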
