Rick Arko Rick Arko - 2 months ago 13
R Question

Selecting the Maximum Value of

Problem:

For a set of Personal IDs there are six conditions (5 binary and 1 continuous valued) stored in a dataframe.

Each condition can be thought of as a single observance of a characteristic. Every time a binary condition is observed for a Personal ID, a row is added to the dataframe with the person's

ID
, the name of the
Condition
(either CON1, CON2, CON4, CON5, CON6) and a
WT
value of 1.

Every time the continuous valued condition is observed the same process occurs except the
Condition
value is now always
CON3
and the
WT
value is between zero and one.

The dataframe looks something like this:

ID Condition WT

29 CON1 1
29 CON1 1
79 CON1 1
. . .
. . .
25 CON3 .3181
5 CON3 .2316
33 CON3 .1428
47 CON3 .6651
47 CON3 .0948
47 CON3 .3839
. . .
. . .

48 CON6 1
22 CON6 1


I want to generate a new dataframe that only contains one row for each person ID. I also want a separate column for each condition, where if the original dataframe contained a row for a binary condition then the new dataframe will represent this as a one for the named condition column.

Each person can show up multiple times in the original dataframe. If this is the case I want to return their maximum value for the non-binary valued condition.

I'm trying to generate a new dataframe which lists each unique person ID along with columns (CON1 thru CON6) where the dummy condition columns take a value of one if the unique person ID did have a row for that condition. The continuous valued
CON3
should return the maximum value achieved for a unique ID.

Reproducible Example of my Attempt:

I was able to work out how to return a one for the binary conditions, but I can't think of how to also grab the maximum value for the continuous condition for every unique ID.

# SET SEED and Generate Data Frame

set.seed(123)

# Set # of times each binary condition is met
CON1 = 4
CON2 = 12
CON4 = 3
CON5 = 6
CON6 = 3

df <- as.data.frame(
rbind(
cbind( as.numeric( sample(99, CON1, replace = T)), rep("CON1", CON1) , as.numeric(rep(1, CON1))),
cbind( as.numeric( sample(99, CON2, replace = T)), rep("CON2", CON2), as.numeric(rep(1, CON2))),
cbind( as.numeric( sample(99, 22, replace = T)), rep("CON3", 22), runif(22, min = 0, max = 1)),
cbind( as.numeric( rep(47, 5) ), rep("CON3", 5) , runif(5, min = 0, max = 1)),
cbind( as.numeric( sample(99, CON4, replace = T)), rep("CON4", CON4) , as.numeric(rep(1, CON4))),
cbind( as.numeric( sample(99, CON5, replace = T)), rep("CON5", CON5) , as.numeric(rep(1, CON5))),
cbind( as.numeric( sample(99, CON6, replace = T)), rep("CON6", CON6) , as.numeric(rep(1, CON6)))
)
) %>%
setnames(old = c("V1", "V2", "V3"), new = c("ID", "Condition", "WT"))



# Generate Results

results <- as.data.frame(
cbind(
"ID" = unique(df$ID),
"CON1" = sapply(unique(df$ID), function(x) ifelse( sum(df[df$ID == x, "Condition"] == "CON1") >=1, 1, 0)),
"CON2" = sapply(unique(df$ID), function(x) ifelse( sum(df[df$ID == x, "Condition"] == "CON2") >=1, 1, 0)),

# "CON3" = sapply(unique(df$ID), function(x) max(df[df$ID == x & df$Condition == "CON3", "WT"])),

"CON4" = sapply(unique(df$ID), function(x) ifelse( sum(df[df$ID == x, "Condition"] == "CON4") >=1, 1, 0)),
"CON5" = sapply(unique(df$ID), function(x) ifelse( sum(df[df$ID == x, "Condition"] == "CON5") >=1, 1, 0)),
"CON6" = sapply(unique(df$ID), function(x) ifelse( sum(df[df$ID == x, "Condition"] == "CON6") >=1, 1, 0))

)
)


# Check Sums

results %>%
select(CON1, CON2, CON4, CON5, CON6) %>% colSums()

Answer

I understood that you want per individual the maximum value (if it is binary or continuous does not matter for the maximum).

library(reshape2)

s1 <- df %>% group_by(ID, Condition) %>%
  summarise(value = max(as.numeric(as.character(WT))))


s1 %>% dcast(ID ~ Condition)

result:

ID CON1 CON2       CON3 CON4 CON5 CON6
1   1   NA   NA         NA   NA    1   NA
2  11   NA    1         NA   NA   NA   NA
3  15   NA   NA 0.44220007   NA   NA   NA
4  22   NA   NA 0.37446278   NA   NA    1
5  25   NA   NA 0.31818101   NA   NA   NA
6  29    1   NA 0.04583117   NA   NA   NA
7   3   NA   NA 0.12753165   NA   NA   NA
8  33   NA   NA 0.14280002   NA   NA   NA
9  38   NA   NA         NA   NA   NA    1
10 41    1   NA         NA   NA   NA   NA
11 44   NA   NA         NA   NA    1   NA
12 45   NA    1         NA    1   NA   NA
13 46   NA    1         NA   NA   NA   NA
14 47   NA   NA 0.81464004   NA   NA   NA
15 48   NA   NA 0.75330786   NA   NA    1
16  5   NA    1 0.23162579   NA   NA   NA
17 53   NA    1         NA   NA   NA   NA
18 54   NA   NA 0.26597264   NA   NA   NA
19 55   NA    1         NA   NA   NA   NA
20 57   NA    1         NA   NA   NA   NA
21 59   NA   NA 0.85782772   NA   NA   NA
22 63   NA   NA         NA   NA    1   NA
23 64   NA   NA 0.15244475   NA   NA   NA
24 65   NA   NA 0.23303410   NA   NA   NA
25 68   NA    1         NA   NA   NA   NA
26 69   NA   NA 0.56094798   NA   NA   NA
27 71   NA   NA 0.46596245   NA    1   NA
28 75   NA   NA         NA   NA    1   NA
29 76   NA   NA 0.89504536   NA   NA   NA
30 79    1   NA 0.20653139   NA    1   NA
31 81   NA   NA         NA    1   NA   NA
32 88    1   NA         NA   NA   NA   NA
33 89   NA    1 0.41372433   NA   NA   NA
34 90   NA    1 0.12189926   NA   NA   NA
35 94   NA    1         NA   NA   NA   NA
36 95   NA    1 0.41454634   NA   NA   NA
37 96   NA   NA 0.79892485   NA   NA   NA
38 99   NA   NA 0.13880606   NA   NA   NA
Comments