Rick Arko - 1 year ago 109
R Question

# Selecting the Maximum Value of

Problem:

For a set of Personal IDs there are six conditions (5 binary and 1 continuous valued) stored in a dataframe.

Each condition can be thought of as a single observance of a characteristic. Every time a binary condition is observed for a Personal ID, a row is added to the dataframe with the person's

`ID`
, the name of the
`Condition`
(either CON1, CON2, CON4, CON5, CON6) and a
`WT`
value of 1.

Every time the continuous valued condition is observed the same process occurs except the
`Condition`
value is now always
`CON3`
and the
`WT`
value is between zero and one.

The dataframe looks something like this:

``````ID    Condition   WT

29    CON1        1
29    CON1        1
79    CON1        1
.     .           .
.     .           .
25    CON3       .3181
5     CON3       .2316
33    CON3       .1428
47    CON3       .6651
47    CON3       .0948
47    CON3       .3839
.     .           .
.     .           .

48    CON6        1
22    CON6        1
``````

I want to generate a new dataframe that only contains one row for each person ID. I also want a separate column for each condition, where if the original dataframe contained a row for a binary condition then the new dataframe will represent this as a one for the named condition column.

Each person can show up multiple times in the original dataframe. If this is the case I want to return their maximum value for the non-binary valued condition.

I'm trying to generate a new dataframe which lists each unique person ID along with columns (CON1 thru CON6) where the dummy condition columns take a value of one if the unique person ID did have a row for that condition. The continuous valued
`CON3`
should return the maximum value achieved for a unique ID.

Reproducible Example of my Attempt:

I was able to work out how to return a one for the binary conditions, but I can't think of how to also grab the maximum value for the continuous condition for every unique ID.

``````# SET SEED and Generate Data Frame

set.seed(123)

# Set # of times each binary condition is met
CON1 = 4
CON2 = 12
CON4 = 3
CON5 = 6
CON6 = 3

df <- as.data.frame(
rbind(
cbind( as.numeric( sample(99, CON1,  replace = T)), rep("CON1", CON1) ,  as.numeric(rep(1, CON1))),
cbind( as.numeric( sample(99, CON2, replace = T)), rep("CON2", CON2), as.numeric(rep(1, CON2))),
cbind( as.numeric( sample(99, 22, replace = T)), rep("CON3", 22), runif(22, min = 0, max = 1)),
cbind( as.numeric( rep(47, 5)                 ), rep("CON3", 5) , runif(5, min = 0, max = 1)),
cbind( as.numeric( sample(99, CON4,  replace = T)), rep("CON4", CON4) , as.numeric(rep(1, CON4))),
cbind( as.numeric( sample(99, CON5,  replace = T)), rep("CON5", CON5) , as.numeric(rep(1, CON5))),
cbind( as.numeric( sample(99, CON6,  replace = T)), rep("CON6", CON6) , as.numeric(rep(1, CON6)))
)
) %>%
setnames(old = c("V1", "V2", "V3"), new = c("ID", "Condition", "WT"))

# Generate Results

results <- as.data.frame(
cbind(
"ID"   = unique(df\$ID),
"CON1" = sapply(unique(df\$ID), function(x) ifelse( sum(df[df\$ID == x, "Condition"] == "CON1") >=1, 1, 0)),
"CON2" = sapply(unique(df\$ID), function(x) ifelse( sum(df[df\$ID == x, "Condition"] == "CON2") >=1, 1, 0)),

# "CON3" = sapply(unique(df\$ID), function(x) max(df[df\$ID == x & df\$Condition == "CON3", "WT"])),

"CON4" = sapply(unique(df\$ID), function(x) ifelse( sum(df[df\$ID == x, "Condition"] == "CON4") >=1, 1, 0)),
"CON5" = sapply(unique(df\$ID), function(x) ifelse( sum(df[df\$ID == x, "Condition"] == "CON5") >=1, 1, 0)),
"CON6" = sapply(unique(df\$ID), function(x) ifelse( sum(df[df\$ID == x, "Condition"] == "CON6") >=1, 1, 0))

)
)

# Check Sums

results %>%
select(CON1, CON2, CON4, CON5, CON6) %>% colSums()
``````

I understood that you want per individual the maximum value (if it is binary or continuous does not matter for the maximum).

``````library(reshape2)

s1 <- df %>% group_by(ID, Condition) %>%
summarise(value = max(as.numeric(as.character(WT))))

s1 %>% dcast(ID ~ Condition)
``````

result:

``````ID CON1 CON2       CON3 CON4 CON5 CON6
1   1   NA   NA         NA   NA    1   NA
2  11   NA    1         NA   NA   NA   NA
3  15   NA   NA 0.44220007   NA   NA   NA
4  22   NA   NA 0.37446278   NA   NA    1
5  25   NA   NA 0.31818101   NA   NA   NA
6  29    1   NA 0.04583117   NA   NA   NA
7   3   NA   NA 0.12753165   NA   NA   NA
8  33   NA   NA 0.14280002   NA   NA   NA
9  38   NA   NA         NA   NA   NA    1
10 41    1   NA         NA   NA   NA   NA
11 44   NA   NA         NA   NA    1   NA
12 45   NA    1         NA    1   NA   NA
13 46   NA    1         NA   NA   NA   NA
14 47   NA   NA 0.81464004   NA   NA   NA
15 48   NA   NA 0.75330786   NA   NA    1
16  5   NA    1 0.23162579   NA   NA   NA
17 53   NA    1         NA   NA   NA   NA
18 54   NA   NA 0.26597264   NA   NA   NA
19 55   NA    1         NA   NA   NA   NA
20 57   NA    1         NA   NA   NA   NA
21 59   NA   NA 0.85782772   NA   NA   NA
22 63   NA   NA         NA   NA    1   NA
23 64   NA   NA 0.15244475   NA   NA   NA
24 65   NA   NA 0.23303410   NA   NA   NA
25 68   NA    1         NA   NA   NA   NA
26 69   NA   NA 0.56094798   NA   NA   NA
27 71   NA   NA 0.46596245   NA    1   NA
28 75   NA   NA         NA   NA    1   NA
29 76   NA   NA 0.89504536   NA   NA   NA
30 79    1   NA 0.20653139   NA    1   NA
31 81   NA   NA         NA    1   NA   NA
32 88    1   NA         NA   NA   NA   NA
33 89   NA    1 0.41372433   NA   NA   NA
34 90   NA    1 0.12189926   NA   NA   NA
35 94   NA    1         NA   NA   NA   NA
36 95   NA    1 0.41454634   NA   NA   NA
37 96   NA   NA 0.79892485   NA   NA   NA
38 99   NA   NA 0.13880606   NA   NA   NA
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download