Rick Arko - 8 months ago 46

R Question

**Problem:**

For a set of Personal IDs there are six conditions (5 binary and 1 continuous valued) stored in a dataframe.

Each condition can be thought of as a single observance of a characteristic. Every time a binary condition is observed for a Personal ID, a row is added to the dataframe with the person's

`ID`

`Condition`

`WT`

Every time the continuous valued condition is observed the same process occurs except the

`Condition`

`CON3`

`WT`

The dataframe looks something like this:

`ID Condition WT`

29 CON1 1

29 CON1 1

79 CON1 1

. . .

. . .

25 CON3 .3181

5 CON3 .2316

33 CON3 .1428

47 CON3 .6651

47 CON3 .0948

47 CON3 .3839

. . .

. . .

48 CON6 1

22 CON6 1

I want to generate a new dataframe that only contains one row for each person ID. I also want a separate column for each condition, where if the original dataframe contained a row for a binary condition then the new dataframe will represent this as a one for the named condition column.

Each person can show up multiple times in the original dataframe. If this is the case I want to return their maximum value for the non-binary valued condition.

I'm trying to generate a new dataframe which lists each unique person ID along with columns (CON1 thru CON6) where the dummy condition columns take a value of one if the unique person ID did have a row for that condition. The continuous valued

`CON3`

I was able to work out how to return a one for the binary conditions, but I can't think of how to also grab the maximum value for the continuous condition for every unique ID.

`# SET SEED and Generate Data Frame`

set.seed(123)

# Set # of times each binary condition is met

CON1 = 4

CON2 = 12

CON4 = 3

CON5 = 6

CON6 = 3

df <- as.data.frame(

rbind(

cbind( as.numeric( sample(99, CON1, replace = T)), rep("CON1", CON1) , as.numeric(rep(1, CON1))),

cbind( as.numeric( sample(99, CON2, replace = T)), rep("CON2", CON2), as.numeric(rep(1, CON2))),

cbind( as.numeric( sample(99, 22, replace = T)), rep("CON3", 22), runif(22, min = 0, max = 1)),

cbind( as.numeric( rep(47, 5) ), rep("CON3", 5) , runif(5, min = 0, max = 1)),

cbind( as.numeric( sample(99, CON4, replace = T)), rep("CON4", CON4) , as.numeric(rep(1, CON4))),

cbind( as.numeric( sample(99, CON5, replace = T)), rep("CON5", CON5) , as.numeric(rep(1, CON5))),

cbind( as.numeric( sample(99, CON6, replace = T)), rep("CON6", CON6) , as.numeric(rep(1, CON6)))

)

) %>%

setnames(old = c("V1", "V2", "V3"), new = c("ID", "Condition", "WT"))

# Generate Results

results <- as.data.frame(

cbind(

"ID" = unique(df$ID),

"CON1" = sapply(unique(df$ID), function(x) ifelse( sum(df[df$ID == x, "Condition"] == "CON1") >=1, 1, 0)),

"CON2" = sapply(unique(df$ID), function(x) ifelse( sum(df[df$ID == x, "Condition"] == "CON2") >=1, 1, 0)),

# "CON3" = sapply(unique(df$ID), function(x) max(df[df$ID == x & df$Condition == "CON3", "WT"])),

"CON4" = sapply(unique(df$ID), function(x) ifelse( sum(df[df$ID == x, "Condition"] == "CON4") >=1, 1, 0)),

"CON5" = sapply(unique(df$ID), function(x) ifelse( sum(df[df$ID == x, "Condition"] == "CON5") >=1, 1, 0)),

"CON6" = sapply(unique(df$ID), function(x) ifelse( sum(df[df$ID == x, "Condition"] == "CON6") >=1, 1, 0))

)

)

# Check Sums

results %>%

select(CON1, CON2, CON4, CON5, CON6) %>% colSums()

Answer

I understood that you want per individual the maximum value (if it is binary or continuous does not matter for the maximum).

```
library(reshape2)
s1 <- df %>% group_by(ID, Condition) %>%
summarise(value = max(as.numeric(as.character(WT))))
s1 %>% dcast(ID ~ Condition)
```

result:

```
ID CON1 CON2 CON3 CON4 CON5 CON6
1 1 NA NA NA NA 1 NA
2 11 NA 1 NA NA NA NA
3 15 NA NA 0.44220007 NA NA NA
4 22 NA NA 0.37446278 NA NA 1
5 25 NA NA 0.31818101 NA NA NA
6 29 1 NA 0.04583117 NA NA NA
7 3 NA NA 0.12753165 NA NA NA
8 33 NA NA 0.14280002 NA NA NA
9 38 NA NA NA NA NA 1
10 41 1 NA NA NA NA NA
11 44 NA NA NA NA 1 NA
12 45 NA 1 NA 1 NA NA
13 46 NA 1 NA NA NA NA
14 47 NA NA 0.81464004 NA NA NA
15 48 NA NA 0.75330786 NA NA 1
16 5 NA 1 0.23162579 NA NA NA
17 53 NA 1 NA NA NA NA
18 54 NA NA 0.26597264 NA NA NA
19 55 NA 1 NA NA NA NA
20 57 NA 1 NA NA NA NA
21 59 NA NA 0.85782772 NA NA NA
22 63 NA NA NA NA 1 NA
23 64 NA NA 0.15244475 NA NA NA
24 65 NA NA 0.23303410 NA NA NA
25 68 NA 1 NA NA NA NA
26 69 NA NA 0.56094798 NA NA NA
27 71 NA NA 0.46596245 NA 1 NA
28 75 NA NA NA NA 1 NA
29 76 NA NA 0.89504536 NA NA NA
30 79 1 NA 0.20653139 NA 1 NA
31 81 NA NA NA 1 NA NA
32 88 1 NA NA NA NA NA
33 89 NA 1 0.41372433 NA NA NA
34 90 NA 1 0.12189926 NA NA NA
35 94 NA 1 NA NA NA NA
36 95 NA 1 0.41454634 NA NA NA
37 96 NA NA 0.79892485 NA NA NA
38 99 NA NA 0.13880606 NA NA NA
```

Source (Stackoverflow)