Sharon - 1 year ago 52

R Question

I have rather un-tidy data by category where sometimes there are multiple categories in a single field. I'd like to assign the same value to each category in a category entry after splitting the multi-category entry. For example, if the data is

`cat <- c("A,B,C", "B", "B,C", "A,E")`

val <- c(300, 350, 400, 450)

mydf <- data.frame(cat, val, stringsAsFactors = FALSE)

cat val

A,B,C 300

B 350

B,C 400

A,E 450

From the first row, I need 300 to be assigned to each category A, B & C, then from the second row 350 assigned to B, 400 assigned to each B and C from the third row and then 450 assigned to A and E in the fourth row.

I came up with a very kludge-y for loop to accomplish this, but I know this is not memory-efficient because it is using rbind on an existing data frame.

`resultsdf <- data.frame(temp_cats = character(0),`

temp_vals = numeric(0), stringsAsFactors = FALSE)

for(i in 1:nrow(mydf)){

temp_cats <- stringr::str_split(mydf$cat[i], ",")[[1]]

temp_vals <- rep(val[i], length(temp_cats))

temp_df <- data.frame(temp_cats, temp_vals, stringsAsFactors = FALSE)

resultsdf <- rbind(resultsdf, temp_df)

}

Just curious if anyone has a more elegant R-syntax way of accomplishing this to end up with

`temp_cats temp_vals`

1 A 300

2 B 300

3 C 300

4 B 350

5 B 400

6 C 400

7 A 450

8 E 450

Answer Source

**1) stack/unstack** Use `strsplit`

to split the strings and then unstack and stack it to get the long form `s`

required. Finally fix up the names and make `temp_vals`

numeric since `unstack`

coerced them to character. No packages are used.

```
s <- stack(unstack(transform(mydf, cat = strsplit(cat, ","))))
with(s, data.frame(temp_cats = values, temp_vals = as.numeric(as.character(ind))))
```

**2) dplyr/tidyr** Another approach is to use `unnest`

in the tidyr package:

```
library(dplyr)
library(tidyr)
mydf %>%
mutate(cat = strsplit(cat, ",")) %>%
unnest() %>%
transmute(temp_cat = cat, temp_vals = val)
```