Bobby Bobby - 4 months ago 17
R Question

Create name-value pair data frame from input with delimiters

I have a data frame which can be created with this code:

input <- data.frame( 'ID'=c(1:3),
Destination=c("A\r\nB", "C", "D\r\nE\r\nF"),
Topic=c("W", "X", "Y\r\nZ") )


It looks like this:

ID Destination Topic
1 1 A\r\nB W
2 2 C X
3 3 D\r\nE\r\nF Y\r\nZ


I would like to create an output data frame that looks like this:

desiredOutput <- data.frame(
ID = c(1,1,1,2,2,3,3,3,3,3) ,
name=c( "Destination", "Destination", "Topic", "Destination", "Topic",
"Destination", "Destination", "Destination" , "Topic", "Topic"),
value=c("A","B", "W", "C", "X", "D", "E", "F", "Y", "Z") )

ID name value
1 1 Destination A
2 1 Destination B
3 1 Topic W
4 2 Destination C
5 2 Topic X
6 3 Destination D
7 3 Destination E
8 3 Destination F
9 3 Topic Y
10 3 Topic Z


Whenever the delimiter
\r\n
occurs, I would like to split the contents into separate rows, with the correct ID, the name of the column, and the corresponding value.

I can split a single column into a list using
strsplit
, but I don't know how to put the contents into a data frame as above apart from attempting to write a loop. I expect the
tidyr
package might be helpful.

strsplit(input$Destination, split = "\r\n")


How can this be done, ideally without a loop?

Answer

With tidyr, gather to long form, then use separate_rows to separate the joined elements:

library(tidyr)

input %>% gather(name, value, -ID) %>% separate_rows(value)
##    ID        name value
## 1   1 Destination     A
## 2   1 Destination     B
## 3   2 Destination     C
## 4   3 Destination     D
## 5   3 Destination     E
## 6   3 Destination     F
## 7   1       Topic     W
## 8   2       Topic     X
## 9   3       Topic     Y
## 10  3       Topic     Z

Note: If your data is factors instead of character, tidyr will warn you, as it coerces to character in order to rearrange. It will work regardless, but if you hate warnings, coerce to character manually before reshaping.