aravind - 1 month ago
R Question

Processing a list of variables (strings) into relational format in R

I am trying to convert the following string, which is a record in my data frame, into a relational format with columns and rows.

The string below contains 3 sets of lists, so the result should have 3 columns, each with the value of the identifier "lbl_st" as its header.


Test_ls = [
[
lbl_st = "TestPrdct",
min = "",
max = "",
type = "dscrt123",
format = [
string,
11,
"",
""
],
sample_ls = [
"GM_TEST",
"GL_TEST"
]
],
[
lbl_st = "TestCTRY",
min = "",
max = "",
format = [
string,
35,
"",
""
],
type = "dscrt1345",
sample_ls = [
"ES"
]
],
[
lbl_st = "TestPrtnr",
min = "",
max = "",
format = [
string,
35,
"",
""
],
type = "dscrt1",
sample_ls = [
"S&G",
"Something Test",
"Abcd Test",
"Bcvd Test"
]
]
]


Below is the output format I am trying to achieve:

TestPrdct;TestCTRY;TestPrtnr
GM_TEST;ES;S&G
GM_TEST;ES;Something Test
GM_TEST;ES;Abcd Test
GM_TEST;ES;Bcvd Test
GL_TEST;ES;S&G
GL_TEST;ES;Something Test
GL_TEST;ES;Abcd Test
GL_TEST;ES;Bcvd Test


I tried using strsplit, but I am not sure how to iterate through a list within a list. Any help is much appreciated. Thanks in advance.

Answer

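Replace [ with list(, ] with ), and the bare token string with "string", so that the text becomes valid R code. Then parse and evaluate it to get an R list, e, and use expand.grid to create every combination of the sample_ls values, which gives DF. Finally, take the column names from lbl_st, convert the columns to character and sort by the first two columns.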

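# Turn the bracket notation into R code: [ -> list(, ] -> ), string -> "string"
Test2 <- gsub('string', '"string"', gsub("]", ")", gsub("\\[", "list(", Test_ls)))
# Parse and evaluate the text into a nested R list (only do this with trusted input)
e <- eval(parse(text = Test2))

# Every combination of the sample_ls values, one column per inner list
DF <- do.call(expand.grid, lapply(e, function(x) unlist(x$sample_ls)))

# Name the columns after each lbl_st, convert factors to character, sort by the first two columns
names(DF) <- sapply(e, "[[", "lbl_st")
DF[] <- lapply(DF, as.character)
o <- do.call(order, DF[1:2])
DF <- DF[o, ]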

which gives:

> DF
  TestPrdct TestCTRY      TestPrtnr
2   GL_TEST       ES            S&G
4   GL_TEST       ES Something Test
6   GL_TEST       ES      Abcd Test
8   GL_TEST       ES      Bcvd Test
1   GM_TEST       ES            S&G
3   GM_TEST       ES Something Test
5   GM_TEST       ES      Abcd Test
7   GM_TEST       ES      Bcvd Test
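
The sorted result above puts GL_TEST first, whereas the output requested in the question keeps the values in their original order, with the first column changing slowest, and uses ; as the separator. Here is a minimal sketch of one way to get that layout. It assumes a hand-written stand-in for the parsed list e (only lbl_st and sample_ls are kept), and the names vals and DF2 are made up for illustration.

```r
# Hypothetical stand-in for the parsed list `e`: only the fields used here.
e <- list(
  list(lbl_st = "TestPrdct", sample_ls = list("GM_TEST", "GL_TEST")),
  list(lbl_st = "TestCTRY",  sample_ls = list("ES")),
  list(lbl_st = "TestPrtnr", sample_ls = list("S&G", "Something Test",
                                              "Abcd Test", "Bcvd Test"))
)

# One character vector per inner list, named after its lbl_st.
vals <- lapply(e, function(x) unlist(x$sample_ls))
names(vals) <- sapply(e, "[[", "lbl_st")

# expand.grid varies its first argument fastest, so expand the reversed
# list and then put the columns back in their original order.
DF2 <- do.call(expand.grid, c(rev(vals), stringsAsFactors = FALSE))
DF2 <- DF2[rev(names(DF2))]

# With no file argument, write.table prints to the console.
write.table(DF2, sep = ";", quote = FALSE, row.names = FALSE)
```

No sorting step is needed here, because reversing the list before the call and the columns after it already makes the first column change slowest.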

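Since the question mentions that the string is one record in a data frame, the steps of this answer could be wrapped in a function and applied to each record. The sketch below is only an illustration: the function name record_to_df, the data frame records and its column spec are hypothetical. It uses fixed = TRUE for the bracket replacements and a word-boundary pattern for string, a small variation on the gsub call above. It relies on eval(parse()), so it should only be run on input you trust.

```r
# Hypothetical helper: turn one bracketed record string into a data frame
# holding every combination of its sample_ls values.
record_to_df <- function(txt) {
  txt <- gsub("[", "list(", txt, fixed = TRUE)
  txt <- gsub("]", ")", txt, fixed = TRUE)
  # Quote the bare token `string` so it is not treated as a variable name.
  txt <- gsub("\\bstring\\b", '"string"', txt, perl = TRUE)
  e <- eval(parse(text = txt))  # evaluates the text as R code: trusted input only
  vals <- lapply(e, function(x) unlist(x$sample_ls))
  names(vals) <- sapply(e, "[[", "lbl_st")
  do.call(expand.grid, c(vals, stringsAsFactors = FALSE))
}

# Hypothetical data frame with one bracketed string per row.
records <- data.frame(
  id = 1:2,
  spec = c(
    '[ [ lbl_st = "A", format = [ string, 11 ], sample_ls = [ "a1", "a2" ] ],
       [ lbl_st = "B", sample_ls = [ "b1" ] ] ]',
    '[ [ lbl_st = "C", sample_ls = [ "c1" ] ] ]'
  ),
  stringsAsFactors = FALSE
)

# One result data frame per record.
results <- lapply(records$spec, record_to_df)
```

Each element of results is a separate data frame, because different records may define different columns.
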
Note: The input Test_ls in reproducible form is:

Test_ls <-
"[\n        [\n         lbl_st = \"TestPrdct\",\n         min = \"\",\n         max = \"\",\n         type = \"dscrt123\",\n         format = [\n                   string,\n                   11,\n                   \"\",\n                   \"\"\n                  ],\n         sample_ls = [\n                      \"GM_TEST\",\n                      \"GL_TEST\"\n                     ]\n        ],\n        [\n         lbl_st = \"TestCTRY\",\n         min = \"\",\n         max = \"\",\n         format = [\n                   string,\n                   35,\n                   \"\",\n                   \"\"\n                  ],\n         type = \"dscrt1345\",\n         sample_ls = [\n                      \"ES\"\n                     ]\n        ],\n        [\n         lbl_st = \"TestPrtnr\",\n         min = \"\",\n         max = \"\",\n         format = [\n                   string,\n                   35,\n                   \"\",\n                   \"\"\n                  ],\n         type = \"dscrt1\",\n         sample_ls = [\n                      \"S&G\",\n                      \"Something Test\",\n                      \"Abcd Test\",\n                      \"Bcvd Test\"\n                     ]\n        ]\n       ]"