Noobie Noobie - 1 month ago
R Question

How to import in R variables that contain lists created in Python?

I have a csv that contains a variable that appears as follows (after I read it in R using fread followed by as_tibble):

myvar
<chr>
[]
[u'welcome']
[u'the oil price']


The variable was created in Python, and I have to deal with this Pythonesque list.

Is there a way, using the tidyverse (dplyr and others), to read in this variable directly as a proper string (and not a list) without having to strip out all the [, ], and u' myself?

myvar_wanted
<chr>
NA
'welcome'
'the oil price'


Thanks!

Answer

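If your strings won't contain any single quotes, you can use '\\[(?:u\'(.*)\')?\\]', which matches an opening bracket, optionally followed by "u" and text surrounded by single quotes, and then a closing bracket. The text between the single quotes is captured as group \1, so replacing the whole match with \1 extracts it (this is myvar3 below). Because the quoted part is optional, the pattern also matches the empty list [] and replaces it with an empty string.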

Easier (to me) is to capture exactly what you want and ignore the rest: \'(.*)\'|. either matches a single quote, captures any characters into group \1, and matches a closing single quote, or else matches any single character. The |. alternative consumes everything outside the quotes, so we don't have to write out the exact pattern as we did for myvar3 (this is myvar2 below).

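data <- data.frame(myvar = c("[]", "[u'welcome']", "[u'the oil price']"))

# within() returns a modified copy; assign the result to keep the new columns
within(data, {
  # keep only the text captured between the single quotes, drop everything else
  myvar2 <- gsub('\'(.*)\'|.', '\\1', myvar)
  # match the whole bracketed list and replace it with the captured text
  myvar3 <- gsub('\\[(?:u\'(.*)\')?\\]', '\\1', myvar)
})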


#                myvar        myvar3        myvar2
# 1                 []                           
# 2       [u'welcome']       welcome       welcome
# 3 [u'the oil price'] the oil price the oil price
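
Note that the empty list [] comes out as an empty string here, whereas the question asked for NA. A minimal sketch of one way to get there, assuming every list holds at most one element and no string contains a single quote; the column name myvar_wanted is taken from the question, and the intermediate name cleaned is just for illustration:

```r
data <- data.frame(
  myvar = c("[]", "[u'welcome']", "[u'the oil price']"),
  stringsAsFactors = FALSE
)

# Same pattern as myvar2: keep the text between the single quotes,
# replace every other character with nothing.
cleaned <- gsub("'(.*)'|.", "\\1", data$myvar)

# Rows that were empty lists are now "", so convert those to NA.
data$myvar_wanted <- ifelse(cleaned == "", NA_character_, cleaned)

print(data)
```

If you are working in a dplyr pipeline, the same two steps should fit inside a single mutate() call, for example by wrapping the gsub() result in dplyr's na_if(x, ""), but I have only run the base R version above.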