user3919790 - 3 months ago
R Question

Extracting matched words from a string

I have a data frame; an abbreviated version is below:

daly <- structure(list(sex1 = c("totalmaleglobal", "totalfemaleglobal",
"totalglobal", "totalfemaleGSK", "totalfemaleglobal",
"totalfemaleUN")), .Names = "sex1", row.names = c(NA, 6L),
class = "data.frame")


I want to extract the words 'total', 'totalmale' and 'totalfemale' from each value of sex1 into a new column.

How do I do this?

I tried a regex approach with the following code:

library(stringr)

pattern1 <- "total"
pattern2 <- "totalmale"
pattern3 <- "totalfemale"

daly$sex <- str_extract(daly$sex1, pattern1)
daly$sex <- str_extract(daly$sex1, pattern2)
daly$sex <- str_extract(daly$sex1, pattern3)


But it's giving me NA.
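A quick sketch of where the NA comes from, using the sex1 values from the structure() output above (the standalone sex1 and sex vectors are introduced here only for illustration): the three assignments run one after another, so each overwrites the previous result and only the last pattern, "totalfemale", decides what ends up in the column. Rows that do not contain "totalfemale" get NA.

```r
library(stringr)

# sex1 values copied from the data frame in the question
sex1 <- c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
          "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN")

# Same sequence of assignments as in the question: each line replaces
# the previous result, so only the third call survives
sex <- str_extract(sex1, "total")
sex <- str_extract(sex1, "totalmale")
sex <- str_extract(sex1, "totalfemale")

# Rows 1 and 3 contain no "totalfemale", so str_extract() returns NA there
print(sex)
```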

Answer

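Maybe combine the patterns into a single regex alternation, with the longest pattern first so that 'totalfemale' and 'totalmale' are tried before 'total':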

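library(stringr)
# Gather pattern1..pattern3, reverse them so the longest is tried first,
# and join them into one regex: "totalfemale|totalmale|total"
daly$sex <- str_extract(daly$sex1, paste(rev(mget(ls(pattern = "pattern\\d+"))), collapse = "|"))
daly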
#                sex1         sex
# 1   totalmaleglobal   totalmale
# 2 totalfemaleglobal totalfemale
# 3       totalglobal       total
# 4    totalfemaleGSK totalfemale
# 5 totalfemaleglobal totalfemale
# 6     totalfemaleUN totalfemale
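The rev() call matters because regex alternation tries the alternatives from left to right at each position, so if 'total' came first it would win on every row. Below is a sketch of the same idea that builds the regex from an explicit vector sorted longest-first with nchar(), rather than relying on ls()/mget() finding variables named pattern1 to pattern3. The patterns, naive and longest_first names are hypothetical, chosen for illustration.

```r
library(stringr)

daly <- data.frame(sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
                            "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN"),
                   stringsAsFactors = FALSE)

# Hypothetical explicit list of the words to extract
patterns <- c("total", "totalmale", "totalfemale")

# Shortest-first alternation "total|totalmale|totalfemale":
# "total" already matches at position 1, so every row gets "total"
naive <- str_extract(daly$sex1, paste(patterns, collapse = "|"))

# Longest-first alternation "totalfemale|totalmale|total"
longest_first <- patterns[order(nchar(patterns), decreasing = TRUE)]
daly$sex <- str_extract(daly$sex1, paste(longest_first, collapse = "|"))

daly
```

Sorting by length is a general way to make sure a word that is a prefix of another one (here 'total') never shadows the longer match, whatever order the patterns were written in.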