Jean-Baptiste Fagot - 4 months ago
LaTeX Question

How to split LaTeX acronym definitions into an R data frame

I have a LaTeX file containing my acronym definitions, like this:

\newacronym{AEP}{AEP}{Alimentation en Eau Potable}
\newacronym{AERMC}{AERMC}{Agence de l'Eau Rhône Méditerranée et Corse}
\newacronym[longplural=Cotes d'Abondance Numériques]{CAN}{CAN}{Cote d'Abondance Numérique}

My aim is to get a data frame with two columns, like this:

AEP Alimentation en Eau Potable
AERMC Agence de l'Eau Rhône Méditerranée et Corse
CAN Cote d'Abondance Numérique

I think this is possible with a regex or strsplit, but I can't work it out and keep running into problems. My attempt so far:

library(readr)
library(dplyr)

# One row per line of the .tex file, in a column named Complet
acronymes <- tibble(Complet = read_lines("acronymes.tex")) %>%
  filter(!grepl("^%", Complet)) # drop commented-out lines I don't use
acronymes$ABR <- sub("}.*", "", acronymes$Complet) # my attempt at extracting the abbreviation

Do you have any ideas, or can you point me to a clear guide to regular expressions? Thank you.


Maybe not the most elegant solution, but this works. You need to escape the braces with a double backslash:

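a <- readLines("acronymes.tex")
a <- a[grepl("^\\\\newacronym", a)] # keep only definition lines (drops % comments and blanks)
# second brace group: the short form
acronyms <- gsub(".*\\}\\{(.*)\\}\\{.*", "\\1", a)
# last brace group: the long form
descriptions <- gsub(".*\\}\\{(.*)\\}$", "\\1", a)
data.frame(acronyms, descriptions)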
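If you prefer a single pattern that also skips comment lines, here is a minimal sketch of one possible variant. It assumes every definition sits on one line and follows the form \newacronym[options]{label}{short}{long}, with no braces inside the label or short form. The sample lines are copied from the question (plus one made-up comment line) so it runs without acronymes.tex; the column names ABR and Complet are borrowed from the question.

```r
# Sample lines from the question, plus a hypothetical comment line
lines <- c(
  "% an unused line",
  "\\newacronym{AEP}{AEP}{Alimentation en Eau Potable}",
  "\\newacronym{AERMC}{AERMC}{Agence de l'Eau Rhône Méditerranée et Corse}",
  "\\newacronym[longplural=Cotes d'Abondance Numériques]{CAN}{CAN}{Cote d'Abondance Numérique}"
)

# Group 1: optional [options]; group 2: label; group 3: short form; group 4: long form
pattern <- "^\\\\newacronym(\\[[^]]*\\])?\\{([^}]*)\\}\\{([^}]*)\\}\\{(.*)\\}[[:space:]]*$"

# Keep only the lines that look like acronym definitions
defs <- lines[grepl(pattern, lines)]

# Pull the short and long forms out with backreferences
acronymes <- data.frame(
  ABR = sub(pattern, "\\3", defs),
  Complet = sub(pattern, "\\4", defs),
  stringsAsFactors = FALSE
)
print(acronymes)
```

With a real file, replace the sample vector with readLines("acronymes.tex"). Anchoring the pattern on \newacronym means lines that do not match are dropped instead of being copied unchanged into the result.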