user3745472 - 1 year ago
LaTeX Question

Extract specific section from LaTeX file with python

I have a set of LaTeX files and would like to extract the "abstract" section from each one.


I have tried the suggestion from "How to Parse LaTex file" and tried:

A = re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data)

Here data contains the text of the LaTeX file, but A is just an empty list. Any help would be greatly appreciated!

Answer Source

By default, . does not match newlines, so .*? cannot span a multi-line abstract. It does once the re.S (re.DOTALL) flag is given:

re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data, re.S)
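
To see the difference directly, here is a small sketch that runs the same pattern with and without the flag. The inline sample string is made up for illustration and stands in for the contents of a .tex file:

```python
import re

# Hypothetical sample text standing in for the contents of a .tex file.
sample = "Title\n\\begin{abstract}\nGood stuff\n\\end{abstract}\nOther stuff\n"

pattern = r'\\begin{abstract}(.*?)\\end{abstract}'

# Without re.S, '.' cannot cross a newline, so the pattern finds nothing.
without_flag = re.findall(pattern, sample)

# With re.S, '.' also matches newlines and the abstract body is captured.
with_flag = re.findall(pattern, sample, re.S)

print(without_flag)
print(with_flag)
```

The first call returns an empty list, which is the behaviour described in the question. The second returns the text between the two markers, including the surrounding newlines.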


Consider this test file:


Title maybe
\begin{abstract}
Good stuff
\end{abstract}
Other stuff

This gets the abstract:

>>> import re
>>> data = open('a.tex').read()
>>> re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data, re.S)
['\nGood stuff\n']


From the re module's documentation for re.S (re.DOTALL):


Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
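
Since the question is about a set of files, the same pattern could be wrapped in a small helper and applied to every .tex file in a directory. This is only a sketch: the function names, the directory layout and the UTF-8 encoding are assumptions, not anything from the question.

```python
import re
from pathlib import Path

# Compiled once; re.S (re.DOTALL) lets '.' span the newlines inside the abstract.
ABSTRACT_RE = re.compile(r'\\begin{abstract}(.*?)\\end{abstract}', re.S)


def extract_abstract(text):
    """Return the first abstract body in text, stripped, or None if absent."""
    match = ABSTRACT_RE.search(text)
    return match.group(1).strip() if match else None


def extract_abstracts(directory):
    """Map each .tex file name in directory to its abstract (or None)."""
    # Assumes the files are UTF-8 encoded; adjust the encoding if yours differ.
    return {
        path.name: extract_abstract(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.tex"))
    }
```

Calling extract_abstracts("papers"), where "papers" is a hypothetical directory name, would give a dict keyed by file name. A regex like this does not know about LaTeX comments, so a commented-out \begin{abstract} would still be matched.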