user3745472 - 1 year ago 86

# Extract specific section from LaTeX file with Python

I have a set of LaTeX files. I would like to extract the "abstract" section for each one:

\begin{abstract}

.....

\end{abstract}

I have tried the suggestion here: How to Parse LaTex file

and tried:

A = re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data)


Here data contains the text of the LaTeX file, but A is just an empty list. Any help would be greatly appreciated!

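By default, . does not match newlines, so .*? cannot span the lines between \begin{abstract} and \end{abstract}. It does once the re.S flag (also spelled re.DOTALL) is given: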

re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data, re.S)
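
To make the effect of the flag concrete, here is a minimal sketch that runs the same pattern with and without re.S. The short data string is only a stand-in for the contents of a real file, and the names pattern, without_flag and with_flag are illustrative, not from the question.

```python
import re

# Tiny stand-in for the file contents; the abstract spans several lines.
data = "\\begin{abstract}\nGood stuff\n\\end{abstract}\n"

pattern = r'\\begin{abstract}(.*?)\\end{abstract}'

# Without re.S, "." stops at every newline, so nothing matches.
without_flag = re.findall(pattern, data)

# With re.S (alias re.DOTALL), "." also matches newlines.
with_flag = re.findall(pattern, data, re.S)

print(without_flag)  # []
print(with_flag)     # ['\nGood stuff\n']
```

The captured text keeps the newlines around the abstract body; calling .strip() on each result removes them if they are not wanted.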


### Example

Consider this test file:

\documentclass{report}
\usepackage[margin=1in]{geometry}
\usepackage{longtable}

\begin{document}
Title maybe
\begin{abstract}
Good stuff
\end{abstract}
Other stuff
\end{document}


This gets the abstract:

>>> import re
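
Since the question is about a set of files, the same pattern could be wrapped in a small helper and applied to each file's text. The following is only a sketch under assumptions: the helper name extract_abstract and the constant ABSTRACT_RE are hypothetical, the test file is inlined as a string rather than read from disk, and only the first abstract in a file is returned.

```python
import re

# Compiled once so it can be reused for every file; re.S lets "." span lines.
ABSTRACT_RE = re.compile(r'\\begin{abstract}(.*?)\\end{abstract}', re.S)


def extract_abstract(text):
    """Return the first abstract in `text` without surrounding whitespace, or None."""
    match = ABSTRACT_RE.search(text)
    return match.group(1).strip() if match else None


# The test file from above, inlined as a raw string instead of read from disk.
test_file = r"""\documentclass{report}
\usepackage[margin=1in]{geometry}
\usepackage{longtable}

\begin{document}
Title maybe
\begin{abstract}
Good stuff
\end{abstract}
Other stuff
\end{document}
"""

print(extract_abstract(test_file))           # Good stuff
print(extract_abstract("no abstract here"))  # None
```

For real files, each file's text could be read with pathlib.Path.read_text() and passed to the helper; returning None makes files without an abstract easy to skip.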

From the re module's webpage: