Parsing a LaTeX file in Perl

Apologies for the very basic question!

I just want to read in a LaTeX file (so basically text) and output all the (say) theorems, which are always in the format

\begin{theorem}
some lines of latex
\end{theorem}

I always kind of figured Perl was the right language for this!

Of course, I only know very basic programming in C++ and Java, and virtually no Perl.

Nonetheless, I can currently read in a text file and process it line by line.

It seems the most basic way to do this is:

($string =~ /pattern/)

I then got confused reading about special characters like ?, *, +, $, etc.

Any simple references or links to get me started?

(I put this on here and not the TeX site, as it could be useful generally for reading text files, and not just LaTeX!)

Answer

If you're on a Unix-y machine (this includes Macs), for a task this small you should reach for sed first:

$ sed -ne '/^\\begin{theorem}$/,/^\\end{theorem}$/p' doc.tex

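If you're on Windows, though, you don't get sed bundled with the OS, and Perl is, as I understand it, rather easier to install, so here's the equivalent (under cmd.exe, put the script in double quotes instead, since cmd.exe does not treat single quotes as quoting characters):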

> perl -ne 'print if /^\\begin\{theorem\}$/.../^\\end\{theorem\}$/;' doc.tex
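Since the question mentions being confused by characters such as ?, *, + and $, here is one way to read the pattern in that Perl command. This is only an illustrative sketch: the helper name is_theorem_start and all the sample strings are made up for the demonstration and are not part of the command above.

```perl
use strict;
use warnings;

# The opening pattern from the one-liner, written with the x modifier so
# that whitespace and comments inside the regex are ignored.
my $begin = qr/
    ^             # anchor: the match has to start at the beginning of the line
    \\            # one literal backslash, doubled because backslash is the escape character
    begin         # the literal text "begin"
    \{theorem\}   # literal braces around "theorem"
    $             # anchor: the match has to end at the end of the line
/x;

# Hypothetical helper, named here only for the demonstration.
sub is_theorem_start {
    my ($line) = @_;
    return $line =~ $begin ? 1 : 0;
}

# Only the first sample is exactly \begin{theorem} on a line by itself.
for my $line ("\\begin{theorem}\n", "  \\begin{theorem}\n", "\\begin{theorem} % note\n") {
    my $verdict = is_theorem_start($line) ? "match:    " : "no match: ";
    print $verdict, $line;
}

# The other characters named in the question are quantifiers. Each row is
# [pattern, sample string]; the samples are invented for illustration.
my @demo = (
    [ qr/^colou?r$/, "color" ],   # ?  the preceding item zero or one time
    [ qr/^ab*c$/,    "ac"    ],   # *  the preceding item zero or more times
    [ qr/^ab+c$/,    "ac"    ],   # +  the preceding item one or more times
);
for my $row (@demo) {
    my ($re, $text) = @$row;
    printf "%-6s %s\n", $text, ($text =~ $re ? "matches" : "does not match");
}
```

Because of the two anchors, a theorem line with leading spaces or a trailing comment is skipped; loosening the pattern (for example by allowing optional whitespace) would be a choice for the person adapting it.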

You may notice a distinct resemblance between these two commands. That's not an accident; Perl took ideas from many of the older Unix text-munging utilities, sed included.
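Since the question is about reading a file line by line from a script, the Perl command can also be spelled out in full. What follows is a rough sketch rather than the answerer's code: the subroutine name extract_theorems and the $inside flag are invented here, and an explicit flag stands in for the ... range operator, which should behave the same way for theorem blocks whose delimiters sit on lines of their own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every line that lies inside a theorem environment, including the
# \begin{theorem} and \end{theorem} lines themselves.
sub extract_theorems {
    my @lines = @_;
    my @found;
    my $inside = 0;    # true while we are between \begin and \end

    for my $line (@lines) {
        $inside = 1 if $line =~ /^\\begin\{theorem\}$/;
        push @found, $line if $inside;
        $inside = 0 if $line =~ /^\\end\{theorem\}$/;
    }
    return @found;
}

# When file names are given on the command line, <> reads all of their
# lines at once (fine for a document-sized file) and the matches are printed.
if (@ARGV) {
    my @theorems = extract_theorems(<>);
    print @theorems;
}
```

Saved under a name such as theorems.pl (the name is arbitrary), it would be run as perl theorems.pl doc.tex.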

