Mahe_sundar - 2 months ago
Perl Question

Convert non-ASCII/UTF-8 characters into LaTeX codes

We need to convert non-ASCII (UTF-8) characters and numeric or named character entities into LaTeX codes. At present we do this in two steps with a Perl script: first non-ASCII to Unicode, then Unicode to LaTeX/entity.

For example:

ó --> \'{o}
&#243; --> \'{o}
&oacute; --> \'{o}

Is there a direct way to convert non-ASCII or UTF-8 characters to LaTeX codes in a Perl script?


This is very straightforward using the XML::Entities module to decode the entities and the LaTeX::Encode module to re-encode the result as LaTeX.

Note that I've explicitly created an alias, xml_decode, for the decoding function, as the exported name is just decode, which is far too imprecise.
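The aliasing idiom can be seen in isolation in the minimal sketch below: assigning a code reference to a typeglob installs a second name for the same subroutine. The package Demo::Codec, its toy decode function, and the alias demo_decode are hypothetical stand-ins for illustration; they are not part of XML::Entities.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module whose function name is too generic.
{
    package Demo::Codec;

    sub decode {
        my ($text) = @_;
        # Toy behaviour for illustration only: turn "&amp;" back into "&".
        $text =~ s/&amp;/&/g;
        return $text;
    }
}

# Assigning a code reference to a typeglob gives the same subroutine a
# second, more descriptive name in the current package (main).
*demo_decode = \&Demo::Codec::decode;

# The alias only exists once the assignment has run, so call it with
# parentheses to make Perl parse it as a function call.
my $decoded = demo_decode('fish &amp; chips');
print "$decoded\n";    # prints "fish & chips"
```

The full script below uses the same kind of assignment with XML::Entities::decode.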

use utf8;
use strict;
use warnings 'all';
use feature 'say';

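# Import nothing from XML::Entities; its decoder is aliased below instead
use XML::Entities ();
use LaTeX::Encode 'latex_encode';
# Install xml_decode as a more descriptive name for XML::Entities::decode
*xml_decode = \&XML::Entities::decode;

# The same character as a literal, a numeric entity, and a named entity
for my $s ( 'ó', '&#243;', '&oacute;' ) {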
    my $reencoded = latex_encode(xml_decode('all', $s));
    say $reencoded;