Thomas Thomas - 5 months ago 9
HTML Question

Escaping special Tags for HTML

I'm pretty new to Perl and recently wrote a converter for our SharePoint.
It basically takes our old wiki's HTML pages and converts them to ASPX pages with SP classes and so on.

Everything works fine until someone uses <tags> as text.
Here's an example from the HTML of the old TWiki:

<li> Moduldateinamen haben folgendes Format <code> <Content>_<Type>_<Name>_</code> ...


So <Content> <Type> <Name> is text wrapped in a <code> tag.

[Screenshot: how it looks in the old wiki]

[Screenshot: how it looks after converting to ASPX and uploading to SharePoint]

You can see that SP tried to interpret them as tags (understandably), not as text, and therefore they won't be displayed.

For SharePoint pages I need escaped HTML markup between the SP ASPX markup.
So I've changed, for example, < to &lt; and so on via regex.
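For illustration, a blanket escape of this kind could be sketched as follows. The sub name `escape_html` is hypothetical and the handling of `&` is an assumption; the actual converter code is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original converter):
# escape every HTML metacharacter in a string.
sub escape_html {
    my ($html) = @_;
    $html =~ s/&/&amp;/g;   # ampersands first, so the entities added below are not re-escaped
    $html =~ s/</&lt;/g;
    $html =~ s/>/&gt;/g;
    return $html;
}

print escape_html('<li> text <code> <Content>_<Type> </code>'), "\n";
```

A pass like this treats real markup and "text" tags identically, which is exactly why the two cannot be told apart afterwards.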

However the example snippet I posted should look like this in the ASPX:

&lt;li&gt; Moduldateinamen haben folgendes Format &lt;code&gt;openTagContentclosingTag_openTagTypeclosingTag_openTagNameclosingTag_&lt;/code&gt;


So < is converted to openTag and > to closingTag, but only for the actual content inside the <li> tag. Later this needs to be changed by hand (I don't see another way).

How can I make sure that only "text" tags get escaped with openTag/closingTag, while "real" HTML markup gets escaped in the usual manner (< to &lt;)?

Answer

If I understand the question correctly, all you need is a regex like:

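# Escape < and > only inside <code>...</code>; the surrounding markup stays untouched.
# The /s flag lets the match span line breaks inside a <code> element.
$page =~ s{(?<=<code>)(.+?)(?=</code>)}
          {
              my $text = $1;    # copy $1 before the inner substitution resets it
              $text =~ s/([<>])/ $1 eq '<' ? '&lt;' : '&gt;' /ge;
              $text;
          }gsxe;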
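The regex above turns the brackets inside <code> into entities. If the goal is the output shown in the question (openTag/closingTag placeholders for the text inside <code>, entities for the real markup), a two-pass variant is one possible sketch. The sub name `convert_fragment` is hypothetical, and the sketch assumes <code> elements carry no attributes, are not nested, and contain no real markup of their own.

```perl
use strict;
use warnings;

# Hypothetical helper: placeholders for text "tags" inside <code>,
# entities for the real markup around them.
sub convert_fragment {
    my ($page) = @_;

    # Pass 1: inside <code>...</code>, swap angle brackets for placeholder words.
    # The tags themselves are captured and put back unchanged.
    $page =~ s{(<code>)(.*?)(</code>)}{
        my ($open, $text, $close) = ($1, $2, $3);   # copy before the inner s/// resets them
        $text =~ s/</openTag/g;
        $text =~ s/>/closingTag/g;
        $open . $text . $close;
    }gse;

    # Pass 2: any angle brackets left belong to real markup; escape them.
    $page =~ s/</&lt;/g;
    $page =~ s/>/&gt;/g;

    return $page;
}

my $in = '<li> Moduldateinamen haben folgendes Format <code> <Content>_<Type>_<Name>_</code>';
print convert_fragment($in), "\n";
# prints:
# &lt;li&gt; Moduldateinamen haben folgendes Format &lt;code&gt; openTagContentclosingTag_openTagTypeclosingTag_openTagNameclosingTag_&lt;/code&gt;
```

The order matters: running the entity pass first would leave no literal brackets for the placeholder pass to find. For anything beyond simple, well-formed input, a real HTML parser is likely a safer choice than regexes.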