user1754738 - 7 months ago
Perl Question

Wrap Form with New Tag

I have a Perl script that parses through HTML and modifies the content. I’d like to update my script below to wrap a noindex tag around a specific ID on the page.

Relevant Perl Info

undef $/;
my $doc = <>;

if ($main::atomz_search_url =~ m{mydomain\.com/(.+?)/support}si)
{
    $doc =~ s{<div id="header">}{<div id="header" class="noindex">}sig;
}


Current HTML

<form id="search" action="../results.html" method="post">
<fieldset>
...
</fieldset>
</form>


I simply want to find the FORM with an ID of “search” and wrap the entire FORM block (including original content) with a noindex tag.

<noindex>
<form id="search" action="../results.html" method="post">
<fieldset>
...
</fieldset>
</form>
</noindex>


Note: I can only use core modules, so Mojo isn't an option.

Answer

Since this is a single, specific task, simple text processing may be enough. If you have more to do, I would strongly recommend using a suitable module; there is a whole range of modules for manipulating HTML.

The key point is that HTML forms cannot be nested, so a non-greedy match from the opening tag of your form to the first </form> after it captures exactly that form.
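
To illustrate why the non-greedy quantifier matters, here is a minimal sketch. The page content, including the second form with id "login", is made up for the demonstration and is not from the question.

```perl
use strict;
use warnings;

# Hypothetical page with two separate forms, for illustration only
my $html = <<'HTML';
<form id="search" action="../results.html" method="post">
<fieldset>...</fieldset>
</form>
<p>Other content</p>
<form id="login" action="login.html" method="post">
<fieldset>...</fieldset>
</form>
HTML

# Non-greedy .+? stops at the first </form>, the end of the search form
my ($lazy) = $html =~ m{(<form\s+id="search".+?</form>)}s;

# Greedy .+ runs on to the last </form> and swallows the login form too
my ($greedy) = $html =~ m{(<form\s+id="search".+</form>)}s;

print "non-greedy match contains login form: ",
    ($lazy =~ /id="login"/ ? "yes" : "no"), "\n";
print "greedy match contains login form: ",
    ($greedy =~ /id="login"/ ? "yes" : "no"), "\n";
```

Because forms cannot nest, the first </form> after the opening tag is always the one that closes it, so the non-greedy version is safe even when the page contains other forms.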

If you can read the whole page into a string:

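my $file = 'page_with_form.html';
# Slurp the whole file into a single string
my $page = do {
    local $/ = undef;
    open my $fh, '<', $file or die "Can't open $file: $!";
    <$fh>;
};
# Capture from the opening tag to the first </form> and wrap the capture;
# the newline after $1 puts </noindex> on its own line
$page =~ s{(<form\s+id="search".+?</form>)}{<noindex>\n$1\n</noindex>}s;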
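
The pattern above expects id to be the first attribute, in double quotes, with a lowercase form tag. If the real markup may vary, a more tolerant pattern could look like the sketch below. The helper name wrap_form_in_noindex and the sample markup are made up for illustration, and this is still plain regex matching with only core Perl, not real HTML parsing.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the form with the given id in <noindex> tags.
# Returns the modified copy; the caller's string is left untouched.
sub wrap_form_in_noindex {
    my ($page, $id) = @_;
    $page =~ s{
        (                                  # capture the whole form
          <form\b [^>]*?                   # opening tag, other attributes first
          \s id \s* = \s* (["']) \Q$id\E \2   # the id, single or double quoted
          [^>]* >                          # rest of the opening tag
          .*?                              # form body, as little as possible
          </form\s*>                       # first closing tag
        )
    }{<noindex>\n$1\n</noindex>}six;
    return $page;
}

# Sample markup (assumed): uppercase tag, id not first, single quotes
my $page = <<'HTML';
<FORM method="post" id='search' action="../results.html">
<fieldset>
...
</fieldset>
</form>
HTML

print wrap_form_in_noindex($page, 'search');
```

The [^>]* parts keep the id match inside a single opening tag, so the pattern cannot start at one form and pick up the id from another.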

If the page is too large to read into a variable, read it line by line and use the lines containing the form's opening and closing tags as markers. Let me know if I need to add this.
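
For reference, the line-by-line approach could look roughly like this. It is a sketch under two assumptions: the opening and closing form tags each sit on a line of their own, as in the question's HTML, and every line ends with a newline. The sub name wrap_form_by_line is made up, and the demo uses in-memory filehandles where real code would open the input and output files.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, wrapping the search form.
sub wrap_form_by_line {
    my ($in, $out) = @_;
    my $in_form = 0;                      # true while inside the search form
    while (my $line = <$in>) {
        if (!$in_form && $line =~ /<form\s+id="search"/i) {
            print {$out} "<noindex>\n";   # marker line before the opening tag
            $in_form = 1;
        }
        print {$out} $line;
        if ($in_form && $line =~ m{</form>}i) {
            print {$out} "</noindex>\n";  # marker line after the closing tag
            $in_form = 0;
        }
    }
}

# Demo on in-memory filehandles (sample content assumed)
my $html = <<'HTML';
<p>before</p>
<form id="search" action="../results.html" method="post">
<fieldset>
...
</fieldset>
</form>
<p>after</p>
HTML

open my $in,  '<', \$html      or die $!;
open my $out, '>', \my $result or die $!;
wrap_form_by_line($in, $out);
close $out;
print $result;
```

Only one line is held in memory at a time, and the $in_form flag ensures that a </form> belonging to some other form is left alone.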