I have a Perl script that parses through HTML and modifies the content. I’d like to update my script below to wrap a noindex tag around a specific ID on the page.
Relevant Perl Info
undef $/;
my $doc = <>;
if ($main::atomz_search_url=~ m{mydomain.com/(.+?)/support}si)
{
$doc =~ s{<div id="header">}{<div id="header" class="noindex">}sig;
}
Relevant HTML
<form id="search" action="../results.html" method="post">
<fieldset>
...
</fieldset>
</form>
Desired Output
<noindex>
<form id="search" action="../results.html" method="post">
<fieldset>
...
</fieldset>
</form>
</noindex>
Given that this is one specific task, simple text processing will do. If you have any more to do, I would strongly recommend using a suitable package. There is a whole range of packages for various manipulations of HTML.
It is critical that HTML forms cannot be nested, so you can safely search for your pair of form tags.
If you can read the whole page into a string, a single substitution does it:
my $file = 'page_with_form.html';
my $page = do {
local $/ = undef;
open my $fh, '<', $file or die $!;
<$fh>;
};
# Non-greedy match with /s so that . also matches the newlines inside the form
$page =~ s{(<form\s+id="search".+?</form>)}{<noindex>\n$1\n</noindex>}s;
If the page is too large to read into a variable, read it line by line and use markers for (the lines with) the opening and closing tags for your form. Let me know if I need to add this.
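For reference, here is a minimal sketch of that line-by-line approach. The sub name wrap_search_form and the flag $in_form are made up for illustration. It assumes that the opening <form id="search" ...> tag and the closing </form> tag each sit on a line of their own, as in the question, and that id is the first attribute of the tag.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, wrapping the form
# with id="search" in <noindex> tags.
# Assumption: the opening and closing form tags are on separate lines.
sub wrap_search_form {
    my ($in, $out) = @_;
    my $in_form = 0;    # true while we are between the form's tags

    while (my $line = <$in>) {
        # Opening marker: emit <noindex> just before the form starts
        if (!$in_form && $line =~ /<form\s+id="search"/) {
            print {$out} "<noindex>\n";
            $in_form = 1;
        }

        print {$out} $line;

        # Closing marker: emit </noindex> right after the form ends
        if ($in_form && $line =~ m{</form>}) {
            print {$out} "</noindex>\n";
            $in_form = 0;
        }
    }
    return;
}
```

You would call it with a filehandle opened for reading on the page and one opened for writing on a new file, then rename the new file over the original once it is complete. Because forms cannot be nested, the first </form> seen after the opening tag is the right one.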