Mike Froetscher - 6 months ago
Perl Question

Pattern match--lookahead and lookbehind

use strict;
use warnings;
use XML::Twig;
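my @discard = qw/abc de bond/;
my $filter = join '|', @discard;
# \b on both sides means only whole words match (e.g. "abc" but not "sdabc").
$filter = qr/\b(?:$filter)\b/;
my $twig = XML::Twig->new;
$twig->parse(\*DATA);
for my $line ( $twig->findnodes('//line') ) {
    # Delete every <line> element whose text matches the filter.
    $line->delete if $line->text =~ $filter;
}
$twig->print;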

__DATA__
<data>
<line> sdfe abc adsfefsdf </line>
<line> abcsdffedcfsdf sdf </line>
<line> sdfe </line>
<line> abc </line>
<line> sdabc sfefsdf </line>
<line>
<id> bond </id>
<dest> UK </dest>
adsfefsdf
</line>
<line> fhgh kk jj hjsda </line>
<line> abc </line>
..
..
..
</data>


The above program generates the following result:

<data><line> abcsdffedcfsdf sdf </line><line> sdfe </line><line> sdabc sfefsdf </line><line> fhgh kk jj hjsda </line>
..
..
..
</data>


The following is the desired output:

<data>
<line> sdfe </line>
<line> fhgh kk jj hjsda </line>
..
..
..
</data>


Conditions the desired output must satisfy:


  1. Remove every element from the input data whose text contains one of the array values as a whole word (Match), at the end of a word (Pre-match), or at the start of a word (Post-match).

    Example:
    Match ---- abc

    Pre-match ---- sdabc

    Post-match ---- abcsdffedcfsdf

  2. Ensure the output is formatted the same way as the input data.



Note: Match, Pre-match, and Post-match are my own terms, as described above.

Answer

Are you asking how to filter out the line elements that contain a word that starts or ends with one of the strings in @discard? If so, simply replace the search pattern with the following:

my $filter = join '|', map quotemeta, @discard;
$filter = "(?:$filter)";
$filter = qr/\b$filter|$filter\b/;

Output:

<data><line> sdfe </line><line> fhgh kk jj hjsda </line>
    ..
    ..
    ..
</data>
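
To check the pattern without XML::Twig, here is a minimal sketch that runs the same regex over the text of the sample line elements from the question. The helper name should_discard and the flattened sample strings are assumptions made for illustration (the text of the nested element is approximated as a single string); they are not part of the original code.

```perl
use strict;
use warnings;

# Strings to discard, taken from the question.
my @discard = qw(abc de bond);

# Build the alternation once; quotemeta guards against regex metacharacters.
my $alt = join '|', map quotemeta, @discard;

# Match a discard string at the start of a word OR at the end of a word.
# A whole-word match satisfies both branches.
my $filter = qr/\b(?:$alt)|(?:$alt)\b/;

# Hypothetical helper (not in the original answer):
# returns 1 if the text should be dropped, 0 otherwise.
sub should_discard {
    my ($text) = @_;
    return $text =~ $filter ? 1 : 0;
}

# Approximate text content of the sample <line> elements.
my @samples = (
    'sdfe abc adsfefsdf',
    'abcsdffedcfsdf sdf',
    'sdfe',
    'abc',
    'sdabc sfefsdf',
    'bond UK adsfefsdf',
    'fhgh kk jj hjsda',
);

for my $text (@samples) {
    printf "%-4s [%s]\n", ( should_discard($text) ? 'drop' : 'keep' ), $text;
}
```

With these samples, only "sdfe" and "fhgh kk jj hjsda" are kept, which matches the desired output. Note that a discard string in the middle of a word (for example "xabcx") is not matched by this pattern, since neither side sits on a word boundary.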