ontherocks - 9 days ago
Perl Question

perl - extract substring to a character with count zero or more

I have the following strings in a file

1. aaa bbb zccc ddd eee;
2. yyaaa bbb zccc dzdd eee; ('z' is present multiple times)
3. yyaaa bbb ccc *zddd eee; (special character '*' present)
4. yyaaa bbb ccc * zddd eee; (special character '*' present)
5. aaa bbb ccc* zddd eee; (special character '*' present)
6. aaa bbb ccc ddd eee; ('z' is absent)


Another example file

1. aaa bbb zccc ddd eee;
2. yyaaa bbb zccc dzdd eee;
3. yyaaa bbb *ccc * zddd eee;
4. yyaaa bbb * ccc zddd eee;
5. aaa bbb* ccc zddd eee;
6. aaa bbb ccc ddd eee;


In each line, I want to extract the substring from the end of 'aaa' to the first occurrence of 'z' (excluding the 'z' itself). If 'z' is absent, it should print the whole string. If there are special characters, it should omit them.

REQUIRED OUTPUT

bbb
bbb
bbb ccc
bbb ccc
bbb ccc
aaa bbb ccc ddd eee


I have tried the following, but it doesn't give the output I am seeking:

my $file = qq(test.txt);
open (my $IN, '<', $file) || die "Cannot open $file for read: $!";
my @lines=<$IN>;
close($IN);

foreach (@lines)
{
    if( $_ =~ m/aaa\b(.*?)z/)
    {
        print "$1\n";
    }
}


MY OUTPUT

bbb
bbb
bbb ccc *
bbb ccc *
bbb ccc*


I am not sure how to exclude the special character (I tried character classes), and the code prints nothing for line 6, where no 'z' is present.

Answer

You can use a negated character class:

if( $_ =~ m/aaa\b([^z;]*)/)
{
    my $string = $1;
    $string =~ s/\*//g;
    print "$string\n";
}
# Outputs
# bbb
# bbb
# bbb ccc
# bbb ccc
# bbb ccc
# bbb ccc ddd eee
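  • [^z;]* matches any run of characters other than z or ;, so the capture stops just before the first z, or before the ; when no z is present.
  • $string =~ s/\*//g; replaces every * in the captured text with nothing.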
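
Two details of the snippet above are worth noting. The captured text keeps the spaces around it, and for the line without a z it yields "bbb ccc ddd eee" rather than the whole string the question asks for. A minimal sketch that covers both cases could look like the following. It assumes the file lines carry no "1." style numbering or trailing annotations, and extract_field is a hypothetical helper name, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper (an illustration, not from the original answer).
# Returns the text between "aaa" and the first "z"; when no "z" follows
# "aaa", returns the whole statement up to the ";".
sub extract_field {
    my ($line) = @_;
    my $text;
    if ($line =~ /aaa\b([^z;]*)z/) {
        # A z follows "aaa": keep everything in between.
        $text = $1;
    }
    else {
        # No z: keep the whole statement, stopping before the ";".
        ($text) = $line =~ /^([^;]*)/;
    }
    $text =~ s/\*//g;          # drop the special character *
    $text =~ s/\s+/ /g;        # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;   # trim both ends
    return $text;
}

# Sample lines from the question, assumed to be stored without numbering.
my @lines = (
    "aaa bbb zccc ddd eee;\n",
    "yyaaa bbb zccc dzdd eee;\n",
    "yyaaa bbb ccc *zddd eee;\n",
    "yyaaa bbb ccc * zddd eee;\n",
    "aaa bbb ccc* zddd eee;\n",
    "aaa bbb ccc ddd eee;\n",
);

print extract_field($_), "\n" for @lines;
```

With the six sample lines this prints bbb, bbb, bbb ccc, bbb ccc, bbb ccc and aaa bbb ccc ddd eee, matching the required output. Requiring a literal z after the negated class is what makes the last line fall through to the whole-string branch.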