Dev Dev - 5 months ago 24
Perl Question

Perl program to find matching words in a paragraph

I have two text files.
The first one contains a list of words, like the following:
File1.txt

Laura
Samuel
Gerry
Peter
Maggie


The second one contains paragraphs. For example:
File2.txt

Laura
is
about
to
meet
Gerry
and
is
planning
to
take
Peter
along


All I want the program to do is look for common words and print "MATCH" beside the matching words, either in File2.txt or in a third output file.
The desired output should look like this:

Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along


I have tried the following code, but I am not getting the desired output. Please help.

use warnings;
use strict;
use Data::Dumper;
my $result = {};
my $first_file = shift || 'File1.txt';
my $second_file = shift || 'File2.txt';
my $output = 'output2.txt';
open my $a_fh, '<', $first_file or die "$first_file: $!";
open my $b_fh, '<', $second_file or die "$second_file: $!";
open (OUTPUT,'>'.$output) or die "Cannot create $output.\n";
while(my $line = <$a_fh>) {
    chomp;
    next if /^$/;
    $result->{$_}++;
}
while(my $line = <$b_fh>) {
    chomp;
    next if /^$/;
    if ($result->{$_}) {
        delete $result->{$_};
        $result->{join " |"=> $_,"MATCH"}++;
    }
    else {
        $result->{$_}++;
    }
}
{
    $Data::Dumper::Sortkeys = 0;
    print OUTPUT Dumper $result;
}


However, the output I am getting looks like this:

Laura | MATCH
Samuel | MATCH
take
Maggie | MATCH
Laura
about
to
Gerry
meet
Gerry | MATCH
and
is
Maggie |MATCH
planning
to
Peter |MATCH
take
Peter |MATCH


The output is not in paragraph format, nor does it print MATCH for all matches.
Please advise.

Answer

Here's one example. I read in the wordlist file and put all of its words into a hash, then iterate over the paragraph file one line at a time. I split each line into words and print them one per line, first checking whether each word is in the wordlist. If it is, I print it with " | MATCH" appended.
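One likely reason the approach in the question loses the paragraph layout is that it stores the paragraph's words as hash keys. A Perl hash keeps each key only once and does not preserve insertion order, so repeated words collapse and the reading order is lost. A minimal sketch of the difference, using hypothetical sample words rather than the real files:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the paragraph file.
my @paragraph = qw(Laura is about to meet Gerry and is planning);

# Words stored as hash keys: duplicates collapse and key order is not
# the order in which the words were read.
my %seen;
$seen{$_}++ for @paragraph;
print scalar(keys %seen), " distinct keys\n";    # 8 ("is" is stored once)

# Words kept in an array: every word survives, in reading order.
print scalar(@paragraph), " words in order\n";   # 9
print join(' ', @paragraph), "\n";
```

This is why the hash is best reserved for the wordlist lookup, while the paragraph is printed word by word as it is read.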

My example paragraph file:

Laura is about to meet Gerry, and is planning to take Peter along.

But Peter and Sarah have other plans.

The code:

use warnings;
use strict;

open my $fh, '<', 'file.txt' or die "file.txt: $!";
open my $word_fh, '<', 'wordlist.txt' or die "wordlist.txt: $!";

# build a lookup hash: each word in the wordlist becomes a key
my %words_to_match = map {chomp $_; $_ => 1} <$word_fh>;

close $word_fh;

while (<$fh>){
    chomp;
    my @words_in_line = split;

    for my $word (@words_in_line){
        $word =~ s/[\.,;:!]//g; # strip out punctuation
        $word .= ' | MATCH' if exists $words_to_match{$word};
        print "$word\n";
    }
}

Output:

Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along
But
Peter | MATCH
and
Sarah
have
other
plans

If you want to print the output to a file, open a file handle for writing, and change the print statement inside the loop to print $wfh ....
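A minimal sketch of that file-output variant follows. The annotate helper and the output.txt name in the comment are hypothetical, not part of the answer above. In-memory handles stand in for the real files so the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: copy words from $in_fh to $out_fh, one per line,
# appending " | MATCH" to any word found in the %$words lookup hash.
sub annotate {
    my ($in_fh, $out_fh, $words) = @_;
    while (my $line = <$in_fh>) {
        my @words_in_line = split ' ', $line;
        for my $word (@words_in_line) {
            $word =~ s/[.,;:!]//g;    # same punctuation strip as the answer's loop
            $word .= ' | MATCH' if exists $words->{$word};
            print {$out_fh} "$word\n";
        }
    }
    return;
}

my %words_to_match = map { $_ => 1 } qw(Laura Samuel Gerry Peter Maggie);

# In-memory handles keep the sketch self-contained; for a real file use e.g.
#   open my $wfh, '>', 'output.txt' or die "output.txt: $!";
my $text   = "Laura is about to meet Gerry, and is planning to take Peter along.\n";
my $result = '';
open my $fh,  '<', \$text   or die "cannot read sample text: $!";
open my $wfh, '>', \$result or die "cannot open output buffer: $!";

annotate($fh, $wfh, \%words_to_match);

close $wfh;
close $fh;

print $result;    # show what was written to the output handle
```

Passing the handles into a sub keeps the matching logic independent of where the output goes, so the same loop can write to STDOUT, a file, or a string buffer.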
