Sumathi Gokul - 7 months ago
Perl Question

Perl script to remove only matched duplicate lines?

I know that a hash can be used to remove duplicate lines from a file, but that approach removes every duplicate. I used the following code to remove all duplicate lines:

my %lines;
while (<DATA>) {
    print if not $lines{$_}++;
}


However, I need to remove only those duplicate lines that match a given pattern.
Sample input file:

line1
line2
line3
line1 #duplicate line
line2 #duplicate line
line4
line5


Although both line1 and line2 are duplicated, I only want to remove the duplicate of line1.

output:

line1
line2
line3
line2 #this duplicate line needs to be retained
line4
line5


Any suggestions on how to combine hashes and regexes to achieve this?

Answer

This solution lets you set up a regex pattern $check_dups that defines which lines are subject to duplicate removal. If a line matches that pattern, it is printed only the first time it is seen; all other lines are always printed, duplicates included.

Here, only duplicates of lines that match /line1/ are removed, as required by the example in your question.

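use strict;
use warnings FATAL => 'all';

# Only lines matching this pattern are checked for duplicates
my $check_dups = qr/line1/;

# Counts how often each matching line has been seen so far
my %seen;

while ( <DATA> ) {
    if ( /$check_dups/ ) {
        # Post-increment returns 0 (false) the first time, so print once only
        print unless $seen{$_}++;
    }
    else {
        # Non-matching lines are always printed
        print;
    }
}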

__DATA__
line1
line2
line3
line1
line2
line4
line5

output

line1
line2
line3
line2
line4
line5
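
If the same filtering is needed in more than one place, the logic can be wrapped in a subroutine. The following is only a sketch: the name remove_matched_dups and its argument order are hypothetical, not something from the answer's code, and it assumes the lines fit in memory as a list.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines to keep.
# A line matching $pattern is kept only on its first occurrence;
# a line that does not match is always kept, duplicates included.
sub remove_matched_dups {
    my ( $pattern, @lines ) = @_;
    my %seen;

    # $seen{$_}++ is evaluated only when the line matches the pattern
    return grep { !( /$pattern/ && $seen{$_}++ ) } @lines;
}

my @input = map { "$_\n" } qw/ line1 line2 line3 line1 line2 line4 line5 /;
print remove_matched_dups( qr/line1/, @input );
```

Because qr/line1/ is unanchored, it would also match a line such as line10. Passing qr/^line1$/ instead restricts duplicate removal to lines that are exactly line1.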