jan jan - 3 months ago 11
Perl Question

Replace words with a Perl script reading from a table (to anonymize names)

I am looking for a way to replace words in a large text with Perl, and I would like to read the words from another file containing the substitutions. I know I can do:

#!/usr/bin/perl

use warnings;
use strict;

open my $fh_in, '<', $ARGV[0] or die "No input: $!";
open my $fh_out, '>', $ARGV[1] or die "No output: $!";

while (<$fh_in>)
{
s/John/Jack/g;

print $fh_out $_;
}


... and put any number of
s/Fred/Frank/g;
lines.
Is it possible to refer to an external text file containing all the substitutions? The goal is to anonymize names in interviews.

I'm thinking of keeping a simple text file in the format:

Name Pseudonym
John Jack
Fred Frank
etc.


(separated by tabs)

If there are better ways to do it I'd be thankful for suggestions.
The original comes from an Excel database that has all the name substitutions in two columns, but it's fairly easy to get that into a text file. I don't want to make it too complicated, since I'm not very familiar with scripting.

Answer

Read the file that contains the correspondences into a hash, then do the substitution like this:

#!/usr/bin/perl

use warnings;
use strict;

open my $fh_in, '<', $ARGV[0] or die "No input: $!";
open my $fh_out, '>', $ARGV[1] or die "No output: $!";

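# Load the name => pseudonym table (one tab-separated pair per line).
# A header row such as "Name<TAB>Pseudonym" is loaded like any other pair,
# so remove it from the file if the word "Name" must stay untouched.
open my $fh_pseudo, '<', "path/to/pseudo" or die "unable to open pseudo: $!";
my %corres;
while (<$fh_pseudo>) {
    chomp;
    my ($name, $pseudo) = split /\t/, $_;
    # Skip blank or incomplete lines.
    next unless defined($name) && defined($pseudo) && length($name) && length($pseudo);
    $corres{$name} = $pseudo;
}
close $fh_pseudo;

while (my $line = <$fh_in>) {
    # \Q...\E quotes regex metacharacters in the name; \b matches whole words only.
    $line =~ s/\b\Q$_\E\b/$corres{$_}/g for keys %corres;
    print $fh_out $line;
}
close $fh_out or die "unable to close output: $!";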
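
One caveat with substituting key by key: if a pseudonym is itself a name in the table (John becomes Jack, Jack becomes Fred), a later pass can replace text that an earlier pass just wrote, and the result depends on hash order. A minimal sketch of one way around this, assuming the table is already loaded into a hash, is to build a single alternation regex so each word is replaced at most once. The helper name `make_anonymizer` and the sample names are hypothetical, chosen only for illustration:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: takes name => pseudonym pairs and returns a
# function that replaces all names in a string in one pass.
sub make_anonymizer {
    my (%corres) = @_;

    # With an empty table there is nothing to replace.
    return sub { return $_[0] } unless %corres;

    # Try longer names first, so a multi-word name such as "John Smith"
    # is attempted before "John". quotemeta escapes regex metacharacters.
    my $alt = join '|',
              map  { quotemeta($_) }
              sort { length($b) <=> length($a) } keys %corres;
    my $re = qr/\b($alt)\b/;

    return sub {
        my ($text) = @_;
        # Text inserted by the replacement is not rescanned, so
        # "John" -> "Jack" is not turned into "Fred" afterwards.
        $text =~ s/$re/$corres{$1}/g;
        return $text;
    };
}

# Sample data (assumed, not from the original table).
my $anonymize = make_anonymizer(John => 'Jack', Jack => 'Fred');
print $anonymize->("John met Jack and Johnny.\n");
# prints "Jack met Fred and Johnny."
```

In the script above, the returned function would be called on each `$line` inside the `while` loop in place of the `for keys %corres` substitution.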