Joshua Joshua - 1 year ago 43
Perl Question

String masking: Conforming text to a given mask

Given the following text and mask string: -

text: the quick brown fox jumps over the lazy dog
mask: xx xxx xxxx x xxx

I'm trying to find a terse way to arrive at the result: -

th qui brow f jum

The mask conforms the text to its pattern. The resulting string should have the same number of words as the mask.

My current implementation uses List::Zip to zip the words of each list together and do a string substitution. (I've copied the logic of the zip function into the example below so you don't need to install it to test.)

# Squashed version of List::Zip->zip function
# (numeric sort added so list lengths of 10 or more compare correctly)
sub zip{map{[map{shift@{$_}}@_]}0..((sort{$a<=>$b}map{0+@{$_}}@_)[0]-1)}

my $mask = 'xx xxx xxxx x xxx';
my $text = 'the quick brown fox jumps over the lazy dog';

for my $mt ( zip( [split(' ', $mask)], [split(' ', $text)] ) ) {
    my ( $m, $t ) = @{ $mt };
    # Replace the next run of x's in the mask with the truncated word
    $mask =~ s/ $m / substr( $t, 0, length($m) ) /xe;
}

print $mask; # OUTPUT: th qui brow f jum

... but I can't help but think there's a shorter way. Maybe a funky regex trick?

Suggestions welcome.


The accepted answer here uses an intriguing technique. I'm trying to figure out how to apply it to my problem. (Edit: Borodin pointed out why it's not applicable to this problem.)

I should also note that arbitrary whitespace is of no concern, i.e. given:

text: 'one two three'
mask: 'x xx xxx'

I don't care if the result that comes back is 'o tw thr', whatever the spacing between the words. The only requirement is the same number of words, with words of the same length.
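
To pin down that requirement, here is a minimal sketch of a plain loop that satisfies it, without zip or a regex. The sub name conform_to_mask is made up for illustration, and keeping short words whole is an assumption, since the question doesn't say what should happen when a word is shorter than its mask.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative, not from any module):
# truncate each word of $text to the length of the mask word at the
# same position, and stop when either list runs out.
sub conform_to_mask {
    my ( $text, $mask ) = @_;
    my @words = split ' ', $text;    # split ' ' skips leading and repeated whitespace
    my @out;
    for my $m ( split ' ', $mask ) {
        last unless @words;          # fewer words in the text than in the mask
        my $w = shift @words;
        # Assumption: a word shorter than its mask is kept whole
        push @out, substr( $w, 0, length $m );
    }
    return join ' ', @out;
}

print conform_to_mask( 'one two three', 'x   xx    xxx' ), "\n";    # o tw thr
```

Because both strings go through split ' ', the amount of whitespace in the mask or the text makes no difference to the result.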


In the end I've accepted Alexandr's 'funky regex' solution. It's terse and very fast, coming out fastest in the benchmarks by a comfortable margin.

Borodin's first solution, while very similar, created a regex pattern that didn't perform as well.

Borodin: (\S{1,2}) \S*\s+ (\S{1,3}) \S*\s+ (\S{1,4}) \S*\s+ (\S{1,1}) \S*\s+ (\S{1,3})
Alexandr: (\S{2})\S*\s+(\S{3})\S*\s+(\S{4})\S*\s+(\S{1})\S*\s+(\S{3})\S*

A few minor changes to Borodin's solution bring it on par with Alexandr's, but I gave it to Alexandr for arriving there first.

All the solutions are full of great and interesting ideas. Thanks, everyone.
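
The benchmark code itself isn't shown here. As a rough sketch, a comparison of the two generated patterns could be set up with the core Benchmark module like this. It assumes only the match against precompiled patterns is timed; the original benchmarks may well have included building the pattern from the mask. The sub and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $text = 'the quick brown fox jumps over the lazy dog';

# The two patterns quoted above; /x lets the first one keep its spacing.
my $re_range = qr/ (\S{1,2}) \S*\s+ (\S{1,3}) \S*\s+ (\S{1,4}) \S*\s+ (\S{1,1}) \S*\s+ (\S{1,3}) /x;
my $re_exact = qr/(\S{2})\S*\s+(\S{3})\S*\s+(\S{4})\S*\s+(\S{1})\S*\s+(\S{3})\S*/;

# A match in list context returns the captured groups.
sub by_range { join ' ', $text =~ $re_range }
sub by_exact { join ' ', $text =~ $re_exact }

# Timing is only meaningful if both produce the same result.
die "results differ\n" unless by_range() eq by_exact();

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese( -1, { range => \&by_range, exact => \&by_exact } );
```

The numbers will vary by machine and Perl version, so treat any single run as indicative only.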

Answer Source

Clean regex solution, transforming the mask into a regex:

use strict;
use v5.10;

my $text = 'the quick brown fox jumps over the lazy dog';
my $mask = 'xx xxx xxxx x xxx';
say $mask;
$mask =~ s/(x+)/ '(\S{'.(length $1).'})\S*'/ge;
$mask =~ s/\s+/\\s+/g;
say $mask;
say join ' ', ($text =~ /^$mask/);
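
One thing to be aware of with the pattern above: \S{2} needs at least two characters, so if a word in the text is shorter than its mask word the whole match fails and nothing is printed. If short words should be kept whole instead (an assumption, the question doesn't specify this), a {1,n} quantifier handles it. A minimal sketch follows; the sub name mask_to_regex is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper (the name is illustrative): build a compiled regex
# from the mask, capturing up to N characters for a mask word of length N.
sub mask_to_regex {
    my ($mask) = @_;
    my $pattern = join '\s+',
        map { '(\S{1,' . length($_) . '})\S*' }
        split ' ', $mask;    # split ' ' ignores leading and repeated whitespace
    return qr/^\s*$pattern/;
}

my $re = mask_to_regex('xx xxx xxxx x xxx');

# 'a' is shorter than its mask 'xx' but is kept whole
print join( ' ', 'a quick brown fox jumps' =~ $re ), "\n";    # a qui brow f jum
```

The text still has to contain at least as many words as the mask, otherwise the match fails and the list is empty.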