user2044638 - 6 months ago
Perl Question

Correct syntax for multiline search and replace in perl so non-matching lines aren't printed

I'm trying to do a multiline search and replace but can't quite get it to output only what I need.

I want to extract the time from each line that directly follows a dashed line, so this input:

--------------------
2016-05-13 10:00:00 abc
2016-05-13 10:00:01 def
2016-05-13 10:00:02 ghi
--------------------
2016-05-13 10:00:03 jkl
2016-05-13 10:00:04 mno


should produce output like this:

10:00:00
10:00:03


This command does seem to replace every match correctly; however, it also prints the rest of the line and every line that doesn't match.

perl -0ne 'print if s/-{20}\n\d{4}-\d\d-\d\d (\d\d:\d\d:\d\d)/$1/g'


Adding .* at the end of the regex doesn't help much, as it only removes the rest of the line after the match, and adding /s makes the command output only the very first replaced match.

How to get only the needed output?

EDIT:

Sobrique's answer uses the dashed line (or part of it) as a record separator, but I'm also interested in how I would obtain the required data if the dashed line came after the needed output.

Let's say I wanted 10:00:02 from the above input, i.e. the equivalent of matching the backreference in the regex ^\d{4}-\d\d-\d\d (\d\d:\d\d:\d\d).*\n-{20} (the caret is not too important there, I believe). I could just use tac before and after executing Sobrique's solution, but I would like to see how to achieve this without doing so.

Answer

Ok, so what you need to know is this:

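-0 sets the input record separator ($/). With no digits after it, the separator becomes the null character, so a plain text file is read as a single record. You probably don't want to do this.

-n tells perl to iterate over STDIN (or the files specified), one record at a time, in a way quite similar to how grep/sed/awk would.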

And -e specifies code to run.

What's happening in your code, though, is that if the substitution succeeds, perl prints the 'whole block', which here is the whole file.

I would suggest instead what you want is:

#!/usr/bin/env perl
use strict;
use warnings; 

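# Use '--' as the input record separator, so each read returns one chunk.
local $/ = '--';
while ( <DATA> ) {
   # Print the first time-like string in the chunk, if there is one.
   print $1,"\n" if m/ (\d\d:\d\d:\d\d)/;
}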

__DATA__
--------------------
2016-05-13 10:00:00 abc
2016-05-13 10:00:01 def
2016-05-13 10:00:02 ghi
--------------------
2016-05-13 10:00:03 jkl
2016-05-13 10:00:04 mno

Or as a one liner:

perl -ne 'BEGIN { $/ = "--" } print $1,"\n" if m/ (\d\d:\d\d:\d\d)/'

What this does instead is iterate a 'chunk' at a time, using '--' as the record separator, and grab the first instance of a 'time-like' format within each chunk. Chunks that contain no time (such as the pieces of the dashed line itself) print nothing.

To answer your follow-on question: if you wanted to catch the last time in each block, then I'd probably do it like this:

#!/usr/bin/env perl
use strict;
use warnings; 

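# Use '--' as the input record separator, so each read returns one chunk.
local $/ = '--';
while ( <DATA> ) {
   # Collect every time-like string in the chunk, then print the last one.
   my @matches = m/ (\d\d:\d\d:\d\d)/g ;
   print $matches[-1],"\n" if @matches;
}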

__DATA__
--------------------
2016-05-13 10:00:00 abc
2016-05-13 10:00:01 def
2016-05-13 10:00:02 ghi
--------------------
2016-05-13 10:00:03 jkl
2016-05-13 10:00:04 mno

This captures all the matches of (time-like) strings into a list, then prints the last element.

For the two chunks that contain times, @matches contains:

$VAR1 = [
          '10:00:00',
          '10:00:01',
          '10:00:02'
        ];

And

$VAR1 = [
          '10:00:03',
          '10:00:04'
        ];