stevieb - 6 months ago
Perl Question

Perl6: Capturing Windows newline in a string with regex

Disclaimer: I've cross-posted this over at PerlMonks.

In Perl 5, I can quickly and easily print the hex representation of the \r\n Windows-style line ending:

    perl -nE '/([\r\n]{1,2})/; print(unpack("H*",$1))' in.txt
    0d0a


To create a file with Windows line endings on Unix for testing, create in.txt containing a single line and its line ending, then run:

    perl -ni -e 's/\n/\r\n/g;print' in.txt

Alternatively, in vi/vim, create the file and run :set ff=dos.

I have tried many things in Perl6 to do the same thing, but I can't get it to work no matter what I do. Here's my most recent test:

    use v6;
    use experimental :pack;

    my $fn = 'in.txt';

    my $fh = open $fn, chomp => False; # I've also tried :bin
    for $fh.lines -> $line {
        if $line ~~ /(<[\r\n]>**1..2)/ {
            $0.Str.encode('UTF-8').unpack("H*").say;
        }
    }


This outputs 0a, as do these patterns:

    /(\n)/
    /(\v)/

First, I don't even know if I'm using unpack() or the regex properly. Second, how do I capture both elements (\r\n) of the newline in P6?

Answer

OK, so my goal (sorry I didn't make that clear when I posted the question) was to read a file, capture its line endings, and write the file back out using the original line endings rather than the endings for the current platform.

I got a proof of concept working now. I'm very new to Perl 6, so the code probably isn't very p6-ish, but it does do what I needed it to.

Code tested on FreeBSD:

    use v6;
    use experimental :pack;

    my $fn = 'in.txt';
    my $outfile = 'out.txt';

    # write something with a windows line ending to a new file

    my $fh = open $fn, :w;
    $fh.print("ab\r\ndef\r\n");
    $fh.close;

    # re-open the file 

    $fh = open $fn, :bin;

    my $eol_found = False;
    my Str $recsep = '';

    # read one byte at a time, or else we'd have to slurp the whole
    # file, as I can't find a way to differentiate EOL from EOF

    while $fh.read(1) -> $buf {
        my $hex = $buf.unpack("H*");
        if $hex ~~ /(0d|0a)/ {
            $eol_found = True;
            $recsep = $recsep ~ $hex;
            next;
        }
        if $eol_found {
            if $hex !~~ /(0d|0a)/ {
                last;
            }
        }
    }

    $fh.close;

    my %recseps = (
        '0d0a' => "\r\n",
        '0d'   => "\r",
        '0a'   => "\n",
    );

    my $nl = %recseps<<$recsep>>;

    # write a new file with the saved record separator

    $fh = open $outfile, :w;
    $fh.print('a' ~ $nl);
    $fh.close;

    # re-read file to see if our newline stuck

    $fh = open $outfile, :bin;

    my $buf = $fh.read(1000);
    say $buf;

Output:

    Buf[uint8]:0x<61 0d 0a>
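
The detection step in the proof of concept above can also be written as a small helper that works on raw bytes instead of hex strings. What follows is only a sketch under a few assumptions: detect-eol is a hypothetical name that is not part of the original post, the whole file is slurped in binary mode (so it suits small files), and the first terminator in the file is taken to be representative of the rest. Because it returns at the first terminator, consecutive blank lines are not read as one longer separator.

```raku
use v6;

# Hypothetical helper (not from the original post): return the first line
# terminator found in a Blob as a Blob of one or two bytes, or an empty
# Blob when the data contains no CR or LF at all.
sub detect-eol(Blob $bytes) {
    for ^$bytes.elems -> $i {
        my $byte = $bytes[$i];
        return Blob.new(0x0a) if $byte == 0x0a;    # bare LF
        if $byte == 0x0d {
            # CR directly followed by LF is the Windows ending;
            # otherwise treat it as a lone CR
            my $crlf = $i + 1 < $bytes.elems && $bytes[$i + 1] == 0x0a;
            return $crlf ?? Blob.new(0x0d, 0x0a) !! Blob.new(0x0d);
        }
    }
    return Blob.new;
}

my $in  = 'in.txt';
my $out = 'out.txt';

# sample input with Windows line endings, written as raw bytes
spurt $in, "ab\r\ndef\r\n".encode('ascii');

# detect the ending from the raw bytes of the input file
my $nl = detect-eol($in.IO.slurp(:bin));

# write a new file in binary mode, reusing the detected ending
my $fh = open $out, :w, :bin;
$fh.write('a'.encode('ascii'));
$fh.write($nl);
$fh.close;

# dump the bytes of the new file in hex to check the ending by eye
say $out.IO.slurp(:bin).list.fmt('%02x');
```

Writing through a :bin handle keeps the bytes exactly as given, so the result does not depend on how the platform treats newlines in text mode. For large files, reading a bounded chunk with $fh.read instead of slurping would avoid loading everything, at the cost of handling a CR that falls on the last byte of the chunk.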