Otterbein - 5 months ago
Perl Question

Regex match on whole .csv file

Good Morning!

I'm struggling with a problem I never thought would be one. I have a .csv file with ; as the separator, and I would like to check its syntax. It looks like this:

wellenname;tag
Welle A;01/02/2016
Welle B3;14/11/2016
server;welle
server5name032;Welle B3 Rand
server3name01;Welle A
server2name;Welle B3


So I have a nicely formatted .csv file that I expected to be able to check with a regex. I therefore built the regex from four cases:


  1. wellenname;tag\n

  2. (Welle [A-Z]+[0-9]*;\d{2}/\d{2}/\d{4}\n)+

  3. server;welle\n

  4. (\S*;Welle [a-z,A-Z,0-9]+( Rand)(\n))+
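For illustration, the four cases can be kept as separate qr// pieces and concatenated into one anchored pattern, which makes it easy to test each case on its own. This is a minimal sketch, not the code from the question: the variable names and is_valid_csv are hypothetical, and case 4 uses (?: Rand)? on the assumption that " Rand" is optional, as the sample lines suggest.

```perl
use strict;
use warnings;

# The four cases from the list, one qr// object each.
# Assumption: " Rand" is optional in case 4, so it is written as (?: Rand)?.
my $header1 = qr{wellenname;tag\n};
my $wellen  = qr{(?:Welle [A-Z]+[0-9]*;\d{2}/\d{2}/\d{4}\n)+};
my $header2 = qr{server;welle\n};
my $servers = qr{(?:\S*;Welle [a-zA-Z0-9]+(?: Rand)?\n)+};

# Concatenate the cases and anchor them to the whole string
# (\A = start of string, \z = absolute end of string).
my $csv_re = qr{\A$header1$wellen$header2$servers\z};

# Hypothetical helper: 1 if the whole content matches, 0 otherwise.
sub is_valid_csv {
    my ($content) = @_;
    return $content =~ $csv_re ? 1 : 0;
}

my $sample = join '', map { "$_\n" } (
    'wellenname;tag',
    'Welle A;01/02/2016',
    'Welle B3;14/11/2016',
    'server;welle',
    'server5name032;Welle B3 Rand',
    'server3name01;Welle A',
    'server2name;Welle B3',
);

# prints "Syntax seems to be valid!"
print is_valid_csv($sample) ? "Syntax seems to be valid!\n" : "You have syntax errors!\n";
```

Because of \z, every line, including the last one, must end with \n for the content to be accepted.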



This worked quite nicely in a tool named The Regex Coach, which matches a regex against a string and shows where the match fails.

Then I put it all together in Perl, reading the file and checking the syntax:

use strict;
use warnings;
use Data::Dumper;
my $filename = 'theFile.csv';

my $content;

open(my $fh, '<', $filename) or die "Could not open file $filename $!";
$content = join('',<$fh>);

if ($content =~ /wellenname;tag\n(Welle [A-Z]+[0-9]*;\d{2}\/\d{2}\/\d{4}\n)+server;welle\n(\S*;Welle [a-z,A-Z,0-9]+( Rand)*(\n)*)+/) {
print "Syntax seems to be valid!";

}else{
print "You have syntax errors!";
}


I went through the file line by line, but even if I keep only one entry per section, the check fails (more precisely: it jumps to the else branch and prints the error message).
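To find out which line breaks the match, similar to what The Regex Coach shows, one option is to check each line against the pattern for its section and report the first line that fails. This is a minimal sketch under stated assumptions: first_bad_line is a hypothetical helper, the section rules are taken from the four cases above, and " Rand" is again treated as optional.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): returns the 1-based number of
# the first line that breaks the expected layout, or 0 if every line is fine.
# If the input stops too early, it returns (number of lines + 1).
sub first_bad_line {
    my @lines   = @_;
    my $section = 0;    # 0 = header, 1 = "Welle" rows, 2 = server rows
    my $count   = 0;    # data rows seen so far in the current section

    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        my $n    = $i + 1;
        chomp $line;    # removes "\n" only; a "\r" from CRLF files stays and gets reported

        if ($section == 0) {
            return $n unless $line eq 'wellenname;tag';
            $section = 1;
        }
        elsif ($section == 1 && $count > 0 && $line eq 'server;welle') {
            ($section, $count) = (2, 0);
        }
        elsif ($section == 1) {
            return $n unless $line =~ m{\AWelle [A-Z]+[0-9]*;\d{2}/\d{2}/\d{4}\z};
            $count++;
        }
        else {
            return $n unless $line =~ m{\A\S*;Welle [a-zA-Z0-9]+(?: Rand)?\z};
            $count++;
        }
    }

    # A complete file ends in the server section with at least one row.
    return ($section == 2 && $count > 0) ? 0 : scalar(@lines) + 1;
}

# Usage with the sample data; for a real file, call first_bad_line(<$fh>).
my @sample = map { "$_\n" } (
    'wellenname;tag',
    'Welle A;01/02/2016',
    'Welle B3;14/11/2016',
    'server;welle',
    'server5name032;Welle B3 Rand',
    'server3name01;Welle A',
    'server2name;Welle B3',
);

my $bad = first_bad_line(@sample);
print $bad ? "Syntax error in line $bad\n" : "Syntax seems to be valid!\n";
```

A file saved with Windows line endings would be reported at line 1 here, because chomp leaves the \r in place.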

Did I forget something, or is there a major mistake in my thinking?
I would be very pleased if somebody could give me a hint!

Answer

Well, this actually works:

use strict;
use warnings;
use Data::Dumper;

my $content = join('',<DATA>);

if ($content =~ /wellenname;tag\n(Welle [A-Z]+[0-9]*;\d{2}\/\d{2}\/\d{4}\n)+server;welle\n(\S*;Welle [a-z,A-Z,0-9]+( Rand)*(\n)*)+/) {
  print "Syntax seems to be valid!";
} else {
  print "You have syntax errors!";
}

__DATA__
wellenname;tag
Welle A;01/02/2016
Welle B3;14/11/2016
server;welle
server5name032;Welle B3 Rand
server3name01;Welle A
server2name;Welle B3

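So the problem is probably in your file rather than in the regex. For example, if the file was saved with Windows line endings (\r\n) and you read it on Linux or macOS, every line still ends with \r, so wellenname;tag\n never matches.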
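To check whether line endings could be the culprit, here is a minimal sketch that runs the question's pattern against the same rows joined once with \n and once with \r\n. The rows are trimmed from the question's data, and the s/\r\n/\n/g normalisation is just one possible fix.

```perl
use strict;
use warnings;

# A few rows from the sample data, with Unix and with Windows line endings.
my @rows = ('wellenname;tag', 'Welle A;01/02/2016', 'server;welle',
            'server2name;Welle B3');
my $unix    = join '', map { "$_\n" }   @rows;
my $windows = join '', map { "$_\r\n" } @rows;

# The pattern from the question, unchanged apart from using m{}-style delimiters.
my $re = qr{wellenname;tag\n(Welle [A-Z]+[0-9]*;\d{2}/\d{2}/\d{4}\n)+server;welle\n(\S*;Welle [a-z,A-Z,0-9]+( Rand)*(\n)*)+};

print 'unix:    ', ($unix    =~ $re ? 'match' : 'no match'), "\n";    # match
print 'windows: ', ($windows =~ $re ? 'match' : 'no match'), "\n";    # no match

# Normalising the line endings before matching makes the \r\n version pass.
(my $fixed = $windows) =~ s/\r\n/\n/g;
print 'fixed:   ', ($fixed   =~ $re ? 'match' : 'no match'), "\n";    # match
```

Instead of the substitution, opening the file with the :crlf PerlIO layer (open(my $fh, '<:crlf', $filename)) should also translate \r\n to \n while reading.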

Your regex is still not perfect, though.

First, you are missing ^ and $ at the beginning and end of the regex, respectively. Without them, the regex also matches a CSV with extra characters before and after the actual content. Second, you don't need commas inside []: a comma there is matched literally, so [a-z,A-Z,0-9] also accepts a comma and should be [a-zA-Z0-9]. Finally, consider using the /x modifier to make your regex more readable, and m{...} instead of /.../ so that you don't have to escape / inside the regex.
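A minimal sketch of both points, using shortened inputs made up for the demo: the unanchored pattern accepts junk around the valid text, and the character class with commas accepts a comma.

```perl
use strict;
use warnings;

# One valid line, plus the same line wrapped in junk (demo inputs).
my $csv     = "wellenname;tag\n";
my $wrapped = "junk\n${csv}more junk";

# Unanchored vs. anchored version of the same pattern.
my $loose    = qr{wellenname;tag\n};
my $anchored = qr{^wellenname;tag\n$};

print 'loose    on wrapped: ', ($wrapped =~ $loose    ? 'match' : 'no match'), "\n";  # match
print 'anchored on wrapped: ', ($wrapped =~ $anchored ? 'match' : 'no match'), "\n";  # no match
print 'anchored on clean:   ', ($csv     =~ $anchored ? 'match' : 'no match'), "\n";  # match

# Inside [...] a comma is just another character, so [a-z,A-Z,0-9]
# also accepts "B,3", while [a-zA-Z0-9] does not.
my $with_commas = qr{^Welle [a-z,A-Z,0-9]+$};
my $no_commas   = qr{^Welle [a-zA-Z0-9]+$};

print 'with commas: ', ('Welle B,3' =~ $with_commas ? 'match' : 'no match'), "\n";  # match
print 'no commas:   ', ('Welle B,3' =~ $no_commas   ? 'match' : 'no match'), "\n";  # no match
```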

So, my final version is:

use strict;
use warnings;

my $content = join('',<DATA>);

if (
    $content =~ m{
        ^
        wellenname;tag\n
        (Welle[ ][A-Z]+[0-9]*;\d{2}/\d{2}/\d{4}\n)+
        server;welle\n
        (\S*;Welle[ ][a-zA-Z0-9]+([ ]Rand)*(\n)*)+
        $
    }x
) {
  print 'Syntax seems to be valid!';
}
else {
  print 'You have syntax errors!';
}

__DATA__
wellenname;tag
Welle A;01/02/2016
Welle B3;14/11/2016
server;welle
server5name032;Welle B3 Rand
server3name01;Welle A
server2name;Welle B3

Note that I use [ ] to match a space, because /x ignores literal whitespace in the pattern. I also use '...' instead of "...", because these strings don't need interpolation.
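A short sketch of what /x does to a literal space, and of the difference between single and double quotes. The escaped space (\ ) is shown as another option that perlre documents for /x; the strings are made up for the demo.

```perl
use strict;
use warnings;

my $text = 'Welle A';

# Under /x, unescaped whitespace in the pattern is ignored, so the first
# pattern is effectively ^WelleA$ and cannot match "Welle A".
my $plain   = qr{^Welle A$}x;
my $bracket = qr{^Welle[ ]A$}x;    # [ ] is a character class containing a space
my $escaped = qr{^Welle\ A$}x;     # an escaped space is also kept under /x

print 'plain space:   ', ($text =~ $plain   ? 'match' : 'no match'), "\n";    # no match
print 'bracket space: ', ($text =~ $bracket ? 'match' : 'no match'), "\n";    # match
print 'escaped space: ', ($text =~ $escaped ? 'match' : 'no match'), "\n";    # match

# Single quotes do not interpolate variables, double quotes do.
my $who = 'world';
print 'Hello $who', "\n";    # prints: Hello $who
print "Hello $who\n";        # prints: Hello world
```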