Arthur Accioly - 28 days ago
Perl Question

Convert complex config file to id,parent,key,value format

I'm trying to convert a config file like the example below so that I can generate SQL commands to insert its contents into my Oracle table. I'm trying to use Perl, but I'm open to trying other languages, such as Python.

Example of config:

# This is a comment
feed_realtime_processor_pool = ( 11, 12 ) ;
dropout_detection_time_start = "17:00";
# Sometimes the config can have sub-structures
named_clients = (
{
name = "thread1";
user_threads = (
{ name = "realtime1"; cpu = 11; } # more comments
{ name = "realtime2"; cpu = 12; } # more comments
);
}
);


(...)

Converted output (the spaces are only there for readability):

id,parent, key, value
01,null, 'feed_realtime_processor_pool', '11'
02,null, 'feed_realtime_processor_pool', '12'
03,null, 'dropout_detection_time_start', '17:00'
04,null, 'named_clients', null
05,04, 'name', 'thread1'
06,04, 'user_threads', null
07,06, 'name', 'realtime1'
08,06, 'cpu', '11'
09,06, 'name', 'realtime2'
10,06, 'cpu', '12'


So, the questions are:


  1. Does anybody know of a lib/module that can do this? I found config parsers, but not a good one that returns the config in a tree format.

  2. If not, can somebody suggest how I should start this script, maybe using different modules to help me parse the config? Cleaning and parsing the config is the hardest part for me.



Update:




  1. I forgot to mention that I have tons of configs to import into my database, and this is not a one-time event. This script will run every time a customer of my company generates a new config for a different installation.

  2. I chose Perl because I thought it would execute faster than Python.

  3. I'm very familiar with SQL, but I'm not very good at Perl.

  4. The configs follow the same structure as the example; the only variations are in the amount of whitespace between the key/value pairs, the indentation of the curly braces, etc.

  5. I already started writing a Perl script using DBI, but the main problem is parsing this config into a format that I can easily work with. I keep running into cases where my regular expressions break, and I keep adjusting them over and over. If I could just use a lib to parse the config automatically, that would be great. I'm going to try the ones that you guys mentioned.
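
For the regex problem in point 5, one approach that tends to be less fragile than matching whole lines is to split the text into tokens first and only then look at structure. Below is a minimal sketch using core Perl only. The `tokenize` sub and the token type names (`STR`, `NUM`, `NAME`, `PUNCT`) are made up for illustration, and it assumes the config contains only the constructs shown in the example (names, integers, double-quoted strings without embedded quotes, and the punctuation `= ; , ( ) { }`).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split config text into [type, value] tokens, skipping whitespace and comments.
# Each regex is anchored with \G, so scanning always resumes where the previous
# match ended; the /c flag keeps that position when a match fails.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length $text) {
        if    ($text =~ /\G\s+/gc)            { }   # whitespace: ignore
        elsif ($text =~ /\G#[^\n]*/gc)        { }   # comment up to end of line: ignore
        elsif ($text =~ /\G"([^"]*)"/gc)      { push @tokens, [ STR   => $1 ] }
        elsif ($text =~ /\G(\d+)/gc)          { push @tokens, [ NUM   => $1 ] }
        elsif ($text =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, [ NAME  => $1 ] }
        elsif ($text =~ /\G([=;,(){}])/gc)    { push @tokens, [ PUNCT => $1 ] }
        else {
            die sprintf "Unexpected character '%s' at offset %d\n",
                substr($text, pos($text), 1), pos($text);
        }
    }
    return @tokens;
}

# Usage: print every token as type:value.
my @tokens = tokenize(qq{pool = ( 11, 12 ) ; # comment\nstart = "17:00";\n});
print join(' ', map { "$_->[0]:$_->[1]" } @tokens), "\n";
```

Because strings are matched before comments are considered at any given position, a `#` inside a quoted value is not mistaken for a comment, and differences in whitespace or brace indentation no longer matter to whatever consumes the token list.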



Thanks!

Second Update:



Guys, I cross-posted this question on PerlMonks and got more feedback there, which I'm testing as well. I will post the answer once I have tested everything. Thanks.

Answer

Thanks everybody for the tips, but I think the best option is to follow the suggestion I received from Choroba on the PerlMonks site. I'm copying his answer here:

If you can't find a module to parse your config format, write your own parser. Marpa::R2 can help you in the task:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use Marpa::R2;

my $input = << '__INPUT__';
# This is a comment
feed_realtime_processor_pool = ( 11, 12 ) ;
dropout_detection_time_start = "17:00";
# Sometimes the config can have sub-structures
named_clients = (
{
  name = "thread1";
  user_threads = (
     { name = "realtime1"; cpu = 11; } # more comments
     { name = "realtime2"; cpu = 12; } # more comments
    );
}
);
__INPUT__

my $dsl = << '__DSL__';

lexeme default = latm => 1
:default ::= action => ::first

Config    ::= Elements
Elements  ::= Element+                              action => grep_defined
Element   ::= (Comment)                             action => empty
            | Name (s eq s) Value                   action => [values]
            | Name (s eq s) Value (semicolon s)     action => [values]
Comment   ::= (hash nonnl nl)                       action => empty
Name      ::= alpha
Value     ::= List
            | String
            | Num
            | Struct
List      ::= (lpar) Nums (rpar s semicolon s)
Nums      ::= Num+  separator => comma              action => listify
Num       ::= (s) digits (s)
            | (s) digits
            | digits (s)
            | digits
String    ::= (qq) nqq (qq semicolon s)             action => quote
Struct    ::= (lpar s) InStructs (rpar semicolon s)
InStructs ::= InStruct+                             action => grep_defined
InStruct  ::= (lcurl s) Elements (rcurl s)
            | (Comment s)                           action => empty
            | Element

s         ~ [\s]*
eq        ~ '='
hash      ~ '#'
nonnl     ~ [^\n]*
nl        ~ [\n]
alpha     ~ [a-z_]+
lpar      ~ '('
rpar      ~ ')'
lcurl     ~ '{'
rcurl     ~ '}'
semicolon ~ ';'
comma     ~ ','
digits    ~ [\d]+
qq        ~ '"'
nqq       ~ [^"]+

__DSL__

sub listify      { shift; [ @_ ] }
sub quote        { qq("$_[1]") }
sub empty        {}
sub grep_defined { shift; [ grep defined, @_ ] }


my $id = 1;
sub show {
    my ($parent, $name, $elems) = @_;
    if (ref $elems->[0]) {
        show($parent, $name, $_) for @$elems;
    } elsif (ref $elems->[1]) {
        if (ref $elems->[1][0]) {
            say join ', ', $id, $parent, $elems->[0], 'null';
            show($id++, $elems->[0], $elems->[1]);
        } else {
            for my $e (@{ $elems->[1] }) {
                say join ', ', $id++, $parent, $elems->[0], $e;
            }
        }
    } else {
        say join ', ', $id++, $parent, @$elems;
    }
}


my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$dsl });
show('null', q(), ${ $grammar->parse(\$input, 'main') });
Output:

1, null, feed_realtime_processor_pool, 11
2, null, feed_realtime_processor_pool, 12
3, null, dropout_detection_time_start, "17:00"
4, null, named_clients, null
5, 4, name, "thread1"
6, 4, user_threads, null
7, 6, name, "realtime1"
8, 6, cpu, 11
9, 6, name, "realtime2"
10, 6, cpu, 12
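
Since the end goal is SQL for an Oracle table, the output lines above still need to be turned into INSERT statements. Here is a rough sketch of that step using core Perl only. The table name `config_entries`, the column names `id`, `parent_id`, `cfg_key`, `cfg_value`, and the subs `sql_literal` and `row_to_insert` are all hypothetical and would need to match the real schema. It assumes each line has exactly the `id, parent, key, value` shape printed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table name; adjust to the real Oracle schema.
my $TABLE = 'config_entries';

# Quote a value as a SQL string literal, or emit NULL for the 'null' marker.
sub sql_literal {
    my ($value) = @_;
    return 'NULL' if !defined $value || $value eq 'null';
    $value =~ s/^"(.*)"$/$1/s;    # drop the double quotes added by the parser
    $value =~ s/'/''/g;           # escape embedded single quotes
    return "'$value'";
}

# Turn one "id, parent, key, value" line into an INSERT statement.
sub row_to_insert {
    my ($line) = @_;
    chomp $line;
    # Limit the split to 4 fields so commas inside the value stay intact.
    my ($id, $parent, $key, $value) = split /\s*,\s*/, $line, 4;
    my $parent_sql = $parent eq 'null' ? 'NULL' : int $parent;
    return sprintf
        "INSERT INTO %s (id, parent_id, cfg_key, cfg_value) VALUES (%d, %s, %s, %s);",
        $TABLE, $id, $parent_sql, sql_literal($key), sql_literal($value);
}

# Usage: convert a few of the lines from the output above.
my @rows = (
    '3, null, dropout_detection_time_start, "17:00"',
    '4, null, named_clients, null',
    '5, 4, name, "thread1"',
);
print row_to_insert($_), "\n" for @rows;
```

If the rows go straight into the database through DBI rather than into a SQL script, bind placeholders (prepare with `?` markers, then execute with the values) would likely be a safer choice than manual quoting.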

Please note that installing this module requires installing several dependencies first:

sudo cpan IPC::Cmd
sudo cpan Module::Build
sudo cpan Time::Piece
sudo cpan Marpa::R2