rodrigo-silveira - 6 months ago 11
Javascript Question

RegExp to replace matching parenthesis in nested structure

How can I replace a set of matching opening/closing parentheses if the first opening parenthesis follows the keyword 'array'? Can regular expressions help with this type of problem?

To be more specific, I'd like to solve this using either JavaScript or PHP:

// input
$data = array(
    'id' => nextId(),
    'profile' => array(
        'name' => 'Hugo Hurley',
        'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    )
);

// desired output
$data = [
    'id' => nextId(),
    'profile' => [
        'name' => 'Hugo Hurley',
        'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ]
];

sln
Answer

Tim Pietzcker gave the .NET counting version.
It has the same elements as the PCRE (PHP) version below.

All the caveats are the same. In particular, parentheses that do not
belong to an array( construct must be balanced.
The reason is that the closing parenthesis is shared: the same ')'
character closes both array( ... ) and an ordinary ( ... ) group.
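
One way to guard against that caveat is to check the balance before attempting any replacement. A minimal JavaScript sketch, assuming no parentheses occur inside string literals or comments (the name parensBalanced is made up for this example, it is not part of any library):

```javascript
// Hypothetical helper: returns true when every '(' has a matching ')'.
// Assumption: no parentheses occur inside string literals or comments.
function parensBalanced(src) {
  let depth = 0;
  for (const ch of src) {
    if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      depth--;
      if (depth < 0) return false; // a ')' with no opener before it
    }
  }
  return depth === 0; // leftover openers also mean unbalanced
}

console.log(parensBalanced("array('n' => (4 + 8) / 108)")); // true
console.log(parensBalanced("array('n' => (4 + 8 / 108)"));  // false
```

If this returns false, the regex below will report the stray parenthesis through its EXCEPTIONS group rather than silently pairing it with the wrong opener.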

All text must be parsed (or should be).
The outer groups 1, 2, 3 let you pick off the parts CONTENT/CORE/EXCEPTIONS
in a search while loop.
Each match gives you exactly one of these; they are mutually exclusive.
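
To illustrate the "one of three, mutually exclusive" idea in JavaScript: JS regular expressions have no recursion or (?&name) subroutine calls, so the sketch below only does the flat part. It splits the text into CONTENT runs, 'array(' openers and lone parentheses, and does not try to match a whole balanced CORE in one go. The names TOKEN and tokenize are invented for the example.

```javascript
// Three mutually exclusive alternatives, one capture group each:
//   1: CONTENT    - a run of characters that begins neither 'array(' nor a paren
//   2: ARRAY-OPEN - the start delimiter 'array('
//   3: PAREN      - a lone '(' or ')'
const TOKEN = /((?:(?!\barray\s*\(|[()])[\s\S])+)|(\barray\s*\()|([()])/gi;

// Hypothetical helper: returns the tokens in source order.
function tokenize(src) {
  const tokens = [];
  for (const m of src.matchAll(TOKEN)) {
    if (m[1] !== undefined) tokens.push({ type: 'content', text: m[1] });
    else if (m[2] !== undefined) tokens.push({ type: 'arrayOpen', text: m[2] });
    else tokens.push({ type: 'paren', text: m[3] });
  }
  return tokens;
}

console.log(tokenize("x = array(f(1))").map(t => t.type).join(','));
// content,arrayOpen,content,paren,content,paren,paren
```

Every character of the input ends up in exactly one token, which is the same "all text must be parsed" property the PCRE version has.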

The trick is to define a PHP function parse( core ) that parses the CORE.
Inside that function is the while ( regex.search( core ) ) { .. } loop.

Each time the CORE group matches, call the parse( core ) function, passing
the contents of the CORE group to it.

And inside the loop, just take off the content and assign it to the hash.

Obviously, the group 1 construct, which calls (?&content), should be replaced
with constructs that capture your hash-like variable data.

On a detailed scale, this can be very tedious.
Usually, you'd have to account for every single character to correctly
parse the entire thing.
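
For the JavaScript side of the question, the recursive pattern cannot be used as written, because JS regular expressions do not support recursion or subroutine calls. Here is a rough sketch of the same idea using a different technique: a flat token regex plus an explicit stack standing in for the recursive parse( core ) function. It assumes parentheses are balanced and that none appear inside string literals or comments; convertArrays is a name made up for this example.

```javascript
// Sketch: rewrite PHP `array( ... )` as `[ ... ]`, leaving other parens alone.
// Assumptions: balanced parentheses, none inside string literals or comments.
function convertArrays(src) {
  const TOKEN = /\barray\s*\(|[()]/gi; // start delimiter, or a lone paren
  const stack = []; // one entry per open paren: true if it came from array(
  let out = '';
  let last = 0;     // index just past the previous token
  for (const m of src.matchAll(TOKEN)) {
    out += src.slice(last, m.index); // CONTENT between tokens, copied as-is
    last = m.index + m[0].length;
    if (m[0] === '(') {
      stack.push(false);
      out += '(';
    } else if (m[0] === ')') {
      if (stack.length === 0) {
        throw new Error(`Unbalanced ')' at position ${m.index}`);
      }
      out += stack.pop() ? ']' : ')'; // close with whatever opened this level
    } else {
      stack.push(true); // matched 'array('
      out += '[';
    }
  }
  if (stack.length > 0) throw new Error('Unbalanced opening parenthesis');
  return out + src.slice(last);
}

console.log(convertArrays("$d = array('a' => array(1, (2 + 3) * f()));"));
// $d = ['a' => [1, (2 + 3) * f()]];
```

The stack is what makes the shared ')' workable: each closing parenthesis is replaced or kept depending on which kind of opener it pairs with.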

(?is)(?:((?&content))|(?>\barray\s*\()((?=.)(?&core)|)\)|(\barray\s*\(|[()]))(?(DEFINE)(?<core>(?>(?&content)|(?>\barray\s*\()(?:(?=.)(?&core)|)\)|\((?:(?=.)(?&core)|)\))+)(?<content>(?>(?:(?!\barray\s*\(|[()])).)+))

Expanded

 # CONTENT
 # CORE
 # EXCEPTIONS

 (?is)

 (?:
      (                                  # (1), Take off CONTENT
           (?&content) 
      )
   |                                   # OR
      (?>                                # Start-Delimiter 'array('
           \b array \s* \(
      )
      (                                  # (2), Take off The CORE
           (?= . )
           (?&core) 
        |  
      )
      \)                                 # End-Delimiter
   |                                   # OR
      (                                  # (3), Take off Unbalanced or Exceptions
           \b array \s* \(
        |  [()] 
      )
 )

 # Subroutines
 # ---------------

 (?(DEFINE)

      # core
      (?<core>
           (?>
                (?&content) 
             |  
                (?> \b array \s* \( )
                # recurse core of array
                (?:
                     (?= . )
                     (?&core) 
                  |  
                )
                \)
             |  
                \(
                # recurse core of non array ()
                (?:
                     (?= . )
                     (?&core) 
                  |  
                )
                \)
           )+
      )

      # content 
      (?<content>
           (?>
                (?:
                     (?!
                          \b array \s* \(
                       |  [()] 
                     )
                )
                . 
           )+
      )
 )

Output

 **  Grp 0           -  ( pos 0 , len 11 ) 
some_var =   
 **  Grp 1           -  ( pos 0 , len 11 ) 
some_var =   
 **  Grp 2           -  NULL 
 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 

-----------------------

 **  Grp 0           -  ( pos 11 , len 153 ) 
array(
    'id' => nextId(),
    'profile' => array(
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ) 
)  
 **  Grp 1           -  NULL 
 **  Grp 2           -  ( pos 17 , len 146 ) 

    'id' => nextId(),
    'profile' => array(
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ) 

 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 

-------------------------------------

 **  Grp 0           -  ( pos 164 , len 3 ) 
;

 **  Grp 1           -  ( pos 164 , len 3 ) 
;

 **  Grp 2           -  NULL 
 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 

Here is an earlier use of the same technique on a different problem (nested <!--block:name--> ... <!--endblock--> markers), to give an idea of usage:

 # Perl code:
 # 
 #     use strict;
 #     use warnings;
 #     
 #     use Data::Dumper;
 #     
 #     $/ = undef;
 #     my $content = <DATA>;
 #     
 #     # Set the error mode on/off here ..
 #     my $BailOnError = 1;
 #     my $IsError = 0;
 #     
 #     my $href = {};
 #     
 #     ParseCore( $href, $content );
 #     
 #     #print Dumper($href);
 #     
 #     print "\n\n";
 #     print "\nBase======================\n";
 #     print $href->{content};
 #     print "\nFirst======================\n";
 #     print $href->{first}->{content};
 #     print "\nSecond======================\n";
 #     print $href->{first}->{second}->{content};
 #     print "\nThird======================\n";
 #     print $href->{first}->{second}->{third}->{content};
 #     print "\nFourth======================\n";
 #     print $href->{first}->{second}->{third}->{fourth}->{content};
 #     print "\nFifth======================\n";
 #     print $href->{first}->{second}->{third}->{fourth}->{fifth}->{content};
 #     print "\nSix======================\n";
 #     print $href->{six}->{content};
 #     print "\nSeven======================\n";
 #     print $href->{six}->{seven}->{content};
 #     print "\nEight======================\n";
 #     print $href->{six}->{seven}->{eight}->{content};
 #     
 #     exit;
 #     
 #     
 #     sub ParseCore
 #     {
 #         my ($aref, $core) = @_;
 #         my ($k, $v);
 #         while ( $core =~ /(?is)(?:((?&content))|(?><!--block:(.*?)-->)((?&core)|)<!--endblock-->|(<!--(?:block:.*?|endblock)-->))(?(DEFINE)(?<core>(?>(?&content)|(?><!--block:.*?-->)(?:(?&core)|)<!--endblock-->)+)(?<content>(?>(?!<!--(?:block:.*?|endblock)-->).)+))/g )
 #         {
 #            if (defined $1)
 #            {
 #              # CONTENT
 #                $aref->{content} .= $1;
 #            }
 #            elsif (defined $2)
 #            {
 #              # CORE
 #                $k = $2; $v = $3;
 #                $aref->{$k} = {};
 #      #         $aref->{$k}->{content} = $v;
 #      #         $aref->{$k}->{match} = $&;
 #                
 #                my $curraref = $aref->{$k};
 #                my $ret = ParseCore($aref->{$k}, $v);
 #                if ( $BailOnError && $IsError ) {
 #                    last;
 #                }
 #                if (defined $ret) {
 #                    $curraref->{'#next'} = $ret;
 #                }
 #            }
 #            else
 #            {
 #              # ERRORS
 #                print "Unbalanced '$4' at position = ", $-[0];
 #                $IsError = 1;
 #     
 #                # Decide to continue here ..
 #                # If BailOnError is set, just unwind recursion. 
 #                # -------------------------------------------------
 #                if ( $BailOnError ) {
 #                   last;
 #                }
 #            }
 #         }
 #         return $k;
 #     }
 #     
 #     #================================================
 #     __DATA__
 #     some html content here top base
 #     <!--block:first-->
 #         <table border="1" style="color:red;">
 #         <tr class="lines">
 #             <td align="left" valign="<--valign-->">
 #         <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
 #         <!--hello--> <--again--><!--world-->
 #         some html content here 1 top
 #         <!--block:second-->
 #             some html content here 2 top
 #             <!--block:third-->
 #                 some html content here 3 top
 #                 <!--block:fourth-->
 #                     some html content here 4 top
 #                     <!--block:fifth-->
 #                         some html content here 5a
 #                         some html content here 5b
 #                     <!--endblock-->
 #                 <!--endblock-->
 #                 some html content here 3a
 #                 some html content here 3b
 #             <!--endblock-->
 #             some html content here 2 bottom
 #         <!--endblock-->
 #         some html content here 1 bottom
 #     <!--endblock-->
 #     some html content here1-5 bottom base
 #     
 #     some html content here 6-8 top base
 #     <!--block:six-->
 #         some html content here 6 top
 #         <!--block:seven-->
 #             some html content here 7 top
 #             <!--block:eight-->
 #                 some html content here 8a
 #                 some html content here 8b
 #             <!--endblock-->
 #             some html content here 7 bottom
 #         <!--endblock-->
 #         some html content here 6 bottom
 #     <!--endblock-->
 #     some html content here 6-8 bottom base
 # 
 # Output >>
 # 
 #     Base======================
 #     some html content here top base
 #     
 #     some html content here1-5 bottom base
 #     
 #     some html content here 6-8 top base
 #     
 #     some html content here 6-8 bottom base
 #     
 #     First======================
 #     
 #         <table border="1" style="color:red;">
 #         <tr class="lines">
 #             <td align="left" valign="<--valign-->">
 #         <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
 #         <!--hello--> <--again--><!--world-->
 #         some html content here 1 top
 #         
 #         some html content here 1 bottom
 #     
 #     Second======================
 #     
 #             some html content here 2 top
 #             
 #             some html content here 2 bottom
 #         
 #     Third======================
 #     
 #                 some html content here 3 top
 #                 
 #                 some html content here 3a
 #                 some html content here 3b
 #             
 #     Fourth======================
 #     
 #                     some html content here 4 top
 #                     
 #                 
 #     Fifth======================
 #     
 #                         some html content here 5a
 #                         some html content here 5b
 #                     
 #     Six======================
 #     
 #         some html content here 6 top
 #         
 #         some html content here 6 bottom
 #     
 #     Seven======================
 #     
 #             some html content here 7 top
 #             
 #             some html content here 7 bottom
 #         
 #     Eight======================
 #     
 #                 some html content here 8a
 #                 some html content here 8b
 #         
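
For comparison, a rough JavaScript sketch of the same block parsing. It is not a port of the Perl regex: since JS has no recursive patterns, it matches only the block markers and keeps the nesting on an explicit stack. The name parseBlocks and the shape of the returned object are assumptions made for this example.

```javascript
// Sketch: build a nested object from <!--block:name--> ... <!--endblock-->.
// Each node has a `content` string plus one property per named child block.
function parseBlocks(src) {
  const TAG = /<!--block:(.*?)-->|<!--endblock-->/g;
  const root = { content: '' };
  const stack = [root]; // innermost open block is last
  let last = 0;         // index just past the previous marker
  for (const m of src.matchAll(TAG)) {
    const node = stack[stack.length - 1];
    node.content += src.slice(last, m.index); // text belongs to current block
    last = m.index + m[0].length;
    if (m[1] !== undefined) {
      const child = { content: '' }; // opening marker: start a child block
      node[m[1]] = child;
      stack.push(child);
    } else {
      if (stack.length === 1) {
        throw new Error(`Unbalanced endblock at position ${m.index}`);
      }
      stack.pop(); // closing marker: return to the parent block
    }
  }
  if (stack.length > 1) throw new Error('Unclosed block');
  root.content += src.slice(last);
  return root;
}

const tree = parseBlocks('a<!--block:x-->b<!--block:y-->c<!--endblock-->d<!--endblock-->e');
console.log(tree.content, tree.x.content, tree.x.y.content); // ae bd c
```

Other comments such as <!--hello--> are not markers, so they stay in the content, as in the Perl output above.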