RegExp to replace matching parentheses in a nested structure

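How can I replace a pair of matching opening/closing parentheses if the opening parenthesis follows the keyword array? Can regular expressions help with this kind of problem?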

To be more specific, I would like to solve this using JavaScript or PHP.

// input
$data = array(
    'id' => nextId(),
    'profile' => array(
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    )
);

// desired output
$data = [
    'id' => nextId(),
    'profile' => [
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ]
];

How about the following (using the .NET regex engine):

resultString = Regex.Replace(subjectString, 
    @"\barray\(            # Match 'array('
    (                      # Capture in group 1:
     (?>                   # Start an atomic group:
      (?:                  # Either match
       (?!\barray\(|[()])  # only if we're not before another array or parens
       .                   # any character
      )+                   # once or more
     |                     # or
      \( (?<Depth>)        # match '(' (and increase the nesting counter)
     |                     # or
      \) (?<-Depth>)       # match ')' (and decrease the nesting counter).
     )*                    # Repeat as needed.
     (?(Depth)(?!))        # Assert that the nesting counter is at zero.
    )                      # End of capturing group.
    \)                     # Then match ')'.", 
    "[$1]", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

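This regex matches array(...) where ... may contain anything except another array(...), so it only matches the most deeply nested occurrences. It does allow other nested (and correctly balanced) parentheses within ..., but it does not check whether those are semantic parentheses or whether they sit inside strings or comments.

In other words, something like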

array(
   'name' => 'Hugo ((( Hurley',
   'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
)

would fail to match (correctly), because the parentheses inside the string literal are counted as well.
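To make this caveat concrete, here is a minimal JavaScript sketch that counts unmatched opening parentheses twice: once naively, and once while treating single-quoted strings as opaque. The function name openParenBalance and its skipStrings flag are hypothetical, and only single-quoted literals are handled (no double quotes, heredocs, or comments).

```javascript
// Matches either a whole single-quoted string literal or a single parenthesis.
const STRING_OR_PAREN = /'(?:[^'\\]|\\[\s\S])*'|[()]/g;

// Returns the number of "(" left unmatched at the end of `source`.
// With skipStrings = false, parentheses inside string literals are counted too,
// which is effectively what a regex that is unaware of strings does.
function openParenBalance(source, skipStrings) {
  let depth = 0;
  for (const m of source.matchAll(STRING_OR_PAREN)) {
    const token = m[0];
    if (token === "(") {
      depth++;
    } else if (token === ")") {
      depth--;
    } else if (!skipStrings) {
      // String literal: count the parentheses it contains.
      for (const ch of token) {
        if (ch === "(") depth++;
        else if (ch === ")") depth--;
      }
    }
  }
  return depth;
}
```

For the snippet above, the naive count ends with unmatched opening parentheses, while the string-aware count ends balanced.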

You need to apply the regex iteratively until it no longer modifies its input; for your example, two iterations are enough.
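Since the question asks for JavaScript, here is a rough sketch of the same "replace the innermost match, repeat until nothing changes" loop. JavaScript's RegExp has no balancing groups, so this is a simplified stand-in rather than a port of the .NET pattern: it assumes the body of an innermost array(...) contains at most one level of ordinary parentheses. The names INNERMOST_ARRAY and replaceArrays are made up for illustration.

```javascript
// Innermost array(...): the body may hold plain characters or ONE level of
// ordinary parentheses, but never another "array(".
const INNERMOST_ARRAY =
  /\barray\(((?:(?!\barray\()[^()]|\((?:(?!\barray\()[^()])*\))*)\)/g;

// Applies the replacement repeatedly until the text stops changing,
// so nested arrays are converted from the inside out.
function replaceArrays(source) {
  let previous;
  let current = source;
  do {
    previous = current;
    current = previous.replace(INNERMOST_ARRAY, "[$1]");
  } while (current !== previous);
  return current;
}
```

Like the .NET version, this does not know about strings or comments, and deeper plain-parenthesis nesting inside an array body would need a longer pattern or a real parser.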

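Tim Pietzcker gave the .NET counting version.
It has the same elements as the PCRE (PHP) version below.

All the same caveats apply. In particular, the non-array parentheses must
be balanced, because they use the same closing parenthesis as a delimiter.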

All of the text has to be parsed (or should be).
The outer groups 1, 2, 3, 4 let you get the parts:
CONTENT
CORE-1 array()
CORE-2 any ()
EXCEPTIONS

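Each match gets you exactly one of these outer parts; they are mutually exclusive.

The trick is to define a PHP function parse( core ) that parses the CORE.
Inside that function is the while (regex.search( core )) { .. } loop.

Each time either the CORE-1 or the CORE-2 group matches, call the parse( core )
function, passing it the contents of that core group.

Inside the loop, just take off the content and assign it to the hash.

Obviously, the group 1 construct that calls (?&content) should be replaced
with constructs that capture hash-like variable data.

On a detailed scale, this can be very tedious.
Usually you have to account for every single character to parse
the entire thing properly.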
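For JavaScript, which has no recursive patterns, the same classification (content, array( opener, plain parenthesis, unbalanced exception) can be sketched with a small tokenizing regex and an explicit stack instead of recursion. This is a swapped-in technique, not the PCRE approach itself; TOKEN and convertArrays are hypothetical names, and strings and comments are still not recognized.

```javascript
// Tokens: an "array(" opener, or a single plain parenthesis.
// Everything between tokens is CONTENT and is copied through unchanged.
const TOKEN = /\barray\s*\(|[()]/g;

function convertArrays(source) {
  const stack = [];   // true = opened by "array(", false = opened by plain "("
  let out = "";
  let last = 0;       // end of the previously consumed token
  for (const m of source.matchAll(TOKEN)) {
    out += source.slice(last, m.index);          // CONTENT
    last = m.index + m[0].length;
    if (m[0] === "(") {                           // CORE-2 opener
      stack.push(false);
      out += "(";
    } else if (m[0] === ")") {                    // closer for either kind
      if (stack.length === 0) {
        throw new Error("Unbalanced ')' at position " + m.index);
      }
      out += stack.pop() ? "]" : ")";
    } else {                                      // CORE-1 opener: "array("
      stack.push(true);
      out += "[";
    }
  }
  if (stack.length > 0) {
    throw new Error("Unbalanced opening parenthesis");
  }
  return out + source.slice(last);
}
```

Because the stack remembers which kind of opener each closing parenthesis belongs to, one pass over the text is enough and no repeated application is needed.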

(?is)(?:((?&content))|(?>\barray\s*\()((?=.)(?&core)|)\)|\(((?=.)(?&core)|)\)|(\barray\s*\(|[()]))(?(DEFINE)(?<core>(?>(?&content)|(?>\barray\s*\()(?:(?=.)(?&core)|)\)|\((?:(?=.)(?&core)|)\))+)(?<content>(?>(?!\barray\s*\(|[()]).)+))

Expanded

 # 1:  CONTENT
 # 2:  CORE-1
 # 3:  CORE-2
 # 4:  EXCEPTIONS

 (?is)

 (?:
      (                                  # (1), Take off   CONTENT
           (?&content) 
      )
   |                                   # OR -----------------------------
      (?>                                # Start 'array('
           \b array \s* \(
      )
      (                                  # (2), Take off   'array( CORE-1 )'
           (?= . )
           (?&core) 
        |  
      )
      \)                                 # End ')'
   |                                   # OR -----------------------------
      \(                                 # Start '('
      (                                  # (3), Take off   '( any CORE-2 )'
           (?= . )
           (?&core) 
        |  
      )
      \)                                 # End ')'
   |                                   # OR -----------------------------
      (                                  # (4), Take off   Unbalanced or Exceptions
           \b array \s* \(
        |  [()] 
      )
 )

 # Subroutines
 # ---------------

 (?(DEFINE)

      # core
      (?<core>
           (?>
                (?&content) 
             |  
                (?> \b array \s* \( )
                # recurse core of  array()
                (?:
                     (?= . )
                     (?&core) 
                  |  
                )
                \)
             |  
                \(
                # recurse core of any  ()
                (?:
                     (?= . )
                     (?&core) 
                  |  
                )
                \)
           )+
      )

      # content 
      (?<content>
           (?>
                (?!
                     \b array \s* \(
                  |  [()] 
                )
                . 
           )+
      )
 )

Output

 **  Grp 0           -  ( pos 0 , len 11 ) 
some_var =   
 **  Grp 1           -  ( pos 0 , len 11 ) 
some_var =   
 **  Grp 2           -  NULL 
 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 

-----------------------

 **  Grp 0           -  ( pos 11 , len 153 ) 
array(
    'id' => nextId(),
    'profile' => array(
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ) 
)  
 **  Grp 1           -  NULL 
 **  Grp 2           -  ( pos 17 , len 146 ) 

    'id' => nextId(),
    'profile' => array(
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ) 

 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 

-------------------------------------

 **  Grp 0           -  ( pos 164 , len 3 ) 
;

 **  Grp 1           -  ( pos 164 , len 3 ) 
;

 **  Grp 2           -  NULL 
 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 

An earlier incarnation, written for something else, to give an idea of the usage

 # Perl code:
 # 
 #     use strict;
 #     use warnings;
 #     
 #     use Data::Dumper;
 #     
 #     $/ = undef;
 #     my $content = <DATA>;
 #     
 #     # Set the error mode on/off here ..
 #     my $BailOnError = 1;
 #     my $IsError = 0;
 #     
 #     my $href = {};
 #     
 #     ParseCore( $href, $content );
 #     
 #     #print Dumper($href);
 #     
 #     print "\n\n";
 #     print "\nBase======================\n";
 #     print $href->{content};
 #     print "\nFirst======================\n";
 #     print $href->{first}->{content};
 #     print "\nSecond======================\n";
 #     print $href->{first}->{second}->{content};
 #     print "\nThird======================\n";
 #     print $href->{first}->{second}->{third}->{content};
 #     print "\nFourth======================\n";
 #     print $href->{first}->{second}->{third}->{fourth}->{content};
 #     print "\nFifth======================\n";
 #     print $href->{first}->{second}->{third}->{fourth}->{fifth}->{content};
 #     print "\nSix======================\n";
 #     print $href->{six}->{content};
 #     print "\nSeven======================\n";
 #     print $href->{six}->{seven}->{content};
 #     print "\nEight======================\n";
 #     print $href->{six}->{seven}->{eight}->{content};
 #     
 #     exit;
 #     
 #     
 #     sub ParseCore
 #     {
 #         my ($aref, $core) = @_;
 #         my ($k, $v);
 #         while ( $core =~ /(?is)(?:((?&content))|(?><!--block:(.*?)-->)((?&core)|)<!--endblock-->|(<!--(?:block:.*?|endblock)-->))(?(DEFINE)(?<core>(?>(?&content)|(?><!--block:.*?-->)(?:(?&core)|)<!--endblock-->)+)(?<content>(?>(?!<!--(?:block:.*?|endblock)-->).)+))/g )
 #         {
 #            if (defined $1)
 #            {
 #              # CONTENT
 #                $aref->{content} .= $1;
 #            }
 #            elsif (defined $2)
 #            {
 #              # CORE
 #                $k = $2; $v = $3;
 #                $aref->{$k} = {};
 #      #         $aref->{$k}->{content} = $v;
 #      #         $aref->{$k}->{match} = $&;
 #                
 #                my $curraref = $aref->{$k};
 #                my $ret = ParseCore($aref->{$k}, $v);
 #                if ( $BailOnError && $IsError ) {
 #                    last;
 #                }
 #                if (defined $ret) {
 #                    $curraref->{'#next'} = $ret;
 #                }
 #            }
 #            else
 #            {
 #              # ERRORS
 #                print "Unbalanced '$4' at position = ", $-[0];
 #                $IsError = 1;
 #     
 #                # Decide to continue here ..
 #                # If BailOnError is set, just unwind recursion. 
 #                # -------------------------------------------------
 #                if ( $BailOnError ) {
 #                   last;
 #                }
 #            }
 #         }
 #         return $k;
 #     }
 #     
 #     #================================================
 #     __DATA__
 #     some html content here top base
 #     <!--block:first-->
 #         <table border="1" style="color:red;">
 #         <tr class="lines">
 #             <td align="left" valign="<--valign-->">
 #         <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
 #         <!--hello--> <--again--><!--world-->
 #         some html content here 1 top
 #         <!--block:second-->
 #             some html content here 2 top
 #             <!--block:third-->
 #                 some html content here 3 top
 #                 <!--block:fourth-->
 #                     some html content here 4 top
 #                     <!--block:fifth-->
 #                         some html content here 5a
 #                         some html content here 5b
 #                     <!--endblock-->
 #                 <!--endblock-->
 #                 some html content here 3a
 #                 some html content here 3b
 #             <!--endblock-->
 #             some html content here 2 bottom
 #         <!--endblock-->
 #         some html content here 1 bottom
 #     <!--endblock-->
 #     some html content here1-5 bottom base
 #     
 #     some html content here 6-8 top base
 #     <!--block:six-->
 #         some html content here 6 top
 #         <!--block:seven-->
 #             some html content here 7 top
 #             <!--block:eight-->
 #                 some html content here 8a
 #                 some html content here 8b
 #             <!--endblock-->
 #             some html content here 7 bottom
 #         <!--endblock-->
 #         some html content here 6 bottom
 #     <!--endblock-->
 #     some html content here 6-8 bottom base
 # 
 # Output >>
 # 
 #     Base======================
 #     some html content here top base
 #     
 #     some html content here1-5 bottom base
 #     
 #     some html content here 6-8 top base
 #     
 #     some html content here 6-8 bottom base
 #     
 #     First======================
 #     
 #         <table border="1" style="color:red;">
 #         <tr class="lines">
 #             <td align="left" valign="<--valign-->">
 #         <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
 #         <!--hello--> <--again--><!--world-->
 #         some html content here 1 top
 #         
 #         some html content here 1 bottom
 #     
 #     Second======================
 #     
 #             some html content here 2 top
 #             
 #             some html content here 2 bottom
 #         
 #     Third======================
 #     
 #                 some html content here 3 top
 #                 
 #                 some html content here 3a
 #                 some html content here 3b
 #             
 #     Fourth======================
 #     
 #                     some html content here 4 top
 #                     
 #                 
 #     Fifth======================
 #     
 #                         some html content here 5a
 #                         some html content here 5b
 #                     
 #     Six======================
 #     
 #         some html content here 6 top
 #         
 #         some html content here 6 bottom
 #     
 #     Seven======================
 #     
 #             some html content here 7 top
 #             
 #             some html content here 7 bottom
 #         
 #     Eight======================
 #     
 #                 some html content here 8a
 #                 some html content here 8b
 #