PHP preg_match_all - extract content from pattern in different order

Question

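I'm cleaning up some WordPress shortcodes in my code, and I'm looking for a solution that extracts the right values regardless of the order in which the attributes appear.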

Example:

[Links label="my_label" url="my_url" external="other_value"]

If I want to extract my_label, my_url and other_value, I use the following:

preg_match_all('/\[Links label=\"(.*?)\" url=\"(.*?)\" external=\"(.*?)\"\]/', $content, $output_array);

The problem is that the attributes sometimes come in a different order, like this:

[Links url="my_url" external="other_value" label="my_label"]

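The preg_match_all call above does not work for this. I tried wrapping each sub-pattern in (...) and using |, but I did not get the expected result. I have seen solutions here that detect whether such a string matches, but I need more than detection: I also need to extract the values.

This is probably trivial for a regex expert.

Thanks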

Answer 1

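What you can (maybe) do is not name the keys you want to match at all, and instead capture whatever stands before and after each equals sign.
That way you "parse" the string first and can work out which value belongs to which key afterwards.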

$str = '[Links label="my_label" url="my_url" external="other_value"]';

preg_match("/\[links\s+(.*?)=\"(.*?)\"\s+(.*?)=\"(.*?)\"\s+(.*?)=\"(.*?)\"/i", $str, $match);

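unset($match[0]);                       // drop the full match, keep only the captured groups
$res = [];
foreach(array_chunk($match,2) as $m){   // each chunk is one [key, value] pair
    $res[$m[0]] = $m[1];
}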

var_dump($res);

This gives you:

array(3) {
  ["label"]=>
  string(8) "my_label"
  ["url"]=>
  string(6) "my_url"
  ["external"]=>
  string(11) "other_value"
}

https://3v4l.org/H1qGD

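But this depends entirely on what else you have to parse: because the keys are not named, the pattern may also match other things in the content.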
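
One way to reduce the risk of matching unrelated text, and to stop hard-coding exactly three attributes, might be to isolate each [Links ...] tag first and only then split its attribute string into pairs. A minimal sketch, assuming values are always double-quoted and contain neither " nor ]; the helper name parse_links_shortcodes is hypothetical:

```php
<?php
// Hypothetical helper: returns one key => value array per [Links ...] tag.
// Assumes double-quoted values that contain neither '"' nor ']'.
function parse_links_shortcodes(string $content): array
{
    $result = [];
    // Step 1: grab the attribute string of every [Links ...] tag.
    preg_match_all('/\[Links\b([^\]]*)\]/i', $content, $tags);
    foreach ($tags[1] as $attrString) {
        // Step 2: collect key="value" pairs, whatever their order or count.
        preg_match_all('/([\w-]+)="([^"]*)"/', $attrString, $pairs);
        $result[] = array_combine($pairs[1], $pairs[2]);
    }
    return $result;
}

$content = 'a [Links url="my_url" external="other_value" label="my_label"] b';
$parsed = parse_links_shortcodes($content);
print_r($parsed[0]);
```

Each tag gets its own array, so repeated keys across several shortcodes in the same content do not overwrite one another.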

Answer 2

The answer above works. But if you only need the values and not their corresponding keys, you can also use the code below.

$content = '[Links label="my_label" url="my_url" external="other_value"]';
$temp = explode("\"",$content);
$output = [];
for ($x = 0; $x < count($temp); $x++) {
    if($x % 2 != 0) { 
       array_push($output,$temp[$x]);
    }
}

The $output array will then contain all the values, in the order they appear in the string.

Answer 3

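If the attributes can also come in any order and in varying number, and the tag must start with [Links, you can use the \G anchor. The key is in capture group 1 and the value is in capture group 2.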

(?:\[Links|\G(?!^))(?=[^][]*])\h+([^\s=]+)="([^\s"]+)"

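Explanation

(?: Non-capturing group
- \[Links Match [Links
- | Or
- \G(?!^) Assert the position at the end of the previous match, not at the start
) Close the non-capturing group
(?=[^][]*]) Positive lookahead, assert a ] to the right with no other square bracket before it
\h+ Match 1+ horizontal whitespace characters
( Capture group 1
- [^\s=]+ Match any character except = or a whitespace character, 1+ times
) Close group 1
=" Match literally
( Capture group 2
- [^\s"]+ Match any character except " or a whitespace character, 1+ times
)" Close group 2 and match "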

Regex demo

Example

$re = '/(?:\[Links|\G(?!^))(?=[^][]*])\h+([^\s=]+)="([^\s"]+)"/m';
$str = '[Links label="my_label" url="my_url" external="other_value"]';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches);

Output

Array
(
    [0] => Array
        (
            [0] => [Links label="my_label"
            [1] => label
            [2] => my_label
        )

    [1] => Array
        (
            [0] =>  url="my_url"
            [1] => url
            [2] => my_url
        )

    [2] => Array
        (
            [0] =>  external="other_value"
            [1] => external
            [2] => other_value
        )

)

Php demo
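
If a flat key => value map is more convenient than the nested match sets, the result can be reshaped with array_column. A small sketch using the same pattern, applied to the reordered tag from the question; the variable name $attrs is only for illustration:

```php
<?php
$re = '/(?:\[Links|\G(?!^))(?=[^][]*])\h+([^\s=]+)="([^\s"]+)"/m';
$str = '[Links url="my_url" external="other_value" label="my_label"]';

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Take group 2 (the value) as the element and group 1 (the key) as its index.
$attrs = array_column($matches, 2, 1);
print_r($attrs);
```

Note that with several [Links ...] tags in one string, a repeated key would be overwritten by the later tag, so this reshaping suits one tag at a time.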

Answer 4

If you want to go for complete overkill, you can reuse WordPress's own regular expressions and attribute handling.

For example:

<?php

$res = extract_specific_shortcode('links', $teststring = '[links label="Label" url="https://nisamerica.com/" external="yes" /] '."\n".
'[links label="Label2" url="https://google.com/" external="no"]content[/links]' );

print_r($res);

function extract_specific_shortcode( $tagname, $content ) { 

    $tagname_regex = preg_quote($tagname, '/');

    $wp_shortcode_atts = function( $text ) {
        $atts    = array();
        $pattern = '/([\w-]+)\s*=\s*"([^"]*)"(?:\s|$)|([\w-]+)\s*=\s*\'([^\']*)\'(?:\s|$)|([\w-]+)\s*=\s*([^\s\'"]+)(?:\s|$)|"([^"]*)"(?:\s|$)|\'([^\']*)\'(?:\s|$)|(\S+)(?:\s|$)/';
        $text    = preg_replace( "/[\x{00a0}\x{200b}]+/u", ' ', $text );
        if ( preg_match_all( $pattern, $text, $match, PREG_SET_ORDER ) ) {
            foreach ( $match as $m ) {
                if ( ! empty( $m[1] ) ) {
                    $atts[ strtolower( $m[1] ) ] = stripcslashes( $m[2] );
                } elseif ( ! empty( $m[3] ) ) {
                    $atts[ strtolower( $m[3] ) ] = stripcslashes( $m[4] );
                } elseif ( ! empty( $m[5] ) ) {
                    $atts[ strtolower( $m[5] ) ] = stripcslashes( $m[6] );
                } elseif ( isset( $m[7] ) && strlen( $m[7] ) ) {
                    $atts[] = stripcslashes( $m[7] );
                } elseif ( isset( $m[8] ) && strlen( $m[8] ) ) {
                    $atts[] = stripcslashes( $m[8] );
                } elseif ( isset( $m[9] ) ) {
                    $atts[] = stripcslashes( $m[9] );
                }
            }
     
            // Reject any unclosed HTML elements.
            foreach ( $atts as &$value ) {
                if ( false !== strpos( $value, '<' ) ) {
                    if ( 1 !== preg_match( '/^[^<]*+(?:<[^>]*+>[^<]*+)*+$/', $value ) ) {
                        $value = '';
                    }
                }
            }
        } else {
            $atts = ltrim( $text );
        }
     
        return $atts;
    };

    // Taken from wordpress 
    $regex = '/\['                             // Opening bracket.
        . '(\[?)'                           // 1: Optional second opening bracket for escaping shortcodes: [[tag]].
        . "($tagname_regex)"                     // 2: Shortcode name.
        . '(?![\w-])'                       // Not followed by word character or hyphen.
        . '('                                // 3: Unroll the loop: Inside the opening shortcode tag.
        .     '[^\]\/]*'                   // Not a closing bracket or forward slash.
        .     '(?:'
        .         '\/(?!\])'               // A forward slash not followed by a closing bracket.
        .         '[^\]\/]*'               // Not a closing bracket or forward slash.
        .     ')*?'
        . ')'
        . '(?:'
        .     '(\/)'                        // 4: Self closing tag...
        .     '\]'                          // ...and closing bracket.
        . '|'
        .     '\]'                          // Closing bracket.
        .     '(?:'
        .         '('                        // 5: Unroll the loop: Optionally, anything between the opening and closing shortcode tags.
        .             '[^\[]*+'             // Not an opening bracket.
        .             '(?:'
        .                 '\[(?!\/\2\])' // An opening bracket not followed by the closing shortcode tag.
        .                 '[^\[]*+'         // Not an opening bracket.
        .             ')*+'
        .         ')'
        .         '\[\/\2\]'             // Closing shortcode tag.
        .     ')?'
        . ')'
        . '(\]?)/i';                          // 6: Optional second closing bracket for escaping shortcodes: [[tag]].
    // phpcs:enable


    preg_match_all($regex, $content, $matches, PREG_SET_ORDER);
    $set = [];
    foreach($matches as $match) {
        $set[] = [
            'fullmatch' => $match[0],
            'attributes' => $wp_shortcode_atts($match[3]),
        ];
    }
    return $set;
}

This produces the following output:

Array
(
    [0] => Array
        (
            [fullmatch] => [links label="Label" url="https://nisamerica.com/" external="yes" /]
            [attributes] => Array
                (
                    [label] => Label
                    [url] => https://nisamerica.com/
                    [external] => yes
                )

        )

    [1] => Array
        (
            [fullmatch] => [links label="Label2" url="https://google.com/" external="no"]content[/links]
            [attributes] => Array
                (
                    [label] => Label2
                    [url] => https://google.com/
                    [external] => no
                )

        )

)

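The code above is derived from WordPress's own shortcode-handling functions.

Like the other solutions posted here, WordPress handles attributes in two steps: it collects the keys and values, then combines them into a map. Its regular expressions are somewhat more complex because they cover more edge cases than the ones shown here, such as single-quoted and unquoted values, self-closing tags, and escaped [[tag]] shortcodes.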

Answer 5

You can also try it this way:

preg_match_all('/\b([^\s"=]+)="([^"]+)"/', $content, $output_array);

$result = array_combine($output_array[1], $output_array[2]);
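
A self-contained sketch of this approach, run against the reordered tag from the question. The key class is written as [^\s"=]+ here so that whitespace cannot be part of a key; otherwise the tag name would be swallowed into the first key (for example "Links url"). It assumes $content holds a single shortcode, since array_combine keeps only the last value for a repeated key:

```php
<?php
$content = '[Links url="my_url" external="other_value" label="my_label"]';

// Group 1: key (no whitespace, quote or "="); group 2: double-quoted value.
preg_match_all('/\b([^\s"=]+)="([^"]+)"/', $content, $output_array);

// Pair the keys with their values; assumes at least one match was found.
$result = array_combine($output_array[1], $output_array[2]);
print_r($result);
```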
