PHP preg_match_all: extracting values from a pattern when the order varies

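I am cleaning up some WordPress shortcodes in my code and am looking for a solution that extracts the correct values regardless of the order in which the attributes appear.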

Example:

[Links label="my_label" url="my_url" external="other_value"]

To extract my_label, my_url and other_value, I would use the following:

preg_match_all('/\[Links label=\"(.*?)\" url=\"(.*?)\" external=\"(.*?)\"\]/',$content,$output_array);

The problem is that the attributes sometimes come in a different order, like this:

[Links url="my_url" external="other_value" label="my_label"]

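My preg_match_all call above does not match this variant. I tried wrapping each sub-pattern in (...) or using |, but I did not get the expected result. I have seen solutions here for detecting such a string, but I need to extract the values, not just detect the string.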

This is probably trivial for a regex expert.

Thanks

Solutions

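If the attributes can appear in any order and in varying number, and the tag must start with [Links, you can use the \G anchor. Each match holds the key in capture group 1 and the value in capture group 2.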

(?:\[Links|\G(?!^))(?=[^][]*])\h+([^\s=]+)="([^\s"]+)"

Explanation

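(?: Non-capturing group
- \[Links Match [Links
- | Or
- \G(?!^) Assert the position at the end of the previous match, not at the start of the string
) Close the non-capturing group
(?=[^][]*]) Positive lookahead, assert that a ] follows to the right with no other [ or ] in between
\h+ Match 1 or more horizontal whitespace characters
( Capture group 1
- [^\s=]+ Match 1 or more characters other than = or whitespace
) Close group 1
=" Match literally
( Capture group 2
- [^\s"]+ Match 1 or more characters other than " or whitespace
)" Close group 2 and match "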

Example

$re = '/(?:\[Links|\G(?!^))(?=[^][]*])\h+([^\s=]+)="([^\s"]+)"/m';
$str = '[Links label="my_label" url="my_url" external="other_value"]';

preg_match_all($re,$str,$matches,PREG_SET_ORDER,0);
print_r($matches);

Output

Array
(
    [0] => Array
        (
            [0] => [Links label="my_label"
            [1] => label
            [2] => my_label
        )

    [1] => Array
        (
            [0] =>  url="my_url"
            [1] => url
            [2] => my_url
        )

    [2] => Array
        (
            [0] =>  external="other_value"
            [1] => external
            [2] => other_value
        )

)
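
Since the question is about attribute order, here is a minimal sketch that runs the same pattern over both orderings and folds the matches into a key => value array. The helper name links_attributes is hypothetical and not part of the answer, and it assumes one shortcode per input string.

```php
<?php
// Hypothetical helper (not from the original answer): returns the attributes
// of a single [Links ...] shortcode as key => value pairs, in any order.
function links_attributes(string $shortcode): array
{
    $re = '/(?:\[Links|\G(?!^))(?=[^][]*])\h+([^\s=]+)="([^\s"]+)"/';
    preg_match_all($re, $shortcode, $matches, PREG_SET_ORDER);

    // Take column 2 (the values) and index it by column 1 (the keys).
    return array_column($matches, 2, 1);
}

$a = links_attributes('[Links label="my_label" url="my_url" external="other_value"]');
$b = links_attributes('[Links url="my_url" external="other_value" label="my_label"]');

print_r($b);
// == compares key/value pairs and ignores their order.
var_dump($a == $b);
```

Note that [^\s"]+ stops at whitespace, so a value such as "my label" would not be matched by this pattern; [^"]* may be a better fit if values can contain spaces.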

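What you could (possibly) do instead is not list the keys you want to match, and just capture whatever comes before and after each equals sign.
That way you "parse" the string and can work out later what is what.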

$str = '[Links label="my_label" url="my_url" external="other_value"]';

preg_match("/\[links\s+(.*?)=\"(.*?)\"\s+(.*?)=\"(.*?)\"\s+(.*?)=\"(.*?)\"/i",$str,$match);

unset($match[0]);
foreach(array_chunk($match,2) as $m){
    $res[$m[0]] = $m[1];
}

var_dump($res);

This gives you:

array(3) {
  ["label"]=>
  string(8) "my_label"
  ["url"]=>
  string(6) "my_url"
  ["external"]=>
  string(11) "other_value"
}

https://3v4l.org/H1qGD

But it all depends on whether you have more content to parse; in that case this pattern may match other things as well.
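
Because the pattern does not name the keys, it should also cope with the reordered shortcode from the question. The sketch below wraps it in a hypothetical function, parse_three_attributes, which is an illustration rather than the answerer's code; it assumes the tag carries at least three key="value" attributes and returns an empty array otherwise.

```php
<?php
// Hypothetical wrapper around the pattern above; it assumes the shortcode
// carries at least three key="value" attributes.
function parse_three_attributes(string $str): array
{
    $pattern = '/\[links\s+(.*?)="(.*?)"\s+(.*?)="(.*?)"\s+(.*?)="(.*?)"/i';
    if (!preg_match($pattern, $str, $match)) {
        return []; // no [links ...] tag with three attributes found
    }

    $res = [];
    // Drop the full match, then pair up the key and value captures.
    foreach (array_chunk(array_slice($match, 1), 2) as [$key, $value]) {
        $res[$key] = $value;
    }
    return $res;
}

print_r(parse_three_attributes('[Links url="my_url" external="other_value" label="my_label"]'));
print_r(parse_three_attributes('[Links label="only_one"]')); // empty array
```

The fixed number of groups is the main limitation here: a shortcode with fewer than three attributes does not match at all, and any attributes after the third are ignored.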

The answers above work. However, if you only need the values and not their corresponding keys, you can also use the code below.

$content = '[Links label="my_label" url="my_url" external="other_value"]';
$temp = explode("\"",$content);
$output = [];
for ($x = 0; $x < count($temp); $x++) {
    if($x % 2 != 0) { 
       array_push($output,$temp[$x]);
    }
}

The $output array will contain all the values.
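
The same values-only idea can be sketched with a single preg_match_all call that captures whatever sits between double quotes. This is an alternative to the explode loop, not the answerer's code, and like the loop it assumes the values themselves contain no double quotes.

```php
<?php
$content = '[Links url="my_url" external="other_value" label="my_label"]';

// Capture group 1 holds the text between each pair of double quotes.
preg_match_all('/"([^"]*)"/', $content, $matches);
$output = $matches[1];

print_r($output);
```

The values come back in the order they appear in the shortcode, so without the keys there is no way to tell which one is the label once the attribute order varies.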

If you want to go the full overkill route, you can reuse WordPress's own regex and attribute handler.

For example:

<?php

$res = extract_specific_shortcode('links',$teststring = '[links label="Label" url="https://nisamerica.com/" external="yes" /] '."\n".
'[links label="Label2" url="https://google.com/" external="no"]content[/links]' );

print_r($res);

function extract_specific_shortcode( $tagname,$content ) { 

    $tagname_regex = preg_quote($tagname,'/');

    $wp_shortcode_atts = function( $text ) {
        $atts    = array();
        $pattern = '/([\w-]+)\s*=\s*"([^"]*)"(?:\s|$)|([\w-]+)\s*=\s*\'([^\']*)\'(?:\s|$)|([\w-]+)\s*=\s*([^\s\'"]+)(?:\s|$)|"([^"]*)"(?:\s|$)|\'([^\']*)\'(?:\s|$)|(\S+)(?:\s|$)/';
        $text    = preg_replace( "/[\x{00a0}\x{200b}]+/u",' ',$text );
        if ( preg_match_all( $pattern,$text,$match,PREG_SET_ORDER ) ) {
            foreach ( $match as $m ) {
                if ( ! empty( $m[1] ) ) {
                    $atts[ strtolower( $m[1] ) ] = stripcslashes( $m[2] );
                } elseif ( ! empty( $m[3] ) ) {
                    $atts[ strtolower( $m[3] ) ] = stripcslashes( $m[4] );
                } elseif ( ! empty( $m[5] ) ) {
                    $atts[ strtolower( $m[5] ) ] = stripcslashes( $m[6] );
                } elseif ( isset( $m[7] ) && strlen( $m[7] ) ) {
                    $atts[] = stripcslashes( $m[7] );
                } elseif ( isset( $m[8] ) && strlen( $m[8] ) ) {
                    $atts[] = stripcslashes( $m[8] );
                } elseif ( isset( $m[9] ) ) {
                    $atts[] = stripcslashes( $m[9] );
                }
            }
     
            // Reject any unclosed HTML elements.
            foreach ( $atts as &$value ) {
                if ( false !== strpos( $value,'<' ) ) {
                    if ( 1 !== preg_match( '/^[^<]*+(?:<[^>]*+>[^<]*+)*+$/',$value ) ) {
                        $value = '';
                    }
                }
            }
        } else {
            $atts = ltrim( $text );
        }
     
        return $atts;
    };

    // Taken from wordpress 
    $regex = '/\\['                             // Opening bracket.
        . '(\\[?)'                           // 1: Optional second opening bracket for escaping shortcodes: [[tag]].
        . "($tagname_regex)"                     // 2: Shortcode name.
        . '(?![\\w-])'                       // Not followed by word character or hyphen.
        . '('                                // 3: Unroll the loop: Inside the opening shortcode tag.
        .     '[^\\]\\/]*'                   // Not a closing bracket or forward slash.
        .     '(?:'
        .         '\\/(?!\\])'               // A forward slash not followed by a closing bracket.
        .         '[^\\]\\/]*'               // Not a closing bracket or forward slash.
        .     ')*?'
        . ')'
        . '(?:'
        .     '(\\/)'                        // 4: Self closing tag...
        .     '\\]'                          // ...and closing bracket.
        . '|'
        .     '\\]'                          // Closing bracket.
        .     '(?:'
        .         '('                        // 5: Unroll the loop: Optionally, anything between the opening and closing shortcode tags.
        .             '[^\\[]*+'             // Not an opening bracket.
        .             '(?:'
        .                 '\\[(?!\\/\\2\\])' // An opening bracket not followed by the closing shortcode tag.
        .                 '[^\\[]*+'         // Not an opening bracket.
        .             ')*+'
        .         ')'
        .         '\\[\\/\\2\\]'             // Closing shortcode tag.
        .     ')?'
        . ')'
        . '(\\]?)/i';                          // 6: Optional second closing bracket for escaping shortcodes: [[tag]].
    // phpcs:enable


    preg_match_all($regex,$content,$matches,PREG_SET_ORDER);
    $set = [];
    foreach($matches as $match) {
        $set[] = [
            'fullmatch' => $match[0],
            'attributes' => $wp_shortcode_atts($match[3]),
        ];
    }
    return $set;
}

This produces the following output:

Array
(
    [0] => Array
        (
            [fullmatch] => [links label="Label" url="https://nisamerica.com/" external="yes" /]
            [attributes] => Array
                (
                    [label] => Label
                    [url] => https://nisamerica.com/
                    [external] => yes
                )

        )

    [1] => Array
        (
            [fullmatch] => [links label="Label2" url="https://google.com/" external="no"]content[/links]
            [attributes] => Array
                (
                    [label] => Label2
                    [url] => https://google.com/
                    [external] => no
                )

        )

)

The code above is derived from WordPress's own shortcode handling functions.

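Like the other solutions posted, WordPress does the attribute mapping in two parts: it collects the keys and values, then combines them. Its regex is more involved because it handles more edge cases than the ones presented here.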
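
The two-step idea can be sketched without the full WordPress regex: first isolate the attribute string of each shortcode, then collect the key/value pairs from it. The function name parse_links_shortcodes and both simplified patterns are assumptions made for illustration; they only handle double-quoted values and ignore escaping, self-closing tags and enclosed content, all of which WordPress's own regex accounts for.

```php
<?php
// Simplified two-step sketch (hypothetical helper, not WordPress code).
function parse_links_shortcodes(string $content): array
{
    $result = [];

    // Step 1: find each [links ...] tag and capture its raw attribute string.
    preg_match_all('/\[links(?![\w-])([^\]]*)\]/i', $content, $tags, PREG_SET_ORDER);

    foreach ($tags as $tag) {
        // Step 2: collect key="value" pairs from that attribute string.
        preg_match_all('/([\w-]+)\s*=\s*"([^"]*)"/', $tag[1], $pairs);
        $result[] = array_combine(
            array_map('strtolower', $pairs[1]),
            $pairs[2]
        );
    }

    return $result;
}

$content = '[links label="Label" url="https://nisamerica.com/" external="yes"] '
         . '[links external="no" label="Label2" url="https://google.com/"]';

print_r(parse_links_shortcodes($content));
```

Each shortcode gets its own attribute array, so several tags in the same content do not overwrite one another, and the order of the attributes inside a tag does not matter.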

You can also try it like this:

preg_match_all('/(\b[^\s"=]+)="([^"]+)"/',$content,$output_array);

$result = array_combine($output_array[1],$output_array[2]);
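
Once the pairs are in an associative array, the original order no longer matters, because each value is read by its key. A small sketch, assuming $content holds a single shortcode, that the key pattern excludes whitespace so the tag name is not swallowed into the first key, and that a missing attribute should fall back to an empty string (the defaults are an assumption, not part of the answer):

```php
<?php
$content = '[Links url="my_url" external="other_value" label="my_label"]';

// Keys may not contain whitespace, quotes or "="; values are double-quoted.
preg_match_all('/(\b[^\s"=]+)="([^"]+)"/', $content, $output_array);
$result = array_combine($output_array[1], $output_array[2]);

// Assumed defaults for attributes that might be absent; the array union
// operator keeps the parsed values and only fills in missing keys.
$atts = $result + ['label' => '', 'url' => '', 'external' => ''];

echo $atts['label'], ' | ', $atts['url'], ' | ', $atts['external'], PHP_EOL;
// prints: my_label | my_url | other_value
```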
