How to use preg_match_all to get the text inside the tags "<h3>" and "<h3> <a/> </h3>"

Question

Hello, I'm currently building an automatic table of contents for my WordPress site. My reference is https://webdeasy.de/en/wordpress-table-of-contents-without-plugin/

Problem: Everything works, except when an <h3> tag contains an <a> link tag. In that case the $names result is lost.

I found that the problem is in this part of the regex:

preg_match_all("/<h[3,4](?:\sid=\"(.*)\")?(?:.*)?>(.*)<\/h[3,4]>/", $content, $matches);

// get text under <h3> or <h4> tag.
$names = $matches[2];

I tried modifying the regex (I don't really understand it):

preg_match_all("/<h[3,4](?:\sid=\"(.*)\")?(?:.*)?><a(.*)>(.*)<\/a><\/h[3,4]>/", $content, $matches);

// get text under <a> tag.
$names = $matches[4];

The code above finds the text inside <h3><a>a text</a></h3> tags, but then h3 tags that do not contain an <a> tag are the problem.

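My question: How can I combine the two snippets above? My expectation is that if the first snippet yields no result, the second snippet is run and its result is used instead.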

Or is there a better solution? Thanks.

Answer 1

Here is one way to strip all tags inside the heading tags:

$html = <<<EOT
<h3>Here's an <a href="thing.php">alternative solution</a></h3> to using regex. <h3>It may <a name='#thing'>not</a></h3> be the most elegant solution, but it works
EOT;

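// Match only real headings (h1-h6); \1 makes the closing tag match the opening level,
// so tags such as <hr> or <header> are not picked up by accident.
preg_match_all('#<h([1-6])\b([^>]*)>(.*?)</h\1>#si', $html, $matches);
foreach ($matches[0] as $num => $whole) {
   $look_for = preg_quote($whole, "/");
   $tag = "h" . $matches[1][$num];
   // Keep the heading's own attributes, but strip every tag nested inside it.
   $replace_with = "<$tag" . $matches[2][$num] . ">" . strip_tags($matches[3][$num]) . "</$tag>";
   $html = preg_replace("/$look_for/", $replace_with, $html, 1);
}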

echo "<pre>$html</pre>";
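The same idea can probably be expressed in a single pass with preg_replace_callback, which avoids building a second regex for each match. This is only a sketch: the helper name strip_tags_inside_headings is hypothetical, and it assumes headings are not nested and that their attributes contain no ">" character.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the answer above).
function strip_tags_inside_headings(string $html): string
{
    // <h1>..<h6> with optional attributes, lazy body, closing tag of the same level (\1).
    $pattern = '#<h([1-6])\b([^>]*)>(.*?)</h\1>#si';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // $m[1] = heading level, $m[2] = the heading's attributes, $m[3] = inner HTML.
        return '<h' . $m[1] . $m[2] . '>' . strip_tags($m[3]) . '</h' . $m[1] . '>';
    }, $html);

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$html = '<h3>Here\'s an <a href="thing.php">alternative solution</a></h3> to using regex. '
      . '<h3 id="x">It may <a name="#thing">not</a></h3> be the most elegant solution';

echo strip_tags_inside_headings($html);
```

Because the callback returns a plain string, characters such as "$" or "\" in the heading text are not interpreted as backreferences, which can happen when the replacement is passed to preg_replace directly.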

Answer 2

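@kinglish's answer is the basis for this solution, many thanks. I modified and simplified it slightly to fit the article linked in my question. This code works for me: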

preg_match_all('#(\<h[3-4])\sid=\"(.*?)\"?\>(.*?)(<\/h[3-4]>)#si', $content, $matches);
$tags = $matches[0];
$ids = $matches[2];
$raw_names = $matches[3];
/* Clean $raw_names of any other HTML tags */
$clean_names = array_map(function($v){
    return trim(strip_tags($v));
}, $raw_names);
$names = $clean_names;
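For headings that carry extra attributes or no id at all, a variant along the following lines might be a useful starting point. It is a sketch under assumptions: the function name extract_toc_entries and the shape of the returned array are made up for illustration, and the sample $content is invented. It keeps the same approach as above (match the heading, then strip_tags the inner HTML), so one pattern covers headings with and without an <a> inside.

```php
<?php
// Hypothetical helper; the name and the returned array shape are assumptions.
function extract_toc_entries(string $content): array
{
    // Group 1: heading level (3 or 4).
    // Group 2: value of the id attribute, or an empty string when there is none.
    // Group 3: inner HTML of the heading, which may contain <a> or other tags.
    $pattern = '#<h([34])\b(?:[^>]*?\sid="([^"]*)")?[^>]*>(.*?)</h\1>#si';
    preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

    $entries = [];
    foreach ($matches as $m) {
        $entries[] = [
            'level' => (int) $m[1],
            'id'    => $m[2],
            'name'  => trim(strip_tags($m[3])),
        ];
    }
    return $entries;
}

// Invented sample content: plain heading, heading with a link, heading without an id.
$content = '<h3 id="intro">Intro</h3><p>text</p>'
         . '<h3 class="t" id="links"><a href="/x">Linked title</a></h3>'
         . '<h4>No id <em>here</em></h4>';

$entries = extract_toc_entries($content);
$names = array_column($entries, 'name');
$ids = array_column($entries, 'id');
print_r($names);
```

The id group uses [^"]* rather than a lazy .*? so that it stops at the closing quote and cannot swallow later attributes.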
