PHP preg_split: selecting the inner contents of an HTML tag

I have a string that contains text wrapped in various HTML tags. I need to clean up the HTML tags myself, so that the data between < and > turns this:

<p class="MsoNormal" style="text-align: justify;">1939&nbsp;After considerable negotiation between the Kemp estate and the Dunwich Trusts, the charter was purchased and returned to Dunwich.</p>

into this:

<p>1939&nbsp;After considerable negotiation between the Kemp estate and the Dunwich Trusts, the charter was purchased and returned to Dunwich.</p>

I did this with:
$value = preg_replace("/<p[^>]+>/", "<p>", $value);

However, I need to keep the contents of the <a> tags in the string while still stripping the superfluous parts, such as the style attribute.

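I plan to do this by running a loop that extracts the anchor tags and then processes each one, splitting it at spaces and keeping only the exploded array values that start with href=, title=, and so on.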
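A minimal sketch of that plan, using preg_replace_callback instead of a manual loop and explode. The helper name clean_anchor_tags and the whitelist of href and title are assumptions for illustration, and the sketch only handles double-quoted attribute values:

```php
<?php
// Hypothetical helper (not from the original post): rewrite every opening
// <a ...> tag so that only whitelisted attributes survive.
// Assumption: attribute values are double-quoted.
function clean_anchor_tags(string $html, array $allowed = ['href', 'title']): string
{
    return preg_replace_callback(
        '/<a\b([^>]*)>/i', // \b keeps the pattern from matching <abbr>, <article>, ...
        function (array $m) use ($allowed): string {
            // Collect every name="value" pair from the tag's attribute string.
            preg_match_all('/([a-zA-Z-]+)\s*=\s*"([^"]*)"/', $m[1], $attrs, PREG_SET_ORDER);

            $kept = '';
            foreach ($attrs as $attr) {
                $name = strtolower($attr[1]);
                if (in_array($name, $allowed, true)) {
                    $kept .= ' ' . $name . '="' . $attr[2] . '"';
                }
            }

            return '<a' . $kept . '>';
        },
        $html
    );
}

$cleaned = clean_anchor_tags(
    '<a style="color: red;" href="index.php?id=11" title="Charter">Charter Returned</a>'
);
echo $cleaned, "\n";
```

Splitting on spaces would break attribute values that themselves contain spaces (style="color: red;" is one), which is why the sketch matches whole name="value" pairs.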

But my question right now is:

How do I use a preg_split regular expression to split a string so that I get the contents of the <a> tags?

If I do this:

$value = preg_split("/<a[^>]+>/", $value);

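then the returned value is the content outside the anchor tags, not what is inside them. I don't know in advance what the anchor tags contain, so I can only match on <a.......>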
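This happens because preg_split treats whatever the pattern matches as a delimiter and discards it. A small sketch of the difference, assuming a shortened sample string: adding a capture group plus the PREG_SPLIT_DELIM_CAPTURE flag makes preg_split return the captured part of each delimiter as well.

```php
<?php
// Shortened sample input (an assumption, not the full string from the question).
$html = '<h2>Headlines</h2>'
      . '<a href="index.php?id=11">Charter</a> '
      . '<a href="index.php?id=10">Gardner</a>';

// Default behaviour: the matched tags are delimiters, so they are thrown away
// and only the text between them is returned.
$outside = preg_split('/<a[^>]+>/', $html);

// With a capture group and PREG_SPLIT_DELIM_CAPTURE, the captured part of each
// delimiter is interleaved with the surrounding text (odd indexes here).
$pieces = preg_split('/<(a[^>]+)>/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($outside);
print_r($pieces);
```

The tag contents still have to be picked out of the mixed result, so a matching function is usually a more direct fit than a splitting one when only the tags are wanted.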

I want to build an array of anchor tags from a string, so that this:

<h2>Headlines</h2>
<a href="index.php?id=11">Charter Returned to Dunwich in 1939</a>  
<a href="index.php?id=10">Thomas Gardner Visits Dunwich</a>  
<a href="index.php?id=9">Treasure Chest Purchases</a>  
<a href="index.php?id=8">Dunwich Charter 1215</a>  
<a href="index.php?id=7">Why did Dunwich have a Charter?</a>  
</div> 

gives me:

$array[0] = 'a href="index.php?id=11"';
$array[1] = 'a href="index.php?id=10"';
$array[2] = 'a href="index.php?id=9"';
$array[3] = 'a href="index.php?id=8"';
$array[4] = 'a href="index.php?id=7"';

Just use preg_match_all:

$re = "/<a[^>]+>/"; 
$str = "<h2>Headlines</h2>\n<a href=\"index.php?id=11\">Charter Returned to Dunwich in 1939</a>  \n<a href=\"index.php?id=10\">Thomas Gardner Visits Dunwich</a>  \n<a href=\"index.php?id=9\">Treasure Chest Purchases</a>  \n<a href=\"index.php?id=8\">Dunwich Charter 1215</a>  \n<a href=\"index.php?id=7\">Why did Dunwich have a Charter?</a>  \n</div> "; 
preg_match_all($re, $str, $matches);

$matches[0] will contain the full opening tags, angle brackets included:

<a href="index.php?id=11">
<a href="index.php?id=10">
<a href="index.php?id=9">
<a href="index.php?id=8">
<a href="index.php?id=7">
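To get the tag text without the angle brackets, as in the array the question asks for, one option is to add a capture group and read $matches[1]. This is a sketch on a shortened copy of the sample string; the \b and the i modifier are additions not present in the pattern above.

```php
<?php
// Shortened copy of the sample input from the question.
$str = "<h2>Headlines</h2>\n"
     . "<a href=\"index.php?id=11\">Charter Returned to Dunwich in 1939</a>\n"
     . "<a href=\"index.php?id=10\">Thomas Gardner Visits Dunwich</a>\n";

// Group 1 captures the tag without its angle brackets.
// \b stops the pattern from also matching tags such as <abbr> or <article>.
preg_match_all('/<(a\b[^>]*)>/i', $str, $matches);

// $matches[0] holds the full matches (with brackets),
// $matches[1] holds the contents of the first capture group.
$array = $matches[1];
print_r($array);
```

For input that is not under your control, a DOM parser is generally more robust than a regular expression, since attribute values may legally contain a > character.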

See the demo program.