PHP preg_split: selecting the internal contents of an HTML tag

Question

I have a string containing text wrapped in various HTML tags. I need to clean up the tags themselves, that is, the data between < and >, so that

<p class="MsoNormal" style="text-align: justify;">1939&nbsp;After considerable negotiation between the Kemp estate and the Dunwich Trusts, the charter was purchased and returned to Dunwich.</p>

becomes

<p>1939&nbsp;After considerable negotiation between the Kemp estate and the Dunwich Trusts, the charter was purchased and returned to Dunwich.</p>

I have done this with:

$value = preg_replace("/<p[^>]+>/", "<p>", $value);

However, I need to keep the useful contents of the <a> tags in the string, while still stripping the extras such as the style attribute.

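My plan is to run a loop that extracts the anchor tags and then processes each one: split it on spaces and keep only the exploded array values that start with href=, title=, and so on.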

But my question now is:

How do I use a preg_split regular expression to split the string so that I get the contents of the <a> tags?

If I do

$value = preg_split("/<a[^>]+>/", $value);

then $value holds the content outside the anchor tags, not what is inside them. I don't know what the anchor tags contain, so I can only match on <a.......>

I want to build an array of anchor tags from a string, so that:

<h2>Headlines</h2>
<a href="index.php?id=11">Charter Returned to Dunwich in 1939</a>  
<a href="index.php?id=10">Thomas Gardner Visits Dunwich</a>  
<a href="index.php?id=9">Treasure Chest Purchases</a>  
<a href="index.php?id=8">Dunwich Charter 1215</a>  
<a href="index.php?id=7">Why did Dunwich have a Charter?</a>  
</div>

gives me:

$array[0] = 'a href="index.php?id=11"';
$array[1] = 'a href="index.php?id=10"';
$array[2] = 'a href="index.php?id=9"';
$array[3] = 'a href="index.php?id=8"';
$array[4] = 'a href="index.php?id=7"';

Answer 1

Just use preg_match_all:

$re = "/<a[^>]+>/"; 
$str = "<h2>Headlines</h2>\n<a href=\"index.php?id=11\">Charter Returned to Dunwich in 1939</a>  \n<a href=\"index.php?id=10\">Thomas Gardner Visits Dunwich</a>  \n<a href=\"index.php?id=9\">Treasure Chest Purchases</a>  \n<a href=\"index.php?id=8\">Dunwich Charter 1215</a>  \n<a href=\"index.php?id=7\">Why did Dunwich have a Charter?</a>  \n</div> "; 
preg_match_all($re, $str, $matches);

$matches[0] will contain the matched opening tags, angle brackets included:

<a href="index.php?id=11">
<a href="index.php?id=10">
<a href="index.php?id=9">
<a href="index.php?id=8">
<a href="index.php?id=7">
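
Note that the pattern above matches the whole opening tag, so each entry in $matches[0] includes the < and >. If the goal is the exact array from the question, one option is a capture group. The following is a minimal sketch rather than part of the original answer: the \b word boundary is an added assumption so that tags such as <abbr> are not picked up, and $html and $tags are illustrative names.

```php
<?php
// Sample input: a shortened version of the HTML from the question.
$html = <<<'HTML'
<h2>Headlines</h2>
<a href="index.php?id=11">Charter Returned to Dunwich in 1939</a>
<a href="index.php?id=10">Thomas Gardner Visits Dunwich</a>
</div>
HTML;

// Group 1 captures everything between "<" and ">" of each opening <a> tag.
// \b stops the pattern from matching other tags that start with "a".
preg_match_all('/<(a\b[^>]*)>/i', $html, $matches);

// $matches[0] holds the full matches, $matches[1] holds the captured group.
$tags = $matches[1];
print_r($tags);
```

Here $tags[0] is a href="index.php?id=11" and $tags[1] is a href="index.php?id=10", which is the shape asked for in the question.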

See the demo program.
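
For the wider goal described in the question (strip attributes such as style from every tag, but keep href= and title= on anchors), the extract-and-split loop can be folded into a single preg_replace_callback pass. This is only a sketch under stated assumptions: strip_attributes is a hypothetical helper name, the whitelist and the sample input are illustrative, and the attribute pattern handles double-quoted values only.

```php
<?php
// Hypothetical helper: removes every attribute from opening tags, except a
// whitelist of attributes (href and title by default) that is kept on <a>.
function strip_attributes(string $html, array $keepOnAnchors = ['href', 'title']): string
{
    return preg_replace_callback(
        // Group 1: tag name, group 2: the raw attribute text up to ">".
        '/<([a-z][a-z0-9]*)\b([^>]*)>/i',
        function (array $m) use ($keepOnAnchors): string {
            $tag = $m[1];
            if (strtolower($tag) !== 'a') {
                return "<$tag>"; // non-anchor tags lose all attributes
            }
            // Collect name="value" pairs (double-quoted values only).
            preg_match_all('/([a-z-]+)\s*=\s*"([^"]*)"/i', $m[2], $attrs, PREG_SET_ORDER);
            $kept = '';
            foreach ($attrs as $attr) {
                if (in_array(strtolower($attr[1]), $keepOnAnchors, true)) {
                    $kept .= ' ' . $attr[1] . '="' . $attr[2] . '"';
                }
            }
            return "<$tag$kept>";
        },
        $html
    );
}

// Illustrative input, not taken from the question.
$dirty = '<p class="MsoNormal" style="text-align: justify;">See <a style="color: red;" href="index.php?id=11" title="Charter">the charter</a>.</p>';
echo strip_attributes($dirty), "\n";
```

Closing tags are left untouched because the pattern requires a letter directly after <. The sketch breaks on attribute values that contain >, and for arbitrary real-world HTML a parser such as PHP's DOMDocument is generally more robust than regular expressions.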

php

regex

preg-split