PHP preg_split: selecting the internal contents of an HTML tag
I have a string containing text wrapped in various HTML tags. I need to clean up the tags themselves, that is, the data between < and >, so that
<p class="MsoNormal" style="text-align: justify;">1939 After considerable negotiation between the Kemp estate and the Dunwich Trusts, the charter was purchased and returned to Dunwich.</p>
becomes
<p>1939 After considerable negotiation between the Kemp estate and the Dunwich Trusts, the charter was purchased and returned to Dunwich.</p>
I did this with:
$value = preg_replace("/<p[^>]+>/", "<p>", $value);
But I need to keep the contents of the <a> tags in the string while still stripping superfluous attributes such as style.
My plan is to loop over the string, extract each anchor tag, split it on spaces, and keep only the resulting array values that start with href=, title=, and so on.
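The plan described above could be sketched roughly as follows. Rather than splitting on spaces (which breaks on values such as style="color: red;"), this version pulls out name="value" pairs and keeps only whitelisted ones. The function name clean_anchor_tags and the default whitelist are hypothetical, and the sketch assumes double-quoted attribute values that contain no > character.

```php
<?php
// Hypothetical helper: keep only whitelisted attributes on opening <a> tags.
function clean_anchor_tags(string $html, array $allowed = ['href', 'title']): string
{
    // Match each opening <a ...> tag; group 1 is the raw attribute text.
    return preg_replace_callback(
        '/<a\b([^>]*)>/i',
        function (array $m) use ($allowed): string {
            // Collect name="value" pairs (assumes double-quoted values).
            preg_match_all('/([a-zA-Z-]+)\s*=\s*"([^"]*)"/', $m[1], $attrs, PREG_SET_ORDER);
            $kept = '';
            foreach ($attrs as $attr) {
                $name = strtolower($attr[1]);
                if (in_array($name, $allowed, true)) {
                    $kept .= ' ' . $name . '="' . $attr[2] . '"';
                }
            }
            return '<a' . $kept . '>';
        },
        $html
    );
}

$dirty = '<a style="color: red;" href="index.php?id=11" title="Charter">Charter Returned to Dunwich in 1939</a>';
$clean = clean_anchor_tags($dirty);
echo $clean, "\n";
```

The \b after the a keeps the pattern from also matching tags such as <abbr>. For arbitrary or malformed HTML, a real parser is likely to be more robust than any regular expression.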
So my question is: how can I use a preg_split regular expression to split the string and get the contents of the <a> tags?
If I do
$value = preg_split("/<a[^>]+>/", $value);
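then $value holds the content outside the anchor tags, not what is inside them. I don't know in advance what an anchor tag contains, so I can only match on <a .......>.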
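To illustrate why this happens: preg_split treats every match as a delimiter and throws it away. A minimal sketch, using a shortened sample string made up for the example, shows the plain split next to a variant that uses a capture group with the PREG_SPLIT_DELIM_CAPTURE flag so the tag contents are kept in the result.

```php
<?php
$value = '<h2>Headlines</h2><a href="index.php?id=11">Charter</a><a href="index.php?id=10">Gardner</a>';

// Plain split: the matched tags are the delimiters, so they are discarded.
$outside = preg_split('/<a\b[^>]*>/', $value);

// With a capture group and PREG_SPLIT_DELIM_CAPTURE, the captured part of
// each delimiter is interleaved with the pieces around it.
$mixed = preg_split('/<(a\b[^>]*)>/', $value, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($outside);
print_r($mixed);
```

In $mixed the tag contents sit at the odd indexes, mixed in with the surrounding text, so they still have to be filtered out afterwards. That is usually a sign that a matching function is a better fit than a splitting one.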
I want to build an array of anchor tags from a string, so that:
<h2>Headlines</h2>
<a href="index.php?id=11">Charter Returned to Dunwich in 1939</a>
<a href="index.php?id=10">Thomas Gardner Visits Dunwich</a>
<a href="index.php?id=9">Treasure Chest Purchases</a>
<a href="index.php?id=8">Dunwich Charter 1215</a>
<a href="index.php?id=7">Why did Dunwich have a Charter?</a>
</div>
gives me:
$array[0] = 'a href="index.php?id=11"';
$array[1] = 'a href="index.php?id=10"';
$array[2] = 'a href="index.php?id=9"';
$array[3] = 'a href="index.php?id=8"';
$array[4] = 'a href="index.php?id=7"';
Just use preg_match_all:
$re = "/<a[^>]+>/";
$str = "<h2>Headlines</h2>\n<a href=\"index.php?id=11\">Charter Returned to Dunwich in 1939</a> \n<a href=\"index.php?id=10\">Thomas Gardner Visits Dunwich</a> \n<a href=\"index.php?id=9\">Treasure Chest Purchases</a> \n<a href=\"index.php?id=8\">Dunwich Charter 1215</a> \n<a href=\"index.php?id=7\">Why did Dunwich have a Charter?</a> \n</div> ";
preg_match_all($re, $str, $matches);
$matches[0] will contain the full opening tags, angle brackets included:
<a href="index.php?id=11">
<a href="index.php?id=10">
<a href="index.php?id=9">
<a href="index.php?id=8">
<a href="index.php?id=7">
See the demo program.
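If the angle brackets should not be part of the result, as in the array the question asks for, one option is to add a capture group and read the second element of $matches. This is a sketch on a shortened version of the sample input; the variable names $tags and $array are chosen only for the example.

```php
<?php
$str = '<h2>Headlines</h2>
<a href="index.php?id=11">Charter Returned to Dunwich in 1939</a>
<a href="index.php?id=10">Thomas Gardner Visits Dunwich</a>';

// Group 1 captures everything between "<" and ">" of each opening <a> tag.
// The \b stops the pattern from also matching tags such as <abbr>.
preg_match_all('/<(a\b[^>]*)>/i', $str, $matches);

$tags  = $matches[0]; // full tags, e.g. <a href="index.php?id=11">
$array = $matches[1]; // tag contents without the angle brackets

print_r($array);
```

With the default flags, $matches[0] holds the full matches and $matches[1] holds the text captured by the first group, so no further string trimming is needed.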