Regexp to remove text before any p tags

Question

I have an HTML snippet in a string in PHP. It is some CSS text followed by one or more paragraphs enclosed in p tags.

 .cs2E86D3A6{text-align:center; blarblarblar}<p>First paragraph. Keep this text</p><p>Second paragraph. Keep this text</p><p>Last paragraph.</p>

(It happens to be the result of strip_tags.) I want to remove any junk text before the first paragraph, so that only the content in p tags is left.

I tried

preg_replace('@^.*(?=<p>)@','', $mystring)

but that left me with only the last paragraph, "Last paragraph."

Can someone tell me a regexp that will do the job?

Answer 1

Try using the function strstr:

$result = strstr($mystring, '<p>');

It returns everything from the first '<p>' to the end of the string.

See the strstr documentation.
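A minimal sketch of the strstr approach, with one detail worth handling: strstr returns false when the needle is not found. The helper name dropTextBeforeFirstP and the choice to return the input unchanged in that case are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper; the name and the fallback behaviour are assumptions.
function dropTextBeforeFirstP(string $html): string
{
    // strstr() returns the part of $html starting at the first '<p>',
    // or false when '<p>' does not occur at all.
    $rest = strstr($html, '<p>');

    // Assumption: with no '<p>' present, hand the input back unchanged.
    return $rest === false ? $html : $rest;
}

$mystring = '.cs2E86D3A6{text-align:center; blarblarblar}'
    . '<p>First paragraph. Keep this text</p><p>Last paragraph.</p>';

echo dropTextBeforeFirstP($mystring), PHP_EOL;
```

Note that the search is literal and case-sensitive, so an opening tag written as `<P>` or carrying attributes (`<p class="x">`) would not be found by this needle.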

Answer 2

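You need a lazy repetition of any character, up to the first <p>. Your .* is greedy, which means it will match as many characters as possible, including <p>s, as long as a <p> comes next. As a result, it currently matches up to the last <p> in the string. Put a ? after the * or + to make the repetition lazy instead of greedy: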

$orig = '.cs2E86D3A6{text-align:center; blarblarblar}<p>First paragraph. Keep this text</p><p>Second paragraph. Keep this text</p><p>Last paragraph.</p>';
print(preg_replace('@^.*?(?=<p>)@','', $orig));
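The greedy and lazy patterns can be compared side by side on the same input. The sketch below also covers one case the question does not mention: if the junk prefix spans several lines, `.` will not cross the newlines unless the `s` modifier is added. The multi-line input is an assumed example.

```php
<?php
$orig = '.cs2E86D3A6{text-align:center; blarblarblar}'
    . '<p>First paragraph. Keep this text</p>'
    . '<p>Second paragraph. Keep this text</p>'
    . '<p>Last paragraph.</p>';

// Greedy: .* consumes the whole line, then backtracks only as far as the
// last position where the lookahead (?=<p>) succeeds.
$greedy = preg_replace('@^.*(?=<p>)@', '', $orig);

// Lazy: .*? expands one character at a time and stops at the first
// position where the lookahead succeeds.
$lazy = preg_replace('@^.*?(?=<p>)@', '', $orig);

// Assumed input: a junk prefix containing a newline. The s modifier
// lets . match newline characters as well.
$multiline = "line one\nline two<p>Kept</p>";
$lazyDotAll = preg_replace('@^.*?(?=<p>)@s', '', $multiline);

echo $greedy, PHP_EOL;      // only the last paragraph remains
echo $lazy, PHP_EOL;        // all three paragraphs remain
echo $lazyDotAll, PHP_EOL;  // <p>Kept</p>
```

Without the `s` modifier, the pattern would not match the multi-line input at all, and preg_replace would return it unchanged.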

php

regex

html-parsing

strip-tags