Replacing <hn> headings that have no class attribute with <p>
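I want to replace hn tags that do not contain a class attribute. The idea is to match anything after hn, except a string that contains class="something" and ends with >.
This is my first attempt: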
<?php
$content = <<<HTML
<h1 style="color:black">test1</h1>
<H2 class="green">test2</H2>
<h5 class="red">test</h5>
<h5 class="">test test</h5>
HTML;
$content = preg_replace('#<h([1-6])((?!class).)*?>(.*?)<\/h[1-6]>#si', '<p class="heading-$1" $2>$3</p>', $content);
echo $content;
The result is:
<p class="heading-1" ">test1</p>
<H2 class="green">test2</H2>
<h5 class="red">test</h5>
<h5 class="">test test</h5>
It should be:
<p class="heading-1" style="color:black">test1</p>
<H2 class="green">test2</H2>
<h5 class="red">test</h5>
<h5 class="">test test</h5>
Any idea why $2 maps to the value " instead of style="color:black"?
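Your capture group has to be placed slightly differently. In ((?!class).)*? the capture group itself is repeated, so $2 keeps only the last character it matched, which is the closing " of the style attribute.
Replace ((?!class).)*? with ((?:(?!class).)*?), so that the capture group wraps the whole repetition.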
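The difference between the two group placements can be checked in isolation. The following is a minimal sketch; the variable names `$tag`, `$repeated`, and `$wrapped` are only illustrative and do not come from the question.

```php
<?php
$tag = '<h1 style="color:black">';

// The capture group is repeated: group 2 is overwritten on every iteration,
// so only the last matched character survives.
preg_match('#<h([1-6])((?!class).)*?>#si', $tag, $repeated);

// The capture group wraps the whole repetition: group 2 holds every character
// matched between the heading level and the closing >.
preg_match('#<h([1-6])((?:(?!class).)*?)>#si', $tag, $wrapped);

echo $repeated[2], PHP_EOL; // "
echo $wrapped[2], PHP_EOL;  //  style="color:black" (with a leading space)
```

The leading space in the second result is why the final pattern adds \s* in front of the capture group.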
Use
'#<h([1-6])\s*((?:(?!class).)*?)>(.*?)</h[1-6]>#si'
See the regex proof.
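Applied to the sample input from the question, the corrected pattern can be tried as follows. This is a sketch that assumes the replacement string is '<p class="heading-$1" $2>$3</p>', which is what the expected output implies; `$pattern`, `$replacement`, and `$result` are names chosen here for readability.

```php
<?php
$content = <<<HTML
<h1 style="color:black">test1</h1>
<H2 class="green">test2</H2>
<h5 class="red">test</h5>
<h5 class="">test test</h5>
HTML;

// Group 1: heading level, group 2: attributes (no "class" allowed), group 3: inner text.
$pattern = '#<h([1-6])\s*((?:(?!class).)*?)>(.*?)</h[1-6]>#si';
$replacement = '<p class="heading-$1" $2>$3</p>';

$result = preg_replace($pattern, $replacement, $content);
echo $result, PHP_EOL;
```

Only the first line changes, to <p class="heading-1" style="color:black">test1</p>; the three headings that carry a class attribute are left as they were.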
Explanation
--------------------------------------------------------------------------------
  <h                       '<h'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [1-6]                    any character of: '1' to '6'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
--------------------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
        class                    'class'
--------------------------------------------------------------------------------
      )                        end of look-ahead
--------------------------------------------------------------------------------
      .                        any character (including \n, because of
                               the s flag)
--------------------------------------------------------------------------------
    )*?                      end of grouping
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  >                        '>'
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    .*?                      any character (including \n, because of
                             the s flag) (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \3
--------------------------------------------------------------------------------
  </h                      '</h'
--------------------------------------------------------------------------------
  [1-6]                    any character of: '1' to '6'
--------------------------------------------------------------------------------
  >                        '>'
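One limitation of the pattern explained above is that (?!class) rejects the word "class" anywhere in the tag, so a heading such as <h3 data-classname="x"> is skipped as well, and the closing tag is not required to have the same level as the opening one. The following is a possible stricter variant, offered as a sketch and not as part of the original answer: it looks for class followed by = inside the opening tag only, and uses the backreference \1 for the closing tag. The `$html`, `$strict`, and `$out` names and the third input line are made up for illustration.

```php
<?php
$html = <<<HTML
<h1 style="color:black">test1</h1>
<H2 class="green">test2</H2>
<h3 data-classname="x">test3</h3>
HTML;

// (?![^>]*\bclass\s*=) : no class= attribute before the end of the opening tag
// ([^>]*)              : the attributes, never crossing the closing >
// </h\1>               : closing tag must have the same level as the opening tag
$strict = '#<h([1-6])\b(?![^>]*\bclass\s*=)\s*([^>]*)>(.*?)</h\1>#si';

$out = preg_replace($strict, '<p class="heading-$1" $2>$3</p>', $html);
echo $out, PHP_EOL;
```

Here the h1 and h3 lines are converted and the H2 line is left alone. An attribute such as data-class="x" would still be treated as a class attribute by this sketch, so for arbitrary HTML an actual parser is the safer tool.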