RegEx to remove all properties from style attribute except font-family in PHP
I want to remove all properties from the style attribute except font-family in PHP.
I tried:
style=(.*)font-[^;]+;
Sample HTML:
<div style='margin: 0px 14.3906px 0px 28.7969px; padding: 0px; width: 436.797px; float: left; font-family: "Open Sans", Arial, sans-serif;'><p style="margin-right: 0px; margin-bottom: 15px; margin-left: 0px; padding: 0px; text-align: justify;"><strong style="margin: 0px; padding: 0px;">Lorem Ipsum</strong> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p><div><br></div></div><div style='margin: 0px 28.7969px 0px 14.3906px; padding: 0px; width: 436.797px; float: right; font-family: "Open Sans", Arial, sans-serif;'></div>
Expected output:
<div style='font-family: "Open Sans", Arial, sans-serif;'><p><strong>Lorem Ipsum</strong> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p><div><br></div></div><div style='font-family: "Open Sans", Arial, sans-serif;'></div>
But it does not work as expected. What needs to change here?
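I think you have to do this with two replacements. The first replacement keeps only the font-family declaration in those tags whose style attribute contains one.
Use the following regex:
style=(['"])(?:(?!\1)[^>])*(font-family:[^;]+;)(?:(?!\1)[^>])*\1
Replace with:
style=$1$2$1
style=
Matches 'style='.
(['"])
Matches a single or double quote and stores it in capture group 1.
(?:(?!\1)[^>])*
Matches 0 or more characters that are neither '>' nor the opening quote in capture group 1 (this ensures the scan does not run past the current HTML tag).
(font-family:[^;]+;)
Matches the font-family declaration, up to and including its terminating ';', and stores it in capture group 2.
(?:(?!\1)[^>])*
Matches the rest of the style value (again, anything except the opening quote and '>', so the scan stays inside the current HTML tag).
\1
Matches the closing quote, which must be the same character as the opening quote in capture group 1 (single or double).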
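The next regex completely removes the style attribute from those tags whose style does not contain a font-family declaration.
Use the regex:
\sstyle=(['"])(?!(?:(?!\1)[^>])*font-family:)(?:(?!\1)[^>])*\1
and replace with the empty string, ''.
\sstyle=
Matches whitespace followed by 'style='.
(['"])
Matches a single or double quote and stores it in capture group 1.
(?!(?:(?!\1)[^>])*font-family:)
A negative lookahead asserting that what follows is not 0 or more characters other than the opening quote or '>', followed by 'font-family:'. In other words, this style value does not contain 'font-family:'.
(?:(?!\1)[^>])*
Matches 0 or more characters that are neither the opening quote (capture group 1) nor '>'.
\1
Matches the closing quote (the same character as in capture group 1).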
PHP code:
<?php
$html = <<<EOF
<div style='margin: 0px 14.3906px 0px 28.7969px; padding: 0px; width: 436.797px; float: left; font-family: "Open Sans", Arial, sans-serif;'><p style="margin-right: 0px; margin-bottom: 15px; margin-left: 0px; padding: 0px; text-align: justify;"><strong style="margin: 0px; padding: 0px;">Lorem Ipsum</strong> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p><div><br></div></div><div style='margin: 0px 28.7969px 0px 14.3906px; padding: 0px; width: 436.797px; float: right; font-family: "Open Sans", Arial, sans-serif;'></div>
EOF;
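// Step 1: keep only the font-family declaration where one is present.
$html = preg_replace('/style=([\'"])(?:(?!\1)[^>])*(font-family:[^;]+;)(?:(?!\1)[^>])*\1/', 'style=$1$2$1', $html);
// Step 2: remove style attributes that contain no font-family declaration.
$html = preg_replace('/\sstyle=([\'"])(?!(?:(?!\1)[^>])*font-family:)(?:(?!\1)[^>])*\1/', '', $html);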
echo $html;
Prints:
<div style='font-family: "Open Sans", Arial, sans-serif;'><p><strong>Lorem Ipsum</strong> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p><div><br></div></div><div style='font-family: "Open Sans", Arial, sans-serif;'></div>
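As an alternative to running two separate replacements, the same idea can probably be expressed in a single pass with preg_replace_callback. The following is only a sketch under assumptions: keepFontFamily is a hypothetical helper name that does not appear in the answer above, and it assumes every style value is quoted and does not contain its own quote character.

```php
<?php
// Hypothetical helper (not part of the answer above): keeps only the
// font-family declaration of each style attribute, in a single pass.
function keepFontFamily(string $html): string
{
    // (['"]) captures the opening quote; (.*?) lazily takes the attribute
    // value up to the next occurrence of that same quote (\1).
    $result = preg_replace_callback(
        '/\sstyle=([\'"])(.*?)\1/s',
        function (array $m): string {
            // Look for a font-family declaration inside the attribute value.
            if (preg_match('/\bfont-family\s*:[^;]+;?/i', $m[2], $decl)) {
                return ' style=' . $m[1] . $decl[0] . $m[1];
            }
            // No font-family: drop the attribute, including its leading space.
            return '';
        },
        $html
    );

    // preg_replace_callback returns null on failure; fall back to the input.
    return $result ?? $html;
}

$html = '<div style=\'margin: 0px; font-family: "Open Sans", Arial, sans-serif;\'><p style="padding: 0px;">text</p></div>';
echo keepFontFamily($html), PHP_EOL;
// <div style='font-family: "Open Sans", Arial, sans-serif;'><p>text</p></div>
```

Unlike the two-step version, this sketch also accepts a font-family declaration that is not terminated by ';'.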
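You could consider using DOMDocument to get, for example, the style attribute of every element.
If a style is present, you can use a pattern that captures the font-family part in a capture group and use that group in the replacement.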
.*?\b(font-[^;]+;?).*|.*
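The pattern matches:
.*?
Match as few characters as possible
\b(
A word boundary, start of capture group 1
font-[^;]+;?
Match font-, then 1+ characters other than ;, followed by an optional ;
)
Close group 1
.*
Match the rest of the line
|
Or
.*
Match the whole line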
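To see what the pattern does in isolation, here is a small sketch that applies it to a few style strings, using '$1' (capture group 1) as the replacement. The sample strings are made up for illustration. Note that the pattern keeps the first property starting with font-, which is not necessarily font-family.

```php
<?php
// The pattern from the answer; the replacement '$1' refers to capture group 1.
$pattern = '/.*?\b(font-[^;]+;?).*|.*/';

// Illustrative inputs (assumptions, not taken from the question).
$styles = [
    'margin: 0px; float: left; font-family: "Open Sans", Arial, sans-serif;',
    'margin: 0px; padding: 0px;',
    'font-size: 12px; font-family: Arial;',
];

foreach ($styles as $style) {
    // When the second alternative (.*) matches, group 1 is unset and
    // the whole string is replaced by an empty string.
    $kept = preg_replace($pattern, '$1', $style);
    echo '[' . $kept . ']', PHP_EOL;
}
// [font-family: "Open Sans", Arial, sans-serif;]
// []
// [font-size: 12px;]
```

If only font-family should survive, writing font-family instead of font- inside the group is one way to narrow the match.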
For example:
$data = <<<DATA
<div style='margin: 0px 14.3906px 0px 28.7969px; padding: 0px; width: 436.797px; float: left; font-family: "Open Sans", Arial, sans-serif;'><p style="margin-right: 0px; margin-bottom: 15px; margin-left: 0px; padding: 0px; text-align: justify;"><strong style="margin: 0px; padding: 0px;">Lorem Ipsum</strong> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p><div><br></div></div><div style='margin: 0px 28.7969px 0px 14.3906px; padding: 0px; width: 436.797px; float: right; font-family: "Open Sans", Arial, sans-serif;'></div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

foreach ($dom->getElementsByTagName('*') as $element) {
    if ($element->hasAttribute('style')) {
        $style = $element->getAttribute('style');
        // Keep only what capture group 1 matched; the result is empty when no font- property is present.
        $replacement = preg_replace('/.*?\b(font-[^;]+;?).*|.*/', '$1', $style);
        if (trim($replacement) !== "") {
            $element->setAttribute('style', $replacement);
        } else {
            // Nothing left to keep, so drop the attribute entirely.
            $element->removeAttribute('style');
        }
    }
}

echo $dom->saveHTML();
Output:
<div style='font-family: "Open Sans", Arial, sans-serif;'><p><strong>Lorem Ipsum</strong> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p><div><br></div><div style='font-family: "Open Sans", Arial, sans-serif;'></div></div>
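Because the pattern above matches any property that starts with font-, a stricter variant might split the style value into declarations and keep font-family only. This is a rough sketch, not part of the original answer: fontFamilyOnly is a hypothetical helper, and splitting on ';' assumes that no ';' occurs inside a quoted value or url(...).

```php
<?php
// Hypothetical helper: returns only the font-family declaration of a
// style string, or '' if there is none. Splitting on ';' is naive.
function fontFamilyOnly(string $style): string
{
    foreach (explode(';', $style) as $declaration) {
        // Split "property: value" at the first colon only.
        $parts = explode(':', $declaration, 2);
        if (count($parts) === 2 && strtolower(trim($parts[0])) === 'font-family') {
            return 'font-family: ' . trim($parts[1]) . ';';
        }
    }
    return '';
}

echo fontFamilyOnly('margin: 0px; font-family: "Open Sans", Arial, sans-serif;'), PHP_EOL;
// font-family: "Open Sans", Arial, sans-serif;
echo fontFamilyOnly('font-size: 12px; font-family: Arial'), PHP_EOL;
// font-family: Arial;
```

Inside the DOMDocument loop, a call such as $replacement = fontFamilyOnly($style); could take the place of the preg_replace call, with the rest of the loop unchanged.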