Leave existing HTML entities as-is, but convert double-quotes and single-quotes

Question

I am generating my meta description tag with PHP, like this:

<meta name="description" content="<?php
echo $this->utf->clean_string(word_limiter(strip_tags(trim($paperResult['file_content'])),27));
?>" />

Here is a sample of the meta description output:

<meta name="description" content="blah blah &#182; &#8230; blah blah "words in quotation marks" blah blah "more words in quotation marks" blah blah" />

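The two HTML entities in that sample meta description are a paragraph sign (¶) followed by an ellipsis (…). They are already in HTML entity form in the source text, so I want them left as they are. The problem is that I also need to convert the quotation marks in the description to &quot; so that they do not break the meta tag. Every combination/configuration I have tried either doesn't work or breaks my site because my code is faulty. For example, when I try the code below, the quotation marks are converted to their HTML entities as desired, but the paragraph sign and ellipsis entities break, because the & character at the start of each existing HTML entity is converted to &amp;. This leaves me with a broken ¶ (&amp;#182;) and a broken … (&amp;#8230;):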

 echo $this->utf->clean_string(word_limiter(htmlspecialchars(strip_tags(trim($paperResult['file_content']))),27));

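I have been struggling with this for days. I have searched Stack Overflow extensively, to no avail. I just need existing HTML entities to stay as they are, and quotation marks to be converted to their HTML entity (&quot;). I have looked into the ENT_QUOTES option and I know the solution probably lies there, but I can't figure out how to incorporate it into my particular line of code. I hope you PHP gurus can take pity on this tortured soul! Your help is much appreciated.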

Thanks!

Answer 1

I can't be certain, since you haven't told us what all those other functions do, but it looks like you could do this:

<meta name="description" content="<?=htmlspecialchars(html_entity_decode(word_limiter($paperResult['file_content'], 27)))?>"/>

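So: limit your word count, turn any entities into characters, and then turn any special characters back into entities. There is no need to strip tags and so on for safety, because htmlspecialchars will ensure that any output is safe to include in HTML.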
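
To make the round trip concrete, here is a minimal sketch of the decode-then-encode idea on its own. The sample string and variable names are assumptions for illustration, and word_limiter is left out because its behavior isn't shown in the question.

```php
<?php
// Sample text modeled on the question; it already contains numeric entities.
$raw = 'blah &#182; &#8230; "quoted" blah';

// Step 1: turn existing entities into real characters.
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// Step 2: re-encode only the HTML-special characters (&, <, >, ", ').
$safe = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
```

Note that with this approach the paragraph sign and ellipsis end up as literal UTF-8 characters rather than as &#182; and &#8230;, which renders the same as long as the page is served as UTF-8. Also, before PHP 8.1 the default flags (ENT_COMPAT) convert double quotes only, so ENT_QUOTES is passed explicitly here to cover single quotes as well.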

Answer 2

If this is the content of the "content" attribute, you can do this:

$str = 'blah blah &#182; &#8230; blah blah "words in quotation marks" blah blah "more words in quotation marks" blah blah';
echo htmlentities($str, ENT_QUOTES, "UTF-8", false);

Output

blah blah &#182; &#8230; blah blah &quot;words in quotation marks&quot; blah blah &quot;more words in quotation marks&quot; blah blah


The key here is the fourth argument:

string htmlentities ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") [, bool $double_encode = TRUE ]]] )

Specifically:

double_encode When double_encode is turned off PHP will not encode existing html entities. The default is to convert everything.

That way it won't double-encode the & characters.
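
As a quick side-by-side, the sketch below runs the same string through htmlentities with and without double encoding. The sample string is an assumption chosen to include both an existing entity and a bare ampersand.

```php
<?php
$str = 'A &#182; B & C "D"';

// Default (double_encode = true): every & is encoded, including the one
// that starts the existing entity.
$double = htmlentities($str, ENT_QUOTES, 'UTF-8');

// double_encode = false: the existing entity is left alone, while the bare
// ampersand and the quotes are still encoded.
$single = htmlentities($str, ENT_QUOTES, 'UTF-8', false);

echo $double, "\n"; // A &amp;#182; B &amp; C &quot;D&quot;
echo $single, "\n"; // A &#182; B &amp; C &quot;D&quot;
```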

htmlspecialchars also has a double_encode parameter.

htmlspecialchars ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") [, bool $double_encode = TRUE ]]] )

$str = 'blah blah &#182; &#8230; blah blah "words in quotation marks" blah blah "more words in quotation marks" blah blah';
echo htmlspecialchars($str, ENT_QUOTES, "UTF-8", false);

Output

blah blah &#182; &#8230; blah blah &quot;words in quotation marks&quot; blah blah &quot;more words in quotation marks&quot; blah blah


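If it is the whole tag, then you have to pull the content out, modify it, and put it back so that the < and > are preserved, but it isn't clear from the question whether that is the case.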

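P.S. There isn't much difference between htmlspecialchars and htmlentities. It mainly concerns accented characters such as é (&eacute;): if I remember correctly, htmlentities encodes those as well.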
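
A small sketch of that difference, using an assumed sample string with one accented character:

```php
<?php
// "\u{00E9}" is é, written as an escape so the file encoding doesn't matter.
$word = "caf\u{00E9} & \"bar\"";

// htmlspecialchars only touches &, <, >, " and '.
$special = htmlspecialchars($word, ENT_QUOTES, 'UTF-8');

// htmlentities additionally converts characters that have named entities.
$entities = htmlentities($word, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // café &amp; &quot;bar&quot;
echo $entities, "\n"; // caf&eacute; &amp; &quot;bar&quot;
```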

Update

I need the solution to be incorporated into my particular format of PHP code (i.e., a single line of PHP that maintains my existing functions/functionality), as miken32 brilliantly did above

Putting that into your code:

<meta name="description" content="<?=htmlspecialchars(word_limiter(trim($paperResult['file_content']),27),ENT_QUOTES,"UTF-8",false);?>"/>

Update 2

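Use preg_replace('/[\r\n]+/', ' ', $string) to replace one or more (+) occurrences of \r\n or \n with a single space. It might be even better to do preg_replace(['/[\r\n]+/', '/\s+/'], ' ', $string), which also collapses runs of whitespace.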

 <meta name="description" content="<?=htmlspecialchars(word_limiter(preg_replace('/[\r\n]+/', ' ', trim($paperResult['file_content'])),27),ENT_QUOTES,"UTF-8",false);?>"/>
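
To show what the two patterns do differently, here is a small sketch on an assumed sample string containing line breaks, a tab, and repeated spaces:

```php
<?php
$string = "line one\r\n\r\nline   two\n\tline three";

// Only line breaks are replaced; the run of spaces and the tab survive.
$newlinesOnly = preg_replace('/[\r\n]+/', ' ', $string);

// Line breaks first, then every remaining run of whitespace becomes one space.
$collapsed = preg_replace(['/[\r\n]+/', '/\s+/'], ' ', $string);

echo $collapsed, "\n"; // line one line two line three
```

Since \s already matches \r and \n, the second pattern alone would give the same final result; the two-step form just mirrors the suggestion above.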

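Basically, anything that makes the text shorter is something you probably want to do before word_limiter (whatever that is), and anything that makes it longer, such as changing " to &quot;, is something you probably want to do afterwards (maybe). That seems more logical to me.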
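
The ordering can be sketched as below. Since word_limiter isn't defined in the question, limit_words here is a hypothetical stand-in that keeps the first N whitespace-separated words; it is not the real helper's implementation, and the sample text is an assumption too.

```php
<?php
// Hypothetical stand-in for word_limiter(): keep the first $limit words.
function limit_words(string $text, int $limit): string
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_slice($words, 0, $limit));
}

$raw = "  one\r\n\"two\"   three &#182; four five  ";

// Shrinking steps first: trim, collapse whitespace, then limit the word count.
$short = limit_words(preg_replace('/\s+/', ' ', trim($raw)), 4);

// Lengthening step last: encode quotes without double-encoding entities.
$meta = htmlspecialchars($short, ENT_QUOTES, 'UTF-8', false);

echo $meta, "\n"; // one &quot;two&quot; three &#182;
```

Encoding last means the word limit is applied to the text as the reader sees it, and an entity can never be cut in half by an earlier step.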

Cheers!

