Apply htmlentities to stripped tags
Links I researched:
How do you apply htmlentities selectively?
and
PHP function to strip tags, except a list of whitelisted tags and attributes
They come close, but neither works as expected.
What have I tried?
<?php
define('CHARSET', 'UTF-8');
define('REPLACE_FLAGS', ENT_HTML5);
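function htmlcleaned($string) {
    // Encode everything, then turn the whitelisted tags back into real tags.
    $string = htmlentities($string);
    return str_replace(
        array("&lt;i&gt;", "&lt;b&gt;", "&lt;/i&gt;", "&lt;/b&gt;", "&lt;p&gt;", "&lt;/p&gt;"),
        array("<i>", "<b>", "</i>", "</b>", "<p>", "</p>"),
        $string
    );
}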
echo htmlcleaned("<p>How are you?</p><p><b>This is bold</b></p><p><i>This is italic</i></p><p><u>This is underline</u></p><p><br></p><ul><li>This is list item 1</li><li>This is list item 2</li></ul><p><br></p><ol><li>This is ordered list item 1</li><li>This is ordered list item 2</li></ol><p><a target='_blank' style='color: #1c5c76;' href='http://www.google.com'>http://www.google.com</a></p><p>This is plain text again.<br></p><script>alert('attempt csrf');</script><p><p>This is P tag example</p></p>");
?>
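What do I want to achieve?
If the input is:
<b><script>alert("something");</script></b>
then the output should be:
<b>&lt;script&gt;alert("something");&lt;/script&gt;</b>
There is no specific blacklist, but there is a specific whitelist.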
This function may help you, though it has not been rigorously tested. It applies htmlentities to every tag except the ones you specify.
// Callback: decode one matched (entity-encoded) tag back to raw HTML.
function html_entity_decode_matches($matches) {
    return html_entity_decode($matches[0]);
}

function htmlentities_exclude($string, $exclude_array) {
    $string = htmlentities($string); // encode everything first
    $ent_sl = "&gt;";                // encoded form of ">"
    if (is_array($exclude_array) && !empty($exclude_array)) {
        foreach ($exclude_array as $exc) {
            $exc = str_replace(array("<", ">"), "", $exc);
            $ent = preg_quote(htmlentities("<{$exc}"), "/");
            $ent_e = preg_quote(htmlentities("</{$exc}>"), "/");
            // Decode the opening tag <tag ...>; \b keeps "b" from also matching "br".
            // Note that any attributes inside the tag are decoded as well.
            $string = preg_replace_callback("/{$ent}\b(.*?){$ent_sl}/", "html_entity_decode_matches", $string);
            // Decode the closing tag </tag>.
            $string = preg_replace_callback("/{$ent_e}/", "html_entity_decode_matches", $string);
        }
    }
    return $string;
}
echo htmlentities_exclude('<b><script>alert("something");</script></b>', array("<b>"));
Output:
<b>&lt;script&gt;alert(&quot;something&quot;);&lt;/script&gt;</b>
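A stricter take on the same encode-then-restore idea, as a minimal sketch: it restores only bare whitelisted tags, so a tag that carries attributes (an onclick handler, for example) stays escaped. The helper name escape_except and its parameters are made up for illustration and do not come from the answers here; the sketch assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper (name and signature are assumptions for illustration).
// Escapes the whole string, then restores only bare whitelisted tags such as
// <b> or </b>. Whitelisted tags that carry attributes remain escaped.
function escape_except(string $html, array $allowedTags): string
{
    // htmlspecialchars() only touches & " ' < >, so "/" is left alone.
    $escaped = htmlspecialchars($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    if ($allowedTags === []) {
        return $escaped;
    }

    // Build an alternation like "b|i|p" from the whitelist.
    $names = implode('|', array_map(
        static fn (string $tag): string => preg_quote($tag, '/'),
        $allowedTags
    ));

    // Match &lt;tag&gt; or &lt;/tag&gt; exactly; "&gt;" must follow the name,
    // so "b" does not match "br".
    $pattern = '/&lt;(\/?)(' . $names . ')&gt;/i';

    return preg_replace($pattern, '<$1$2>', $escaped) ?? $escaped;
}

echo escape_except('<b><script>alert("something");</script></b>', ['b', 'i', 'p']);
```

Because the match is on the already-escaped string, literal text such as &lt;b&gt; in the input is double-encoded first and is not turned into a real tag.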
You can use the PHP DOM object for this: first create an element (in your case, <b>) and supply the encoded string as its body (inner HTML), as shown below.
<?php
define('CHARSET', 'UTF-8');
define('REPLACE_FLAGS', ENT_HTML5);
function htmlcleaned($string) {
    // Encode the angle brackets so the string becomes text, not markup.
    return str_replace(array("<", ">"), array("&lt;", "&gt;"), $string);
}
$dom = new DOMDocument('1.0', 'utf-8');
$element = $dom->createElement('b', htmlcleaned('<script>alert("something");</script>'));
$dom->appendChild($element);
$html = $dom->saveXML();
echo $html;
?>
You can use a built-in function instead of writing your own, like this:
<?php
define('CHARSET', 'UTF-8');
define('REPLACE_FLAGS', ENT_HTML5);
$dom = new DOMDocument('1.0', 'utf-8');
$element = $dom->createElement('b', htmlspecialchars('<script>alert("something");</script>', ENT_NOQUOTES));
$dom->appendChild($element);
$html = $dom->saveXML();
echo $html;
?>
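A possible variation on the DOM approach, sketched under assumptions: appending a text node instead of passing a pre-encoded string lets DOM escape the markup characters itself, and passing the element to saveXML() serializes just that node without the XML declaration. The function name wrap_escaped is hypothetical and not part of the answers here.

```php
<?php
// Hypothetical helper (an assumption for illustration): wrap untrusted text
// in a single trusted element and let DOM handle the escaping.
function wrap_escaped(string $tag, string $text): string
{
    $dom = new DOMDocument('1.0', 'utf-8');
    $element = $dom->createElement($tag);

    // createTextNode() treats its argument as literal text, so "<", ">" and
    // "&" are escaped on output without a manual htmlspecialchars() call.
    $element->appendChild($dom->createTextNode($text));
    $dom->appendChild($element);

    // Passing the node serializes only that node, so the leading
    // XML declaration is not part of the result.
    return $dom->saveXML($element);
}

echo wrap_escaped('b', '<script>alert("something");</script>');
```

This covers only the single-wrapper case from the question's example; it does not walk arbitrary input HTML against a whitelist.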