Wrap each letter of a string in a tag, avoiding HTML tags
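I want to build a function that takes a string and wraps each of its letters in a <span>, except for spaces and HTML tags (in my case, <br> tags).
So:
"Hi <br> there."
...should become
"<span>H</span><span>i</span> <br> <span>t</span><span>h</span><span>e</span><span>r</span><span>e</span><span>.</span>"
I could not come up with a solution myself, so I searched around, and it turned out to be hard to find what I was looking for.
The closest answer I found was one by Neverever.
However, it does not work well for my case: every character of the <br> tag gets wrapped in a <span>, and it does not match accented characters such as éèàï.
How should I handle this?
And why does parsing HTML tags with a regular expression seem wrong?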
You could try something like...
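<?php
$str = "Hi <br> there.";
$newstr = "";
$notintag = true;
// Use the mb_* functions (mbstring extension) so that multibyte
// characters such as é are read whole instead of byte by byte.
$length = mb_strlen($str, 'UTF-8');
for ($i = 0; $i < $length; $i++) {
    $char = mb_substr($str, $i, 1, 'UTF-8');
    if ($char == "<") {
        $notintag = false; // a tag starts here
    }
    if ($notintag && $char != " ") {
        $newstr .= "<span>" . $char . "</span>";
    } else {
        $newstr .= $char; // copy spaces and tag characters unchanged
    }
    if ($char == ">") {
        $notintag = true; // the tag has ended
    }
}
echo $newstr;
?>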
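The same flag-based scan can be packaged as a reusable function. The following is only a minimal sketch: the name wrapLetters and its $tag parameter are hypothetical, the input is assumed to be valid UTF-8, and any "<" is assumed to start a tag. It splits the string with a /u pattern so accented letters stay in one piece, and it skips all ASCII whitespace rather than just the space character.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function wrapLetters(string $html, string $tag = 'span'): string
{
    // Split into UTF-8 characters; fall back to an empty list on invalid input.
    $chars = preg_split('//u', $html, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $out = '';
    $inTag = false;
    foreach ($chars as $char) {
        if ($char === '<') {
            $inTag = true;              // entering markup such as <br>
        }
        if ($inTag || preg_match('/^\s$/u', $char)) {
            $out .= $char;              // copy markup and whitespace unchanged
        } else {
            $out .= "<$tag>$char</$tag>";
        }
        if ($char === '>') {
            $inTag = false;             // markup ended
        }
    }
    return $out;
}

echo wrapLetters('Hï <br> thérè.'), PHP_EOL;
// <span>H</span><span>ï</span> <br> <span>t</span><span>h</span><span>é</span><span>r</span><span>è</span><span>.</span>
```

A literal "<" in the text (for example "a < b") would be mistaken for the start of a tag, which is one reason a real parser is often the safer choice.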
You could consider using DOMDocument to parse the HTML and wrap only the characters inside the value of DOMText nodes. See the comments in the code.
// Define source
$source = 'Hï <br/> thérè.';

// Create DOM document and load HTML string, hinting that it is UTF-8 encoded.
// We need a root element for this so we wrap the source in a temporary <div>.
$hint = '<meta http-equiv="content-type" content="text/html; charset=utf-8">';
$dom = new DOMDocument();
$dom->loadHTML($hint . "<div>" . $source . "</div>");

// Get contents of temporary root node
$root = $dom->getElementsByTagName('div')->item(0);

// Loop through children
$next = $root->firstChild;
while ($node = $next) {
    $next = $node->nextSibling; // Save for next while iteration

    // We are only interested in text nodes (not <br/> etc)
    if ($node->nodeType == XML_TEXT_NODE) {
        // Wrap each character of the text node (e.g. "Hi ") in a <span> of
        // its own, e.g. "<span>H</span><span>i</span><span> </span>"
        foreach (preg_split('/(?<!^)(?!$)/u', $node->nodeValue) as $char) {
            $span = $dom->createElement('span', $char);
            $root->insertBefore($span, $node);
        }
        // Drop text node (e.g. "Hi ") leaving only <span> wrapped chars
        $root->removeChild($node);
    }
}

// Back to string via SimpleXMLElement (so that the output is more similar to
// the source than would be the case with $root->C14N() etc), removing temporary
// root <div> element and space-only spans as well.
$withSpans = simplexml_import_dom($root)->asXML();
$withSpans = preg_replace('#^<div>|</div>$#', '', $withSpans);
$withSpans = preg_replace('#<span> </span>#', ' ', $withSpans);
echo $withSpans, PHP_EOL;
Output:
<span>H</span><span>ï</span> <br/> <span>t</span><span>h</span><span>é</span><span>r</span><span>è</span><span>.</span>
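The loop above only visits direct children of the temporary <div>, so text inside nested elements such as <b>text</b> would be left unwrapped. One possible variation, offered as a sketch rather than as part of the answer, collects every text node with DOMXPath and serializes with saveHTML(). The function name wrapLettersDom is hypothetical, the input is assumed to be a UTF-8 HTML fragment, and the exact serialization (for example how <br> is written out) depends on the libxml version.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function wrapLettersDom(string $html): string
{
    $hint = '<meta http-equiv="content-type" content="text/html; charset=utf-8">';
    $dom = new DOMDocument();
    $prev = libxml_use_internal_errors(true);   // keep parser warnings quiet
    $dom->loadHTML($hint . '<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    // The first <div> in document order is the temporary wrapper.
    $root = $dom->getElementsByTagName('div')->item(0);

    // Copy the text nodes into an array before changing the tree.
    $xpath = new DOMXPath($dom);
    $textNodes = iterator_to_array($xpath->query('.//text()', $root));

    foreach ($textNodes as $textNode) {
        $parent = $textNode->parentNode;
        $chars = preg_split('//u', $textNode->nodeValue, -1, PREG_SPLIT_NO_EMPTY) ?: [];
        foreach ($chars as $char) {
            if (preg_match('/^\s$/u', $char)) {
                $new = $dom->createTextNode($char);     // leave whitespace bare
            } else {
                $new = $dom->createElement('span');
                // createTextNode() escapes "&" and "<" when serialized.
                $new->appendChild($dom->createTextNode($char));
            }
            $parent->insertBefore($new, $textNode);
        }
        $parent->removeChild($textNode);
    }

    // Serialize the wrapper's children, leaving out the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo wrapLettersDom('Hï <br> <b>thérè</b>.'), PHP_EOL;
```

Building each span with createTextNode() instead of passing the character as the second argument of createElement() avoids warnings when the text contains an ampersand.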
You can get the result with the regular expression ([^\s>])(?!(?:[^<>]*)?>). To enable Unicode support, just use it with the u option:
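<?php
// Match any non-whitespace character that is not inside a tag, i.e. one that
// is not followed by a ">" before the next "<".
$re = '/([^\s>])(?!(?:[^<>]*)?>)/u';
$str = "Hi <br> there.";
// $1 refers to the captured character.
$subst = '<span>$1</span>';
$result = preg_replace($re, $subst, $str);
echo $result;
?>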
The original answer linked to a regex explanation and demo, and to two sample programs: one without Unicode support and one with it (the only difference is the u option).
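To see how the negative lookahead behaves, here is a small sketch that wraps the pattern in a function and runs it on two inputs. The name wrapLettersRegex is hypothetical and the input is assumed to be valid UTF-8. The second call shows a limitation of the approach: a literal ">" in ordinary text makes the characters before it look as if they were inside a tag.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function wrapLettersRegex(string $html): string
{
    // A non-whitespace, non-">" character that is NOT followed by
    // optional non-bracket characters and then ">".
    $re = '/([^\s>])(?!(?:[^<>]*)?>)/u';
    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    return preg_replace($re, '<span>$1</span>', $html) ?? $html;
}

echo wrapLettersRegex('Hï <br> thérè.'), PHP_EOL;
// <span>H</span><span>ï</span> <br> <span>t</span><span>h</span><span>é</span><span>r</span><span>è</span><span>.</span>

echo wrapLettersRegex('1 > 0'), PHP_EOL;
// 1 > <span>0</span>
```

In the second example the "1" is left unwrapped because the lookahead finds a ">" ahead of it with no "<" in between. Cases like this are part of why a DOM parser tends to be more robust than a regex for HTML.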