PHP - Insert tags (<div>) between other tags (<p>) in a string
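I have a string in PHP that comes from a request (in fact, it comes from CKEditor, a WYSIWYG text editor). I am trying to insert tags (div) inside other tags (p), taking the data attributes from the preceding p > div.
An example will make this clearer: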
$String =
<p>
<div class="ST" data-start="1" data-end="5">
<span>Blabla1 </span><span>Blabla2</span>
</div>
</p>
<p>
Blabla3 Blabla4
</p>
<p>
<div class="ST" data-start="6" data-end="10">
<span>Blabla10 </span><span>Blabla20</span>
</div>
</p>
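Here the first and last <p> are fine. The one I want to change is the second <p>.
I need to put "Blabla3 Blabla4" inside a <div class="ST">, using the data-start and data-end attributes of the preceding <div> (here data-start = 1 and data-end = 5),
to end up with this: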
<p>
<div class="ST" data-start="1" data-end="5">
<span>Blabla1 </span><span>Blabla2</span>
</div>
</p>
<p>
<div class="ST" data-start="1" data-end="5">
Blabla3 Blabla4
</div>
</p>
<p>
<div class="ST" data-start="6" data-end="10">
<span>Blabla10 </span><span>Blabla20</span>
</div>
</p>
The string can also look like this (starting with a bare <p>). In that case, set data-start and data-end to 0:
<p>
Blabla3 Blabla4
</p>
<p>
<div data-start="0" data-end="5">
<span>Blabla1 </span><span>Blabla2</span>
</div>
</p>
<p>
<div data-start="6" data-end="10">
<span>Blabla10 </span><span>Blabla20</span>
</div>
</p>
Or like this (with 2 or more bare <p> in a row). In that case, set data-start and data-end to 1 and 5 on each of them, taken from the preceding <div>:
<p>
<div data-start="1" data-end="5">
<span>Blabla1 </span><span>Blabla2</span>
</div>
</p>
<p>
Blabla3 Blabla4
</p>
<p>
Blabla5 Blabla6
</p>
<p>
<div data-start="6" data-end="10">
<span>Blabla10 </span><span>Blabla20</span>
</div>
</p>
I don't know how to manipulate the string... maybe with a regular expression?
Thanks for your help!
Edit 1
I tried this:
$value = string
'<p><show class="st" data-time-end="1.25" data-time-moy="0.12125" data-time-start="0.28" id="1"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST1 </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.13857142857143" data-time-start="0.28" id="11"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST2. </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.194" data-time-start="0.28" id="12"><word class="word" data-time-end="1.444" data-time-start="0.28">TEST3 </word></show></p>
<p>TESTTTT</p>' (length=709)
My code (I am using Symfony2 and a data transformer):
public function reverseTransform($value)
{
$value_purified = strip_tags($value, '<p><show><strong><span><word><em><u>'); // Allow only the tags listed here
// Create a DOM with $value
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
libxml_use_internal_errors(true); // tolerate tags that are not valid HTML
$dom->loadHTML($value_purified); // load the string into the DOM
libxml_use_internal_errors(false); // report invalid tags again
var_dump($dom);
$pTags = $dom->getElementsByTagName('p');
var_dump($pTags);
foreach ($pTags as $pTag) {
var_dump($pTag);
$valuePTagFull = $this->DOMinnerHTML($pTag);
if (strpos($valuePTagFull,'<show') === false) {
$valuePTagFull = "<show class='st'>".$valuePTagFull."</show>";
}
var_dump($valuePTagFull);
}
$value_purified = strip_tags($value, '<show><strong><span><word><em><u>'); // Allow only the tags listed here (this removes the <p> tags)
var_dump($value_purified);
}
private function DOMinnerHTML(DOMNode $element)
{
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child) {
$innerHTML .= $element->ownerDocument->saveHTML($child);
}
return $innerHTML;
}
Here are my var_dumps:
1/
var_dump($dom);
object(DOMDocument)[1000]
public 'doctype' => string '(object value omitted)' (length=22)
public 'implementation' => string '(object value omitted)' (length=22)
public 'documentElement' => string '(object value omitted)' (length=22)
public 'actualEncoding' => null
public 'encoding' => null
public 'xmlEncoding' => null
public 'standalone' => boolean true
public 'xmlStandalone' => boolean true
public 'version' => null
public 'xmlVersion' => null
public 'strictErrorChecking' => boolean true
public 'documentURI' => null
public 'config' => null
public 'formatOutput' => boolean true
public 'validateOnParse' => boolean false
public 'resolveExternals' => boolean false
public 'preserveWhiteSpace' => boolean false
public 'recover' => boolean false
public 'substituteEntities' => boolean false
public 'nodeName' => string '#document' (length=9)
public 'nodeValue' => null
public 'nodeType' => int 13
public 'parentNode' => null
public 'childNodes' => string '(object value omitted)' (length=22)
public 'firstChild' => string '(object value omitted)' (length=22)
public 'lastChild' => string '(object value omitted)' (length=22)
public 'previousSibling' => null
public 'attributes' => null
public 'ownerDocument' => null
public 'namespaceURI' => null
public 'prefix' => string '' (length=0)
public 'localName' => null
public 'baseURI' => null
public 'textContent' => string 'TEST1 TEST2. TEST3 TESTTTT' (length=32)
2/ This is fine: my string has 2 <p> tags, and var_dump($pTags) reports a length of 2.
var_dump($pTags);
object(DOMNodeList)[1001]
public 'length' => int 2
3/ Here we can see the 2 <p> tags with var_dump($pTag);
var_dump($pTag);
object(DOMElement)[1040]
public 'tagName' => string 'p' (length=1)
public 'schemaTypeInfo' => null
public 'nodeName' => string 'p' (length=1)
public 'nodeValue' => string 'TEST1 TEST2. TEST3 ' (length=21)
public 'nodeType' => int 1
public 'parentNode' => string '(object value omitted)' (length=22)
public 'childNodes' => string '(object value omitted)' (length=22)
public 'firstChild' => string '(object value omitted)' (length=22)
public 'lastChild' => string '(object value omitted)' (length=22)
public 'previousSibling' => null
public 'nextSibling' => string '(object value omitted)' (length=22)
public 'attributes' => string '(object value omitted)' (length=22)
public 'ownerDocument' => string '(object value omitted)' (length=22)
public 'namespaceURI' => null
public 'prefix' => string '' (length=0)
public 'localName' => string 'p' (length=1)
public 'baseURI' => null
public 'textContent' => string 'TEST1 TEST2. TEST3 ' (length=21)
object(DOMElement)[1062]
public 'tagName' => string 'p' (length=1)
public 'schemaTypeInfo' => null
public 'nodeName' => string 'p' (length=1)
public 'nodeValue' => string 'TESTTTT' (length=7)
public 'nodeType' => int 1
public 'parentNode' => string '(object value omitted)' (length=22)
public 'childNodes' => string '(object value omitted)' (length=22)
public 'firstChild' => string '(object value omitted)' (length=22)
public 'lastChild' => string '(object value omitted)' (length=22)
public 'previousSibling' => string '(object value omitted)' (length=22)
public 'attributes' => string '(object value omitted)' (length=22)
public 'ownerDocument' => string '(object value omitted)' (length=22)
public 'namespaceURI' => null
public 'prefix' => string '' (length=0)
public 'localName' => string 'p' (length=1)
public 'baseURI' => null
public 'textContent' => string 'TESTTTT' (length=7)
4/ Here, if a <p> tag has no <show> tag, I add a <show> tag inside the <p> tag. It works for my second <p> tag, which initially has no <show> tag:
var_dump($valuePTagFull);
string '<show class='st'>TESTTTT</show>' (length=31)
5/ But here I have a problem. When I run var_dump($value_purified); at the end of the code, I get:
string '<show class="st" data-time-end="1.25" data-time-moy="0.12125" data-time-start="0.28" id="1"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST1 </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.13857142857143" data-time-start="0.28" id="11"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST2. </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.194" data-time-start="0.28" id="12"><word class="word" data-time-end="1.444" data-time-start="0.28">TEST3 </word></show>
TESTTTT' (length=695)
Why is the last word 'TESTTTT' not between <show> tags, when in var_dump($valuePTagFull); the <show> tags are there?
If it is valid HTML, you can use the loadHTML function and process your string faster: http://php.net/manual/en/domdocument.loadhtml.php
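As a rough illustration of that suggestion, here is a minimal sketch that loads a fragment with loadHTML and reads the text of each <p>. The input string is a made-up example, not the asker's data. Note that the HTML parser may rearrange nesting it considers invalid (a <div> inside a <p> is not valid HTML), so this approach is probably only a fit when the markup really is valid.

```php
<?php
// Hypothetical input: two plain paragraphs.
$html = '<p>Hello</p><p>World</p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);              // the parser adds <html> and <body> around the fragment
libxml_clear_errors();
libxml_use_internal_errors(false);

// Collect the text content of every <p> element
$texts = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    $texts[] = $p->textContent;
}

print_r($texts);
```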
Here is a solution that gets the desired result by manipulating a DOMDocument. See the comments for details:
class foo
{
public function reverseTransform($value)
{
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
// Load contents wrapped in a temporary root node
$dom->loadXML('<root>' . $value . '</root>');
// Use an XPath query to get all P elements
$xPath = new DOMXPath($dom);
$pTags = $xPath->query('//p');
// Loop through the P elements
$dataStart = 0;
$dataEnd = 0;
foreach ($pTags as $pTag) {
// Get any DIV elements inside the P
$divs = $xPath->query('./div', $pTag);
if ($divs->length > 0) {
// This P element already has a div. Grab the
// data-start/end attributes for later
$div = $divs->item(0);
$dataStart = $div->getAttribute('data-start');
$dataEnd = $div->getAttribute('data-end');
}
else {
// Create a new DIV element and set attributes
$div = $dom->createElement('div');
$div->setAttribute('class', 'ST');
$div->setAttribute('data-start', $dataStart);
$div->setAttribute('data-end', $dataEnd);
// Move all children of P into DIV
$child = $pTag->firstChild;
while ($child) {
$nextChild = $child->nextSibling;
$div->insertBefore($child);
$child = $nextChild;
}
// Move the DIV inside the P element
$pTag->appendChild($div);
}
}
// Get HTML, removing temporary root element
$html = preg_replace(
'#.*?<root>\s*(.*)\s*</root>#s', '$1',
$dom->saveXML()
);
return $html;
}
}
$string = <<<EOS
<p>
Blabla1 Blabla2
</p>
<p>
<div data-start="1" data-end="5">
<span>Blabla3 </span><span>Blabla4</span>
</div>
</p>
<p>
Blabla5 Blabla6
</p>
<p>
Blabla7 Blabla8
</p>
<p>
<div data-start="6" data-end="10">
<span>Blabla9 </span><span>Blabla10</span>
</div>
</p>
<p>
Blabla11 Blabla12
</p>
EOS;
echo (new foo)->reverseTransform($string), PHP_EOL;
Output (indented for clarity):
<p>
<div class="ST" data-start="0" data-end="0">
Blabla1 Blabla2
</div>
</p>
<p>
<div data-start="1" data-end="5">
<span>Blabla3 </span>
<span>Blabla4</span>
</div>
</p>
<p>
<div class="ST" data-start="1" data-end="5">
Blabla5 Blabla6
</div>
</p>
<p>
<div class="ST" data-start="1" data-end="5">
Blabla7 Blabla8
</div>
</p>
<p>
<div data-start="6" data-end="10">
<span>Blabla9 </span>
<span>Blabla10</span>
</div>
</p>
<p>
<div class="ST" data-start="6" data-end="10">
Blabla11 Blabla12
</div>
</p>
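Regarding the question in Edit 1: in the code posted there, the <show> wrapper is only concatenated into the local string $valuePTagFull, and the final strip_tags() call runs on the original $value, so the wrapper never reaches the output. One possible fix, sketched below, is to apply the same DOM approach to the <show> markup and serialize the modified nodes. The function name wrapBareParagraphs is hypothetical, and the sketch assumes the input is well-formed XML once wrapped in a temporary root (HTML entities such as &nbsp; would need to be handled first).

```php
<?php
// Hypothetical helper, not from the original post: wraps the content of every
// <p> that has no <show> child in a new <show class="st">, then returns the
// markup without the <p> tags.
function wrapBareParagraphs(string $value): string
{
    $dom = new DOMDocument();
    // Assumption: $value is well-formed XML once wrapped in a temporary root.
    $dom->loadXML('<root>' . $value . '</root>');
    $xPath = new DOMXPath($dom);

    $out = '';
    foreach ($xPath->query('/root/p') as $p) {
        if ($xPath->query('./show', $p)->length === 0) {
            $show = $dom->createElement('show');
            $show->setAttribute('class', 'st');
            // Move every child of <p> into the new <show>
            while ($p->firstChild) {
                $show->appendChild($p->firstChild);
            }
            // Write the wrapper back into the DOM, not into a local string
            $p->appendChild($show);
        }
        // Serialize the children of <p>, which leaves out the <p> tag itself
        foreach ($p->childNodes as $child) {
            $out .= $dom->saveXML($child);
        }
    }

    return $out;
}

echo wrapBareParagraphs('<p><show class="st" id="1">TEST1</show></p><p>TESTTTT</p>'), PHP_EOL;
// prints: <show class="st" id="1">TEST1</show><show class="st">TESTTTT</show>
```

Serializing each child with saveXML($child) also avoids the regular expression used above to strip the temporary root element.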