PHP - Insert tags (<div>) between other tags (<p>) in a string

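I have a string in PHP that I get from a request (it is actually the output of CKEditor, a WYSIWYG text editor). I'm trying to insert tags (<div>) inside other tags (<p>), and also to copy the data attributes from the preceding p > div.

This will be easier to understand with an example: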

$String =
<p>
    <div class="ST" data-start="1" data-end="5">
        <span>Blabla1 </span><span>Blabla2</span>
    </div>
</p>
<p>
    Blabla3 Blabla4
</p>
<p>
    <div class="ST" data-start="6" data-end="10">
        <span>Blabla10 </span><span>Blabla20</span>
    </div>
</p>

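Here the first and last <p> are fine. The one I need to change is the second <p>.

I need to put "Blabla3 Blabla4" inside a <div class="ST"> that carries the data-start and data-end attributes of the previous <div> (here data-start = 1 and data-end = 5), so that I end up with this: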

<p>
    <div class="ST" data-start="1" data-end="5">
        <span>Blabla1 </span><span>Blabla2</span>
    </div>
</p>
<p>
    <div class="ST" data-start="1" data-end="5">
       Blabla3 Blabla4
    </div>
</p>
<p>
    <div class="ST" data-start="6" data-end="10">
        <span>Blabla10 </span><span>Blabla20</span>
    </div>
</p>

The string can also start with a plain <p>; in that case, set data-start and data-end to 0:

<p>
    Blabla3 Blabla4
</p>
<p>
    <div data-start="0" data-end="5">
        <span>Blabla1 </span><span>Blabla2</span>
    </div>
</p>
<p>
    <div data-start="6" data-end="10">
        <span>Blabla10 </span><span>Blabla20</span>
    </div>
</p>

Or it can contain two or more plain <p> in a row; in that case, give all of them data-start = 1 and data-end = 5, taken from the previous <div>:

<p>
    <div data-start="1" data-end="5">
        <span>Blabla1 </span><span>Blabla2</span>
    </div>
</p>
<p>
    Blabla3 Blabla4
</p>
<p>
    Blabla5 Blabla6
</p>
<p>
    <div data-start="6" data-end="10">
        <span>Blabla10 </span><span>Blabla20</span>
    </div>
</p>
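The rule behind these three examples is a "carry-forward": each plain paragraph takes the attributes of the most recent <div>, or 0/0 if there is none yet. A minimal sketch of just that rule, leaving HTML out entirely; `fillRanges` and the array shape (`['start' => ..., 'end' => ...]` for a paragraph that already has a <div>, `null` for plain text) are hypothetical, chosen only for illustration:

```php
<?php
/**
 * Hypothetical helper: illustrates the carry-forward rule only, no HTML parsing.
 * Each input item is ['start' => int, 'end' => int] for a <p> that already
 * contains a <div>, or null for a <p> that only contains text.
 */
function fillRanges(array $paragraphs): array
{
    // Used when the string starts with a plain <p> (no <div> seen yet)
    $last = ['start' => 0, 'end' => 0];
    $result = [];
    foreach ($paragraphs as $range) {
        if ($range !== null) {
            $last = $range; // remember the attributes of the latest <div>
        }
        $result[] = $last;  // plain paragraphs inherit the latest attributes
    }
    return $result;
}
```

For the third example, `fillRanges([['start' => 1, 'end' => 5], null, null, ['start' => 6, 'end' => 10]])` gives 1/5 for both plain paragraphs. A real implementation would read and write these values on DOM nodes rather than arrays.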

I don't know how to manipulate the string... maybe with regular expressions?

Thanks for your help!

Edit 1

I tried this:

$value =

string '<p><show class="st" data-time-end="1.25" data-time-moy="0.12125" data-time-start="0.28" id="1"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST1&nbsp; </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.13857142857143" data-time-start="0.28" id="11"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST2. </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.194" data-time-start="0.28" id="12"><word class="word" data-time-end="1.444" data-time-start="0.28">TEST3 </word></show></p>

    <p>TESTTTT</p>' (length=709)

My code (I'm using Symfony2 and a Transformer):

public function reverseTransform($value)
{
    // Keep only the tags listed below
    $value_purified = strip_tags($value, '<p><show><strong><span><word><em><u>');

    // Create a DOM from $value
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    libxml_use_internal_errors(true);  // tolerate tags that aren't valid HTML5 (<show>, <word>)
    $dom->loadHTML($value_purified);   // load the string into the DOM
    libxml_use_internal_errors(false); // restore normal libxml error reporting

    var_dump($dom);

    $pTags = $dom->getElementsByTagName('p');
    var_dump($pTags);

    foreach ($pTags as $pTag) {
        var_dump($pTag);
        $valuePTagFull = $this->DOMinnerHTML($pTag);
        if (strpos($valuePTagFull, '<show') === false) {
            $valuePTagFull = "<show class='st'>" . $valuePTagFull . "</show>";
        }
        var_dump($valuePTagFull);
    }

    // Keep the tags listed below (i.e. remove the <p> tags)
    $value_purified = strip_tags($value, '<show><strong><span><word><em><u>');
    var_dump($value_purified);
}

private function DOMinnerHTML(DOMNode $element)
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

Here are my var_dumps:

1/ var_dump($dom);

object(DOMDocument)[1000]
  public 'doctype' => string '(object value omitted)' (length=22)
  public 'implementation' => string '(object value omitted)' (length=22)
  public 'documentElement' => string '(object value omitted)' (length=22)
  public 'actualEncoding' => null
  public 'encoding' => null
  public 'xmlEncoding' => null
  public 'standalone' => boolean true
  public 'xmlStandalone' => boolean true
  public 'version' => null
  public 'xmlVersion' => null
  public 'strictErrorChecking' => boolean true
  public 'documentURI' => null
  public 'config' => null
  public 'formatOutput' => boolean true
  public 'validateOnParse' => boolean false
  public 'resolveExternals' => boolean false
  public 'preserveWhiteSpace' => boolean false
  public 'recover' => boolean false
  public 'substituteEntities' => boolean false
  public 'nodeName' => string '#document' (length=9)
  public 'nodeValue' => null
  public 'nodeType' => int 13
  public 'parentNode' => null
  public 'childNodes' => string '(object value omitted)' (length=22)
  public 'firstChild' => string '(object value omitted)' (length=22)
  public 'lastChild' => string '(object value omitted)' (length=22)
  public 'previousSibling' => null
  public 'attributes' => null
  public 'ownerDocument' => null
  public 'namespaceURI' => null
  public 'prefix' => string '' (length=0)
  public 'localName' => null
  public 'baseURI' => null
  public 'textContent' => string 'TEST1  TEST2. TEST3 TESTTTT' (length=32)

2/ This one is fine: my string has two <p> tags, and var_dump($pTags) reports a length of 2:

var_dump($pTags);
object(DOMNodeList)[1001]
     public 'length' => int 2

3/ Here we can see the two <p> tags with var_dump($pTag):

var_dump($pTag);
object(DOMElement)[1040]
  public 'tagName' => string 'p' (length=1)
  public 'schemaTypeInfo' => null
  public 'nodeName' => string 'p' (length=1)
  public 'nodeValue' => string 'TEST1  TEST2. TEST3 ' (length=21)
  public 'nodeType' => int 1
  public 'parentNode' => string '(object value omitted)' (length=22)
  public 'childNodes' => string '(object value omitted)' (length=22)
  public 'firstChild' => string '(object value omitted)' (length=22)
  public 'lastChild' => string '(object value omitted)' (length=22)
  public 'previousSibling' => null
  public 'nextSibling' => string '(object value omitted)' (length=22)
  public 'attributes' => string '(object value omitted)' (length=22)
  public 'ownerDocument' => string '(object value omitted)' (length=22)
  public 'namespaceURI' => null
  public 'prefix' => string '' (length=0)
  public 'localName' => string 'p' (length=1)
  public 'baseURI' => null
  public 'textContent' => string 'TEST1  TEST2. TEST3 ' (length=21)

object(DOMElement)[1062]
  public 'tagName' => string 'p' (length=1)
  public 'schemaTypeInfo' => null
  public 'nodeName' => string 'p' (length=1)
  public 'nodeValue' => string 'TESTTTT' (length=7)
  public 'nodeType' => int 1
  public 'parentNode' => string '(object value omitted)' (length=22)
  public 'childNodes' => string '(object value omitted)' (length=22)
  public 'firstChild' => string '(object value omitted)' (length=22)
  public 'lastChild' => string '(object value omitted)' (length=22)
  public 'previousSibling' => string '(object value omitted)' (length=22)
  public 'attributes' => string '(object value omitted)' (length=22)
  public 'ownerDocument' => string '(object value omitted)' (length=22)
  public 'namespaceURI' => null
  public 'prefix' => string '' (length=0)
  public 'localName' => string 'p' (length=1)
  public 'baseURI' => null
  public 'textContent' => string 'TESTTTT' (length=7)

4/ Here, if a <p> tag contains no <show> tag, I add a <show> tag inside it. This works for my second <p> tag, which originally had no <show> tag:

var_dump($valuePTagFull);
string '<show class='st'>TESTTTT</show>' (length=31)

5/ But here is my problem: when I run var_dump($value_purified); at the end of my code, I get:

string '<show class="st" data-time-end="1.25" data-time-moy="0.12125" data-time-start="0.28" id="1"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST1&nbsp; </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.13857142857143" data-time-start="0.28" id="11"><word class="word" data-time-end="1.25" data-time-start="0.28">TEST2. </word><word class="word" data-time-end="1.25" data-time-start="1.25"> </word></show><show class="st" data-time-end="1.25" data-time-moy="0.194" data-time-start="0.28" id="12"><word class="word" data-time-end="1.444" data-time-start="0.28">TEST3 </word></show>

TESTTTT' (length=695)

Why is the word 'TESTTTT' at the end not wrapped in <show> tags, when var_dump($valuePTagFull) shows it inside <show> tags?
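The likely cause: `DOMinnerHTML()` returns a string copy of the paragraph's contents, so reassigning `$valuePTagFull` never changes the DOM, and the final `strip_tags()` call runs on the original `$value`, not on `$dom`. A minimal sketch of the alternative, which edits the DOM nodes themselves and then serializes the modified tree; it assumes the same allowed-tag list as the code above, and the function name `wrapBareParagraphs` is hypothetical:

```php
<?php
// Hypothetical helper: wrap the contents of every <p> that has no <show>
// in a new <show class="st">, by editing the DOM rather than a string copy.
function wrapBareParagraphs(string $value): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // <show>/<word> are not HTML tags; silence warnings
    $dom->loadHTML($value);           // the HTML parser adds implied <html><body>
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($dom);
    // XPath results are a snapshot, so editing the tree inside the loop is safe
    foreach ($xpath->query('//p') as $p) {
        if ($xpath->query('.//show', $p)->length > 0) {
            continue; // this paragraph already has a <show>, leave it alone
        }
        $show = $dom->createElement('show');
        $show->setAttribute('class', 'st');
        // appendChild() on a node already in the tree moves it
        while ($p->firstChild) {
            $show->appendChild($p->firstChild);
        }
        $p->appendChild($show);
    }

    // Serialize the *modified* tree: the children of the implied <body>
    $body = $dom->getElementsByTagName('body')->item(0);
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }

    // Drop the <p> wrappers, as the original code intends
    return strip_tags($html, '<show><strong><span><word><em><u>');
}
```

The key difference from the question's code is that both the wrapping and the final `strip_tags()` operate on what was serialized from `$dom`, so the change made in the loop survives.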

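If it's valid HTML, you can use the loadHTML function and work on your string more quickly: http://php.net/manual/en/domdocument.loadhtml.php

Here is a solution that gets the desired result by manipulating the DOMDocument. See the comments for details: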

class foo
{
    public function reverseTransform($value)
    {
        $dom = new DOMDocument();
        $dom->preserveWhiteSpace = false;
        $dom->formatOutput = true;

        // Load contents wrapped in a temporary root node
        $dom->loadXML('<root>' . $value . '</root>');

        // Use an XPath query to get all P elements
        $xPath = new DOMXPath($dom);
        $pTags = $xPath->query('//p');

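        // Attributes of the most recent DIV; 0 until one is found,
        // so leading plain paragraphs get data-start="0" data-end="0"
        $dataStart = 0;
        $dataEnd   = 0;

        // Loop through the P elements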

        foreach ($pTags as $pTag) {
            // Get any DIV elements inside the P
            $divs = $xPath->query('./div', $pTag);

            if ($divs->length > 0) {
                // This P element already has a div. Grab the
                // data-start/end attributes for later
                $div = $divs->item(0);
                $dataStart = $div->getAttribute('data-start');
                $dataEnd   = $div->getAttribute('data-end');
            }
            else {
                // Create a new DIV element and set attributes
                $div = $dom->createElement('div');
                $div->setAttribute('class',      'ST');
                $div->setAttribute('data-start', $dataStart);
                $div->setAttribute('data-end',   $dataEnd);

                // Move all children of P into DIV
                $child = $pTag->firstChild;
                while ($child) {
                    $nextChild = $child->nextSibling;
                    $div->appendChild($child); // moves $child from P into DIV
                    $child = $nextChild;
                }

                // Move the DIV inside the P element
                $pTag->appendChild($div);
            }
        }
        // Get the markup, removing the XML declaration and the temporary
        // root element ('$1' keeps only what was inside <root>)
        $html = preg_replace(
            '#.*?<root>\s*(.*?)\s*</root>.*#s', '$1',
            $dom->saveXML()
        );
        return $html;
    }
}

$string = <<<EOS
<p>
    Blabla1 Blabla2
</p>
<p>
    <div data-start="1" data-end="5">
        <span>Blabla3 </span><span>Blabla4</span>
    </div>
</p>
<p>
    Blabla5 Blabla6
</p>
<p>
    Blabla7 Blabla8
</p>
<p>
    <div data-start="6" data-end="10">
        <span>Blabla9 </span><span>Blabla10</span>
    </div>
</p>
<p>
    Blabla11 Blabla12
</p>
EOS;

echo (new foo)->reverseTransform($string), PHP_EOL;

Output (indented for clarity):

<p>
    <div class="ST" data-start="0" data-end="0">
        Blabla1 Blabla2
    </div>
</p>
<p>
    <div data-start="1" data-end="5">
        <span>Blabla3 </span>
        <span>Blabla4</span>
    </div>
</p>
<p>
    <div class="ST" data-start="1" data-end="5">
        Blabla5 Blabla6
    </div>
</p>
<p>
    <div class="ST" data-start="1" data-end="5">
        Blabla7 Blabla8
    </div>
</p>
<p>
    <div data-start="6" data-end="10">
        <span>Blabla9 </span>
        <span>Blabla10</span>
    </div>
</p>
<p>
    <div class="ST" data-start="6" data-end="10">
        Blabla11 Blabla12
    </div>
</p>
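The answer wraps the input in a temporary <root> element because XML allows only one top-level element, and the input has several sibling <p> elements. A sketch of that step plus a regex-free way to get the contents back out, serializing each child of the wrapper with `saveXML($node)`; `loadFragment` and `innerXml` are hypothetical helper names. Since `loadXML()` is a strict XML parser, input such as the question's `&nbsp;` is likely to be rejected as an undefined entity.

```php
<?php
// Hypothetical helper: load several sibling elements as one XML document
function loadFragment(string $fragment): DOMDocument
{
    $dom = new DOMDocument();
    // XML allows exactly one top-level element, so sibling <p> elements
    // must be wrapped before loadXML() will accept them
    if (!$dom->loadXML('<root>' . $fragment . '</root>')) {
        throw new InvalidArgumentException('Fragment is not well-formed XML');
    }
    return $dom;
}

// Hypothetical helper: serialize the children of <root> without the wrapper
function innerXml(DOMDocument $dom): string
{
    // Serializing each child individually skips both the XML declaration
    // and the <root> element, so no regex is needed
    $xml = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $xml .= $dom->saveXML($child);
    }
    return $xml;
}
```

In the answer's `reverseTransform()`, `return innerXml($dom);` could stand in for the `preg_replace()` call.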