Removing every li tag before reaching the first p tag in a string

Question

Suppose I have a string containing some HTML. I want to remove every li tag that appears before the first p tag.

How can I achieve this?

Example string:

$str = "<img src='something.png'/>some_text_here<li>needs_to_be_removed</li>
        <li>also_needs_to_be_removed</li>some_other_text<p>finally</p>more_text_here
        <li>this_should_not_be_removed</li>";

The first two li tags need to be removed.

Answer 1

You can use PHP's DOMDocument with the traversal function below:

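$doc = new DOMDocument();
$doc->loadHTML($str);
$foundp = false;
showDOMNode($doc);
// $doc now holds the modified markup; saveHTML() returns it as a full
// document, including the doctype, <html> and <body> that loadHTML() adds
$newstr = $doc->saveHTML();


function showDOMNode(DOMNode $domNode) {
    global $foundp;
    // iterate over a copy: removing nodes from the live childNodes list
    // while looping over it would skip siblings
    foreach (iterator_to_array($domNode->childNodes) as $node)
    {
        if ($foundp) {
            // a p was already found deeper in the tree: stop
            return;
        }
        if ($node->nodeName == "li"){
            //delete this node
            $domNode->removeChild($node);
        }
        else if ($node->nodeName == "p"){
            //stop here
            $foundp = true;
            return;
        }
        else if ($node->hasChildNodes()) {
            //recurse into the children
            showDOMNode($node);
        }
    }
}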

Answer 2

I'd suggest that using a PHP parser library would be better and faster. I personally use this one in my projects: https://github.com/paquettg/php-html-parser. It provides an API like

   $child->nextSibling()
   $content->innerHtml,
   $content->firstChild()

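and many more that can come in handy.

You could run a foreach loop over all the elements and keep track of the "li" tags you pass. Once you reach a "p" tag (the third element in your example), you can delete the li elements before it via $child->previousSibling();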
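The loop described above can be sketched with PHP's built-in DOM extension rather than php-html-parser, so that no library method names have to be guessed. This is only a minimal sketch: the function name removeLiBeforeFirstP and the wrapping div are assumptions made for illustration, and it only handles li elements that sit at the same level as the first top-level p.

```php
<?php
// Hypothetical helper: removes top-level <li> siblings that precede the first <p>.
function removeLiBeforeFirstP(string $html): string
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    // Wrap the fragment in a <div> so that it has a single root element.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    libxml_clear_errors();

    $root = $dom->documentElement;
    $toRemove = [];
    // Walk the siblings in document order and stop at the first <p>.
    for ($node = $root->firstChild; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeName === 'p') {
            break;
        }
        if ($node->nodeName === 'li') {
            $toRemove[] = $node; // collect first, remove after the walk
        }
    }
    foreach ($toRemove as $li) {
        $root->removeChild($li);
    }

    // Serialize the children of the wrapper, leaving the wrapper itself out.
    $result = '';
    foreach ($root->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}
```

Calling removeLiBeforeFirstP($str) on the string from the question should drop the first two li elements and keep the one after the p. Collecting the nodes before removing them avoids changing the sibling chain while it is being walked.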

Answer 3

This is what you need. Simple and effective:

$mystring = "mystringwith<li>toberemovedstring</li><li>againremove</li><p>do not remove me</p>"; // the string you provide
$findme  = '<li>'; // where removal starts
$findpee = '<p>';  // where removal ends
$pos    = strpos($mystring, $findme);  // position of the first <li>
$pospee = strpos($mystring, $findpee); // position of the first <p>
// Remove everything from the first <li> up to the first <p>.
// Note: this also removes any text between the li elements and the p.
if ($pos !== false && $pospee !== false && $pos < $pospee) {
    $result = substr_replace($mystring, "", $pos, $pospee - $pos);
} else {
    $result = $mystring; // nothing to remove
}

echo $result;

Edit: PHP sandbox

http://sandbox.onlinephpfunctions.com/code/e534259e2312682a04b64c6e3aae1521422aacd2

You can also check the result there.
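If the text between the li elements has to survive (such as some_other_text in the question's example), a string-based variant can limit the removal to the li elements themselves. The following is a rough sketch, not a general HTML parser: the function name stripLiBeforeFirstP is made up for illustration, and it assumes that li elements are not nested inside one another.

```php
<?php
// Hypothetical helper: strips <li>...</li> elements only from the part
// of the string that precedes the first <p> tag.
function stripLiBeforeFirstP(string $html): string
{
    // Locate the first opening <p> tag (case-insensitive, attributes allowed).
    if (!preg_match('/<p[\s>]/i', $html, $m, PREG_OFFSET_CAPTURE)) {
        return $html; // no <p> found: leave the input untouched (an assumption)
    }
    $cut  = $m[0][1];               // byte offset of the first <p
    $head = substr($html, 0, $cut); // everything before the first <p>
    $tail = substr($html, $cut);    // the first <p> and everything after it

    // Assumes <li> elements are not nested inside one another.
    $head = preg_replace('/<li\b[^>]*>.*?<\/li>/is', '', $head);

    return $head . $tail;
}
```

Any whitespace that sat between two removed li elements stays in the output, so the result may need a trim or a whitespace cleanup afterwards.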

Answer 4

Using XPath:

$str = "<img src='something.png'/>some_text_here<li>needs_to_be_removed</li>
        <li>also_needs_to_be_removed</li>some_other_text<p>finally</p>more_text_here
        <li>this_should_not_be_removed</li>";

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML('<div>' . $str .'</div>', LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
             // ^---------------^----- add a root element
$xp = new DOMXPath($dom);

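// (//p)[1] is the first p in the whole document;
// //p[1] would match the first p child of every parent
$lis = $xp->query('(//p)[1]/preceding-sibling::li');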

foreach ($lis as $li) {
    $li->parentNode->removeChild($li);
}

$result = '';
// add each child node of the root element to the result
foreach ($dom->getElementsByTagName('div')->item(0)->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}

php

regex

string

domdocument