PHP DOM Parser Moving Closing Div Tag
Here is my code:
$myHtml = '
<div class="div-class">
<p>text</p>
<p><a href="#">text</a></p>
</div>
<ul class="some-class">
<li><a href="#" target="_blank" title="something something"><img src="" alt=""></a>
</li>
<li><a href="" target="_blank" title=""><img src="" alt=""></a>
</li>
<li><a href="" target="_blank" title=""><img src=""></a>
</li>
</ul>
';
$doc = new \DOMDocument();
$doc->loadHTML($myHtml, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new \DOMXPath($doc);
$anchors = $xpath->query("//a[@title='something something']");
$list = $xpath->query("//ul[@class='some-class']")[0];
foreach ($anchors as $a) {
$list->removeChild($a->parentNode);
}
var_dump($doc->saveHTML());
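Essentially, I am trying to remove the list item that contains an anchor tag whose title is 'something something'. However, when I save the HTML after applying the change, the list is moved inside the div tag. Why does this happen? Thanks.
loadHTML() tries to correct the markup. It does not like the ul element having no parent, so it moves it into the div, most likely because a document can have only one root element and the div, as the first element, becomes that root. If you wrap everything in html and body tags, it works as expected.
loadHTML() would normally add that wrapper for you when needed, but you set the LIBXML_HTML_NOIMPLIED flag, which disables it.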
<?php
$myHtml = '
<html>
<body>
<div class="div-class">
<p>text</p>
<p><a href="#">text</a></p>
</div>
<ul class="some-class">
<li><a href="#" target="_blank" title="something something"><img src="" alt=""></a>
</li>
<li><a href="" target="_blank" title=""><img src="" alt=""></a>
</li>
<li><a href="" target="_blank" title=""><img src=""></a>
</li>
</ul>
</body>
</html>
';
$doc = new \DOMDocument();
$doc->loadHTML($myHtml, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new \DOMXPath($doc);
$anchors = $xpath->query("//a[@title='something something']");
$list = $xpath->query("//ul[@class='some-class']")[0];
foreach ($anchors as $a) {
$list->removeChild($a->parentNode);
}
var_dump($doc->saveHTML());
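To check where the list actually ends up, one option is to look at the parent of the ul after parsing. The following is a minimal sketch; ulParentName() is a hypothetical helper introduced only for illustration, and the markup is a shortened version of the one in the question.

```php
<?php
// Hypothetical helper (not part of the original answer): parse $html with the
// given libxml option flags and return the node name of the <ul>'s parent.
function ulParentName(string $html, int $options): string
{
    $doc = new \DOMDocument();
    $doc->loadHTML($html, $options);
    $ul = $doc->getElementsByTagName('ul')->item(0);
    return $ul->parentNode->nodeName;
}

$fragment = '<div class="div-class"><p>text</p></div>'
    . '<ul class="some-class"><li>item</li></ul>';
$wrapped = '<html><body>' . $fragment . '</body></html>';

// Explicit <html>/<body> wrapper, same flags as in the question.
$withWrapper = ulParentName($wrapped, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
// Bare fragment, letting libxml add the implied <html>/<body> itself.
$withoutFlag = ulParentName($fragment, LIBXML_HTML_NODEFDTD);

echo $withWrapper, PHP_EOL;
echo $withoutFlag, PHP_EOL;
```

Running the same helper on the bare fragment with LIBXML_HTML_NOIMPLIED set should show the behaviour described in the question, although the exact result may depend on the installed libxml2 version.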
Or, without the LIBXML_HTML_NOIMPLIED flag:
<?php
$myHtml = '
<div class="div-class">
<p>text</p>
<p><a href="#">text</a></p>
</div>
<ul class="some-class">
<li><a href="#" target="_blank" title="something something"><img src="" alt=""></a>
</li>
<li><a href="" target="_blank" title=""><img src="" alt=""></a>
</li>
<li><a href="" target="_blank" title=""><img src=""></a>
</li>
</ul>
';
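$doc = new \DOMDocument();
// Collect parse errors internally instead of emitting PHP warnings;
// without this, libxml_get_errors() returns an empty array.
libxml_use_internal_errors(true);
$doc->loadHTML($myHtml, LIBXML_HTML_NODEFDTD);
var_dump(libxml_get_errors());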
$xpath = new \DOMXPath($doc);
$anchors = $xpath->query("//a[@title='something something']");
$list = $xpath->query("//ul[@class='some-class']")[0];
foreach ($anchors as $a) {
$list->removeChild($a->parentNode);
}
var_dump($doc->saveHTML());
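Without the flag, saveHTML() also returns the html and body wrapper that libxml added. If only the original fragment is wanted back, one possible approach is to serialize the children of body one by one. This is a sketch under that assumption; saveBodyFragment() is a hypothetical helper, not something from the answer above, and the markup is shortened.

```php
<?php
// Hypothetical helper: serialize only the children of <body>, so the implied
// <html>/<body> wrapper does not appear in the output.
function saveBodyFragment(\DOMDocument $doc): string
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() accepts a node and serializes just that subtree.
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$myHtml = '<div class="div-class"><p>text</p></div>'
    . '<ul class="some-class">'
    . '<li><a href="#" title="something something">x</a></li>'
    . '<li><a href="#" title="">y</a></li>'
    . '</ul>';

$doc = new \DOMDocument();
$doc->loadHTML($myHtml, LIBXML_HTML_NODEFDTD);

$xpath = new \DOMXPath($doc);
// Copy the matches into an array first, then remove each enclosing <li>
// from its own parent rather than from a separately queried list.
$anchors = iterator_to_array($xpath->query("//a[@title='something something']"));
foreach ($anchors as $a) {
    $li = $a->parentNode;
    $li->parentNode->removeChild($li);
}

$fragment = saveBodyFragment($doc);
echo $fragment, PHP_EOL;
```

Removing the li through its own parentNode avoids depending on the ul having been looked up separately, which is a small design choice rather than a requirement.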