PHP DOM: How to get items and sub-items from a UL

I am trying to get all items and sub-items, together with their anchor tags, from the following menu:

<nav class="header-nav" id="headerLara">
 <div class="menu-hauptmenu-container">
  <ul id="head_nav_ul" class="menu">
   <li class="menu-item menu-item-type-custom menu-item-object-custom menu-item-has-children menu-item-4">
    <a>First Menu</a>
    <ul class="sub-menu">
     <li class="menu-item menu-item-type-post_type menu-item-object-page menu-item-14002">
      <a href="http://example.com/fm1">F menu 1</a>
     </li>
     <li class="menu-item menu-item-type-post_type menu-item-object-post menu-item-12718">
      <a href="http://example.com/fm2">F menu 2</a>
     </li>
    </ul>
   </li>
   <li class="menu-item menu-item-type-custom menu-item-object-custom menu-item-has-children menu-item-6">
    <a>Second Menu</a>
    <ul class="sub-menu">
     <li class="menu-item menu-item-type-post_type menu-item-object-page menu-item-1257">
      <a href="http://example.com/sm1">S menu 1</a>
     </li>
     <li class="menu-item menu-item-type-post_type menu-item-object-page menu-item-5420">
      <a href="http://example.com/sm2">S menu 2</a>
     </li>
    </ul>
   </li>
   <li class="menu-item menu-item-type-custom menu-item-object-custom menu-item-12821">
    <a href="http://example.com/m3">Third Menu</a>
   </li>
  </ul>
 </div>
</nav>

Now I want output like this:

<nav class="header-nav" id="headerLara">
 <div class="menu-hauptmenu-container">
  <ul>
   <li>
    <a class="has-child">First Menu</a>
    <ul>
     <li>
      <a href="http://example.com/fm1">F menu 1</a>
     </li>
     <li>
      <a href="http://example.com/fm2">F menu 2</a>
     </li>
    </ul>
   </li>
   <li>
    <a class="has-child">Second Menu</a>
    <ul>
     <li>
      <a href="http://example.com/sm1">S menu 1</a>
     </li>
     <li>
      <a href="http://example.com/sm2">S menu 2</a>
     </li>
    </ul>
   </li>
   <li>
    <a href="http://example.com/m3">Third Menu</a>
   </li>
  </ul>
 </div>
</nav>

I did some research and tried the following PHP code:

    <?php
$doc = new DomDocument;
$doc->validateOnParse = true;
$doc->loadHtml(file_get_contents('http://example.com/blabla.php'));
$header = $doc->getElementById('headerLara');

$mainUls = $header->getElementsByTagName('ul');
foreach ($mainUls as $mainUl) {
    echo '<ul>';
    $mainLis = $mainUl->getElementsByTagName('li');
    foreach ($mainLis as $mainLi) {
    echo '<li>';
    $mainAnc = $mainLi->getElementsByTagName('a');
    $href = $mainAnc->item(0)->getAttribute('href');
    echo '<a class="has-child" href="'.$href.'">'.$mainAnc->item(0)->nodeValue.'</a>';   
    $secUls = $mainLi->getElementsByTagName('ul');
    if($secUls->length < 2){
        foreach ($secUls as $secUl) {
            echo '<ul>';
            $secLis = $secUl->getElementsByTagName('li');
            foreach ($secLis as $secLi) {
                echo '<li>';
                $secAnc = $mainLi->getElementsByTagName('a');
                $shref = $secAnc->item(0)->getAttribute('href');
                echo '<a href="'.$shref.'">'.$secAnc->item(0)->nodeValue.'</a>';  
                echo '</li>';
            }
            echo '</ul>';
        }
    }
    echo '</li>';
    }
    echo '</ul>';
}
?> 

But this does not work for me; it returns the following output:

<ul>
 <li>
  <a class="has-child" href="">First Menu</a>
  <ul>
   <li>
    <a href="">First Menu</a>
   </li>
   <li>
    <a href="">First Menu</a>
   </li>
  </ul>
 </li>
 <li>
  <a class="has-child" href="http://example.com/fm1">F menu 1</a>
 </li>
 <li>
  <a class="has-child" href="http://example.com/fm2">F menu 2</a>
 </li>
 <li>
  <a class="has-child" href="">Second Menu</a>
  <ul>
   <li>
    <a href="">Second Menu</a>
   </li>
   <li>
    <a href="">Second Menu</a>
   </li>
  </ul>
 </li>
 <li>
  <a class="has-child" href="http://example.com/sm1">S menu 1</a>
 </li>
 <li>
  <a class="has-child" href="http://example.com/sm2">S menu 2</a>
 </li>
</ul>

I checked many links that looked similar to my problem, but none of them helped.

How can I get the correct output? Thanks in advance.

There are a few minor mistakes (such as reading from the wrong node), but there are two main problems.

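The first is that getElementsByTagName() selects all descendant elements with that tag name, not just the direct children, so each call returns more elements than you expect. The code below uses XPath instead, because DOMDocument has no convenient method for fetching only the direct children. An XPath query takes the context node as its starting point, so an expression such as a matches only the <a> tags that are direct children of that context node.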
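To make the difference concrete, here is a minimal sketch that counts the <li> elements under a small list both ways. The markup and the id "top" are made up for illustration and are not taken from the menu in the question.

```php
<?php
// Illustrative markup (an assumption, not the question's menu):
// two top-level items, the first of which has one nested item.
$html = '<ul id="top">'
      . '<li><a>First</a><ul><li><a href="/x">X</a></li></ul></li>'
      . '<li><a href="/y">Y</a></li>'
      . '</ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

// Locate the outer list.
$top = $xp->query('//ul[@id="top"]')->item(0);

// Every <li> anywhere below the outer list, including the nested one.
$allLis = $top->getElementsByTagName('li');

// Only the <li> elements that are direct children of the context node.
$directLis = $xp->query('li', $top);

echo $allLis->length, PHP_EOL;    // 3
echo $directLis->length, PHP_EOL; // 2
```

The relative expression li is evaluated against the node passed as the second argument to query(), which is why the nested item is left out of the second count.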

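The other (main) problem is that you are building the output with echo statements. That can work, but it is prone to typos, invalid structure and so on. This code uses DOM API calls to create the document instead.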

$doc = new DomDocument;
$doc->validateOnParse = true;
$doc->loadHtml($html); // $html holds the page markup, e.g. the result of file_get_contents()
$xp = new DOMXPath($doc);

$header = $doc->getElementById('headerLara');
$mainUls = $xp->query('div/ul', $header);
foreach ($mainUls as $mainUl) {
    $mainULE = $doc->createElement("ul");
    $mainLis = $xp->query('li', $mainUl);
    foreach ($mainLis as $mainLi) {
        $li = $doc->createElement("li");
        $mainAnc = $xp->query('a', $mainLi)[0];

        $href = $mainAnc->getAttribute('href');
        $a = $doc->createElement("a", htmlspecialchars($mainAnc->nodeValue));
        if ( !empty($href) )    {
            $a->setAttribute("href", $href);
        }
        $li->appendChild($a);
        $secUls = $xp->query('ul', $mainLi);
        if($secUls->length < 2){
            foreach ($secUls as $secUl) {
                $a->setAttribute("class", "has-child");
                $secULE = $doc->createElement("ul");
                $secLis = $xp->query('li', $secUl);
                foreach ($secLis as $secLi) {
                    $secLIE = $doc->createElement("li");
                    $secAnc = $xp->query('a', $secLi);
                    $shref = $secAnc[0]->getAttribute('href');
                    $secA = $doc->createElement("a", htmlspecialchars($secAnc[0]->nodeValue));
                    $secA->setAttribute("href", $shref);
                    $secLIE->appendChild($secA);
                    $secULE->appendChild($secLIE);
                }
                $li->appendChild($secULE);
            }
        }
        $mainULE->appendChild($li);
    }
    echo PHP_EOL.PHP_EOL.">>>>".$doc->saveHTML($mainULE);
    // Next line replaces existing HTML
    //$mainUl->parentNode->replaceChild($mainULE,$mainUl);
}
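If the menu could ever be nested more than two levels deep, the same two ideas (direct-child XPath queries and building nodes through the DOM API) can be folded into one recursive function. This is only a sketch under that assumption: rebuildMenu is a hypothetical helper name, and the sample markup is a shortened version of the menu from the question.

```php
<?php
// Hypothetical helper: rebuilds a <ul> and its nested <ul> elements,
// keeping only the link text and href of each item.
function rebuildMenu(DOMXPath $xp, DOMDocument $doc, DOMNode $ul): DOMElement
{
    $newUl = $doc->createElement('ul');
    foreach ($xp->query('li', $ul) as $li) {          // direct <li> children only
        $newLi = $doc->createElement('li');
        $newA = null;
        $srcA = $xp->query('a', $li)->item(0);        // the item's own <a>, if any
        if ($srcA !== null) {
            $newA = $doc->createElement('a');
            // A text node is escaped on output, so no htmlspecialchars() is needed here.
            $newA->appendChild($doc->createTextNode($srcA->textContent));
            if ($srcA->hasAttribute('href')) {
                $newA->setAttribute('href', $srcA->getAttribute('href'));
            }
            $newLi->appendChild($newA);
        }
        foreach ($xp->query('ul', $li) as $subUl) {   // direct sub-menus only
            if ($newA !== null) {
                $newA->setAttribute('class', 'has-child');
            }
            $newLi->appendChild(rebuildMenu($xp, $doc, $subUl));
        }
        $newUl->appendChild($newLi);
    }
    return $newUl;
}

// Shortened sample of the question's menu (an assumption for this sketch).
$html = '<ul id="head_nav_ul" class="menu">'
      . '<li class="menu-item"><a>First Menu</a><ul class="sub-menu">'
      . '<li class="menu-item"><a href="http://example.com/fm1">F menu 1</a></li>'
      . '</ul></li>'
      . '<li class="menu-item"><a href="http://example.com/m3">Third Menu</a></li>'
      . '</ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

$source = $xp->query('//ul[@id="head_nav_ul"]')->item(0);
$result = $doc->saveHTML(rebuildMenu($xp, $doc, $source));
echo $result, PHP_EOL;
```

The rebuilt list is never attached to the document here; it is only serialised with saveHTML(). To swap it in for the original list, the replaceChild() call shown in the commented-out line above should work the same way.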