Parse the h2 and the next tag in PHP

I need to create an array from the following string.

$body = '<h2>Heading one</h2>
         <p>Lorem ipsum dolor</p>

         <h2>Heading two</h2>
         <ul>
           <li>List item one.</li>
           <li>List item two.</li>
         </ul>

         <h2>Heading three</h2>
         <table class="table">
           <tbody>
             <tr>
               <td>Table data one</td>
               <td>Description of table data one</td>
             </tr>
             <tr>
               <td>Table data two</td>
               <td>Description of table data two</td>
             </tr>
           </tbody>
         </table>';

I can get the 'question' value by using the h2 tag as the first index.

$dom = new \DOMDocument();
$dom->loadHTML($body);
$xPath = new \DOMXpath($dom);

$question_answer = [];
$tags = $dom->getElementsByTagName('h2');
foreach ($tags as $tag) {
  $next_element = $xPath->query('./following-sibling::p', $tag);
  $question_answer[] = [
    'question' => $tag->nodeValue,
    'answer' =>  $next_element->item(0)->nodeValue,
  ];
}

echo '<pre>';
print_r($question_answer);
echo '</pre>';

After applying @Kevin's suggestion, this works well for p tags and produces the following output:

Array
(
    [0] => Array
        (
            [question] => Heading one
            [answer] => Lorem ipsum dolor
        )

    [1] => Array
        (
            [question] => Heading two
            [answer] => 
        )

    [2] => Array
        (
            [question] => Heading three
            [answer] => 
        )

)

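Now I just need to handle the case where the tag after the heading is an unordered list or a table rather than a p. For tables, I'm only interested in the td tags.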

Since you are iterating over each h2 tag, use following-sibling::p relative to the current tag, and skip the heading when no p is found:

foreach ($tags as $tag) {
    $next_element = $xPath->query('./following-sibling::p', $tag);
    if ($next_element->length <= 0) continue; //skip it if p not found
    $question_answer[] = [
        'question' => $tag->nodeValue,
        'answer' => $next_element->item(0)->nodeValue,
    ];
}
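
One thing to be aware of: following-sibling::p matches any later p sibling, not only the element directly after the heading. The sketch below, which uses a made-up three-heading sample (the $html string and the $loose / $strict names are assumptions for illustration), compares that query with following-sibling::*[1][self::p], which only accepts the immediately following element.

```php
<?php
// Hypothetical sample: heading B is followed by a list, and a paragraph only appears later.
$html = '<h2>A</h2><p>first</p><h2>B</h2><ul><li>x</li></ul><h2>C</h2><p>last</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

$loose = [];
$strict = [];
foreach ($dom->getElementsByTagName('h2') as $h2) {
    // Any later <p> sibling, however far away from the heading.
    $p = $xPath->query('./following-sibling::p', $h2);
    $loose[] = $p->length > 0 ? $p->item(0)->nodeValue : null;

    // Only the element directly after the heading, and only if it is a <p>.
    $p = $xPath->query('./following-sibling::*[1][self::p]', $h2);
    $strict[] = $p->length > 0 ? $p->item(0)->nodeValue : null;
}

print_r($loose);  // heading B picks up the paragraph that belongs to C
print_r($strict); // heading B gets null because its next element is a <ul>
```

With the loose query, a heading whose answer is not a p silently borrows the paragraph of a later heading, so the stricter form is probably the safer default when headings and answers alternate.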

Let's leave the table tag out for now, since it may not be relevant to this use case. The content then looks like this:

$body = '<h2>Heading one</h2>
       <p>Lorem ipsum dolor</p>

       <h2>Heading two</h2>
       <ul>
         <li>List item one.</li>
         <li>List item two.</li>
       </ul>';

Here is the solution code:

$dom = new \DOMDocument();
$dom->loadHTML($body);
$xPath = new \DOMXpath($dom);

$question_answer = [];
$tags = $dom->getElementsByTagName('h2');
foreach ($tags as $tag) {
  $possible_answer = $xPath->query('./following-sibling::p | ./following-sibling::ul', $tag);

  if ($possible_answer->length <= 0) {
    continue;
  }

  if ($possible_answer->item(0)->tagName === 'p') {
    $answer = $possible_answer->item(0)->nodeValue;
    $question_answer[] = [
      'question' => $tag->nodeValue,
      'answer' => $answer,
    ];
  }

  elseif ($possible_answer->item(0)->tagName === 'ul') {
    $li_dom = [];
    foreach ($possible_answer->item(0)->getElementsByTagName('li') as $li) {
      $li_dom[] = $li->nodeValue;
    }
    $li_dom = implode(" ", $li_dom);

    $question_answer[] = [
      'question' => $tag->nodeValue,
      'answer' => $li_dom,
    ];
  }
}

echo '<pre>';
print_r($question_answer);
echo '</pre>';

Here is the output:

Array
(
    [0] => Array
        (
            [question] => Heading one
            [answer] => Lorem ipsum dolor
        )

    [1] => Array
        (
            [question] => Heading two
            [answer] => List item one. List item two.
        )

)
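
The question also asked about tables, where only the td cells matter. The following is a minimal sketch of one way to extend the loop, not a definitive implementation: it takes the element directly after each h2 with following-sibling::*[1] and, for ul and table, joins the text of the li or td descendants. The $childTag map and the shortened $body are assumptions made for illustration.

```php
<?php
// Shortened version of the markup from the question.
$body = '<h2>Heading one</h2>
<p>Lorem ipsum dolor</p>
<h2>Heading two</h2>
<ul><li>List item one.</li><li>List item two.</li></ul>
<h2>Heading three</h2>
<table class="table"><tbody>
<tr><td>Table data one</td><td>Description of table data one</td></tr>
</tbody></table>';

$dom = new DOMDocument();
$dom->loadHTML($body);
$xPath = new DOMXPath($dom);

// Hypothetical map: which descendant tag to collect for each container tag.
$childTag = ['ul' => 'li', 'table' => 'td'];

$question_answer = [];
foreach ($dom->getElementsByTagName('h2') as $h2) {
    // The element immediately after the heading, whatever its tag is.
    $next = $xPath->query('./following-sibling::*[1]', $h2)->item(0);
    if ($next === null || $next->nodeName === 'h2') {
        continue; // heading with no answer element after it
    }

    if (isset($childTag[$next->nodeName])) {
        // Container element: join the text of its li or td descendants.
        $parts = [];
        foreach ($next->getElementsByTagName($childTag[$next->nodeName]) as $child) {
            $parts[] = trim($child->nodeValue);
        }
        $answer = implode(' ', $parts);
    } else {
        // Plain element such as p: use its text directly.
        $answer = trim($next->nodeValue);
    }

    $question_answer[] = ['question' => $h2->nodeValue, 'answer' => $answer];
}

print_r($question_answer);
```

Keeping the tag-to-child mapping in one array means another container type (for example ol with li) can be supported by adding an entry rather than another elseif branch.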