Parse the h2 and the next tag in PHP
I need to create an array from the following string.
$body = '<h2>Heading one</h2>
<p>Lorem ipsum dolor</p>
<h2>Heading two</h2>
<ul>
<li>list item one.</li>
<li>List item two.</li>
</ul>
<h2>Heading three</h2>
<table class="table">
<tbody>
<tr>
<td>Table data one</td>
<td>Description of table data one</td>
</tr>
<tr>
<td>Table data two</td>
<td>Description of table data two</td>
</tr>
</tbody>
</table>';
I can get the 'question' value by using the h2 tag as the first index.
$dom = new \DOMDocument();
$dom->loadHTML($body);
$xPath = new \DOMXpath($dom);
$question_answer = [];
$tags = $dom->getElementsByTagName('h2');
foreach ($tags as $tag) {
    $next_element = $xPath->query('./following-sibling::p', $tag);
    $question_answer[] = [
        'question' => $tag->nodeValue,
        'answer' => $next_element->item(0)->nodeValue,
    ];
}
echo '<pre>';
print_r($question_answer);
echo '</pre>';
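The query ./following-sibling::p matches every later p sibling, not just the element right after the heading, so item(0) is the first p anywhere below it, or null when there is none. That is why 'Heading two' and 'Heading three' come back empty (PHP also warns about reading nodeValue on null). A minimal sketch, assuming markup shaped like the $body above, that instead looks at the element immediately after each h2 with following-sibling::*[1]:

```php
<?php
// Sketch: report which element immediately follows each <h2>.
// Assumes markup shaped like the $body in the question (table shortened to one row).
$body = '<h2>Heading one</h2>
<p>Lorem ipsum dolor</p>
<h2>Heading two</h2>
<ul>
<li>List item one.</li>
<li>List item two.</li>
</ul>
<h2>Heading three</h2>
<table class="table">
<tbody>
<tr>
<td>Table data one</td>
<td>Description of table data one</td>
</tr>
</tbody>
</table>';

libxml_use_internal_errors(true); // keep libxml parser warnings for the fragment quiet
$dom = new DOMDocument();
$dom->loadHTML($body);
$xPath = new DOMXPath($dom);

$next_tags = [];
foreach ($dom->getElementsByTagName('h2') as $tag) {
    // *[1] is the first following sibling *element*; whitespace text nodes are skipped
    $next = $xPath->query('./following-sibling::*[1]', $tag)->item(0);
    $next_tags[$tag->nodeValue] = $next === null ? null : $next->nodeName;
}
print_r($next_tags);
```

Knowing the tag name of that next element is what lets the loop decide how to read the answer.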
With @Kevin's suggestion, which works well for the p tags, I get the following output:
Array
(
[0] => Array
(
[question] => Heading one
[answer] => Lorem ipsum dolor
)
[1] => Array
(
[question] => Heading two
[answer] =>
)
[2] => Array
(
[question] => Heading three
[answer] =>
)
)
Now I just need to handle the answer when the next tag is an unordered list or a table. For tables, I'm only interested in the td tags.
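Pulling only the td text out of a table that follows a heading can be sketched as below. This assumes the table is the element directly after the h2 (checked with following-sibling::*[1][self::table]) and that a flat list of trimmed cell texts is the desired result; how the cells are later combined into one answer is left open.

```php
<?php
// Sketch: collect only the <td> text from a table that directly follows an <h2>.
$body = '<h2>Heading three</h2>
<table class="table">
<tbody>
<tr>
<td>Table data one</td>
<td>Description of table data one</td>
</tr>
<tr>
<td>Table data two</td>
<td>Description of table data two</td>
</tr>
</tbody>
</table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($body);
$xPath = new DOMXPath($dom);

$cells = [];
foreach ($dom->getElementsByTagName('h2') as $tag) {
    // Match only when the very next element is a <table>
    $table = $xPath->query('./following-sibling::*[1][self::table]', $tag)->item(0);
    if ($table === null) {
        continue;
    }
    // getElementsByTagName searches descendants, so it reaches <td> through <tbody>/<tr>
    foreach ($table->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->nodeValue);
    }
}
print_r($cells);
```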
Since you are iterating over each h2 tag, query following-sibling::p relative to the current tag (pass $tag as the context node):
foreach ($tags as $tag) {
    $next_element = $xPath->query('./following-sibling::p', $tag);
    if ($next_element->length <= 0) continue; //skip it if p not found
    $question_answer[] = [
        'question' => $tag->nodeValue,
        'answer' => $next_element->item(0)->nodeValue,
    ];
}
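One caveat with ./following-sibling::p: a heading with no paragraph of its own still picks up a p that belongs to a later heading. A minimal variant of the loop above, assuming the answer must be the element directly after the h2, adds the predicate *[1][self::p]. The third heading and its "Closing paragraph" text are hypothetical, added only to show the difference.

```php
<?php
// Sketch: "skip if no <p>" loop, restricted to the element directly after each <h2>.
// The third heading and its paragraph are hypothetical test content.
$body = '<h2>Heading one</h2>
<p>Lorem ipsum dolor</p>
<h2>Heading two</h2>
<ul>
<li>List item one.</li>
</ul>
<h2>Heading three</h2>
<p>Closing paragraph</p>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($body);
$xPath = new DOMXPath($dom);

$question_answer = [];
foreach ($dom->getElementsByTagName('h2') as $tag) {
    // ./following-sibling::p would give "Closing paragraph" to "Heading two";
    // *[1][self::p] matches only when the very next element is a <p>.
    $next_element = $xPath->query('./following-sibling::*[1][self::p]', $tag);
    if ($next_element->length === 0) {
        continue; // next element is not a <p>
    }
    $question_answer[] = [
        'question' => $tag->nodeValue,
        'answer'   => $next_element->item(0)->nodeValue,
    ];
}
print_r($question_answer);
```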
Let's leave the table markup out for now, since it may not be relevant to this use case. The content then becomes:
$body = '<h2>Heading one</h2>
<p>Lorem ipsum dolor</p>
<h2>Heading two</h2>
<ul>
<li>List item one.</li>
<li>List item two.</li>
</ul>';
Here is the solution code:
$dom = new \DOMDocument();
$dom->loadHTML($body);
$xPath = new \DOMXpath($dom);
$question_answer = [];
$tags = $dom->getElementsByTagName('h2');
foreach ($tags as $tag) {
    // The union is returned in document order, so item(0) is the first <p> or <ul> after this heading
    $possible_answer = $xPath->query('./following-sibling::p | ./following-sibling::ul', $tag);
    if ($possible_answer->length <= 0) {
        continue;
    }
    if ($possible_answer->item(0)->tagName === 'p') {
        $answer = $possible_answer->item(0)->nodeValue;
        $question_answer[] = [
            'question' => $tag->nodeValue,
            'answer' => $answer,
        ];
    } elseif ($possible_answer->item(0)->tagName === 'ul') {
        // Join the text of all list items into a single answer string
        $li_dom = [];
        foreach ($possible_answer->item(0)->getElementsByTagName('li') as $li) {
            $li_dom[] = $li->nodeValue;
        }
        $li_dom = implode(" ", $li_dom);
        $question_answer[] = [
            'question' => $tag->nodeValue,
            'answer' => $li_dom,
        ];
    }
}
echo '<pre>';
print_r($question_answer);
echo '</pre>';
Here is the output:
Array
(
[0] => Array
(
[question] => Heading one
[answer] => Lorem ipsum dolor
)
[1] => Array
(
[question] => Heading two
[answer] => List item one. List item two.
)
)
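Since the original question also covered tables, the answer above can be extended to all three cases. This is a sketch, not the answerer's code: it swaps the p | ul union for ./following-sibling::*[1] so each heading only takes the element directly after it, and it assumes list items and td cells should be joined with a single space, the same way the ul branch above does. The join_text helper is a hypothetical name.

```php
<?php
// Sketch: question/answer pairs where the answer may be a <p>, <ul> or <table>.
// Assumptions: <li> and <td> texts are joined with one space; only <td> text is kept from tables.
$body = '<h2>Heading one</h2>
<p>Lorem ipsum dolor</p>
<h2>Heading two</h2>
<ul>
<li>List item one.</li>
<li>List item two.</li>
</ul>
<h2>Heading three</h2>
<table class="table">
<tbody>
<tr>
<td>Table data one</td>
<td>Description of table data one</td>
</tr>
<tr>
<td>Table data two</td>
<td>Description of table data two</td>
</tr>
</tbody>
</table>';

// Hypothetical helper: join the trimmed text of every descendant element named $name.
function join_text(DOMElement $parent, string $name): string
{
    $parts = [];
    foreach ($parent->getElementsByTagName($name) as $node) {
        $parts[] = trim($node->nodeValue);
    }
    return implode(' ', $parts);
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($body);
$xPath = new DOMXPath($dom);

$question_answer = [];
foreach ($dom->getElementsByTagName('h2') as $tag) {
    // The answer is the element that comes directly after the heading
    $next = $xPath->query('./following-sibling::*[1]', $tag)->item(0);
    if ($next === null) {
        continue;
    }
    switch ($next->nodeName) {
        case 'p':
            $answer = trim($next->nodeValue);
            break;
        case 'ul':
            $answer = join_text($next, 'li');
            break;
        case 'table':
            $answer = join_text($next, 'td');
            break;
        default:
            continue 2; // any other element: skip this heading
    }
    $question_answer[] = [
        'question' => $tag->nodeValue,
        'answer'   => $answer,
    ];
}
print_r($question_answer);
```

Adding another answer type (for example ol) only needs one more case in the switch.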