Does Simple HTML DOM support :has-like parsing?
I have to parse an HTML structure like this:
<div class='container'>
<div class='inner-div'>
<span class='text'>...</span>
<div class='author'>
<span data-author='Alpha'>...</span>
</div>
<div class='summary'>
<span data-summary='Exclusive'>Text 1</span>
</div>
</div>
<div class='inner-div'>
<span class='text'>...</span>
<div class='author'>
<span data-author='Beta'>...</span>
</div>
<div class='summary'>
<span data-summary='Non-Exclusive'>Text 2</span>
</div>
</div>
<div class='inner-div'>
<span class='text'>...</span>
<div class='author'>
<span data-author='Gamma'>...</span>
</div>
<div class='summary'>
<span data-summary='Exclusive'>Text 3</span>
</div>
</div>
<div class='inner-div'>
<span class='text'>...</span>
<div class='author'>
<span data-author='Delta'>...</span>
</div>
<div class='summary'>
<span data-summary='Non-Exclusive'>Text 4</span>
</div>
</div>
...
<div class='inner-div'>
<span class='text'>...</span>
<div class='author'>
<span data-author='Zeta'>...</span>
</div>
<div class='summary'>
<span data-summary='Exclusive'>Text 5</span>
</div>
</div>
</div>
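I want to get the first 'Exclusive' summary whose author is not 'Alpha'. In the example above, that would be 'Text 3'. How can I parse this with Simple HTML DOM, or even XML DOM?
Addendum: I am looking to parse the HTML with the PHP Simple HTML DOM library. I know how to do this in jQuery, but the Simple HTML DOM library does not seem to support any equivalent of :has.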
In the end I solved it myself. For anyone looking for a solution, this is what I did.
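// Walk the 'Exclusive' summaries in document order and return the first one
// whose author is not 'Alpha' (this also covers several 'Alpha' entries in a row).
foreach ($html->find("span[data-summary='Exclusive']") as $node) {
    // span -> div.summary -> div.inner-div, then down to the author span
    $author = $node->parent()->parent()->find('div.author span', 0);
    if ($author && $author->getAttribute('data-author') !== 'Alpha') {
        return $node->innertext;
    }
}
return null; // no matching summary found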
No, but here is a Simple HTML DOM replacement that does (by the way, you want :not rather than :has):
include_once('advanced_html_dom.php');
$html = str_get_html($str);
echo $html->find('.author:not(> [data-author=Alpha]) ~ .summary > [data-summary=Exclusive]', 0);
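Since the question also mentions XML DOM, the same lookup can be sketched with PHP's bundled DOMDocument and DOMXPath classes, with no third-party library. This is only a sketch under some assumptions: `firstExclusiveNotBy()` is a hypothetical helper name, the class attributes are assumed to hold exactly one class each (as in the sample markup), and the excluded author name is assumed not to contain a single quote.

```php
<?php
// Hypothetical helper: returns the text of the first 'Exclusive' summary
// whose author is not $excludedAuthor, or null if there is none.
function firstExclusiveNotBy(string $markup, string $excludedAuthor): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select each inner-div that has no author span with the excluded name,
    // then its 'Exclusive' summary span. The [1] applied to the parenthesised
    // node set keeps only the first match in document order.
    $query = sprintf(
        "(//div[@class='inner-div'][not(div[@class='author']/span[@data-author='%s'])]"
        . "/div[@class='summary']/span[@data-summary='Exclusive'])[1]",
        $excludedAuthor
    );
    $nodes = $xpath->query($query);

    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

// Shortened version of the markup from the question.
$sample = <<<'HTML'
<div class='container'>
  <div class='inner-div'>
    <div class='author'><span data-author='Alpha'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 1</span></div>
  </div>
  <div class='inner-div'>
    <div class='author'><span data-author='Beta'>...</span></div>
    <div class='summary'><span data-summary='Non-Exclusive'>Text 2</span></div>
  </div>
  <div class='inner-div'>
    <div class='author'><span data-author='Gamma'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 3</span></div>
  </div>
</div>
HTML;

echo firstExclusiveNotBy($sample, 'Alpha'), PHP_EOL; // prints "Text 3"
```

The `not(...)` predicate plays the role of jQuery's `:not(:has(...))`: it filters on the container rather than on the summary span itself, so no parent traversal is needed afterwards.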