How to select the text() immediately following an element conditionally in XPath?
I have the following structure, where the order of the child nodes is random:
<span id="outer">
<div style="color:blue">51</div>
<span class="main">Gill</span>0
<span style="color:red">11</span>
<span></span>James
<div style="color:red">158</div>
<div class="sub">Mary</div>
</span>
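I am trying to concatenate strings based on conditions (with a space in between):
- If the style color is "blue", add the node value to the string
- If the class is "main", add the node value to the string
- All text() not wrapped in a tag is added to the string as well, in the order in which the child nodes are traversed.
The example output for the structure above should be:
51 Gill 0 James
I wrote the following PHP to iterate over the elements. It is lengthy, so feel free to skip this part. The main focus is the $expression that selects the text() node value if it appears immediately after an element: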
$nodes = $xpath->query("//span[@id='outer']/*");
$str_out = "";
foreach($nodes as $node)
{
if($node->hasAttribute('class'))
{
if($node->getAttribute('class')=="main")
$str_out .= $node->nodeValue . " ";
}
else if($node->hasAttribute('style'))
{
$node_style = $node->getAttribute('style');
preg_match('~color:(.*)~', $node_style, $temp);
if( $temp[1] == "blue" )
$str_out .= $node->nodeValue . " ";
}
// Now evaluate if the IMMEDIATELY next sibling is text()
$next_node = $xpath->query('.//following-sibling::*[1]', $node);
if($next_node->length)
{
$next_node = $next_node->item(0);
$next_node_name = $next_node->nodeName;
$next_node_value = $next_node->nodeValue;
$current_node_name = $node->nodeName;
$expression = ".//following-sibling::text()[1][preceding-sibling::".$current_node_name." and following-sibling::".$next_node_name."[contains(text(),'".$next_node_value."')]]";
$text_node = $xpath->query($expression, $node);
if($text_node->length)
{
$str_out .= $text_node->item(0)->nodeValue . " ";
}
}
}
echo $str_out;
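As mentioned, the main focus is capturing the text() node value if it comes immediately after an element. I want to write an XPath expression that does the following:
1. Select the first text() node after the element
2. Check whether this text() node lies between the node itself (the current node) and the node that immediately follows it.
For example, in this block: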
<span></span>James
<div style="color:red">158</div>
James sits between the span and the div node, so we add it to the string.
But in this block:
<span style="color:red">11</span>
<span></span>James
<div style="color:red">158</div>
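James would still be selected by the following-sibling[1] step relative to the first span element (color:red). It should not be added in that case.
Please see my $expression in the PHP code, where I tried to capture this process, but it does not work.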
$expression = ".//following-sibling::text()[1][preceding-sibling::".$current_node_name." and following-sibling::".$next_node_name."[contains(text(),'".$next_node_value."')]]";
You can achieve this as follows:
<?php
$xmldoc = new DOMDocument();
$xmldoc->loadXML(<<<XML
<span id="outer">
<div style="color:blue">51</div>
<span class="main">Gill</span>0
<span style="color:red">11</span>
<span></span>James
<div style="color:red">158</div>
<div class="sub">Mary</div>
</span>
XML
);
$xpath = new DOMXPath($xmldoc);
$nodes = $xpath->query("//span[@id='outer']/*");
$str_out = "";
foreach ($nodes as $node)
{
if ($node->hasAttribute('class'))
{
if ($node->getAttribute('class') == "main")
$str_out .= $node->nodeValue . " ";
}
else if ($node->hasAttribute('style'))
{
$node_style = $node->getAttribute('style');
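// Only compare when the style actually declares a color; avoids an undefined $temp[1].
if (preg_match('~color:\s*([^;]+)~', $node_style, $temp) && trim($temp[1]) == "blue")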
$str_out .= $node->nodeValue . " ";
}
// Now evaluate if the IMMEDIATELY next sibling is text()
$next_node = $xpath->query('./following-sibling::node()[1]/self::text()[normalize-space()]', $node);
if ($next_node->length)
{
$str_out .= trim($next_node->item(0)->nodeValue) . " ";
}
}
echo $str_out;
The XPath query:
./following-sibling::node()[1]/self::text()[normalize-space()]
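reads as follows:
.
from the context node
following-sibling::node()[1]
take the first following sibling node (whether it is a text node, an element, or even a comment)
self::text()[normalize-space()]
keep that "current" node only if it is a text node that does not consist solely of whitespace
The output is: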
51 Gill 0 James
This also handles the scenario where a text node follows the last child element of the parent <span id="outer">.
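The two edge cases mentioned above (a comment as the next sibling, and text after the last child element) can be checked with a small sketch. The markup below is a hypothetical variation of the sample, not the asker's data, and the $found array is only an illustrative name:

```php
<?php
// Hypothetical variation of the sample: a comment sits between an element and
// some text, and a text node follows the last child element.
$xml = <<<'XML'
<span id="outer">
<span class="main">Gill</span><!-- note -->skipped
<div class="sub">Mary</div>tail
</span>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$found = [];
foreach ($xpath->query('//span[@id="outer"]/*') as $node) {
    // Same expression as in the answer: the very next sibling, kept only if
    // it is a non-blank text node.
    $text = $xpath->query('./following-sibling::node()[1]/self::text()[normalize-space()]', $node);
    if ($text->length) {
        $found[$node->getAttribute('class')] = trim($text->item(0)->nodeValue);
    }
}

print_r($found); // only the "sub" element has an immediately following text node: "tail"
```

Here "skipped" is not picked up because the node right after the "main" span is a comment, while "tail" is picked up even though no element follows it.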
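XPath supports axes. With them you specify which nodes are matched initially. The default axis is child, and @ is the abbreviation for attribute. In this case the axes you need are following-sibling and self.
If you select the marker node with span[@class = "main"], you can extend that to span[@class = "main"]/following-sibling::node()[1] to fetch the node that follows it. Make sure that node is a text node with span[@class = "main"]/following-sibling::node()[1]/self::text().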
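To make the difference between the two steps concrete, here is a minimal sketch against the sample markup, using the span with style="color:red" as the context node. The variable names $loose and $strict are illustrative, and the [normalize-space()] predicate is added here (an assumption beyond the expressions above) so that whitespace-only text nodes between the elements are ignored:

```php
<?php
// Sample markup from the question; line breaks between elements are text nodes too.
$xml = <<<'XML'
<span id="outer">
<div style="color:blue">51</div>
<span class="main">Gill</span>0
<span style="color:red">11</span>
<span></span>James
<div style="color:red">158</div>
<div class="sub">Mary</div>
</span>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// Context node: the red span, which is followed only by a line break.
$red = $xpath->query('//span[@style = "color:red"]')->item(0);

// First non-blank text node among ALL following siblings: reaches past the
// empty span and finds "James", which is the false positive from the question.
$loose = $xpath->evaluate('normalize-space(following-sibling::text()[normalize-space()][1])', $red);

// Only the node directly after the context node, and only if it is non-blank text.
$strict = $xpath->evaluate('normalize-space(following-sibling::node()[1]/self::text()[normalize-space()])', $red);

var_dump($loose);  // string(5) "James"
var_dump($strict); // string(0) ""
```

The first expression ranks text nodes only, so elements in between are skipped; the second ranks all sibling nodes and then tests the type of the first one.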
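Currently you are iterating over all nodes, but apart from the style attribute you can match the values directly in XPath. For the style condition you can use a PHP callback: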
$xml = <<<'XML'
<span id="outer">
<div style="color:blue">51</div>
<span class="main">Gill</span>0
<span style="color:red">11</span>
<span></span>James
<div style="color:red">158</div>
<div class="sub">Mary</div>
</span>
XML;
function getStyleProperty($node, $name) {
if (is_array($node)) {
$node = $node[0];
}
if ($node instanceof DOMElement) {
$pattern = sprintf(
'(\b%s:\s*([^;]*)\s*(;|$))', preg_quote($name)
);
if (preg_match($pattern, $node->getAttribute('style'), $matches)) {
return $matches[1];
}
}
return '';
}
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions(['getStyleProperty']);
foreach ($xpath->evaluate('//span[@id="outer"]') as $outer) {
var_dump(
$xpath->evaluate('string(div[php:function("getStyleProperty", ., "color") = "blue"])', $outer),
$xpath->evaluate('string(span[@class = "main"])', $outer),
$xpath->evaluate('string(span[@class = "main"]/following-sibling::text()[1])', $outer),
$xpath->evaluate('string(span[not(@class or @style)]/following-sibling::node()[1]/self::text())', $outer)
);
}
Output:
string(2) "51"
string(4) "Gill"
string(10) "0
"
string(11) "James
"