Use XMLReader to find a node and retrieve XML from the current node and its children
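I'm trying to retrieve one specific node from a huge XML file based on its <id> element. I've used DOMDocument, but it isn't ideal because it loads the whole document first. The document contains about 1400 <item> nodes. This is a simplified version of the document: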
<main>
  <body>
    ...
    <sub>
      ...
      <items>
        ...
        <item>
          <name>Abc</name>
          ...
          <id>123</id>
          <calls>
            <call>
              <name>Monkey</name>
              <text>Monkeys r cool</text>
              ...
            </call>
            <call>
              <name>Pig</name>
              <text>Pigs too!</text>
              ...
            </call>
          </calls>
          <cones>
            <cone>
              <name>Lorem</name>
              <text>Lorem ipsum</text>
              ...
            </cone>
            <cone>
              <name>More</name>
              <text>Placeholder</text>
              ...
            </cone>
          </cones>
          <a>true</a>
        </item>
        <item>
          <name>Def</name>
          ...
          <id>456</id>
          <calls>
            <call>
              <name>aa</name>
              <text>aa</text>
              ...
            </call>
            <call>
              <name>bb</name>
              <text>bb</text>
              ...
            </call>
          </calls>
          <cones>
            <cone>
              <name>cc</name>
              <text>cc</text>
              ...
            </cone>
            <cone>
              <name>dd</name>
              <text>dd</text>
              ...
            </cone>
          </cones>
          <a>true</a>
        </item>
      </items>
    </sub>
  </body>
</main>
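So basically I'm trying to retrieve the data of the current node and its children by matching the <id> element. I've looked for tutorials on XMLReader, but couldn't find many. This is what I've tried so far: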
$xml = new XMLReader();
$xml->open('doc.xml');
while ($xml->read()) {
    if ($xml->nodeType == XMLReader::ELEMENT && $xml->localName == 'id') {
        $xml->read();
        echo $xml->value;
    }
}
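This finds every <id> element, but I want to find one specific element and read the data from the current node and its children. Maybe an example that finds the node and uses readInnerXml() to get the data would help.
I'm no expert, so any help or a push in the right direction is much appreciated :D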
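If all item elements are siblings, you can use XMLReader::read() to find the first one and XMLReader::next() to iterate over them.
Then use XMLReader::expand() to load the item and its descendants into DOM, and use XPath to read the data from it.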
$searchForID = '123';

$reader = new XMLReader();
// getXMLString() is a helper (not shown) that returns the XML as a string;
// for a real file, pass its path to open() instead
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));

// target document and XPath instance for the expanded nodes
$document = new DOMDocument();
$xpath = new DOMXPath($document);

// look for the first "item" element node
while (
    $reader->read() && $reader->localName !== 'item'
) {
    continue;
}

// iterate "item" sibling elements
while ($reader->localName === 'item') {
    // expand the current item and its descendants into DOM
    $item = $reader->expand($document);
    // if the node has a child "id" with the searched contents
    if ($xpath->evaluate("count(self::*[id = '$searchForID']) > 0", $item)) {
        var_dump(
            [
                // fetch node text content as string
                'name' => $xpath->evaluate('string(name)', $item),
                // fetch list of "call" elements and map them
                'calls' => array_map(
                    function(DOMElement $call) use ($xpath) {
                        return [
                            'name' => $xpath->evaluate('string(name)', $call),
                            'text' => $xpath->evaluate('string(text)', $call)
                        ];
                    },
                    iterator_to_array(
                        $xpath->evaluate('calls/call', $item)
                    )
                )
            ]
        );
    }
    // skip the subtree and move to the next "item" sibling
    $reader->next('item');
}
$reader->close();
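Since the question also asks about getting the XML itself, here is a minimal sketch of the same loop that returns the serialized markup of the matching item via XMLReader::readOuterXml(). The function name findItemXml() and the tiny sample document are assumptions made for illustration only; they are not part of the original data.

```php
<?php
// Hypothetical sample data standing in for the large file.
$xml = <<<'XML'
<main>
  <items>
    <item><name>Abc</name><id>123</id></item>
    <item><name>Def</name><id>456</id></item>
  </items>
</main>
XML;

/**
 * Return the serialized XML of the first "item" whose "id" child
 * equals $searchForID, or null if no such item exists.
 * (Illustrative helper, not part of any library.)
 */
function findItemXml(string $xml, string $searchForID): ?string
{
    $reader = new XMLReader();
    // load from a string here; for a real file use $reader->open($path)
    $reader->XML($xml);

    $document = new DOMDocument();
    $xpath = new DOMXPath($document);

    // advance to the first "item" element
    while ($reader->read() && $reader->localName !== 'item') {
        continue;
    }

    $found = null;
    // iterate "item" siblings
    while ($reader->localName === 'item') {
        $item = $reader->expand($document);
        // compare the text content of the "id" child
        if ($xpath->evaluate('string(id)', $item) === $searchForID) {
            // serialize the current element including its own tags
            $found = $reader->readOuterXml();
            break;
        }
        $reader->next('item');
    }
    $reader->close();

    return $found;
}

var_dump(findItemXml($xml, '456'));
```

readInnerXml() could be used in the same place if only the children are wanted, without the surrounding item tags.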
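XML with namespaces
If the XML uses namespaces (like the one you linked in the comments), you have to take them into account.
For XMLReader this means validating not only localName (the node name without any namespace prefix/alias) but also namespaceURI.
For DOM methods this means using the namespace-aware methods (with the suffix NS) and registering your own alias/prefix for XPath expressions.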
$searchForID = '2755';

$reader = new XMLReader();
// getXMLString() is a helper (not shown) that returns the XML as a string
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));

// the namespace uri
$xmlns_siri = 'http://www.siri.org.uk/siri';

$document = new DOMDocument();
$xpath = new DOMXPath($document);
// register an alias for the siri namespace
$xpath->registerNamespace('siri', $xmlns_siri);

// look for the first "EstimatedVehicleJourney" element node in the siri namespace
while (
    $reader->read() &&
    (
        $reader->localName !== 'EstimatedVehicleJourney' ||
        $reader->namespaceURI !== $xmlns_siri
    )
) {
    continue;
}

// iterate "EstimatedVehicleJourney" sibling elements
while ($reader->localName === 'EstimatedVehicleJourney') {
    // validate the namespace of the node
    if ($reader->namespaceURI === $xmlns_siri) {
        // expand into DOM
        $item = $reader->expand($document);
        // if the node has a child "VehicleRef" with the searched contents
        // note the use of the registered namespace alias
        if ($xpath->evaluate("count(self::*[siri:VehicleRef = '$searchForID']) > 0", $item)) {
            var_dump(
                [
                    // fetch node text content as string
                    'name' => $xpath->evaluate('string(siri:OriginName)', $item),
                    // fetch list of "RecordedCall" elements and map them
                    'calls' => array_map(
                        function(DOMElement $call) use ($xpath) {
                            return [
                                'name' => $xpath->evaluate('string(siri:StopPointName)', $call),
                                'reference' => $xpath->evaluate('string(siri:StopPointRef)', $call)
                            ];
                        },
                        iterator_to_array(
                            $xpath->evaluate('siri:RecordedCalls/siri:RecordedCall', $item)
                        )
                    )
                ]
            );
        }
    }
    // skip the subtree and move to the next sibling with that local name
    $reader->next('EstimatedVehicleJourney');
}
$reader->close();
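The example above only uses XPath on the DOM side. As a rough sketch of the "methods with the suffix NS" mentioned earlier, the following reads the same kind of data with DOMElement::getElementsByTagNameNS(), which takes the namespace URI and the local name, so no alias has to be registered. The shortened sample document and the $journeys variable are assumptions made for illustration.

```php
<?php
$xmlns_siri = 'http://www.siri.org.uk/siri';

// Hypothetical, heavily shortened stand-in for a SIRI document.
$xml = <<<XML
<Siri xmlns="$xmlns_siri">
  <EstimatedVehicleJourney>
    <VehicleRef>2755</VehicleRef>
    <OriginName>Central</OriginName>
  </EstimatedVehicleJourney>
</Siri>
XML;

$reader = new XMLReader();
// load from a string here; for a real file use $reader->open($path)
$reader->XML($xml);
$document = new DOMDocument();

// look for the first "EstimatedVehicleJourney" element in the siri namespace
while (
    $reader->read() &&
    (
        $reader->localName !== 'EstimatedVehicleJourney' ||
        $reader->namespaceURI !== $xmlns_siri
    )
) {
    continue;
}

$journeys = [];
// iterate sibling elements with that local name
while ($reader->localName === 'EstimatedVehicleJourney') {
    if ($reader->namespaceURI === $xmlns_siri) {
        $item = $reader->expand($document);
        // namespace-aware lookups: namespace URI plus local name
        // (these search all descendants, not only direct children)
        $refs = $item->getElementsByTagNameNS($xmlns_siri, 'VehicleRef');
        $names = $item->getElementsByTagNameNS($xmlns_siri, 'OriginName');
        if ($refs->length > 0 && $names->length > 0) {
            $journeys[] = [
                'ref' => $refs->item(0)->textContent,
                'origin' => $names->item(0)->textContent,
            ];
        }
    }
    $reader->next('EstimatedVehicleJourney');
}
$reader->close();

var_dump($journeys);
```

XPath stays the more flexible choice when conditions are involved; the NS methods are mainly convenient for simple lookups by name.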