Use XMLReader to find a node and retrieve XML from the current node and its children

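I'm trying to retrieve a specific node from a huge XML file based on its <id> element. I have used DOMDocument, but it is not ideal because it loads the whole document first. The document contains about 1400 <item> nodes. Here is a simplified version of the document: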

<main>
  <body>
    ...
    <sub>
      ...
      <items>
        ...
        <item>
          <name>Abc</name>
          ...
          <id>123</id>
            <calls>
              <call>
                <name>Monkey</name>
                <text>Monkeys r cool</text>
                ...
              </call>
              <call>
                <name>Pig</name>
                <text>Pigs too!</text>
                ...
              </call>
            </calls>
            <cones>
              <cone>
                <name>Lorem</name>
                <text>Lorem ipsum</text>
                ...
              </cone>
              <cone>
                <name>More</name>
                <text>Placeholder</text>
                ...
              </cone>
            </cones>
          <a>true</a>
        </item>
        <item>
          <name>Def</name>
          ...
          <id>456</id>
            <calls>
              <call>
                <name>aa</name>
                <text>aa</text>
                ...
              </call>
              <call>
                <name>bb</name>
                <text>bb</text>
                ...
              </call>
            </calls>
            <cones>
              <cone>
                <name>cc</name>
                <text>cc</text>
                ...
              </cone>
              <cone>
                <name>dd</name>
                <text>dd</text>
                ...
              </cone>
            </cones>
          <a>true</a>
        </item>
      </items>
    </sub>
  </body>
</main>

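So basically I'm trying to retrieve the data of the current node and its children by matching the <id> element. I have looked for tutorials on XMLReader, but there don't seem to be many. This is what I have tried so far: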

$xml = new XMLReader();
$xml->open('doc.xml');

while($xml->read()) {
    if ($xml->nodeType === XMLReader::ELEMENT && $xml->localName === 'id') {
        $xml->read();
        echo $xml->value;
    }
}

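This finds every <id> element, but I want to find one specific element and read the data from the current node and its children. Perhaps by using the example above to find the node and readInnerXml() to get the data.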

I'm no expert, so any help or a push in the right direction is much appreciated :D

If all item elements are siblings, you can use XMLReader::read() to find the first one and XMLReader::next() to iterate over them.

Then use XMLReader::expand() to load the item and its descendants into DOM, and use Xpath to read data from it.

$searchForID = '123';

$reader = new XMLReader();
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));

$document = new DOMDocument();
$xpath = new DOMXpath($document);

// look for the first "item" element node
while (
  $reader->read() && $reader->localName !== 'item'
) {
  continue;
}

// iterate "item" sibling elements
while ($reader->localName === 'item') {
  // expand into DOM
  $item = $reader->expand($document);
  // if the node has a child "id" with the searched contents
  if ($xpath->evaluate("count(self::*[id = '$searchForID']) > 0", $item)) {
    var_dump(
      [
        // fetch node text content as string  
        'name' => $xpath->evaluate('string(name)', $item),
        // fetch list of "call" elements and map them
        'calls' => array_map(
          function(DOMElement $call) use ($xpath) {
            return [
              'name' => $xpath->evaluate('string(name)', $call),
              'text' => $xpath->evaluate('string(text)', $call)
            ];
          },
          iterator_to_array(
            $xpath->evaluate('calls/call', $item)
          )
        )
      ] 
    );
  }
  $reader->next('item');
}
$reader->close();
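
The question also asks for the XML of the matched node itself, not only values read from it. A minimal sketch of that, assuming the same item structure as above: once the matching item is found, XMLReader::readOuterXml() returns the element with its children as a string. The inline sample document and the $found variable are made up for illustration.

```php
<?php
// Hypothetical inline sample standing in for the large file.
$xmlString = <<<'XML'
<main><body><sub><items>
  <item><name>Abc</name><id>123</id><a>true</a></item>
  <item><name>Def</name><id>456</id><a>true</a></item>
</items></sub></body></main>
XML;

$searchForID = '456';
$found = null;

$reader = new XMLReader();
$reader->open('data:text/xml;base64,' . base64_encode($xmlString));

$document = new DOMDocument();
$xpath = new DOMXpath($document);

// move to the first "item" element node
while ($reader->read() && $reader->localName !== 'item') {
  continue;
}

// iterate "item" sibling elements
while ($reader->localName === 'item') {
  // expand into DOM to test the child "id"
  $item = $reader->expand($document);
  if ($xpath->evaluate("id = '$searchForID'", $item)) {
    // serialize the current element including its children
    $found = $reader->readOuterXml();
    break;
  }
  $reader->next('item');
}
$reader->close();

echo $found, "\n";
```

readInnerXml() would return only the children without the surrounding item tags. Since the node is already expanded into DOM, $document->saveXML($item) should be an equivalent way to serialize it.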

XML with namespaces

If the XML uses namespaces (like the one you linked in the comments), you will have to take them into account.

For XMLReader, this means validating not only the localName (the node name without any namespace prefix/alias) but also the namespaceURI.

For the DOM methods, this means using the namespace-aware methods (the ones with the NS suffix) and registering your own alias/prefix for Xpath expressions.

$searchForID = '2755';

$reader = new XMLReader();
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));

// the namespace uri
$xmlns_siri = 'http://www.siri.org.uk/siri';

$document = new DOMDocument();
$xpath = new DOMXpath($document);
// register an alias for the siri namespace 
$xpath->registerNamespace('siri', $xmlns_siri);

// look for the first "EstimatedVehicleJourney" element node
while (
  $reader->read() && 
  (
    $reader->localName !== 'EstimatedVehicleJourney' ||
    $reader->namespaceURI !== $xmlns_siri
  )
) {
  continue;
}

// iterate "EstimatedVehicleJourney" sibling elements
while ($reader->localName === 'EstimatedVehicleJourney') {
  // validate the namespace of the node
  if ($reader->namespaceURI === $xmlns_siri) {
    // expand into DOM
    $item = $reader->expand($document);
    // if the node has a child "VehicleRef" with the searched contents
    // note the use of the registered namespace alias
    if ($xpath->evaluate("count(self::*[siri:VehicleRef = '$searchForID']) > 0", $item)) {
      var_dump(
        [
          // fetch node text content as string  
          'name' => $xpath->evaluate('string(siri:OriginName)', $item),
          // fetch list of "RecordedCall" elements and map them
          'calls' => array_map(
            function(DOMElement $call) use ($xpath) {
              return [
                'name' => $xpath->evaluate('string(siri:StopPointName)', $call),
                'reference' => $xpath->evaluate('string(siri:StopPointRef)', $call)
              ];
            },
            iterator_to_array(
              $xpath->evaluate('siri:RecordedCalls/siri:RecordedCall', $item)
            )
          )
        ] 
      );
    }
  }
  $reader->next('EstimatedVehicleJourney');
}
$reader->close();
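
The example above goes through Xpath with a registered alias. The namespace-aware DOM methods (NS suffix) mentioned earlier can be sketched the same way. This is only an illustration: the sample document and its values are hypothetical, and only the namespace URI and the element names are taken from the code above.

```php
<?php
// Hypothetical sample document; the values are made up.
$xmlString = <<<'XML'
<Siri xmlns="http://www.siri.org.uk/siri">
  <EstimatedVehicleJourney>
    <VehicleRef>2755</VehicleRef>
    <OriginName>Somewhere</OriginName>
  </EstimatedVehicleJourney>
  <EstimatedVehicleJourney>
    <VehicleRef>1111</VehicleRef>
    <OriginName>Elsewhere</OriginName>
  </EstimatedVehicleJourney>
</Siri>
XML;

$searchForID = '2755';
$xmlns_siri = 'http://www.siri.org.uk/siri';

$reader = new XMLReader();
$reader->open('data:text/xml;base64,' . base64_encode($xmlString));
$document = new DOMDocument();

// look for the first "EstimatedVehicleJourney" element in the siri namespace
while (
  $reader->read() &&
  (
    $reader->localName !== 'EstimatedVehicleJourney' ||
    $reader->namespaceURI !== $xmlns_siri
  )
) {
  continue;
}

$names = [];
while ($reader->localName === 'EstimatedVehicleJourney') {
  if ($reader->namespaceURI === $xmlns_siri) {
    $item = $reader->expand($document);
    // namespace-aware lookup: namespace URI first, then local name
    $refs = $item->getElementsByTagNameNS($xmlns_siri, 'VehicleRef');
    if ($refs->length > 0 && $refs->item(0)->textContent === $searchForID) {
      $origin = $item->getElementsByTagNameNS($xmlns_siri, 'OriginName');
      $names[] = $origin->length > 0 ? $origin->item(0)->textContent : '';
    }
  }
  $reader->next('EstimatedVehicleJourney');
}
$reader->close();

var_dump($names);
```

Note that getElementsByTagNameNS() matches descendants at any depth, not only direct children, so the Xpath variant is the more precise of the two when element names repeat at different levels.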