What is the difference in PHP between DOM nodes and XMLReader->expand() nodes?
I have rewritten a script that used PHP DOM functions to iterate over an XML file with a structure like this:
<file>
  <record>
    <Source>
      <SourcePlace>
        <Country>Germany</Country>
      </SourcePlace>
    </Source>
    <Person>
      <Name>
        <firstname>John</firstname>
        <lastname>Doe</lastname>
      </Name>
    </Person>
  </record>
  <record>
    ..
  </record>
</file>
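I have replaced it with a script that uses XMLReader to find each individual record and turn it into a DOM document, which is then iterated. The iteration works by checking whether a node has child nodes: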
function findLeaves($node) {
    echo "nodeType: ".$node->nodeType.", nodeName:".$node->nodeName."\n";
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $element) {
            findLeaves($element);
        }
    } else {
        // do something with the leaf
    }
}
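The problem is that the behavior of findLeaves() changed between the two versions. Under DOM, nodes without a value (such as Source) have no #text child nodes. The output for the data above would be: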
nodeType: 1, nodeName:Source
nodeType: 1, nodeName:SourcePlace
nodeType: 1, nodeName:Country
nodeType: 3, nodeName:#text
Under XMLReader this becomes:
nodeType: 1, nodeName:Source
nodeType: 3, nodeName:#text
nodeType: 1, nodeName:SourcePlace
nodeType: 3, nodeName:#text
nodeType: 1, nodeName:Country
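I checked the saveXML() output of the data before it enters this function, and it looks the same apart from some extra white space. What could be causing the difference?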
The code that loads the file before the findLeaves() call under DOM:
$xmlDoc = new DOMDocument();
$xmlDoc->preserveWhiteSpace = false;
$xmlDoc->load($file);
$xpath = new DOMXPath($xmlDoc);
$records = $xpath->query('//record');
foreach ($records as $record) {
    foreach ($xpath->query('.//Source', $record) as $source_record) {
        findLeaves($source_record);
    }
}
The code that loads the file before the findLeaves() call under XMLReader:
$xmlDoc = new XMLReader();
$xmlDoc->open($file);
while ($xmlDoc->read()) {
    if ($xmlDoc->nodeType == XMLReader::ELEMENT && $xmlDoc->name == 'record') {
        $record_node = $xmlDoc->expand();
        $recordDOM = new DOMDocument();
        $n = $recordDOM->importNode($record_node, true);
        $recordDOM->appendChild($n);
        $recordDOM->preserveWhiteSpace = false;
        $xpath = new DOMXPath($recordDOM);
        $records = $xpath->query('//record');
        foreach ($records as $record) {
            foreach ($xpath->query('.//Source', $record) as $source_record) {
                findLeaves($source_record);
            }
        }
    }
}
The property DOMDocument::$preserveWhiteSpace affects the load/parse functions. So if you use XMLReader::expand(), the property on the document has no effect: you never load an XML string into it.
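The difference can be seen by counting child nodes both ways. The following is a minimal sketch, not the asker's code: the sample XML and the variable names ($loaded, $target, $loadedCount, $expandedCount) are made up for illustration, and the data: URI trick for XMLReader::open() assumes the data stream wrapper is available.

```php
<?php
// Hypothetical sample data, small enough to count nodes by hand.
$xml = "<record>\n  <Source>\n    <Country>Germany</Country>\n  </Source>\n</record>";

// Case 1: the property is set before loadXML(), so the DOM parser
// itself discards the whitespace-only text nodes.
$loaded = new DOMDocument();
$loaded->preserveWhiteSpace = false;
$loaded->loadXML($xml);
$loadedCount = $loaded->documentElement->childNodes->length;

// Case 2: the node comes from XMLReader::expand(). The target document
// never parses anything, so its preserveWhiteSpace setting is not consulted.
$reader = new XMLReader();
$reader->open('data://text/plain;base64,' . base64_encode($xml));
while ($reader->read() && $reader->localName !== 'record') {
    continue;
}
$target = new DOMDocument();
$target->preserveWhiteSpace = false;
$expanded = $reader->expand($target);
$expandedCount = $expanded->childNodes->length;
$reader->close();

echo "children of <record> after loadXML(): $loadedCount\n";
echo "children of <record> after expand():  $expandedCount\n";
```

The expanded node should report more children than the loaded one, because the line breaks and indentation around Source survive as #text nodes.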
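You are already using XPath. The expression .//*[not(*) and normalize-space(.) != ""] selects element nodes that have no element children and whose text content is not empty once white space is normalized, so elements containing only white space are skipped.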
Here is an example (including some other optimizations):
$xml = <<<'XML'
<file>
  <record>
    <Source>
      <SourcePlace>
        <Country>Germany</Country>
      </SourcePlace>
    </Source>
    <Person>
      <Name>
        <firstname>John</firstname>
        <lastname>Doe</lastname>
      </Name>
    </Person>
  </record>
</file>
XML;
$reader = new XMLReader();
$reader->open('data://text/plain;base64,'.base64_encode($xml));
$document = new DOMDocument();
$xpath = new DOMXPath($document);
// find first record
while ($reader->read() && $reader->localName !== 'record') {
    continue;
}
while ($reader->localName === 'record') {
    // expand node into prepared document
    $record = $reader->expand($document);
    // match elements without child elements and with non-empty text content,
    // ignoring text that consists only of white space
    $expression = './Source//*[not(*) and normalize-space() != ""]';
    foreach ($xpath->evaluate($expression, $record) as $leaf) {
        var_dump($leaf->localName, $leaf->textContent);
    }
    // move to the next record sibling
    $reader->next('record');
}
$reader->close();
Output:
string(7) "Country"
string(7) "Germany"
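If the recursive traversal from the question is preferred over XPath, another option is to skip whitespace-only text nodes inside the recursion itself. This is only a sketch under assumptions: collectLeaves() and the "name=value" result format are hypothetical, and the document is loaded with loadXML() and the default preserveWhiteSpace (true) purely to produce the same kind of whitespace nodes that expand() yields.

```php
<?php
// Hypothetical helper: collects "parentName=text" for every text node
// that contains something other than white space.
function collectLeaves(DOMNode $node, array &$leaves): void
{
    if ($node instanceof DOMText) {
        $text = trim($node->data);
        if ($text !== '') {
            $leaves[] = $node->parentNode->nodeName . '=' . $text;
        }
        return;
    }
    if (!$node->hasChildNodes()) {
        return; // empty element, comment, etc.: nothing to collect
    }
    foreach ($node->childNodes as $child) {
        collectLeaves($child, $leaves);
    }
}

$xml = "<Source>\n  <SourcePlace>\n    <Country>Germany</Country>\n  </SourcePlace>\n</Source>";
$document = new DOMDocument();
$document->loadXML($xml); // whitespace-only text nodes are kept by default

$leaves = [];
collectLeaves($document->documentElement, $leaves);
print_r($leaves);
```

Because the check happens on each text node, the same function behaves identically whether the nodes came from DOMDocument::load() or from XMLReader::expand().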