DOMDocument missing HTML tags

Question

I play an online game called Tribalwars and am trying to write a parser for its reports. A typical report looks like this:

https://enp2.tribalwars.net/public_report/395cf3cc373a3b8873c20fa018f1aa07

I have two functions adapted from php.net, which currently look like this:

function has_child($p)
{
    if ($p->hasChildNodes())
    {
        foreach ($p->childNodes as $c)
        {
            if ($c->nodeType == XML_ELEMENT_NODE)
            {
                return true;
            }
        }
    }
    return false;
}

function show_node($x)
{
    foreach ($x->childNodes as $p)
    {
        if ($this->has_child($p))
        {
            $this->show_node($p);
        }
        elseif ($p->nodeType == XML_ELEMENT_NODE)
        {
            if (trim($p->nodeValue) !== '')
            {
                $temp = explode("\n", $p->nodeValue);
                if (count($temp) == 1)
                {
                    $this->reportdata[] = trim($temp[0]);
                }
                else
                {
                    foreach ($temp as $k => $v)
                    {
                        if (trim($v) !== '')
                        {
                            $this->reportdata[] = trim($v);
                        }
                    }
                }
            }
        }
    }
}

They return the results in this format:

Array
(
    [0] => MASHAD (27000) attacks 40-014-Devil...
    [1] => May 11, 2016  19:27:12
    [2] => MASHAD has won
    [3] => Attacker's luck
    ...
    [76] => Espionage
    [77] => Resources scouted:
    [78] => Building
    ...
    [112] => Haul:
    [113] => .
    [114] => .
    [115] => .
    [116] => .
    [117] => .
    ...
    [120] => https://enp2.tribalwars.net/public_report/395...
)

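This works in most cases, but some data is lost during parsing. If you open the report at the link, you will see the "Resources scouted" and "Haul" sections. Both of these sections contain <span> elements. For some reason, the contents of these two sections are missing from the array the function returns (see array item 77 and array items 113 - 118). Items 113 - 118 contain only the "." from the oddly formatted numbers, and item 77 shows nothing at all.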
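To see where the numbers go, here is a reproduction on a tiny snippet. It assumes the report writes numbers with a <span> as the thousands separator, which the "."-only items 113 - 118 suggest. The HTML string is made up and not taken from the real report. The two functions are copied from above, with $this->reportdata replaced by a by-reference $out array so they run outside a class.

```php
<?php
// Copy of has_child() above: true if $p has at least one element child.
function has_child(DOMNode $p): bool
{
    if ($p->hasChildNodes()) {
        foreach ($p->childNodes as $c) {
            if ($c->nodeType == XML_ELEMENT_NODE) {
                return true;
            }
        }
    }
    return false;
}

// Copy of show_node() above; $out stands in for $this->reportdata.
function show_node(DOMNode $x, array &$out): void
{
    foreach ($x->childNodes as $p) {
        if (has_child($p)) {
            show_node($p, $out);        // element with element children: recurse
        } elseif ($p->nodeType == XML_ELEMENT_NODE) {
            foreach (explode("\n", $p->nodeValue) as $v) {
                if (trim($v) !== '') {
                    $out[] = trim($v);  // leaf element: keep its non-empty lines
                }
            }
        }
        // Text nodes (XML_TEXT_NODE) match neither branch and are silently dropped.
    }
}

// Hypothetical stand-in for one "Haul" row of a report.
$html = '<report><table><tr><td>Haul:</td><td>12<span class="grey">.</span>345</td></tr></table></report>';

libxml_use_internal_errors(true); // <report> is not an HTML tag; suppress libxml's warning
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$data = [];
show_node($doc->getElementsByTagName('report')->item(0), $data);
print_r($data); // Haul:, .
```

Only "." survives from the second cell. That <td> has an element child, so show_node() recurses into it. Inside the cell, the <span> is the only element node, while the text nodes "12" and "345" match neither branch and are dropped.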

In the function that calls show_node(), I do some preprocessing to throw away the parts of the HTML I don't need:

$temp = explode('<h1>Publicized report</h1>', $report[0]['reportdata']);
$rep = $temp[1];
$temp = explode('For quick copy and paste', $rep);
$rep = '<report>' . $temp[0] . '</report>';
$x = new DOMDocument();
$x->loadHTML($rep);
$this->show_node($x->getElementsByTagName('report')->item(0));
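The two explode() calls assume that both markers are present. If '<h1>Publicized report</h1>' is missing, $temp[1] is undefined. One hedged alternative is a hypothetical helper, extract_between(), built on strpos() and substr(), which returns null instead. It is slightly stricter than the original, which keeps everything when the end marker is missing. The sample $page string is made up.

```php
<?php
// Hypothetical helper: return the text between $start and $end,
// or null if either marker is absent.
function extract_between(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);              // skip past the start marker itself
    $to = strpos($html, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Made-up page content standing in for $report[0]['reportdata'].
$page = '<h1>Publicized report</h1><table><tr><td>Haul:</td></tr></table>For quick copy and paste ...';

$body = extract_between($page, '<h1>Publicized report</h1>', 'For quick copy and paste');
if ($body !== null) {
    $rep = '<report>' . $body . '</report>';
    echo $rep, "\n";
}
```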

If I print $rep before calling show_node(), the Haul and Resources scouted information I need is there.
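A further check can show whether the data is still present after loadHTML(), rather than being dropped by DOMDocument itself: print the textContent of the <report> element. Here is a sketch in which a made-up $rep stands in for the real one.

```php
<?php
// Made-up stand-in for $rep; numbers use a <span> separator, as assumed above.
$rep = '<report><table><tr><td>Haul:</td><td>12<span class="grey">.</span>345</td></tr></table></report>';

libxml_use_internal_errors(true); // <report> is not a known HTML tag; keep the warning quiet
$x = new DOMDocument();
$x->loadHTML($rep);
libxml_clear_errors();

$report = $x->getElementsByTagName('report')->item(0);
// textContent concatenates every descendant text node, including the ones
// next to the <span>, so it shows what the parsed DOM actually holds.
$text = $report->textContent;
echo $text, "\n"; // Haul:12.345
```

If the "missing" values appear in this output, the DOM still holds them and the traversal is what skips them.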

What could be the problem?

Answer 1

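It seems DOMDocument has some limit on how deep it goes into the document, or something similar. Either that, or the recursive code above is wrong. So I identified the section of code that wasn't being parsed, checked that it was well-formed, and used str_replace() to strip out the child code I didn't need, which finally gave me my array of values. Either way, the problem is solved now.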
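If the problem lies in the recursive code rather than in DOMDocument, a smaller change may be enough. show_node() only reads element nodes that have no element children. In a cell like 12<span>.</span>345 (assuming that is how the report formats numbers), only the <span> gets read. The sketch below defines two hypothetical functions, has_block_child() and collect_text(). They treat <span> children as inline text, so such a cell is read as one value through nodeValue.

```php
<?php
// Hypothetical variant of has_child(): <span> children do not count,
// so an element containing only text and <span>s is treated as a leaf.
function has_block_child(DOMNode $p): bool
{
    if (!$p->hasChildNodes()) {
        return false;
    }
    foreach ($p->childNodes as $c) {
        if ($c->nodeType === XML_ELEMENT_NODE && strtolower($c->nodeName) !== 'span') {
            return true;
        }
    }
    return false;
}

// Hypothetical variant of show_node() that uses has_block_child().
function collect_text(DOMNode $x, array &$out): void
{
    foreach ($x->childNodes as $p) {
        if (has_block_child($p)) {
            collect_text($p, $out);
        } elseif ($p->nodeType === XML_ELEMENT_NODE) {
            // nodeValue of an element joins all of its descendant text nodes.
            foreach (explode("\n", $p->nodeValue) as $line) {
                if (trim($line) !== '') {
                    $out[] = trim($line);
                }
            }
        }
    }
}

// Made-up sample row, same shape as in the question's "Haul" section.
$html = '<report><table><tr><td>Haul:</td><td>12<span class="grey">.</span>345</td></tr></table></report>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$data = [];
collect_text($doc->getElementsByTagName('report')->item(0), $data);
print_r($data); // Haul:, 12.345
```

Text that sits directly beside a non-span element (a <div>, for example) is still skipped. Other inline tags such as <b> or <a> may therefore need to be added to the exclusion check.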
