Parse HTML with the QXmlStreamReader in Qt 5.8 with MSVC 2015

I am trying to get some data from a web page in Qt. Since QWebKit is unmaintained, I want to use QXmlStreamReader, but I get error messages for some web pages.

For example, http://www.google.com gives: XML Parse Error "Opening and ending tag mismatch."

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.de/?gfe_rd=cr&amp;ei=toP_WMrVKoHKXuvxnsAO">here</A>.
</BODY></HTML>

The element names I get before the error are: HTML, HEAD, meta, TITLE

Other error messages on valid HTML pages:

Here is my code:

webpage = new QXmlStreamReader(data);

//emit got_webpage(&QString(data));

QStringList test;

while (!webpage->atEnd() && !webpage->hasError())
{
    QXmlStreamReader::TokenType token = webpage->readNext();

    if (token == QXmlStreamReader::StartDocument)
        continue;

    if (token == QXmlStreamReader::StartElement)
    {
        test << webpage->name().toString();
        /*if (webpage->name() == "H1")
        {
            emit got_webpage(webpage)
        }*/
    }
}

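// Keep the joined string in a named variable: taking the address of a
// temporary is not valid standard C++ (MSVC only accepts it as an extension).
QString joined = test.join("\n");
emit got_webpage(&joined);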

if (webpage->hasError())
{
    // TODO: Error handling...
    qDebug() << "XML Parse Error " << webpage->errorString();
}

webpage->clear();
delete webpage;

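As the name suggests, QXmlStreamReader is meant for parsing XML. HTML is not based on XML, so in general it cannot be parsed with QXmlStreamReader. In the response above, for example, the <meta> element is never closed. That is legal HTML but not well-formed XML, so the reader reports "Opening and ending tag mismatch." when it reaches </HEAD>.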
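To make the well-formedness rule concrete, here is a toy sketch in plain C++ that tracks open tags on a stack, roughly the bookkeeping any XML parser has to do. It is not how QXmlStreamReader is implemented, and firstTagMismatch is a made-up name for this illustration. It also ignores comments, CDATA sections and attribute values containing '>'.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy check (hypothetical helper): returns an empty string if every start tag
// is closed in the right order, otherwise a message describing the first problem.
std::string firstTagMismatch(const std::string &markup)
{
    std::vector<std::string> open;  // names of elements that are still open
    std::size_t pos = 0;

    while ((pos = markup.find('<', pos)) != std::string::npos) {
        const std::size_t end = markup.find('>', pos);
        if (end == std::string::npos)
            return "unterminated tag";

        const std::string tag = markup.substr(pos + 1, end - pos - 1);
        pos = end + 1;

        // Skip empty tags, declarations such as <!DOCTYPE ...> and <?xml ...?>.
        if (tag.empty() || tag[0] == '!' || tag[0] == '?')
            continue;

        const bool closing = tag[0] == '/';
        const bool selfClosing = tag.back() == '/';

        // The element name ends at the first whitespace or '/'.
        const std::size_t start = closing ? 1 : 0;
        const std::size_t nameEnd = tag.find_first_of(" \t\r\n/", start);
        const std::string name = (nameEnd == std::string::npos)
                                     ? tag.substr(start)
                                     : tag.substr(start, nameEnd - start);

        if (closing) {
            if (open.empty())
                return "</" + name + "> has no matching start tag";
            if (open.back() != name)
                return "</" + name + "> does not match <" + open.back() + ">";
            open.pop_back();
        } else if (!selfClosing) {
            open.push_back(name);
        }
    }

    if (!open.empty())
        return "<" + open.back() + "> is never closed";
    return std::string();
}
```

Fed the 302 response from the question, this stops at </HEAD> because <meta> is still on the stack, which matches the point where QXmlStreamReader gives up. Writing the tag as <meta ... /> makes the same document pass.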

That said, if you can convert the HTML to XHTML, you will be able to parse it with QXmlStreamReader. However, Qt has no built-in method of performing this conversion. It is possible to convert arbitrary HTML to XHTML with third-party libraries such as tidylib.