Parse HTML with QXmlStreamReader in Qt 5.8 with MSVC 2015
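I am trying to get some data from a web page in Qt. Since QWebKit is unmaintained, I want to use QXmlStreamReader, but I get error messages for some web pages.
For example: XML Parse Error "Opening and ending tag mismatch."
on http://www.google.com, which returns: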
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.de/?gfe_rd=cr&ei=toP_WMrVKoHKXuvxnsAO">here</A>.
</BODY></HTML>
Before the error, I get the element names HTML, HEAD, meta and TITLE.
Other error messages on valid HTML pages:
- XML Parse Error "Expected '-' or 'DOCTYPE', but got '[a-zA-Z]'."
- XML Parse Error "Entity 'raquo' not declared."
Here is my code:
webpage = new QXmlStreamReader(data);
//emit got_webpage(&QString(data));
QStringList test;
while (!webpage->atEnd() && !webpage->hasError())
{
    QXmlStreamReader::TokenType token = webpage->readNext();
    if (token == QXmlStreamReader::StartDocument)
        continue;
    if (token == QXmlStreamReader::StartElement)
    {
        test << webpage->name().toString();
        /*if (webpage->name() == "H1")
        {
            emit got_webpage(webpage)
        }*/
    }
}
emit got_webpage(&test.join("\n"));
if (webpage->hasError())
{
    // TODO: Error handling...
    qDebug() << "XML Parse Error " << webpage->errorString();
}
webpage->clear();
delete webpage;
As the name suggests, QXmlStreamReader is for parsing XML. HTML is not based on XML, so it cannot be parsed reliably with QXmlStreamReader.
That said, if you can convert the HTML to XHTML, you will be able to parse it with QXmlStreamReader. However, Qt has no built-in method of performing this conversion. It is possible to convert arbitrary HTML to XHTML with third-party libraries such as tidylib.
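To make the "Opening and ending tag mismatch." error more concrete, here is a toy sketch in plain standard C++ of the tag-matching rule an XML parser enforces. It is not how QXmlStreamReader is implemented. The names ScanResult and scanTags are made up for this illustration, and comments, DOCTYPE, CDATA, entities and '>' inside attribute values are deliberately ignored.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: the start tags seen before the first error,
// plus an error message that stays empty if every tag was matched.
struct ScanResult {
    std::vector<std::string> startTags;
    std::string error;
};

// Toy illustration only: applies the XML rule that every start tag needs a
// matching end tag or a self-closing "/>". This is NOT a real XML parser.
ScanResult scanTags(const std::string &markup)
{
    ScanResult result;
    std::vector<std::string> open;  // stack of currently open elements
    std::size_t pos = 0;

    while ((pos = markup.find('<', pos)) != std::string::npos) {
        const std::size_t end = markup.find('>', pos);
        if (end == std::string::npos) {
            result.error = "Unterminated tag.";
            return result;
        }
        // Text between '<' and '>', e.g. "meta charset=\"utf-8\"/" or "/HEAD".
        const std::string tag = markup.substr(pos + 1, end - pos - 1);
        pos = end + 1;

        const bool isEndTag = !tag.empty() && tag.front() == '/';
        const bool selfClosing = !tag.empty() && tag.back() == '/';

        // The element name runs up to the first whitespace or '/'.
        const std::size_t first = isEndTag ? 1 : 0;
        std::size_t last = first;
        while (last < tag.size()
               && !std::isspace(static_cast<unsigned char>(tag[last]))
               && tag[last] != '/') {
            ++last;
        }
        const std::string name = tag.substr(first, last - first);

        if (isEndTag) {
            // XML names are case-sensitive and must close the innermost element.
            if (open.empty() || open.back() != name) {
                result.error = "Opening and ending tag mismatch.";
                return result;
            }
            open.pop_back();
        } else {
            result.startTags.push_back(name);
            if (!selfClosing)
                open.push_back(name);  // stays open until its end tag arrives
        }
    }

    if (!open.empty())
        result.error = "Unclosed element at end of input.";
    return result;
}
```

Calling scanTags on the Google response above collects HTML, HEAD, meta and TITLE and then reports the mismatch at </HEAD>, because the HTML-style <meta ...> is never closed and is still the innermost open element. Once the tag is written XHTML-style as <meta ... />, the same input passes, which is the kind of repair a converter such as tidylib performs.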