What do I need to do to make the Boost.Beast HTTP parser find the end of the body?
I am trying to parse an HTTPS response using boost::beast::http::parser.
My parser is defined like this:
boost::beast::http::parser<false, boost::beast::http::string_body> response_parser;
The callback for the asynchronous read looks like this:
void AsyncHttpsRequest::on_response_read(const boost::system::error_code &error_code, uint32_t bytes_transferred)
{
    if (bytes_transferred > 0)
    {
        response_parser.put(boost::asio::buffer(data_buffer, bytes_transferred), http_error_code);
        std::cout << "Parser status: " << http_error_code.message() << std::endl;
        std::cout << "Read " << bytes_transferred << " bytes of HTTPS response" << std::endl;
        std::cout << std::string(data_buffer, bytes_transferred) << std::endl;
    }
    if (error_code)
    {
        std::cout << "Error during HTTPS response read: " << error_code.message() << std::endl;
        callback(error_code, response_parser.get());
    }
    else
    {
        if (response_parser.is_done())
        {
            callback(error_code, response_parser.get());
        }
        else
        {
            std::cout << "Response is not yet finished, reading more" << std::endl;
            read_response();
        }
    }
}
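Everything works when the response has no body: response_parser.is_done() returns true. But when the response contains a body, it always returns false, even after the body has been read completely. The response also has a Content-Length header that matches the number of bytes in the body, so that is not the problem.
The Boost documentation says response_parser.is_done() should return true if the semantics of the message indicate that a body is expected and the entire body has been parsed.
When I send the request with Connection: keep-alive, I keep reading the response forever, because the server has nothing more to send and response_parser is never done. When I send it with Connection: close, my completion callback is called, but the parsed boost::beast::http::message has no body inside. However, my logging to stdout shows that the body is there and has been read completely.
What do I need to do to make the parser recognize the end of the body once the number of bytes read from the body equals Content-Length?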
Your expectations are correct.
Background, details, and caveats:
You can observe that it does work:
#include <boost/beast/http.hpp>
#include <algorithm>
#include <iostream>
#include <iomanip>
#include <random>
#include <string>
#include <string_view>
using boost::system::error_code;
namespace http = boost::beast::http;
int main() {
    std::mt19937 prng { std::random_device{}() };
    std::uniform_int_distribution<size_t> packet_size { 1, 372 };
    std::string const response =
        "HTTP/1.1 200 OK\r\n"
        "Age: 207498\r\n"
        "Cache-Control: max-age=604800\r\n"
        "Content-Type: text/html; charset=UTF-8\r\n"
        "Date: Sat, 20 Mar 2021 23:24:40 GMT\r\n"
        "Etag: \"3147526947+ident\"\r\n"
        "Expires: Sat, 27 Mar 2021 23:24:40 GMT\r\n"
        "Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT\r\n"
        "Server: ECS (bsa/EB15)\r\n"
        "Vary: Accept-Encoding\r\n"
        "X-Cache: HIT\r\n"
        "Content-Length: 1256\r\n"
        "\r\n"
"<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n\n <meta charset=\"utf-8\" />\n <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <style type=\"text/css\">\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n \n }\n div {\n width: 600px;\n margin: 5em auto;\n padding: 2em;\n background-color: #fdfdff;\n border-radius: 0.5em;\n box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n }\n a:link, a:visited {\n color: #38488f;\n text-decoration: none;\n }\n @media (max-width: 700px) {\n div {\n margin: 0 auto;\n width: auto;\n }\n }\n </style> \n</head>\n\n<body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission.</p>\n <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n";
    std::string const input = response + response;
    std::string_view emulated_stream = input;
    error_code ec;
    while (not emulated_stream.empty()) {
        std::cout << "== Emulated stream of " << emulated_stream.size()
                  << " remaining" << std::endl;
        http::parser<false, http::string_body> response_parser;
        while (not (ec or response_parser.is_done() or emulated_stream.empty())) {
            auto next = std::min(packet_size(prng), emulated_stream.size());
            auto consumed = response_parser.put(
                boost::asio::buffer(emulated_stream.data(), next), ec);
            std::cout << "Consumed " << consumed << std::boolalpha
                      << "\tHeaders done:" << response_parser.is_header_done()
                      << "\tDone:" << response_parser.is_done()
                      << "\tChunked:" << response_parser.chunked()
                      << "\t" << ec.message() << std::endl;
            if (ec == http::error::need_more)
                ec.clear();
            emulated_stream.remove_prefix(consumed);
        }
        auto res = response_parser.release();
        std::cout << "== Content length " << res["Content-Length"] << " and body "
                  << res.body().length() << std::endl;
        std::cout << "== Headers: " << res.base() << std::endl;
    }
    std::cout << "== Stream depleted " << ec.message() << std::endl;
}
This prints, for example:
== Emulated stream of 3182 remaining
Consumed 101 Headers done:false Done:false Chunked:false need more
Consumed 0 Headers done:false Done:false Chunked:false need more
Consumed 0 Headers done:false Done:false Chunked:false need more
Consumed 0 Headers done:false Done:false Chunked:false need more
Consumed 0 Headers done:false Done:false Chunked:false need more
Consumed 234 Headers done:true Done:false Chunked:false Success
Consumed 305 Headers done:true Done:false Chunked:false Success
Consumed 326 Headers done:true Done:false Chunked:false Success
Consumed 265 Headers done:true Done:false Chunked:false Success
Consumed 216 Headers done:true Done:false Chunked:false Success
Consumed 144 Headers done:true Done:true Chunked:false Success
== Content length 1256 and body 1256
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256
== Emulated stream of 1591 remaining
Consumed 204 Headers done:false Done:false Chunked:false need more
Consumed 0 Headers done:false Done:false Chunked:false need more
Consumed 0 Headers done:false Done:false Chunked:false need more
Consumed 131 Headers done:true Done:false Chunked:false Success
Consumed 355 Headers done:true Done:false Chunked:false Success
Consumed 137 Headers done:true Done:false Chunked:false Success
Consumed 139 Headers done:true Done:false Chunked:false Success
Consumed 89 Headers done:true Done:false Chunked:false Success
Consumed 87 Headers done:true Done:false Chunked:false Success
Consumed 66 Headers done:true Done:false Chunked:false Success
Consumed 355 Headers done:true Done:false Chunked:false Success
Consumed 28 Headers done:true Done:true Chunked:false Success
== Content length 1256 and body 1256
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256
== Stream depleted Success
Perhaps:
- your stream content is not actually valid HTTP
- your response does not have a Content-Length header at all. In that case, once the headers have been parsed, need_eof() will return true. The documentation says:
    "Depending on the contents of the header, the parser may require an end of file notification to know where the end of the body lies. If this function returns true it will be necessary to call put_eof when there will never be additional data from the input."
- your packets are too small. You can see this effect by reducing the packet size distribution to an extreme:
    std::uniform_int_distribution<size_t> packet_size { 1, 3 };
  With that, nothing is ever consumed. The documentation says:
    "In some cases there may be an insufficient number of octets in the input buffer in order to make forward progress. This is indicated by the code error::need_more. When this happens, the caller should place additional bytes into the buffer sequence and call put again. The error code error::need_more is special. When this error is returned, a subsequent call to put may succeed if the buffers have been updated."
  In your real code you would not keep retrying with tiny amounts, because the buffer simply accumulates and eventually holds enough data to make progress.
See also
- Why does Boost-Beast give me a partial message exception
- How to read data from Internet using muli-threading with connecting only once?
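Bonus: simplify!
The good news is that you rarely need anything this complicated. In most cases you can simply http::read or http::async_read directly into a response object.
This does the whole dance with the parser under the hood, without you having to worry about the details: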
boost::beast::flat_buffer buf;
boost::system::error_code ec;
for (http::response<http::string_body> res; !ec && read(pipe, buf, res, ec); res.clear()) {
    std::cout << "== Content length " << res["Content-Length"] << " and body "
              << res.body().length() << std::endl;
    std::cout << "== Headers: " << res.base() << std::endl;
}
std::cout << "== Stream depleted " << ec.message() << "\n" << std::endl;
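That's all. It still prints the following (the second body length is 2512 rather than 1256 because res.clear() resets only the header fields, so the second body is appended to the first):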
== Content length 1256 and body 1256
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256
== Content length 1256 and body 2512
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256
== Stream depleted end of stream