What do I need to do to make Boost.Beast HTTP parser find the end of the body?

I'm trying to parse an HTTPS response using boost::beast::http::parser. My parser is defined like this:

boost::beast::http::parser<false, boost::beast::http::string_body> response_parser;

The callback for the asynchronous read looks like this:

void AsyncHttpsRequest::on_response_read(const boost::system::error_code &error_code, uint32_t bytes_transferred)
{
    if (bytes_transferred > 0)
    {
        response_parser.put(boost::asio::buffer(data_buffer, bytes_transferred), http_error_code);
        std::cout << "Parser status: " << http_error_code.message() << std::endl;
        std::cout << "Read " << bytes_transferred << " bytes of HTTPS response" << std::endl;
        std::cout << std::string(data_buffer, bytes_transferred) << std::endl;
    }
    if (error_code)
    {
        std::cout << "Error during HTTPS response read: " << error_code.message() << std::endl;
        callback(error_code, response_parser.get());
    }
    else
    {
        if (response_parser.is_done())
        {
            callback(error_code, response_parser.get());
        }
        else
        {
            std::cout << "Response is not yet finished, reading more" << std::endl;
            read_response();
        }
    }
}

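When the response has no body, everything works and response_parser.is_done() returns true. But when the response contains a body, it always returns false, even after the body has been read completely. The response also has a Content-Length header that matches the number of bytes in the body, so that is not the problem.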

The Boost documentation says response_parser.is_done() should return true if the semantics of the message indicate that a body is expected and the entire body has been parsed.

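When I send the request with Connection: keep-alive, I keep reading the response forever, because the server has nothing more to send and response_parser is not done yet. When I use Connection: close, my completion callback is called, but the parsed boost::beast::http::message has no body in it. However, my logging to stdout shows that there is a body and that it has been read completely.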

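What do I need to do to make the parser recognize the end of the body when the number of bytes read from the body equals Content-Length?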

Your expectation is correct.

Background, details, and caveats:

You can observe that it does work:

#include <boost/asio/buffer.hpp>
#include <boost/beast/http.hpp>
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <random>
#include <string>
#include <string_view>
using boost::system::error_code;
namespace http = boost::beast::http;

int main() {
    std::mt19937 prng { std::random_device{}() };
    std::uniform_int_distribution<size_t> packet_size { 1, 372 };

    std::string const response = 
"HTTP/1.1 200 OK\r\n"
"Age: 207498\r\n"
"Cache-Control: max-age=604800\r\n"
"Content-Type: text/html; charset=UTF-8\r\n"
"Date: Sat, 20 Mar 2021 23:24:40 GMT\r\n"
"Etag: \"3147526947+ident\"\r\n"
"Expires: Sat, 27 Mar 2021 23:24:40 GMT\r\n"
"Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT\r\n"
"Server: ECS (bsa/EB15)\r\n"
"Vary: Accept-Encoding\r\n"
"X-Cache: HIT\r\n"
"Content-Length: 1256\r\n"
"\r\n"
"<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset=\"utf-8\" />\n    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n    <style type=\"text/css\">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n";

    std::string const input = response + response;
    std::string_view emulated_stream = input;

    error_code ec;
    while (not emulated_stream.empty()) {
        std::cout << "== Emulated stream of " << emulated_stream.size()
                  << " remaining" << std::endl;

        http::parser<false, http::string_body> response_parser;

        while (not (ec or response_parser.is_done() or emulated_stream.empty())) {
            auto next     = std::min(packet_size(prng), emulated_stream.size());
            auto consumed = response_parser.put(
                boost::asio::buffer(emulated_stream.data(), next), ec);

            std::cout << "Consumed " << consumed << std::boolalpha
                      << "\tHeaders done:" << response_parser.is_header_done()
                      << "\tDone:" << response_parser.is_done()
                      << "\tChunked:" << response_parser.chunked()
                      << "\t" << ec.message() << std::endl;

            if (ec == http::error::need_more)
                ec.clear();

            emulated_stream.remove_prefix(consumed);
        }

        auto res = response_parser.release();

        std::cout << "== Content length " << res["Content-Length"] << " and body "
                  << res.body().length() << std::endl;
        std::cout << "== Headers: " << res.base() << std::endl;
    }

    std::cout << "== Stream depleted " << ec.message() << std::endl;
}

Prints, for example:

== Emulated stream of 3182 remaining
Consumed 101    Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 234    Headers done:true   Done:false  Chunked:false   Success
Consumed 305    Headers done:true   Done:false  Chunked:false   Success
Consumed 326    Headers done:true   Done:false  Chunked:false   Success
Consumed 265    Headers done:true   Done:false  Chunked:false   Success
Consumed 216    Headers done:true   Done:false  Chunked:false   Success
Consumed 144    Headers done:true   Done:true   Chunked:false   Success
== Content length 1256 and body 1256
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

== Emulated stream of 1591 remaining
Consumed 204    Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 131    Headers done:true   Done:false  Chunked:false   Success
Consumed 355    Headers done:true   Done:false  Chunked:false   Success
Consumed 137    Headers done:true   Done:false  Chunked:false   Success
Consumed 139    Headers done:true   Done:false  Chunked:false   Success
Consumed 89 Headers done:true   Done:false  Chunked:false   Success
Consumed 87 Headers done:true   Done:false  Chunked:false   Success
Consumed 66 Headers done:true   Done:false  Chunked:false   Success
Consumed 355    Headers done:true   Done:false  Chunked:false   Success
Consumed 28 Headers done:true   Done:true   Chunked:false   Success
== Content length 1256 and body 1256
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

== Stream depleted Success

Perhaps

  • your stream content is not actually valid HTTP

  • your response has no Content-Length header at all. In that case, once the headers have been parsed, need_eof() will return true:

    Depending on the contents of the header, the parser may require and end of file notification to know where the end of the body lies. If this function returns true it will be necessary to call put_eof when there will never be additional data from the input.

  • your packets are too small. If you reduce the packet size distribution to an extreme, you can see this effect:

     std::uniform_int_distribution<size_t> packet_size { 1, 3 };
    

    This results in nothing ever being consumed. From the documentation:

    In some cases there may be an insufficient number of octets in the input buffer in order to make forward progress. This is indicated by the code error::need_more. When this happens, the caller should place additional bytes into the buffer sequence and call put again. The error code error::need_more is special. When this error is returned, a subsequent call to put may succeed if the buffers have been updated

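    In your real code you would not keep retrying with tiny amounts, because the buffer would simply accumulate and eventually satisfy the requirements for making progress.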

BONUS: Simplify!

The good news is that you rarely need anything this complicated. In most cases you can just use http::read or http::async_read to read straight into a response object.

This does the whole dance with the parser under the hood, without you having to worry about the details:

boost::beast::flat_buffer buf;
boost::system::error_code ec;
for (http::response<http::string_body> res; !ec && read(pipe, buf, res, ec); res.clear()) {
    std::cout << "== Content length " << res["Content-Length"] << " and body "
              << res.body().length() << std::endl;
    std::cout << "== Headers: " << res.base() << std::endl;
}

std::cout << "== Stream depleted " << ec.message() << "\n" << std::endl;

That's all. It still prints:

== Content length 1256 and body 1256
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

== Content length 1256 and body 2512
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

== Stream depleted end of stream