Tried to parse chunked transfer encoding, but it's not working: the file I decoded is totally unreadable

Question

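I'm trying to parse data that a REST API sends with chunked transfer encoding. When I print the values as strings, I can see that the data is there, so I assumed it was working. But when I write the values to a file, the file is completely unreadable. The code below uses the Boost libraries, and I explain my reasoning in the code comments. I'll start from the part of the code that handles the response. I don't know what I'm doing wrong.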

    // Send the request.
    boost::asio::write(socket, request);

    // Read the response status line. The response streambuf will automatically
    // grow to accommodate the entire line. The growth may be limited by passing
    // a maximum size to the streambuf constructor.
    boost::asio::streambuf response;
    boost::asio::read_until(socket, response, "\r\n");

    // Check that response is OK.
    std::istream response_stream(&response);
    std::string http_version;
    response_stream >> http_version;
    unsigned int status_code;
    response_stream >> status_code;
    std::string status_message;
    std::getline(response_stream, status_message);
    if (!response_stream || http_version.substr(0, 5) != "HTTP/")
    {
        //std::cout << "Invalid response\n";
        return 9002;
         
    }
    if (status_code != 200)
    {
        //std::cout << "Response returned with status code " << status_code << "\n";
        return 9003;
    }
    
    // Read the response headers, which are terminated by a blank line.
    boost::asio::read_until(socket, response, "\r\n\r\n");

    // Process the response headers.
    // In this part I try to parse the file name from the response headers; the file name is in the Content-Disposition header.
    std::string header;
    std::string fullHeader = "";
    string zipfilename="", txtfilename="";
    bool foundfilename = false;
    while (std::getline(response_stream, header) && header != "\r")
    {
        fullHeader.append(header).append("\n");
        std::transform(header.begin(), header.end(), header.begin(),
            [](unsigned char c){ return std::tolower(c); });
        string containstr = "content-disposition";
        string containstr2 = "filename";
        string quotestr = "\"";
        if (header.find(containstr) != std::string::npos && header.find(containstr2) != std::string::npos)
        {
            int countquotes = 0;
            bool foundquote = true;
            
            std::size_t startpos = 0, beginpos, endpos;
            while (foundquote)
            {
                
                std::size_t myfound = header.find(quotestr, startpos);
                if (myfound != std::string::npos)
                {
                    if (countquotes % 2 == 0)
                        beginpos = myfound;
                    else
                    {
                        endpos = myfound;
                        foundfilename = true;
                    }

                    startpos = myfound + 1;
                    
                }
                else
                   foundquote = false;

                countquotes++;
            }

            if (endpos > beginpos && foundfilename)
            {
                size_t zipfileleng = endpos - beginpos;
                zipfilename = header.substr(beginpos+1, zipfileleng-1);
                txtfilename = header.substr(beginpos+1, zipfileleng-5);
            }
            else
                return 9004;

        }
    }

    if (foundfilename == false || zipfilename.length() == 0 || txtfilename.length() == 0)
        return 9005;

    // Once zipfilename has been found, get the data from the response body. Since the response uses chunked transfer encoding, I tried to parse it.
    // From what I read on Wikipedia it didn't look complicated: the first line is the length of the data, the next line is the data, and this repeats.
    // So all I did was split the entire response body on "\r\n" into a vector<string> and read the data from that vector.

    // Write whatever content we already have to output.
    std::string fullResponse = "";
    if (response.size() > 0)
    {
        std::stringstream ss;
        ss << &response;
        fullResponse = ss.str();
     
    
    }
    // Split the entire response body into a vector<string>

    vector<string> allresponsedata;
    split_regex(allresponsedata, fullResponse, boost::regex("(\r\n)+"));
    
    // Merge the response data
    string zipfiledata;
    int myindex = 0;
    for (auto &x : allresponsedata) {
        std::cout << "Split: " << x << std::endl; // printing shows that x does contain data

        if (myindex % 2 != 0)
        {
            zipfiledata = zipfiledata + x; // accumulate the data pieces
        }


        myindex++;
    }
    
    // Write the data to a file
    std::ofstream zipfilestream(zipfilename, ios::out | ios::binary);
    zipfilestream.write(zipfiledata.c_str(), zipfiledata.length());
    zipfilestream.close();

    // Afterwards the zip file is created, but it can't be opened: the zip utility reports a damaged zip file.

I even tried another approach like the one below, but it didn't work either. Visual Studio says:

  1 IntelliSense: no instance of overloaded function "boost::asio::read" matches the argument list
        argument types are: (boost::asio::ip::tcp::socket, boost::asio::streambuf, boost::asio::detail::transfer_exactly_t, std::error_code)

It simply won't compile on this line:

    size_t n = asio::read(socket, response, asio::transfer_exactly(chunk_bytes_to_read), error);

I have read the examples for asio::transfer_exactly, but none of them is quite like this: https://www.boost.org/doc/libs/1_57_0/doc/html/boost_asio/reference/transfer_exactly.html

Any ideas?

Answer 1

It looks like you didn't follow the format: https://en.wikipedia.org/wiki/Chunked_transfer_encoding#Format

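Before you accumulate the full response body, you need to read each chunk's length (in hexadecimal) and any optional chunk extensions. That has to happen first, because the \r\n sequence you are splitting on can easily appear inside the chunk data.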

Once again, I'd suggest just using Beast's support, which makes all of this simple:

    http::response<http::string_body> response;
    boost::asio::streambuf buf;
    http::read(socket, buf, response);

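and you get the headers fully parsed and interpreted (including Trailer headers!), with the content available in response.body() as a std::string.

It does the right thing even if the server doesn't use chunked encoding, or combines different encoding options.

There is no reason to reinvent the wheel.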

Full Demo

This is demonstrated with the chunked-encoding test URL from https://jigsaw.w3.org/HTTP/:

    #include <boost/asio.hpp>
    #include <boost/beast.hpp>
    #include <iostream>

    namespace http = boost::beast::http;
    using boost::asio::ip::tcp;

    int main() {
        http::response<http::string_body> response;

        boost::asio::io_context ctx;
        tcp::socket socket(ctx);
        connect(socket, tcp::resolver{ctx}.resolve("jigsaw.w3.org", "http"));

        http::write(
            socket,
            http::request<http::empty_body>(
                http::verb::get, "/HTTP/ChunkedScript", 11));

        boost::asio::streambuf buf;
        http::read(socket, buf, response);

        std::cout << response.body() << "\n";
        std::cout << "Effective headers are:" << response.base() << "\n";
    }

Prints:

    This output will be chunked encoded by the server, if your client is HTTP/1.1
    Below this line, is 1000 repeated lines of 0-9.
    -------------------------------------------------------------------------
    01234567890123456789012345678901234567890123456789012345678901234567890
    01234567890123456789012345678901234567890123456789012345678901234567890
    ...996 lines removed ...
    01234567890123456789012345678901234567890123456789012345678901234567890
    01234567890123456789012345678901234567890123456789012345678901234567890

    Effective headers are:HTTP/1.1 200 OK
    cache-control: max-age=0
    date: Wed, 31 Mar 2021 20:09:50 GMT
    transfer-encoding: chunked
    content-type: text/plain
    etag: "1j3k6u8:tikt981g"
    expires: Wed, 31 Mar 2021 20:09:49 GMT
    last-modified: Mon, 18 Mar 2002 14:28:02 GMT
    server: Jigsaw/2.3.0-beta3
