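Tried to parse chunked transfer encoding, but it's not working: the file I decoded is totally unreadable
I am trying to parse data that a REST API sends with chunked transfer encoding. When I print the values as strings I can see that there is data, so I thought it should work, but when I write the values to a file, the file is completely unreadable. The code below uses the Boost library, and I explain my thinking in the comments. It starts at the response-handling part. I don't know what I'm doing wrong.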
// Send the request.
boost::asio::write(socket, request);
// Read the response status line. The response streambuf will automatically
// grow to accommodate the entire line. The growth may be limited by passing
// a maximum size to the streambuf constructor.
boost::asio::streambuf response;
boost::asio::read_until(socket, response, "\r\n");
// Check that response is OK.
std::istream response_stream(&response);
std::string http_version;
response_stream >> http_version;
unsigned int status_code;
response_stream >> status_code;
std::string status_message;
std::getline(response_stream, status_message);
if (!response_stream || http_version.substr(0, 5) != "HTTP/")
{
//std::cout << "Invalid response\n";
return 9002;
}
if (status_code != 200)
{
//std::cout << "Response returned with status code " << status_code << "\n";
return 9003;
}
// Read the response headers, which are terminated by a blank line.
boost::asio::read_until(socket, response, "\r\n\r\n");
// Process the response headers.
// Here I try to extract the file name from the Content-Disposition response header.
std::string header;
std::string fullHeader = "";
string zipfilename="", txtfilename="";
bool foundfilename = false;
while (std::getline(response_stream, header) && header != "\r")
{
fullHeader.append(header).append("\n");
std::transform(header.begin(), header.end(), header.begin(),
[](unsigned char c){ return std::tolower(c); });
string containstr = "content-disposition";
string containstr2 = "filename";
string quotestr = "\"";
if (header.find(containstr) != std::string::npos && header.find(containstr2) != std::string::npos)
{
int countquotes = 0;
bool foundquote = true;
std::size_t startpos = 0, beginpos, endpos;
while (foundquote)
{
std::size_t myfound = header.find(quotestr, startpos);
if (myfound != std::string::npos)
{
if (countquotes % 2 == 0)
beginpos = myfound;
else
{
endpos = myfound;
foundfilename = true;
}
startpos = myfound + 1;
}
else
foundquote = false;
countquotes++;
}
if (endpos > beginpos && foundfilename)
{
size_t zipfileleng = endpos - beginpos;
zipfilename = header.substr(beginpos+1, zipfileleng-1);
txtfilename = header.substr(beginpos+1, zipfileleng-5);
}
else
return 9004;
}
}
if (foundfilename == false || zipfilename.length() == 0 || txtfilename.length() == 0)
return 9005;
// Once the zip file name has been found, read the data from the response body.
// The response uses chunked transfer encoding, so I tried to parse it myself.
// Going by Wikipedia it didn't look complicated: the first line is the length of
// the data, the next line is the data, and that pattern repeats over and over.
// So I split the whole response body on "\r\n" into a vector<string> and read
// the data from that vector.
// Write whatever content we already have to output.
std::string fullResponse = "";
if (response.size() > 0)
{
std::stringstream ss;
ss << &response;
fullResponse = ss.str();
}
// Try to split the entire response body into a vector<string>.
vector<string> allresponsedata;
split_regex(allresponsedata, fullResponse, boost::regex("(\r\n)+"));
// Try to merge the response data.
string zipfiledata;
int myindex = 0;
for (auto &x : allresponsedata) {
std::cout << "Split: " << x << std::endl; // When I print the data, I do see a value in x.
if (myindex % 2 != 0)
{
zipfiledata = zipfiledata + x; // Try to accumulate the data.
}
myindex++;
}
// Try to write the data to a file.
std::ofstream zipfilestream(zipfilename, ios::out | ios::binary);
zipfilestream.write(zipfiledata.c_str(), zipfiledata.length());
zipfilestream.close();
// Afterwards the zip file is created, but it cannot be opened; the zip utility reports that it is a damaged archive.
I even tried another approach like the one below, but it doesn't work either. Visual Studio says:
1 IntelliSense: no instance of overloaded function "boost::asio::read" matches the argument list
argument types are: (boost::asio::ip::tcp::socket, boost::asio::streambuf, boost::asio::detail::transfer_exactly_t, std::error_code)
It simply fails to compile on this line:
size_t n = asio::read(socket, response, asio::transfer_exactly(chunk_bytes_to_read), error);
I have read the asio::transfer_exactly example, but nothing there looks exactly like this: https://www.boost.org/doc/libs/1_57_0/doc/html/boost_asio/reference/transfer_exactly.html
Any ideas?
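It looks like you misread the format: https://en.wikipedia.org/wiki/Chunked_transfer_encoding#Format
You need to read the chunk length (in hexadecimal) and any optional chunk extensions before accumulating the full response body. This has to happen first, because the \r\n sequence you split on can easily appear inside the chunk data.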
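If you do want to decode the body by hand, the order described above could be sketched roughly as follows. This is only an illustration, not how Beast implements it: `decode_chunked` and `decode_chunked_self_check` are hypothetical names, the whole raw body is assumed to be in memory already, and any trailer fields after the last chunk are ignored rather than parsed. It needs C++17.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: decodes an HTTP/1.1 chunked body held entirely in memory.
// Returns std::nullopt if the input is truncated or malformed.
std::optional<std::string> decode_chunked(std::string_view in) {
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        // 1. The chunk-size line ends at the first CRLF after `pos`.
        const std::size_t eol = in.find("\r\n", pos);
        if (eol == std::string_view::npos) return std::nullopt;
        const std::string_view line = in.substr(pos, eol - pos);

        // 2. Anything after ';' is an optional chunk extension; skip it.
        const std::string_view hex = line.substr(0, line.find(';'));

        // 3. The size itself is hexadecimal.
        std::size_t size = 0;
        const auto res = std::from_chars(hex.data(), hex.data() + hex.size(), size, 16);
        if (res.ec != std::errc{}) return std::nullopt;
        pos = eol + 2;

        // 4. A zero-sized chunk marks the end of the body.
        if (size == 0) return out;

        // 5. Copy exactly `size` bytes, whatever they contain, then expect CRLF.
        const std::size_t remaining = in.size() - pos;
        if (size > remaining || remaining - size < 2) return std::nullopt;
        out.append(in.substr(pos, size));
        if (in.substr(pos + size, 2) != "\r\n") return std::nullopt;
        pos += size + 2;
    }
}

// Small check: the second chunk deliberately contains CRLF and a NUL byte,
// which a split on "\r\n" would have cut apart.
void decode_chunked_self_check() {
    using namespace std::literals;
    const auto body = decode_chunked("4\r\nWiki\r\n6;ext=1\r\na\r\nb\0c\r\n0\r\n\r\n"sv);
    assert(body && *body == "Wikia\r\nb\0c"s);
}
```

The point of the design is that the data is taken by length, never by searching for a delimiter, so binary payloads such as zip files pass through unchanged.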
Again, I'd suggest simply using Beast's support, which makes it all simple:
http::response<http::string_body> response;
boost::asio::streambuf buf;
http::read(socket, buf, response);
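This gives you the headers fully parsed and interpreted (including Trailer headers!) and the content in response.body() as a std::string.
It does the right thing even if the server doesn't use chunked encoding, or combines different encoding options.
There's simply no reason to reinvent the wheel.
Full Demo
This demonstrates it with the chunked encoding test URL from https://jigsaw.w3.org/HTTP/: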
#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <iostream>
namespace http = boost::beast::http;
using boost::asio::ip::tcp;
int main() {
    http::response<http::string_body> response;
    // Resolve the host and connect a plain TCP socket.
    boost::asio::io_context ctx;
    tcp::socket socket(ctx);
    connect(socket, tcp::resolver{ctx}.resolve("jigsaw.w3.org", "http"));
    // Send a GET request with version 11, i.e. HTTP/1.1.
    http::write(
        socket,
        http::request<http::empty_body>(
            http::verb::get, "/HTTP/ChunkedScript", 11));
    // Read the whole response; Beast decodes the chunked body itself.
    boost::asio::streambuf buf;
    http::read(socket, buf, response);
    std::cout << response.body() << "\n";
    std::cout << "Effective headers are:" << response.base() << "\n";
}
Prints
This output will be chunked encoded by the server, if your client is HTTP/1.1
Below this line, is 1000 repeated lines of 0-9.
-------------------------------------------------------------------------
01234567890123456789012345678901234567890123456789012345678901234567890
01234567890123456789012345678901234567890123456789012345678901234567890
...996 lines removed ...
01234567890123456789012345678901234567890123456789012345678901234567890
01234567890123456789012345678901234567890123456789012345678901234567890
Effective headers are:HTTP/1.1 200 OK
cache-control: max-age=0
date: Wed, 31 Mar 2021 20:09:50 GMT
transfer-encoding: chunked
content-type: text/plain
etag: "1j3k6u8:tikt981g"
expires: Wed, 31 Mar 2021 20:09:49 GMT
last-modified: Mon, 18 Mar 2002 14:28:02 GMT
server: Jigsaw/2.3.0-beta3
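Since the question is about saving a zip file, here is a minimal sketch of writing a body string to disk byte for byte. It assumes the decoded payload is already in a std::string (for example response.body() from the demo above); `write_binary_file`, `read_binary_file` and `check_round_trip` are hypothetical helper names, not part of Beast or Asio.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Writes `data` byte for byte to `path`. Returns false on any I/O error.
bool write_binary_file(const std::string& path, const std::string& data) {
    // Binary mode prevents newline translation on platforms that do it.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    // Use data()/size() so embedded NUL bytes are written too.
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();
    return static_cast<bool>(out);
}

// Reads a whole file back in binary mode; returns an empty string if it cannot be opened.
std::string read_binary_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>{});
}

// Round-trip check with a payload containing CRLF and a NUL byte.
void check_round_trip(const std::string& path) {
    const std::string payload("PK\x03\x04\r\n\0zip", 10);
    assert(write_binary_file(path, payload));
    assert(read_binary_file(path) == payload);
}
```

With the demo above, a call such as write_binary_file(zipfilename, response.body()) would store the decoded body unchanged, where zipfilename is the name taken from the Content-Disposition header in the question.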