Why do we use char* as a buffer, why not a string in boost::asio?

Question

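Recently I have been reading a book on network programming with boost::asio. As far as I understand, a buffer is just a piece of memory in the program's address space, like any other, which we hand to a socket so that we can perform I/O operations on it.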

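The first thing I don't understand is why we need a separate thing called a "buffer". Why not just write the content into a string, and then, when we receive, put it into a string?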

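The second thing I don't understand is why char* or char[] is used as a buffer. Why not use an int[], which can store every ASCII value that comes through? After all, it's just a piece of memory. I feel like I'm missing something here; please help me.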

Third, why can't a C++ std::string be used as a buffer? I have to convert it to a C string every time.

Answer 1

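What you are missing is that the contents of a buffer may not represent a string at all; it is just a block of memory that can represent anything. That also explains why std::string shouldn't be used as a buffer.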

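The reason char is used as the element type is that it is exactly one byte in size (sizeof(char) is 1 by definition in C++), so a buffer is really just an array of bytes. Having char as the type makes it easy to operate on each individual byte of that memory, for example to copy a block of memory into or out of the buffer at a specific byte offset.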

Answer 2

why not just write the content in a string, and then when we receive it put in the string.

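Because Boost ASIO is a library for binary IO, not for text IO. std::string is meant to represent text. Technically you could use std::string as a buffer for binary data, but doing so would be unconventional and confusing.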

why not int[]

Because narrow character types are special in the C++ language.

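In general, an object of one type cannot be observed as an object of another type. For example, if you have a short object and a long long object and want to send them over the network, you cannot "observe" those objects as (an array of) int objects, because they are not int objects. However, every object can be "observed" (by reinterpretation) as an array of bytes. That is unique to char, unsigned char and std::byte, and it is why they are used for serialization buffers. They are also exactly one byte in size, and the byte is the fundamental storage unit in the C++ memory model.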

which can store ASCII value

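This is largely irrelevant to binary IO, since ASCII is a text encoding. Using 16 bits (at least; 32 bits on most systems) to represent ASCII, which is a 7-bit encoding, is also quite wasteful.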

Answer 3

I think both answers give arguments against string or int[], but neither captures the general point:

Boost Doesn't Make That Choice For You

In other words

You Are Free To Use All Of These To Your Taste

Demo Live On Coliru:

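#include <boost/asio.hpp>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>

// Sends a minimal HTTP request to 1.1.1.1:80 and reads the reply into
// whatever buffer type the caller passes in.
template <typename Buffer>
std::size_t test_request(Buffer response) {
    using boost::asio::ip::tcp;

    boost::asio::io_context io;
    tcp::socket s(io);
    s.connect({ boost::asio::ip::address_v4{{1,1,1,1}}, 80 });

    // A string_view excludes the literal's terminating NUL, which
    // boost::asio::buffer() on a raw char array would otherwise send too.
    const std::string_view request = "GET / HTTP/1.1\r\n"
         "Host: 1.1.1.1\r\n"
         "Referer: stackoverflow.com\r\n"
         "\r\n";
    boost::asio::write(s, boost::asio::buffer(request));

    // Reads until the buffer is full or the server closes the connection;
    // end-of-file is reported through ec instead of being thrown.
    boost::system::error_code ec;
    auto bytes = boost::asio::read(s, response, ec);
    std::cerr << "test_request: " << ec.message() << " at " << bytes << " bytes\n";

    return bytes;
}

int main() {
    std::string s;
    std::vector<char> vec(4096);
    int ints[1024];

    {
        // Fixed-size buffer over a vector; shrink it to what was actually received.
        auto n = test_request(boost::asio::buffer(vec));
        vec.resize(n);
    }

    // or use the ints[] (Asio only sees the underlying bytes)
    test_request(boost::asio::buffer(ints));

    // use a dynamic buffer (that grows):
    test_request(boost::asio::dynamic_buffer(s));

    // Prints the size plus the first and last lines of a response.
    auto report = [](std::string_view sv) {
        std::cout << sv.length() << " bytes\n"
            << " first: " << std::quoted(sv.substr(0, sv.find_first_of("\r\n"))) << "\n"
            << " last:  " << std::quoted(sv.substr(sv.find_last_of("\r\n", sv.size()-3)+1)) << "\n";
    };

    std::cout << "String response: "; report(s);
    std::cout << "Vector response: "; report({vec.data(), vec.size()});
}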

Prints

test_request: End of file at 909 bytes
test_request: End of file at 909 bytes
test_request: End of file at 909 bytes
String response: 909 bytes
 first: "HTTP/1.1 301 Moved Permanently"
 last:  "</html>
"
Vector response: 909 bytes
 first: "HTTP/1.1 301 Moved Permanently"
 last:  "</html>
"

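Summary

The point isn't about opinions on text encodings or the like.¹

The point is

Don't pay for what you don't use (extra conversions require allocations and cost efficiency)
Non-intrusive frameworks (a framework shouldn't dictate which vocabulary types you must use)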

¹ (std::string works fine for UTF-8, 7-bit ASCII or binary data; it handles NUL characters just fine.)
