How should I approach parsing network packets using C++ templates?

Question

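Suppose I have an application that continuously receives a byte stream from a socket. I have documentation describing what the packets look like: for example, the total header size, the total payload size, and which data type sits at which byte offset. I want to parse this into a struct. The approach I can think of is to declare a struct and disable padding with a compiler attribute, something like: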

#include <cstdint>

struct Payload
{
   char field1;
   std::uint32_t field2;
   std::uint32_t field3;
   char field5;
} __attribute__((packed));  // GCC/Clang extension: no padding between fields

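I could then declare a buffer, memcpy the bytes into it, and reinterpret_cast it to my struct. The other way I can think of is to walk the bytes one by one and fill the struct field by field. I think either would work, but both feel old-school and possibly unsafe.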

The reinterpret_cast approach I mentioned would look something like this:

void receive(const char* data, std::size_t data_size)
{
    if (data_size == sizeof(Payload))
    {
        const Payload* payload = reinterpret_cast<const Payload*>(data);
        // ... further processing ...
    }
}

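Is there a better approach for this kind of use case (more modern C++ style? more elegant?)? I feel that metaprogramming should help, but I don't know how to apply it.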

Can anyone share some ideas? Or point me to references, resources, or even relevant open-source code, so I can learn how to solve this kind of problem more elegantly.

Answer 1

There are many different ways to approach this. Here is one:

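Keep in mind that reading a struct from a network stream is semantically the same as reading a single value, so the operation should look the same in both cases.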

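Note that, from what you posted, I am inferring that you will not be dealing with types that have non-trivial default constructors. If that were the case, I would approach things differently.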

In this approach, we:

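Define a read_into(src&, dst&) function that takes a source of raw bytes and the object to populate.
Provide a general implementation for all integral types, switching from network byte order where appropriate.
Overload the function for our struct, calling read_into() on each field in the order expected on the wire.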

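#include <cstdint>
#include <cstddef>      // std::byte, std::size_t
#include <bit>
#include <concepts>
#include <array>
#include <algorithm>
#include <type_traits>  // std::has_unique_object_representations_v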

// Use std::byteswap when available. In the meantime, just lift the implementation from 
// https://en.cppreference.com/w/cpp/numeric/byteswap
template<std::integral T>
constexpr T byteswap(T value) noexcept
{
    static_assert(std::has_unique_object_representations_v<T>, "T may not have padding bits");
    auto value_representation = std::bit_cast<std::array<std::byte, sizeof(T)>>(value);
    std::ranges::reverse(value_representation);
    return std::bit_cast<T>(value_representation);
}

template<typename T>
concept DataSource = requires(T& x, char* dst, std::size_t size ) {
  {x.read(dst, size)};
};

// General read implementation for all integral types
template<std::endian network_order = std::endian::big>
void read_into(DataSource auto& src, std::integral auto& dst) {
  src.read(reinterpret_cast<char*>(&dst), sizeof(dst));

  if constexpr (sizeof(dst) > 1 && std::endian::native != network_order) {
    dst = byteswap(dst);
  }
}

struct Payload
{
   char field1;
   std::uint32_t field2;
   std::uint32_t field3;
   char field5;
};

// Read implementation specific to Payload
void read_into(DataSource auto& src, Payload& dst) {
  read_into(src, dst.field1);
  read_into<std::endian::little>(src, dst.field2);
  read_into(src, dst.field3);
  read_into(src, dst.field5);
}

// mind you, nothing stops you from just reading directly into the struct, but beware of endianness issues:
// struct Payload
// {
//    char field1;
//    std::uint32_t field2;
//    std::uint32_t field3;
//    char field5;
// } __attribute__((packed));
// void read_into(DataSource auto& src, Payload& dst) {
//   src.read(reinterpret_cast<char*>(&dst), sizeof(Payload));
// }

// Example
struct some_data_source {
  std::size_t read(char*, std::size_t size);
};

void foo() {
    some_data_source data;

    Payload p;
    read_into(data, p);
}
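
The some_data_source above only declares read(). As a minimal sketch of what a concrete source could look like, here is an in-memory one that hands out bytes from a fixed buffer. The names buffer_source and remaining() are made up for illustration and are not part of the answer's code; the only thing taken from it is the read(char*, std::size_t) shape that the DataSource concept asks for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical in-memory source (an assumption, not from the original answer).
// It exposes read(char*, std::size_t), the one operation DataSource requires.
class buffer_source {
public:
    buffer_source(const char* data, std::size_t size) : cur_(data), left_(size) {
        assert(data != nullptr || size == 0);  // a null buffer must be empty
    }

    // Copies up to `size` bytes into `dst` and returns how many were copied.
    // A return value smaller than `size` means the buffer ran out.
    std::size_t read(char* dst, std::size_t size) {
        const std::size_t n = std::min(size, left_);
        if (n != 0) {
            std::memcpy(dst, cur_, n);
            cur_ += n;
            left_ -= n;
        }
        return n;
    }

    // Number of bytes not yet consumed.
    std::size_t remaining() const { return left_; }

private:
    const char* cur_;
    std::size_t left_;
};
```

With the declarations above in scope, something like buffer_source src(buf, len); Payload p; read_into(src, p); should then compile. Note that read_into as written ignores the return value of read(), so a short read would go unnoticed; checking remaining() before parsing is one possible guard.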

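An alternative API could be dst.field2 = read<std::uint32_t>(src), which has the drawback of requiring the type to be spelled out, but is more appropriate if you have to deal with non-trivial constructors.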
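
A rough sketch of what such a value-returning read<T> might look like follows. It is one possible shape, not the answer's implementation: it assumes big-endian (network order) input, handles integral types only (not bool), and assembles the value with shifts instead of a byte swap, so it needs no endianness check. byte_cursor and read_example are hypothetical helpers that exist only to make the sketch runnable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical value-returning variant of read_into.
// Assumes the bytes arrive most significant first and that
// src.read(char*, std::size_t) fills the whole buffer.
template <typename T, typename Source>
T read(Source& src) {
    static_assert(std::is_integral_v<T>, "sketched for integral types only");
    using U = std::make_unsigned_t<T>;

    unsigned char raw[sizeof(T)];
    src.read(reinterpret_cast<char*>(raw), sizeof(T));

    // Shifts operate on values, not on memory layout, so the result
    // is the same on little-endian and big-endian hosts.
    U value = 0;
    for (unsigned char byte : raw) {
        value = static_cast<U>((value << 8) | byte);
    }
    return static_cast<T>(value);
}

// Hypothetical stand-in for a real source; it does no bounds checking.
struct byte_cursor {
    const unsigned char* pos;

    std::size_t read(char* dst, std::size_t size) {
        for (std::size_t i = 0; i < size; ++i) {
            dst[i] = static_cast<char>(pos[i]);
        }
        pos += size;
        return size;
    }
};

// Usage sketch: the wire bytes 12 34 56 78 decode to 0x12345678.
void read_example() {
    const unsigned char wire[] = {0x12, 0x34, 0x56, 0x78};
    byte_cursor src{wire};
    assert(read<std::uint32_t>(src) == 0x12345678u);
}
```

Because the result is returned rather than written through a reference, the destination never has to be default-constructed first, which is the reason this style suits types with non-trivial constructors.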

See it in action on Godbolt: https://gcc.godbolt.org/z/77rvYE1qn
