Why is the first packed data in struct little endian, but the rest is big endian?

Question

import struct
port = 1331
fragments = [1,2,3,4]
flags = bytes([64])
name = "Hello World"

data = struct.pack('HcHH', port, flags, len(fragments), len(name))

print(int.from_bytes(data[3:5], byteorder='big'))
print(int.from_bytes(data[5:7], byteorder='big'))
print(int.from_bytes(data[0:2], byteorder='little'))

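When I print them like this, they come out correct. It looks as if port is little endian, while len(fragments) and len(name) are big endian. If I also read port as big endian, I get the wrong value.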

So why does struct behave like this? Or am I missing something?

Answer 1

Some interesting alignment happens because of the 'c' between the 'H's. You can see it with calcsize:

>>> struct.calcsize('HcHH')
8
>>> struct.calcsize('HHHc')
7
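To see where each field actually lands, you can derive the offsets from calcsize itself. This is a small sketch, not part of the struct API: field_offsets is a hypothetical helper that assumes a native-mode format made only of single-character codes (no prefix, no repeat counts). It relies on native mode padding before an item to align it but adding no trailing padding.

```python
import struct


def field_offsets(fmt: str) -> list[int]:
    """Byte offset of each field in a native ('@') format of single-char codes.

    Hypothetical helper: calcsize(prefix up to and including field i) minus
    the size of field i gives the (aligned) offset where field i starts.
    """
    offsets = []
    for i, code in enumerate(fmt):
        offsets.append(struct.calcsize(fmt[: i + 1]) - struct.calcsize(code))
    return offsets


print(field_offsets('HcHH'))  # [0, 2, 4, 6] on typical platforms (byte 3 is padding)
print(field_offsets('HHHc'))  # [0, 2, 4, 6] (no padding needed)
```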

So your data is not laid out the way you think. The correct way to decode it is:

print(int.from_bytes(data[4:6], byteorder='little'))
# 4
print(int.from_bytes(data[6:], byteorder='little'))
# 11
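The same decoding can be written without hardcoding 'little', by reading with the machine's own byte order. This sketch assumes, as on common platforms, that a native 'H' is 2 bytes with 2-byte alignment, so the shorts sit at offsets 0, 4 and 6.

```python
import struct
import sys

port, flags = 1331, bytes([64])
fragments, name = [1, 2, 3, 4], "Hello World"
data = struct.pack('HcHH', port, flags, len(fragments), len(name))

# Native mode stores the shorts in the machine's byte order at offsets 0, 4, 6;
# byte 3 is the pad byte inserted after the 'c'.
order = sys.byteorder
port_out = int.from_bytes(data[0:2], byteorder=order)
frag_count = int.from_bytes(data[4:6], byteorder=order)
name_len = int.from_bytes(data[6:8], byteorder=order)
print(port_out, frag_count, name_len)  # 1331 4 11
```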

As it happens, the pad byte added after 'c' is '\x00', which makes your byte slices come out right when read as big-endian:

>>> data
b'3\x05@\x00\x04\x00\x0b\x00'
        ^^^^
        this is the intruder
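The big-endian reading only works because each slice starts one byte early and that byte (the pad byte, then the high byte of 4) happens to be zero. As soon as a value needs two bytes, it breaks. In this sketch, misread_big_endian is a hypothetical helper, and the format '<HcxHH' spells out the native layout of a little-endian machine, with 'x' standing for the pad byte.

```python
import struct


def misread_big_endian(n_fragments: int, name_len: int) -> tuple[int, int]:
    # '<HcxHH': little-endian, with the pad byte written out explicitly as 'x',
    # i.e. the same 8 bytes native mode produces on a little-endian machine.
    data = struct.pack('<HcxHH', 1331, b'@', n_fragments, name_len)
    # The question's slices, read as big-endian.
    return int.from_bytes(data[3:5], 'big'), int.from_bytes(data[5:7], 'big')


print(misread_big_endian(4, 11))   # (4, 11)  looks right by accident
print(misread_big_endian(4, 300))  # (4, 44)  wrong: 300 does not fit in one byte
```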

Answer 2

By default, your call to pack is equivalent to:

struct.pack('@HcHH', port, flags, len(fragments), len(name))

The result looks like this (printed with '.'.join(f'{x:02X}' for x in data)):

33.05.40.00.04.00.0B.00
 0  1  2  3  4  5  6  7

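The number 4 is encoded in bytes 4 and 5, in little-endian order, and 11 in bytes 6 and 7. Byte 3 is a pad byte that pack inserts so that the following shorts are aligned on an even boundary.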
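A quick way to confirm this is to write the implicit padding out by hand with the 'x' pad code and compare. This sketch assumes a native 'H' of 2 bytes with 2-byte alignment (true on common platforms); the hex dump shown is what a little-endian machine produces.

```python
import struct
import sys

data = struct.pack('@HcHH', 1331, bytes([64]), 4, 11)
dump = '.'.join(f'{x:02X}' for x in data)
print(dump)  # 33.05.40.00.04.00.0B.00 on a little-endian machine

# Same layout with the pad byte made explicit ('x') and the byte order fixed
# to match the machine's native order.
prefix = '<' if sys.byteorder == 'little' else '>'
explicit = struct.pack(prefix + 'HcxHH', 1331, bytes([64]), 4, 11)
print(data == explicit)  # True
```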

According to the docs:

Note By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct. To handle platform-independent data formats or omit implicit pad bytes, use standard size and alignment instead of native size and alignment: see Byte Order, Size, and Alignment for details.

To drop the pad byte, so that the byte positions you assumed are correct, while keeping the native byte order, use

struct.pack('=HcHH', port, flags, len(fragments), len(name))

You can also use < or > as the prefix to get a fixed byte order.
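Comparing the four prefixes side by side makes the difference visible: '@' is the only one that pads, and '>' is the only one under which all of the question's slices, including the one for port, are correctly read as big-endian. The hex strings in the comments assume a little-endian machine for '@' and '='.

```python
import struct

args = (1331, bytes([64]), 4, 11)
packed = {p: struct.pack(p + 'HcHH', *args) for p in ('@', '=', '<', '>')}
for prefix, data in packed.items():
    print(prefix, len(data), data.hex('.'))
# @ 8 33.05.40.00.04.00.0b.00
# = 7 33.05.40.04.00.0b.00
# < 7 33.05.40.04.00.0b.00
# > 7 05.33.40.00.04.00.0b

# With '>' every field is big-endian and unpadded, so all three slices agree:
data = packed['>']
print(int.from_bytes(data[0:2], 'big'),
      int.from_bytes(data[3:5], 'big'),
      int.from_bytes(data[5:7], 'big'))  # 1331 4 11
```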

"correct" 解决方案是使用 unpack 取回您的号码，这样您就不必担心字节序、填充或其他任何问题，真的。
