In Python how do I parse the 11th and 12th bit of 3 bytes?

Question

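If I have 3 bytes b'\x00\x0c\x00', which can be represented with the bits 00000000 00001100 00000000, how do I most efficiently parse the 11th and 12th bits, which here are 11?

Here are the positions: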

             **
00000000 11111110 22222111 tens
87654321 65432109 43210987 ones
|||||||| |||||||| ||||||||
00000000 00001100 00000000
             **

I have the following code:

bytes_input = b'\x00\x0c\x00'
for byte in bytes_input:
    print(byte, '{:08b}'.format(byte), bin(byte))
bit_position = 11-1
bits_per_byte = 8
floor = bit_position//bits_per_byte
print('floor', floor)
byte = bytes_input[floor]
print('byte', byte, type(byte))
modulo = bit_position%bits_per_byte
print('modulo', modulo)
bits = bin(byte >> modulo & 3)
print('bits', bits, type(bits))

which returns:

0 00000000 0b0
12 00001100 0b1100
0 00000000 0b0
floor 1
byte 12 <class 'int'>
modulo 2
bits 0b11 <class 'str'>

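Is there a computationally faster way for me to get the information that doesn't require me to calculate floor and modulo?

To put things into context, I'm parsing this file format: http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml

Update, February 1, 2015:

Thanks to @Dunes I read the documentation on from_bytes and found out that I can avoid doing divmod by just doing int.from_bytes with byteorder='little'. The final function I adapted into my code is fsmall. I couldn't get timeit to work, so I'm not sure about the relative speeds of the functions.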

bytes_input = b'\x00\x0c\x00'
bit_position = 11-1
bpb = bits_per_byte = 8

def foriginal(bytes_input, bit_position):
    floor = bit_position//bpb
    byte = bytes_input[floor]
    modulo = bit_position%bpb
    return byte >> modulo & 0b11

def fdivmod(bytes_input, bit_position):
    div, mod = divmod(bit_position, bpb)
    return bytes_input[div] >> mod & 0b11

def fsmall(bytes_input, bit_position):
    int_bytes = int.from_bytes(bytes_input, byteorder='little')
    shift = bit_position
    bits = int_bytes >> shift & 0b11
    return bits
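
For reference, one way the three functions above could be timed outside IPython is with the standard timeit module. This is only a sketch: the functions are restated compactly so the block runs on its own, the names BITS_PER_BYTE, data and timings are introduced here for illustration, and the numbers it prints will vary from machine to machine.

```python
import timeit

BITS_PER_BYTE = 8

def foriginal(data, bit_position):
    byte = data[bit_position // BITS_PER_BYTE]
    return byte >> (bit_position % BITS_PER_BYTE) & 0b11

def fdivmod(data, bit_position):
    index, shift = divmod(bit_position, BITS_PER_BYTE)
    return data[index] >> shift & 0b11

def fsmall(data, bit_position):
    return int.from_bytes(data, byteorder='little') >> bit_position & 0b11

data = b'\x00\x0c\x00'
bit_position = 11 - 1  # zero-based index of the 11th bit

timings = {}
for func in (foriginal, fdivmod, fsmall):
    # Passing a callable instead of a code string avoids having to
    # import the function into timeit's own namespace.
    seconds = timeit.timeit(lambda: func(data, bit_position), number=100_000)
    timings[func.__name__] = seconds
    print(f'{func.__name__}: {seconds:.4f} s per 100,000 calls')
```

The lambda adds a small constant call overhead to every measurement, so the figures are better suited to comparing the functions with each other than to reading as absolute costs.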

Answer 1

Is there a computationally faster way for me to get the information that doesn't require me to calculate floor and modulo?

Not really. But there is divmod().

>>> divmod(10, 8)
(1, 2)

Answer 2

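You can do binary operations in Python:

Convert the byte[s] you care about into an integer. In this case you only care about the middle byte, so we'll parse just that: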

>>> bytes_input = b'\x00\x0c\x00'
>>> middle_byte = bytes_input[1]
>>> middle_byte
12

Now you can perform a binary AND with the & operator:

>>> middle_byte & 0x0C
12

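To extend this more generally, you can convert an arbitrary binary [string] to its integer value, e.g.:

>>> int.from_bytes(b'\x00\x0c\x00', byteorder='big')
3072

Now you can apply a bitmask again:

>>> int.from_bytes(b'\x00\x0c\x00', byteorder='big') & 0x000C00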
3072
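
The masked result still has the two bits sitting at their original position. If the goal is the two-bit value itself (0 to 3), a shift after the mask gets there. A minimal sketch, where MASK, SHIFT, masked and field are names chosen for this example; note that with this particular input big-endian and little-endian conversion give the same integer, which is not true in general:

```python
data = b'\x00\x0c\x00'
value = int.from_bytes(data, byteorder='big')

MASK = 0x000C00          # bits 11 and 12, counting from 1 at the least significant bit
SHIFT = 10               # number of bits below the mask

masked = value & MASK    # 3072: the two bits, still in place
field = masked >> SHIFT  # 3: the two bits as a number from 0 to 3

print(masked, field, format(field, '02b'))  # 3072 3 11
```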

Answer 3

The fastest way is to convert the byte string to a numeric type that can be checked with a bitmask:

def check(b, checkbits):
    # python2 use ord(bb)
    bits = sum([bb << (8 * (len(b) - i)) for i, bb in enumerate(b,1)])
    mask = sum([2 ** (b-1) for b in checkbits])
    return bits, bits & mask == mask

bytes_input = b'\x00\x0c\x00'
checkbits = (11, 12)
bits, is_set = check(bytes_input, checkbits)
print(bits, bin(bits), is_set)
3072 0b110000000000 True
%timeit check(bytes_input, checkbits)
100000 loops, best of 3: 3.24 µs per loop

I'm not sure about the timing of your code, since I couldn't get it to work.

Update: it turns out there is a faster implementation of check():

def check2(b, mask):
    bits = 0
    i = 0
    for bb in b[::-1]:
        # python2 use ord(bb)
        bits |= bb << i
        i += 8
    return bits, bits & mask == mask
# we now build the mask directly
# note this is the same as 2**10 | 2**11
mask = (2**11 | 2**12) >> 1
%timeit check2(bytes_input, mask)
1000000 loops, best of 3: 1.82 µs per loop

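Update 2: adopting Dunes' approach, the whole thing becomes a two-liner (note that my tests ran in Python 2, which is apparently much slower than Dunes' Python 3):

#python2 from_bytes = lambda str: int(str.encode('hex'), 16)
mask = (2**11 | 2**12) >> 1
# byteorder is a required argument before Python 3.11
check = lambda b, mask: int.from_bytes(b, 'big') & mask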
%timeit check(bytes_input, mask)
100000 loops, best of 3: 2.1 µs per loop

Answer 4

You could try:

(int.from_bytes(bytes_input, 'big') >> bit_position) & 0b11

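It doesn't seem to be any faster, though, just more concise.

However, int.from_bytes(bytes_input, 'big') is the most time-consuming part of that snippet, by a ratio of about 2 to 1. If you can convert your data from bytes to an int once, at the start of the program, then the bit-masking operations that follow will be faster.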

In [52]: %timeit n = int.from_bytes(bytes_input, 'big')
1000000 loops, best of 3: 237 ns per loop

In [53]: %timeit n >> bit_position & 0b11
10000000 loops, best of 3: 107 ns per loop
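
The convert-once idea could look roughly like the sketch below. It assumes every field is 2 bits wide and starts on an even bit position, and it uses little-endian order so that shift 0 is the least significant bit of the first byte, matching the numbering in the question (for this input 'big' gives the same integer). The function name unpack_two_bit_fields is made up for this example.

```python
def unpack_two_bit_fields(data):
    """Return every aligned 2-bit field in data, lowest bit position first."""
    n = int.from_bytes(data, byteorder='little')  # one conversion for the whole buffer
    return [(n >> shift) & 0b11 for shift in range(0, 8 * len(data), 2)]

fields = unpack_two_bit_fields(b'\x00\x0c\x00')
print(fields)           # [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0]
print(fields[10 // 2])  # 3, the field that starts at zero-based bit position 10
```

For very large inputs the single integer becomes large as well, so converting in chunks may be worth measuring.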
