给定一个 8 位字节,以字节的二进制表示形式生成 "ones" 的序列和计数(使用最小化查找 table)

Given an 8-bit byte, produce the sequence and count of "ones" in the binary representation of the byte (using a minimized lookup table)

我想使用查找 table 来解决问题。构建 table 后,我希望能够执行以下操作:

# lookup "0110 1001"
byte = 0b01101001
offset = offsets[byte]
count = counts[byte]
sequence = table[offset:offset+count]

print "Sequence for {:08b}: {:d} one(s) at positions".format(byte, count), sequence
>> Sequence for 01101001: 4 one(s) at positions [0, 3, 5, 6]

我目前的努力很简单并且产生了正确的结果,但是 table 没有达到应有的 space 效率。以下代码生成了一个正确的 table,其中包含 1024 个项目。

table = []
offsets = []
counts = []
current_offset = 0

for byte in range(256):
    b = [j for j in range(8) if byte & (1 << j)]
    table += b
    offsets.append(current_offset)
    current_offset += len(b)
    counts.append(len(b))

我知道存在一个更小的 table 因为...

列表 [0, 1, 2, 3, 4, 5, 6, 7] 中的子列表涵盖了 byte 的许多可能值。

我正在寻找的是一种构建最小 table 可能(根据元素数量)的方法。我还应该说 tableoffsetscounts 都需要是列表或序列(没有花哨的 Python 对象),因为我的最终目标是硬编码tables 到微控制器中。

编辑:验证用户 2357112 的回答。 table减少到320个元素!

# given an 8 bit byte in binary, return a list and count of ones.
table = []
offsets = []
counts = []

for byte in range(64):
    table += [0] + [j+1 for j in range(6) if byte & (1 << j)]+ [7]

for byte in range(256):
    b = [j for j in range(8) if byte & (1 << j)]
    counts.append(len(b))
    for i in xrange(len(table)):
        # should be len(table)-len(b), but I want to throw an 
        #IndexError exception if the pattern isn't found
        try:
            if table[i:i+len(b)] == b:
                offsets.append(i)
                break
        except IndexError:
            print b, " not found in table."
        except:
            rasie
# verify
good_count = 0
bad_count = 0
for byte in range(256):
    offset = offsets[byte]
    count = counts[byte]
    sequence = table[offset:offset+count]
    check = 0
    for bit in sequence:
        check = check | (1 << bit)
    if byte != check:
        print "Error: {:08b} != {:08b}".format(byte, check), sequence
        bad_count += 1
    else:
        good_count += 1
print "Good: {:d} Bad: {:d}".format(good_count, bad_count)
>> Good: 256 Bad: 0

如果位顺序很重要,这实际上比乍听起来要容易得多,因为您只需将所有 8 位数字的序列与 MSB 和 LSB 集连接起来(因此所有序列都以 0 开头并以 7).

结尾

要看出这是最优的,首先请注意,无法重叠两个从 0 开始到 7 结束的递增序列。因此,所有此类序列必须在我们的 table 中单独表示。

然后,请注意,这些序列涵盖了我们需要表示的所有其他序列,因为任何不以 0 开头或不以 7 结尾的递增序列都是以 0 开头并以 7 结尾的序列的子序列。


如果 [3, 4, 1] 对于 0b00011010[1, 3, 4] 一样好,那么您可以进一步压缩,但这样做很棘手。我没有找到最佳解决方案的简单方法,但我 beam search。这是 length-81 table beam search 给出的:

[0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 4, 2, 5, 7, 3, 0, 1, 6, 2, 7, 4, 0, 5, 3, 1, 6, 4
, 0, 2, 5, 3, 6, 7, 1, 4, 3, 0, 2, 6, 7, 5, 1, 4, 6, 2, 3, 7, 1, 5, 0, 6, 4, 3,
7, 0, 2, 1, 5, 3, 4, 7, 2, 1, 5, 6, 3, 0, 7, 4, 6, 2, 5, 0, 7, 4, 5, 0, 1, 2, 6,
 3]

这里是我写的快速简单的波束搜索发现:

# increase this to consume more memory in hopes of a better solution
# particularly low values may never find a solution
beam_width = 10000

# initialize the sequence to prevent wasting beam width on symmetric options
# I have no concrete proof that starting the sequence this way is optimal,
# but it's intuitively reasonable and it improved the solution quality.
initialseq = [0, 1, 2, 3, 4, 5, 6, 7]
initialset = set()
for i in range(9):
    for j in range(i, 9):
        initialset.add(frozenset(initialseq[i:j]))

seqs = [initialseq]
sets = [initialset]

while max(map(len, sets)) < 256:
    newseqs = []
    newsets = []
    for i, (seq, set_) in enumerate(zip(seqs, sets)):
        for newelem in range(8):
            newseq = seq + [newelem]
            newset = set_.copy()
            for slicestart in range(-8, 0):
                slice_ = newseq[slicestart:]
                if len(slice_) == len(set(slice_)):
                    newset.add(frozenset(slice_))
            newseqs.append(newseq)
            newsets.append(newset)
    seqs_and_sets = zip(newseqs, newsets)
    seqs_and_sets.sort(key=lambda x: len(x[1]), reverse=True)
    seqs_and_sets = seqs_and_sets[:beam_width]
    seqs, sets = zip(*seqs_and_sets)