Determining the validity of a multi-hot encoding using bit manipulation

Suppose I have N items and a binary number that represents which of these items are included in a result:

N = 4

# items 1 and 3 will be included in the result
vector = 0b0101

# item 2 will be included in the result
vector = 0b0010
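
To make the bit numbering explicit, here is a minimal sketch that decodes such an integer back into item indices. It assumes items are numbered from 0 starting at the leftmost (most significant) of the N bits, which is the convention the comments above appear to follow. `decode_items` is a hypothetical helper name, not part of the original question.

```python
def decode_items(vector: int, n: int) -> list[int]:
    """Return the indices of the items encoded in `vector`.

    Assumption: item 0 is the leftmost of the n bits, item n - 1 the rightmost.
    """
    # Bit position (n - 1 - i), counted from the right, holds item i.
    return [i for i in range(n) if vector & (1 << (n - 1 - i))]


print(decode_items(0b0101, 4))  # [1, 3]
print(decode_items(0b0010, 4))  # [2]
```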

I am also given a list, conflicts, that indicates which items cannot be included in the result at the same time:

conflicts = [
  0b0110, # any result that contains items 1 AND 2 is invalid
  0b0111, # any result that contains AT LEAST 2 items from {1, 2, 3} is invalid
]

Given this list of conflicts, we can determine the validity of the earlier vectors:

# invalid as it triggers conflict 1: [0, 1, 1, 1]
vector = 0b0101

# valid as it triggers no conflicts
vector = 0b0010

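In this context, how can bit manipulation be used to determine the validity of a single vector, or of a large list of vectors, against a list of conflicts?

A previously provided solution already gets us most of the way there, but I am unsure how to adapt it to the integer use case (to avoid numpy arrays and numba entirely).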

N = 4

# items 1 and 3 will be included in the result
vector = 0b0101

# item 2 will be included in the result
vector = 0b0010

conflicts = [
  0b0110, # any result that contains items 1 AND 2 is invalid
  0b0111, # any result that contains AT LEAST 2 items from {1, 2, 3} is invalid
]

def find_conflict(vector, conflicts):
    """Print every conflict mask that `vector` triggers; return True if any was found."""
    found_conflict = False
    for v in conflicts:
        result = vector & v  # bitwise AND keeps only the items shared with this conflict mask
        # a conflict is triggered when at least 2 of the mask's items are present
        number_of_bits_set = bin(result).count("1")
        if number_of_bits_set >= 2:
            found_conflict = True
            print(f"..Conflict between {bin(vector)} and {bin(v)}: {bin(result)}")
    if found_conflict:
        print(f"Conflict found for {bin(vector)}.")
    else:
        print(f"No conflict found for {bin(vector)}.")
    return found_conflict

# invalid as it triggers conflict 1: [0, 1, 1, 1]
vector = 0b0101
find_conflict(vector, conflicts)

# valid as it triggers no conflicts
vector = 0b0010
find_conflict(vector, conflicts)

$ python3 pythontest.py
..Conflict between 0b101 and 0b111: 0b101
Conflict found for 0b101.
No conflict found for 0b10.
$
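
If a boolean result is more convenient than printed messages, for example when filtering a large list of vectors, the same check could be sketched as follows. The names `is_valid` and `filter_valid` are hypothetical, and the sketch assumes every conflict mask means "at least 2 of these items", as in the question. It also swaps the string-based bit count for the `x & (x - 1)` trick: that expression clears the lowest set bit, so a nonzero result means at least two bits were set.

```python
def is_valid(vector: int, conflicts: list[int]) -> bool:
    """Return True if `vector` shares fewer than 2 items with every conflict mask."""
    for mask in conflicts:
        common = vector & mask  # items present in both the vector and the mask
        # common & (common - 1) clears the lowest set bit of `common`;
        # anything left over means at least two bits were set.
        if common & (common - 1):
            return False
    return True


def filter_valid(vectors, conflicts):
    """Keep only the vectors that trigger no conflict (hypothetical helper)."""
    return [v for v in vectors if is_valid(v, conflicts)]


conflicts = [0b0110, 0b0111]
print(is_valid(0b0101, conflicts))  # False
print(is_valid(0b0010, conflicts))  # True
print([bin(v) for v in filter_valid(range(16), conflicts)])
```

The early return stops at the first triggered conflict, which helps when the conflict list is long. If a threshold other than 2 were needed, an exact bit count would be required instead; on Python 3.10 and later, `int.bit_count()` provides one without going through a string.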